[Update & Disclaimer - September 2008]
The information contained on this page has been kept here for historical purposes. Those wishing to implement functionality similar to the THALI-Tag cloud should investigate using the Diverse Group Tag Cloud plugin.
[/Update & Disclaimer]
THALI-Tags is a project I started thinking about after a discussion with Kathryn Greenhill earlier in the year. She tells me she got the idea after reading a post by Dave Pattern on his blog. A few short weeks ago I found the time to get started on the project. You can now see it in action on the librariesinteract.info website. The purpose of the THALI-Tags project is twofold:
- Gather the RSS feeds from all of the participating members of the THALI and derive tags for each post; and
- Display those tags in a tag cloud.
Update [27/09/2007]
A new version of the THALI-Tags website went live yesterday. I’ve updated the documentation here to reflect the changes.
Update [06/10/2007]
The code is now available under a GPL licence and can be downloaded here.
Step 1: Gathering the RSS feeds and deriving tags
There are a variety of ways that this step could have been achieved. The easiest was to use the popular Yahoo! Pipes website. The website lets you create programs, called pipes, that manipulate data retrieved from websites (in this case via RSS feeds) in a graphical and easy to use way. The pipe I created for the THALI-Tags project looks like this:
[Image: screenshot of the THALI-Tags pipe]
The pipe consists of five stages, moving from top left to bottom right and following the pipe between modules:
- Gather all of the feeds from those THALI members that wish to participate. This happens in the “Fetch Feed” module;
- The aggregated feeds are passed into a “Loop” module. The loop module iterates over each item in the aggregated feed, passing the content of each item (the item.description field) into the “Term Extractor” module. This module determines the few most significant words, or phrases, in the text passed into it. The results of the analysis are the basis for the tags;
- The aggregated feeds are then passed into a “Rename” module. This module takes the output from the term extractor module and copies it to the “term_extracted” field. This was the only way I could get the output of the term extractor module into my feed; otherwise all I would see was [Array] in the output where I expected the terms to be;
- The aggregated feeds are then passed into another “Loop” module. This loop module iterates over each item and uses the “Item Builder” module to create minimal items containing only the fields that I use to create the THALI-Tag cloud;
- The final module is the “Pipe Output” module. This module is the end of the pipe.
The result of the pipe is an RSS feed that contains one item for each item aggregated by the pipe. Each output item contains:
- The title of the post;
- The URL to the post; and
- The results of the term extraction analysis.
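The five stages above can be mimicked outside Yahoo! Pipes. The following is only a rough sketch in Python, not the pipe itself: it assumes the feeds have already been fetched and parsed into lists of dictionaries, and `extract_terms` is a hypothetical word-frequency stand-in for the “Term Extractor” module, whose real analysis is not reproduced here. The stop-word list and the limit of three terms are likewise assumptions.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list, just enough for the example below.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
              "is", "are", "was", "with", "that", "this", "it", "as", "at", "by"}


def extract_terms(text, limit=3):
    """Hypothetical stand-in for "Term Extractor": most frequent non-stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Sort by descending count, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:limit]]


def run_pipe(feeds):
    """Mimic the five stages on already-parsed feeds (lists of item dicts)."""
    # Stage 1, "Fetch Feed": aggregate the items of every feed.
    items = [item for feed in feeds for item in feed]
    output = []
    for item in items:
        # Stage 2, "Loop" + "Term Extractor": analyse the description text.
        terms = extract_terms(item["description"])
        # Stages 3 and 4, "Rename" + "Item Builder": expose the terms under
        # "term_extracted" and keep only the fields the tag cloud needs.
        output.append({"title": item["title"],
                       "link": item["link"],
                       "term_extracted": terms})
    # Stage 5, "Pipe Output": hand back the finished items.
    return output


# Two made-up feeds standing in for the members' RSS feeds.
feeds = [
    [{"title": "Cataloguing tips", "link": "http://example.org/a",
      "description": "Cataloguing rules and cataloguing tools for libraries"}],
    [{"title": "Feed readers", "link": "http://example.org/b",
      "description": "Feed readers make feed subscriptions simple"}],
]
for entry in run_pipe(feeds):
    print(entry["title"], entry["link"], entry["term_extracted"])
```

Each output dictionary carries the same three pieces of information as the pipe's output items: the title, the URL and the extracted terms.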
The second step is to take this RSS feed and produce a tag cloud.
Step 2: Display the tag cloud
The tag cloud is constructed by a class that I have written in the PHP programming language. The TagCollection class, as I’ve called it, is responsible for creating the cloud. Conceptually the steps are:
- Retrieve the RSS feed; for this part my class uses the SimplePie library; and
- Build a tag cloud using the terms extracted by the pipe in step 1.
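To make the second of these steps concrete, here is a minimal sketch in Python. It is not the TagCollection class, which is written in PHP and is not shown on this page. It assumes the feed has already been retrieved and parsed into dictionaries with a `term_extracted` list, and it assumes a simple linear scaling of font size by tag count; the function name `build_cloud` and the size range are made up for illustration.

```python
from collections import Counter


def build_cloud(items, min_size=10, max_size=30):
    """Return (tag, count, font_size) tuples, sorted alphabetically by tag."""
    # Count how many times each extracted term occurs across all items.
    counts = Counter(term for item in items for term in item["term_extracted"])
    if not counts:
        return []
    low, high = min(counts.values()), max(counts.values())
    spread = high - low
    cloud = []
    for tag in sorted(counts):
        count = counts[tag]
        # Linear scaling (an assumption): the rarest tag gets min_size and the
        # most common gets max_size. If all counts are equal, use min_size.
        weight = (count - low) / spread if spread else 0.0
        size = round(min_size + weight * (max_size - min_size))
        cloud.append((tag, count, size))
    return cloud


# Made-up items in the shape produced by step 1.
items = [
    {"term_extracted": ["library", "rss"]},
    {"term_extracted": ["library", "tags"]},
    {"term_extracted": ["library", "tags"]},
]
for tag, count, size in build_cloud(items):
    print(f"{tag}: used {count} times, font size {size}")
```

Other scalings, such as logarithmic ones, are common when a few tags dominate the counts; linear scaling is used here only because it is the easiest to follow.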
The class also exposes methods to retrieve the collection of tags directly. Using this functionality I can provide the two-stage process seen at the THALI-Tags website, where the first page is the tag cloud and the second is a list of all of the posts associated with a chosen tag.
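The second page of that process needs a way to go from a tag to its posts. One way to picture this is an index keyed by tag, sketched below in Python. The function name `index_posts_by_tag` and the item layout are assumptions for illustration and do not describe the actual methods of the PHP class.

```python
from collections import defaultdict


def index_posts_by_tag(items):
    """Map each tag to the (title, link) pairs of the posts that carry it."""
    index = defaultdict(list)
    for item in items:
        # set() guards against a post listing the same term twice.
        for term in set(item["term_extracted"]):
            index[term].append((item["title"], item["link"]))
    return dict(index)


# Made-up posts in the shape produced by step 1.
posts = [
    {"title": "Post A", "link": "http://example.org/a",
     "term_extracted": ["rss", "tags"]},
    {"title": "Post B", "link": "http://example.org/b",
     "term_extracted": ["rss"]},
]
index = index_posts_by_tag(posts)
# Page two: list every post associated with the tag that was clicked.
for title, link in index.get("rss", []):
    print(title, link)
```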
A feature added to the THALI-Tags website with the latest update is an RSS feed that contains the top ten most popular tags. This feed can then be used to display the top ten tags in places such as the sidebar of a blog, or a user’s iGoogle page.
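A feed like this can be assembled from the same tag counts. The sketch below, in Python rather than the site's PHP, builds a bare-bones RSS 2.0 document with the standard library. The channel text, the `?tag=` link format and the `base_url` parameter are all hypothetical and not taken from the THALI-Tags site.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import quote


def top_tags_feed(items, base_url, limit=10):
    """Return an RSS 2.0 document (as a string) listing the most used tags."""
    counts = Counter(term for item in items for term in item["term_extracted"])
    # Most used first; ties are broken alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))[:limit]

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Top tags"
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = "The most popular tags"
    for tag, count in ranked:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = f"{tag} ({count})"
        # Hypothetical link format pointing at the list of posts for the tag.
        ET.SubElement(entry, "link").text = f"{base_url}?tag={quote(tag)}"
    return ET.tostring(rss, encoding="unicode")


# Made-up items: twelve distinct tags, one of which is used twice.
items = [
    {"term_extracted": [f"tag{i}" for i in range(12)]},
    {"term_extracted": ["tag5"]},
]
print(top_tags_feed(items, "http://example.org/tags"))
```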
Acknowledgements
I’d like to acknowledge the support of the members of the librariesinteract.info collaborative blog, and thank them for allowing me to experiment with their content. My thanks also go to Kathryn for the original idea.
My thanks also to Jim Bumgardner who wrote the O’Reilly Short Cut ebook “Building Tag Clouds in Perl and PHP”. It was an invaluable resource while trying to work out how to create a tag cloud.





