joemarini.com

Display a List of Categorized Links with PHP, XML, and SAX

Article Summary: demonstrates a simple way to separate content from presentation using PHP's SAX-based XML processing methods to read and display a categorized list of links stored in an XML file. The resulting output is displayed using HTML.

PHP APIs used: file, implode, xml_parser_create, xml_parser_free, xml_parser_set_option, xml_set_element_handler, xml_parse

Download the code for this tutorial

A little while back I wanted to add a new feature to my site's home page that displayed a list of links to useful sites categorized by topic. It seemed like a simple enough job — just put together some <ul>, <li>, and <a> tags and I would be done.

The more I thought about it, however, the more I wanted to be able to accomplish with this list of links. First, I wanted to minimize the amount of editing I would need to do in order to update the list. Second, I wanted to keep the content of the list separate from its presentation, so that if I ever changed my mind about how it should look, I wouldn't have to change a lot of repetitive HTML and CSS code. Finally, I wanted to make the list accessible from more places than just the HTML version of the site's home page.

The solution I ended up with was to use XML to store the link data, and to use PHP's support for SAX, the Simple API for XML, to process it and output the result. In this article I'll review how the solution works and discuss some of the decisions made along the way.

XML and PHP

It didn't take long to figure out that the right solution was to use XML to store the link data. However, the decision involved more than just choosing to use XML as the storage medium — choosing how to process the XML information and display it in the browser was just as important. In this case, I decided to use the SAX method to process the XML data.

PHP4 provides two main APIs for processing XML: the SAX and DOM interfaces. Each has its own advantages and disadvantages in particular situations. The DOM is the preferred method when the XML document must be modified in place, or when you need to keep the document in memory for more advanced processing, but it is more complicated to use than SAX. The SAX API is better when you only need to run through the XML file once and process each XML tag individually, but it does not provide a way to edit the contents of the document. For my purpose, since I only needed to run through the document once and output the results, SAX was the way to go.

How SAX Works

The SAX model of XML processing works by running sequentially through the entire XML file from beginning to end and calling an event handler function for each piece of content that is encountered. Your code decides which kinds of XML content it is interested in, such as tags, character data, entities, etc., and defines a function to be called when the parser encounters each of those kinds. You then register these functions with the XML parser and give it some XML to parse.
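To make the event model concrete, here is a minimal sketch that records each parser event in the order it fires. The sample XML string, the $events array, and the handler variables are made up for illustration and are not part of the links code; the closures assume PHP 5.3 or later.

```php
<?php
// Hypothetical trace: collect parser events so the call order is visible.
$events = array();

$onStart = function ($parser, $name, $attribs) use (&$events) {
  $events[] = "start:" . $name;
};
$onEnd = function ($parser, $name) use (&$events) {
  $events[] = "end:" . $name;
};
$onText = function ($parser, $data) use (&$events) {
  // ignore whitespace-only character data
  if (trim($data) != "") {
    $events[] = "text:" . trim($data);
  }
};

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, $onStart, $onEnd);
xml_set_character_data_handler($parser, $onText);

// true tells the parser this is the final (and only) chunk of data
xml_parse($parser, "<note><to>Joe</to></note>", true);

echo implode("\n", $events), "\n";
// start:note
// start:to
// text:Joe
// end:to
// end:note
```

The handlers never ask the parser for anything; the parser pushes each event to them as it reads the input from left to right.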

While simple (that is, after all, what the "S" in SAX stands for), some of the major downsides of this method should be readily apparent. First, you can't re-visit a part of the XML file that has already been processed. If you need to go back again, you have to start the processing all over from the beginning. Second, you can't modify the XML document that is being processed. Third, the SAX parser doesn't keep track of context for you. For example, you can't ask the parser whether the tag you're about to process is inside of another tag, or whether some other tag has already been processed. That information is long gone by the time your handler gets called, so if you need to keep track of things like that, you have to do it yourself.
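If you do need context, one common approach is to keep your own stack of open elements. The following is only a sketch under that assumption: the $stack and $paths variables and the one-link sample document are illustrative, and the closures assume PHP 5.3 or later.

```php
<?php
$stack = array();   // names of the elements that are currently open
$paths = array();   // full path recorded for every <link> encountered

$onStart = function ($parser, $name, $attribs) use (&$stack, &$paths) {
  $stack[] = $name;                    // entering an element
  if ($name == "link") {
    $paths[] = implode("/", $stack);   // where in the document are we?
  }
};
$onEnd = function ($parser, $name) use (&$stack) {
  array_pop($stack);                   // leaving the element
};

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, $onStart, $onEnd);
xml_parse(
  $parser,
  '<links><category desc="A"><link url="http://example.com"/></category></links>',
  true
);

echo $paths[0], "\n";   // links/category/link
```

The links list in this article never needs this, because each tag can be converted without knowing what surrounds it.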

Still, the SAX method is highly efficient when all you need to do is read through the contents of the XML file once, without going back to earlier parts or tracking much context. Since that describes the current need for processing this links file, SAX fits the bill rather well.

Creating the Links File

The format of the XML file that contains the links is fairly straightforward. The <links> tag is the root element of the XML file, and it contains a series of <category> tags. Each <category> tag has a desc attribute that names the category and contains one or more <link> tags, one for each link. On a <link> tag, the url attribute contains the URL for the link, and the desc attribute contains the link's description:

<?xml version="1.0" encoding="iso-8859-1"?>
<links>
  <category desc="XAML Related">
    <link url="http://longhorn.msdn.microsoft.com" desc="Longhorn SDK Home"/>
    <link url="http://longhornblogs.com" desc="Longhorn Blogs"/>
    <link url="http://www.xamlon.com" desc="Xamlon.com"/>
    <link url="http://www.zaml.com" desc="Zaml.com"/>
    <link url="http://www.xaml.net" desc="XAML.net"/>
  </category>
  <!-- more categories would follow here -->
</links>

When processed, each of these links will be displayed to the user as a clickable hyperlink on the web page. The url attribute will become the href of an <a> tag, and the desc attribute will become the <a> tag's text content.

Processing the Links File

Processing the links file consists mainly of converting each of the XML tags into a corresponding HTML tag with the right content. Since we're only interested in the content of element tags, the PHP code only needs to define handlers for elements.

The complete code to do the job is shown here:

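<?php
// Called for each opening tag; $attribs maps attribute names to values.
function startElemHandler($parser, $name, $attribs) {
  if (strcasecmp($name, "links") == 0) {
    echo "<div id='linksList'>\n";
  }
  if (strcasecmp($name, "category") == 0) {
    // escape the value so characters such as & or < cannot break the HTML
    $desc = isset($attribs["desc"]) ? htmlspecialchars($attribs["desc"], ENT_QUOTES) : "";
    echo "<p>$desc</p>\n<ul>\n";
  }
  if (strcasecmp($name, "link") == 0) {
    $linkRef = isset($attribs["url"]) ? htmlspecialchars($attribs["url"], ENT_QUOTES) : "";
    $desc = isset($attribs["desc"]) ? htmlspecialchars($attribs["desc"], ENT_QUOTES) : "";
    // fall back to the URL when the link has no description
    if ($desc == "")
      echo "\t<li><a href='$linkRef' target='_blank'>$linkRef</a></li>\n";
    else
      echo "\t<li><a href='$linkRef' target='_blank'>$desc</a></li>\n";
  }
}

// Called for each closing tag; closes the HTML opened by startElemHandler.
function endElemHandler($parser, $name) {
  if (strcasecmp($name, "links") == 0) {
    echo "</div>\n";
  }
  if (strcasecmp($name, "category") == 0) {
    echo "</ul>\n";
  }
}

/* create the parser */
$parser = xml_parser_create();
// the handler names are passed as strings
xml_set_element_handler($parser, "startElemHandler", "endElemHandler");
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// read the contents of the links file
$strXML = implode("",file('links.xml'));

// output each link (true marks this as the final chunk of data)
xml_parse($parser, $strXML, true);

// clean up - we're done
xml_parser_free($parser);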

The first two functions, startElemHandler and endElemHandler, are the functions that will be called by the XML parser to handle the beginning and ending of the XML tags that it encounters during parsing. I'll get back to those in a moment.

The XML parser itself is created by the call to xml_parser_create(). This returns a parser handle that you pass to the other XML parser functions. After the parser is created, the call to xml_set_element_handler() is where the two element handler functions are supplied to the parser.

The call to xml_parser_set_option() turns off the case folding option. Case folding means converting names to upper case before they are passed to your handlers. This option is on by default in the parser, so the code turns it off. Note that folding applies to attribute names as well as tag names: the handlers above compare tag names without regard to case, but they look up the attributes by the lower-case keys "url" and "desc", so leaving case folding on would require changing those keys to upper case.
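The effect of the option is easy to see by parsing the same snippet twice. This is a small sketch; the reportedNames() helper and the sample XML are hypothetical names introduced only for this demonstration, and the closures assume PHP 5.3 or later.

```php
<?php
// Hypothetical helper: returns the tag names and attribute names
// exactly as the parser reports them to the start handler.
function reportedNames($xml, $caseFolding) {
  $names = array();
  $keys = array();
  $parser = xml_parser_create();
  xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
  xml_set_element_handler(
    $parser,
    function ($p, $name, $attribs) use (&$names, &$keys) {
      $names[] = $name;
      foreach (array_keys($attribs) as $key) {
        $keys[] = $key;
      }
    },
    function ($p, $name) {}   // closing tags are not needed here
  );
  xml_parse($parser, $xml, true);
  return array("names" => $names, "keys" => $keys);
}

$xml = '<links><category desc="Demo"/></links>';

$on = reportedNames($xml, 1);
echo implode(",", $on["names"]), " / ", implode(",", $on["keys"]), "\n";
// LINKS,CATEGORY / DESC

$off = reportedNames($xml, 0);
echo implode(",", $off["names"]), " / ", implode(",", $off["keys"]), "\n";
// links,category / desc
```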

The XML file is then loaded into the $strXML variable: the file() function reads the links file into an array of lines, and implode() concatenates those lines into a single string. This string is then passed to xml_parse(), which starts the parsing process and results in the handler functions being called.
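The listing does not check whether parsing succeeded. As a hedged sketch of how that might be done, the hypothetical helper below (parseOrExplain() is not part of the article's code) inspects the return value of xml_parse() and asks the parser for the error details. The exact wording of the message depends on the XML library PHP was built with.

```php
<?php
// Hypothetical helper: parses $xml and returns null on success,
// or a short description of the first error on failure.
function parseOrExplain($xml) {
  $parser = xml_parser_create();
  // xml_parse() returns 1 on success and 0 on failure
  $ok = xml_parse($parser, $xml, true);
  if ($ok) {
    return null;
  }
  return sprintf(
    "XML error: %s at line %d",
    xml_error_string(xml_get_error_code($parser)),
    xml_get_current_line_number($parser)
  );
}

var_dump(parseOrExplain("<links></links>"));          // NULL
echo parseOrExplain("<links><category></links>"), "\n"; // mismatched tag is reported
```

On PHP 4.3 and later, file_get_contents('links.xml') reads the whole file into a string in one call and could stand in for the implode()/file() pair.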

The XML parser will call the element handler functions at two distinct points: once when the opening tag has been processed, and again when the matching closing tag is processed. The handler for the opening tag is called when the ">" character for the tag is reached. At this point, the parser knows the name of the tag, and has also collected all of the attributes and their values into an array. Both of these are sent to the handler function as arguments.
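An empty-element tag such as <link ... /> has no separate closing tag, but the parser still calls both handlers for it, one right after the other. The sketch below records the arguments of each call for a single link; the $calls array is an illustrative name, and the closures assume PHP 5.3 or later.

```php
<?php
$calls = array();   // one entry per handler call, in order

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
  $parser,
  function ($p, $name, $attribs) use (&$calls) {
    // $attribs is an associative array: attribute name => value
    $calls[] = array("start", $name, $attribs);
  },
  function ($p, $name) use (&$calls) {
    $calls[] = array("end", $name);
  }
);
xml_parse($parser, '<link url="http://www.xaml.net" desc="XAML.net"/>', true);

print_r($calls);
// $calls[0] is ("start", "link", array with keys "url" and "desc")
// $calls[1] is ("end", "link")
```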

The startElemHandler function transforms each of the incoming XML tags into a snippet of HTML code that will display the list. The top-level <links> tag in the XML file is converted into a <div> here, which will contain the entire list. Each <category> tag is transformed into a <p> with the value of the category's desc attribute serving as the text content, along with an opening <ul> tag to start the list. Finally, each <link> tag is turned into a <li> with a nested <a> tag that holds the value of the url attribute as the link destination. If there is a desc attribute for the link its value becomes the text content of the <a> tag, otherwise the value of the url attribute is used.

The endElemHandler function is used to close off the HTML tags as the end of each XML tag in the links file is processed. When the processing is finished, the xml_parser_free() function call frees the parser from memory and disposes of it. (From PHP 8.0 onward this call has no effect, because the parser is released automatically.)

Conclusion

Although this is a relatively simple example, the concepts presented here carry over to more complex situations as well. The content of the links file has now been effectively separated from how it will be presented to the user. This decoupling allows the XML and presentation parts to be changed independently from one another. We've also seen how the Simple API for XML (SAX) can be used to quickly process an XML data file when it isn't necessary to maintain the contents of the XML document in memory or to edit the document's contents.
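As one illustration of that decoupling, the same link data could be rendered as plain text simply by registering different handlers. This is a sketch only: the linksAsText() function and its output layout are assumptions made for the example, not part of the tutorial's download, and the closures assume PHP 5.3 or later.

```php
<?php
// Hypothetical alternative renderer: plain text instead of HTML.
function linksAsText($xml) {
  $out = "";
  $parser = xml_parser_create();
  xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
  xml_set_element_handler(
    $parser,
    function ($p, $name, $attribs) use (&$out) {
      if ($name == "category") {
        $out .= $attribs["desc"] . "\n";
      } elseif ($name == "link") {
        $out .= "  - " . $attribs["desc"] . " <" . $attribs["url"] . ">\n";
      }
    },
    function ($p, $name) {}   // nothing to close in plain text
  );
  xml_parse($parser, $xml, true);
  return $out;
}

$xml = '<links><category desc="XAML Related">'
     . '<link url="http://www.xaml.net" desc="XAML.net"/>'
     . '</category></links>';
echo linksAsText($xml);
// XAML Related
//   - XAML.net <http://www.xaml.net>
```

The XML file is untouched; only the handlers change.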

For information about how to obtain permission to re-publish this material, please contact us at info@joemarini.com.