Site-wide RDF metadata

I decided to add RDF metadata to all these pages. Here's how I did it.

Having recently started a bit of a semantic web kick, I decided the next logical step was to include rdf metadata for all pages on this site. This is a bit of a hack: xhtml does not support rdf directly, so there are several possible ways to go about including rdf in web pages. I don't really like the RFC 2731 proposal because, while it is html, it's not rdf (or a recognisably sane serialisation of rdf), so I decided to go with a couple of the other approaches. As a result, the rdf is embedded in a comment in the head of each page and is also linked from the page. You can see this if you "view source" on this page.

This whole site is built from cheetah templates, and a file included from within a cheetah template is itself rendered as dynamic content. So getting the "rdf in html comments" approach to work was just a question of adding this snippet to my templates and writing a metadata.rdf template.

<!--
    #include "metadata.rdf"
-->

The metadata.rdf template is pretty straightforward too:

<rdf:RDF xmlns:cc="http://web.resource.org/cc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
#if $varExists('article')
    <rdf:Description rdf:about="$back_link">
        <dc:creator>Sean Hunter</dc:creator>
        <dc:title>$article.title</dc:title>
        <dc:description>$article.precis</dc:description>
        <dc:date>$mtime</dc:date>
        <dc:type>Text</dc:type>
        <dc:format>text/html</dc:format>
        <dc:identifier>http://www.uncarved.com/$article.name</dc:identifier>
        <dc:language>en-GB</dc:language>
    </rdf:Description>
#else
    <rdf:Description rdf:about="$back_link">
        <dc:creator>Sean Hunter</dc:creator>
        #if $current_tagname
        <dc:title>The Uncarved Block/$current_tagname</dc:title>
        <dc:subject>$current_tagname</dc:subject>
        #else
        <dc:title>The Uncarved Block</dc:title>
        #end if
        <dc:description>A collection of articles and software</dc:description>
        <dc:date>$most_recent</dc:date>
        <dc:type>Text</dc:type>
        <dc:format>text/html</dc:format>
        <dc:identifier>http://www.uncarved.com/</dc:identifier>
        <dc:language>en-GB</dc:language>
    </rdf:Description>
#end if
    <cc:Work rdf:about="Web Articles and Software">
        <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.5/" />
    </cc:Work>
    <cc:License rdf:about="http://creativecommons.org/licenses/by/2.5/">
        <cc:permits rdf:resource="http://web.resource.org/cc/Reproduction"/>
        <cc:permits rdf:resource="http://web.resource.org/cc/Distribution"/>
        <cc:requires rdf:resource="http://web.resource.org/cc/Notice"/>
        <cc:requires rdf:resource="http://web.resource.org/cc/Attribution"/>
        <cc:permits rdf:resource="http://web.resource.org/cc/DerivativeWorks"/>
    </cc:License>
</rdf:RDF>
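
As a rough illustration of what a tool looking for rdf in html comments has to do with output like the above, here is a minimal Python sketch using only the standard library. The names CommentCollector, rdf_from_comments and SAMPLE_PAGE are made up for this example, and the sample page is a cut-down stand-in rather than the real output of this site. It is not how any particular validator works, just one plausible way to recover the embedded document.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


def rdf_from_comments(html_text):
    """Return the parsed <rdf:RDF> root of each comment that contains one."""
    collector = CommentCollector()
    collector.feed(html_text)
    collector.close()
    roots = []
    for comment in collector.comments:
        try:
            root = ET.fromstring(comment.strip())
        except ET.ParseError:
            continue  # an ordinary comment, not embedded XML
        if root.tag == "{%s}RDF" % RDF_NS:
            roots.append(root)
    return roots


# Hypothetical, cut-down page used only to exercise the function.
SAMPLE_PAGE = """<html><head><title>Example</title>
<!--
<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="http://www.uncarved.com/">
        <dc:title>The Uncarved Block</dc:title>
        <dc:language>en-GB</dc:language>
    </rdf:Description>
</rdf:RDF>
-->
</head><body><!-- just a note --></body></html>"""

if __name__ == "__main__":
    for rdf in rdf_from_comments(SAMPLE_PAGE):
        for title in rdf.iter("{%s}title" % DC_NS):
            print(title.text)
```

Comments that are not well-formed XML are simply skipped, so ordinary html comments elsewhere in the page do no harm.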

The reason for the #if $varExists('article') branch is that I wanted a single rdf metadata template that worked for both my index pages and my single-article pages. I tried to be as nerdily comprehensive in the metadata I provide as I could be, although I don't currently look up the subject tag for an article and I probably should. Nevertheless this works quite well, and the CreativeCommons license metadata validator was able to find and parse the rdf within the page, so I took that to mean that tools which look for rdf in html comments would be fine.

I still wanted "detached" metadata to work, however, so that I could get the metadata for a page as a separate document. To do this, I had to extend the python blog script which drives the site so that I could request any page in "metadata only" mode and get back just its rdf. That's how I produce the <link> element in the head of these pages and the "Machine-readable metadata for this page can be found here" link which should be at the bottom of this page.
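
To make the "metadata only" idea concrete, here is a minimal sketch of how such a switch could look, written as a small WSGI application. This is an assumption for illustration only: the real blog script is not shown here and may be structured quite differently. The meta_only query parameter is the one this site uses; render_page and render_metadata are hypothetical stand-ins for whatever actually renders the templates.

```python
from html import escape
from urllib.parse import parse_qs


def render_page(path):
    """Hypothetical stand-in for rendering the normal html page."""
    return ("<html><head><title>%s</title></head><body></body></html>"
            % escape(path))


def render_metadata(path):
    """Hypothetical stand-in for rendering the metadata.rdf template."""
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" '
            'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
            '  <rdf:Description rdf:about="http://www.uncarved.com%s"/>\n'
            '</rdf:RDF>\n' % escape(path, quote=True))


def app(environ, start_response):
    """Serve the rdf alone when ?meta_only=1 is given, the page otherwise."""
    path = environ.get("PATH_INFO") or "/"
    query = parse_qs(environ.get("QUERY_STRING", ""))
    if query.get("meta_only", ["0"])[0] == "1":
        body = render_metadata(path).encode("utf-8")
        content_type = "application/rdf+xml; charset=utf-8"
    else:
        body = render_page(path).encode("utf-8")
        content_type = "text/html; charset=utf-8"
    start_response("200 OK", [
        ("Content-Type", content_type),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For local experiments, something like this could be served with wsgiref.simple_server.make_server from the standard library. The point of the sketch is only that the same URL yields either the page or its rdf, each with its own Content-Type.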

This approach seems to work, although I am still not 100% happy. I would be happier if the w3c rdf validator liked my rdf. Currently it will validate if I paste it in via direct entry, but not if I ask it to fetch a page in metadata-only mode, where it gives me an inscrutable "An attempt to load the RDF from URI 'http://www.uncarved.com/?meta_only=1' failed. (Undecodable data when reading URI at byte 0 using encoding 'UTF-8'. Please check encoding and encoding declaration of your document.)" Static rdf files seem fine, so it must be something to do with the http headers. I thought I had cracked it when I got my script to set the Content-Type to "application/rdf+xml" and it seemed to validate for a while, but then I realised that I had mispasted the magic line from my mimetypes file and had actually set the content type to "application/rss+xml" instead. Fixing the Content-Type broke the validation again. If anyone knows how to fix this, please mail me at sean@uncarved.com, as I would love to have it working so I can add "rdf" to the validation boast tags at the bottom of each page.
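
I don't know what caused that error, but one way to narrow down a complaint about "byte 0" is to look at the raw bytes the server actually sends before anything tries to decode them. The sketch below is a guess at a useful first check, not a diagnosis: diagnose_body is a made-up helper name, and the cases it looks for (an empty body, a gzip-compressed body, a UTF-16 byte order mark, otherwise invalid UTF-8) are just common reasons why data can fail to decode as UTF-8 from the very first byte.

```python
import codecs


def diagnose_body(raw):
    """Return a short guess at why `raw` (bytes) might not decode as UTF-8."""
    if not raw:
        return "empty body"
    if raw[:2] == b"\x1f\x8b":
        # gzip magic number: the body may have been sent compressed
        return "looks gzip-compressed (check Content-Encoding)"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "starts with a UTF-16 byte order mark"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return "invalid UTF-8 at byte %d" % err.start
    return "decodes as UTF-8"


if __name__ == "__main__":
    # Hypothetical sample bodies, only to show the kinds of answer you get.
    print(diagnose_body(b"<rdf:RDF/>"))
    print(diagnose_body(b"\xff<rdf:RDF/>"))
```

Feeding it the body of a metadata-only response, saved to a file and read back in binary mode, would at least show whether the bytes themselves are at fault or whether the problem lies elsewhere, such as in the headers.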

Actually, a day or so on, the w3c validator now likes my metadata. I don't think I've changed anything relevant, so maybe they were having problems with their service somehow. I'm going to add a validate rss link now...

Technorati tags: rdf semantic web


Unless otherwise specified the contents of this page are copyright © 2015 Sean Hunter. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.