Recently, Ian Hickson announced a new microdata section of the HTML5 spec. Microdata provide a method of annotating HTML content for scripted data extraction.

Microformats and microdata

Microformats allow authors to mark up events, contact information, etc. in a machine-extractable manner, within the constraints of conforming HTML 4 or XHTML 1. They do so by using the language extension mechanisms present in HTML 4: head@profile, @class, meta@name, @rel, and the like.
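
To make the "existing extension mechanisms" point concrete, here is a minimal sketch of class-based extraction in Python. The sample markup, the `PROPERTIES` set, and the function name are my own illustrative assumptions, not any official microformats parser; real hCard parsing has many more rules (nested elements, abbr patterns, and so on).

```python
from html.parser import HTMLParser

# Hypothetical sample: an hCard-style fragment that relies only on @class.
SAMPLE_HCARD = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="org">Example Corp</span>
</div>
"""

# A tiny, illustrative subset of property class names (assumption).
PROPERTIES = {"fn", "org"}


class ClassPropertyExtractor(HTMLParser):
    """Collects the text of elements whose class names a known property."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._current = None  # property whose text is being collected

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in PROPERTIES:
                self._current = name
                self.found[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.found[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: any end tag stops collection, so nested
        # markup inside a property element is not handled.
        self._current = None


def extract_class_properties(html):
    parser = ClassPropertyExtractor()
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.found.items()}


print(extract_class_properties(SAMPLE_HCARD))
```

Note that nothing in the markup itself says which class names are data and which are styling hooks; the extractor has to know the vocabulary in advance.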

As I’ve said elsewhere:

Microformats are a great example of a community coming together and taking advantage of HTML’s existing extensibility points[…] microformats thrive within the constraints of HTML’s existing extensibility points.

Ian touched upon this in the WaSP interview:

Microformats [are] natively supported in HTML5, just like [they were] in HTML 4, because Microformats use the built-in extension mechanisms of HTML.

HTML5’s microdata proposal isn’t a competing way to mark up such data; it’s a change to the underlying language extension mechanisms. In the future, when microformats are defined on top of HTML5, they will be able to take advantage of the microdata attributes (@item, @itemprop, and the like), microdata’s unambiguous data extraction algorithm, and other new bits of HTML5 (e.g. the <time> element).
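
As a rough illustration of what attribute-based extraction could look like, here is a Python sketch that reads name/value pairs from @item and @itemprop. It is emphatically not the spec's extraction algorithm: the sample markup, the class and function names, and the handling of <time> via its datetime attribute are simplifying assumptions of mine, and nested items are ignored. The attribute names follow this post and may differ in other drafts of the spec.

```python
from html.parser import HTMLParser

# Hypothetical sample markup using the attribute names mentioned in this post.
SAMPLE_MICRODATA = """
<div item>
  <span itemprop="name">Jane Doe</span>
  <time itemprop="birthday" datetime="1980-05-22">22 May 1980</time>
</div>
"""


class ItemPropExtractor(HTMLParser):
    """Collects itemprop name/value pairs, one dict per element carrying @item."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._prop = None  # property whose text content is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "item" in attrs:  # boolean attribute: present, value is None
            self.items.append({})
        prop = attrs.get("itemprop")
        if prop and self.items:
            if tag == "time" and attrs.get("datetime"):
                # Assumption: prefer the machine-readable datetime value.
                self.items[-1][prop] = attrs["datetime"]
            else:
                self.items[-1][prop] = ""
                self._prop = prop

    def handle_data(self, data):
        if self._prop is not None:
            self.items[-1][self._prop] += data

    def handle_endtag(self, tag):
        # Simplification: any end tag stops text collection.
        self._prop = None


def extract_items(html):
    parser = ItemPropExtractor()
    parser.feed(html)
    parser.close()
    return [{k: v.strip() for k, v in item.items()} for item in parser.items]


print(extract_items(SAMPLE_MICRODATA))
```

Unlike class-based extraction, the extractor here needs no prior knowledge of the vocabulary: the attributes themselves say which elements carry data.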

I look forward to future revisions of hCard, hCalendar, etc. built on top of HTML5’s microdata. I expect HTML5’s predefined vocabularies—if/when extracted from the main HTML5 spec—will provide the basis for such reformulations.

RDF and microdata

HTML5 contains the definition of an algorithm for extracting RDF triples from any HTML document, including any microdata items present. Microdata allow for the typing of items by URL, and thus allow authors to express many RDF triples natively in HTML.
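
To sketch the idea (and only the idea), here is some Python that turns one already-extracted item into triples. The mapping rules are my own simplifying assumptions rather than the algorithm in the spec: an item typed by an absolute URL yields an rdf:type triple, and only properties whose names are themselves absolute URLs become predicates. The example.org vocabulary and the blank node label are hypothetical.

```python
from urllib.parse import urlparse

# The standard rdf:type predicate URI.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def is_absolute_url(value):
    """True when the value has both a scheme and a host."""
    parts = urlparse(value)
    return bool(parts.scheme and parts.netloc)


def item_to_triples(subject, item_type, properties):
    """Return (subject, predicate, object) tuples for one item.

    Assumption: property names that are not absolute URLs are skipped,
    since this sketch has no rule for turning them into predicates.
    """
    triples = []
    if is_absolute_url(item_type):
        triples.append((subject, RDF_TYPE, item_type))
    for name, value in properties.items():
        if is_absolute_url(name):
            triples.append((subject, name, value))
    return triples


# Hypothetical item: typed by URL, with one URL-named and one plain property.
EXAMPLE_TRIPLES = item_to_triples(
    "_:item0",
    "http://example.org/vocab#Person",
    {"http://example.org/vocab#name": "Jane Doe", "nickname": "JD"},
)

for triple in EXAMPLE_TRIPLES:
    print(triple)
```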

RDFa and microdata

RDFa is a way to embed RDF into XML vocabularies. Unlike microformats (and, for that matter, unlike eRDF), it was never designed to work within the constraints of HTML 4. Instead, it was designed as a set of new XML attributes that could be used within XHTML2 documents, and then was back-ported to XHTML 1. Since it wasn’t designed with the HTML Design Principles in mind, it should come as no surprise that RDFa violates several of them, and so isn’t suitable for inclusion in the Web platform. As Ian put it in his WaSP interview:

We considered RDFa long and hard[…], but at the end of the day, while some people really like it, I don’t think it strikes the right balance between power and ease of authoring. For example, it uses namespaces and prefixes, which by and large confuse authors to no end.
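
To see where the confusion can come from, consider how a prefixed name is resolved. The Python below is a deliberately simplified sketch of prefix expansion, not RDFa's actual processing rules; the function name and the prefix tables are assumptions for illustration (the FOAF namespace URL is the real one).

```python
def expand_prefixed_name(name, prefixes):
    """Expand 'prefix:reference' using a prefix-to-URL mapping.

    Simplified sketch: real processors also deal with default
    prefixes, scoping of declarations, and reserved values.
    """
    prefix, sep, reference = name.partition(":")
    if not sep:
        raise ValueError(f"no prefix in {name!r}")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix {prefix!r}")
    return prefixes[prefix] + reference


# The same attribute value means different things under different declarations.
DOC_A_PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}
DOC_B_PREFIXES = {"foaf": "http://example.org/not-foaf/"}  # hypothetical

print(expand_prefixed_name("foaf:name", DOC_A_PREFIXES))
print(expand_prefixed_name("foaf:name", DOC_B_PREFIXES))
```

The meaning of the string "foaf:name" depends entirely on a declaration that may sit far away in the document, so copying and pasting a fragment of markup can silently change what it says or break it.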

Despite RDFa’s deficiencies, it would still be a good thing if implementors like Google and Yahoo had an unambiguous specification of how to process it in the wild. I very much hope that the RDFa community embraces Philip’s effort along such lines.

Comments

  1. “Instead, it was designed as a set of new XML attributes that could be used within XML (including XHTML) documents.”

    That’s not true, and it presents the technology falsely.

    RDFa originally started as a spin-off of XHTML2 (which is very similar to XHTML1) and then was applied to XHTML1 (which is basically HTML4). So the technology was designed for use in HTML from the get-go, and certain features that were originally there (such as link and meta in the body) were taken out because they would not work in real HTML browsers.

    The fact that it can be used in other XML technologies is merely a property of XML, and something they kept in mind when speccing it out so as to not unnecessarily lock the technology in to HTML.

    On a side note, at the time RDFa was created, the HTML Design Principles document didn’t even exist. I would also contest your opinion ‘and so isn’t suitable for inclusion in the Web platform’. Even though it was not invented by Ian Hickson, it works fine and is being deployed widely today.

    ~Laurens

    Laurens Holst, 22 May 2009

  2. Hi Laurens,

    RDFa originally started as a spin-off of XHTML2[…] and then was applied to XHTML1[…]

    Thanks for the correction; I’ve updated the text accordingly.

    […] XHTML2 (which is very similar to XHTML1) […] XHTML1 (which is basically HTML4). So the technology was designed for use in HTML from the get-go[…]

    Operationally, on the Web of ~1 trillion text/html documents and applications, XHTML2 is not similar at all to XHTML 1-served-as-text/html—which is effectively all of the XHTML in existence.

    and certain features that were originally there (such as link and meta in the body) were taken out because they would not work in real HTML browsers.

    I’m happy to hear that the RDFa authors took steps to reduce the impedance mismatch between RDFa and the Web. I hope they take further such steps but, in the discussion surrounding RDFa-in-HTML, I get the impression that they’re unwilling to change RDFa-in-XHTML behavior to match whatever RDFa-in-HTML ends up looking like.

    On a side note, at the time RDFa was created, the HTML Design Principles document didn’t even exist.

    While some of the Design Principles are about HTML specifically, many of them are general principles for any technology intended to be part of the Web platform. Think of it as describing the bits that AWWW got wrong. (AWWW itself was intended to describe the Web-as-envisioned-in-1998, and thus doesn’t describe the Web-as-it-exists all that well.)

    Edward O’Connor, 23 May 2009
