I recently missed the 10th birthday of XML

The birthday on February 10th was celebrated with a blog post about the XML People.

I’d like to add that XML is something like an eccentric coworker who manages to get entangled in everyone else’s work, either by invitation, pushed from above, or sneaked in by someone who knows him or her.

Validating XPath expressions against XML schema

I have come across a situation where I want to make sure that an XPath expression is valid for querying a document that conforms to a specific XML Schema. My first thought was that this must have been done before, so I started googling. I found some discouraging discussions. One approach was to create a dummy document from the schema and then try to evaluate the XPath expression against it. Nice try, but not ideal.

The XPath expressions I want to validate are actually very simple: they are absolute location paths, and so far every step I’ve seen is just an element name (with a namespace prefix). I was thinking of using the XPath parsing in REXML, but that’s clearly overkill when the XPath expressions are so simple. Keep It Simple, Stupid! I wonder if such a validation could be written as an XSLT that traverses the XML Schema?
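
For paths this simple, the check can also be done by walking the schema directly. Below is a minimal sketch in Python rather than XSLT, under several assumptions: every step is a plain element name, namespace prefixes are simply stripped, and the schema only uses inline complex types, named complex types, and ref= to global elements. The function names (path_is_valid and its helpers) are made up for this example, and things like xs:group, type extension and imports are not handled.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def local_name(qname):
    """Strip a namespace prefix: 'a:feed' -> 'feed' (prefixes are ignored here)."""
    return qname.split(":")[-1]


def find_global(schema_root, kind, qname):
    """Find a top-level xs:<kind> declaration by its (unprefixed) name."""
    for node in schema_root.findall(XS + kind):
        if node.get("name") == local_name(qname):
            return node
    return None


def resolve(decl, schema_root):
    """Follow ref="..." to the global element declaration, if there is one."""
    ref = decl.get("ref")
    return find_global(schema_root, "element", ref) if ref else decl


def direct_elements(node):
    """Collect element declarations below node without entering nested elements."""
    found = []
    for child in node:
        if child.tag == XS + "element":
            found.append(child)
        else:
            # Descend through xs:sequence, xs:choice, xs:all and similar wrappers.
            found.extend(direct_elements(child))
    return found


def child_declarations(decl, schema_root):
    """Return the element declarations allowed directly inside decl."""
    ctype = decl.find(XS + "complexType")
    if ctype is None and decl.get("type"):
        ctype = find_global(schema_root, "complexType", decl.get("type"))
    return [] if ctype is None else direct_elements(ctype)


def path_is_valid(xpath, schema_root):
    """Check an absolute path such as /a:feed/a:entry against the schema."""
    if not xpath.startswith("/"):
        return False
    candidates = schema_root.findall(XS + "element")  # global elements may be roots
    for step in xpath[1:].split("/"):
        match = None
        for decl in candidates:
            resolved = resolve(decl, schema_root)
            if resolved is not None and resolved.get("name") == local_name(step):
                match = resolved
                break
        if match is None:
            return False
        candidates = child_declarations(match, schema_root)
    return True
```

Usage would be something like path_is_valid("/a:feed/a:entry", ET.parse("schema.xsd").getroot()), where schema.xsd is a placeholder file name. Since prefixes are ignored, the sketch cannot tell two elements with the same local name in different namespaces apart.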

Is <xsl:text> required in <xsl:message> or not?

In an XSLT script I put text directly inside <xsl:message> without wrapping it in <xsl:text>. It worked fine in xsltproc, but the Java XSLT implementation we use complained that the text was not inside an <xsl:text> element.

Bad code:

<xsl:message>Don't do like this</xsl:message>

Good code:

<xsl:message><xsl:text>Do like this</xsl:text></xsl:message>

To avoid making the same mistake (or another one) I wanted to validate my XSLT. Searching for “xslt.xsd” turned up something that seems out of date. Some further searching revealed the Validating XSLT 2.0 article, which includes a link to xslt10.rnc. For some reason Jing didn’t recognize this file until I converted it to XML syntax with Trang. Unfortunately the above mistake was not caught by the RELAX NG schema either!

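Actually, the <xsl:message> example at w3schools.com does not use <xsl:text> either, and the XSLT 1.0 specification defines the content of <xsl:message> as a template, which may contain literal text. So maybe it is simply an implementation issue.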

XSLT hacker

So far, this week has been a week of XSLT hacking. My favorite is probably the XSLT script that generates another XSLT script! 🙂

At one point I wanted to copy the default namespace (xmlns=…) from the source document to the destination document.

This is how I retrieved the URI for the default namespace. Maybe there is a simpler way?

<xsl:variable name="uri" select="string(namespace::*[name(.) = ''])"/>
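
For comparison, here is a rough sketch of the same lookup outside XSLT, in Python with only the standard library. It assumes the default namespace is declared on the document element, and the function name default_namespace is made up for this example.

```python
import xml.etree.ElementTree as ET


def default_namespace(xml_text):
    """Return the URI bound to xmlns="..." on the document element, or ''."""
    parser = ET.XMLPullParser(events=("start-ns", "start"))
    parser.feed(xml_text)
    for event, payload in parser.read_events():
        if event == "start":
            # The declarations of an element are reported before its own
            # "start" event, so the document element has no default namespace.
            break
        prefix, uri = payload
        if prefix == "":
            return uri
    return ""
```

Unlike the namespace axis in the XPath expression, this only looks at declarations on the document element itself, which is all that is needed when the context node is the document element.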

In order to set the default namespace I found the excellent Creating namespace nodes in XSLT 1.0 blog post. The most relevant part was this hack:

<xsl:attribute
  name="{ concat($prefix, ':dummy-for-xmlns') }"
  namespace="{ $uri }">
</xsl:attribute>

I’d also like to mention that Jing seems to work fine for validating an XML document against an XML schema definition (XSD).

Creating an Atom feed from a web page

Most of my time at the computer yesterday was spent working on an unofficial syndication of the broadcast archive for the Swedish radio show P3 Rytm. It’s actually quite simple… 🙂

  1. Use curl to download the show’s broadcast archive web page
  2. Use iconv to convert the page to UTF-8
  3. Use tidy to convert HTML to XHTML
  4. Use sed to adjust some URLs
  5. Use xsltproc and an XSLT script to transform the relevant parts of the web page into XML that is almost an Atom feed
  6. Use sed to convert the dates in the XML so they are in the right format. It’s now a proper Atom feed.
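
The last step is the only one where the format really matters, since Atom wants RFC 3339 timestamps. Here is a small sketch of that step in Python instead of sed. Note the assumptions: the post does not say what the dates on the page look like, so the input format "YYYY-MM-DD HH:MM", the element names updated and published, and the fixed UTC offset of one hour are all made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: dates sit directly inside <updated> or <published> elements.
DATE_ELEMENT = re.compile(r"<(updated|published)>([^<]+)</\1>")


def to_atom_date(text, utc_offset_hours=1):
    """Convert 'YYYY-MM-DD HH:MM' (assumed input format) to RFC 3339."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M")
    zone = timezone(timedelta(hours=utc_offset_hours))
    return parsed.replace(tzinfo=zone).isoformat()


def fix_dates(xml_text, utc_offset_hours=1):
    """Rewrite every matching date element in the almost-Atom XML."""
    def replace(match):
        name = match.group(1)
        value = to_atom_date(match.group(2).strip(), utc_offset_hours)
        return "<{0}>{1}</{0}>".format(name, value)

    return DATE_ELEMENT.sub(replace, xml_text)
```

A fixed offset ignores daylight saving time, so a real script would need a proper time zone lookup or dates that are already in UTC.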

Update: Of course there are several online services that automate this. (Read the comments too.)

A generic XML visualization with Graphviz?

Why can’t I find an XSLT file that converts any XML file to a Graphviz DOT file that visualizes elements and attributes? It’s almost like I’m beginning to doubt my Google “skillz”! I’ve found some scripts that are specially targeted at various XML-based formats, but no generic one. Maybe it’s so simple that I could have made it myself in the time I spent searching and writing this?
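
It does seem to be fairly simple. Here is a rough sketch of the idea, in Python rather than XSLT and using only the standard library. The function name xml_to_dot and the layout choices (boxes for elements, ellipses and dashed edges for attributes) are just assumptions for this example; text nodes, comments and processing instructions are ignored.

```python
import xml.etree.ElementTree as ET


def dot_escape(text):
    """Escape backslashes and double quotes for a quoted DOT label."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def xml_to_dot(xml_text):
    """Return a Graphviz DOT graph showing the elements and attributes."""
    root = ET.fromstring(xml_text)
    lines = ["digraph xml {", "  node [shape=box];"]
    counter = 0

    def new_id():
        nonlocal counter
        node_id = "n{}".format(counter)
        counter += 1
        return node_id

    def visit(element, parent_id):
        node_id = new_id()
        lines.append('  {} [label="{}"];'.format(node_id, dot_escape(element.tag)))
        if parent_id is not None:
            lines.append("  {} -> {};".format(parent_id, node_id))
        # Attributes become ellipse nodes hanging off their element.
        for name, value in element.attrib.items():
            attr_id = new_id()
            label = dot_escape("@{}={}".format(name, value))
            lines.append('  {} [shape=ellipse, label="{}"];'.format(attr_id, label))
            lines.append("  {} -> {} [style=dashed];".format(node_id, attr_id))
        for child in element:
            visit(child, node_id)

    visit(root, None)
    lines.append("}")
    return "\n".join(lines)
```

The result can be written to a file and rendered with something like dot -Tpng. Namespaced names show up in ElementTree’s {uri}local form, and very deeply nested documents could hit Python’s recursion limit.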

Apache Ant is regrettably (?) using XML

I’ve done some Apache Ant build file hacking today and that made me recall that the original author of Ant, James Duncan Davidson, actually regretted using XML as the file format.

Mysteriously missing from today’s world wide web, but fortunately captured by the Wayback Machine, the blog entry Ant and XML is worth reading. The entry concludes like this:

If I knew then what I knew now, I would have tried using a real scripting language, such as JavaScript via the Rhino component or Python via JPython, with bindings to Java objects which implemented the functionality expressed in todays tasks. Then, there would be a first class way to express logic and we wouldn’t be stuck with XML as a format that is too bulky for the way that people really want to use the tool.

Or maybe I should have just written a simple tree based text format that captured just what was needed to express a project and no more and which would avoid the temptation for people to want to build a Turing complete scripting environment out of Ant build files.

Both of these approaches would have meant more work for me at the time, but the result might have been better for the tens of thousands of people who use and edit Ant build files every day.

Hindsight is always 20/20.

The “real scripting language” part sounds just like Ruby’s build program Rake and “a simple tree based text format” sounds very much like YAML, a format well worth considering for hierarchical data structures.