Process XML in Java more easily by... not using Java?

Java and XML are like peanut butter and chocolate, right? They just go together, don't they? Jon Bosak of Sun summed it up most famously as "XML gives Java something to do".

XML gives Java developers something to do
I think I see Sun's master plan: the more work Java developers must do to solve a given problem, the more of them we'll need to have. The more Java developers we have, the more popular Java appears. (The only problem left to solve is how to profit from that, but I digress.) The key to this plan's success is the lack of syntactic support for XML, and the added bonus is that even more developers will occupy their time building endless numbers of libraries! "Why make trillions when we can make... BILLIONS?!" :-)

There are some decent libraries for handling XML in Java, but most are quite heavyweight, requiring you to create Java classes (in the case of the XML binding variety) or at least a config file (XML-based, naturally!).

There is another way
I've written before about using other languages with Java, and Groovy has some really nice features for handling XML. Groovy compiles to Java bytecode, so you can write an XML handling class in Groovy and use it within your Java app. Here's an example that uses Groovy's GPath to read an XML file from Sun's XML DOM tutorial and print some of its elements. Just for fun, we're loading the XML file directly over HTTP:

def slideShow = new XmlSlurper().parse("http://java.sun.com/webservices/jaxp/dist/1.1/docs/tutorial/dom/samples/slideSample01.xml")

println("slideshow author attribute: ${slideShow['@author']}")

def slides = slideShow.slide
for (slide in slides) {
    println("slide title element: " + slide.title)
    for (item in slide.item) {
        println("  item text: ${item}")
    }
}

Here's the Java version as a comparison. Honestly, nobody really uses the DOM API because it's so painful (hence the existence of so many libraries to do the job), but this is the out-of-the-box Java way:

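import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();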
URL url = new URL("http://java.sun.com/webservices/jaxp/dist/1.1/docs/tutorial/dom/samples/slideSample01.xml");
InputStream stream = url.openStream();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(stream);

Element rootElement = document.getDocumentElement();
String author = rootElement.getAttribute("author");
System.out.println("slideshow author attribute: " + author);

NodeList slides = rootElement.getChildNodes();

for (int i = 0; i < slides.getLength(); i++) {
    if (slides.item(i).getNodeType() == Node.ELEMENT_NODE) {
        Node slide = slides.item(i);
        NodeList slideChildren = slide.getChildNodes();
        for (int j = 0; j < slideChildren.getLength(); j++) {
            Node child = slideChildren.item(j);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                if (child.getNodeName().equals("title")) {
                    System.out.println("slide title element: "
                            + child.getTextContent());
                } else if (child.getNodeName().equals("item")) {
                    System.out.println("  item text: "
                            + child.getTextContent());
                }
            }
        }
    }
}

I have omitted error handling.
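
For a rough idea of what the DOM version looks like once error handling is put back, here is a minimal sketch. It is not the code from the tutorial: the class name SlideTitles and the method titles are made up for illustration, it parses an in-memory string instead of fetching the URL, and wrapping everything in IllegalStateException is just one possible choice.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, named for this example only.
class SlideTitles {

    // Collects the text of every <title> that is a direct child of a <slide>
    // directly under the root element.
    static List<String> titles(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));

            List<String> result = new ArrayList<>();
            NodeList slides = document.getDocumentElement().getChildNodes();
            for (int i = 0; i < slides.getLength(); i++) {
                Node slide = slides.item(i);
                // Skip whitespace text nodes and any element that is not a slide.
                if (slide.getNodeType() != Node.ELEMENT_NODE
                        || !"slide".equals(slide.getNodeName())) {
                    continue;
                }
                NodeList children = slide.getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE
                            && "title".equals(child.getNodeName())) {
                        result.add(child.getTextContent());
                    }
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // The three checked exceptions the DOM API can throw, wrapped into one.
            throw new IllegalStateException("Could not parse slideshow XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<slideshow author=\"Yours Truly\">"
                + "<slide><title>Wake up to WonderWidgets!</title></slide>"
                + "<slide><title>Overview</title><item>Why</item></slide>"
                + "</slideshow>";
        System.out.println(titles(xml));
    }
}
```

Even in this trimmed-down form, three checked exception types have to be dealt with before any XML gets read.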

Using XPath in Java can make our job easier when it comes to searching a tree for a node (you wouldn't want to see this done with pure DOM manipulation!), but we'll still have to deal with the org.w3c.dom.Node object that comes back as the result:

XPath xpath = XPathFactory.newInstance().newXPath();

String expression = "/slideshow/slide[title = 'Overview']";
Node overview = (Node) xpath.evaluate(expression, document, XPathConstants.NODE);
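
To show what "dealing with the Node" means in practice, here is a self-contained sketch built around the same expression. The class OverviewLookup and the method firstOverviewItem are hypothetical names, and the XML is passed in as a string so the example does not depend on the tutorial URL.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class OverviewLookup {

    // Returns the text of the first <item> in the slide titled "Overview",
    // null if there is no such slide, or "" if the slide has no items.
    static String firstOverviewItem(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            String expression = "/slideshow/slide[title = 'Overview']";
            Node overview = (Node) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODE);
            if (overview == null) {
                return null;
            }
            // A second, relative query against the node we just found.
            return xpath.evaluate("item[1]", overview);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<slideshow>"
                + "<slide><title>Wake up to WonderWidgets!</title></slide>"
                + "<slide><title>Overview</title>"
                + "<item>Why <em>WonderWidgets</em> are great</item></slide>"
                + "</slideshow>";
        System.out.println(firstOverviewItem(xml));
    }
}
```

The cast, the null check and the second query are all things the caller has to remember to do by hand.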

Here's the same thing done in Groovy, a simple one-liner:

def overview = slideShow.slide.find { it.title == "Overview" }

Two interesting things here are the use of a closure, and the use of == which might look wrong to a sharp-eyed Java programmer. Groovy uses == for equality and the method is() to test for identity. This is the opposite of Java but leads to more readable code because typically we compare for equality much more often than identity. And == works with null in Groovy, so your typical Java comparison (x != null && x.equals(y)) becomes x == y in Groovy, which looks more like what we mean.
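
For the Java side of that comparison, here is a small sketch of identity versus equality. The class EqualityDemo and the method groovyStyleEquals are made-up names. java.util.Objects.equals is used as an approximation of Groovy's null-safe == (Groovy's operator does more, such as using compareTo for Comparable types, so treat this as an analogy rather than an exact equivalent).

```java
import java.util.Objects;

// Hypothetical demo class, named for this example only.
class EqualityDemo {

    // Null-safe equality: true if both are null, or if x.equals(y).
    static boolean groovyStyleEquals(Object x, Object y) {
        return Objects.equals(x, y);
    }

    // The hand-written Java idiom from the text.
    static boolean javaIdiom(Object x, Object y) {
        return x != null && x.equals(y);
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with the same characters.
        String x = new String("Overview");
        String y = new String("Overview");

        System.out.println(x == y);                        // false: identity
        System.out.println(x.equals(y));                   // true: equality
        System.out.println(groovyStyleEquals(x, y));       // true
        System.out.println(groovyStyleEquals(null, y));    // false, and no NullPointerException
        System.out.println(groovyStyleEquals(null, null)); // true
        System.out.println(javaIdiom(null, null));         // false
    }
}
```

Note the last two lines: the hand-written idiom and the null-safe helper disagree when both sides are null.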

Creating XML is also easy in Groovy using StreamingMarkupBuilder. Here is the code to create our sample XML file:

import groovy.xml.StreamingMarkupBuilder

def doc = new StreamingMarkupBuilder().bind {

  slideshow(title:"Sample Slide Show", date:"Date of publication", author:"Yours Truly") {
      slide(type:"all") {
          title("Wake up to WonderWidgets!")
      }
      slide(type:"all") {
          title("Overview")
          item("item") {
              mkp.yield("Why ")
              em("WonderWidgets")
              mkp.yield(" are great")
          }
          item()
          item("item") {
              mkp.yield("Who ")
              em("buys")
              mkp.yield(" WonderWidgets")
          }
      }

      mkp.yield("normal text here")
  }
}

System.out << doc

It's easy to read because it has a visually similar structure to the XML that it's creating, and the syntax is quite terse. Groovy has similar builders for other things besides XML, for example to create SWT and Swing UIs. You can even create your own. More on how this is implemented coming soon...

There are other choices besides Groovy, of course. I haven't tried it yet, but Scala looks like it might be a good choice. (For that matter, there are other choices besides XML!)

Categories: Java, Software Development