stax

The StAX Java API for XML processing is designed for parsing XML streams, just like the SAX API. The main differences between the StAX and SAX APIs are:

StAX is a “pull” API. SAX is a “push” API.
StAX can do both XML reading and writing. SAX can only do XML reading.

The difference between a “read + write” capable API and a “read” capable API is pretty obvious. The difference between a “pull” and a “push” style API is less obvious, so I’ll explain it below. For a feature-by-feature comparison of SAX and StAX, see the text SAX vs. StAX.

“Pull” vs. “Push” Style API

SAX is a push style API. This means that the SAX parser iterates through the XML and calls methods on the handler object provided by you. For instance, when the SAX parser encounters the beginning of an XML element, it calls the startElement() method on your handler object. It “pushes” the information from the XML into your object, hence the name “push” style API. This is also referred to as an “event driven” API: your handler object is notified with event calls when something interesting is found in the XML document (“interesting” = elements, texts, comments etc.).

The SAX parser push style parsing is illustrated here:

SAX parser —> Your App
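As a rough sketch of what push style looks like in code, the handler below only records element names; the parser decides when startElement() is called. The class name SaxPushDemo, the helper elementNames() and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original text.
class SaxPushDemo {

    // Collects element names in the order the SAX parser pushes them to the handler.
    static List<String> elementNames(String xml) throws Exception {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Called by the parser; the application never asks for the next item.
                names.add(qName);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<order><item/><item/></order>"));
    }
}
```

Note that the application hands control to parser.parse() and only gets it back when the whole document has been processed.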

StAX is a pull style API. This means that you have to move the StAX parser from item to item in the XML file yourself, just like you do with a standard Iterator or JDBC ResultSet. You can then access the XML information via the StAX parser for each such “item” encountered in the XML file (“item” = elements, texts, comments etc.).

The StAX parser pull style parsing is illustrated here:

Your App —> StAX parser

In fact, StAX has two different reader APIs: one that looks most like using an Iterator, and one that looks most like using a ResultSet. These are called the “iterator” and “cursor” readers.

So, what is the difference between these two readers?

The iterator reader returns an XML event object from its nextEvent() calls. From this event object you can see what type of event you have encountered (element, text, comment etc.). The event object is immutable and can be passed around to other parts of your application. You can also hang on to earlier event objects when iterating to the next event. As you can see, this works very much like how you use an ordinary Iterator when iterating over a collection. Here, you are just iterating over XML events. Here’s a sketch:

XMLEventReader reader = …;

while(reader.hasNext()){
    XMLEvent event = reader.nextEvent();

    if(event.getEventType() == XMLEvent.START_ELEMENT){
        StartElement startElement = event.asStartElement();
        System.out.println(startElement.getName().getLocalPart());
    }
    //… more event types handled here…
}
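To make the “hang on to earlier event objects” point concrete, here is a small self-contained sketch. It keeps a reference to the previous event while moving on to the next one. The class IteratorReaderDemo, the method firstNesting() and the sample XML are assumptions made for this example only.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class, not part of the original text.
class IteratorReaderDemo {

    // Returns "parent/child" for the first start element that directly follows
    // another start element, or null if there is no such pair.
    static String firstNesting(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        XMLEvent previous = null; // earlier event object, still usable later
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && previous != null && previous.isStartElement()) {
                    return previous.asStartElement().getName().getLocalPart()
                            + "/" + event.asStartElement().getName().getLocalPart();
                }
                previous = event;
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstNesting("<order><item>1</item></order>"));
    }
}
```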

The cursor reader does not return events from its next() call. Rather, this call moves the cursor to the next “event” in the XML. You can then call methods directly on the cursor to obtain more information about the current event. This is very similar to how you iterate the records of a JDBC ResultSet, and call methods like getString() or getLong() to get values from the current record pointed to by the ResultSet. Here is a sketch:

XMLStreamReader streamReader = …;

while(streamReader.hasNext()){
    int eventType = streamReader.next();

    if(eventType == XMLStreamReader.START_ELEMENT){
        System.out.println(streamReader.getLocalName());
    }

    //… more event types handled here…
}
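A runnable variant of the cursor sketch might look like the following. Because the cursor only knows about the current event, the code has to remember the current element name itself. The class CursorReaderDemo, the method elementTexts() and the sample XML are invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the original text.
class CursorReaderDemo {

    // Returns "element=text" for every non-whitespace text found directly after a start tag.
    static List<String> elementTexts(String xml) throws XMLStreamException {
        XMLStreamReader streamReader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        String currentElement = null; // state we must keep ourselves
        try {
            while (streamReader.hasNext()) {
                int eventType = streamReader.next();
                if (eventType == XMLStreamConstants.START_ELEMENT) {
                    currentElement = streamReader.getLocalName();
                } else if (eventType == XMLStreamConstants.CHARACTERS
                        && currentElement != null && !streamReader.isWhiteSpace()) {
                    result.add(currentElement + "=" + streamReader.getText());
                } else if (eventType == XMLStreamConstants.END_ELEMENT) {
                    currentElement = null;
                }
            }
        } finally {
            streamReader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(
            elementTexts("<order><item>pen</item><item>ink</item></order>"));
    }
}
```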

So, one of the main differences is that you can hang on to earlier XML event objects when using the iterator style API. You cannot do this when using the cursor style API: once you move the cursor to the next event in the XML stream, the information about the previous event is gone unless you copied it yourself. This speaks in favour of using the iterator style API.

However, the cursor style API is generally more memory-efficient than the iterator style API, because it does not create an event object for every item in the XML. So, if your application needs the best possible performance, use the cursor style API.

Both of these StAX APIs are covered in more detail in later texts.

Java StAX Implementation

Since Java 6 the StAX API (the javax.xml.stream packages) is bundled with the JDK together with a default implementation, so XMLInputFactory.newInstance() works without any extra library. A standalone reference implementation was also published here:

http://stax.codehaus.org/

Posted 2012-01-28 by gw8310 in java, xml

6 responses to “stax”


  1. The class javax.xml.stream.XMLInputFactory is a root component of the Java StAX API. From this class you can create both an XMLStreamReader and an XMLEventReader. Here are two examples:

    XMLInputFactory factory = XMLInputFactory.newInstance();

    XMLEventReader eventReader =
        factory.createXMLEventReader(
            new FileReader("data\\test.xml"));

    XMLStreamReader streamReader =
        factory.createXMLStreamReader(
            new FileReader("data\\test.xml"));

    XMLInputFactory Properties

    You can set various properties on the XMLInputFactory instance using the setProperty() method. Here is an example:

    factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);

    For a full list of properties and their meaning, see the official JavaDoc (in Java 6) for the StAX API.
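    As a small sketch of working with several properties at once, the code below sets four of the standard XMLInputFactory constants and reads them back with getProperty(). Which properties to set is an assumption for this example; turning off DTD support and external entities is a common hardening choice, not something the text above prescribes. The class name InputFactoryPropertiesDemo is made up.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical demo class, not part of the original text.
class InputFactoryPropertiesDemo {

    // Creates a factory with a few standard properties set explicitly.
    static XMLInputFactory configuredFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        // Ask the parser to merge adjacent character data into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        // Assumed hardening settings: no DTD processing, no external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        return factory;
    }

    public static void main(String[] args) {
        XMLInputFactory factory = configuredFactory();
        String[] names = {
            XMLInputFactory.IS_NAMESPACE_AWARE,
            XMLInputFactory.IS_COALESCING,
            XMLInputFactory.SUPPORT_DTD,
            XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES
        };
        for (String name : names) {
            System.out.println(name + " = " + factory.getProperty(name));
        }
    }
}
```

    Readers created from the factory afterwards pick up these settings.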

  2. The class javax.xml.stream.XMLOutputFactory is a root component of the Java StAX API. From this class you can create both an XMLStreamWriter and an XMLEventWriter. Here are two examples:

    XMLOutputFactory factory = XMLOutputFactory.newInstance();

    XMLEventWriter eventWriter =
        factory.createXMLEventWriter(
            new FileWriter("data\\test.xml"));

    XMLStreamWriter streamWriter =
        factory.createXMLStreamWriter(
            new FileWriter("data\\test.xml"));

    XMLOutputFactory Properties

    You can set one property on the XMLOutputFactory instance using the setProperty() method. Here is an example:

    factory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, true);

    For a full list of properties and their meaning, see the official JavaDoc (in Java 6) for the StAX API.
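    To get a feel for what IS_REPAIRING_NAMESPACES does, the sketch below writes one prefixed element without ever calling writeNamespace(). In repairing mode the writer is expected to add the missing namespace declaration itself. The prefix "ex", the URI http://example.com/ns and the class name RepairingNamespacesDemo are invented for this example, and the exact output formatting depends on the StAX implementation in use.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class, not part of the original text.
class RepairingNamespacesDemo {

    // Writes a single prefixed element and returns the generated XML.
    static String write(boolean repairing) throws XMLStreamException {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        factory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, repairing);

        StringWriter out = new StringWriter();
        XMLStreamWriter writer = factory.createXMLStreamWriter(out);
        // Note: no writeNamespace() call for the "ex" prefix.
        writer.writeStartElement("ex", "document", "http://example.com/ns");
        writer.writeEndElement();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println("repairing: " + write(true));
    }
}
```

    Without the property, declaring the namespace is the caller's job, typically through writeNamespace() right after writeStartElement().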

  3. The XMLEventReader interface in Java StAX provides an Iterator style API for parsing XML. In other words, it allows you to move from event to event in the XML, letting you control when to move to the next event. An “event” in this case is for instance the beginning of an element, the end of an element, a run of text etc. In other words, pretty much the same events you would get from a SAX parser.

    You create an XMLEventReader via the javax.xml.stream.XMLInputFactory class. Here is how that looks:

    XMLInputFactory factory = XMLInputFactory.newInstance();

    //get Reader connected to XML input from somewhere..
    Reader reader = getXmlReader();

    try {

        XMLEventReader eventReader =
            factory.createXMLEventReader(reader);

    } catch (XMLStreamException e) {
        e.printStackTrace();
    }

    Once created you can iterate through the XML input from the underlying Reader. Here is how that looks:

    while(eventReader.hasNext()){

        XMLEvent event = eventReader.nextEvent();

        if(event.getEventType() == XMLStreamConstants.START_ELEMENT){
            StartElement startElement = event.asStartElement();
            System.out.println(startElement.getName().getLocalPart());
        }
        //handle more event types here…
    }

    You obtain an XMLEvent object from the XMLEventReader by calling its nextEvent() method. From the event object you can check what type of event you’ve got by calling its getEventType() method. Depending on the type of event you have encountered, you will take different actions.

    XML Stream Events

    Below is a list of the events you can encounter in an XML stream. There are constants for each of these events in the javax.xml.stream.XMLStreamConstants interface.

    ATTRIBUTE
    CDATA
    CHARACTERS
    COMMENT
    DTD
    END_DOCUMENT
    END_ELEMENT
    ENTITY_DECLARATION
    ENTITY_REFERENCE
    NAMESPACE
    NOTATION_DECLARATION
    PROCESSING_INSTRUCTION
    SPACE
    START_DOCUMENT
    START_ELEMENT

    XMLEvent Processing

    From the XMLEvent object you can get access to the corresponding XML data. You can also get information about where (line number + column number) in the XML stream the event was encountered.

    You can turn the event object into a more specific event type object, by calling one of these 3 methods:

    asStartElement()
    asEndElement()
    asCharacters()

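    For events like START_DOCUMENT, NAMESPACE or PROCESSING_INSTRUCTION there is no corresponding as...() method. Instead you cast the XMLEvent to the matching interface in the javax.xml.stream.events package (for instance StartDocument, Namespace or ProcessingInstruction), typically after checking the event with a method such as isStartDocument() or isProcessingInstruction(). Luckily, we will most often only need the START_ELEMENT, END_ELEMENT, and CHARACTERS events.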

    XMLEvent.asStartElement()

    The asStartElement() method returns a javax.xml.stream.events.StartElement object. From this object you can get the name of the element, the namespaces of the element, and the attributes of the element. See the Java 6 JavaDoc for more detail.

    XMLEvent.asEndElement()

    The asEndElement() method returns a javax.xml.stream.events.EndElement object. From this object you can get the element name and namespace.

    XMLEvent.asCharacters()

    The asCharacters() method returns a javax.xml.stream.events.Characters object. From this object you can obtain the characters themselves, as well as see if the characters are CDATA, white space, or ignorable white space.
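    The three conversion methods can be seen together in the sketch below, along with a plain cast for an event type that has no as...() method (a processing instruction). The class EventDetailsDemo, the method describe() and the sample XML are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.ProcessingInstruction;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class, not part of the original text.
class EventDetailsDemo {

    // Turns the events of a document into short, readable descriptions.
    static List<String> describe(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> lines = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                StringBuilder line =
                    new StringBuilder("start " + start.getName().getLocalPart());
                Iterator<?> attributes = start.getAttributes();
                while (attributes.hasNext()) {
                    Attribute attribute = (Attribute) attributes.next();
                    line.append(' ').append(attribute.getName().getLocalPart())
                        .append('=').append(attribute.getValue());
                }
                lines.add(line.toString());
            } else if (event.isCharacters()) {
                Characters characters = event.asCharacters();
                if (!characters.isWhiteSpace()) {
                    lines.add("text " + characters.getData());
                }
            } else if (event.isEndElement()) {
                lines.add("end " + event.asEndElement().getName().getLocalPart());
            } else if (event.isProcessingInstruction()) {
                // No asProcessingInstruction() exists, so a cast is used.
                ProcessingInstruction pi = (ProcessingInstruction) event;
                lines.add("pi " + pi.getTarget());
            }
        }
        reader.close();
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<doc><?app hint?><data name=\"value\">hi</data></doc>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```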

  4. The XMLEventWriter interface in the Java StAX API allows you to write StAX XMLEvent objects to a Writer, an OutputStream, or a Result (special JAXP object).

    Here is a simple example that writes a series of events to disk, using a FileWriter:

    XMLOutputFactory factory = XMLOutputFactory.newInstance();
    XMLEventFactory eventFactory = XMLEventFactory.newInstance();

    try {
        XMLEventWriter writer =
            factory.createXMLEventWriter(
                new FileWriter("data\\output.xml"));

        XMLEvent event = eventFactory.createStartDocument();
        writer.add(event);

        event = eventFactory.createStartElement(
            "jenkov", "http://jenkov.com", "document");
        writer.add(event);

        event = eventFactory.createNamespace(
            "jenkov", "http://jenkov.com");
        writer.add(event);

        event = eventFactory.createAttribute(
            "attribute", "value");
        writer.add(event);

        event = eventFactory.createEndElement(
            "jenkov", "http://jenkov.com", "document");
        writer.add(event);

        writer.flush();
        writer.close();
    } catch (XMLStreamException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

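    The result of executing this code is an XML file that contains an XML declaration followed by a single jenkov:document element. The element declares the jenkov namespace (xmlns:jenkov="http://jenkov.com") and carries the attribute attribute="value".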

    As you can see, it is possible to generate XML using XMLEvent objects and the XMLEventWriter. But if you just want to output some quick XML, you might be better off using the XMLStreamWriter instead. Its API is easier to work with and results in more compact code.

    Chaining XMLEventReader and XMLEventWriter

    It is possible to add the XMLEvent objects available from an XMLEventReader directly to an XMLEventWriter. In other words, you are pouring the XML events from the reader directly into the writer. You do so using the XMLEventWriter.add(XMLEventReader) method.
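    A minimal sketch of such a chain, using in-memory strings instead of files so it can be run as-is. The class EventCopyDemo, the method copy() and the sample XML are assumptions for illustration; the exact form of the XML declaration in the output depends on the StAX implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical demo class, not part of the original text.
class EventCopyDemo {

    // Copies every event from the input document to a new XML string.
    static String copy(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance()
                .createXMLEventWriter(out);

        // Pulls all remaining events from the reader and writes them.
        writer.add(reader);

        writer.close();
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(copy("<?xml version=\"1.0\"?><root><item>1</item></root>"));
    }
}
```

    If you need to filter or modify events on the way, loop over the reader yourself and call writer.add(XMLEvent) only for the events you want to keep.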

  5. The XMLStreamReader interface in Java StAX provides a Cursor style API for parsing XML. Like the Iterator API it allows you to move from event to event in the XML, letting you control when to move to the next event. An “event” in this case is for instance the beginning of an element, the end of an element, a run of text etc. In other words, pretty much the same events you would get from a SAX parser.

    To read more about the difference between the Iterator and Cursor style APIs, read the introduction to StAX: Java StAX Parser

    You create an XMLStreamReader via the javax.xml.stream.XMLInputFactory class. Here is how that looks:

    XMLInputFactory factory = XMLInputFactory.newInstance();

    //get Reader connected to XML input from somewhere..
    Reader reader = getXmlReader();

    try {

        XMLStreamReader streamReader =
            factory.createXMLStreamReader(reader);

    } catch (XMLStreamException e) {
        e.printStackTrace();
    }

    Once created you can iterate through the XML input from the underlying Reader. Here is how that looks:

    XMLStreamReader streamReader = factory.createXMLStreamReader(
        new FileReader("data\\test.xml"));

    while(streamReader.hasNext()){
        streamReader.next();
        if(streamReader.getEventType() == XMLStreamReader.START_ELEMENT){
            System.out.println(streamReader.getLocalName());
        }
    }

    You obtain the event type by calling the XMLStreamReader.getEventType() method. When you know the event type, you can process the given event as you need.

    XML Stream Events

    Below is a list of the events you can encounter in an XML stream. There are constants for each of these events in the javax.xml.stream.XMLStreamConstants interface.

    ATTRIBUTE
    CDATA
    CHARACTERS
    COMMENT
    DTD
    END_DOCUMENT
    END_ELEMENT
    ENTITY_DECLARATION
    ENTITY_REFERENCE
    NAMESPACE
    NOTATION_DECLARATION
    PROCESSING_INSTRUCTION
    SPACE
    START_DOCUMENT
    START_ELEMENT

    XML Event Processing

    From the XMLStreamReader you can get access to the corresponding XML data. You can also get information about where (line number + column number) in the XML stream the event was encountered.
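    As an illustration, the sketch below reads the element name, the attributes and the line number directly from the cursor for every start element. The class StreamReaderDetailsDemo, the method describeElements() and the sample XML are invented for this example; the reported positions can differ slightly between StAX implementations.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the original text.
class StreamReaderDetailsDemo {

    // Describes every start element: name, attributes and reported line number.
    static List<String> describeElements(String xml) throws XMLStreamException {
        XMLStreamReader streamReader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> lines = new ArrayList<>();
        try {
            while (streamReader.hasNext()) {
                if (streamReader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // only start elements are of interest here
                }
                StringBuilder line = new StringBuilder(streamReader.getLocalName());
                for (int i = 0; i < streamReader.getAttributeCount(); i++) {
                    line.append(' ').append(streamReader.getAttributeLocalName(i))
                        .append('=').append(streamReader.getAttributeValue(i));
                }
                Location location = streamReader.getLocation();
                line.append(" (line ").append(location.getLineNumber()).append(')');
                lines.add(line.toString());
            }
        } finally {
            streamReader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<document>\n  <data name=\"value\"/>\n</document>";
        for (String line : describeElements(xml)) {
            System.out.println(line);
        }
    }
}
```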

  6. The XMLStreamWriter interface in the Java StAX API allows you to write XML events (elements, attributes etc.) to a Writer, an OutputStream, or a Result (special JAXP object).

    Here is a simple example that writes a series of events to disk, using a FileWriter:

    XMLOutputFactory factory = XMLOutputFactory.newInstance();

    try {
        XMLStreamWriter writer = factory.createXMLStreamWriter(
            new FileWriter("data\\output2.xml"));

        writer.writeStartDocument();
        writer.writeStartElement("document");
        writer.writeStartElement("data");
        writer.writeAttribute("name", "value");
        writer.writeEndElement();
        writer.writeEndElement();
        writer.writeEndDocument();

        writer.flush();
        writer.close();

    } catch (XMLStreamException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

    The result of executing this code is an XML file that contains an XML declaration followed by a document element with one nested data element, which carries the attribute name="value".
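    To look at the generated XML without opening a file, the same calls can be pointed at a StringWriter, as in this sketch. The class name StreamWriterDemo and the method write() are made up; the exact form of the XML declaration depends on the StAX implementation in use.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class, not part of the original text.
class StreamWriterDemo {

    // Writes the same small document as the file example, but into a String.
    static String write() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        writer.writeStartDocument();
        writer.writeStartElement("document");
        writer.writeStartElement("data");
        writer.writeAttribute("name", "value");
        writer.writeEndElement(); // closes data
        writer.writeEndElement(); // closes document
        writer.writeEndDocument();

        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write());
    }
}
```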
