Lesson 6 - Introduction to the XML File Format in Java

In the previous lesson, Solved tasks for Java Files Lessons 1-5, we focused on text files. Today, we're going to focus on the XML format. First we're going to describe it, then show the classes that Java provides for reading and writing it.

The XML Format

We're about to go over lots of terms. If you don't understand any of them, don't worry, we'll go into as much detail as possible in further lessons :)

XML (eXtensible Markup Language) is a markup language developed by the W3C (the organization responsible for Web standards). XML is very universal and is supported by a large number of languages and applications. The word extensible refers to the ability to create your own languages on top of XML; XHTML, which is used for creating websites, is one of them. XML is a self-describing language, meaning that its structure tells us what each value means. In a CSV file, we can only guess what the number eight in the third column means, whereas in XML, it'd be immediately clear that it's the number of articles the user has written. The disadvantage is that XML files are larger, but that's not a problem for us in most cases. Personally, I almost always choose the XML format. It's a good choice for saving a program's configuration, high scores for game players, or a small user database. Thanks to XSD schemas, we can also validate XML files and catch malformed data before it causes errors at run time.

XML can be processed in different ways, usually either by continuously reading/writing it or by using a DOM object structure. Things have come so far that some tools allow us to work with XML much like a database and run queries on it (query languages such as XPath or XQuery are used to do that). As you can imagine, this saves a lot of work.
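To give an idea of what querying XML looks like, here is a minimal sketch using the XPath support in the standard `javax.xml.xpath` package. The `XPathQueryDemo` class, its `query` helper, and the sample data are hypothetical names made up for this illustration, and the SQL statements in the comments are only rough analogies.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathQueryDemo {

    // Hypothetical sample data: two users, each with an age attribute and a name
    static final String USERS_XML =
          "<users>"
        + "<user age=\"22\"><name>John Smith</name></user>"
        + "<user age=\"31\"><name>James Brown</name></user>"
        + "</users>";

    /** Evaluates an XPath expression against the XML text and returns the result as a string. */
    static String query(String xml, String expression) throws Exception {
        // XPath works on a parsed document, so we load the text into a DOM tree first
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, document);
    }

    public static void main(String[] args) throws Exception {
        // Roughly: SELECT name FROM users WHERE age > 30
        System.out.println(query(USERS_XML, "/users/user[@age > 30]/name"));
        // Roughly: SELECT COUNT(*) FROM users
        System.out.println(query(USERS_XML, "count(/users/user)"));
    }
}
```

Note that this sketch loads the whole document into memory before querying it, so it suits small files such as configurations rather than very large data sets.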

XML competes with JSON, which is simpler but less popular in business applications. Unlike XML, JSON makes it easy to append records to the end of a file, for example when logging, without loading the entire document.

XML is very often used to exchange data between different systems (e.g. desktop applications and web applications on a server). Therefore, as we've already mentioned, there are many libraries for it and practically every tool is able to work with it. This includes web services, SOAP, and so on. However, we won't deal with any of them now.

Last time, we saved a list of users to a CSV file. We saved their name, age, and date of registration. The values were next to each other, separated by semicolons. Each line represented a user. The file's contents looked like this:

John Smith;22;3/21/2000
James Brown;31;10/30/2012

Anyone who isn't directly involved wouldn't know what any of that means, would they? Here is the equivalent to that file in the XML format:

<?xml version="1.0" encoding="UTF-8" ?>
<users>
    <user age="22">
        <name>John Smith</name>
        <registered>3/21/2000</registered>
    </user>
    <user age="31">
        <name>James Brown</name>
        <registered>10/30/2012</registered>
    </user>
</users>

Now everyone can tell what is stored in the file. I saved the age as an attribute just to demonstrate that XML is able to do things like that. Otherwise, it'd be saved as an element along with the name and the registration date. Individual items are called elements. I'm sure you're all familiar with HTML, which is based on the same fundamentals as XML. Elements are usually paired, meaning that we write the opening tag, followed by the value, and then the closing tag with a slash. Elements can contain other elements, so the document has a tree structure. Thanks to that, we're able to save an entire hierarchy of objects into a single XML document.

At the beginning of an XML file, there's a header. The document has to contain exactly one root element in order for it to be well-formed. Here, it's the <users> element, which contains the other nested elements. Attribute values are written after the attribute name and an equals sign, in quotation marks.
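The single-root rule can be checked with any XML parser. The following is a minimal sketch, assuming the standard `DocumentBuilder` from `javax.xml.parsers`; the `WellFormedDemo` class and its `isWellFormed` helper are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormedDemo {

    /** Returns true if the text parses as XML, false if the parser rejects it. */
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports structural problems as a SAXException
            // (it may also print the error to the standard error output)
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // One root element wrapping the users: accepted
        String oneRoot = "<users><user age=\"22\"><name>John Smith</name></user></users>";
        // Two top-level elements and no common root: rejected
        String twoRoots = "<user age=\"22\"/><user age=\"31\"/>";
        // Attribute value without quotation marks: rejected
        String unquoted = "<users><user age=22/></users>";

        System.out.println(isWellFormed(oneRoot));
        System.out.println(isWellFormed(twoRoots));
        System.out.println(isWellFormed(unquoted));
    }
}
```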

As you can probably tell, the file got bigger, which is the price we pay for its readability. If the user had more than three fields, you'd see just how messy the CSV format can get and how worthwhile the XML format is. Personally, the more experience I gain, the more I prefer solutions that are clear and simple, even if they occupy more memory. This applies not only to files but to source code as well. There is nothing worse than a programmer looking at their own work after a year and having no idea what the eighth value in a CSV file means when there are 100 numbers per line. A five-dimensional array is even worse: it may be super fast, but had they designed an object structure instead, they wouldn't have to rewrite the whole functionality now. However, let's get back to today's topic.

XML in Java

We'll focus on two fundamental approaches to working with XML files: the continuous approach (the SAX parser) and the object-oriented approach (DOM). Today's lesson and the next ones will be dedicated to SAX, after which we'll get to DOM. Again, there are more ways to work with XML files in Java, and there are lots of classes for that. I'll try to show the most modern approaches and simple constructs.

Parsing XML via SAX

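SAX (Simple API for XML) is essentially a thin layer on top of a text file reader. Writing is relatively simple: we write the elements and attributes one after another, in the same order in which they appear in the file (we ignore the tree structure in this approach). For this, Java provides the XMLStreamWriter class, which we obtain from XMLOutputFactory rather than from SAXParserFactory (that factory only creates parsers for reading). Strictly speaking, XMLStreamWriter and XMLStreamReader belong to StAX (the Streaming API for XML), a close relative of SAX that follows the same continuous approach. The writer relieves us from having to deal with the fact that XML is a text file. We only work with the elements or, more accurately, nodes (more on that later).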
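As a first taste of continuous writing, here is a minimal sketch that writes a single user into a string. It assumes the writer is created through `XMLOutputFactory`; the `StreamWriteDemo` class and its `writeUser` method are hypothetical names chosen for this illustration, not code from the following lessons.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StreamWriteDemo {

    /** Writes one user as an XML document and returns the resulting text. */
    static String writeUser(String name, int age, String registered) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        writer.writeStartDocument();           // the XML header
        writer.writeStartElement("users");     // root element
        writer.writeStartElement("user");
        writer.writeAttribute("age", Integer.toString(age)); // must directly follow the start tag

        writer.writeStartElement("name");
        writer.writeCharacters(name);
        writer.writeEndElement();              // </name>

        writer.writeStartElement("registered");
        writer.writeCharacters(registered);
        writer.writeEndElement();              // </registered>

        writer.writeEndElement();              // </user>
        writer.writeEndElement();              // </users>
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeUser("John Smith", 22, "3/21/2000"));
    }
}
```

The calls mirror the file from top to bottom: every opening tag is written when we reach it and closed once its content has been written. The writer does not indent the output, so the whole document ends up on a single line.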

Reading works much like writing. We go through the XML from top to bottom, just as we would read a text file. As it reads, the parser hands us what are known as nodes. A node can be an element, an attribute, or a value (with XMLStreamReader, attributes are delivered together with their element's opening tag). We receive the nodes in a loop, in the same order in which they're written in the file. We use the XMLStreamReader class to read XML files.
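The reading loop can be sketched as follows. This is a minimal illustration, assuming the reader is created through `XMLInputFactory`; the `StreamReadDemo` class and its `readNodes` method are hypothetical names, and collecting the nodes into a list of strings is done only to make the order visible.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StreamReadDemo {

    /** Reads the XML text node by node and returns a description of each node in document order. */
    static List<String> readNodes(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to deliver each text value as a single node
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

        List<String> nodes = new ArrayList<>();
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                nodes.add("start " + reader.getLocalName());
                // Attributes are available while the reader stands on the opening tag
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    nodes.add("attribute " + reader.getAttributeLocalName(i)
                            + "=" + reader.getAttributeValue(i));
                }
            } else if (event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                // Skip the whitespace that is only there for indentation
                nodes.add("text " + reader.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                nodes.add("end " + reader.getLocalName());
            }
        }
        reader.close();
        return nodes;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<users><user age=\"22\"><name>John Smith</name></user></users>";
        for (String node : readNodes(xml)) {
            System.out.println(node);
        }
    }
}
```

Only the current node is kept in memory at any time, which is where the speed and low memory requirements of this approach come from.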

The advantage of the SAX approach is its high speed and low memory requirements. We'll see the disadvantages once we compare this approach to the object-oriented DOM approach later on. In the next lesson, Writing XML Files via the SAX Approach in Java, we'll create an XML file using the SAX approach.


 

Article has been written for you by David Capka Hartinger
The author is a programmer, who likes web technologies and being the lead/chief article writer at ICT.social. He shares his knowledge with the community and is always looking to improve. He believes that anyone can do what they set their mind to.
Unicorn university David learned IT at the Unicorn University - a prestigious college providing education on IT and economics.