XML the Markup Language: Part 2

"If you wish to converse with me, define your terms" ~ Voltaire, 1694-1778

In the previous XML article, we covered the basics of markup languages and the general purpose of XML. Now, let's take out a magnifying glass and take a closer look at it.

More often than not, when you’re on the computer, you don’t really get a sense of what is going on behind the scenes. You select and click, and things happen. Ever been curious about what goes on behind those clicks and selections when dealing with documents? A closer look at the XML format might satisfy some of that curiosity.

So, the question is: How does it work exactly?

XML Up Close and Personal: the building blocks

Unlike HTML tags, XML tags aren’t predefined, and they don’t specify how a document is rendered or displayed. An XML tag describes only what the data is. Thus, you can define your own tags to identify specific information such as authors, dates, names, etc.

XML markup has a few basic, standard components. There are more complex ones, of course, but they are outside the scope of this article. Here are the major components:

Elements - An element consists of a start-tag, content and an end-tag. The tag names you choose describe the textual information they enclose. Thus, an entire element may take the form: <name>Jane Smith</name>

Root Elements - The root element is the one element that, within the XML markup structure, contains all other elements; a well-formed document has exactly one. You could think of the root element as the one major heading or category that describes the document. So you could have a root element such as: <report>

Nested Elements - Nested elements are elements that sit inside other elements and can contain data of their own:

<body>Jane Smith <em>rocks</em> because she sells many units.</body>

As you can see, the nested <em> and </em> tags add information about the word “rocks”, here that it is meant to be emphasized.
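To see how a parser treats the nested example, here is a minimal sketch using Python’s standard xml.etree.ElementTree module. The variable names are illustrative assumptions; the markup is the <body> example from above.

```python
import xml.etree.ElementTree as ET

# The nested-element example from the article.
body = ET.fromstring(
    "<body>Jane Smith <em>rocks</em> because she sells many units.</body>"
)

em = body.find("em")        # the nested element
before = body.text          # text between <body> and <em>
emphasized = em.text        # content of the nested element
after = em.tail             # text that follows </em> inside <body>

# itertext() walks the whole element and yields every piece of text in order.
full_text = "".join(body.itertext())
print(full_text)
```

The parser keeps the nested element separate from the surrounding text, so a program can treat the emphasized word differently from the rest of the sentence.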

Attributes - Attributes provide extra information about elements, or they can define the desired behavior of data for the software. They appear inside the start-tag as name-value pairs, with the value enclosed in quotation marks. In the example below, “job” is the name and “conversion software vendor” is the value:

<name job="conversion software vendor">Jane Smith</name>
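As a small sketch of how software reads such an element, the snippet below parses it with Python’s standard xml.etree.ElementTree module. The attribute value is written with straight quotation marks, which XML syntax requires; the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# An element with one attribute (name "job", value "conversion software vendor").
name = ET.fromstring('<name job="conversion software vendor">Jane Smith</name>')

tag = name.tag              # the element's tag name
content = name.text         # the text enclosed by the tags
job = name.get("job")       # look up one attribute by its name
attributes = name.attrib    # all attributes as a dictionary

print(tag, content, job)
```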

Tags - in Action!

Of course, putting it all together, you’ll get something that looks like this (with a little more detail included to help you visualize it):

<report>
    <sales_representative>
        <name job="conversion software vendor">Jane Smith</name>
        <salary>35000</salary>
        <raise>1000</raise>
        <body>Jane Smith <em>rocks</em> because she sells many units.</body>
    </sales_representative>
</report>
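Because the tags describe the data, a program can pull values straight out of the hierarchy. The following is a minimal sketch using Python’s standard xml.etree.ElementTree module. It assumes the element is named sales_representative (with an underscore, since XML names cannot contain spaces), and the salary arithmetic is only an illustration.

```python
import xml.etree.ElementTree as ET

REPORT_XML = """
<report>
    <sales_representative>
        <name job="conversion software vendor">Jane Smith</name>
        <salary>35000</salary>
        <raise>1000</raise>
        <body>Jane Smith <em>rocks</em> because she sells many units.</body>
    </sales_representative>
</report>
"""

root = ET.fromstring(REPORT_XML)             # the root element, <report>
rep = root.find("sales_representative")      # first child of the root

# The direct children of <sales_representative>, in document order.
child_tags = [child.tag for child in rep]

# Element content is always text, so numbers must be converted explicitly.
new_salary = int(rep.findtext("salary")) + int(rep.findtext("raise"))

print(root.tag, child_tags, new_salary)
```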

You can see how versatile XML can be when you compare it with the previous HTML example in our first XML article. The nature of the data is prioritized; the layout de-emphasized.

Because of their hierarchical structure, XML documents can hold a lot of information, which can make them lengthy. They can also be complex, because the structure depends entirely on what information you want your document to contain.

Yet, as you can see, while XML is more flexible with its tags than HTML, it’s also stricter in its structure. For instance, XML tags are case-sensitive, every start-tag needs a matching end-tag, and elements must be nested properly. An XML document has to be well-formed, meaning it follows these syntax rules, before a parser will process it, let alone render it.
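A quick way to get a feel for these rules is to hand a parser some markup that breaks them. The sketch below uses Python’s standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "well-formed": "<name>Jane Smith</name>",
    # <Name> and </name> differ in case, so they do not match.
    "case mismatch": "<Name>Jane Smith</name>",
    # <em> is opened inside <body> but closed outside it.
    "bad nesting": "<body><em>rocks</body></em>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
print(results)
```

Note that the parser does not try to repair the bad samples the way a browser often does with sloppy HTML; it simply refuses them.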

Stylesheet Language and XSLs

The advantage of customized tags is that they allow the raw data to be re-used and re-purposed. However, personally defined tags pose a problem for universal rendering: XML documents are mainly concerned with containing raw data, and so lack any syntax for formatting it.

XML documents often need more flexibility than a DTD (Document Type Definition) with its pre-defined tags can offer, which is where an XML schema comes in. An XML schema describes the valid format of a set of XML data (valid XML documents are ones that are well-formed and also follow the rules of a DTD or schema). Similar to the way an address requires a name, street address, zip code and city, XML schemata (the plural of schema) define the content that is allowed within the XML document.
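The address analogy can be sketched in a few lines. Note that this is only a hand-written stand-in for the idea of validation, not real DTD or XML Schema processing, which Python’s standard library does not provide. The element names, the REQUIRED_FIELDS list and the missing_fields helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule: every <address> must contain these child elements.
REQUIRED_FIELDS = ["name", "street", "zip", "city"]


def missing_fields(address_xml):
    """List the required child elements that the <address> element lacks."""
    address = ET.fromstring(address_xml)
    return [field for field in REQUIRED_FIELDS if address.find(field) is None]


complete = (
    "<address><name>Jane Smith</name><street>1 Main St</street>"
    "<zip>00000</zip><city>Springfield</city></address>"
)
incomplete = "<address><name>Jane Smith</name><city>Springfield</city></address>"

print(missing_fields(complete))
print(missing_fields(incomplete))
```

Both samples are well-formed, but only the first would count as valid under this toy rule, which is the distinction the paragraph above draws.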

An XML stylesheet (XSL) is then used to provide the layout for an XML document (much like CSS is used for HTML rendering within browsers). A stylesheet is basically a series of instructions that format and display the information.

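XSL stylesheets are themselves well-formed XML documents, and the XML document declares which stylesheet to use, typically through an xml-stylesheet processing instruction near the top of the file. They basically transform XML vocabulary into readable formats and are customizable files too, allowing user-defined layouts, styles, pages, etc.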

Thus, a possible stylesheet for the above XML data could be:

[Image: XML not rendered]

And thus, the layout of the text above would look like this when it appears in your browser:

[Image: XML rendered]
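As a rough illustration of what such a stylesheet can contain, the sketch below holds a small, hypothetical XSLT stylesheet for the report example and uses Python’s standard xml.etree.ElementTree module to confirm that the stylesheet is itself well-formed XML. The template contents, the HTML layout and the sales_representative element name are assumptions, not the stylesheet pictured above, and the snippet only parses the stylesheet; it does not run the transformation.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# A hypothetical stylesheet: one template that turns <report> into an HTML page.
STYLESHEET = """\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/report">
    <html>
      <body>
        <xsl:for-each select="sales_representative">
          <h1><xsl:value-of select="name"/></h1>
          <p>Salary: <xsl:value-of select="salary"/></p>
          <p><xsl:value-of select="body"/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
"""

# Parsing succeeds only if the stylesheet is well-formed XML.
stylesheet = ET.fromstring(STYLESHEET)

# Namespaced tags are written as {namespace}localname in ElementTree.
templates = stylesheet.findall("{%s}template" % XSLT_NS)
match_patterns = [template.get("match") for template in templates]

print(stylesheet.tag, match_patterns)
```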

A Browser at the Finish Line

XSL uses XSLT (XSL Transformations), an XML-based language recommended by the W3C for transforming XML documents into other output formats, such as HTML, which browsers recognize, or plain text. XSLT relies on the W3C’s XPath language, which is used to navigate through XML documents and identify the XML elements to be turned into HTML elements. An XML parser is needed to read an XML document in the first place.
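To give a feel for XPath-style navigation, here is a minimal sketch using Python’s standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The sample document and the sales_representative element name are assumptions based on the report example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<report>"
    "<sales_representative>"
    '<name job="conversion software vendor">Jane Smith</name>'
    "<salary>35000</salary>"
    "</sales_representative>"
    "</report>"
)

# A path of child steps, relative to the root element <report>.
salary = root.findtext("sales_representative/salary")

# ".//" searches all descendants; "[@job]" keeps elements that have a job attribute.
names_with_job = [element.text for element in root.findall(".//name[@job]")]

# A predicate can also test the attribute's value.
vendors = root.findall(".//name[@job='conversion software vendor']")

print(salary, names_with_job, len(vendors))
```

In an XSLT stylesheet, expressions like these appear in attributes such as match and select to pick out the elements a template should process.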

When the source code that transforms your XML documents creates an instance of the parser, the XML document is loaded into the parser’s memory. Another instance is then created to load the XSL document.

It’s important to note that XML parsing can be done either on the server or in the browser (“client side”, where the browser does the work). When the parsing is done client side, a final line of code transforms the document using the XSL and uses the result to populate an HTML document for viewing. If users have browsers that don’t support XML parsers, the work has to be done on the server, which will transform the XML document using the XSL and send the result to the browser.
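The load-then-transform sequence can be imitated on the server side with Python’s standard library alone. To be clear, this is not XSLT: the standard library has no XSLT processor, so a hand-written function (the hypothetical report_to_html) stands in for the stylesheet. The element names and the HTML layout are assumptions as well.

```python
import xml.etree.ElementTree as ET

REPORT_XML = (
    "<report><sales_representative>"
    '<name job="conversion software vendor">Jane Smith</name>'
    "<body>Jane Smith <em>rocks</em> because she sells many units.</body>"
    "</sales_representative></report>"
)


def report_to_html(xml_text):
    """Load the XML into memory, then build an HTML page from its data."""
    report = ET.fromstring(xml_text)              # step 1: parse the XML document
    html = ET.Element("html")                     # step 2: start the output document
    page_body = ET.SubElement(html, "body")
    for rep in report.findall("sales_representative"):
        heading = ET.SubElement(page_body, "h1")
        heading.text = rep.findtext("name")
        paragraph = ET.SubElement(page_body, "p")
        paragraph.text = "".join(rep.find("body").itertext())
    # step 3: serialize the result as HTML text to send to the browser
    return ET.tostring(html, encoding="unicode", method="html")


html_page = report_to_html(REPORT_XML)
print(html_page)
```

The same XML data could be fed to a different function, or a different stylesheet, to produce a completely different layout without touching the data itself.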

Still More to Come?

So now you know the basics of XML rendering, without all the technical jargon you’d have to look up. But this still doesn’t answer the question of its role in your digital life. An introduction like this one only gives you a sense of how XML operates.

The third and final installment of the XML series is coming up! In it, you’ll get a sense of XML’s broader uses. ’Til next time!