XML the Markup Language: Part 1

"Language is by its very nature a communal thing; that is, it expresses never the exact thing but a compromise - that which is common to you, me, and everybody." ~Thomas Ernest Hulme, Speculations, 1924

You may have heard of XML (Extensible Markup Language), and you may have used XML-based technologies without even knowing it. You may have wondered exactly what it is and what it has to do with you.

Well, as much of our work is now almost entirely paperless, the number of processes built around information exchange is skyrocketing. XML is part of this development because its features make it versatile, efficient and usable almost anywhere. This extends to portable digital devices, many of which use XML technologies to send and receive information. What you read on the Internet may be XML-based, too. Software applications are also adding XML support for faster and more dynamic data sharing.

So whether you know it or not, and like it or not, XML is around you. The following article series is an introduction to it - what it is and how it fits into your digital world of data.

Backgrounder on Markup Languages

Where to start? Well, to begin, every digitally composed document contains data in the form of words, objects, pictures and so on. In addition, it has an information structure that contains that data.

"The information structure?" you ask. This is provided by tags intermingled with your document's text. These tags surround and describe the text so that it can be structured and rendered properly when parsed by a document reader. A markup language, in general, is a set of such tags together with the rules for using them.

You've probably never looked at a markup language directly, but you have certainly interacted with one. Take SGML (Standard Generalized Markup Language), for instance. Heard of it before? SGML is a metalanguage: rather than marking up a document itself, it specifies how markup languages are defined. For a markup language to be usable by everyone, there must be a common standard for implementing it. HTML (Hypertext Markup Language) was originally defined as an SGML application (meaning that it follows SGML standards) and is the markup language behind the appearance of web pages on the Internet.

Documents in markup generally consist of:

A Declaration - identifies the markup language being used. For instance, there are SGML, HTML and XML declarations, among others. A processor needs the declaration to determine how the document should be handled for that kind of markup.

DTD (Document Type Definition) - a specification that accompanies a document and defines the type of document to be rendered (i.e. a report, letter, memo etc.). It generally sets out the restrictions on the structure and content of that document type, which lets an application check that the document is properly structured. Without a DTD, an application might be unable to process a document as intended.

Content and Markup - the main bulk of data that composes the document.
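To make the three parts concrete, here is a minimal sketch in Python. The "memo" document type, its element names and its text are all invented for this illustration. The standard library parser used here only checks that the document is well-formed; it reads the DTD but does not validate the content against it.

```python
import xml.etree.ElementTree as ET

# A hypothetical "memo" document, invented for this illustration.
MEMO_XML = """\
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to, from, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo>
  <to>All staff</to>
  <from>Editor</from>
  <body>The office is now paperless.</body>
</memo>
"""
# Line 1 of the string is the declaration, the DOCTYPE block is the DTD,
# and the <memo> element holds the content and markup.

# ElementTree checks well-formedness only; it does not validate
# the content against the DTD rules above.
memo = ET.fromstring(MEMO_XML.encode("utf-8"))

child_tags = [child.tag for child in memo]
print(memo.tag)     # memo
print(child_tags)   # ['to', 'from', 'body']
```

A validating parser would go one step further and reject a memo whose elements are missing or out of order.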

The content and markup portion, on its own, looks something like this (in HTML markup):

<p><strong>HTML</strong> markup <em>looks</em> great in a browser.</p>

Everything is tagged and defined. When this is presented in a browser, you see this:

HTML markup looks great in a browser.
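A browser does far more than this, but the basic idea of separating tags from text can be sketched in a few lines of Python. The class name TextExtractor is made up for this example; HTMLParser is part of Python's standard library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects tag names and text separately (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.tags_seen = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Called once for every opening tag, e.g. <p> or <strong>.
        self.tags_seen.append(tag)

    def handle_data(self, data):
        # Called for the text found between the tags.
        self.text_parts.append(data)


extractor = TextExtractor()
extractor.feed(
    "<p><strong>HTML</strong> markup <em>looks</em> great in a browser.</p>"
)
extractor.close()

visible_text = "".join(extractor.text_parts)
print(extractor.tags_seen)  # ['p', 'strong', 'em']
print(visible_text)         # HTML markup looks great in a browser.
```

The tags tell the reader how to treat each piece of text; the text itself is what ends up on screen.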

XML documents look similar to the HTML example above, but differ in the way the textual information is defined and categorized. Now it’s time to take a look at XML itself.

So What Is XML? A Brief Overview of its Function

Approved by the World Wide Web Consortium (W3C) in 1998, XML (Extensible Markup Language) was designed to describe different kinds of data while also aiding in the sharing of that data across different systems and, in particular, those connected to the Internet.

XML is a data description language based on SGML principles. However, XML only structures, stores and transports information; it does nothing else on its own. Its primary role is to represent data separately from visual presentation, a different purpose from HTML altogether.

Readily usable on the Internet and for general purposes, it was intended to be legible, concise, and easy to author so that anyone interested in publishing can construct an XML document.

Platform Independence

Markup languages are an easy way of relaying information because they map out and organize the raw data in your browser. Yet we also describe XML as a platform-independent format for data exchange. How so?

XML is more flexible than HTML in the way it structures data. It is a self-descriptive language: its markup tags are user-defined, so the file carries its meaning in its tags.
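As a rough illustration of what "user-defined" means, the sketch below invents a tiny book catalog. None of these tag names, attribute names or values come from a standard; they are assumptions made up for the example, and any program that knows the names can pull the data back out.

```python
import xml.etree.ElementTree as ET

# Hypothetical tags chosen by the author of the data, not by any standard.
CATALOG_XML = """\
<catalog>
  <book isbn="000-0">
    <title>Learning Markup</title>
    <author>A. Writer</author>
    <price currency="USD">19.95</price>
  </book>
</catalog>
"""

catalog = ET.fromstring(CATALOG_XML)
book = catalog.find("book")

# Each value is looked up by the name of the tag that describes it.
title = book.findtext("title")
price = float(book.findtext("price"))
currency = book.find("price").get("currency")

print(title, price, currency)
```

Because the tags name what the data is rather than how it should look, the same file stays meaningful to any application that reads it.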

In addition, XML documents are plain text files. Unlike binary files (which contain more than just textual data), they don’t need proprietary software applications to read them. This allows XML to be used across different platforms, applications and systems.

Thus, XML documents can serve as independent data containers from which documents can be rendered and converted into a variety of other formats.
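Here is one way that idea might look in practice: a single XML source turned into two other formats using only Python's standard library. The contact list, its tag names and the example.com addresses are placeholders invented for this sketch.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical contact data, used only to illustrate the conversion.
CONTACTS_XML = """\
<contacts>
  <contact><name>Ada</name><email>ada@example.com</email></contact>
  <contact><name>Grace</name><email>grace@example.com</email></contact>
</contacts>
"""

root = ET.fromstring(CONTACTS_XML)

# One dictionary per <contact>, keyed by the child tag names.
records = [
    {field.tag: field.text for field in contact}
    for contact in root.findall("contact")
]

# Format 1: JSON
as_json = json.dumps(records, indent=2)

# Format 2: CSV, written to an in-memory buffer instead of a file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "email"], lineterminator="\n")
writer.writeheader()
writer.writerows(records)
as_csv = buffer.getvalue()

print(as_json)
print(as_csv)
```

The XML file itself never changes; each output format is simply a different view generated from the same data.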

Stay tuned...

XML documents are becoming the data exchange format of choice as the development of XML-based web services grows. That being said, what are the mechanics behind XML that make it so? And what about these user-defined XML tags? What exactly do they look like, and how do they work to ensure flexibility and compatibility?

Stay tuned to Investintech.com because in our next article we’ll cover more on the nature and functional uses of XML tags.