Sunday, September 07, 2008

XML 10th Anniversary

In an upcoming Information Insider column, I invite XML to an intimate party where we can celebrate its 10th anniversary. I also invited Alexander Falk, CEO of Altova, and an XML aficionado if ever there were one (here's his blog: http://www.xmlaficionado.com/). Here are some of the questions I asked Alexander as background for the column. I hope you'll find this interview interesting. After all, celebrating a "double digits" anniversary doesn't happen often. Alexander's responses to my questions are shown in blue text.

Question: The XML Recommendation is now 10 years old. XML led to hundreds of additional specifications, yet its adoption rate in publishing and word processing software (and XHTML in web pages) seems slow. What is your assessment of XML adoption, and what do you see for the next 10 years?

Ten years is a mighty long time to make forecasts for – my crystal ball is only rated for 2-3 years max…
What we’ve seen with XML over the last 10 years is a huge adoption in all areas that are data-centric, rather than content-centric. XML has become the lingua franca of data exchange and interchange and has made a whole class of enterprise applications possible, because you can now move data fairly freely between disparate systems.

The benefits of XML in a pure content-creation scenario – be it publishing, word processing, or Web design – are only realizable if you have a large amount of content and use it with some content management system. That is not something that most small- or medium-size businesses would do, and that has, I believe, led to a somewhat slower rate of adoption in those areas.

Question: OOXML is essentially “Rich Text Format” expressed as XML rather than leveraging existing XML standards such as MathML. MS Office is expensive; OpenOffice (based on ODF, which leverages other XML standards) is free. Yet MS Office maintains its market share. What gives?

This is an interesting conundrum. From a purely academic perspective I would agree with your statement that leveraging existing XML standards is desirable. But the reality is that 95% of the world’s office documents are MS Office documents today, and people want to continue working with those documents – and want to reuse the content that exists in those documents in other applications. By opening the file format up and making it XML-based rather than binary, such reuse is now possible. I can tell you from our experience that we have received countless requests from our customers that they want to be able to work with OOXML documents, and not a single request for ODF. Also, when I look at e-mail that I receive from others, I have yet to encounter a single e-mail that came with an ODF attachment. I don’t necessarily like Microsoft’s near-monopoly on the office market, but to deny its existence and standardize on a file format like ODF that nobody actually uses in the real world doesn’t make much sense either.

Here we disagree a bit; here's my next question to Alexander, followed by his response.

Question: OOXML (which today looks like it will become an ISO Standard) is still essentially just an XML expression of Microsoft’s internal word processing format, “Rich Text Format.” What value does such a use of XML provide to potential applications?

Actually, I need to disagree on that one. OOXML is not just RTF in disguise. OOXML includes separate and distinct markup languages for expressing word processing documents, spreadsheets, and presentations. The wordprocessingML is somewhat related to RTF because it is based on a similar concept (runs of characters with styles applied to them), but that is where the similarity ends. We found that it is very easy to use XSLT (or XQuery) to extract content from either wordprocessingML or spreadsheetML documents in OOXML that were created in Office 2007 (or other OOXML-compatible apps), and likewise it is very easy for us to generate OOXML content in both of those formats from our applications. For example, our data mapping tool MapForce makes it very easy for people to map data from a variety of data sources (including EDI, databases, Web services, XML, etc.) into spreadsheetML documents that they can then open with Excel 2007. Likewise, our stylesheet design tool StyleVision makes it very easy for people to produce stylesheets that render reports from XML or database data not just in HTML or PDF, but now also in wordprocessingML for use in Word 2007.
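To give a sense of what that extraction looks like in practice, here's a minimal XQuery sketch (mine, not Alexander's) that pulls the visible text out of a wordprocessingML document part. It assumes document.xml has already been unzipped from the .docx package, and the file name is just a placeholder.

    (: Minimal sketch: collect the text runs from a wordprocessingML part.
       Assumes document.xml has already been extracted from the .docx zip;
       the file name is a placeholder. :)
    declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    for $para in doc("document.xml")//w:body/w:p
    return string-join($para//w:t/text(), "")

Each result string is the text of one paragraph, which is about as much as you need for simple reuse of the content elsewhere.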

Still, what is new in OOXML that didn't exist in earlier editions as Rich Text Format? And if Office 2007 simply uses XML as a replacement for RTF, I don't see the added value. Sure, you can search for table captions (if you want), but the richness of ODF is not there and won't be (can't be, due to compatibility with earlier versions).

Question: HTML 5 seems like a step backward from XML and XHTML. Is this a sign of eroding support for XML? One reason for HTML 5 (to quote the W3C) is “new elements are introduced based on research into prevailing authoring practices.” Wasn’t XHTML sufficient, or maybe too difficult for “prevailing authoring practices”?

I’m afraid the reality is that a lot of HTML is still created by hand: people create some HTML in Web tools like Dreamweaver or other HTML editors and then go into the HTML and mess around in it in text-editing mode. Since those tools have been very slow to enforce XHTML compliance, people continue to generate sloppy HTML pages, so there is unfortunately a real need out there to at least standardize on the authoring practices that exist in the real world.

The much better approach is, of course, to generate XHTML by means of an XSLT stylesheet from XML source pages, which is what we do, e.g., for the http://www.altova.com/ Web site.
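Alexander mentions XSLT for this; just to keep the examples here in one language, the same idea sketched in XQuery (a rough illustration of mine, with an invented page/title/para source structure) would look something like this:

    (: Rough sketch: render a simple XML source page as XHTML.
       The page/title/para source structure is invented for illustration. :)
    let $src   := doc("page.xml")/page
    let $title := string($src/title)
    let $paras := for $p in $src/para return string($p)
    return
      <html xmlns="http://www.w3.org/1999/xhtml">
        <head><title>{ $title }</title></head>
        <body>
          <h1>{ $title }</h1>
          { for $p in $paras return <p>{ $p }</p> }
        </body>
      </html>

Either way, the point is the same: the XHTML is generated from well-formed XML source rather than hand-edited.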

Question: XQuery is a standard co-developed by the developers of SQL. What’s your prediction for widespread adoption and use of XQuery?

I initially thought that XQuery had a lot of promise, too, which is why Altova was very quick to provide an implementation of XQuery in our products, including an XQuery editor, debugger, and the ability in our mapping tool to produce XQuery code. However, we’ve found that the adoption of XQuery in the real world is happening much more slowly than we and many others had anticipated. I think that one of the issues is that there isn’t yet a clear and consistent XQuery implementation level and API across all database systems that people can rely on. The beautiful thing about SQL is that – for the most part – you can throw the same SQL query against an Oracle, IBM DB2, SQL Server, or even MySQL database, and you will get back the same result. The same is not true for XQuery yet, and until we reach that level of widespread adoption in the database servers, it has no chance of being as widely adopted by database users and application developers.

The reality is that we see a lot more interest in XSLT 2.0 from our customers than in XQuery.

Sad but true, Alexander. I had high hopes for XQuery, but I don't hear much about it these days.
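For readers who haven't tried it, here's roughly what a standalone query looks like (my sketch, against a hypothetical books.xml file). The FLWOR syntax itself is standardized; what isn't consistent yet, as Alexander notes, is how you submit a query like this to each vendor's database.

    (: A small, standard FLWOR query over a hypothetical books.xml file.
       The query language is portable; the vendor-specific part is how you
       run it against Oracle, DB2, SQL Server, etc. :)
    for $b in doc("books.xml")//book
    where xs:decimal($b/price) gt 30.00
    order by $b/title
    return
      <expensive-book title="{ $b/title }" price="{ $b/price }"/>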

Question: Will XBRL be one of the “next big things” leading to a major use of XML by investors via a new set of prosumer applications? Enterprise processes and financial systems? What role will XQuery provide in these contexts?

I do indeed see XBRL as being the next big thing. The fact that both the Europeans and the SEC are mandating XBRL for financial reports from publicly listed companies will be a huge driver of XBRL adoption on a global scale. I am convinced that XBRL will be essential in financial systems and will find its way into enterprise applications fairly swiftly. When it comes to the use of XBRL by investors through prosumer applications, I’m a little bit more skeptical. It is certainly clear that investment professionals will use XBRL to better compare data between different companies in a certain market and to derive some key financial figures much more easily than before, because the financial reports don’t have to be re-keyed into their systems. But I don’t think that this effect will transcend the investment professionals and become easily available to consumers anytime soon. As to what role XQuery will play: it might play some role, but I’m thinking of XBRL more as a standardized data transport mechanism and am expecting investment firms to map the XBRL into their internal decision-making and analysis applications and do the querying there.

On this we agree. This might be XML's first great opportunity to transform significant amounts of content -- and the processes to generate that content -- outside the tech doc arena.

Question: I know some subscribers to online financial services are wondering if they will be able to supplement (or even skip) certain of these services by analyzing sets of XBRL files themselves. What are the practical limitations to such analysis? Is there an inherent limitation to the maximum number of XBRL files that can be XQueried at once?

There aren’t really any limitations that I’m aware of. The problem is more one of: how will you use the data? An investor who is very accounting-savvy can probably easily use XBRL to extract some key financial indicators for a company and compare several possible investment candidates in an industry group. But most investors I know would rather have the key financial indicators calculated automatically by somebody else than work directly with the raw XBRL data. So I am skeptical that individual investors will be able to skip their subscriptions. Augmenting them is, however, a possibility, and I do see the ability for some people to get a more in-depth look at some numbers than what they can currently get from Bloomberg or similar services.
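To make the "augmenting" scenario concrete, here is a rough XQuery sketch of the do-it-yourself comparison an accounting-savvy investor might run over a few locally saved XBRL instance documents. The file names are placeholders, the us-gaap:Revenues concept is only an example, and the namespace URI would have to match the taxonomy version the filings actually use.

    (: Rough sketch: compare one reported concept across several locally
       saved XBRL instance files. File names are placeholders, and the
       us-gaap namespace URI must match the taxonomy version of the filings. :)
    declare namespace us-gaap = "http://xbrl.us/us-gaap/1.0";  (: assumed URI :)

    for $file in ("acme.xml", "globex.xml", "initech.xml")
    let $revenue := doc($file)//us-gaap:Revenues[1]
    order by xs:decimal($revenue) descending
    return
      <company file="{ $file }" revenue="{ string($revenue) }"/>

Nothing here is beyond a motivated individual investor; the real work, as Alexander says, is knowing which concepts and contexts to compare.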


2 comments:

Anonymous said...

The spec known as "HTML5" is actually both HTML and XHTML. The HTML version of the spec which replaces HTML4 is called HTML5, and the XHTML version of the spec which replaces XHTML1 is called XHTML5.

Content Curmudgeon and the Green Hornet. said...

Thanks, I realized that but should have made that detail more clear.

My question is: why bother upgrading HTML 4 at all, especially since XHTML was supposed to replace HTML 4?