Sunday, September 07, 2008

XML 10th Anniversary

In an upcoming Information Insider column, I invite XML to an intimate party where we can celebrate its 10th anniversary. I also invited Alexander Falk, CEO of Altova and an XML aficionado if ever there were one (here's his blog). Here are some of the questions I asked Alexander as background for the column. I hope you'll find this interview interesting. After all, celebrating a "double digits" anniversary doesn't happen often. Alexander's responses to my questions are shown in blue text.

Question: The XML Recommendation is now 10 years old. XML led to hundreds of additional specifications, yet its adoption rate in publishing and word processing software (and XHTML in web pages) seems slow. What is your assessment of XML adoption, and what do you see for the next 10 years?

Ten years is a mighty long time to make forecasts for – my crystal ball is only rated for 2-3 years max…
What we’ve seen with XML over the last 10 years is a huge adoption in all areas that are data-centric, rather than content-centric. XML has become the lingua franca of data exchange and interchange and has made a whole class of enterprise applications possible, because you can now move data fairly freely between disparate systems.

The benefits of XML in a pure content-creation scenario – be it publishing, word processing, Web design – are only realizable if you have a large amount of content and use it with some content management system. That is not something that most small- or medium-size businesses would do, and that has, I believe, led to a somewhat slower rate of adoption in those areas.

Question: OOXML is essentially “rich text format” expressed as XML rather than leveraging existing XML standards such as MathML. MS Office is expensive; OpenOffice (based on ODF, which leverages other XML standards) is free. Yet MS Office maintains its share of the office market. What gives?

This is an interesting conundrum. From a purely academic perspective I would agree with your statement that leveraging existing XML standards is desirable. But the reality is that 95% of the world’s office documents are MS Office documents today, and people want to continue working with those documents and to reuse the content that exists in them in other applications. By opening the file format up and making it XML-based rather than binary, such reuse is now possible. I can tell you from our experience that we have received countless requests from our customers who want to be able to work with OOXML documents, and not a single request for ODF. Also, when I look at e-mail that I receive from others, I have yet to encounter a single e-mail that came with an ODF attachment. I don’t necessarily like Microsoft’s near-monopoly on the office market, but to deny its existence and standardize on a file format like ODF that nobody actually uses in the real world doesn’t make much sense either.

Here we disagree a bit; my question to Alexander followed by his response.

Question: OOXML (which today looks like it will become an ISO Standard) is still essentially just an XML expression of Microsoft’s internal word processing format, “Rich text format.” What value does such a use of XML provide to potential applications?

Actually, I need to disagree on that one. OOXML is not just RTF in disguise. OOXML includes separate and distinct markup languages for expressing word processing documents, spreadsheets, and presentations. The wordprocessingML is somewhat related to RTF because it is based on a similar concept (runs of characters with styles applied to them), but that is where the similarity ends. We found that it is very easy to use XSLT (or XQuery) to extract content from either wordprocessingML or spreadsheetML documents in OOXML that were created in Office 2007 (or other OOXML compatible apps), and likewise it is very easy for us to generate OOXML content in both of those formats from our applications. For example, our data mapping tool MapForce makes it very easy for people to map data from a variety of data sources (including EDI, databases, Web services, XML, etc.) into spreadsheetML documents that they can then open with Excel 2007. Likewise, our stylesheet design tool StyleVision makes it very easy for people to produce stylesheets that render reports from XML or database data not just in HTML or PDF, but now also in wordprocessingML for use in Word 2007.

Still, what is new in OOXML that didn't exist in earlier editions as Rich Text Format? And if 2007 simply uses XML as a replacement for RTF, I don't see the added value. Sure, you can search for table captions (if you want), but the richness of ODF is not there and won't be (can't be, due to compatibility with earlier versions).
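
For readers who want to see what "extracting content from spreadsheetML" can look like in practice, here is a minimal sketch. It uses Python's standard XML parser rather than the XSLT or XQuery that Alexander mentions, and it is not how Altova's tools work internally. The worksheet fragment is hand-written for illustration, and the function name extract_cells is my own invention.

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts in an OOXML package.
NS = {"s": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical, hand-written worksheet fragment (an assumption for this sketch,
# not copied from a real Excel file).
SHEET_XML = """\
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" t="inlineStr"><is><t>Region</t></is></c>
      <c r="B1" t="inlineStr"><is><t>Sales</t></is></c>
    </row>
    <row r="2">
      <c r="A2" t="inlineStr"><is><t>North</t></is></c>
      <c r="B2"><v>1250</v></c>
    </row>
  </sheetData>
</worksheet>
"""


def extract_cells(sheet_xml):
    """Return a dict mapping cell references (e.g. 'A1') to their text content."""
    root = ET.fromstring(sheet_xml)
    cells = {}
    for cell in root.iterfind(".//s:c", NS):
        ref = cell.get("r")
        if cell.get("t") == "inlineStr":
            # Inline strings keep their text in <is><t>...</t></is>.
            text_node = cell.find("s:is/s:t", NS)
            cells[ref] = text_node.text if text_node is not None else ""
        else:
            # Other cells keep their stored value in <v>...</v>.
            value_node = cell.find("s:v", NS)
            cells[ref] = value_node.text if value_node is not None else None
    return cells


if __name__ == "__main__":
    for ref, value in extract_cells(SHEET_XML).items():
        print(ref, value)
```

One caveat: files saved by Excel itself usually keep strings in a separate shared-strings part and store only an index in the cell, so a real extractor would need to resolve those indexes as well. The sketch handles only inline strings and plain values.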

Question: HTML 5 seems like a step backward from XML and XHTML. Is this a sign of eroding support for XML? One reason for HTML 5 (to quote the W3C) is “new elements are introduced based on research into prevailing authoring practices.” Wasn’t XHTML sufficient, or maybe too difficult for “prevailing authoring practices”?

I’m afraid that the reality is that a lot of HTML is still created by hand: people creating some HTML in Web-tools like Dreamweaver or other HTML editors and then going into the HTML and messing around in it in text editing mode. Since those tools have been very slow to enforce XHTML compliance, people continue to generate sloppy HTML pages, and so there is unfortunately a real need out there to at least standardize on what authoring practices exist in the real world.

The much better approach is, of course, to generate XHTML by means of an XSLT stylesheet from XML source pages, which is what we do, e.g., for the Web site.

Question: XQuery is a standard co-developed by the developers of SQL. What’s your prediction for widespread adoption and use of XQuery?

I initially thought that XQuery had a lot of promise, too, which is why Altova was very quick to provide an implementation of XQuery in our products, including an XQuery editor, debugger, and the ability in our mapping tool to produce XQuery code. However, we’ve found that the adoption of XQuery in the real world is happening much more slowly than we and many others had anticipated. I think that one of the issues is that there isn’t yet a clear and consistent XQuery implementation level and API across all database systems that people can rely on. The beautiful thing about SQL is that – for the most part – you can throw the same SQL query against an Oracle, IBM DB2, SQL Server, or even MySQL database, and you will get back the same result. The same is not true for XQuery yet, and until we reach that level of widespread adoption in the database servers, it has no chance to be as widely adopted by database users and application developers.

The reality is that we see a lot more interest in XSLT 2.0 from our customers than XQuery.

Sad but true, Alexander. I had high hopes for XQuery but I don't hear much about it these days.

Question: Will XBRL be one of the “next big things” leading to a major use of XML by investors via a new set of prosumer applications? Enterprise processes and financial systems? What role will XQuery provide in these contexts?

I do indeed see XBRL as being the next big thing. The fact that both the Europeans and the SEC are mandating XBRL for financial reports from publicly listed companies will be a huge driver of XBRL adoption on a global scale. I am convinced that XBRL will be essential in financial systems and will find its way into enterprise applications fairly swiftly. When it comes to the use of XBRL by investors as prosumer applications, I’m a little bit more skeptical. It is certainly clear that investment professionals will use XBRL to better compare data between different companies in a certain market and to derive some key financial figures much easier than before, because the financial reports don’t have to be re-keyed into their systems. But I don’t think that this effect will transcend the investment professionals and become easily available for consumers anytime soon. As to what role XQuery will play: it might play some role, but I’m thinking of XBRL more as a standardized data transport mechanism and am expecting investment firms to map the XBRL into their internal decision-making and analysis applications and do the querying there.

On this we agree. This might be XML's first great opportunity to transform significant amounts of content -- and the processes to generate that content -- outside the tech doc arena.

Question: I know some subscribers to online financial services are wondering if they will be able to supplement (or even skip) certain of these services by analyzing sets of XBRL files themselves. What are the practical limitations to such analysis? Is there an inherent limitation to max numbers of XBRL files that can be XQueried at once?

There aren’t really any limitations that I’m aware of. The problem is more one of: how will you use the data? An investor who is very accounting-savvy can probably easily use XBRL to extract some key financial indicators for a company and compare several possible investment candidates in an industry group. But most investors I know rather want the key financial indicators automatically calculated by somebody else rather than directly work with the raw XBRL data. So I am skeptical that individual investors will be able to skip their subscriptions. Augmenting them is, however, a possibility and I indeed see the ability for some people to get a more in-depth look at some numbers than what they can currently get from Bloomberg or similar services.
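
To make the idea of an accounting-savvy investor comparing candidates a bit more concrete, here is a rough sketch in Python. Everything specific in it is an assumption for illustration: the taxonomy namespace, the element names Revenues and NetIncome, the context ID FY2007, the company names, and the figures are all made up. The toy instance documents also leave out the context and unit definitions that a real XBRL filing carries.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"         # XBRL 2.1 instance namespace
TAX = "http://example.com/hypothetical-taxonomy"    # made-up taxonomy namespace


def make_instance(revenue, net_income):
    """Build a toy XBRL-like instance (contexts and units omitted for brevity)."""
    return (
        f'<xbrli:xbrl xmlns:xbrli="{XBRLI}" xmlns:ex="{TAX}">'
        f'<ex:Revenues contextRef="FY2007" unitRef="USD" decimals="0">{revenue}</ex:Revenues>'
        f'<ex:NetIncome contextRef="FY2007" unitRef="USD" decimals="0">{net_income}</ex:NetIncome>'
        "</xbrli:xbrl>"
    )


def read_fact(root, name, context):
    """Return the numeric value of the fact `name` reported for `context`."""
    for element in root.iter(f"{{{TAX}}}{name}"):
        if element.get("contextRef") == context:
            return float(element.text)
    raise KeyError(f"no fact {name!r} for context {context!r}")


def net_margin(instance_xml, context="FY2007"):
    """Net income divided by revenue for one filing."""
    root = ET.fromstring(instance_xml)
    return read_fact(root, "NetIncome", context) / read_fact(root, "Revenues", context)


def rank_by_margin(filings):
    """Sort companies from highest to lowest net margin."""
    margins = {company: net_margin(xml) for company, xml in filings.items()}
    return sorted(margins.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented companies and figures, purely for illustration.
    filings = {
        "Acme": make_instance(1000, 100),
        "Globex": make_instance(2000, 500),
    }
    for company, margin in rank_by_margin(filings):
        print(f"{company}: {margin:.1%}")
```

The parsing is the easy part, which fits Alexander's point: the hard part is knowing which facts to pull and how to interpret them, and that is what most investors would rather pay someone else to do.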

Saturday, February 16, 2008

Update on Office 2007 Compatibility etc.

Julie ("funnybroad") has updated her slide show about her Office 2007 compatibility findings. Here is an excerpt from what she said:

I've replaced my original Office 2007 Compatibility Mode Confusion paper on with an updated version. I had to delete and re-create the existing one, so the link to it from your blog is now broken (click here for Julie's updated info)....everything has been re-tested with Service Pack 1, and sadly, compatibility still sucks. So go to the new link, not the older one.


While I'm on the subject of Office 2007, when I tested and reviewed the product I was happy to see a weird longstanding behavior removed: You print a document, then exit and are asked if you want to save changes. Most people simply say "yes," fearing they forgot whatever change they'd made and don't want to lose it. Others say "no," thinking they made a change inadvertently and don't want it to stick. Well, I was happy to see that dumb "feature" removed, but recently (several automated patch upgrades later, I guess) I see the "feature" is back. So we've got compatibility with pre-2007 suites, but this is one compatibility feature they could have dropped and it would have made the product better.

Monday, January 21, 2008

How Green Are Your Documents

Over the past 6 months, I've seen some of my hunches about growing awareness of environmental issues and concern about fossil fuel supplies (and prices) confirmed. Although oil never did close at $100/barrel, prices are sky high by anyone's estimate. In the autumn of 2007 I tried a different theme in my Information Insider column – one that I believe has never been done. I laid the groundwork for this series with the EContent 100 annual issue, in a column titled “Content 2.0 Converges.” I titled the follow-on column in this series “How Green Are Your Documents?” (the editor since changed that to “It Ain't Easy Being Green” -- a fine alternative). I sent out queries to a variety of vendors for any thoughts they had about their products and the green theme, and waited. And waited. And began to think that this was the craziest idea I'd ever had and wondered how I'd meet the deadline with a different (unplanned) column in case this didn't pan out. Then the vendors began to respond, all except Google, but I blame that on the difficulty of finding the right contact there rather than Google's lack of interest – since Google is showing itself to be very green indeed.

Who did respond? Adobe, MarkLogic, and Olive Software – the latter a vendor I'd never heard of but found (yes) with a Google search. And it was an avalanche of interest.

Let's start with Adobe. One obvious Adobe product is Acrobat, which has become a default electronic document standard, bulked-up with collaborative features in version 8 with Acrobat Connect, formerly Macromedia’s Breeze web conferencing but now integrated with Acrobat. I get the idea that web conferencing can cut down travel and thus save travel and carbon costs, but I was looking for more, and Adobe provided it. First, they've done as Google and now Microsoft have also done: begun adding online documents to their product set. In this case, Adobe acquired Buzzword, a web-based text editor. Interesting, but not the green lead I was looking for. Then it got interesting.

Adobe's new AIR (Adobe Integrated Runtime) lets web applications run offline – key, IMHO, to assuring the acceptance of online, collaborative documents and reducing the use of paper (with all the energy savings that implies). AIR is a cross-OS SDK, a mashup of Flash, HTML, Ajax, etc. AIR can target applications to the desktop and get the rich abilities expected in local clients plus the web. The key here is that you get persistent presence on the desktop, offline/online with re-synching of web content when you go back online. Traditional media has been moving to the web for some time; now the web is also moving to the desktop, with traditional functions on a browser or paper migrating to the desktop. Financial documents, for example, could give you reports and also handle tasks that would otherwise require paper, such as loan applications that involve swapping Excel spreadsheets.

Developers (not end users) are beginning to develop AIR-based online-offline catalogs -- that bane of the mail box. You download the application that includes the catalog, navigate through it, sort and search, flag items within the application, and get notifications when they become available (reminders when back in stock). As you walk through the catalog, you could add electronic notes. You could share them with friends, or send them an email with the relevant information, and collaborate across different desktops. Adobe says that Linux support for AIR is coming.

A 10 MB PDF catalog could be the whole size of the AIR application, and with progressive images or assets on demand, could make the “catalog application” smaller than the PDF.

Hot AIR? Yes, but in a good sense – reducing global warming in its own way.