Wednesday, December 28, 2005

Clash of the Titans -- over CONTENT in 2006

Those of you who read my Information Insider column in EContent Magazine know my views about the importance of content and how well XML supports the creation, reuse, and repurposing of content. Hand-in-hand with content and XML is the value I place on open-source software: software that, like the XML family of standards themselves, has been peer reviewed.

If I am not mistaken, 2006 will be the year of the "Clash of the Titans," involving not only Microsoft and Google but also somewhat lesser yet still important players, including Sun Microsystems, Corel, Adobe, and a host of others that have decided to strike at Microsoft's jugular: Microsoft Office, the product from which Microsoft derives about a quarter of its revenues. In fact, EContent Magazine will be publishing my column with this same title in early 2006.

In mid-year, I will begin writing a software review of Microsoft Office 12 (assuming a reasonably stable product is available by then). I will also be including a sidebar comparing Office 12 with two products built on the open-source OpenOffice.org codebase: OpenOffice and StarOffice. Since these latter two products are already available, I will begin writing my comments about them in the first quarter of 2006, in preparation for comparing them with Office 12.

Others have, in a sense, beaten me to the punch: the Commonwealth of Massachusetts, the European Union, and many foreign countries. By that I mean that they have already announced deep skepticism about the "openness" of Office 12, pledged their support for OpenOffice or StarOffice, or both.

Let the games begin!

Monday, August 22, 2005

OK... so what's a document?

What is a Document?
If we’re going to discuss document (or content) management systems of various sorts, we have to frame up the discussion with what may seem like a trivial question: What is a document? Most of us have moved beyond whether or not a document is only paper, lambskin, or papyrus, but when you expand the definition to include electronic content, the answer to what constitutes a document gets fuzzy. So here’s how I define documents and document content.

First, a document is any piece of meaningful content, whether public or private, often with some of the look-and-feel of physical renditions like paper books. Books are merely physical instances of electronic documents. “Meaningful content” extends beyond mere words, and includes video and audio. If you had to take a chance that every iPod download would be random static, you probably wouldn’t bother buying the device or repeating the download. And these days, multimedia content usually comes with its own metadata: information about the content, such as its author, when it was created, and so on.

But wait, you ask: Catalogs are books, and catalog content these days is usually found in a database. So are databases documents too? No, although I assert that the reverse is true in a way: “documents are data too,” a nice little bumper sticker thought.

I see document content as existing in a spectrum, ranging from databases (the most highly structured form of content) to highly structured content (increasingly expressed in XML) to forms to documents that are more-or-less subtly structured. Note I don’t use the term “unstructured documents” as many others do. To me, “unstructured documents” is an oxymoron: Either they have structure of some sort (and therefore have use and meaning) or they do not. “Unstructured” means “random,” and randomness is not useful to purveyors of content. Even a well-laid-out advertisement has structure. Although the advertisement’s electronic format may not make it easy to discern its structure and pitch, human eyeballs readily do (or the sponsor won’t re-run the ad).

Why the brouhaha about “subtle” versus “un”-structured? Because if you don’t understand the difference, you give up any chance of using whatever mechanisms are available to you, such as styles, to structure a document. Or you give up and say “hey, it’s unstructured, no wonder I can’t repurpose or transform the document to be or do something else.” And that would be a pity. We invest a lot in our documents and deserve to get the most out of them that we can.
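To make the point about styles concrete, here is a minimal sketch of how consistently applied styles can be turned into explicit structure. Everything in it is an assumption for illustration: the style names, the restaurant-menu sample, and the mapping to element names are hypothetical, and a real word processor would expose its styles through its own file format or API rather than a simple list.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: (style name, text) pairs, roughly what you might
# pull out of a word-processor file in which styles were applied consistently.
paragraphs = [
    ("Heading 1", "Dinner Menu"),
    ("Heading 2", "Starters"),
    ("Dish", "Tomato soup"),
    ("Price", "4.50"),
    ("Heading 2", "Mains"),
    ("Dish", "Roast chicken"),
    ("Price", "12.00"),
]

# Hypothetical mapping from style names to XML element names.
STYLE_TO_TAG = {
    "Heading 1": "title",
    "Heading 2": "section",
    "Dish": "dish",
    "Price": "price",
}


def styles_to_xml(paragraphs):
    """Build a small XML tree from style-tagged paragraphs."""
    menu = ET.Element("menu")
    current = menu  # where dishes and prices get attached
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style, "para")  # unknown styles become <para>
        if tag == "title":
            ET.SubElement(menu, "title").text = text
        elif tag == "section":
            # A second-level heading opens a new section.
            current = ET.SubElement(menu, "section", name=text)
        else:
            ET.SubElement(current, tag).text = text
    return menu


menu = styles_to_xml(paragraphs)
print(ET.tostring(menu, encoding="unicode"))
```

Once the structure is explicit, repurposing gets easier: the same tree could feed a printed menu, a web page, or a price list. Take away the styles and the script has nothing to hang the structure on, which is exactly the chance you give up by calling a document “unstructured.”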

So to summarize, documents are book-like containers of information, regardless of their format. Databases aren’t books (but can be used to produce books). The organization of documents ranges from subtly (or loosely) structured (like restaurant menus or a child’s book) to highly structured (like a form). Moreover, the binary format of documents, most commonly that of a word processor or page layout system, does not easily yield clues to its structure, but structure it has or it wouldn’t be useful. The structure that word processors and page layout programs impose is pretty darned loose, however, and that is one reason why the publishing world (that would be all of us) is ever-so-slowly moving to more explicit structure: XML.

Interestingly, Microsoft may actually be putting its cash hoard where its mouth (support for XML) is. Press teasers coming out of Redmond suggest that Office 2006 will not only replace its binary format “RTF” with “XML,” but may in fact begin supporting the content (not just the look-and-feel) with arbitrary XML.

So curmudgeon point number 1: No document is unstructured. All documents (and document formats) have some structure, ranging from subtle or loose to very rigid. But please stop using the “unstructured” adjective, OK?

Friday, August 19, 2005

Why a Content Curmudgeon? What's Content Anyway?

In this blog I'll be providing critical commentary about issues of interest to anyone who produces content of any sort, but primarily the written kind. And if you create content of any sort, sooner or later you become interested in managing it... filing it, editing it, sharing it, maybe deleting or archiving it, and certainly being able to find it.
All these "lifecycle" issues surrounding content apply especially to words... I feel so strongly about words that my vanity plate simply has one word on it: WORD. Moreover, my website is

And why the curmudgeon part? Because there is a lot of BS surrounding the way we create, value, and manage our content, especially our words. I deal with document management and search vendors frequently, and we all use word processors, alternately cursing and sighing at them, especially MS "Word".

This blog is a critical and frank look at words and the tools to manage and create them. Although I also write a bi-monthly column, "Info Insider," for EContent Magazine, this blog is where I'll put the franker stuff.

So if you're interested in Content (or even Knowledge) Management, from A-Z, from creation through deletion, and all the tools in between, stay tuned.