Saturday, February 24, 2007

Exploring Outlook 2007

So far, Outlook has been the easiest of the applications to learn; it appears to be very much like earlier versions of Outlook, with some convenient security features such as adding a sender to your “safe list,” which lets you download graphics automatically in email from senders on that list. I easily exchanged meeting requests and acceptances between Outlook 2007 and Outlook 2003 systems.

There were some quirks and disappointments, however. First, my Palm Pilot no longer syncs with Outlook (Palm is said to be working on a HotSync upgrade for Office 2007). Second, my Acrobat 8 plug-ins to convert email or email folders to PDF are no longer available; Adobe is working on that.

Although the new Office 2007 file formats (.docx, .xlsx, and so on) are compressed and thus about half the size of earlier formats like .doc, Outlook's "PST" mail file remains its old bloated self. Not only that, but MS has added a feature to make "search" of email at least marginally useful (IMHO it never has been useful before), but that requires indexing the whole mailbox. OK; I'm good with that. But how big is this index file, and can you perform periodic maintenance on it to keep it trim (e.g., to remove the big gaps left in the index by deleting email)? I see no information about index maintenance anywhere. For that matter, a day later (admittedly after putting the laptop in snooze mode) Outlook still hadn't finished indexing my email. (My PST file is about 100 MB.) If/when it does finish indexing, I'll try to figure out where they hide the index to see how big it is and whether indexing makes email usefully searchable. Not that I have (or ever had) a choice: click "search" and you can do nothing until you index the email. The older search by sender/date etc. apparently isn't available anymore.

If you highlight a PDF email attachment, Outlook says it cannot preview the file because there is no previewer installed for it. There is a link to browse for previewers on the Microsoft site, but there wasn’t one for Acrobat. Earlier I’d installed (via a separate download) the PDF export tool for Office, but I have no idea what plug-in is missing here.

So far I am disappointed not to find any truly useful new features or bug fixes:

1) It would be nice if, when you delete junk mail, Outlook automatically put it onto its junk mail list.

2) I still can't assign tasks to another Outlook user. This never worked for me in Outlook 2003 either. When I tried assigning a task from an Outlook 2003 user (myself on another system) to myself on the Outlook 2007 system, the assigned task arrived only as email in the assignee’s inbox. This is the same not-very-useful way it was handled in Office 2003.

Installing Office 2007

In a nutshell: Not as smooth as I’d hoped, with lots of surprises along the way and after installation.

My environment: an HP/Compaq laptop running XP Service Pack 2, 1.4 GHz with xxx memory and sufficient disk for the installation. Since Microsoft does not make IE version yy available on Windows 2000, I decided some time ago to standardize on Mozilla Firefox on both the older machine and the laptop where I performed the Office 2007 testing and review.

I specifically requested a custom, limited installation (not an upgrade) so I could keep my Office 2003 applications to develop test documents to test with Office 07, and vice versa. I selected the default Office shared features and Office tools (Microsoft Office Graph and Microsoft Document Imaging), although I still am not sure what they are. I was careful to check “do not remove older versions”; although the older versions were not removed (see below), they were hidden. Moreover, when

The installation required 1.3 GB and took about 25 minutes on a 1.4 GHz Windows XP laptop. After installation, the system suggested I go to Office Online for updates. After about 5 minutes, just when I thought I was downloading updates, I received a message: I must use Internet Explorer version 5.0 or later (I switched to Mozilla Firefox several months ago). I downloaded “OGAPlugininstall” and installed it from my local machine. NOTE: This automatic download to update Office does not include the ability to save files in Acrobat PDF format. For that you must search for, download, and install the “SaveasPDFandXPS” file. I saw no sign of a compatibility plug-in to open and save documents in the OpenOffice document formats.

Before you can install these, however, you must “activate” the Office applications. This took three tries, but when I succeeded I received a “Welcome” message that recommended I also download a file that allows periodic updates to track and solve crashes and other system failures. I did that.

Microsoft lists the ability to "save to PDF or XPS" as an Office feature... but that facility doesn't come with the Office software package. You have to Google to find it on the MS site, download it, and install it separately...

From start to finish, it took me about ¾ hour to install Office, and I got more than I wanted (more about that later).

Office Professional 2007 Installation Notes:

  • Initial free disk space: 20,651,000,000 bytes; after installation, 19,346,000,000 bytes (1.3 GB taken).
  • I selected the option to install only Word, Excel, and PowerPoint. What I ended up with was those plus Outlook, MS Access, and MS Publisher.

Installation Lessons Learned:

  • Be sure you have at least 1.5 GB of free disk space before you begin. Upgrading your old copy of Office (replacing the old applications) will reduce the net storage you use.
  • Be very careful to select only the Office components you want, because of compatibility with other systems or devices (e.g., Palm Pilots). I was careful, but I didn't get what I expected, and I don't dare "go home again" (uninstall Outlook 2007 and re-install Outlook 2003), since I expect the format of the .PST file is different (hence the problem with my Palm Pilot syncing).
  • Plan about an hour to install Office, and then plan several hours to learn how to use each application.
  • I seemed to “lose” Word 2003; couldn’t even find it after a search through the whole hard disc for “winword.exe.” Rather than risk uninstalling the unwanted applications and attempting to install their Office 2003 counterparts, I left things as they were. However, when I click on a shortcut for Word 2003, I periodically receive a message about “installing…” and Word 2003 starts up. This isn’t normal behavior. (I didn’t receive the same “installing” message with Excel, at least not yet.)
  • Tried again later to open a “.doc” file with a double-click, assuming I’d managed to retrieve WinWord for Office 2003. No luck: I received a very long “configuring” message and Word 2007 took control again. Then later, when I tried opening the “.doc” file, Word 2003 took over. Later still, a double-click brought up Word 2003. This was very confusing, to say the least.
  • Likewise with Excel 2003 (“Office 11”) and old files with the .xls extension. Clicking on them brought up Excel 2007, even after I tried again to specifically associate the “.xls” extension with Excel from Office 2003. Excel 2007 would open the spreadsheet in “compatibility” mode. If I opened Excel via its shortcut, though, I could then open .xls files with it. Lesson: apparently the new Office preempts the old file extensions and opens those files in "compatibility" mode. That seems reasonable, but I still don't like the occasional "installing..." message or the time it takes to finish.

Microsoft Office 2007 Review Begins!

Well, I received my review copy of Microsoft Office 2007 Professional and have begun the long process of installing, learning, and testing it. I am going to post all my unvarnished findings here; if you read this and think I've gotten something wrong, feel free to tell me. I am also writing a companion Info Insider column exploring the strategic issues involved in selecting a new office system, whether client side (new, an upgrade to Office 2007, or OpenOffice/StarOffice) or web based (like Google Apps).

So here you'll see my findings as they occur, good, bad, and always honest.

Friday, January 26, 2007

Web 2.0 - Content 2.0

2.0
OK, it's been a while since I've posted any curmudgeonly thoughts. Been busy writing about Acrobat 8.0 and its consistency with Content 2.0.
What's "Content 2.0" you say? Well, you have to be living under a rock not to have heard about Web 2.0, and since the yin/yang of the web is application/data, I thought it was important to point out that the yang-side of things - content - is in the midst of its own birth pangs.

What are some distinguishing characteristics of content that are undergoing transformational change, the "2.0" thingy? Like Web 2.0, I see a parallel set of attributes in the content side of things:

1) Truly structured (XML-based) content, the question being how comparatively descriptive the structures are. See O’Reilly: “Data is the next ‘Intel Inside.’”

2) Web standards applied to content

3) Social, cooperative, collaborative media

4) Delivered anywhere, anytime to any device (re-purposing)

5) Combined in new and unexpected ways (re-use)

6) Unexpected “mashups” providing new content possibilities (e.g., SVG and FlashPaper as alternatives to PDF)

7) Highly visual

Before OpenOffice (and its cousins like StarOffice) and Office 2007, content was pretty much whatever you wrote and laid out on a page... just like static web pages. The words were important, but the tools were presentational: they helped you add visual appeal to the words, but the words were pretty much a continuous stream of text separated by spaces and punctuation. Now things are changing. Although Office 2007 appears (I'll know more after I get my hands on a package) merely to translate the old "visual layout of text" (Rich Text Format) into XML, it is still a giant step in the right direction. It helps to be able to search for a figure caption or table caption, for example, while excluding paragraph text.
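To make the caption point concrete, here is a minimal Python sketch (standard library only) that pulls caption paragraphs out of an OpenDocument content.xml while skipping body text. The namespace URI is the one OpenDocument defines for text elements; the set of caption style names is an assumption based on OpenOffice's usual defaults, so adjust it for your own templates. The sample document and the name find_captions are made up for illustration.

```python
import xml.etree.ElementTree as ET

# ODF text namespace, as defined by the OpenDocument format.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
P_TAG = f"{{{TEXT_NS}}}p"
STYLE_ATTR = f"{{{TEXT_NS}}}style-name"

# Assumption: caption paragraphs carry one of these style names.
CAPTION_STYLES = {"Caption", "Table", "Illustration", "Drawing"}


def find_captions(content_xml: str, caption_styles=CAPTION_STYLES) -> list[str]:
    """Return the text of each text:p whose style is a caption style; body paragraphs are skipped."""
    root = ET.fromstring(content_xml)
    captions = []
    for para in root.iter(P_TAG):
        if para.get(STYLE_ATTR) in caption_styles:
            # itertext() also collects text nested in spans, sequence fields, etc.
            captions.append("".join(para.itertext()).strip())
    return captions


# Hypothetical sample; "Text_20_body" is how ODF encodes the "Text body" style name.
SAMPLE = f"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="{TEXT_NS}">
  <office:body><office:text>
    <text:p text:style-name="Text_20_body">Quarterly results were strong.</text:p>
    <text:p text:style-name="Table">Table 1: Revenue by region</text:p>
    <text:p text:style-name="Text_20_body">Revenue by region appears below.</text:p>
  </office:text></office:body>
</office:document-content>"""

if __name__ == "__main__":
    print(find_captions(SAMPLE))  # ['Table 1: Revenue by region']
```

This only checks each paragraph's direct style name; a real document may wrap caption styles in automatic styles (P1, P2, ...) whose parent is the caption style, which a fuller version would resolve first.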

Now there's too much to think about in Content 2.0 to lay it all down here, but suffice it to say that Adobe has taken a similar step forward with Acrobat 8.0 by adding Flash-based collaboration to what were formerly Acrobat's electronic-reader capabilities. And just before I submitted the column praising Adobe for that feat, I learned that, like OpenOffice and Office 2007, Adobe had its own initiative to transform Acrobat's internal structure to XML: the "Mars" project. So that makes another heavy hitter signing up for Content 2.0.

One nit-pick about Acrobat 8.0, though: it's open but still isn't completely converted to the "open source" religion. I was very disappointed to see that there is no Adobe writer plug-in for Mozilla, no "web capture," if you will, that preserves the links. Still, Adobe has taken some distinct steps forward in Acrobat 8.0 and deserves credit for that.

Monday, July 24, 2006

Secret to Content Longevity?

The Wrapper, not the Gum
One goal of FDsys is to preserve content for future access and repurposing. So naturally the questions are: 1) Are you using XML? and 2) If so, what DTD or schema? I was expecting to hear "WordML" (Microsoft Word's XML standard, which is really more an XML expression of its RTF, or Rich Text Format). I was also hoping to hear OpenDocument, the rich XML office standard on which OpenOffice is constructed. The answers surprised me, but in retrospect should not have. FDsys's plan is to take the content in whatever format it arrives (preferably in a reasonably small number of common formats) and to concentrate on the metadata wrapper itself, for accessibility. Here's what Mike Wash said.

"We have developed requirements for the information packages that will exist in FDsys. FDsys architecture is based on the Open Archival Information System (OAIS) model which develops the concept of submission, archival and dissemination packages. The excerpts from the Requirements Document will help you understand our approach to structuring submission packages and dissemination packages."

And now the details, obviously too much for my 800-word Information Insider column. By "RD," Wash means the FDsys "Requirements Document."

"Page 31 in the RD 2.0 Document
3.2.3.1 Submission Information Packages (SIP)
This section specifies the packaging details for the Submission Information Package (SIP), and describes how digital content and its associated metadata are logically packaged for submission to FDsys. A SIP contains the target digital object(s) and associated descriptive and administrative metadata. It will be the vehicle whereby content packages are submitted to FDsys by Content Originators. The concept of the SIP in the OAIS (Open Archival Information System) model provides a starting point for the specification of content and associated metadata, but it does not specify how it is packaged. It is necessary that a SIP follow prespecified rules so that FDsys can validate and accept the content for ingest.

Associated with the SIP are three types of information:
* Content Information (digital object(s) and Representation Information),
* Packaging Information, and
* Descriptive Information.

Packaging Information is the information that binds or encapsulates the Content Information. To accomplish this, a SIP will include a binding metadata file (sip.xml) that relates the digital objects and metadata together to form a system-compliant SIP. The Metadata Encoding and Transmission Standard (METS) schema shall be adopted as the encoding standard for the sip.xml file, and GPO will specify profiles for METS to drive its implementation for FDsys.

Descriptive Information is the metadata that allows users to discover the Content Information in the system.

All file components of the SIP will be populated within a structured file system directory hierarchy and are then aggregated into a single file or entity for transmission and ingest into the system."
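To picture what a binding file like sip.xml might contain, here is a bare-bones Python sketch that writes a METS document tying content files to a descriptive record, then zips everything into a single file for transmission. It is not GPO's FDsys METS profile (the excerpt says GPO will specify that); the particular METS elements chosen, the ZIP container, and the names build_sip_xml and package_sip are hypothetical illustrations of the Content/Packaging/Descriptive split described above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_sip_xml(title: str, paths: list[str]) -> bytes:
    """Build a bare-bones METS binding file (hypothetical; not GPO's FDsys profile)."""
    mets = ET.Element(m("mets"), {"LABEL": title})

    # Descriptive Information: a placeholder dmdSec holding only a title.
    dmd = ET.SubElement(mets, m("dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, m("mdWrap"), {"MDTYPE": "OTHER"})
    ET.SubElement(ET.SubElement(wrap, m("xmlData")), "title").text = title

    # Content Information: one METS file entry per digital object in the package.
    file_grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"))
    for i, path in enumerate(paths, start=1):
        entry = ET.SubElement(file_grp, m("file"), {"ID": f"FILE{i}"})
        ET.SubElement(entry, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path})

    # Packaging Information: the structMap binds the files to the descriptive record.
    div = ET.SubElement(ET.SubElement(mets, m("structMap")), m("div"), {"DMDID": "DMD1"})
    for i in range(1, len(paths) + 1):
        ET.SubElement(div, m("fptr"), {"FILEID": f"FILE{i}"})

    return ET.tostring(mets, encoding="utf-8", xml_declaration=True)


def package_sip(title: str, contents: dict[str, bytes]) -> bytes:
    """Aggregate sip.xml and the content files into one ZIP (an assumed transfer format)."""
    names = sorted(contents)
    paths = [f"content/{name}" for name in names]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("sip.xml", build_sip_xml(title, paths))
        for name, path in zip(names, paths):
            zf.writestr(path, contents[name])
    return buf.getvalue()


if __name__ == "__main__":
    pkg = package_sip("Sample hearing transcript",
                      {"transcript.xml": b"<doc/>", "transcript.pdf": b"%PDF-1.4"})
    with zipfile.ZipFile(io.BytesIO(pkg)) as zf:
        print(zf.namelist())  # ['sip.xml', 'content/transcript.pdf', 'content/transcript.xml']
```

The point of the design is the one Wash makes: the content files are opaque payload, and everything the system needs to validate, find, and deliver them lives in the wrapper.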

Wash elaborates further:
"Page 42 in the RD 2.0 Document
3.2.3.4 Dissemination Information Package (DIP)
Dissemination Information Packages (DIPs) are transient copies of digital objects, associated content metadata, and business process information that are delivered from the system to fulfill End User requests and Content Originator orders. As necessary, DIPs should follow the concept of a DIP as outlined in the OAIS (Open Archival Information System) model.

The DIP is created as part of delivery processing and digital objects may be adjusted based on orders and requests to support the delivery of hard copy output, electronic presentation, and digital media.

The DIP should include all digital objects and/or metadata necessary to fulfill requests and orders. The DIP may also include a binding metadata file that relates the digital objects and metadata together to form a package. The Metadata Encoding and Transmission Standard (METS) schema has been adopted for the SIP and AIP and may be used as the encoding standard for the binding metadata file, if a binding metadata file is created."

Standardized, format-neutral, and concentrating on the information about the content rather than the content itself. That is the long view, because when you are dealing with a very large (and unpredictable) number of format types, you have to concentrate on the access and delivery of these things.

More Q&A to follow soon.

Thursday, July 20, 2006

Future Digital Systems - FDSys... Complete Q&A

Cutting Room Floor -- Mike Wash Q&A - GPO, FDSys
My next InfoInsider column describes an initiative at the US Government Printing Office that surprised me with its breadth, vision, and implementation pace. That initiative is called Future Digital System (or FDsys). FDsys began with strategic planning in July 2004, which produced a strategic vision for the 21st century. This vision lays out a plan to provide printing and electronic delivery services to the three branches of the federal government, to 1250 Federal Depository libraries (providing protection from disastrous losses), and to the general public. FDsys is divided into six phases, is currently mid-way through phase 4 (implementation planning), and expects a full system implementation in October 2007.

For the past several months I've posed questions and received responses from Mike Wash, GPO's Chief Technical Officer. Due to the size constraints of my column in eContent Magazine, I could only summarize my questions and Mike's answers. If you've read this far, I assume you'd like more details. Here, in this and succeeding posts, are the details of my interactions with Mike.

Question: Ever the IT guy, I asked "What are your broad systems acquisition strategies:
a. Best of Breed versus integrated systems?

b. Proprietary versus Open Source."
Since we're talking about essentially loosely structured content, by "Proprietary" you can easily infer "Microsoft." By "Open Source" you can equally infer "OpenOffice" or "StarOffice 8."
Answer: Here were Mike's answers.
"FDsys will be focused on meeting customer needs;
therefore, GPO is taking a best of breed approach to acquiring
and integrating the technology components that will comprise
FDsys." and "FDsys is a standards based system."

My comment: Since the federal government is "by the people" (all of us), I think he did a pretty good job of stating a preference for standards while not specifying exactly which standards he was referring to. OpenDocument, the file format OpenOffice uses, became an ISO standard in May.


Monday, June 26, 2006

More on the Enterprise Search Summit

ESS - more thoughts

Expect more vendor consolidation, and there are many instances of it already:

* Autonomy bought Verity (an oil-and-water team, IMHO).
* Oracle bought TripleHop (great product that combined Autonomy’s statistical search with Verity’s keyword approach).

Not only that, but due to the "camel's nose in the tent" phenomenon, if you've already picked a major vendor that you trust for a major collection of services (Microsoft for ASP/email/Visual Source Safe... or Oracle for databases....) you may be tempted to go with that vendor's search solution --for better or for worse. Doing that will lock you in for a long time. Vendors may see "search" as a way to lock you into their other more profitable solutions.

Sunday, June 25, 2006

Enterprise Search Symposium

Well, it's been about a month since I attended this conference in NYC. I wanted to let my first impressions sink in before relaying my conclusions about enterprise search and this conference. After a month, I have to say I am as ambivalent in many of my conclusions as I was in May. Here are some of those conclusions.

First, the conference was surprisingly well attended. I'd estimate there were 800-900 attendees, way above last year's total (I'm told). So maybe this finally is the year of enterprise search. On the other hand, several conference presenters reminded us that IBM developed enterprise search software in the mid-1960s (that's right, about 40 years ago), and the fundamental capabilities haven't changed a whole lot. Moreover, the market for enterprise search is less than a billion dollars, a relatively small market. So is this the year enterprise search finally takes off? Or is this a little like Lucy's football?

More thoughts on the way this week.

Sunday, April 02, 2006

MS Office Irritants

Microsoft Office has been with us for a long time. In particular, Microsoft Word's DOS version was available at least in the early 90s. I will give the latest version of Office 2007 (aka "Office 12") a fair and impartial review when there is a stable version to review. In the meantime though, Microsoft, listen up. Although I'm writing this blog, believe me there are many other folks in the "silent majority" of users who feel as I do but just suffer in silence.

I like MS Access and Excel, but there are lots of things I hate about Word and PowerPoint. I hope you'll endeavor to fix these in Office 12 and not simply tart up the old products with a new interface. First, here are some particularly annoying things about Word. Let's start with stability. This product has been around for 15 or so years, and it still is buggy -- as in "Word has experienced a problem" and then the whole thing hangs. Maybe you'll be able to save what you've done, maybe not. But shouldn't this product be bullet proof?

Then there's the infamous "Do you want to merge changes" message when you open a document attached to an email. Everyone's first (second, third) reaction is "huh?" Yes, I understand how to get around that, but I shouldn't have to.

Here's another bogus feature: Whenever you print a Word document (nothing else but print, mind you) and exit the document, you get the message "Do you want to save the changes?" Again, a big "huh?" First, second, third reactions are "I don't think I made any changes, but I guess I'd better save it anyway." The message comes because you've inadvertently associated a printer with the document. If this warning is such a great idea, then why don't you apply it consistently in PowerPoint and Excel too?

Here's another: Unhelpful HELP. If you want to know how to print to fit a paper width or a certain number of pages, Word's help says "If your work doesn't fit exactly on the number of printed pages you want, you can adjust, or scale, your printed work to fit on more or fewer pages than it would at normal size. You can also specify that you want to print your work on a certain number of pages." Great. And how do we do that? Your HELP system, like Word itself, has grown lazy and bloated.

And don't even get me started with security and "leaking" of personal information inadvertently.

These are just a few examples of simple things Microsoft could have done long ago to improve this product. I've got hundreds more but I'm getting carpal tunnel just from resaving my Word documents after printing them...

Getting more value from Office Documents -- XQuery

Since documents by definition are created by and for the world-wide masses, there is a wide variation in the value of these documents. By value I mean both the quality of what they contain (are they true? accurate? interesting?) and their value as assets that can be transformed or reused. Microsoft Office 2007 will be XML-based; OpenOffice and StarOffice 8 are XML-based. You can argue whose XML is richer and more useful (so far I give that award to OO and SO8 for a variety of reasons that I'll explain later). But it is still hard to do much with that XML unless you use the right tools. I've used Altova's "Spy" for some time as a suite for XML management tasks like schema development and analysis, and I've been generally happy with it. Increasingly, however, styling and transforming the content is becoming the best way to derive value from investments in XML content. That means using XSLT and XQuery, and I increasingly believe that for serious use of those standards, you need a different tool: Stylus Studio. To learn more about this alternative suite, check out Larry Kim's Stylus Studio blog.
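Python's standard library has no XSLT or XQuery engine, so as a rough stand-in, here is a small ElementTree transform of the kind one might otherwise write in XSLT: it reads the main document part of a Word 2007 .docx (word/document.xml) and emits an indented outline of its headings. It assumes headings use Word's built-in style IDs (Heading1, Heading2, and so on); the name heading_outline and the sample are hypothetical, and nothing here reflects how XMLSpy or Stylus Studio work.

```python
import re
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in Word 2007 .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def heading_outline(document_xml: str) -> list[str]:
    """List heading paragraphs as indented lines, skipping body text.

    Assumes headings carry style IDs like "Heading1", "Heading2" (Word's defaults).
    """
    root = ET.fromstring(document_xml)
    outline = []
    for para in root.iter(f"{W}p"):
        style = para.find(f"{W}pPr/{W}pStyle")
        if style is None:
            continue  # no paragraph style: treat as body text
        match = re.fullmatch(r"Heading(\d)", style.get(f"{W}val", ""))
        if match:
            level = int(match.group(1))
            # A paragraph's text is split across runs; join every w:t element.
            text = "".join(t.text or "" for t in para.iter(f"{W}t"))
            outline.append("  " * (level - 1) + text)
    return outline


SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body>
  <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Overview</w:t></w:r></w:p>
  <w:p><w:r><w:t>Body text is skipped.</w:t></w:r></w:p>
  <w:p><w:pPr><w:pStyle w:val="Heading2"/></w:pPr><w:r><w:t>Scope</w:t></w:r></w:p>
</w:body></w:document>"""

if __name__ == "__main__":
    print("\n".join(heading_outline(SAMPLE)))
```

A real XSLT or XQuery tool expresses the same select-and-reshape idea declaratively, which is a large part of the appeal once transforms grow beyond a toy like this.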

Watch this space... StarOffice 8 Review in earnest

Well I've gotten the green light to review StarOffice 8 in eContent magazine, with a very short timeframe. So watch this space for details that will be unavailable in the review, or even too "edgy" for a printed publication. In this blog I can be as candid and opinionated as I wish... after all, I am a curmudgeon.

Tuesday, January 17, 2006

Star Office 8 - Writer - More Findings

More StarOffice 8 Findings

I’m using StarOffice 8 (SO8) as my default word processor now, and two general impressions about StarOffice Writer continue to amaze me:

1) How easy it is to learn, with its familiar feel for MS Word users.
2) How it has improved many of the annoying shortcomings of MS Word.

Although I’ve read the reviewer’s guide, I find myself generally not needing the HELP system. It is as though I’m interacting with a twin –not an identical twin, and not the evil twin, but with the better twin.

Little annoying things with MS Word that SO8 has fixed:

1) First and foremost, the table model. You really can join (merge) cells vertically, and it isn’t just a “fake join” (one where the cell boundary has simply been hidden, as in MS Word); you really can vertically center text within a vertically merged cell.
2) Second, when you use the format painter to paint table cell attributes, you paint not only the text font attributes but also the cell attributes (like borders and cell shading). This should be an easy fix for MS Word. We’ll see.

So though I’m still in the honeymoon phase, do I notice any blemishes on SO8’s Writer? Well a couple. Here’s what I’ve found so far:

1) Unexpected hang while initializing the templates for first-time use. I sent a message to SO8 support about this. This has happened only once.
2) And speaking of the format painter, sometimes it doesn’t seem to paint formats. Several times, especially with text copied from a web page into a Writer document, you can try as you might to format paint, but it doesn’t seem to work. I’m not sure why. I’ll bet I could edit the XML text contents to fix the problem (as I might have done with WordPerfect reveal codes a decade ago), but I haven’t tried that yet.
3) There is a lovely little anticipated text completion feature as you type into Writer, and sometimes it cleverly guesses the right word, but I can’t figure out how to accept its suggested completion. I’m sure I’ll figure that out; the feature is too nice not to be documented there.
4) And speaking of XML, I examined the “content.xml” piece of a saved document, and then I opened it in Altova XML Spy to see if the document was valid. It wasn’t (although it was well-formed). I’m not sure why; there are schema references; perhaps they’ve changed. Anyway, SO8 comes so close and I wish there were a way to get that validity check to work.
5) Last, and minor to some folks but not to me, I dearly wish that the text analysis functions were a bit beefier. I find the thesaurus to be somewhat better than MS Word’s, but there is no way to run a reading level analysis the way there is in Word. I’m sure that will come; maybe there is even an add-on somewhere that I don’t know about. But I do miss that reading level analyzer.
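For anyone who wants to repeat the content.xml check outside XMLSpy, here is a minimal Python sketch that opens an ODF package (which is a ZIP file), extracts content.xml, and reports whether it is well-formed. It checks well-formedness only; testing validity would need a RELAX NG validator and the OASIS OpenDocument schema (for example, lxml's RelaxNG class), which this sketch does not attempt. The function names and the tiny sample packages are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def check_content_xml(odf_bytes: bytes) -> str:
    """Extract content.xml from an ODF package and report whether it is well-formed.

    Well-formedness only: validity needs a RELAX NG validator plus the ODF schema.
    """
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as zf:
        data = zf.read("content.xml")
    try:
        root = ET.fromstring(data)
    except ET.ParseError as err:
        return f"not well-formed: {err}"
    return f"well-formed; root element is {root.tag}"


def make_package(content_xml: str) -> bytes:
    """Build a tiny in-memory ODF-like package for demonstration (hypothetical sample)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", content_xml)
    return buf.getvalue()


GOOD = ('<office:document-content '
        'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"/>')
BAD = '<office:document-content xmlns:office="x"><unclosed></office:document-content>'

if __name__ == "__main__":
    print(check_content_xml(make_package(GOOD)))
    print(check_content_xml(make_package(BAD)))
```

With a real .odt you would pass the file's bytes (for example, from open(path, "rb").read()) instead of a generated package.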

As I said, working with Writer has an uncannily familiar feel, so trying to figure out what to test next is in some ways difficult. It is almost like the experience of saving all your laptop Windows settings before getting the thing re-imaged, then using the updated laptop with its settings. You tend to find little differences and things missing as you work.

What do I plan to check next in Writer? Much as I dislike MS Word’s “fields” capabilities for a variety of reasons, I’ve grown comfortable with the way they work. One feature I especially like (and need) is the ability to display a “last date and time saved” string in a document footer. I use this even in document management systems, since you never know which version the printed copy you’re looking at is.

Tuesday, January 10, 2006

StarOffice 8 - Installation and first impressions begin

Installation (and registration) went as smooth as silk. Unlike typical MS applications, StarOffice8 offered (but didn't default) to be my choice for opening MS Word, PowerPoint, and Excel files.

Jumped right into the Writer, created a table, and tried right away to do a couple of things that always bugged me in MS Word:
1) Using the style painter to copy cell background from one cell to another... worked like a charm.
2) Joined cells horizontally and vertically... worked fine.

#2 is something that WordPerfect for DOS in the early 90s could do, but MS Word hasn't mastered it yet (fakes the vertical join, but doesn't really do it).

Bonus surprise: I could save the table in native Writer or many other formats... including DocBook. Fantastic!

How to Eat the Review Elephant?

Well, the StarOffice8 CD arrived yesterday, along with lots of reviewer hints and overviews. How should I go about assessing this office suite? What criteria can I also apply to Office 12? Here are SOME of the review criteria I will be considering; can you suggest anything to add?

Overall Package Considerations

Cost

Licensing options, including list price/street price; support options; ease of installation; disk space consumed; memory required. Overall value.

Ease of Use
Intuitiveness of screens, HELP, click efficiency, importing/exporting other formats?

Import/Export
Accuracy, robustness (e.g., styles included?), complexity to perform. Interoperability with MS Office. Evaluate:

Automates the analysis of documents to identify potential migration risks?
Calculates the cost of migration
Compare migration options in different Office Package editions (Migration Partner, Enterprise Edition in StarOffice 8)
How well does it migrate Macros?

Installation

Ease of use, ability to uninstall.

Performance

How long does it take to perform simple and complex procedures (such as updating a TOC or index, inserting a graphic).

Data Management

OLE? Live data from databases? DB queries as source (which ones)? XML Sources?

XML Support
Schema and DTD?
Format versus meaning?
Tags for formatting only?
Support for external XML models?

Forms Support?
Which forms schema? XForms.

Extensibility? Allow use of "alien attributes"?

Functionality

How robust is the office package? What does it contain? Consider these modules:

Word Processing (emphasized here)
Spreadsheets
Presentations
Drawing packages (vector/raster)
Database
Others – Chart, Math, integrated tools

Word Processing Capabilities

Styles
Supported in presentation tool? Robustness in WP?

Robustness of table model

Technical document Support

Desktop Publishing (layout intensive) Support


Header/Footers

Tables Of Contents, Indexes, Hyperlinks, Cross References

Writer Support

Spell Check, Thesaurus, Word Count, Reading Level


PDF Support
Which Acrobat version compatibility?

Wednesday, December 28, 2005

Clash of the Titans -- over CONTENT in 2006

Those of you who read my Information Insider column in EContent Magazine (http://econtentmag.com) know my views about the importance of content and how well XML supports the creation, reuse, and repurposing of content. Hand-in-hand with content and XML is the value I place on open-source software, which, like the XML family of standards themselves, has been peer reviewed.

If I am not mistaken, 2006 will be the year of the "Clash of the Titans," not only Microsoft and Google but somewhat lesser but important players too, including Sun Microsystems, Corel, Adobe, and a host of others who have decided to strike at Microsoft's jugular: Microsoft Office, the product from which Microsoft derives about 1/4 of its revenues. In fact, EContent Magazine will be publishing my column with this same title in early 2006.

In mid-year, I will begin writing a software review of Microsoft Office 12 (assuming a reasonably stable product is available by then). I will also be including a sidebar comparing Office 12 with two products built on open-source code: OpenOffice and Sun's StarOffice. Since these latter two products are already available, I will begin writing my comments about these products in the first quarter of 2006, as a preparation to compare them with Office 12.

Others have in a sense beaten me to the punch: the Commonwealth of Massachusetts, the European Union, and many foreign governments. By that I mean that they have already announced deep skepticism about the "openness" of Office 12 and/or pledged their support for OpenOffice or StarOffice.

Let the games begin!

Monday, August 22, 2005

OK... so what's a document?

What is a Document?
If we’re going to discuss document (or content) management systems of various sorts, we have to frame up the discussion with what may seem like a trivial question: What is a document? Most of us have moved beyond whether or not a document is only paper, lambskin, or papyrus, but when you expand the definition to include electronic content, the answer to what constitutes a document gets fuzzy. So here’s how I define documents and document content.

First, a document is any piece of meaningful content, public or private, often with some of the look-and-feel of physical renditions like paper books. Books are merely physical instances of electronic documents. “Meaningful content” extends beyond mere words to include video and audio. If you had to take a chance that every iPod download would be random static, you probably wouldn’t bother buying the device or repeating the download. And these days, multimedia content usually comes with its own metadata: information about the content, such as its author, when it was created, and so on.

But wait, you ask: Catalogs are books, and catalog content these days is usually found in a database. So are databases documents too? No, although I assert that the reverse is true in a way: “documents are data too,” a nice little bumper sticker thought.

I see document content as existing on a spectrum, ranging from databases (the most highly structured form of content), to highly structured content (increasingly expressed in XML), to forms, to documents that are more or less subtly structured. Note that I don’t use the term “unstructured documents” as many others do. To me, “unstructured document” is an oxymoron: Either a document has structure of some sort (and therefore has use and meaning) or it does not. “Unstructured” means “random,” and randomness is not useful to purveyors of content. Even a well laid out advertisement has structure. Although the advertisement’s electronic format may not make its structure and pitch easy to discern, human eyeballs readily do (or the sponsor won’t re-run the ad).

Why the brouhaha about “subtle” versus “un”-structured? Because if you don’t understand the difference, you give up any chance of using whatever structuring mechanisms, such as styles, are available to you. Or you give up and say, “hey, it’s unstructured, no wonder I can’t repurpose or transform the document to be or do something else.” And that would be a pity. We invest a lot in our documents and deserve to get the most out of them that we can.

So to summarize, documents are book-like containers of information, regardless of their format. Databases aren’t books (but can be used to produce books). The organization of documents ranges from subtly (or loosely) structured (like restaurant menus or a child’s book) to highly structured (like a form). Moreover, the binary formats of documents, most commonly those of word processors or page layout systems, do not readily reveal their structure, but structure they have, or they wouldn’t be useful. The structure of word processor and page layout files is pretty darned loose, however, and that is one reason why the publishing world (that would be all of us) is ever-so-slowly moving to a more explicit structure: XML.

Interestingly, Microsoft may actually be putting its cash hoard where its mouth (support for XML) is. Press teasers coming out of Redmond suggest that Office 2006 will not only replace its binary “.doc” format with XML, but may in fact begin supporting arbitrary XML for the content (not just the look-and-feel).

So curmudgeon point number 1: No document is unstructured. All documents (and document formats) have some structure, ranging from subtle or loose to very rigid. But please stop using the “unstructured” adjective, OK?

Friday, August 19, 2005

Why a Content Curmudgeon? What's Content Anyway?

In this blog I'll be providing critical commentary about issues of interest to anyone who produces content of any sort, but primarily the written kind. And if you create content of any sort, sooner or later you become interested in managing it: filing it, editing it, sharing it, maybe deleting or archiving it, and certainly being able to find it.
All these "lifecycle" issues surrounding content apply especially to words... I feel so strongly about words that my vanity plate simply has one word on it: WORD. Moreover, my website is http://my-words.org.

And why the curmudgeon part? Because there is a lot of BS surrounding the way we create, value, and manage our content, especially our words. I deal frequently with document management and search vendors, and we all use word processors and alternately curse and sigh at them, especially MS "Word".

This blog is a critical and frank look at words and the tools to create and manage them. Although I also write a bimonthly column, "Info Insider," for eContent Magazine, this blog is where I'll put the franker stuff.

So if you're interested in Content (or even Knowledge) Management from A to Z, from creation through deletion, and all the tools in between, stay tuned.