Monday, June 26, 2006

More on the Enterprise Search Summit

ESS - more thoughts

Expect more vendor consolidation, and there are many instances of it already:

* Autonomy bought Verity (an oil-and-water team, IMHO).
* Oracle bought TripleHop (great product that combined Autonomy’s statistical search with Verity’s keyword approach).

Not only that, but due to the "camel's nose in the tent" phenomenon, if you've already picked a major vendor that you trust for a major collection of services (Microsoft for ASP/email/Visual Source Safe... or Oracle for databases....) you may be tempted to go with that vendor's search solution --for better or for worse. Doing that will lock you in for a long time. Vendors may see "search" as a way to lock you into their other more profitable solutions.

Sunday, June 25, 2006

Enterprise Search Summit

Well, it's been about a month since I attended this conference in NYC. I wanted to let my first impressions sink in before relaying my conclusions about enterprise search and this conference. After a month, I have to say I am as ambivalent in many of my conclusions as I was in May. Here are some of those conclusions.

First, the conference was surprisingly well attended. I'd estimate there were 800 to 900 attendees, way above the total last year (I'm told). So maybe this finally is the year of Enterprise search. On the other hand, several conference presenters reminded us that IBM developed enterprise search software in the mid 1960s --that's right, about 40 years ago-- and the fundamental capabilities haven't changed a whole lot. Moreover, the market for enterprise search is less than a billion dollars, a relatively small size. So is this the year Enterprise search finally takes off? Or is this a little like Lucy's football?

More thoughts on the way this week.

Sunday, April 02, 2006

MS Office Irritants

Microsoft Office has been with us for a long time. In particular, Microsoft Word's DOS version was available at least in the early 90s. I will give the latest version of Office 2007 (aka "Office 12") a fair and impartial review when there is a stable version to review. In the meantime though, Microsoft, listen up. Although I'm writing this blog, believe me there are many other folks in the "silent majority" of users who feel as I do but just suffer in silence.

I like MS Access and Excel, but there are lots of things I hate about Word and PowerPoint. I hope you'll endeavor to fix these in Office 12 and not simply tart up the old products with a new interface. First, here are some particularly annoying things about Word. Let's start with stability. This product has been around for 15 or so years, and it still is buggy -- as in "Word has experienced a problem" and then the whole thing hangs. Maybe you'll be able to save what you've done, maybe not. But shouldn't this product be bulletproof?

Then there's the infamous "Do you want to merge changes" message when you open a document attached to an email. Everyone's first (second, third) reaction is "huh?" Yes, I understand how to get around that, but I shouldn't have to.

Here's another bogus feature: Whenever you print a Word document --nothing else but print, mind you-- and exit the document, you get the message "Do you want to save the changes?" Again, a big "huh?" First, second, third reactions are "I don't think I made any changes, but I guess I'd better save it anyway." The message comes because you've inadvertently associated a printer with the document. If this warning is such a great idea, then why don't you apply it consistently with PowerPoint and Excel too?

Here's another: Unhelpful HELP. If you want to know how to print to fit a paper width or a certain number of pages, Word's help says "If your work doesn't fit exactly on the number of printed pages you want, you can adjust, or scale, your printed work to fit on more or fewer pages than it would at normal size. You can also specify that you want to print your work on a certain number of pages." Great. And how do we do that? Your HELP system, like Word itself, has grown lazy and bloated.

And don't even get me started with security and "leaking" of personal information inadvertently.

These are just a few examples of simple things Microsoft could have done long ago to improve this product. I've got hundreds more but I'm getting carpal tunnel just from resaving my Word documents after printing them...

Getting more value from Office Documents -- XQuery

Since documents by definition are created by and for the world-wide masses, there is a wide variation in the value of these documents. By value I mean both the quality of what they contain (are they true? accurate? interesting?) and their value as assets that can be transformed or reused. Microsoft Office 2007 will be XML-based; OpenOffice and StarOffice 8 are XML-based. You can argue whose XML is richer and more useful (so far I give that award to OO and SO8 for a variety of reasons that I'll explain later). But it is still hard to do much with that XML unless you use the right tools. I've used Altova's "Spy" for some time as a suite for XML management tasks like schema development and analysis, and I've been generally happy with it. Increasingly, however, styling and transforming the content is becoming the best way to derive value from investments in XML content. That means using XSLT and XQuery, and I increasingly believe that for serious use of those standards, you need a different tool: Stylus Studio. To learn more about this alternative suite, check out Larry Kim's Stylus Studio blog.
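To make "reuse" a bit more concrete, here is a minimal sketch in Python of pulling a heading outline out of XML content. It is not XQuery or XSLT; it uses the limited XPath support in Python's standard library as a stand-in. The SAMPLE fragment and the list_headings name are my own illustrations (a simplified fragment with a made-up root element, not a real saved document); the text:h element and text:outline-level attribute are the OpenDocument names for headings.

```python
import xml.etree.ElementTree as ET

# Namespace used by OpenDocument text elements.
NS = {"text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0"}

# Hypothetical, simplified fragment for illustration only;
# a real content.xml has a different root element and much more markup.
SAMPLE = """<doc xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:h text:outline-level="1">Introduction</text:h>
  <text:p>Some body text.</text:p>
  <text:h text:outline-level="2">Background</text:h>
</doc>"""


def list_headings(xml_text):
    """Return (outline level, heading text) pairs in document order."""
    root = ET.fromstring(xml_text)
    level_attr = "{%s}outline-level" % NS["text"]
    return [
        (int(h.get(level_attr, "1")), "".join(h.itertext()))
        for h in root.iterfind(".//text:h", NS)
    ]


if __name__ == "__main__":
    for level, title in list_headings(SAMPLE):
        print("  " * (level - 1) + title)
```

A real XQuery or XSLT tool can do far more than this, but even a few lines show why explicit structure pays off: the outline comes straight from the markup, with no guessing from fonts.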

Watch this space... StarOffice 8 Review in earnest

Well I've gotten the green light to review StarOffice 8 in eContent magazine, with a very short timeframe. So watch this space for details that will be unavailable in the review, or even too "edgy" for a printed publication. In this blog I can be as candid and opinionated as I wish... after all, I am a curmudgeon.

Tuesday, January 17, 2006

Star Office 8 - Writer - More Findings

More StarOffice 8 Findings

I’m using StarOffice 8 (SO8) as my default word processor now, and two general impressions about StarOffice Writer continue to amaze me:

1) How easy it is to learn how to use, its familiar feel for MS Word users.
2) How it has improved many of the annoying shortcomings of MS Word.

Although I’ve read the reviewer’s guide, I find myself generally not needing the HELP system. It is as though I’m interacting with a twin –not an identical twin, and not the evil twin, but with the better twin.

Little annoying things with MS Word that SO8 has fixed:

1) First and foremost, the table model. You really can join (merge) cells vertically, and it isn’t just a “fake join” (one where the cell boundary has simply been hidden, as in MS Word); you really can vertically center text within a vertically merged cell.
2) Second, when you use the format painter to paint table cell attributes, you paint not only the text font attributes but also the cell attributes (like borders and cell shading). This should be an easy fix for MS Word. We’ll see.

So though I’m still in the honeymoon phase, do I notice any blemishes on SO8’s Writer? Well a couple. Here’s what I’ve found so far:

1) Unexpected hang while initializing the templates for first-time use. I sent a message to SO8 support about this. This has happened only once.
2) And speaking of the format painter, sometimes it doesn’t seem to paint formats. Several times, especially with text copied from a web page into a Writer document, you can try as you will to format paint but it doesn’t seem to work. I’m not sure why. I’ll bet I could edit the XML text contents to fix the problem (as I might have done with WordPerfect reveal codes a decade ago), but haven’t gone to try that yet.
3) There is a lovely little anticipated text completion feature as you type into Writer, and sometimes it cleverly guesses the right word, but I can’t figure out how to accept its suggested completion. I’m sure I’ll figure that out; the feature is too nice not to be documented there.
4) And speaking of XML, I examined the “content.xml” piece of a saved document, and then I opened it in Altova XML Spy to see if the document was valid. It wasn’t (although it was well-formed). I’m not sure why; there are schema references; perhaps they’ve changed. Anyway, SO8 comes so close and I wish there were a way to get that validity check to work.
5) Last and minor to some folks but not to me, I dearly wish that the text analysis functions were a bit beefier. I find the thesaurus to be somewhat better than MS Word’s, but there is no way to run a reading level analysis the way there is in Word. I’m sure that will come; maybe there is even an add-on somewhere that I don’t know about. But I do miss that reading level analyzer.
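On item 4, the difference between well-formed and valid is worth a quick illustration. A saved SO8 document is a zip package with content.xml inside, so the first half of the check is easy to script. The sketch below, in Python, only tests well-formedness (does the XML parse at all?); it does not test validity, which needs a schema-aware tool and the right schema. The function names and the in-memory sample package are my own, for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def check_content_xml(package):
    """Return True if content.xml inside the zip package is well-formed XML.

    This checks well-formedness only; it says nothing about schema validity.
    """
    with zipfile.ZipFile(package) as zf:
        data = zf.read("content.xml")
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


def make_sample_package(xml_text):
    """Build a tiny in-memory zip holding only content.xml (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", xml_text)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_content_xml(make_sample_package("<a><b/></a>")))
    print(check_content_xml(make_sample_package("<a><b></a>")))
```

To run it against a real document, pass the path of the saved file instead of the in-memory sample.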

As I said, working with Writer has an uncannily familiar feel, so trying to figure out what to test next is in some ways difficult. It is almost like the experience of saving all your laptop Windows settings before getting the thing re-imaged, then using the updated laptop with its settings. You tend to find little differences and things missing as you work.

What do I plan to check next in Writer? Much as I dislike MS Word’s “fields” capabilities for a variety of reasons, I’ve grown comfortable with the way they work. One feature I especially like (and need) is the ability to display a “last date and time saved” string in a document footer. I use this even in document management systems, since you never know which version the printed copy you’re looking at is.

Tuesday, January 10, 2006

StarOffice 8 - Installation and first impressions begin

Installation (and registration) went smooth as silk. Unlike typical MS applications, StarOffice8 offered (but didn't default) to be my choice for opening MS Word, PowerPoint, and Excel.

Jumped right into the Writer, created a table, and tried right away to do a couple of things that always bugged me in MS Word:
1) Using the style painter to copy cell background from one cell to another... worked like a charm.
2) Joined cells horizontally and vertically... worked fine.

#2 is something that WordPerfect for DOS in the early 90s could do, but MS Word hasn't mastered it yet (fakes the vertical join, but doesn't really do it).

Bonus surprise: I could save the table in native Writer or many other formats... including DocBook. Fantastic!

How to Eat the Review Elephant?

Well the StarOffice8 CD arrived yesterday, along with lots of reviewer hints and overviews. How to go about assessing this office suite? What criteria can I also apply to Office 12? Here are SOME of the review dimensions that I will be considering; can you suggest anything to add?

Overall Package Considerations

Cost

Licensing options, including List price/ street price, support options, ease of installation, disk space consumed, memory required. Overall value.

Ease of Use
Intuitiveness of screens, HELP, click efficiency, importing/exporting other formats?

Import/Export
Accuracy, robustness (e.g., styles included?), complexity to perform. Interoperability with MS Office. Evaluate:

Automates the analysis of documents to identify potential migration risks?
Calculates the cost of migration
Compare migration options in different Office Package editions (Migration Partner, Enterprise Edition in StarOffice 8)
How well does it migrate Macros?

Installation

Ease of use, ability to uninstall.

Performance

How long does it take to perform simple and complex procedures (such as updating a TOC or index, inserting a graphic).

Data Management

OLE? Live data from databases? DB queries as source (which ones)? XML Sources?

XML Support
Schema and DTD?
Format versus meaning?


Tags for formatting only?
Support for external XML models?

Forms Support?
Which forms schema? XForms.

Extensibility? Allow use of "alien attributes"?

Functionality

How robust is the office package? What does it contain? Consider these modules:

Word Processing (emphasized here)
Spreadsheets
Presentations
Drawing packages (vector/raster)
Database
Others – Chart, Math, integrated tools

Word Processing Capabilities

Styles
Supported in presentation tool? Robustness in WP?

Robustness of table model

Technical document Support

Desktop Publishing (layout intensive) Support


Header/Footers

Tables Of Contents, Indexes, Hyperlinks, Cross References

Writer Support

Spell Check, Thesaurus, Word Count, Reading Level


PDF Support
Which Acrobat version compatibility?

Wednesday, December 28, 2005

Clash of the Titans -- over CONTENT in 2006

Those of you who read my Information Insider column in EContent Magazine (http://econtentmag.com) know my views about the importance of content and how well XML supports the creation, reuse, and repurposing of content. Hand-in-hand with content and XML is the value I place on open-source software, software that, like the XML family of standards themselves, has been peer reviewed.

If I am not mistaken, 2006 will be the year of the "Clash of the Titans," not only Microsoft and Google but somewhat lesser but important players too, including Sun Microsystems, Corel, Adobe, and a host of others who have decided to strike at Microsoft's jugular: Microsoft Office, the product from which Microsoft derives about 1/4 of its revenues. In fact, EContent Magazine will be publishing my column with this same title in early 2006.

In mid-year, I will begin writing a software review of Microsoft Office 12 (assuming a reasonably stable product is available by then). I will also be including a sidebar comparing Office 12 with two open-source products: OpenOffice and StarOffice. Since these latter two products are already available, I will begin writing my comments about these products in the first quarter of 2006, as a preparation to compare them with Office 12.

Others have in a sense beaten me to the punch: The Commonwealth of Massachusetts, the European Union, and many foreign countries. By that I mean that they have already announced deep skepticism about the "openness" of Office 12, pledged their support for OpenOffice or StarOffice, or both.

Let the games begin!

Monday, August 22, 2005

OK... so what's a document?

What is a Document?
If we’re going to discuss document (or content) management systems of various sorts, we have to frame up the discussion with what may seem like a trivial question: What is a document? Most of us have moved beyond whether or not a document is only paper, lambskin, or papyrus, but when you expand the definition to include electronic content, the answer to what constitutes a document gets fuzzy. So here’s how I define documents and document content.

First, a document is any piece of meaningful content, often with some of the look-and-feel of physical renditions like paper books, whether public or private. Books are merely physical instances of electronic documents. “Meaningful content” extends beyond mere words, and includes video and audio. If you had to take a chance that every iPod download would be random static, you probably wouldn’t bother buying the device or repeating the download. And these days, multimedia content usually comes with its own metadata – information about the content, such as its author, when it was created, and so on.

But wait, you ask: Catalogs are books, and catalog content these days is usually found in a database. So are databases documents too? No, although I assert that the reverse is true in a way: “documents are data too,” a nice little bumper sticker thought.

I see document content as existing in a spectrum, ranging from databases (the most highly structured form of content) to highly structured content (increasingly expressed in XML) to forms to documents that are more-or-less subtly structured. Note I don’t use the term “unstructured documents” as many others do. To me, “unstructured documents” are an oxymoron: Either they have structure of some sort (and therefore have use and meaning) or they do not. “Unstructured” means “random” and randomness is not useful to purveyors of content. Even a well laid out advertisement has structure. Although the advertisement’s electronic format may not make it easy to discern its structure and pitch, human eyeballs readily do (or the sponsor won’t re-run the ad).

Why the brouhaha about “subtle” versus “un”-structured? Because if you don’t understand the difference you give up any chance of using whatever mechanisms to structure a document –such as styles—that are available to you. Or you give up and say “hey, it’s unstructured, no wonder I can’t repurpose or transform the document to be or do something else.” And that would be a pity. We invest a lot in our documents and deserve to get the most out of them that we can.

So to summarize, documents are book-like containers of information, regardless of their format. Databases aren’t books (but can be used to produce books). The organization of documents ranges from subtly (or loosely) structured (like restaurant menus or a child’s book) to highly structured (like a form). Moreover, the binary format of documents, most commonly that of a word processor or page layout system, does not yield clues easily to its structure but structure it has or it wouldn’t be useful. Word processors and page layout programs’ structure is pretty darned loose, however, and that is one reason why the publishing world (that would be all of us) is ever-so-slowly moving to more explicit structure, XML.

Interestingly, Microsoft may actually be putting its cash hoard where its mouth (support for XML) is. Press teasers coming out of Redmond suggest that Office 2006 will not only replace its binary format “RTF” with “XML,” but may in fact begin supporting the content (not just the look-and-feel) with arbitrary XML.

So curmudgeon point number 1: No document is unstructured. All documents (and document formats) have some structure, ranging from subtle or loose to very rigid. But please stop using the “unstructured” adjective, OK?

Friday, August 19, 2005

Why a Content Curmudgeon? What's Content Anyway?

In this blog I'll be providing critical commentary about issues of interest to anyone who produces content of any sort, but primarily the written kind. And if you create content of any sort, sooner or later you become interested in managing it... filing it, editing it, sharing it, maybe deleting or archiving it, and certainly being able to find it.
All these "lifecycle" issues surrounding content apply especially to words... I feel so strongly about words that my vanity plate simply has one word on it: WORD. Moreover, my website is http://my-words.org.

And why the curmudgeon part? Because there is a lot of BS surrounding the way we create, value, and manage our content --especially our words. I deal with document management and search vendors frequently --and we all use a word processor and alternately curse and sigh at our word processors, especially MS "Word".

This blog is a critical and frank look at words and the tools to manage and create them. Although I also write a bi-monthly column, "Info Insider" for eContent Magazine, this blog is where I'll put the franker stuff.

So if you're interested in Content (or even Knowledge) Management, from A-Z, from creation through deletion, and all the tools in between-- stay tuned.