Tuesday, September 28, 2010

Using MS Word to Create Blogger Posts

Using Microsoft Word and Google Docs to Create Blogs and Post Them to Blogger


I have been having a deuce of a time trying to use MS Word as the word processor for blog posts to Blogger. Google no longer supports the Word blogger add-in, and Word reports simply that it can’t publish the blog (if you started in Word to create a document of type blog). This started happening just this week for me, although research tells me it has been going on for some time for others.

I am posting this so that if you want to use MS Word to create posts for Blogger, you can see how to do it. Too bad Google didn't have the grace to post this for all of us earlier.

Start with an MS Word .docx File


This title is actually a Heading 2 style (the start of the blog uses a Heading 1 style).
This document originated in MS Word and was saved as an MS Word .docx document, with the picture placed where you see it now. That's me, several years ago, with my Superman suit. Quite a few years ago, actually. You could just as easily have used a ".doc" file instead of an Office 2007 version of Word.

I then uploaded it to Google Docs, selected all (by highlighting the whole thing), copied it with CTRL-C (the menu Copy command doesn't work; it simply creates a link), and pasted it into a new Blogger blog (the one you are now reading). Bottom Line: You can use MS Word in .doc mode to create blogs, then post them to Blogspot via Google Docs.

Table Test

I was curious if an MS Word table would survive the process. As you can see, it did.
Here is a test table, 2 columns x 3 rows, with the first row a heading and shaded:

Heading 1          Heading 2
Row 1, column 1    Row 1, column 2
Row 2, column 1    Row 2, column 2

The shading didn't show up really well, but you can see a difference.

Graphics Test Too

Here’s a picture test to see whether it would survive the trip to Google Docs, and then a copy/paste to Blog. You can see that it did. This is much better than the old “paste a picture to the top of the blog, then try your darndest to move it where you really want it to be placed.”
 

Bottom Line

This process works, not perfectly but it works, although I don’t know what happens if I delete the Google Docs version of this document and the accompanying in-line picture. For now, I’ll leave well enough alone.


If you are as frustrated with Blogger as I am when it comes to creating blog posts with Word, I hope this helps.

October 22, 2010 update:
This funky procedure that worked a month ago now fails. I've posted many questions to Blogger support about how to use Word as an editor for Blogger blog posts, but have heard nothing.

Saturday, September 25, 2010

Beginning of iPad Buyer's Remorse

Well, it's been several months now since I bought my iPad and in many ways I love the device. 15 seconds for a cold boot; very easy to get to my email and to the web. However, some things are not
quite as smooth as I'd hoped.


First, email. I have 2 personal email accounts, one via Comcast and one Gmail. I create folders in both to store email I want available online, but I can see folders on only one (Gmail). So I've forwarded all Comcast email to Gmail; so far that seems to take care of the folder issue. Moreover, although I really hate the 'conversation' feature in Gmail (delete parts of a conversation and you may lose what's in your inbox), it appears to work with my iPad far better than Comcast did.


I love eBooks, and the promise of being able to view and manage digital content on a single device no matter what the content's format or where it came from. There, as I said in my previous post, the iPad has a long way to go. iBooks does a poor job of presenting PowerPoint files, an (unfortunate) staple in the business world, and the other reader app I bought, GoodReader, is better with PowerPoint, but not by much. Still a bargain, but getting its WiFi drag-and-drop feature (from PC to iPad) to work is problematic. Then there is the Kindle app, a nice free way to download Amazon eBooks with the ability to comment on them, synchronize them with my PC, etc. Except you shouldn't try to read Kindle books on the iPad in natural light; the shiny touch-sensitive screen produces so much glare that you simply can't use it to read eBooks (or use it in general). And when all is said and done, I have 3 places to store eBooks of various types, instead of one; three interfaces to learn; and three different sets of functions to learn how to use. This isn't "Usability 101."


But photos, there's a kind of rich media that Apple has mastered, right? Well yes, after you get them on the iPad, but getting them there is quite a trick. Using iTunes (a weird application for general-purpose transfer of files) is one way, but why not buy the Apple Camera Connection Kit, then just transfer files to your iPad directly via USB cable or memory card? What could be simpler? Nothing is simple, I've found, and unfortunately the kit doesn't work. I have 3 digital cameras; I stopped trying after the second one. My first camera, a Polaroid digital with a USB port, could connect to the iPad. I got two device errors after connecting twice: 'This device uses too much power' (hey, it has 4 AAA batteries; try them!) and 'Device not supported.' The memory card fit one of the two connectors in the kit, and I could easily transfer pix from the card. However, camera #2 is my Sony camcorder, with a memory stick for stills and low-resolution video. The memory stick doesn't fit the Apple connector, so I tried using the USB cable; that works flawlessly with Windows. Nada. No error message from the iPad, no message (and no photo transfer) at all. So one card fit one connector; neither USB cable worked with the two cameras.


Apple deserves credit for vetting applications before it allows them to be sold, but Apple clearly has a long way to go with standards (like USB) and quasi-standards (like Flash). The iPad remains an engaging device, but not as simple or as functional as I'd hoped.

Saturday, July 24, 2010

eDiscovery - Your Next Crisis?

Litigation has a long tradition in the US. Now, as firms and enterprises increasingly shift from paper to digital knowledge assets, that litigation trend is also moving into the digital arena. eDiscovery is a broad term applying to one of a series of responses to a "triggering event." That event begins an obligation to preserve and disclose data; the trigger may be a judicial order, or even the mere knowledge of a likely future legal proceeding that will require preserving and finding relevant information stored in your electronic documents. In the eDiscovery world, these assets are now called Electronically Stored Information, or ESI.

eDiscovery is a relatively new concept, and you could be excused if you are not familiar with the term. In the US, the Federal Rules of Civil Procedure ("FRCP") issued Rule 26 and related rules, effective December 2006. This update to the FRCP made all ESI "discoverable," just as non-electronic information, usually paper, is discoverable. ESI, eDiscovery, FRCP… these and related acronyms are enough to make your head swim. But keep your head above water and pay attention, because if you are not ready for eDiscovery, you could be in for some serious pain, both to your organization's bottom line and to its reputation.
In my view, eDiscovery is built on a series of tools and best practices that should be present in every enterprise and that everyone should proactively follow. Sadly, few actually are prepared, because these tools and practices are often seen as optional, a distraction from the main business activities. The tools I refer to are Enterprise Content Management (ECM), Records Management (RM) and Search. The best practices relate to the processes and procedures you follow to oversee all your ESI, records and non-records.
So how do you get started? Meet EDRM, the Electronic Discovery Reference Model, and its sibling, IMRM, the Information Management Reference Model. I told you this wouldn't be simple.

Reference Models

The EDRM group, responsible for both of these reference models, is a consortium of vendors and other interested parties that aims to develop comprehensive guidelines, standards, and tools to reduce the incidence of eDiscovery nightmares, or provide ways to cope when they occur. The Electronic Discovery Reference Model (EDRM) provides guidelines, sets standards, and delivers resources to help both those who purchase eDiscovery solutions and the vendors who provide them, improving the quality of the tools and reducing the costs associated with eDiscovery. IMRM, related to one part of EDRM, complements that model.

IMRM

IMRM, shown below and courtesy of EDRM.NET, aims to "provide a common, practical, flexible framework to help organizations develop and implement effective and actionable information management programs. The IMRM Project, also part of the EDRM industry working group, aims to offer guidance to Legal, IT, Records Management, line-of-business leaders and other business stakeholders within organizations." This project within the EDRM group suggests a common way for these different groups to discuss and make decisions about the organization's information needs.





Although this diagram has the ring of the endless PowerPoint slides you've seen on a variety of topics, it reiterates some basic, commonsensical ideas that all should adopt but most ignore. I won't go into details, but the general themes are obvious. These different business units, often at odds and seldom understanding each other's language and values, must work together to manage ESI, whether records or not; if they don't, the result could be that eDiscovery nightmare. Some key takeaways: Decide and oversee the ways your organization creates and saves information. Throw away what isn't needed, keep what you must, all within the corporate requirements for both records and other ESI. IT will benefit (less to back up, archive, and index for search); Legal will be happy you are reducing risk; Records Management will appreciate getting all the help with ESI it can get; and business profits will be shielded somewhat from the risks of bad information management practices.

EDRM

Now what of the EDRM model itself? Again, this is not an easy concept, but it is still critical for preparing for that inevitable crisis. To understand this model, courtesy EDRM (edrm.net), read left to right and notice how the process sifts through huge volumes of ESI and aims to focus on the important, most relevant pieces. EDRM has eight ongoing projects to fill out the details of its goals to "establish guidelines, set standards, and deliver resources."




IMRM is related to the left-most process, "Information Management," but don't view it as a picture of Information Management itself. Instead, think of IMRM as a way of promoting cross-organizational dialog: always important, and critical if that eDiscovery request comes a-knocking.



So those two models give you the grand overview. In upcoming posts I'll look at some of the elements of these models in greater detail. I also spoke to several leading eDiscovery vendors recently; in my next few posts, I'll tell you their views and my impressions about vendor involvement with EDRM in general. Sure, each vendor has an element of self-interest. That's understood. Are these guys just out to make a buck on the "next big thing," or are they up to something truly useful, for eDiscovery and maybe more, in this collaborative effort?


In my next post I'll look at the first element of the EDRM model, Information Management. You'll see what vendors had to say, and I'll give you my assessment about whether their participation in this is just vendor smoke or indeed a way for you to get started preparing for that next crisis.
  

Tuesday, June 01, 2010

Exploring the iPad: First Impressions


Well, like 2 million other folks, I bought an iPad (Wi-Fi, 32GB). Like others, I couldn't resist the allure of this seductive device, and I'm suffering from Windows weariness:


  • Bloat
  • Security issues
  • Complexity
  • Slow performance
  • Reliability issues
  • Battery life between charges
  • Lack of openness to standards
  • Etc…
Why did I buy the iPad? Primarily as an e-book reader but also as a quicker, lighter web access tool. What do I think of the iPad as an e-Book reader? Marginal, but more about that assessment later. First a few words about my first impressions.

The iPad is seductively beautiful, and I don't find the 1.5 pounds to be excessive; it is about the weight of a hardback novel. Getting used to it, though, if you are not a Mac-head and don't own an iPhone, is difficult. Using Safari and the iPad is a little like joining a club where you haven't been told the secret handshakes. HELP isn't built in, although there is an iPad help site automatically listed in your web favorites. After a while I learned to use two fingers to pinch or spread the screen; tapping twice in the middle of the screen enlarges and centers the page (unless you tap on a link that happens to be there). How you do a simple string-search "find" (CTRL-F in most other browsers) I still haven't figured out, and I am beginning to guess that feature is just missing.

I'm also realizing that I've traded one vendor's nose-thumbing to standards (or de facto standards) –Microsoft—for another vendor acting the same way: Apple. You'd be surprised, for example, how many sites you cannot use since Apple refuses to support Flash. Forget Hulu and most news video clips from major news sites. They almost all use Flash since it is ubiquitous and has a light footprint.

If you have a Windows PC (I have several), how do you get files from it to the iPad for viewing? First, I was amazed how little the Apple folks (both in the local Bethesda Apple store and the online Apple geniuses) know about, or maybe have even thought about, working with Windows machines. The store rep told me I could use iTunes and essentially drag and drop my files, or just email them to myself. Or I could subscribe to the fee-based MobileMe service to store these files (no thanks). Email all 500 files? No thanks to that either. Drag and drop? Sorry, I misunderstood. It turns out that you can drag and drop; as always, there's an app for that. The surprise was that it was built into an inexpensive app I had already purchased to supplement the iPad's mediocre eReading abilities: GoodReader.

The notion of using iTunes to get files over to the iPad is itself revealing. I'd never used iTunes before, but its name rightly suggests music, tunes. So you can get your downloaded MP3s etc. to the iPad, and it will also transfer photos. But how about transferring a PDF file, or a folder of PDF files? No, but there's an app for that. Matter of fact, there are many apps for that, some with 1-star ratings, some with 4-star ratings. Buy one and try it out. If that doesn't work, buy another and try that one out. (I have no idea how you uninstall apps, but presumably I'll learn the secret handshake for that after I've bought several redundant apps.)


And remember: I bought the iPad as an e-book reader, a reader for all my Microsoft office, Open Office, e-Pub, and PDF files. I also plan to compare it as an e-Reader to Plastic Logic's Que Pro.  How well does the iPad work natively as an e-book reader? What about the app I picked? Details about that later, as I begin learning more. Full details in an upcoming review comparing both products. Assuming I get my hands on a Que, which appears again to be behind schedule.


 

NOTE: Unfortunate Plastic Logic QUE ProReader update. As of late June, and announced on Que's LinkedIn group, Que has stopped shipping the ProReader due to changing market realities. I pointed out a couple of those in my interview with their Marketing staff: lack of color (not a complete Que-killer, IMHO), and lack of support for any browser (a REAL Que-killer). The lack of a browser means you must purchase separate subscriptions for online products such as the WSJ even if you already have a subscription. And if you have subscriptions to some niche publication for which they don't offer a Que version, you're out of luck.


 

I wish Que luck, and I'm told I'll get an eval hardware copy when it is ready, but the rumor on the street is that this could be several years in the offing. As I said in my recent column, "the clock is ticking" and –in this case—time is not Que's friend.


 

But back to the iPad.


 

Here are a couple sneak preview pictures. First, here is a screenshot of one slide from a recent presentation I gave.



 

And here is how it looks rendered on my iPad (with both the native iPad viewer and with the $1 app). The picture isn't great, but it was an early Sunday morning shot in natural light with my Canon EOS. (Figuring out how to do iPad screenshots isn't easy; I found out by Googling. Once you know the multiple secret handshakes, it is easy.)


 



 

What's wrong with that picture? Quite a few things as you can see.

I'll try to ratchet down my curmudgeon index a bit before my next iPad post.



I Hate Comcast

Oh, did I mention I hate Comcast?
Unfortunately I must have broadband, even though I'd gladly give up their cable "services." Rich media is content too, and it is really unfortunate to be at the mercy of this cable company for unconstrained access to that content, including adding our own (subscription-free) devices to use it.

I spent 4 separate days waiting for them to fix the problem they caused with their little "digital to analog" gadget.

Do you hate Comcast too?

Believe it or not, there is a Facebook page devoted to (and named) just that: I hate Comcast. If you'd like to read the details of my rant and those of others, click here.

About all any of us can do is press our elected senators and representatives to vote FOR Net Neutrality, and reduce cable strangleholds as much as we can. I'd do the same, but unfortunately (living in DC) I have no senator or representative.

Sunday, April 11, 2010

E-Books, E-Readers, and Peak Oil


Huh? Isn't this kind of like mixing oil and water? Not really.

For related reasons, having to do in part with resource constraints, the cost of print subscriptions continues to rise, sometimes even becoming prohibitive. After 40 years as a continuous WSJ print subscriber, I canceled my print subscription. It cost nearly $400/year, and I already have an online subscription that costs around $100. The WSJ is great, but not $400/year great, especially in this economy, when I also have the online edition. So I cut the cord and went completely online. With online access I can of course search, save articles, and print them to PDF, the way I used to clip print articles. My paper press archive goes back 30 years, but PDF lasts forever, right? Another advantage of online news: the news is always fresh. Besides, I'm helping save the environment, or at least I hope so. I reduce the number of plastic bags (which you can't recycle), and I eliminate the need to recycle the paper itself. There is even a potential business advantage to the right e-Reader: it can preserve into the indefinite future the opportunity to view important documents. What could be nicer?

Well, there are advantages to print. Print never crashes. You can read print even when broadband is down or out of reach. You can fall asleep in a chair, drop the newspaper, and not have to buy a new one; can't do that with a laptop. Print is very easy to read, indoors or out in bright sunlight. And print is graphically rich, uses color, and is still more familiar and comfortable. My spouse says, "I miss the WSJ print edition." Uh oh.

I tell her to wait; I'll find an e-Reader that is nearly as good as print, or even better. It will meet my kind of Turing test for print (doesn't crash, very portable, etc.) while preserving the benefits of online: searching, always current. The iPad is here; Plastic Logic's Que reader is coming. We'll find something (but I haven't bought anything yet). Now the limitations begin to appear, both from others' reviews and my own discussions with vendors.

The iPad is ever so cool, has Apple's trademark usability, color… what could be better? For one thing, it tries to be everything a netbook can be, way more than just an e-Reader. I don't care if it can run my iPhone apps, because I don't have an iPhone. In fact, I don't want to be nickel-and-dimed (more like dollared) into buying lots of little apps to fill in the iPad's gaps (like being able to print or use USB). And the iPad doesn't run Flash, which is commonly used on many web sites, including the online WSJ. This feels a little like the "microsofting" of Apple: you can run anything, but not without add-ons that may not play well together. So I can buy another iPad-specific WSJ subscription, right? And do I do that for every subscription? Oh well, at least the iPad has a (downsized) browser, so I can get to the WSJ in some fashion if I decide to spring for one. But what about the other constraints? Early reviews say that that beautiful one-and-a-half-pound product begins to feel very heavy after a while, and can even make your wrists hurt. And what kind of netbook wannabe is only single-threaded?

So I've now talked with a marketing rep from Plastic Logic about the Que, and expect to get an evaluation device as soon as they become available. Yes, I know it will not display color (hey, the WSJ didn't start using color until it became common in other print editions). And it is very light and also cool in its own way; it even has a screen that is more book-like, roughly 8½ by 11 inches. It reads virtually every document format known to humankind, and has huge amounts of space for all my books. But wait: it doesn't do Flash either, and apparently has no browser, not even a limited one.

Maybe I misunderstood. And maybe when I finally get my hands on Que, I'll discover other advantages that cancel out the negatives.

Or quite possibly there is no perfect e-Reader. I'm guessing that's the case, since this is the real world. And if that's the case, I have to figure out exactly what a document is, and what attributes are optional (like Flash). That is no easy decision, since it requires peering into the future and guessing exactly what I'll be willing to do without.

And that's where the similarity with Peak Oil comes in. Liquid fossil fuels provide 95% or so of the world's transportation fuel needs. Yet liquid fuels will eventually run out, and before they do, they will become erratically less available and more expensive. So we'll also have to figure out which transportation options are critical, which optional. I'm guessing SUVs are optional, and public transit is critical.

And this may even have some bearing on e-Readers: they depend on electricity and broadband. Those are critical resources too, right?

What's your guess, about which transportation choices are critical and which are optional?

What's your guess about what constitutes the essence of a document, so it can be preserved and read generations hence?

Monday, January 18, 2010

Justifying eDiscovery Systems

As I said in my Information Insider October 2009 column, "The landmark 2006 Federal Rules of Civil Procedures Rule 26 and its updates make all electronic stored information (ESI) subject to legal discovery, and ESI continues its unbridled growth." Given the nation's increasing litigiousness, and the exploding amount of electronic information everywhere that could be subject to FRCP Rule 26, I am surprised how little we've heard about such litigation. Is it simply that our attention is elsewhere (whether the US health care debate, two wars, Global Warming (or is it Global Cooling?), the earthquake in Haiti…)? Or is eDiscovery yet another ticking time bomb that will burst onto the news when we least expect it? Well, the vendors supplying eDiscovery solutions have plenty to say about that.

And what is special about eDiscovery? Why not just buy the very best search system available, and use it to do all the "e-lectronic" discovery that you want? After all, isn't it all about "search"? I spoke with Ursula Talley, VP of Marketing at StoredIQ, to gather expert opinions on this subject. Here are excerpts from her comments, which I find pretty illuminating.

First, "Enterprise Search and eDiscovery Search technology do share a set of core capabilities, specifically crawling, indexing and searching data across a multitude of various applications and storage systems. Enterprise Search is designed to assist knowledge workers with information access and retrieval. The end result is that a user can find some files with information that can help that user complete a task." So what's the difference? Ursula went on to say, "eDiscovery Search is designed to support a workflow that can be legally defended in court. The end result is a set of data files that is preserved (saved to a new, target location without any changes to the metadata, and recording the system and location where each data file was originally located)." This kind of quarantining of content goes over and above what you can do with any enterprise search system. Moreover, she says that search performed by eDiscovery systems must also be very robust: eDiscovery searching can require queries with between 25 and 300 search terms. Moreover (for those of you who have ever posed a complex query on an enterprise search system, then gone to have a cup of coffee while you waited for the result to return), eDiscovery search must be able to copy large volumes of content that has been found, "if necessary hundreds to thousands of gigabytes, without disrupting user productivity."
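To make that "preserve" step concrete, here is a minimal, hypothetical Python sketch of the idea Talley describes: copy responsive files to a quarantine location while keeping file timestamps (shutil.copy2 preserves them) and recording the system and original path of every file in a manifest. The function name and manifest format are my own illustration, not StoredIQ's implementation.

```python
import csv
import shutil
import socket
from pathlib import Path

def preserve(files, target_dir, manifest_path="manifest.csv"):
    """Copy responsive files to a quarantine area, preserving file
    timestamps (shutil.copy2) and recording, for every file, the
    system and original location where it was found."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    host = socket.gethostname()
    with open(manifest_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["system", "original_path", "preserved_path"])
        for i, src in enumerate(map(Path, files)):
            # Prefix an index so files with the same name don't collide.
            dest = target / f"{i:06d}_{src.name}"
            shutil.copy2(src, dest)  # copies content plus timestamps
            writer.writerow([host, str(src.resolve()), str(dest)])
```

A real eDiscovery system would do far more (litigation holds, chain-of-custody hashing, defensible audit logs), but the shape, copy without alteration plus a record of provenance, is the part that plain enterprise search lacks.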

While it's at it, a robust eDiscovery system such as StoredIQ's can provide de-duplication of email and user files (saving space and attorney time poring over the same redundant files), while keeping a record of every location where those items originally resided, in case the judge asks. Lastly, searching just email systems can be a real pain, since they are so big and are threaded; even the best search is often like sorting through tons of low-grade ore. eDiscovery systems can also extract both metadata and content from email and export them into a database format that can be queried and re-used in legal document review applications.
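De-duplication of this sort can be sketched with a content hash: identical files collapse to one reviewable copy, while every original location is retained. Again, this is an illustrative sketch, not any vendor's actual algorithm.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def dedupe(paths):
    """Group files by content hash so reviewers read each unique
    document once, while keeping every location it was found in."""
    groups = defaultdict(list)
    for p in map(Path, paths):
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        groups[digest].append(str(p))
    # One representative per unique document, plus all its locations.
    return {d: {"copy": locs[0], "all_locations": locs}
            for d, locs in groups.items()}
```

Hashing full content is the simplest defensible criterion; real products also normalize email threads and near-duplicates, which a plain digest cannot catch.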

So how do you go about justifying the purchase of an eDiscovery system? Not by claiming you can add features to an existing or new Enterprise Search system. Instead, focus on the other features that you'll need if a lawsuit comes a calling. Unfortunately, getting your eDiscovery house in order may be like getting your electronic records management house in order – really hard to justify until after the lawsuit. Still, at least you can avoid the trap of thinking that Enterprise Search can do all you need to find and quarantine your information for a credible eDiscovery defense.

Tuesday, October 06, 2009

And Now For Something Completely Different

... actually 8 things. What is different is that I normally use this blog for details that I couldn’t squeeze into my eContent Magazine column, Info Insider. The eight things I’m referring to are in AIIM’s recent (free) e-book describing the eight reasons you need a strategy for managing information.



John Mancini has a knack for writing simply, and this e-book (free for the downloading here) is well done. Although it is 95 pages long, don’t be put off by that; the pages are small ;-). Not only that, but the content, distilled from various “8 things” blogs, provides truly useful perspectives on Information Management. Here’s one gem from the section “Tidal Wave of Information.”

“A study by IDC a few years back concluded that there are currently 281 billion exabytes of information in the Digital Universe. So how much is this? Well…an exabyte is a million million megabytes. Thanks a lot. To put it in a bit of perspective, a small novel contains about a megabyte of information. So in other words, the Digital Universe is equal to 12 stacks of novels (fewer if the chosen novel is a big fat one like Harry Potter 6 or one of those Ken Follett Pillars of the Earth deals) stretching from the earth to the sun. So it's a big number, whatever it is.”
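Out of curiosity, here is a quick back-of-envelope version of that arithmetic in Python. Two caveats: IDC's study actually reported 281 exabytes (281 billion gigabytes), so the quote's "281 billion exabytes" looks like a slip; and the spine thickness and earth-sun distance below are my own round-number assumptions, so the stack count comes out as a few dozen rather than 12, which mostly shows how sensitive the estimate is.

```python
# Back-of-envelope check of the "stacks of novels" arithmetic.
EXABYTE_MB = 10**12                          # "a million million megabytes"
digital_universe_novels = 281 * EXABYTE_MB   # ~1 MB per small novel
NOVEL_THICKNESS_M = 0.02                     # ~2 cm per spine (assumed)
EARTH_TO_SUN_M = 1.496e11                    # one astronomical unit

stack_height_m = digital_universe_novels * NOVEL_THICKNESS_M
stacks_to_sun = stack_height_m / EARTH_TO_SUN_M
print(f"{digital_universe_novels:.2e} novels, "
      f"roughly {stacks_to_sun:.0f} stacks reaching the sun")
```

Either way, the conclusion in the quote stands: it's a big number, whatever it is.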
Go ahead, download a copy and enjoy the read.

Tuesday, May 05, 2009

It's TAXonomy Time


TAXonomy Time – Why the interest in Taxonomies?

I'm hearing the word "taxonomy" more and more often in ECM projects, often uttered by business people in the same sentence as "metadata." Can it be that business people are becoming comfortable with these terms? If you know you've got a serious information overload problem, where do you start with taxonomies to tame and organize your content? Everybody starts with Excel for metadata and Visio or similar graphical tools to sketch out taxonomies. Those tools are available, sometimes free, and well understood. But they are fundamentally static. Do you need more? What are some best practices and alternatives?

As part of my latest column, "It's TAXonomy Time," in EContent Magazine, I spoke with Carol Hert, PhD, Chief Taxonomist and Consultant for SchemaLogic Inc., to get her take on trends in taxonomy projects. Here are my questions and Hert's responses.

1) What is the state of client awareness of the value and urgency of developing taxonomies? What is the trend – use the Gartner “hype cycle” stages if you’d like. Do you see increasing interest in taxonomies, and –if so—why? Is the “information explosion” itself motivating this interest?

We typically work with large corporations that have already developed and deployed multiple taxonomies across their organizations. These companies are well aware of the cost and limitations of trying to manage these taxonomies in a dynamic environment that includes many consuming systems. Some of the organizations we work with are focused on taxonomy harmonization: integrating single-use taxonomies into one or several related taxonomies that can be utilized enterprise-wide.

We continue to see increased interest in taxonomies with the further proliferation of SharePoint and other collaboration systems, the need to increase the efficiency of the information worker, and the continued interest in enterprise information findability. Also the need to meet compliance requirements for large amounts of unstructured information continues to increase the need to govern and manage information more effectively.

2) What are typical approaches to taxonomy development:

  a. Use an existing taxonomy only
  b. Build on existing taxonomies
  c. Enterprise versus single-application (tactical) approach
  d. Use tools not available from current application vendors (e.g., EMC Documentum) for possible use with multiple vendors, or vendor-specific tools?

Our customers usually have multiple taxonomies deployed across their organizations. They have issues with managing and coordinating multiple taxonomies, especially in a dynamic environment. The first thing we do is to collect these multiple taxonomies and model them in our metadata management platform. We can then work with the customer to connect and optimize these taxonomies and then extend them as well. Some of our customers approach this from an enterprise wide perspective, while others choose to focus on a single department, function or business process and then expand.

Because complexity increases as the number of business stakeholders expands, most organizations are working to achieve a balance between the optimal goal of enterprise-wide taxonomies and single-application taxonomies. All our customers use SchemaLogic's metadata management platforms to build and manage their taxonomies. Our systems are designed to allow customers to model enterprise-wide taxonomies and publish those taxonomies to multiple applications such as SharePoint and Documentum, as well as to search engines such as FAST or auto-classification systems such as Teragram.
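The "model centrally, publish to many consuming systems" idea Hert describes can be sketched in a few lines of Python. This is purely illustrative (the class and its publish hook are my own invention, not SchemaLogic's API): one taxonomy model notifies each subscribed consuming system, such as SharePoint or a search engine, whenever a term is added.

```python
class Taxonomy:
    """A minimal hierarchical taxonomy: each term has one parent."""

    def __init__(self):
        self.parent = {}       # term -> parent term (None for roots)
        self.subscribers = []  # callables notified of each change

    def add_term(self, term, parent=None):
        if parent is not None and parent not in self.parent:
            raise ValueError(f"unknown parent: {parent}")
        self.parent[term] = parent
        # Push the change to every consuming system.
        for publish in self.subscribers:
            publish("add", term, parent)

    def path(self, term):
        """Full path from root down to the term."""
        chain = []
        while term is not None:
            chain.append(term)
            term = self.parent[term]
        return list(reversed(chain))
```

In a real deployment each subscriber would call a consuming system's API; keeping that push logic in one place is exactly what spreadsheet-and-Visio taxonomies cannot do.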

3) What trends do you see in the evolution of taxonomy development? In supporting technologies (such as SOA or SaaS)

There continues to be a need to manage taxonomies in a more dynamic way. The need to collaborate across the enterprise, locate and share information, and improve information governance at the same time is putting pressure on organizations to develop a more flexible approach to managing information. The distributed nature of SOA and SaaS architectures puts further pressure on companies to establish an enterprise-wide taxonomy that can be accessed by multiple applications.

4) What are best practices for developing taxonomies? What are some approaches to avoid?

Books could be (and have been) written on this topic, but a short list of Best Practices might include:

  • Understand the ultimate uses to which the taxonomies will be put (there is no one perfect taxonomy).
  • Incorporate business and technical stakeholders in the development process to assure that the final product will meet requirements.
  • Conduct a “taxonomy audit” prior to developing any new taxonomies to understand what already exists and might be leveraged.
  • Consider taxonomy maintenance and governance during development processes to assure that the taxonomy is able to be maintained and there are clear lines of responsibility.
  • Look for externally available taxonomies but be cautious as they have not been designed for the particular goals of the organization in question. Participate in industry-wide organizations where taxonomy development efforts might be occurring.

5) Are there any emerging or existing standards other than ISO 2788 for developing or expressing taxonomies? Is ISO 2788 relevant (I gather it is oriented towards human indexers) and who tends to use it?

ISO 2788 is relevant in terms of providing extensive guidance on term forms and other such matters. Since most organizations work in networked environments and want to transfer taxonomic information electronically, most will need to explore approaches to structuring taxonomic data for electronic transmission. Some of the standards to be aware of are RDF, OWL, Topic Maps, and SKOS. Additionally, since taxonomies might reside in metadata repositories, standards such as ISO 11179 may be relevant.

6) What are some common exports from taxonomy tools (e.g., Excel)? Are there any common formats for importing existing taxonomies or developing them in taxonomy tools? For example, are there XML DTDs or Schemas?

CSV is a good common baseline, as some organizations still manage a number of their taxonomies in Excel. Some taxonomy management vendors have XML formats (as we do), but these may be proprietary and need some translation into an XML format another application could use. Standards such as RDF, OWL, and Topic Maps might be used in this context as well.
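As a rough illustration of the CSV baseline mentioned above, here is a minimal sketch that loads a two-column parent/child export into a hierarchy. The export format and term names are invented for the example; real tools vary in their column layouts.

```python
import csv
import io
from collections import defaultdict

# Hypothetical two-column CSV export: parent term, child term.
CSV_EXPORT = """\
parent,child
Products,Hardware
Products,Software
Hardware,Servers
Hardware,Storage
"""

def load_taxonomy(text):
    """Build a parent -> children map from a CSV taxonomy export."""
    children = defaultdict(list)
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        children[row["parent"]].append(row["child"])
    return children

def roots(children):
    """Terms that never appear as a child are the top of the hierarchy."""
    all_children = {c for kids in children.values() for c in kids}
    return [p for p in children if p not in all_children]

tree = load_taxonomy(CSV_EXPORT)
print(roots(tree))          # ['Products']
print(tree["Hardware"])     # ['Servers', 'Storage']
```

A SKOS or RDF export would carry richer relationship types, but a flat parent/child CSV like this remains the lowest common denominator across tools.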

7) Can you provide client case studies?

Yes. We have published several customer case studies and would be happy to work with you on additional case studies in the future.

Now About Tools

1) What are typical costs for acquiring and implementing taxonomy products?

The costs of taxonomy products vary greatly based on the particular application. Simple taxonomy modeling tools can cost less than $1,000, while enterprise-wide taxonomy management and governance systems can cost over $500,000. These larger systems provide highly scalable modeling capability, complete change management and governance, integration with full suites of enterprise applications, and metadata compliance monitoring. We have deployed systems that range in price from less than $50,000 to over $1M.

2) What are three key features in taxonomy tools; what are three unique features in yours?

Three key features:

1. Support for a variety of relationships between terms (should at least be able to support the term relationship types specified by ISO 2788).

2. Allow unlimited hierarchical structures.

3. Provide import and export features.
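The ISO 2788 relationship types behind the first key feature (BT, NT, RT, USE, UF) can be sketched in a few lines. The Term class and labels here are purely illustrative, not any vendor's API:

```python
# Broader, narrower, related, preferred, and non-preferred term relations,
# as named in ISO 2788-style thesauri.
RELATIONS = {"BT", "NT", "RT", "USE", "UF"}

class Term:
    def __init__(self, label):
        self.label = label
        self.links = []  # list of (relation, other Term)

    def relate(self, relation, other):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.links.append((relation, other))
        # Each relation has a reciprocal; record the inverse automatically.
        inverse = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}
        other.links.append((inverse[relation], self))

animals = Term("Animals")
dogs = Term("Dogs")
dogs.relate("BT", animals)  # Dogs has broader term Animals
print([(r, t.label) for r, t in animals.links])  # [('NT', 'Dogs')]
```

Tracking the inverse automatically is what keeps a large thesaurus consistent; a tool that stores only one direction of each pair invites drift.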

Three unique features in ours:

1. Extensive change management component that enables changes in taxonomies to be automatically subjected to governance.

2. A set of productized connectors that can automatically provide updated taxonomy information to consuming applications, plus the ability to create custom connectors.

3. The ability for end-user administrators to create custom properties on terms and taxonomies through the interface.

3) How would you assess the current state of the art for automatic classification features?

Auto-classification systems continue to improve, but still lack the precision and accuracy provided by a managed taxonomy. Taxonomies have been found to be useful frameworks upon which an auto-classification system can be developed, rather than having the auto-classification tool start from scratch. A combination of taxonomy management to provide structure and manage term relationships, combined with auto-classification methods, has proven to be the most effective solution.

4) Do you provide “connectors” to work with enterprise content management systems such as EMC Documentum and Microsoft SharePoint?

We provide connectors that allow our customers to publish taxonomies out to subscribing systems such as Documentum and SharePoint. We also publish taxonomic metadata to search engines, auto-classification systems, portals, and other enterprise applications.

---
So there you have it from an expert. And if you happen to use -- or be interested in using -- Documentum or SharePoint (or both), here's a way to move beyond graphical tools and spreadsheets to manage and leverage your taxonomies.

Sunday, March 08, 2009

CMIS - EMC's role and vision for the future

First off, what on earth does CMIS stand for, and why should any content management person care? Here's the easy part, what it stands for: "Content Management Interoperability Services." What it promises is a way for customers (vendors, and others) to begin allowing useful sharing of content between different vendor repositories. That is a huge thing, since right now most companies have several, maybe hundreds, of different document repositories under their enterprise roof (and maybe they don't even know how many).

To write my column on this subject ("Building Content Bridges") I interviewed EMC and Day Software. The former is one of the original authors of the specification; the latter is a vendor that is keenly supportive of content management standards. The following notes are taken from my EMC interview.

On the 23rd of October, 2008, I spoke with two representatives from EMC about the emerging standard CMIS: Patricia Anderson, Sr. Marketing Manager, Documentum Platform Marketing, Content Management & Archiving, and Dr. David Choy, Sr. Consultant. "CC" below refers to my comment on statements in the interview -- "Content Curmudgeon."

I was curious about the timeline for CMIS to be implemented (assuming it succeeds), and why CMIS is important either to EMC or to the content management space in general. Following are my notes from that interview.

Dr. Choy: Nobody knows how long the process will take, but figure about a year or more for a full-fledged standard. Eight companies participated in validating the current version of the CMIS spec for interoperability (IBM, EMC, Microsoft, and five others), proving that the spec could be used to assure interoperability. After that, the team sent the proposed standard to OASIS. The formal process for discussing the standard takes time, but in the meantime EMC intends to make the prototype available for the public to play with.


Security raises administrative issues (mechanisms are proprietary to each vendor) and runtime issues, where security policies reign. CMIS security and access control is out of scope at this point; each vendor has its own security model, and in the near term that remains outside the scope of CMIS. Security policy is reduced to the lowest common denominator (CRUD), which every vendor supports.

---

CC: By CRUD, Dr. Choy means the basic four operations, Create, Read, Update or Delete. Every content management system provides at minimum those same operations. How they determine who can do those things is a separate issue, and CMIS assumes each system manages its own security in its own way. If the administrator of a CMIS-compliant system gives you one of these rights, then from your own CMIS-compliant system you can access and perform operations on content in that system.

---
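The CRUD floor Dr. Choy describes can be made concrete with a toy sketch. The in-memory Repository class below is purely illustrative; it stands in for a real vendor repository behind a CMIS-style facade, which would add its own security checks on each call.

```python
# Minimal sketch of the four operations every repository must offer.
class Repository:
    def __init__(self):
        self._docs = {}
        self._next_id = 0

    def create(self, content):
        self._next_id += 1
        doc_id = str(self._next_id)
        self._docs[doc_id] = content
        return doc_id

    def read(self, doc_id):
        return self._docs[doc_id]

    def update(self, doc_id, content):
        self._docs[doc_id] = content

    def delete(self, doc_id):
        del self._docs[doc_id]

repo = Repository()
doc = repo.create("draft contract")
repo.update(doc, "final contract")
print(repo.read(doc))  # final contract
repo.delete(doc)
```

The point of the interface is exactly its smallness: because every vendor can honor these four calls, a CMIS client can work against any compliant repository without knowing whose it is.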


Patricia: One of the questions is “what caused the need for this standard in the first place?” Organizations would set up more than one repository platform, perhaps by department or as the result of M&As. We realized that it was difficult getting to this other information. This also hampered development that was cross-divisional or cross-platform. Then with Web 2.0 mashups, it became even more difficult to leverage information. ECM folks realized that it was a hindrance that affected all vendors. We looked at different standards but wanted a standard that was platform-agnostic and services-based, to unlock information in different repositories. Serious discussions began in October 2006. Other committees like iECM tried to develop such standards, but we needed to start fresh.


David: iECM is an AIIM consortium that tried to create something similar to CMIS. That group wasn’t set up for highly technical interoperability standards, and very few concrete results emerged. iECM is still looking at best practices and standards, not technical areas.

---

CC: Clearly you need both; without either, there is no bridge between the repositories.

---

Patricia: For users, CMIS can expand the available applications and open the market for developers to write cross-repository applications. It is an open protocol and supports all repositories that support the standard. This provides customers lots of investment protection.


David: Enterprise Content Integration Services is an example of an application that can facilitate cross-repository work: federated search, mashups, business process workflows across repositories.


Patricia: This is the first and only web services standard. An insurance company could have separate subsidiaries across the world, and writing to a standard would enable access and update to the repository information. A distributed environment such as a franchise would also facilitate sharing of information outside each organization.


Patricia: The 3 originals were the first tier; then we included others such as Alfresco (participated), Oracle, SAP, OpenText; now Day Software. This standard is comparable to what SQL did for databases years ago.


David: The importance is how widely a standard is adopted. The spec is publicly available, and interested parties can send comments to the technical committee once it is formed (they’d need to join the technical committee). Enterprise customers are the first group that can benefit from CMIS; they need to tap into different repositories. The second group is repository vendors, who gain the ability to access each other's content. The third group interested in CMIS is Independent Software Vendors.


Patricia: Another way customers benefit is from having a broad suite of applications for their vertical markets, since a developer could develop for all.


David: Road maps for CMIS are difficult because CMIS is not a full-fledged standard yet. My rough guesstimate would be about a year after the standard is released. We do intend to make prototypes available to the public before then, and those would be built on Documentum Foundation Services, so those interfaces are close.


Patricia: This proposed specification is already 2 years in development, and vendors have done interoperability testing. We didn't just send paper to OASIS; we sent working prototypes. “What should I do today?” Evaluate the specification, and when you go to your next purchase or RFI, ask if vendors support the standard.




Saturday, January 10, 2009

Enterprise Search Summit Program

Do any of you feel like you can't keep up with the latest trends in search, or you just feel like you could wring more value out of your investment but aren't sure how? Or maybe you don't get the connection between Web 2.0 and Search? Whether you are responsible for your Intranet, your commercial site, or the various repositories inside your firewall, I heartily recommend the annual Enterprise Search Summit to be held this May in NYC.

I've attended this in the past, as a paid attendee (my "day job" employer considered it that worthwhile!), not gratis as a columnist for eContent magazine which is part of the Information Today Inc. portfolio. Michelle is the editor for eContent and designs/runs the Search Summit. I like this conference a lot. To learn more, click here.

Sunday, September 07, 2008

XML 10th Anniversary

In an upcoming Information Insider column, I invite XML to an intimate party where we can celebrate its 10th anniversary. I also invited Alexander Falk, CEO of Altova, and an XML aficionado if ever there were one (here's his blog: http://www.xmlaficionado.com/). Here are some of the questions I asked Alexander as background for the column. I hope you'll find this interview interesting. After all, celebrating a "double digits" anniversary doesn't happen often. Alexander's responses to my questions are shown in blue text.

Question: The XML Recommendation is now 10 years old. XML led to hundreds of additional specifications, yet its adoption rate in publishing and word processing software (and XHTML in web pages) seems slow. What is your assessment of XML adoption, and what do you see for the next 10 years?

Ten years is a mighty long time to make forecasts for – my crystal ball is only rated for 2-3 years max…
What we’ve seen with XML over the last 10 years is a huge adoption in all areas that are data-centric, rather than content-centric. XML has become the lingua franca of data exchange and interchange and has made a whole class of enterprise applications possible, because you can now move data fairly freely between disparate systems.

The benefits of XML in a pure content-creation scenario – be it publishing, word processing, Web design – are only realizable if you have a large amount of content and use it with some content management system. That is not something that most small- or medium-size businesses would do, and that has, I believe, led to a somewhat slower rate of adoption in those areas.

Question: OOXML is essentially “rich text format” expressed as XML rather than leveraging existing XML standards such as MathML. MS Office is expensive; OpenOffice (based on ODF, which leverages other XML standards) is free. MS Office maintains office share. What gives?

This is an interesting conundrum. From a purely academic perspective I would agree with your statement that leveraging existing XML standards is desirable. But the reality is that 95% of the world’s office documents are MS Office documents today, and people want to continue working with those documents – and want to reuse the content that exists in those documents in other applications, and by opening the file format up and having them be XML-based rather than binary format, such reuse is now possible. I can tell you from our experience that we have received countless requests from our customers that they want to be able to work with OOXML documents, and not a single request for ODF. Also, when I look at e-mail that I receive from others, I have yet to encounter a single e-mail that came with an ODF attachment. I don’t necessarily like Microsoft’s near-monopoly on the office market, but to deny its existence and standardize on a file format like ODF that nobody actually uses in the real world doesn’t make much sense either.

Here we disagree a bit; my question to Alexander followed by his response.

Question: OOXML (which today looks like it will become an ISO Standard) is still essentially just an XML expression of Microsoft’s internal word processing format, “Rich text format.” What value does such a use of XML provide to potential applications?

Actually, I need to disagree on that one. OOXML is not just RTF in disguise. OOXML includes separate and distinct markup languages for expressing word processing documents, spreadsheets, and presentations. The wordprocessingML is somewhat related to RTF because it is based on a similar concept (runs of characters with styles applied to them), but that is where the similarity ends. We found that it is very easy to use XSLT (or XQuery) to extract content from either wordprocessingML or spreadsheetML documents in OOXML that were created in Office 2007 (or other OOXML compatible apps), and likewise it is very easy for us to generate OOXML content in both of those formats from our applications. For example, our data mapping tool MapForce makes it very easy for people to map data from a variety of data sources (including EDI, databases, Web services, XML, etc.) into spreadsheetML documents that they can then open with Excel 2007. Likewise, our stylesheet design tool StyleVision, makes it very easy for people to produce stylesheets that render reports from XML or database data not just in HTML or PDF, but now also in wordprocessingML for use in Word 2007.

Still, what is new in OOXML that didn't already exist in Rich Text Format in earlier editions? And if Office 2007 simply uses XML as a replacement for RTF, I don't see the added value. Sure, you can search for table captions (if you want), but the richness of ODF is not there and won't be (can't be, due to compatibility with earlier versions).

Question: HTML 5 seems like a step backward from XML and XHTML. Is this a sign of eroding support for XML? One reason for HTML 5 (to quote the W3C) is “new elements are introduced based on research into prevailing authoring practices.” Wasn’t XHTML sufficient, or maybe too difficult for “prevailing authoring practices”?

I’m afraid that the reality is that a lot of HTML is still created by hand: people creating some HTML in Web-tools like Dreamweaver or other HTML editors and then going into the HTML and messing around in it in text editing mode. Since those tools have been very slow to enforce XHTML compliance, people continue to generate sloppy HTML pages, and so there is unfortunately a real need out there to at least standardize on what authoring practices exist in the real world.

The much better approach is, of course, to generate XHTML by means of an XSLT stylesheet from XML source pages, which is what we do, e.g., for the http://www.altova.com/ Web site.
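Alexander's approach of generating XHTML from XML source pages is normally done with an XSLT stylesheet. Python's standard library has no XSLT engine, so this sketch performs the same XML-to-XHTML transformation programmatically; the source schema (`page`, `title`, `para`) is invented for the example.

```python
import xml.etree.ElementTree as ET

SOURCE = """<page><title>About Us</title><para>We make tools.</para></page>"""

def to_xhtml(xml_text):
    """Transform an invented 'page' XML document into minimal XHTML."""
    src = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = src.findtext("title")
    for para in src.findall("para"):
        ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")

print(to_xhtml(SOURCE))
# <html><body><h1>About Us</h1><p>We make tools.</p></body></html>
```

The advantage of driving output from XML source, whether via XSLT or code like this, is that the markup is well-formed by construction, which hand-edited HTML rarely is.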

Question: XQuery is a standard co-developed by the developers of SQL. What’s your prediction for widespread adoption and use of XQuery?

I initially thought that XQuery had a lot of promise, too, which is why Altova was very quick to provide an implementation of XQuery in our products, including an XQuery editor, debugger, and the ability in our mapping tool to produce XQuery code. However, we’ve found that the adoption of XQuery in the real world is happening much slower than we and many others had anticipated. I think that one of the issues is that there isn’t yet a clear and consistent XQuery implementation level and API across all database systems that people can rely on. The beautiful thing about SQL is that – for the most part – you can throw the same SQL query against an Oracle, IBM DB2, SQL Server, or even MySQL database, and you will get back the same result. The same is not true for XQuery yet, and until we reach that level of widespread adoption in the database servers, it has no chance to be as widely adopted by database users and application developers.

The reality is that we see a lot more interest in XSLT 2.0 from our customers than XQuery.

Sad but true, Alexander. I had high hopes for XQuery, but I don't hear much about it these days.

Question: Will XBRL be one of the “next big things” leading to a major use of XML by investors via a new set of prosumer applications? Enterprise processes and financial systems? What role will XQuery provide in these contexts?

I do indeed see XBRL as being the next big thing. The fact that both the Europeans and the SEC are mandating XBRL for financial reports from publicly listed companies will be a huge driver of XBRL adoption on a global scale. I am convinced that XBRL will be essential in financial systems and will find its way into enterprise applications fairly swiftly. When it comes to the use of XBRL by investors as prosumer applications, I’m a little bit more skeptical. It is certainly clear that investment professionals will use XBRL to better compare data between different companies in a certain market and to derive some key financial figures much easier than before, because the financial reports don’t have to be re-keyed into their systems. But I don’t think that this effect will transcend the investment professionals and become easily available for consumers anytime soon. As to what role XQuery will play: it might play some role, but I’m thinking of XBRL more as a standardized data transport mechanism and am expecting investment firms to map the XBRL into their internal decision-making and analysis applications and do the querying there.

On this we agree. This might be XML's first great opportunity to transform significant amounts of content -- and the processes to generate that content -- outside the tech doc arena.

Question: I know some subscribers to online financial services are wondering if they will be able to supplement (or even skip) certain of these services by analyzing sets of XBRL files themselves. What are the practical limitations to such analysis? Is there an inherent limitation to max numbers of XBRL files that can be XQueried at once?

There aren’t really any limitations that I’m aware of. The problem is more one of: how will you use the data? An investor who is very accounting-savvy can probably easily use XBRL to extract some key financial indicators for a company and compare several possible investment candidates in an industry group. But most investors I know rather want the key financial indicators automatically calculated by somebody else rather than directly work with the raw XBRL data. So I am skeptical that individual investors will be able to skip their subscriptions. Augmenting them is, however, a possibility and I indeed see the ability for some people to get a more in-depth look at some numbers than what they can currently get from Bloomberg or similar services.
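To make the "extract key financial indicators" idea concrete, here is a minimal sketch against an invented, pared-down XBRL-style instance. Real XBRL involves taxonomy schemas, contexts, and units that are omitted here; the namespace and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XBRL-style instance document.
INSTANCE = """<xbrl xmlns:g="http://example.com/gaap">
  <g:Revenues contextRef="FY2007" unitRef="usd">1200000</g:Revenues>
  <g:NetIncome contextRef="FY2007" unitRef="usd">150000</g:NetIncome>
</xbrl>"""

NS = {"g": "http://example.com/gaap"}

def fact(root, name):
    """Pull a single numeric fact out of the instance by element name."""
    el = root.find(f"g:{name}", NS)
    return float(el.text)

root = ET.fromstring(INSTANCE)
margin = fact(root, "NetIncome") / fact(root, "Revenues")
print(f"net margin: {margin:.1%}")  # net margin: 12.5%
```

This is the kind of indicator calculation an accounting-savvy investor could script; as Alexander notes, most investors would rather have someone else compute and present these figures.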


Saturday, February 16, 2008

Update on Office 2007 Compatibility etc.

Julie ("funnybroad") has updated her slide show about her Office 2007 compatibility findings. Here is an excerpt from what she said:

I've replaced my original Office 2007 Compatibility Mode Confusion paper on slideshare.net with an updated version. I had to delete and re-create the existing one, so the link to it from your blog is now broken (click here for Julie's updated info)....everything has been re-tested with Service Pack 1, and sadly, compatibility still sucks. So go to the new link, not the older one.

-----

While I'm on the subject of Office 2007: when I tested and reviewed the product, I was happy to see a weird longstanding behavior removed. You print a document, then exit and are asked if you want to save changes. Most people simply say "yes," fearing they forgot whatever change they'd made and don't want to lose it. Others say "no," thinking they made a change inadvertently and don't want it to stick. Well, I was happy to see that dumb "feature" removed, but recently --several automated patch upgrades later, I guess-- I see the "feature" is back. So we've got compatibility with pre-2007 suites, but this is one compatibility "feature" they could have dropped, and dropping it would have made the product better.

Monday, January 21, 2008

How Green Are Your Documents

Over the past 6 months, I've seen some of my hunches about growing awareness of environmental issues and concern about fossil fuel supplies (and prices) confirmed. Although oil never did close at $100/barrel, prices are sky high by anyone's estimate. In the autumn of 2007 I tried a different theme in my Information Insider column – one that I believe has never been done. I laid the groundwork for this series with the EContent 100 annual issue, in a column titled “Content 2.0 Converges.” I titled the follow-on column in this series “How Green Are Your Documents?” (the editor since changed that to “It Ain't Easy Being Green” -- a fine alternative). I sent out queries to a variety of vendors for any thoughts they had about their products and the green theme, and waited. And waited. And began to think that this was the craziest idea I'd ever had, and wondered how I'd meet the deadline with a different (unplanned) column if this didn't pan out. Then the vendors began to respond, all except Google, but I blame that on the difficulty of finding the right contact there rather than Google's lack of interest – since Google is indeed showing itself to be very green indeed.

Who did respond? Adobe, MarkLogic, and Olive Software – the latter a vendor I'd never heard of but found (yes) with a Google search. And it was an avalanche of interest.

Let's start with Adobe. One obvious Adobe product is Acrobat, which has become a default electronic document standard, bulked-up with collaborative features in version 8 with Acrobat Connect, formerly Macromedia’s Breeze web conferencing but now integrated with Acrobat. I get the idea that web conferencing can cut down travel and thus save travel and carbon costs, but I was looking for more, and Adobe provided it. First, they've done as Google and now Microsoft have also done: begun adding online documents to their product set. In this case, Adobe acquired Buzzword, a web-based text editor. Interesting, but not the green lead I was looking for. Then it got interesting.

Adobe's new AIR (Adobe Integrated Runtime) lets web applications run offline – key, IMHO, to assuring the acceptance of online, collaborative documents and reducing the use of paper (with all the energy savings that implies). AIR is a cross-OS SDK, a mashup of Flash, HTML, Ajax, etc. AIR can target applications to the desktop, combining the rich abilities expected in local clients with the web. The key here is that you get persistent presence on the desktop, offline/online, with re-synching of web content when you go back online. Traditional media has been moving to the web for some time; now the web is also moving to the desktop, with functions traditionally handled in a browser or on paper migrating to the desktop. Financial documents, for example, could give you reports and also drive processes that used to require paper, such as loan applications that once meant swapping Excel spreadsheets back and forth.

Developers (not end users) are beginning to develop AIR-based online/offline catalogs -- that bane of the mailbox. You download an application that includes the catalog, navigate through it, sort and search, flag items within the application, and get notifications when they're available (reminders when back in stock). As you walk through the catalog, you could add electronic notes, share them with friends, send an email with the relevant information, or collaborate across different desktops. Adobe says that Linux support for AIR is coming.

A 10 MB PDF catalog could be the whole size of the AIR application, and with progressive images or assets on demand, could make the “catalog application” smaller than the PDF.

Hot AIR? Yes, but in a good sense – reducing global warming in its own way.





Monday, July 09, 2007

Office Suites and XML - Vendor feedback

In my latest Info Insider column, I mentioned contacting two vendors to get their take on the impact of the two major office suites, OpenOffice/Star Office 8 (ODF) and Microsoft Office 2007 (OOXML), using XML internally. The vendors I contacted were Altova and MarkLogic. Here are the questions I asked them, followed by their responses.

Now that OpenOffice and Office 2007 both use XML natively, what new opportunities are there for analyzing or transforming Office documents?

Do you have any examples of customers using your products (or those of your technology partners) to analyze or transform OpenOffice/StarOffice or MS Office 2007 documents, leveraging their use of XML?

In essence, both vendors seem poised to provide ways for customers to extract extra value from their document repositories, although the current state is a “chicken and egg” problem. For now, there are no office document repositories, so there is no rush to buy new products to extract this value. However, sooner or later the enterprise chickens will be forced to lay the XML eggs (see below).

MarkLogic

Following are the responses from MarkLogic, specifically John Kreisa, Director of Product Marketing for MarkLogic. Regarding opportunities for analyzing or transforming Office documents (whether ODF or OOXML), John says:

"Microsoft’s choice of XML as a core form for Office 2007 means that everybody using Office will be authoring directly in XML – Office becomes a direct means for creating XML content. We believe there is a significant opportunity for customers to leverage the ever-increasing amount of XML content by combining Office 2007 with an XML content server, like MarkLogic. Doing so will allow users to exploit the XML within the content in two ways. First they can combine all their content into one common repository, which is the first step to getting more value from the content. Then second, they can build content applications to repurpose the content, dynamically publish the content in new ways, and perform analytic functions they haven’t been able to do before.

Loading all of their content into a content server lets organizations analyze their entire content in new ways including understanding the term frequency, word counts, page counts etc, and understand the relationships within the content like citation analysis between articles and many other areas of analysis. What we typically see is that once organizations take a platform approach to their content they immediately find new ways to exploit it and generate new business opportunities."
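The repository-wide analysis Kreisa describes (term frequency, word counts) reduces to simple counting once content is gathered in one place. This sketch uses plain strings and invented document names rather than a real content server, just to show the shape of the analysis:

```python
import re
from collections import Counter

# Stand-in "repository": document name -> text content.
docs = {
    "release-notes": "search and findability improve search adoption",
    "faq": "search tips for enterprise findability",
}

def term_frequencies(corpus):
    """Count term occurrences across every document in the corpus."""
    counts = Counter()
    for text in corpus.values():
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

freq = term_frequencies(docs)
print(freq.most_common(2))  # [('search', 3), ('findability', 2)]
```

A content server does this at scale, over XML rather than plain text, and adds relationship analysis (such as citations between articles) on top.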


Of course this raises the question: when will there be enough XML content to put into a repository? Adoption rates are currently low, though as users upgrade to OOXML or switch to ODF, they will generate documents for this repository. And in the case of OOXML, users who decide to stick with Microsoft will have no choice but to upgrade eventually, since sooner or later Microsoft will stop releasing free security patches for its earlier Office products.

Kreisa confirmed the problem of the current adoption rate in his response to my second request for examples of customers using MarkLogic products (or those of your technology partners) to analyze or transform OpenOffice/StarOffice or MS Office 2007 documents, leveraging their use of XML:

"While Mark Logic does not currently have any customers using MarkLogic Server with MS Office 2007, we do anticipate that as adoption of Office 2007 increases, our customers will leverage the XML content they create with Office 2007 by combining it with MarkLogic to create new content, repurpose existing content into multiple formats, and republish this content, and to mine the content to find previously undiscovered information.

Our senior VP of products demonstrated our Office 2007 related capabilities in a general session at our User Conference in May, and the audience were very impressed – lots of nodding and clapping. When people see what we can do it generates interest in upgrading to Office 2007.

We have not heard much from our customer base regarding OpenOffice. However Mark Logic’s fundamental value proposition remains the same. We can load, query, manipulate and render the XML from StarOffice in the same manner we do for Microsoft Office 2007.

In response to your question about how presentational XML facilitates text analytics in Microsoft Office, it really depends on the goal of the user. Highly marked-up XML can complicate or confuse tools that are not capable of handling this kind of deep XML. MarkLogic Server, on the other hand, can easily handle this kind of content and separate the markup from the text. For example, if a user wants to know in how many places a certain word is in bold, or how many words are tagged with the <title1> style, we can help with that kind of analysis. We see this as potentially relevant for technical documentation organizations, for example, who want to make sure that they have consistency across their different documents."
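That kind of style-and-markup analysis can be sketched in a few lines. The fragment below is purely illustrative (MarkLogic itself would be queried with XQuery, not Python): it parses a small, hypothetical WordprocessingML snippet and counts bold runs and paragraphs carrying a given style.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used by Office Open XML documents.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# A tiny stand-in for a document.xml fragment (hypothetical content).
DOC = f"""<w:document xmlns:w="{W}"><w:body>
  <w:p><w:pPr><w:pStyle w:val="Title1"/></w:pPr>
    <w:r><w:t>Quarterly Report</w:t></w:r></w:p>
  <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Revenue</w:t></w:r>
       <w:r><w:t> rose sharply.</w:t></w:r></w:p>
</w:body></w:document>"""

def markup_stats(xml_text):
    """Count bold runs and tally paragraph-style usage."""
    root = ET.fromstring(xml_text)
    bold_runs = sum(
        1 for r in root.iter(f"{{{W}}}r")
        if r.find("w:rPr/w:b", NS) is not None
    )
    styles = Counter(
        s.get(f"{{{W}}}val") for s in root.iter(f"{{{W}}}pStyle")
    )
    return bold_runs, styles

bold, styles = markup_stats(DOC)
print(bold, styles["Title1"])   # 1 bold run, 1 "Title1" paragraph
```

Because presentational markup like `<w:b/>` and `<w:pStyle>` is still well-formed XML, even a generic parser can answer "how many bolds?" -- a repository that indexes the markup just answers it at scale.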

Altova

Altova is the vendor that created the famous XMLSpy product line, which provides many ways to create, analyze, and manipulate XML on desktop PCs. Here are responses to the same questions from Alexander Falk, President, CEO, and Co-Founder of Altova.

"Organizations save vast amounts of information in Microsoft Word documents and Microsoft Excel spreadsheets, but until now, that content could not be re-used in an extensible, programmatic way. With the Open XML document formats, that data is now standards-based; and the new capabilities in Altova XMLSpy allow developers to extract, edit, query, and transform XML data from within documents that use Office Open XML Formats - the new file type used by the 2007 Microsoft Office release - to make the data highly interoperable and easy to process. This provides huge advantages to business people and application developers.

Because XMLSpy's support for Office Open XML was released only a few weeks ago, it's too early to provide feedback."

I followed up to ask about the issue of XML quality in the two office suites, and whether one offers greater potential for leveraging the new XML internals. Office 2007's XML is almost exclusively presentational, while OpenOffice goes beyond that with support for additional standards: Scalable Vector Graphics, MathML, and XForms.

"Yes, that is an old argument. In an ideal world, content authors would be motivated to create content with semantically meaningful tagging, e.g., DocBook or DITA. But the reality is that in today's world most content is created in Office documents, so it is better to be able to extract and process that content with Office Open XML than to continue to wait until all content creators use semantically meaningful tags. Furthermore, the Office Open XML formats are not just for Word documents. Extracting data from the millions of Excel spreadsheets that get created, and processing it further in XML, opens the door to a huge opportunity for information reuse and repurposing."
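Falk's extraction point rests on the fact that an Office Open XML file is just a ZIP package of XML parts. Here is a minimal, hypothetical Python sketch; note that a real .docx carries more parts ([Content_Types].xml, relationships, styles, and so on) than the toy package built here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Build a minimal, hypothetical .docx-like package in memory (a real
# file would also carry [Content_Types].xml, relationships, etc.).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml",
        f'<w:document xmlns:w="{W}"><w:body>'
        '<w:p><w:r><w:t>Hello from OOXML</w:t></w:r></w:p>'
        '</w:body></w:document>')

def docx_text(source):
    """Pull the plain text out of a package's main document part."""
    with zipfile.ZipFile(source) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    return " ".join(t.text for t in root.iter(f"{{{W}}}t") if t.text)

print(docx_text(buf))   # Hello from OOXML
```

Ten lines of standard-library code reach the text; that is the interoperability argument in a nutshell, and it applies equally to the XML parts inside an .xlsx workbook.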

So there you have it. OOXML will likely have the largest installed base. In fact, the Massachusetts Information Technology Division (ITD) -- the agency that essentially stuck its finger in Microsoft's eye -- has released a new draft of its Enterprise Technical Reference Model. This draft now includes OOXML as an acceptable open format. The discussion period ends on 20 July 2007, but I'm betting the draft will be approved. For expert insight into the issues with the Massachusetts ITD, go to:

http://www.consortiuminfo.org/standardsblog/article.php?story=20070702101415578&mode=print

And there are still persuasive arguments that OOXML is fundamentally inferior to ODF, and how that plays out over the next several years will be fascinating to watch -- if only the future of our office document content weren't so important. I've expressed my opinions on the XML quality issue in my Information Insider columns at EContent Magazine for some time. Here is O'Reilly's take on the issue:

http://www.onlamp.com/pub/a/onlamp/2007/06/14/achieving-openness-a-closer-look-at-odf-and-ooxml.html

It is right for both of the above vendors to profess no preference for one format over the other, since both suites use XML and the vendors' products can and will work with each. Still, quality and openness matter. We'll see how this plays out.


Friday, July 06, 2007

More Evidence of Content 2.0 - Blogging with StarOffice 8

Sun Web Logging! I just received a blog-publishing plug-in for StarOffice Writer called Sun Weblog Publisher (go to sun.com/products-n-solutions/edu/solutions/staroffice.html for details about StarOffice 8). I am publishing this blog entry using the Weblog Publisher tool. I just installed it, and already I'm in love with it. I have to admit, when I first heard about the product from Sun's PR, I wasn't quite sure why I'd want it. Then, as I thought about it, the reasons became very clear. Among them:


  • Use the word processor interface that you're accustomed to and use many times each day.

  • Create your blog offline, and publish it when you're ready to.

  • Leave your HTML skills at the door (and use them when you really need to, in a robust environment such as Dreamweaver).

  • And hey -- let's admit it -- it's getting hard to remember each blog's username/password pair. (I have a database of over 400 passwords -- more than most people, I'm sure -- but every one you don't have to remember is a big help.)


I tried to include a screenshot from a portion of the Sun Weblog brochure: a great picture of ants carrying big leaves, a perfect metaphor for the blogosphere.

Apparently you can't do that with this tool, even though Blogspot lets you upload an image either from a location on the web or from your local computer. Oh well, it's a minor thing, and this is, after all, version 1.0. At a list price of $9.95, the Weblog Publisher tool is still a terrific value.

One last thing -- this tool is powerful, and lets you blog to different blogs on the same blog server (like blogspot) or on different blog servers. You can even download a posted blog entry, edit it, and push it back to the blog. Nice.

Tuesday, April 03, 2007

Office 2007 Packaging

This weekend, I received the official MS Office Professional 2007 package, the same one consumers or SMBs would get when they buy the product. Now I admit I have trouble with contemporary packaging of all sorts -- razor blades, anything that is meant to prevent shoplifting, or the electronic equivalent of bootlegging software, especially anything with the Microsoft label. I completely sympathize with Microsoft's aggressive stance vis-a-vis bootlegged software. However, I've seen a couple of things lately -- including the packaging for MS Office -- that I think go a bit over the top.

First there was the prompt to download important security updates. It turned out that that was a piece of software to determine whether or not my copy of Windows was genuine. Of course it was, since I was using review software I'd received from Microsoft, but I think that procedure is a bit devious.

Now on to the more physical side of security: The package I received containing MS Office 2007 Professional. There were two sticky labels, one on the top and one on the body, indicating I should pull the one on the top and then somehow open the package. Problem was, pulling the top tab looked like it would damage the license key that was firmly affixed to the top. So I tugged and pulled, did my best not to damage anything, then moved on to the main seals. After much tugging (and using heavy-duty shears to cut what looked like a pop-rivet on the side), I realized that this package is intended to swivel downward, getting you to the software and manual. Inside and attached to the inside packaging was a set of graphics about the contents (Excel, Word, etc.) with a headline "Manage analyze and communicate..." I can't tell you exactly what the rest of the headline was, because to read it I'd have to bend and maybe break the outside plastic shell that houses the swivel-down housing with the CD, manual, etc.

Truthfully, this packaging looks like it was built by a committee, and "Security" got to veto "Ease of Use."


Sunday, March 18, 2007

HELP - online only?

So far as I can see, there are two ways to activate HELP in Office 2007 applications: The tiny little question mark in the upper right side, and the old standby F1. Both seem to get you only Microsoft online help. What happens if I lose (or temporarily do not have) an online connection? Am I stuck?

Actually, this de-emphasized HELP suggests to me that Microsoft believes, first, that the new ribbon interface is so clear you won't need help, and second, that if you do need help, you'll always have a broadband connection. I'm not sure either assumption is true.

Anyway, here is the answer to the question I received from Microsoft's rapid response team:
"...the question mark button does, by default, take the user to Microsoft Office Online for Help. But if you click on the “Connected to Microsoft Office Online” button at the bottom of the box, you can choose to “Show Content Only from this computer” and that allows the user to see help content when not connected to the internet.

Super Tooltips, a feature of the Microsoft® Office Fluent™ user interface in the 2007 Microsoft Office system, integrates Help topics into the product in a new way to make the experience easier for new customers. One of the main problems that people have with Help topics today is that they don’t know the terms used to describe features. Super Tooltips are integrated help tips that provide quick access to information about a command directly from the command’s location in the Office Fluent user interface. One of the biggest innovations that began with version 2003 was the opportunity to get feedback on our Help. We use this feedback to drive the development of new content and to update current help topics as needed. We also use the feedback to identify trends that assist us in creating better Help for new features. The 2007 Office system Help was developed with the benefit from having feedback from thousands of Office customers."

So if you think to go to the bottom of the HELP box, you'll figure out how to get information without being online.

The Right-Brain, Aesthetic Side of Office 2007; the Left-Brain View of PowerPoint

I've been so caught up in looking at new features, or hunting for where my old Office features now reside, that I've overlooked one important point. Microsoft has clearly expended a lot of effort to achieve two important benefits: a truly elegant set of styles (themes), along with some new fonts, and much-improved consistency among the various Office programs.

Across all the Office programs, there is a new, softer look that subliminally suggests you can approach the new system comfortably. That new friendliness is true across all the applications, from the Outlook email program through Word, Excel and PowerPoint (the only applications I'm currently evaluating in Office Professional). This right-brain improvement in all the applications isn't something you'll see in feature checklists, or if you do it may sound like marketing hype. But seeing is believing.

On the consistency side, one of my past pet peeves with the Office suite was inconsistency. If I created a table in Word and imported it into PowerPoint, or vice versa, I'd always get something different. And if the direction was from Word to PowerPoint, I'd get a "dumbed down" table because that's all PowerPoint could handle. Now I've found that you can create complex (and beautiful) tables in PowerPoint with all the horizontal and vertical cell merges you want, and export them accurately into Word. Not only the power of the new table model, but this consistency across applications, is a very strong inducement to work with the new Office 2007.

Now it is Sunday evening, and it appears I spoke too soon about how well PowerPoint uses styles and about its consistency. It appears that if you have existing objects (e.g., bullets) and change the bullet styles via the master, it doesn't apply those changes to the existing bullets, only to new ones. In fact, PowerPoint Help confirms this: "It is a good idea to create a slide master before you start to build individual slides, rather than after. When you create the slide master first, all of the slides that you add to your presentation are based on that slide master. However, if you create a slide master after you start to build individual slides, some of the items on the slides may not conform to the slide master design."

Thus IMHO, PowerPoint styles miss the point: A truly styles-based system would let you change your mind about the look and feel of a particular kind of object, then apply your change to all the objects of that type.

One last observation: your editing view of PowerPoint slides, where you can see and edit the objects, is called "Normal." Why not "Draft," since Microsoft changed the name "Normal" to "Draft" in MS Word? Another inconsistency. Naughty, naughty.