Tuesday, October 06, 2009

And Now For Something Completely Different

... actually 8 things. What is different is that I normally use this blog for details that I couldn’t squeeze into my eContent Magazine column, Info Insider. The eight things I’m referring to are in AIIM’s recent (free) e-book describing the eight reasons you need a strategy for managing information.

John Mancini has a knack for writing simply, and this e-book (free for the downloading here) is well done. Although it is 95 pages long, don’t be put off by that; the pages are small ;-). Not only that, but the content, distilled from various “8 things” blogs, provides truly useful perspectives on Information Management. Here’s one gem from the section “Tidal Wave of Information.”

“A study by IDC a few years back concluded that there are currently 281 billion exabytes of information in the Digital Universe. So how much is this? Well…an exabyte is a million million megabytes. Thanks a lot. To put it in a bit of perspective, a small novel contains about a megabyte of information. So in other words, the Digital Universe is equal to 12 stacks of novels (fewer if the chosen novel is a big fat one like Harry Potter 6 or one of those Ken Follett Pillars of the Earth deals) stretching from the earth to the sun. So it's a big number, whatever it is.”
Go ahead, download a copy and enjoy the read.

Tuesday, May 05, 2009

It's TAXonomy Time

TAXonomy Time – Why the interest in Taxonomies?

I'm hearing the word “taxonomy” more and more often in ECM projects, often uttered by business people in the same sentence as “metadata.” Can it be that business people are becoming comfortable with these terms? If you know you've got a serious information overload problem, where do you start with taxonomies to tame and organize your content? Everybody starts with Excel for metadata and Visio or similar graphical tools to sketch out taxonomies. Those tools are available, sometimes free, and well understood. But they are fundamentally static. Do you need more? What are some best practices and alternatives?

As part of my latest column “It's TAXonomy Time” in EContent Magazine, I spoke with Carol Hert, Ph.D., Chief Taxonomist and Consultant for SchemaLogic Inc., to get her take on trends in taxonomy projects. Here are my questions and Hert's responses.

1) What is the state of client awareness of the value and urgency of developing taxonomies? What is the trend – use the Gartner “hype cycle” stages if you’d like. Do you see increasing interest in taxonomies and, if so, why? Is the “information explosion” itself motivating this interest?

We typically work with large corporations that have already developed and deployed multiple taxonomies across their organizations. These companies are well aware of the cost and limitations of trying to manage these taxonomies in a dynamic environment that includes many consuming systems. Some of the organizations we work with are focused on taxonomy harmonization: integrating single-use taxonomies into one or several related taxonomies that can be utilized enterprise-wide.

We continue to see increased interest in taxonomies with the further proliferation of SharePoint and other collaboration systems, the need to increase the efficiency of the information worker, and the continued interest in enterprise information findability. Also, the need to meet compliance requirements for large amounts of unstructured information continues to increase the need to govern and manage information more effectively.

2) What are typical approaches to taxonomy development:

Use an existing taxonomy only

Build on existing taxonomies

Enterprise versus single-application (tactical) approach

Use tools not available from current application vendors (e.g., EMC Documentum) for possible use with multiple vendors, or vendor-specific tools?

Our customers usually have multiple taxonomies deployed across their organizations. They have issues with managing and coordinating multiple taxonomies, especially in a dynamic environment. The first thing we do is to collect these multiple taxonomies and model them in our metadata management platform. We can then work with the customer to connect and optimize these taxonomies and then extend them as well. Some of our customers approach this from an enterprise-wide perspective, while others choose to focus on a single department, function, or business process and then expand.

Because complexity increases as the number of business stakeholders expands, most organizations are working to achieve a balance between the optimal goal of enterprise-wide taxonomies and single-application taxonomies. All our customers use SchemaLogic’s metadata management platforms to build and manage their taxonomies. Our systems are designed to allow customers to model enterprise-wide taxonomies and publish those taxonomies to multiple applications such as SharePoint and Documentum, as well as to search engines such as FAST or auto-classification systems such as Teragram.

3) What trends do you see in the evolution of taxonomy development? In supporting technologies (such as SOA or SaaS)?

There continues to be a need to manage taxonomies in a more dynamic way. The need to collaborate across the enterprise, locate and share information, and improve information governance at the same time is putting pressure on organizations to develop a more flexible approach to managing information. The distributed nature of SOA and SaaS architectures puts further pressure on companies to establish an enterprise-wide taxonomy that can be accessed by multiple applications.

4) What are best practices for developing taxonomies? What are some approaches to avoid?

Books could be (and have been) written on this topic, but a short list of best practices might include:

  • Understand the ultimate uses to which the taxonomies will be put (there is no one perfect taxonomy).
  • Incorporate business and technical stakeholders in the development process to assure that the final product will meet requirements.
  • Conduct a “taxonomy” audit prior to developing any new taxonomies to understand what already exists and might be leveraged.
  • Consider taxonomy maintenance and governance during development processes to assure that the taxonomy is able to be maintained and there are clear lines of responsibility.
  • Look for externally available taxonomies but be cautious as they have not been designed for the particular goals of the organization in question. Participate in industry-wide organizations where taxonomy development efforts might be occurring.

5) Are there any emerging or existing standards other than ISO 2788 for developing or expressing taxonomies? Is ISO 2788 relevant (I gather it is oriented towards human indexers) and who tends to use it?

ISO 2788 is relevant in terms of providing extensive guidance on term forms and other such matters. Since most organizations work in networked environments and want to transfer taxonomic information electronically, most will need to explore approaches to structuring taxonomic data for electronic transmission. Some of the standards to be aware of are RDF, OWL, Topic Maps, and SKOS. Additionally, since taxonomies might reside in metadata repositories, standards such as ISO 11179 may be relevant.

6) What are some common exports from taxonomy tools (e.g., Excel)? Are there any common formats for importing existing taxonomies or developing them in taxonomy tools? For example, are there XML DTDs or Schemas?

CSV is a good common baseline, as some organizations still manage a number of their taxonomies in Excel. Some taxonomy management vendors have XML formats (as we do), but these may be proprietary and need some translation into an XML format another application could use. Standards such as RDF, OWL, and Topic Maps might be used in this context as well.
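
To make the CSV baseline concrete, here is a minimal Python sketch of loading a taxonomy exported from a spreadsheet and rebuilding its hierarchy. The two-column layout (`term`, `parent`), the sample terms, and the function names are all assumptions made for illustration; they are not a format that SchemaLogic or any standard prescribes.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet export: one row per term, with the parent term
# in the second column (left empty for top-level terms).
SAMPLE_CSV = """term,parent
Finance,
Invoices,Finance
Contracts,Finance
Human Resources,
Policies,Human Resources
"""


def load_taxonomy(csv_text):
    """Return a mapping of parent term -> list of child terms.

    Top-level terms are stored under the key None.
    """
    children = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        parent = row["parent"].strip() or None
        children[parent].append(row["term"].strip())
    return children


def render(children, parent=None, depth=0):
    """Return the hierarchy as indented lines, walking it depth-first."""
    lines = []
    for term in children.get(parent, []):
        lines.append("  " * depth + term)
        lines.extend(render(children, term, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(load_taxonomy(SAMPLE_CSV))))
```

A flat parent column like this is about as much structure as a spreadsheet can carry comfortably, which is one reason the richer formats Hert mentions come into play once relationships go beyond simple parent and child.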

7) Can you provide client case studies?

Yes. We have published several customer case studies and would be happy to work with you on additional case studies in the future.

Now About Tools

1) What are typical costs for acquiring and implementing taxonomy products?

The cost of taxonomy products varies greatly based on the particular application. Simple taxonomy modeling tools can cost less than $1,000, while enterprise-wide taxonomy management and governance systems can cost over $500,000. These larger systems provide highly scalable modeling capability, complete change management and governance, integration with full suites of enterprise applications, and metadata compliance monitoring. We have deployed systems that range in price from less than $50,000 to over $1M.

2) What are three key features in taxonomy tools; what are three unique features in yours?

Three key features:

1. Support for a variety of relationships between terms (should at least be able to support the term relationship types specified by ISO 2788).

2. Allow unlimited hierarchical structures.

3. Provide import and export features.
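
For readers who think in data structures, the first two features can be sketched in a few lines of Python. This is my own illustration, not SchemaLogic's model: the class names, method names, and sample terms are hypothetical, and the relationship kinds shown (broader term, narrower term, related term, used-for) are the ones conventionally associated with thesaurus standards such as ISO 2788.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    broader: list = field(default_factory=list)   # BT: parent terms
    narrower: list = field(default_factory=list)  # NT: child terms
    related: list = field(default_factory=list)   # RT: associative links
    used_for: list = field(default_factory=list)  # UF: non-preferred synonyms


class Thesaurus:
    """Hypothetical in-memory store of terms keyed by label."""

    def __init__(self):
        self.terms = {}

    def term(self, label):
        # Create the term on first use so callers can link terms in any order.
        return self.terms.setdefault(label, Term(label))

    def add_narrower(self, parent, child):
        # Hierarchical links are recorded in both directions.
        self.term(parent).narrower.append(child)
        self.term(child).broader.append(parent)

    def add_related(self, a, b):
        # Associative links are symmetric.
        self.term(a).related.append(b)
        self.term(b).related.append(a)

    def add_synonym(self, preferred, synonym):
        self.term(preferred).used_for.append(synonym)

    def ancestors(self, label):
        """Follow broader-term links upward; nesting depth is not limited."""
        found = []
        stack = list(self.term(label).broader)
        while stack:
            current = stack.pop()
            if current not in found:
                found.append(current)
                stack.extend(self.term(current).broader)
        return found
```

Calling `add_narrower("Documents", "Contracts")` and then `add_narrower("Contracts", "Leases")` builds a three-level hierarchy, and `ancestors("Leases")` walks back up it however deep the nesting goes.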

Three unique features in ours:

1. Extensive change management component that enables changes in taxonomies to be automatically subjected to governance.

2. Set of productized connectors that automatically can provide updated taxonomy information to consuming applications. In addition, the ability to create custom connectors.

3. Ability for end-user administrators of the interface to create custom properties on terms and taxonomies.

3) How would you assess the current state of the art for automatic classification features?

Auto-classification systems continue to improve, but still lack the precision and accuracy provided by a managed taxonomy. Taxonomies have been found to be useful frameworks upon which an auto-classification system can be developed rather than have the auto-classification tool start from scratch. A combination of taxonomy management to provide structure and manage term relationships combined with auto-classification methods has proven to be the most effective solution.

4) Do you provide “connectors” to work with enterprise content management systems such as EMC Documentum and Microsoft SharePoint?

We provide connectors that allow our customers to publish taxonomies out to subscribing systems such as Documentum and SharePoint. We also publish taxonomic metadata to search engines, auto-classification systems, portals, and other enterprise applications.

So there you have it from an expert. And if you happen to use, or are interested in using, Documentum or SharePoint (or both), here's a way to move beyond graphical tools and spreadsheets to manage and leverage your taxonomies.

Sunday, March 08, 2009

CMIS - EMC's role and vision for the future

First off, what on earth does CMIS stand for, and why should any content management person care? Here's the easy part, what it stands for: "Content Management Interoperability Services." What it promises is a way for customers (and vendors, and others) to begin sharing content usefully between different vendors' repositories. That is a huge thing, since right now most companies have several, maybe hundreds, of different document repositories under their enterprise roof (and maybe they don't even know how many).

To write my column on this subject ("Building Content Bridges") I interviewed EMC and Day Software. The former is one of the original authors of the specification; the latter is a vendor that is keenly supportive of content management standards. The following notes are taken from my EMC interview.

On the 23rd of October, 2008, I spoke with two representatives from EMC about the emerging standard CMIS: Patricia Anderson, Sr. Marketing Manager, Documentum Platform Marketing, Content Management & Archiving, and Dr. David Choy, Sr. Consultant. "CC" ("Content Curmudgeon") below marks my comments on statements in the interview.

I was curious about the timeline for CMIS to be implemented (assuming it succeeds), and why CMIS is important either to EMC or to the content management space in general. Following are my notes from that interview.

Dr. Choy: Nobody knows how long the process will take, but about a year or more for a full-fledged standard. There were eight companies participating in validating the current version of the CMIS spec for interoperability (IBM, EMC, Microsoft and five others). The eight proved that the spec could be used to assure interoperability. After that the team sent the proposed standard to OASIS. The formal process for discussing the standard takes time, but in the meantime we at EMC intend to make the prototype available for the public to play with.

Security has administrative issues (mechanisms proprietary to each vendor) and also in the runtime space; security policies reign. CMIS security and access control is out of scope at this point. Each vendor has its own security model. In the near term, that is outside the scope of CMIS. Security policy is now reduced to the lowest common denominator (CRUD), but every vendor supports those.


CC: By CRUD, Dr. Choy means the basic four operations, Create, Read, Update or Delete. Every content management system provides at minimum those same operations. How they determine who can do those things is a separate issue, and CMIS assumes each system manages its own security in its own way. If the administrator of a CMIS-compliant system gives you one of these rights, then from your own CMIS-compliant system you can access and perform operations on content in that system.
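
CC: For the programmers in the audience, here is a toy Python sketch of that idea: four operations that every repository offers, with each repository deciding for itself who may perform them. It is emphatically not the CMIS API; the `Repository` class, the `Right` names, and the sample user are hypothetical and exist only to illustrate the "lowest common denominator" point.

```python
from enum import Enum


class Right(Enum):
    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    DELETE = "delete"


class Repository:
    """Toy in-memory repository; each instance keeps its own access rules."""

    def __init__(self, name):
        self.name = name
        self._docs = {}
        self._grants = {}  # user -> set of Right values granted locally

    def grant(self, user, *rights):
        self._grants.setdefault(user, set()).update(rights)

    def _check(self, user, right):
        # The repository, not the caller, decides whether the right is held.
        if right not in self._grants.get(user, set()):
            raise PermissionError(f"{user} lacks {right.value} on {self.name}")

    def create(self, user, doc_id, content):
        self._check(user, Right.CREATE)
        self._docs[doc_id] = content

    def read(self, user, doc_id):
        self._check(user, Right.READ)
        return self._docs[doc_id]

    def update(self, user, doc_id, content):
        self._check(user, Right.UPDATE)
        if doc_id not in self._docs:
            raise KeyError(doc_id)
        self._docs[doc_id] = content

    def delete(self, user, doc_id):
        self._check(user, Right.DELETE)
        del self._docs[doc_id]
```

If an administrator runs `repo.grant("alice", Right.CREATE, Right.READ)`, alice can create and read documents in that repository, but an attempt to update or delete raises `PermissionError`. A second repository could grant her a different set of rights by whatever mechanism its vendor prefers.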


Patricia: One of the questions is “What caused the need for this standard in the first place?” Organizations would set up more than one repository platform, perhaps for separate departments or as the result of M&As. We realized that it was difficult getting to this other information. This also hampered development that was cross-divisional or cross-platform. Then with Web 2.0 mashups, it became even more difficult to leverage use of information. ECM folks realized that it was a hindrance that affected all vendors. We looked at different standards but wanted a standard that was platform-agnostic and services-based, to unlock information in different repositories. Serious discussions began in October 2006. Other committees like iECM tried to develop such standards, but they needed to start fresh.

David: iECM is an AIIM consortium that tried to create something similar to CMIS. That group wasn’t set up for highly technical interoperability standards. Very few concrete results emerged. iECM is still looking at best practices and standards, not technical areas.


CC: Clearly you need both and without either there is no bridge between the repositories.


Patricia: For users, CMIS can expand the available applications and open the market for developers to write cross-repository applications. It is an open protocol and supports all repositories that support the standard. This provides customers lots of investment protection.

David: Enterprise Content Integrated Services is an example of an application that can facilitate cross-repository work: federated search, mashups, business process workflows across repositories.

Patricia: This is the first and only web services standard. An insurance company could have separate subsidiaries across the world, and writing to a standard would enable access and update to the repository information. A distributed environment such as a franchise would also facilitate sharing of information outside each organization.

Patricia: The 3 originals were the first tier; then we included others such as Alfresco (participated), Oracle, SAP, OpenText; now Day Software. This standard is comparable to what SQL did for databases years ago.

David: The importance is how widely a standard is adopted. The spec is publicly available. Interested parties (after the technical committee is formed) can send comments to the technical committee. They’d need to join the technical committee. Enterprise customers (the first group) can benefit from CMIS and need to tap into different repositories. The second group is between repositories and vendors, allowing them to access each other's content. The third group interested in CMIS is Independent Software Vendors.

Patricia: Another way customers benefit is from having a broad suite of applications for their vertical markets, since a developer could develop for all.

David: Road maps for CMIS are difficult because CMIS is not a full-fledged standard yet. My rough guesstimate would be about a year, after the standard is released. We do intend to make prototypes available for the public before then, and those would be built on Doc Foundation Services. So those interfaces are close.

Patricia: This proposed specification has already been 2 years in development, and vendors have done interoperability testing. We didn't just send paper to OASIS; we sent working prototypes. “What should I do today?” When you are evaluating the specification, when you go to your next purchase or RFI, ask if vendors support that standard.

Saturday, January 10, 2009

Enterprise Search Summit Program

Do any of you feel like you can't keep up with the latest trends in search, or you just feel like you could wring more value out of your investment but aren't sure how? Or maybe you don't get the connection between Web 2.0 and Search? Whether you are responsible for your Intranet, your commercial site, or the various repositories inside your firewall, I heartily recommend the annual Enterprise Search Summit to be held this May in NYC.

I've attended this in the past as a paid attendee (my "day job" employer considered it that worthwhile!), not gratis as a columnist for eContent magazine, which is part of the Information Today Inc. portfolio. Michelle is the editor for eContent and designs/runs the Search Summit. I like this conference a lot. To learn more, click here.