Tuesday, May 05, 2009

It's TAXonomy Time

TAXonomy Time – Why the interest in Taxonomies?

I'm hearing the word “taxonomy” more and more often in ECM projects, often uttered by business people in the same sentence as “metadata.” Can it be that business people are becoming comfortable with these terms? If you know you've got a serious information overload problem, where do you start with taxonomies to tame and organize your content? Everybody starts with Excel for metadata and Visio or similar graphical tools to sketch out taxonomies. Those tools are available, sometimes free, and well understood. But they are fundamentally static. Do you need more? What are some best practices and alternatives?

As part of my latest column “It's TAXonomy Time” in EContent Magazine, I spoke with Carol Hert, Ph.D., Chief Taxonomist and Consultant for SchemaLogic Inc., to get her take on trends in taxonomy projects. Here are my questions and Hert's responses.

1) What is the state of client awareness of the value and urgency of developing taxonomies? What is the trend – use the Gartner “hype cycle” stages if you’d like. Do you see increasing interest in taxonomies and, if so, why? Is the “information explosion” itself motivating this interest?

We typically work with large corporations that have already developed and deployed multiple taxonomies across their organizations. These companies are well aware of the cost and limitations of trying to manage these taxonomies in a dynamic environment that includes many consuming systems. Some of the organizations we work with are focused on taxonomy harmonization: integrating single-use taxonomies into one or several related taxonomies that can be utilized enterprise-wide.

We continue to see increased interest in taxonomies with the further proliferation of SharePoint and other collaboration systems, the need to increase the efficiency of the information worker, and the continued interest in enterprise information findability. Also, the need to meet compliance requirements for large amounts of unstructured information continues to increase the need to govern and manage information more effectively.

2) What are typical approaches to taxonomy development:

Use an existing taxonomy only

Build on existing taxonomies

Enterprise versus single-application (tactical) approach

Use tools not available from current application vendors (e.g., EMC Documentum) for possible use with multiple vendors, or vendor-specific tools?

Our customers usually have multiple taxonomies deployed across their organizations. They have issues with managing and coordinating multiple taxonomies, especially in a dynamic environment. The first thing we do is to collect these multiple taxonomies and model them in our metadata management platform. We can then work with the customer to connect and optimize these taxonomies and then extend them as well. Some of our customers approach this from an enterprise-wide perspective, while others choose to focus on a single department, function or business process and then expand.

Because complexity increases as the number of business stakeholders grows, most organizations are working to achieve a balance between the optimal goal of enterprise-wide taxonomies and single-application taxonomies. All our customers use SchemaLogic’s metadata management platforms to build and manage their taxonomies. Our systems are designed to allow customers to model enterprise-wide taxonomies and publish those taxonomies to multiple applications such as SharePoint and Documentum as well as to search engines such as FAST or auto-classification systems such as Teragram.

3) What trends do you see in the evolution of taxonomy development? In supporting technologies (such as SOA or SaaS)?

There continues to be a need to manage taxonomies in a more dynamic way. The need to collaborate across the enterprise, locate and share information, and improve information governance at the same time is putting pressure on organizations to develop a more flexible approach to managing information. The distributed nature of SOA and SaaS architectures puts further pressure on companies to establish an enterprise-wide taxonomy that can be accessed by multiple applications.

4) What are best practices for developing taxonomies? What are some approaches to avoid?

Books could be (and have been) written on this topic, but a short list of best practices might include:

  • Understand the ultimate uses to which the taxonomies will be put (there is no one perfect taxonomy).
  • Incorporate business and technical stakeholders in the development process to ensure that the final product will meet requirements.
  • Conduct a “taxonomy” audit prior to developing any new taxonomies to understand what already exists and might be leveraged.
  • Consider taxonomy maintenance and governance during development processes to ensure that the taxonomy can be maintained and that there are clear lines of responsibility.
  • Look for externally available taxonomies but be cautious as they have not been designed for the particular goals of the organization in question. Participate in industry-wide organizations where taxonomy development efforts might be occurring.

5) Are there any emerging or existing standards other than ISO 2788 for developing or expressing taxonomies? Is ISO 2788 relevant (I gather it is oriented towards human indexers) and who tends to use it?

ISO 2788 is relevant in terms of providing extensive guidance on term forms and other such matters. Since most organizations work in networked environments and want to transfer taxonomic information electronically, most will need to explore approaches to structuring taxonomic data for electronic transmission. Some of the standards to be aware of are RDF, OWL, Topic Maps, and SKOS. Additionally, since taxonomies might reside in metadata repositories, standards such as ISO 11179 may be relevant.
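As a rough illustration of what “structuring taxonomic data for electronic transmission” can look like, here is a minimal Python sketch that writes a tiny two-level taxonomy as SKOS concepts in Turtle syntax. The example.org namespace, the sample terms, and the helper names `slug` and `to_skos_turtle` are all made up for illustration; a real project would more likely use an RDF library and its own vocabulary design.

```python
# Hypothetical example data: term -> broader (parent) term; None marks a top-level term.
BROADER = {
    "Contracts": None,
    "Leases": "Contracts",
    "Purchase Orders": "Contracts",
}

BASE = "http://example.org/taxonomy/"  # assumed namespace, not a real vocabulary


def slug(label):
    """Turn a label into a URI-safe local name, e.g. 'Purchase Orders' -> 'purchase-orders'."""
    return label.strip().lower().replace(" ", "-")


def to_skos_turtle(broader):
    """Serialize a term -> broader-term mapping as SKOS concepts in Turtle syntax."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{BASE}> .",
        "",
    ]
    for label, parent in broader.items():
        lines.append(f"ex:{slug(label)} a skos:Concept ;")
        if parent is not None:
            lines.append(f"    skos:broader ex:{slug(parent)} ;")
        lines.append(f'    skos:prefLabel "{label}"@en .')
        lines.append("")
    return "\n".join(lines)


print(to_skos_turtle(BROADER))
```

The sketch only covers preferred labels and broader-term links; SKOS also defines properties such as `skos:altLabel` and `skos:related` for synonyms and associative links.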

6) What are some common exports from taxonomy tools (e.g., Excel)? Are there any common formats for importing existing taxonomies or developing them in taxonomy tools? For example, are there XML DTDs or Schemas?

CSV is a good common baseline, as some organizations still manage a number of their taxonomies in Excel. Some taxonomy management vendors have XML formats (as we do), but these may be proprietary and need some translation into an XML format another application could use. Standards such as RDF, OWL, and Topic Maps might be used in this context as well.
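To make the CSV baseline concrete, here is a small Python sketch that reads a taxonomy exported from a spreadsheet and rebuilds the hierarchy. The two-column layout (`term`, `parent`), the sample rows, and the function names are assumptions for illustration only; actual exports differ from tool to tool.

```python
import csv
import io
from collections import defaultdict

# Assumed layout: one row per term, with its parent term (blank for top-level terms).
SAMPLE_CSV = """term,parent
Documents,
Contracts,Documents
Leases,Contracts
Invoices,Documents
"""


def load_children(csv_text):
    """Read term/parent rows and return a parent -> [children] mapping."""
    children = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        parent = row["parent"].strip() or None
        children[parent].append(row["term"].strip())
    return children


def render(children, parent=None, depth=0):
    """Return the hierarchy as indented lines, walking it depth-first."""
    lines = []
    for term in children.get(parent, []):
        lines.append("  " * depth + term)
        lines.extend(render(children, term, depth + 1))
    return lines


tree = load_children(SAMPLE_CSV)
print("\n".join(render(tree)))
```

A flat term/parent table like this is easy to maintain in Excel, but it carries only the hierarchy; synonyms and related-term links would need extra columns or a richer format.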

7) Can you provide client case studies?

Yes. We have published several customer case studies and would be happy to work with you on additional case studies in the future.

Now About Tools

1) What are typical costs for acquiring and implementing taxonomy products?

The cost of taxonomy products varies greatly based on the particular application. Simple taxonomy modeling tools can cost less than $1,000, while enterprise-wide taxonomy management and governance systems can cost over $500,000. These larger systems provide highly scalable modeling capability, complete change management and governance, integration with full suites of enterprise applications, and metadata compliance monitoring. We have deployed systems that range in price from less than $50,000 to over $1M.

2) What are three key features in taxonomy tools; what are three unique features in yours?

Three key features:

1. Support for a variety of relationships between terms (should at least be able to support the term relationship types specified by ISO 2788).

2. Allow unlimited hierarchical structures.

3. Provide import and export features.
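For readers who have not worked with thesaurus-style relationships, the sketch below models the relationship types commonly associated with ISO 2788: broader term (BT), narrower term (NT), related term (RT), and USE/UF for preferred and non-preferred synonyms. The `Term` and `Thesaurus` classes and the sample terms are hypothetical and are not taken from any vendor's product.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    broader: set = field(default_factory=set)   # BT: more general terms
    narrower: set = field(default_factory=set)  # NT: more specific terms
    related: set = field(default_factory=set)   # RT: associative links
    used_for: set = field(default_factory=set)  # UF: non-preferred synonyms


class Thesaurus:
    def __init__(self):
        self.terms = {}  # preferred label -> Term
        self.use = {}    # non-preferred label -> preferred label (USE)

    def term(self, label):
        """Fetch a term by label, creating it on first use."""
        return self.terms.setdefault(label, Term(label))

    def add_broader(self, narrow, broad):
        """Record BT and its reciprocal NT in one step."""
        self.term(narrow).broader.add(broad)
        self.term(broad).narrower.add(narrow)

    def add_related(self, a, b):
        """RT is symmetric, so store it on both terms."""
        self.term(a).related.add(b)
        self.term(b).related.add(a)

    def add_synonym(self, preferred, non_preferred):
        """Record UF on the preferred term and a USE pointer back to it."""
        self.term(preferred).used_for.add(non_preferred)
        self.use[non_preferred] = preferred

    def resolve(self, label):
        """Map a non-preferred label to its preferred term; others pass through."""
        return self.use.get(label, label)


thesaurus = Thesaurus()
thesaurus.add_broader("Leases", "Contracts")
thesaurus.add_related("Leases", "Real Estate")
thesaurus.add_synonym("Contracts", "Agreements")
print(thesaurus.resolve("Agreements"))
```

Storing each relationship together with its reciprocal keeps the structure consistent when terms are added one at a time, which is the kind of bookkeeping a taxonomy tool is expected to handle for you.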

Three unique features in ours:

1. Extensive change management component that enables changes in taxonomies to be automatically subjected to governance.

2. Set of productized connectors that can automatically provide updated taxonomy information to consuming applications. In addition, the ability to create custom connectors.

3. Ability for end-user administrators of the interface to create custom properties on terms and taxonomies.

3) How would you assess the current state of the art for automatic classification features?

Auto-classification systems continue to improve, but still lack the precision and accuracy provided by a managed taxonomy. Taxonomies have been found to be useful frameworks upon which an auto-classification system can be developed, rather than having the auto-classification tool start from scratch. A combination of taxonomy management, to provide structure and manage term relationships, and auto-classification methods has proven to be the most effective solution.

4) Do you provide “connectors” to work with enterprise content management systems such as EMC Documentum and Microsoft SharePoint?

We provide connectors that allow our customers to publish taxonomies out to subscribing systems such as Documentum and SharePoint. We also publish taxonomic metadata to search engines, auto-classification systems, portals, and other enterprise applications.

So there you have it from an expert. And if you happen to use, or are interested in using, Documentum or SharePoint (or both), here's a way to move beyond graphical tools and spreadsheets to manage and leverage your taxonomies.