Wednesday, March 28, 2012

Moving (slowly) to WordPress

Don't know if it will be permanent or not, but I've had so many issues with MS Word and Blogger that I've decided to give WordPress a try.  (Hey, next step will be to use WordPress for my website!).

Go here to view my latest WordPress blog post!

Friday, March 16, 2012

ECM systems and Search - Either-Or?

So you want to manage your content and then you want to find it too. Sounds reasonable. Must you pick just an enterprise content management (ECM) system or a Search system, one or the other but not both? That seems like a crazy proposition. Although most people (myself included) consider Search and Enterprise Content Management solutions complementary, you’d be surprised what some others think, especially when there is a shortage of funds. In this era of tight budgets, it isn’t surprising to find some who reason “well, if I can’t manage my stuff, at least I can find it when I need to manage it.” A shortsighted view at best, you may rightly think. If you don’t manage your files, and you do search and find them, then you may find a great many versions and not be sure which one is the authentic version. Oh yes, you can always check the file creation date. But if that file is a Word document, and someone clicked “save” after printing it, then that date itself isn’t meaningful. So you can find things but you can't be sure about the value of what you've found.

And what about the reverse: Managing your files but having no search solution? Luckily that is less of an issue. Every ECM system, from SharePoint to Documentum to Alfresco to name-your-favorite, has a built-in search system. Why? Because it doesn’t do any good to store and manage your files if you can’t locate them. Findability is easy when there are only a few hundred files and a few people storing them. That would be the first hour of the first day when an ECM system is first set up. Findability gets hard really fast after that.

Instead of thinking of the "ECM or Search" choice as an either-or, think about the critical benefits each provides the other. Search and ECM together are better than either one separately. To get the most out of an ECM system (even with its built-in search system) it is worth thinking about how search improves ECM and vice-versa. Ultimately, neither is effective without some good understanding of this symbiotic interplay, and each requires a commitment to oversight and continuous improvement of its use. When you appreciate this interplay, and the responsibilities that follow after you get an ECM system running, then you might even consider stepping up to a search system that goes beyond what you normally get for “free” with ECM. Interestingly, two of my favorite ECM systems, Documentum from EMC and Alfresco, both now use Lucene/Solr, and this open source search system doesn’t skimp on power. For more about the benefits of search to ECM, and ECM to search, go to my Guident blog post. You’ll be surprised how these two systems work together, the whole being bigger than the sum of their parts. I’d welcome your comments and thoughts there (and here too).

Saturday, August 27, 2011

Social Media in the Cloud

Here are some random musings as I wait for IRENE amidst the clouds and rain… and realize how to use Word simply to create my post for blogspot and yet retain style control... but now on to the post itself.

Most people don’t need to care, and couldn’t find out anyway, where their Facebook page is in the cloud. They just click on the link facebook/myfacebookname and voila: It’s there, ready for them to read their wall messages, upload some more pictures, or divulge personal information.

If, however, you’re contemplating creating a corporate social media system like Facebook (whether inside or outside the firewall), cloud-based storage is appealing. If employees flock to it and upload vacation videos, you may need more storage than you expected, and in the cloud you can scale up rapidly. There is far less need for creating a complex infrastructure if you lease the cloud-space.

Cloud-based social systems outside the firewall can provide a myriad of benefits: keeping closer to constituents, gauging outsider sentiments of corporate performance, and serving as a great public relations tool, especially in the face of a disaster such as the BP Oil Spill.

There are some concerns to consider before embarking on such a social media project, however. As with any cloud-based application, know your vendor and consider vendor continuity. Is the vendor financially secure? And even if secure, how do you get your application (and more importantly, all your data) back should the vendor get acquired and the new owner decide to eliminate that service?  How will your system be backed up, and (if you want to pick up your social marbles and go elsewhere) what format will you get your data in? Will you get both the content and the information about the content? Metadata can be as important as the content it describes.

If you share the cloud space, can you guarantee privacy (if needed) and maintain control over the application and data? Can you be sure that you can put sections “on hold” if you receive a formal eDiscovery request for information? Suddenly that fun social media becomes “Electronically Stored Information,” and you will have to decide which items constitute records that you cannot destroy and must keep free of changes. Of course this assumes you considered the records retention aspect of that media to begin with.

It is hard to decide which is more challenging: Setting up the social media application in the cloud, or  deciding governance policies to oversee the cloud content.  I guess that’s why they call it cloudy.

Sunday, July 24, 2011

Email Dysfunction – Information Overload

Dysfunctional email

How long did it take for automobiles to standardize their controls, more or less, and the way you drive them? Although it was long before my time, Charles Duryea built a three-wheeled automobile in 1893. This was not the first automobile (in 1801 Richard Trevithick of England built and ran a steam-powered carriage, called the Puffing Devil).  Antique cars today are generally considered to be those 45 years old or older. Still, if you get into an antique auto you can easily figure out how to drive it. Email of one sort or another has been around nearly as long as the automobile, if you consider digital Morse Code telegraph messages. You could consider the first widespread use of email to be Unix mail in 1972. I remember using a similar Data General email system in the late 70s. You could argue that the adoption of email is far greater than the adoption of automobiles. So why is each version so different? Comcast email does not look or work the same way as Google Gmail by a long shot. Delete one message in a related group ("conversation") and you delete the whole group. Use the Microsoft Outlook client, and things are even more dissimilar. Download Gmail to Outlook and you get lots of surprises: "sent mail" appears in the Outlook inbox. Use the Gmail Outlook client and things are similar but differ in important details. Are these differences due to each vendor wanting to maintain a competitive advantage? Why this dysfunctionality?

And all that just deals with usability. There are other "oops" events. Email vendors losing email. Differing spam policies, so you either don't get email or you find it well after it is of any use. Come on guys and gals, take a hint from the automobile industry (or kitchen appliances or house paints or… any modern product you can think of). You don't need to read the owner's manual to drive away a rental car. After nearly 40 years, this communication medium should be easier and more consistent to use, and far more dependable.

And then there's information overload

I get about 200 emails each day. Some are due to "subscriptions" I never initiated; some are spam; most are authentic and worth reviewing. Fearing I may want to use some press releases, every six months I create a full-text searchable collection of over 1,000 of them. SharePoint in my day job has become, to use AIIM's expression, a digital landfill. Government Computer News in October of 2010 ran an article, "You want the data? You can't handle the data!" GCN quotes a joint Avanade-Accenture study of over 500 C-level executives. Of those, 62 percent said they are "frequently interrupted by incoming data," and 56 percent feel overwhelmed. Yet in the same study, 61 percent said they want faster access to data. How about those 20,000 search results from every Google query (delivered in 0.1 seconds, no less)?

A work colleague bragged to me recently that she had over 850 LinkedIn connections. I asked her if she was joking, and she said "absolutely not." She was proud of this achievement, and I'm guessing soon she'll be bragging that she has hit the millennium mark. My RSS reader shows 50 or so articles I might be interested in.

I think we are all addicted to information, myself included. There is no way we can consume all this information, much less pick out every critical needle in the haystacks. Luddite solutions may be part of the answer. I recently disabled text messages on my cell phone. At a recent company meeting, we were asked to vote on an issue via our cell phones, and those of us who couldn't do that were asked (as a joke) to walk the "hall of shame" and come up to use a paper ballot. It was surprising how many fellow Luddites made the walk.

As with any addiction, curing it or at least making it manageable is painful. Maybe one way to start is to "just say no" to some of these information channels. We can always say we didn't get the email.

Tuesday, May 24, 2011

Thoughts on Enterprise Search Summit 2011

It has been a couple of weeks since ESS 2011 in NYC, and I've had a chance to collect my thoughts about the conference.
The conference, as usual, was a don't-miss opportunity for anyone interested in search systems, search projects, or practical ways to improve search satisfaction. There were more attendees than last year. I found a surprising dichotomy among the attendees and vendors. As to the attendees, they were either new to search or long-time search professionals. Vendors included only one big name (Google, who else?) and many smaller vendors, both from the US (such as Basis Technology, H5, and Vivisimo) and from Europe and Australia (Raytion from Germany and SpringSense from Australia).
As to the themes, they were many. Some could have been from a conference 5 years ago. They ranged from enduring topics, such as search projects and bringing failing search projects back on track (my own presentation), to newer themes of integrating search with social media and search on mobile devices. I was very surprised and pleased to see eDiscovery as a topic and to see at least two vendors offering eDiscovery products and services (H5 and Clearwell).
About 1/3 of the attendees at my presentation requested a copy of Guident's free "Findability Checklist," now expanded with attributed quotes anyone can use in their own search presentation. I've expanded that to include general ECM quotes too. If you want one too, send me an email.
Other observations:
  • Google was somewhat cocksure about its position in the commercial search market. This is the second year in a row when I found their presentation hard to understand, hard to hear (speakers need professional presentation training), and as much marketing as new material. Nice water bottles at their booth though ;-). I still believe their search appliance inside the firewall is up against competition, from vendors small and large.
  • Improving Search user satisfaction. These systems must be intuitive, and in this respect, Google sets the standard.
  • No Bing. No surprise.
  • Delivering search on mobile devices, although that is still a nascent theme inside the firewall.
  • Personalizing search also remains a holy grail.
  • Search systems still have plenty of differentiation, and there is plenty of room for vendors (such as SpringSense) to add value to others' systems.

Best of show, IMHO, was RealStory's "Search Vendors in 30 minutes." The only disappointment (and a big one) is that they did not make their presentation available after the show.
All in all, a show worth attending.

Friday, December 10, 2010

Electronic Discovery Reference Model - Production and Presentation

Well, we are now in the final phases. You’ve identified, preserved, and collected all information relevant to the triggering event. You’ve winnowed it down to a potpourri of email, Office files, rich media, and highly complex things like CAD files. Now you’re in the final stretch, getting that Electronically Stored Information (ESI) that both sides in the lawsuit agree is relevant and must be handed over. If you’re lucky, both sides have agreed to have a settlement conference where the matter can end there. If you’re not lucky, you proceed with the two final steps.

How do you complete these last two steps, Production and Presentation? What unexpected hazards should you watch out for so you can be confident that you’ve done everything legally required and can step aside, letting the prosecuting and defense lawyers take over from here?



Here is where opposing attorneys “meet and confer,” agreeing (among other things) on the format in which the relevant material will be produced for presentation, how to preserve or select metadata, and redaction rules. Production of ESI is not as simple as you might think, for two reasons: one that is just emerging, and one that is obvious.  Let’s take the obvious one first: Format. And to keep things simpler than they are becoming, let’s restrict this discussion to email, Office documents, and complex design files. You may not be so lucky as to be able to exclude social and rich media. Rich media can be everything from recorded voicemail to video. Social media means everything from text messages to blog entries, and often is produced collaboratively.
For simplicity’s sake, the EDRM standard presents four options:
  • Paper (gasp!)
  • Quasi-Paper
  • Quasi-Native and
  • Native.

Essentially these options range from the easiest to produce to the easiest to use.  Printing to paper may seem both archaic and easy. Paper is fragile but stable and usually easy to produce. However, volume can be an issue (think time and money), and paper is not easy to search with anything but eyeballs. Still, paper is easy to understand and often a favorite.  However, if the information is not in your country’s most common format (8 1/2 by 11 inches in the U.S.) you will have to decide whether to tile information (hard to use) or find and pay for services to print non-standard sizes like architectural drawings. And if your firm is international, you’ll have to deal with several dimensions of paper.  And of course if Computer-Aided-Design or other multi-layer artifacts are in the mix, “printing” the CAD file just got immeasurably harder. Tiling or printing on large format paper can make it hard to correlate the layers. And always remember: If the files can have attachments or links to other files, those other things are also in the collection you must produce.
Quasi-paper refers to a high-fidelity digital format that looks like the paper original but can offer the advantage of speedy production and full-text (or simple string) searching.  The two most common forms of quasi-paper are TIFF for images, and Adobe PDF.  TIFF can be slightly higher fidelity than PDF but does not allow for text searching. And TIFF –even when compressed—can be very large.  Yet once again, problems arise that are similar to those of paper.
Even easier to produce, and possibly easier to use, is “Quasi-Native” digitization.  Quasi-native formats are often exports from databases to flat files, ASCII, CSV or similar formats. They have the advantage of not requiring the opposing attorney to scrounge up the application to match your particular database or other complex information. Yet clearly the export formats, while searchable, do not capture much structure and can be quite hard to use.
Lastly, the easiest format to produce and use, with limitations, is native format. This simply means the format, whatever it is, that the applications holding the information normally use to create, edit and store the files. This is the most useful format of all, but it may require the opposing side to acquire applications (of certain versions) that they do not have. Additionally, native files may be impossible to redact, and it may be impossible to select and extract “pages” from them for the case.
What else could go wrong? After all, you’ve covered the formats and earlier you secured those files. The biggest issue could be where those files are located.  If inside a content management system, access controls could slow down the process. If inside an email backup file, they could be very difficult to selectively find and manage.  Those examples are inside your firewall. Suppose, like many firms, you’ve begun storing your files or using applications in a public or multi-tenant web cloud. Who cares? It’s the same data, right? Yes, but where is the cloud, are there many clouds, and what agreements with the cloud service providers do you have for eDiscovery?

If your firm is located in only one country, the job gets easier but can still be difficult. Let’s take the example of your cloud service being only in the U.S. There are many privacy laws that could be barriers to complete production. These could affect both you and your cloud provider. Examples include the Health Insurance Portability and Accountability Act (HIPAA) and the Gramm-Leach-Bliley Act (GLBA), also known as the Financial Services Modernization Act. And if your information is kept internationally (or if your cloud vendors are), each country has its own privacy law counterparts, most different from the others.


Finally, there is presentation, where the rubber meets the road.  The definition provided by EDRM.NET is “Displaying ESI before audiences (at depositions, hearings, trials, etc.), especially in native & near-native forms, to elicit further information, validate existing facts or positions, or persuade an audience.” EDRM.NET decomposes this final stage as a series of processes, from start to finish: “Develop Presentation Strategy / Plan, Select Exhibits / Format, Prepare and Test Exhibits, Present Exhibits, and finally Store / Maintain Exhibits.”
You can think of this as the lifecycle of a CSI show, running from the initial script and production planning through the actual courtroom drama. And since the presentation materials may be needed in an appeal, or at least must be treated as an official record, you must keep and maintain them in ways that meet recordkeeping and legal requirements.

No wonder some defendants simply throw in the towel and agree to pay at the very beginning.

In Summary

The final two steps in the EDRM model depend on the quality and quantity of ESI collected in the earlier steps. There are many dimensions of risk (legal, organizational, and IT among them), and each calls for its own subject matter experts.  Responding to an eDiscovery mandate requires many disparate resources and skill sets.  The proverbial proactive ounce of prevention applies here. Having an ongoing, documented, and understood information management process at the beginning is key. Part of that process, an up-to-date file plan, is also required. Do these proactively and you’ve invested an ounce that can become worth a pound of after-the-fact cure.

Wednesday, November 24, 2010

Electronic Discovery Reference Model - Processing, Review and Analysis

Next Phases: Processing, Review and Analysis

Well, we’ve made it halfway through the Electronic Discovery Reference Model and are now at the stack of blue processes: Processing, Review, and Analysis. The triggering event has occurred, and it is now time to execute your eDiscovery plan as cost effectively as possible.

What distinguishes this point in the eDiscovery process is its cost. Even though you’re halfway through the model and have culled the content down to increasingly relevant material, this is the ouch point. Luckily you’ve culled the collection somewhat, because this review is labor intensive: you consider relevance, privilege, and confidentiality, and then tag what you’ve found. According to Attorney Ralph Losey at the recent EMC Writer’s summit, the cost to process and review each digital file averages $5. With 16,000 files on average per gigabyte, that’s $80,000 per gigabyte. Suppose there are 10 custodians (those responsible for controlling and granting access to enterprise electronic files and protecting them) and each is responsible for 50,000 emails with attachments. That’s a half million pieces of electronically stored information (ESI). You can see how costs can mount up.
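To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The $5 and 16,000 figures are the averages quoted above; counting each email with its attachments as a single $5 item is my own simplifying assumption, and the function name is just illustrative.

```python
# Figures quoted in the post; treat them as rough averages, not fixed rates.
COST_PER_FILE_USD = 5          # average cost to process and review one file
FILES_PER_GIGABYTE = 16_000    # average number of files in a gigabyte


def review_cost(file_count, cost_per_file=COST_PER_FILE_USD):
    """Estimated cost in dollars to process and review `file_count` files."""
    return file_count * cost_per_file


cost_per_gigabyte = review_cost(FILES_PER_GIGABYTE)

# The ten-custodian scenario: 50,000 emails (with attachments) each.
# Assumption: each email plus its attachments counts as one reviewed item.
custodians = 10
emails_per_custodian = 50_000
total_items = custodians * emails_per_custodian
total_cost = review_cost(total_items)

print(f"Per gigabyte: ${cost_per_gigabyte:,.0f}")
print(f"{total_items:,} items: ${total_cost:,.0f}")
```

Under those assumptions the half million items come to about $2.5 million in processing and review alone, which is why culling before this phase matters so much.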

Yet the processing phase is where lots of problems can occur while the money meter is spinning. IT (custodian for enterprise information) and attorneys speak very different languages. Information, gigabytes of it, is all over the place. Moreover, general purpose ECM systems in general do not manage email well, and email could provide a modern-day paraphrase of Willie Sutton: “because that’s where the evidence is,” the treasure trove for litigation.

Not to harp too much on the Information Management part of the model, but it is critical to prepare for triggering events by having a litigation response plan or LRP. Without a well managed Information Management plan, you will have too much information to deal with. You do not want to produce information that is outside the discovery timeframe (more time to process and potential legal exposure). You also want a credible, effective way to comply with the request for information in the triggering event. If you have been proactive regarding eDiscovery events, you already have an LRP. Among other things, this plan includes a data gathering strategy for ALL your Electronically Stored Information. In essence, the LRP includes creating a map of all your information systems: network drives, content management systems, PDAs, cell phones, personal computers, email, even instant messages and text messages. You also need an efficient backup strategy that is more than just backing up email PST files once a week to tape. And of course, document all your LRP work and keep it up to date, saving each version so you can refer to the scope of the plan for the timeframe relevant to the triggering event. If it isn’t documented, opposing counsel may rightly assert that it doesn’t exist.

I believe that, among other things, you need the right kind of tools to manage email. You also should have enterprise policies for using email that reduce legal exposure as part of an overall litigation response plan. You can’t overestimate the importance of that very first step, Information Management – getting control over all information, including email, to reduce effort and cost throughout the cycle. Ursula Talley of StoredIQ told me that eDiscovery is a systemic concern, and that “. . . proactive information management will reduce costs and time of eDiscovery” while providing better long-term results. Analysis requires the use of search systems, but this goes way beyond a Google Appliance. She suggests full-text indexing for relatively static content and “thindexing” (an index of metadata) applied to content such as email belonging to the custodians.
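To illustrate the difference between the two kinds of index, here is a toy sketch in Python. It is only my reading of the idea, not how StoredIQ or any other product implements it: the sample documents, field names, and function names are all hypothetical. A full-text index maps every word in the body to the items containing it, while a metadata-only index never reads the body at all.

```python
from collections import defaultdict

# Hypothetical sample items; the field names are illustrative only.
documents = [
    {"id": "doc-1", "custodian": "alice", "type": "email",
     "body": "Please review the attached contract draft"},
    {"id": "doc-2", "custodian": "bob", "type": "email",
     "body": "Lunch on Friday?"},
    {"id": "doc-3", "custodian": "alice", "type": "policy",
     "body": "Records retention policy for contract files"},
]


def build_full_text_index(docs):
    """Map every word in the body to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for word in doc["body"].lower().split():
            index[word.strip("?.,!")].add(doc["id"])
    return index


def build_metadata_index(docs, fields=("custodian", "type")):
    """Map (field, value) pairs to document ids without reading the body."""
    index = defaultdict(set)
    for doc in docs:
        for field in fields:
            index[(field, doc[field])].add(doc["id"])
    return index


full_text = build_full_text_index(documents)
metadata = build_metadata_index(documents)

# Full-text lookup: which items mention "contract" anywhere in the body?
contract_hits = sorted(full_text["contract"])

# Metadata lookup: which items are emails belonging to custodian "alice"?
alice_emails = sorted(
    metadata[("custodian", "alice")] & metadata[("type", "email")]
)
```

The metadata index is far smaller and cheaper to build, which is presumably the appeal for large, fast-changing stores such as custodians' email; the trade-off is that it cannot answer questions about what the messages actually say.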
Even assuming you have somehow transformed your Digital Landfill (unmanaged and unmapped content of all kinds) into a digital Greenfield (managed and mapped), you must still be sure that any underlying eDiscovery tools you use to work through all your content are scalable. Andrew Cohen, EMC’s VP for Compliance Solutions, said to me that the critical functions for your eDiscovery technical solution include: the ability to scale across hundreds of terabytes of content within your enterprise; the ability to apply that tool to all content repositories (even laptops and desktops); and a consistent set of policies managed by Legal and IT.

How much does an eDiscovery system cost? Even assuming you’re merely adding to your technology stack from a single ECM vendor, the costs go way beyond product costs. Again, Cohen said that the cost breakdown over an eDiscovery implementation is roughly as follows: product cost (35% of the total), implementation costs (10%), planning/policy making (50%), and annual maintenance (5%).
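One way to use that breakdown is to work backward from a product quote to the implied total. The short Python sketch below does that; the percentages are the rough ones quoted above, while the $350,000 product price and the function names are purely hypothetical.

```python
# Cost shares as quoted in the post (percent of total implementation cost).
COST_SHARES = {
    "product": 35,
    "implementation": 10,
    "planning and policy making": 50,
    "annual maintenance": 5,
}

# Sanity check: the quoted shares should account for the whole budget.
assert sum(COST_SHARES.values()) == 100


def total_from_product_cost(product_cost):
    """Scale a known product cost up to the implied total cost."""
    return product_cost * 100 / COST_SHARES["product"]


def breakdown(total):
    """Split a total cost into the four quoted categories."""
    return {name: total * share / 100 for name, share in COST_SHARES.items()}


# Hypothetical example: a product license priced at $350,000.
total = total_from_product_cost(350_000)
for name, amount in breakdown(total).items():
    print(f"{name}: ${amount:,.0f}")
```

With that made-up quote, the implied total is $1,000,000, of which half goes to planning and policy making rather than to software.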

In Summary

Efficiently and credibly processing, reviewing and analyzing your electronically stored information requires a litigation response plan, developed collaboratively with your legal, records management, and IT organizations. The more conscientious you have been in the earlier stages, the less costly these phases will be and the more likely you will survive the eDiscovery challenge. It is expensive and time-consuming to clean up your digital landfill and, step by step, get it under control as a virtual Greenfield, but those costs pale compared to the costs of ad hoc reaction to eDiscovery events.