Sunday, March 22, 2020

Growing up in northern NH

Here's a little something about me and my blog, "Growing up in northern NH." See the picture of the market to the left?  That was "Lewis Market." I worked there when in high school.  I remember slinging 100 lb bags of chicken feed off the dock onto pickup trucks. 

After my first evening working for Mr. Lewis, I went to the evening showing of "Tammy and the Bachelor." Now that really dates me!

Wednesday, March 28, 2012

Moving (slowly) to WordPress

Don't know if it will be permanent or not, but I've had so many issues with MS Word and Blogger that I've decided to give WordPress a try. (Hey, the next step will be to use WordPress for my website!)

Go here to view my latest WordPress blog post!

Friday, March 16, 2012

ECM systems and Search - Either-Or?

So you want to manage your content and then you want to find it too. Sounds reasonable. Must you pick just an enterprise content management (ECM) system or a Search system, one or the other but not both? That seems like a crazy proposition. Although most people (myself included) consider Search and Enterprise Content Management solutions complementary, you’d be surprised what some others think, especially when there is a shortage of funds. In this era of tight budgets, it isn’t surprising to find some who reason “well, if I can’t manage my stuff, at least I can find it when I need to manage it.” A shortsighted view at best, you may rightly think. If you don’t manage your files, and you do search and find them, then you may find a great many versions and not be sure which one is the authentic version. Oh yes, you can always check the file creation date. But if that file is a Word document, and someone clicked “save” after printing it, then that date itself isn’t meaningful. So you can find things but you can't be sure about the value of what you've found.
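To make the "many versions" problem concrete, here is a minimal Python sketch (my own illustration, not a feature of any particular search or ECM product). It ignores file dates altogether and groups the files under a folder by a hash of their contents, so byte-for-byte identical copies land in the same group. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file's bytes; identical content gives an identical digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(root: Path) -> dict[str, list[Path]]:
    """Map each content digest to every file under root with that content."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return dict(groups)
```

Calling group_by_content(Path("some/folder")) returns one entry per distinct content. Note the limits: this only spots exact copies. A Word file that was merely re-saved will usually hash differently, and no hash can tell you which version is the authentic one. That judgment is exactly what a managed repository is for.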

And what about the reverse: Managing your files but having no search solution? Luckily that is less of an issue. Every ECM system, from SharePoint to Documentum to Alfresco to name-your-favorite, has a built-in search system. Why? Because it doesn’t do any good to store and manage your files if you can’t locate them. Findability is easy when there are only a few hundred files and a few people storing them. That would be the first hour of the first day when an ECM system is first set up. Findability gets hard really fast after that.

Instead of thinking of the "ECM or Search" choice as an either-or, think about the critical benefits each provides the other. Search and ECM together are better than the two used separately. To get the most out of an ECM system (even with its built-in search system) it is worth thinking about how search improves ECM and vice-versa. Ultimately, neither is effective without some good understanding of this symbiotic interplay, and each requires a commitment to oversight and continuous improvement of its use. When you appreciate this interplay, and the responsibilities that follow after you get an ECM system running, then you might even consider stepping up to a search system that goes beyond what you normally get for “free” with ECM. Interestingly, two of my favorite ECM systems, Documentum from EMC and Alfresco, both now use Lucene/Solr, and this open source search system doesn’t skimp on power. For more about the benefits of search to ECM, and ECM to search, go to my Guident blog post. You’ll be surprised how these two systems work together, the whole being bigger than the sum of its parts. I’d welcome your comments and thoughts there (and here too).

Saturday, August 27, 2011

Social Media in the Cloud

Here are some random musings as I wait for IRENE amidst the clouds and rain… and realize how to use Word simply to create my post for blogspot and yet retain style control... but now on to the post itself.

Most people don’t need to care, and couldn’t find out anyway, where their Facebook page is in the cloud. They just click on the link facebook/myfacebookname and voila: It’s there, ready for them to read their wall messages, upload some more pictures, or divulge personal information.

If, however, you’re contemplating creating a corporate social media system like Facebook (whether inside or outside the firewall), cloud-based storage is appealing. If employees flock to it and upload vacation videos, you may need more storage than you expected, and in the cloud you can scale up rapidly. There is far less need for creating a complex infrastructure if you lease the cloud-space.

Cloud-based social systems outside the firewall can provide a myriad of benefits: Keeping closer to constituents, gauging outsider sentiments of corporate performance, and serving as a great public relations tool, especially in the face of a disaster such as the BP Oil Spill.

There are some concerns to consider before embarking on such a social media project, however. As with any cloud-based application, know your vendor and consider vendor continuity. Is the vendor financially secure? And even if secure, how do you get your application (and more importantly, all your data) back should the vendor get acquired and the new owner decide to eliminate that service? How will your system be backed up, and, if you want to pick up your social marbles and go elsewhere, what format will you get your data in? Will you get both the content and the information about the content? Metadata can be as important as the content it describes.

If you share the cloud space, can you guarantee privacy (if needed) and maintain control over the application and data? Can you assure that you can put sections “on hold” if you receive a formal eDiscovery request for information? Suddenly that fun social media becomes “Electronically Stored Information,” and you will have to decide which items constitute records that you cannot destroy and must maintain free of changes. Of course this assumes you considered the records retention aspect of that media to begin with.

It is hard to decide which is more challenging: Setting up the social media application in the cloud, or deciding governance policies to oversee the cloud content. I guess that’s why they call it cloudy.

Sunday, July 24, 2011

Email Dysfunction – Information Overload

Dysfunctional email

How long did it take for automobiles to more or less standardize their controls, the way you drive them? Although it was long before my time, Charles Duryea built a three-wheeled automobile in 1893. This was not the first automobile (in 1801 Richard Trevithick of England built and ran a steam-powered carriage, called the Puffing Devil). Antique cars today are generally considered to be those 45 years old or older. Still, if you get into an antique auto you can easily figure out how to drive it. Email of one sort or another has been around nearly as long as the automobile, if you consider digital Morse Code telegraph messages. You could consider the first widespread use of email to be Unix mail in 1972. I remember using a similar Data General email system in the late 70s. You could argue that the adoption of email is far greater than the adoption of automobiles. So why is each version so different? Comcast email does not look or work the same way as Google Gmail by a long shot. Delete one message in a related group ("conversation") and you delete the whole group. Use the Microsoft Outlook client, and things are even more dissimilar. Download Gmail to Outlook and you get lots of surprises: "sent mail" appears in the Outlook inbox. Use the Gmail Outlook client and things are similar but differ in important details. Are these differences due to each vendor wanting to maintain a competitive advantage? Why this dysfunctionality?

And all that just deals with usability. There are other "oops" events. Email vendors losing email. Differing SPAM policies, so you either don't get email or you find it well after it is of any use. Come on guys and gals, take a hint from the automobile industry (or kitchen appliances or house paints or… any modern product you can think of). You don't need to read the owner's manual to drive away a rental car. After nearly 40 years, this communication medium should be easier and more consistent to use, and far more dependable.

And then there's information overload

I get about 200 emails each day. Some are due to "subscriptions" I never initiated; some are spam; most are authentic and worth reviewing. Fearing I may want to use some press releases, every six months I create a full-text searchable collection of over 1,000 items. SharePoint in my day job has become, to use AIIM's expression, a digital landfill. In October of 2010, Government Computer News published an article, "You want the data? You can't handle the data!" GCN quotes a joint Avanade-Accenture study of over 500 C-level executives: 62% said they are "frequently interrupted by incoming data," and 56% feel overwhelmed. Yet in the same study, 61% said they want faster access to data. How about those 20,000 search results from every Google query (delivered in 0.1 seconds, no less)?

A work colleague bragged to me recently that she had over 850 LinkedIn connections. I asked her if she was joking, and she said "absolutely not." She was proud of this achievement, and I'm guessing soon she'll be bragging that she has hit the millennium mark. My RSS reader shows 50 or so articles I might be interested in.

I think we are all addicted to information, myself included. There is no way we can consume all this information, much less pick out every critical needle in the haystacks. Luddite solutions may be part of the answer. I recently disabled text messages on my cell phone. At a recent company meeting, we were asked to vote on an issue via our cell phones, and those of us who couldn't do that were asked, as a joke, to walk the "hall of shame" and come up to use a paper ballot. It was surprising how many fellow Luddites made the walk.

As with any addiction, curing it or at least making it manageable is painful. Maybe one way to start is to "just say no" to some of these information channels. We can always say we didn't get the email.

Tuesday, May 24, 2011

Thoughts on Enterprise Search Summit 2011

It has been a couple of weeks since ESS 2011 in NYC, and I've had a chance to collect my thoughts about the conference.
The conference, as usual, was a don't-miss opportunity for anyone interested in search systems, search projects, or practical ways to improve search satisfaction. There were more attendees than last year. I found a surprising dichotomy among the attendees and vendors. As to the attendees, they were either new to search or long-time search professionals. Vendors included only one big name (Google, who else?) and many smaller vendors, both from the US (such as Basis Technology, H5, and Vivisimo) and from Europe and Australia (Raytion of Germany and SpringSense of Australia).
As to the themes, they were many. Some could have been from a conference 5 years ago. They ranged from enduring themes, such as search projects and bringing failing search projects back on track (my own presentation), to newer themes of integrating search with social media and search on mobile devices. I was very surprised and pleased to see eDiscovery as a topic and to see at least two vendors offering eDiscovery products and services (H5 and Clearwell).
About 1/3 of the attendees at my presentation requested a copy of Guident's free "Findability Checklist," now expanded with attributed quotes anyone can use in their own search presentation. I've expanded that to include general ECM quotes too. If you want one too, send me an email.
Other observations:
  • Google was somewhat cocksure about its position in the commercial search market. This is the second year in a row when I found their presentation hard to understand, hard to hear (speakers need professional presentation training), and as much marketing as new material. Nice water bottles at their booth though ;-). I still believe their search appliance inside the firewall is up against competition, from vendors small and large.
  • Improving Search user satisfaction. These systems must be intuitive, and in this respect, Google sets the standard.
  • No Bing. No surprise.
  • Delivering search on mobile devices, although that is still a nascent theme inside the firewall.
  • Personalizing search also remains a holy grail.
  • Search systems still have plenty of differentiation, and there is plenty of room for vendors (such as SpringSense) to add value to others' systems.

Best of show, IMHO, was RealStory's "Search Vendors in 30 minutes." The only disappointment (and a big one) is that they did not make their presentation available after the show.
All in all, a show worth attending.

Friday, December 10, 2010

Electronic Discovery Reference Model – Production and Presentation

Well, we are now in the final phases. You’ve identified, preserved, and collected all information relevant to the triggering event. You’ve winnowed it down to a potpourri of email, Office files, rich media, and highly complex things like CAD files. Now you’re in the final stretch, handing over the Electronically Stored Information (ESI) that both sides in the lawsuit agree is relevant. If you’re lucky, both sides have agreed to have a settlement conference where the matter can end there. If you’re not lucky, you proceed with the two final steps.

How do you complete these last two steps, Production and Presentation? What unexpected hazards should you watch out for so you can be confident that you’ve done everything legally required and can step aside, letting the prosecuting and defense lawyers take over from here?



Here is where opposing attorneys “meet and confer,” agreeing (among other things) on the format in which the relevant material will be produced for presentation, how to preserve or select metadata, and redaction rules. Production of ESI is not as simple as you might think, for two reasons: one that is just emerging, and one that is obvious. Let’s take the obvious one first: Format. And to keep things simpler than they are becoming, let’s restrict this discussion to email, Office documents, and complex design files. You may not be so lucky as to be able to exclude social and rich media. Rich media can be everything from recorded voicemail to video. Social media means everything from text messages to blog entries, and often is produced collaboratively.
For simplicity’s sake, the EDRM standard presents four options:
  • Paper (gasp!)
  • Quasi-Paper
  • Quasi-Native and
  • Native.

Essentially these options range from the easiest to produce to the easiest to use. Printing to paper may seem both archaic and easy. Paper is fragile but stable and usually easy to produce. However, volume can be an issue (think time and money), and paper is not easy to search with anything but eyeballs. Still, paper is easy to understand and often a favorite. However, if the information does not fit your country’s most common paper size (8 1/2 by 11 inches in the U.S.), you will have to decide whether to tile information (hard to use) or find and pay for services to print non-standard sizes like architectural drawings. And if your firm is international, you’ll have to deal with several paper sizes. And of course if Computer-Aided-Design or other multi-layer artifacts are in the mix, “printing” the CAD file just got immeasurably harder. Tiling or printing on large format paper can make it hard to correlate the layers. And always remember: If the files can have attachments or links to other files, those other things are also in the collection you must produce.
Quasi-paper refers to a high-fidelity digital format that looks like the paper original but can offer the advantage of speedy production and full-text (or simple string) searching. The two most common forms of quasi-paper are TIFF for images, and Adobe PDF. TIFF can be slightly higher fidelity than PDF but does not allow for text searching. And TIFF, even when compressed, can be very large. Yet once again, problems arise that are similar to those of paper.
Even easier to produce, and possibly easier to use, is “Quasi-Native” digitization. Quasi-native formats are often exports from databases to flat files, ASCII, CSV or similar formats. They have the advantage of not requiring the opposing attorney to scrounge up the application to match your particular database or other complex information. Yet clearly the export formats, while searchable, do not capture much structure and can be quite hard to use.
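As a rough illustration of a quasi-native export, here is a small Python sketch that dumps one database table to CSV text using only the standard library. The contracts table, its columns, and the function name are all made up for the example; a real production would follow whatever the parties agreed to at meet-and-confer.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn: sqlite3.Connection, table: str) -> str:
    """Return the whole table as CSV text, header row first."""
    # The table name is interpolated directly, so pass trusted names only.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(column[0] for column in cursor.description)
    writer.writerows(cursor)
    return buffer.getvalue()


# Hypothetical example data: a tiny in-memory table of contracts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts (id INTEGER PRIMARY KEY, party TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO contracts (party, amount) VALUES (?, ?)",
    [("Acme, Inc.", 1200.5), ("Globex", 300.0)],
)
csv_text = export_table_to_csv(conn, "contracts")
```

The result opens in any spreadsheet or text editor, which is the appeal. But notice what the flat file leaves behind: the column types, the primary key, and any relationships to other tables. That is the lost structure mentioned above.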
Lastly, the easiest format to produce and use, with limitations, is native format. This simply means the format, whatever it is, that the applications containing the information normally use to create, edit and store the files. This is the most useful format of all, but it may require the opposing side to acquire applications (of certain versions) that they do not have. Additionally, it may be impossible to redact native files or to select and extract “pages” from them for the case.
What else could go wrong? After all, you’ve covered the formats and earlier you secured those files. The biggest issue could be where those files are located. If inside a content management system, access controls could slow down the process. If inside an email backup file, they could be very difficult to selectively find and manage. Those examples are inside your firewall. Suppose, like many firms, you’ve begun storing your files or using applications in a public or multi-tenant web cloud. Who cares? It’s the same data, right? Yes, but where is the cloud, are there many clouds, and what agreements with the cloud service providers do you have for eDiscovery?

If your firm is located in only one country, the job gets easier but can still be difficult. Let’s take the example of your cloud service being only in the U.S. There are many privacy laws that could be barriers to complete production. These could affect both you and your cloud provider. Examples include the Health Insurance Portability and Accountability Act (HIPAA) and the Gramm-Leach-Bliley Act (GLBA), also known as the Financial Services Modernization Act. And if your information is kept internationally (or if your cloud vendors are), each country has its own privacy law counterparts, most different from the others.


Finally, there is presentation, where the rubber meets the road.  The definition provided by EDRM.NET is “Displaying ESI before audiences (at depositions, hearings, trials, etc.), especially in native & near-native forms, to elicit further information, validate existing facts or positions, or persuade an audience.” EDRM.NET decomposes this final stage as a series of processes, from start to finish: “Develop Presentation Strategy / Plan, Select Exhibits / Format, Prepare and Test Exhibits, Present Exhibits, and finally Store / Maintain Exhibits.”
You can think of this as the lifecycle of a CSI show, running from the initial script and production planning through the actual courtroom drama. And since the presentation materials may be needed in an appeal, or at least must be treated as an official record, you must keep and maintain them in ways that meet recordkeeping and legal requirements.

No wonder some defendants simply throw in the towel and agree to pay at the very beginning.

In Summary

The final two steps in the EDRM model depend on the quality and quantity of ESI collected in the earlier steps. There are many dimensions of risk, including legal, organizational, and IT, and each calls for its own subject matter experts. Responding to an eDiscovery mandate requires many disparate resources and skill sets. The proverbial proactive ounce of prevention applies here. Having an ongoing, documented, and understood information management process at the beginning is key. Part of that plan, an up-to-date file plan, is also required. Do these proactively and you’ve invested an ounce that can become worth a pound of after-the-fact cure.