Monday, July 24, 2006

Secret to Content Longevity?

The Wrapper, not the Gum
One goal of FDsys is to preserve content for future access and repurposing. So naturally the questions are: 1) Are you using XML? and 2) If so, what DTD or schema? I was expecting to hear "WordML" (Microsoft Word's XML standard, which is really more an XML expression of its RTF, or Rich Text Format); I was also hoping to hear OpenDocument, the rich XML office standard on which OpenOffice is constructed. The answers surprised me, but in retrospect they should not have. FDsys's plan is to take the content in whatever format it arrives, preferably in a reasonably small number of common formats, and to concentrate on the metadata wrapper itself, for accessibility. Here's what Mike Wash said.

"We have developed requirements for the information packages that will
exist in FDsys. FDsys architecture is based on the Open Archival
Information System (OAIS) model which develops the concept of
submission, archival and dissemination packages. The excerpts from the
Requirements Document will help you understand our approach to
structuring submission packages and dissemination packages."

And now the details, obviously too much for my 800-word Information Insider column. By "RD" Wash means the FDsys "Requirements Document."

"Page 31 in the RD 2.0 Document Submission Information Packages (SIP)
This section specifies the packaging details for the Submission
Information Package (SIP), and describes how digital content and its
associated metadata are logically packaged for submission to FDsys.
A SIP contains the target digital object(s) and associated descriptive and
administrative metadata. It will be the vehicle whereby content packages
are submitted to FDsys by Content Originators. The concept of the SIP in
the OAIS (Open Archival Information System) model provides a starting
point for the specification of content and associated metadata, but it does
not specify how it is packaged. It is necessary that a SIP follow prespecified
rules so that FDsys can validate and accept the content for

Associated with the SIP are three types of information:
* Content Information (digital object(s) and Representation Information),
* Packaging Information, and
* Descriptive Information.
Packaging Information is the information that binds or encapsulates the
Content Information. To accomplish this, a SIP will include a binding
metadata file (sip.xml) that relates the digital objects and metadata
together to form a system-compliant SIP. The Metadata Encoding and
Transmission Standard (METS) schema shall be adopted as the encoding
standard for the sip.xml file, and GPO will specify profiles for METS to
drive its implementation for FDsys.

Descriptive Information is the metadata that allows users to discover the
Content Information in the system.

All file components of the SIP will be populated within a structured file
system directory hierarchy and are then aggregated into a single file or
entity for transmission and ingest into the system."
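
To make the SIP description above a little more concrete, here is a rough sketch in Python of what "a binding metadata file plus content, aggregated into a single file" could look like. This is my own illustration, not anything from the Requirements Document. The element names (mets, fileSec, fileGrp, file, FLocat, structMap, div, fptr) come from the public METS schema, but the directory names, the file IDs, the helper functions, and the choice of ZIP as the "single file" are all assumptions; the RD says only that GPO will specify METS profiles, and nothing here is checked against one.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_sip_xml(files):
    """Return METS-style XML (bytes) that binds the listed files together.

    `files` maps a path inside the package to its MIME type.
    Hypothetical helper; not validated against any GPO METS profile.
    """
    mets = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    root_div = ET.SubElement(struct_map, f"{{{METS_NS}}}div")
    for index, (path, mime_type) in enumerate(sorted(files.items())):
        file_id = f"FILE{index}"  # illustrative ID scheme
        file_el = ET.SubElement(
            file_grp, f"{{{METS_NS}}}file",
            {"ID": file_id, "MIMETYPE": mime_type},
        )
        # FLocat records where the object lives inside the package.
        ET.SubElement(
            file_el, f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path},
        )
        # The structural map points back at each file by its ID.
        ET.SubElement(root_div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return ET.tostring(mets, encoding="utf-8", xml_declaration=True)


def build_sip(content):
    """Aggregate content files plus sip.xml into one in-memory ZIP archive.

    `content` maps package path -> (mime_type, data bytes).
    ZIP is an assumption; the RD only says "a single file or entity".
    """
    sip_xml = build_sip_xml({path: mime for path, (mime, _) in content.items()})
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("sip.xml", sip_xml)
        for path, (_, data) in content.items():
            archive.writestr(path, data)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder content: one digital object and one descriptive metadata file.
    package = build_sip({
        "content/report.pdf": ("application/pdf", b"%PDF-1.4 placeholder"),
        "metadata/descriptive.xml": ("application/xml", b"<title>Example</title>"),
    })
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        print(archive.namelist())
```

Notice that the sketch never looks inside the PDF. The wrapper records what each object is and where it sits in the package, which is the format-neutral approach Wash describes.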

Wash elaborates further:
"Page 42 in the RD 2.0 Document Dissemination Information Package (DIP)
Dissemination Information Packages (DIPs) are transient copies of digital
objects, associated content metadata, and business process information
that are delivered from the system to fulfill End User requests and Content
Originator orders. As necessary, DIPs should follow the concept of a DIP
as outlined in the OAIS (Open Archival Information System) model.

The DIP is created as part of delivery processing and digital objects may
be adjusted based on orders and requests to support the delivery of hard
copy output, electronic presentation, and digital media.

The DIP should include all digital objects and/or metadata necessary to
fulfill requests and orders. The DIP may also include a binding metadata
file that relates the digital objects and metadata together to form a
package. The Metadata Encoding and Transmission Standard (METS)
schema has been adopted for the SIP and AIP and may be used as the
encoding standard for the binding metadata file, if a binding metadata file
is created."

Standardized, format neutral, and concentrating on the information about the content rather than the content itself: that is the long view. When you are dealing with a very large (and unpredictable) number of format types, you have to concentrate on access to and delivery of these things.

More Q&A to follow soon.