July 13, 2006

T3 7/13: Distributed Archiving for Long-Term Storage

Talks, Visitors

In mid-July it’s our pleasure to host the much-honored researcher (and former chair of the HTTP Working Group) Larry Masinter to talk about an area he’s been interested in for the past few years: long-term reliance on electronic documents. From the abstract of a paper he recently co-authored with Michael Welch on the topic:

This paper analyzes the requirements and describes a system
designed for retaining records and ensuring their legibility,
interpretability, availability, and provable authenticity over long
periods of time. In general, information preservation is
accomplished not by any one single technique, but by avoiding all
of the many possible events that might cause loss. The focus of the
system is on preservation in the 10 to 100 year time span—a long
enough period such that many difficult problems are known and
can be addressed, but not unimaginable in terms of the longevity of
computer systems and technology.

The general approach focuses on eliminating single points of
failure – single elements whose failure would cause information
loss – combined with active detection and repair in the event of
failure. Techniques employed include secret sharing, aggressive
“preemptive” format conversion, metadata acquisition, active
monitoring, and using standard Internet storage services in a novel
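
For readers new to secret sharing, one of the techniques the abstract names, here is a minimal sketch of a threshold scheme. The abstract does not say which scheme the system uses, so the choice of Shamir's scheme, the prime field, and the names `split_secret` and `recover_secret` are all our own assumptions for illustration. The idea is that a secret is split into several shares, any `threshold` of which can rebuild it, so no single storage location is a single point of failure.

```python
import secrets

# Assumption: a fixed prime field for the sketch. 2**127 - 1 is a Mersenne prime;
# secrets must be integers smaller than this value.
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares):
    """Split `secret` into `num_shares` shares; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Rebuild the secret by Lagrange interpolation at x = 0 (mod PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = split_secret(123456789, threshold=3, num_shares=5)
    # Any three of the five shares are enough; two of them can be lost.
    print(recover_secret(shares[:3]) == 123456789)
    print(recover_secret(shares[2:]) == 123456789)
```

In a setting like the one the abstract describes, each share would presumably live with a different storage service, so losing or compromising one service neither destroys nor reveals the record. How the actual system distributes its shares is not stated here.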

And about himself, in his own words:

[Larry Masinter is] a Principal Scientist in the Office of Technology at Adobe, where I’ve been since late 2000. These days, I’m looking at product interoperability within Adobe. In the past, I’ve worked on forms technology and the problems of long-term document archives and document validity; a paper describes some work in that area.

In 2000, I had a brief tenure at AT&T Labs, where I learned a lot about the telecommunications industry, standards, and corporate politics. Before that, I was at Xerox PARC: in the ’90s, I worked mainly on document management, Web and Internet standards, and Internet-based document services; in the ’70s and ’80s, I worked on the Interlisp system (from microcode to programmer tools to the graphics environment) and the Common Lisp standard. In the early ’70s, I worked on the DENDRAL project at Stanford.