Hardware/Network Architecture
Long-term digital data preservation is inherently difficult, because
computer and network hardware tends to be relatively short-lived
(5-30 years) and usually becomes obsolete before it fails. Our
approach to building a robust, long-lived system is to create multiple
dimensions of duplication and to design for easy forward migration of
components and architecture.
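Forward migration by staged replacement can be made concrete with a small simulation. This is a hypothetical sketch, not the project's actual procedure: it assumes a fleet of identical units that all enter service in year 0 and a policy of replacing the oldest `per_year` units each year. The function name `max_service_life` is illustrative.

```python
def max_service_life(n_units: int, per_year: int, years: int) -> int:
    """Simulate staged replacement and return the greatest age any unit reaches.

    Hypothetical model: all `n_units` enter service in year 0, and at the start
    of each following year the `per_year` oldest units are swapped for new ones.
    """
    installed = [0] * n_units  # year each unit entered service
    worst = 0
    for year in range(1, years + 1):
        # Age of the oldest unit just before this year's replacements.
        worst = max(worst, year - min(installed))
        # Replace the oldest units; sorted() is stable, so ties go to lower indices.
        oldest = sorted(range(n_units), key=lambda i: installed[i])[:per_year]
        for i in oldest:
            installed[i] = year
    return worst


# Ten servers, two replaced per year: no unit serves longer than 5 years.
print(max_service_life(10, 2, 20))  # 5
```

Under this model the longest service life settles at about `n_units / per_year` years (rounded up), so the replacement rate can be chosen to keep every unit well inside its expected hardware lifetime.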
Techniques for Reliable Data Storage
Hardware Reliability
- Duplication of data (symmetric redundancy)
  - RAID 5 per server
  - Two mirrored servers per location
- Diversity of copies (asymmetric redundancy)
  - Media
    - Hard Drive
    - CD-ROM
    - Norsam disk
  - Location
    - San Francisco
    - Ely, Nevada
    - Additional (undisclosed) location
- Rigorous Migration (Regular staged replacement of hardware)
- Expansion of Mirror sites
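The reason for combining symmetric and asymmetric redundancy can be illustrated with a simple probability model. This is a hypothetical sketch: the loss probabilities are invented for illustration, and real failures are rarely fully independent. Mirrored copies in one place share failure modes, modeled here as a single `shared_event` probability; copies on different media at different locations are closer to independent, so their loss probabilities multiply. `prob_all_lost` is an illustrative name.

```python
import math


def prob_all_lost(copy_loss: list[float], shared_event: float = 0.0) -> float:
    """Probability that every copy is lost in the same year.

    copy_loss: independent annual loss probability of each copy.
    shared_event: probability of one event (fire, flood, a bad drive batch)
        that destroys all copies at once; assumed independent of the rest.
    """
    all_independent = math.prod(copy_loss)
    # Lost if the shared event happens, or if it doesn't and every copy fails anyway.
    return shared_event + (1 - shared_event) * all_independent


# Illustrative numbers only: two mirrored servers in one building vs.
# three copies on different media at different sites.
mirrored = prob_all_lost([0.05, 0.05], shared_event=0.01)
diverse = prob_all_lost([0.05, 0.05, 0.05])
print(mirrored, diverse)
```

With these made-up numbers the shared event dominates the mirrored case (about 0.0125 against 0.000125), which is why diversity of media and location matters as much as the raw number of copies.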
Mirroring Architecture
Hardware Component Candidates
RAIDZONE sells the RS15-R1200 server, which stores about 1TB, has dual
CPUs running Linux, up to 1GB of RAM, and dual Ethernet ports. We can
therefore likely store the live repository and run the server software
on the same machine. Because it uses cost-effective IDE hard drives, it
costs only about $21K in a 1TB configuration.
JVC offers a 100-disk archival CD-ROM jukebox, the MC-2102, with an
internal color printer for about $12K. It can record about 80GB/day
even in a minimal configuration, and looks like a good choice both for
mass recovery of data and for custom one-off backups (e.g. for a
particular client).
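A rough plan for writing the 1TB server to the jukebox can be worked out from the quoted figures. This sketch assumes about 650 MB per disc and decimal units (1 TB = 1000 GB); `jukebox_plan` is an illustrative name.

```python
import math

DISC_MB = 650            # assumption: 74-minute CD-R, about 650 MB
JUKEBOX_DISCS = 100      # MC-2102 capacity, from the text
RECORD_GB_PER_DAY = 80   # recording rate quoted in the text


def jukebox_plan(archive_gb: float) -> tuple[float, int]:
    """Return (days of recording, full jukebox loads) to write an archive to CD-R."""
    days = archive_gb / RECORD_GB_PER_DAY
    load_gb = DISC_MB * JUKEBOX_DISCS / 1000  # decimal units: 65 GB per load
    loads = math.ceil(archive_gb / load_gb)
    return days, loads


days, loads = jukebox_plan(1000)  # the 1TB server
print(days, loads)  # 12.5 16
```

Under these assumptions one load holds about 65 GB, less than a day's recording at 80GB/day, so a full 1TB backup takes about two weeks and roughly 16 reloads of blank discs.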
Mitsui offers a printable, archival CD-R with a claimed lifespan of
100 years for less than $2 apiece in bulk (about $3K to back up the
1TB server).
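The $3K figure can be checked with a little arithmetic. This sketch assumes about 650 MB per disc and decimal units, and treats the $2 price as an upper bound; `cd_backup_cost` is an illustrative helper.

```python
import math


def cd_backup_cost(archive_gb: float, disc_mb: float = 650,
                   price_per_disc: float = 2.0) -> tuple[int, float]:
    """Return (discs needed, media cost in dollars) for a full CD-R backup."""
    discs = math.ceil(archive_gb * 1000 / disc_mb)  # decimal units
    return discs, discs * price_per_disc


discs, cost = cd_backup_cost(1000)
print(discs, cost)  # 1539 3078.0
```

About 1,539 discs at no more than $2 each puts the media cost at roughly $3K, consistent with the estimate in the text.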