Chapter 1: Introduction 3
• Small files are prepackaged with tar (on UNIX-based systems) into units of at least 10 MB.
• No compression or encryption is used before sending the data to the restorer.
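The prepackaging rule above can be sketched in Python. This is an illustration only, not the method used by any particular backup product: the function names group_into_units and build_tar are hypothetical, and treating 10 MB as 10 * 1024 * 1024 bytes is an assumption. The sketch groups small files into units that reach the threshold and packs them into a plain tar archive with no compression or encryption.

```python
import io
import tarfile

# Assumption: 10 MB is interpreted as 10 * 1024 * 1024 bytes.
MIN_UNIT_BYTES = 10 * 1024 * 1024


def group_into_units(files, min_bytes=MIN_UNIT_BYTES):
    """Group (name, size) pairs into units of at least min_bytes each.

    Leftover files that cannot fill a unit on their own are merged into
    the previous unit. If there is no previous unit, they form a single
    undersized unit.
    """
    units, current, current_size = [], [], 0
    for name, size in files:
        current.append(name)
        current_size += size
        if current_size >= min_bytes:
            units.append(current)
            current, current_size = [], 0
    if current:
        if units:
            units[-1].extend(current)  # keep every unit at or above the threshold
        else:
            units.append(current)      # not enough data for one full unit
    return units


def build_tar(named_blobs):
    """Pack {name: bytes} into an uncompressed tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:  # mode "w" means no compression
        for name, data in named_blobs.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

In practice the same grouping could be fed to the tar command line tool; the in-memory archive here only keeps the sketch self-contained.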
Data Integrity
The Data Domain OS Data Invulnerability Architecture™ protects against data loss from hardware
and software failures.
• When writing to disk, the Data Domain OS creates and stores self-describing metadata for all
data received. After writing the data to disk, the Data Domain OS recomputes the metadata from
the data on disk and compares it to the original metadata.
• An append-only write policy guards against overwriting valid data.
• After a backup completes, a validation process examines what was written to disk to confirm
that all file segments are logically correct within the file system and that the data on disk
matches the data as it was before being written.
• In the background, the Online Verify operation continuously checks that data on the disks is
still correct and that nothing has changed since the earlier validation process.
• The storage disks in a restorer are set up in a double-parity RAID 6 configuration (two parity
drives) with a hot spare in 15-disk systems. Eight-disk systems have no hot spare. Each parity
stripe has block checksums to ensure that data is correct. The checksums are used continuously
during the Online Verify operation and whenever data is read from the restorer. With double
parity, the system can fix simultaneous errors on up to two disks.
• To keep data synchronized during a hardware or power failure, the restorer uses NVRAM
(non-volatile RAM) to track outstanding I/O operations.
• When reading data back for a restore operation, the Data Domain OS uses multiple layers of
consistency checks to verify that restored data is correct.
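Several of the mechanisms listed above (metadata compared after a write, an append-only write policy, background verification, and checks on read) can be illustrated with a toy model. The following Python sketch is not the Data Domain OS implementation: the class AppendOnlyStore, its method names, and the use of SHA-256 as the "metadata" are all assumptions chosen for illustration, and a bytearray stands in for the disk.

```python
import hashlib


class AppendOnlyStore:
    """Toy append-only store that verifies writes, reads, and stored data."""

    def __init__(self):
        self._log = bytearray()  # stands in for the disk
        self._index = []         # one (offset, length, digest) tuple per record

    def append(self, data: bytes) -> int:
        # Metadata computed from the data as received.
        expected = hashlib.sha256(data).hexdigest()
        offset = len(self._log)
        self._log.extend(data)   # append only; existing bytes are never overwritten
        # Metadata recomputed from what is now on "disk", then compared.
        stored = bytes(self._log[offset:offset + len(data)])
        if hashlib.sha256(stored).hexdigest() != expected:
            raise IOError("write verification failed")
        self._index.append((offset, len(data), expected))
        return len(self._index) - 1

    def _matches(self, offset, length, digest) -> bool:
        data = bytes(self._log[offset:offset + length])
        return hashlib.sha256(data).hexdigest() == digest

    def read(self, record_id: int) -> bytes:
        offset, length, digest = self._index[record_id]
        if not self._matches(offset, length, digest):
            raise IOError("checksum mismatch on read")
        return bytes(self._log[offset:offset + length])

    def verify_all(self) -> bool:
        """Background-style scan: recheck every record against its digest."""
        return all(self._matches(*entry) for entry in self._index)
```

A caller would append each incoming record, keep the returned record ID, and run verify_all periodically. In this model, any change to stored bytes after the write makes both verify_all and read report the mismatch.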
Data Compression
The Data Domain OS compression algorithms:
• store only unique data. Through Global Compression, a restorer pools redundant data from
each backup image. Any duplicated data or repeated patterns from multiple backups are stored
only once. The storage of unique data is invisible to backup software, which sees the entire
virtual file system.
• are independent of data format. Data can be structured, such as databases, or unstructured, such
as text files. Data can be from file systems or raw volumes. All forms are compressed.
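The idea of storing only unique data while still presenting complete files can be sketched as a content-addressed segment store. This is a minimal illustration under stated assumptions, not the Global Compression algorithm itself: the class DedupStore, the fixed 4096-byte segment size, and the use of SHA-256 to identify segments are all hypothetical choices, and the source does not say how segments are actually formed or identified.

```python
import hashlib

SEGMENT_SIZE = 4096  # hypothetical fixed segment size, chosen for illustration


class DedupStore:
    """Toy store that keeps each unique segment once across all backups."""

    def __init__(self):
        self.segments = {}  # digest -> segment bytes, stored only once
        self.files = {}     # file name -> ordered list of segment digests

    def write(self, name: str, data: bytes) -> None:
        recipe = []
        for i in range(0, len(data), SEGMENT_SIZE):
            segment = data[i:i + SEGMENT_SIZE]
            digest = hashlib.sha256(segment).digest()
            # Store the segment only if it has not been seen before.
            self.segments.setdefault(digest, segment)
            recipe.append(digest)
        self.files[name] = recipe

    def read(self, name: str) -> bytes:
        # The caller sees the whole file, rebuilt from shared segments.
        return b"".join(self.segments[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(s) for s in self.segments.values())
```

For example, if two backup images share their first segment, the store holds three segments rather than four, yet reading either image returns its full original content. The sketch works on raw bytes, so it does not depend on the data format.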