Snapshot (computer storage)
From Wikipedia, the free encyclopedia
In computer file systems, a snapshot is a copy of a set of files and directories as they were at a particular point in the past. The term was coined by analogy with a snapshot in photography.
Snapshots solve the "backup window"[1] problem: a full backup of a large data set may take a long time to complete. On multi-tasking or multi-user systems, data may be written while it is being backed up. This prevents the backup from being atomic and introduces a version skew that may result in data corruption. For example, if a user moves a file from a directory that has not yet been backed up into a directory that has already been backed up, that file will be missing entirely from the backup media.
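The version skew described above can be reproduced with a small simulation. This is only a sketch: the file system is modelled as a dictionary of directory names to file sets, and the names naive_backup and move_file are hypothetical helpers invented for illustration.

```python
def naive_backup(fs, order, mutate_after=None, mutation=None):
    """Copy directories one at a time; optionally mutate the live data mid-backup."""
    backup = {}
    for name in order:
        backup[name] = set(fs[name])      # copy this directory's listing
        if name == mutate_after and mutation is not None:
            mutation(fs)                  # a concurrent write lands here
    return backup


def move_file(fs):
    # A user moves report.txt from the not-yet-copied directory "b"
    # into the already-copied directory "a".
    fs["b"].remove("report.txt")
    fs["a"].add("report.txt")


live = {"a": {"notes.txt"}, "b": {"report.txt"}}
backup = naive_backup(live, ["a", "b"], mutate_after="a", mutation=move_file)

in_live = any("report.txt" in files for files in live.values())
in_backup = any("report.txt" in files for files in backup.values())
print(in_live, in_backup)  # True False
```

The file still exists in the live data, but neither copied directory contains it, because the backup was not taken at a single point in time.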
There are several approaches to backing up live data. One is to temporarily disable write access to the data (making it read-only), perform the backup, and then restore read-write access. Most databases and operating systems offer a locking API with which exclusive read access can be enforced.
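A minimal sketch of the read-only approach follows. It uses a toy in-memory store rather than any real database or operating system locking API; the Store class and its frozen method are assumptions made for illustration.

```python
from contextlib import contextmanager


class Store:
    """Toy key-value store that can be made read-only for a while."""

    def __init__(self):
        self.data = {}
        self.read_only = False

    def write(self, key, value):
        if self.read_only:
            raise PermissionError("store is read-only during backup")
        self.data[key] = value

    @contextmanager
    def frozen(self):
        # Disable writes, then always restore read-write access afterwards.
        self.read_only = True
        try:
            yield
        finally:
            self.read_only = False


store = Store()
store.write("a", 1)

with store.frozen():
    backup = dict(store.data)        # consistent copy: no writes can interleave
    try:
        store.write("b", 2)          # a user tries to write during the backup
        rejected = False
    except PermissionError:
        rejected = True

store.write("b", 2)                  # succeeds once the backup is finished
print(rejected, backup, store.data)
```

The backup is consistent, but writers are blocked for its whole duration, which is the availability cost that the following paragraph discusses.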
This approach is tolerable for low-availability systems (for example, in a small office, backups could be done every night, when no one is working with the data), but high-availability 24/7 systems require a different approach. Snapshots are useful for avoiding version skew when backing up volatile data sets, such as tables in a busy database or the folder store of a busy mail server, without taking any data storage offline.
Creating a snapshot is usually an O(1) operation, whereas the time needed for a direct backup is proportional to the size of the data. A system administrator issues a single command, thus marking a point in time. After that, users continue to work normally with full read-write access to their data, while the system administrator gets read-only access to a frozen copy of the data as it was at that point: a snapshot. The contents of a snapshot can be copied to backup media to make a normal backup (which takes a relatively long time) or used read-only for other lengthy operations, such as virus scans. Snapshots can be deleted after use (i.e. after the backup to external media is complete), or simply kept as they are, serving as a backup.
There are several approaches to implementing snapshot functionality.
Some file systems, such as WAFL, fossil for Plan 9 from Bell Labs or ODS-5, internally track old versions of files and make snapshots available through a special namespace. Others, like NTFS or UFS2, provide an operating system API for accessing file histories.
Some Unix systems (including Linux and HP-UX) may also have snapshot-capable logical volume managers. These implement copy-on-write on entire block devices by copying changed blocks, just before they are to be overwritten, to other storage, thus preserving a self-consistent past image of the block device. File systems on this image can later be mounted as if they were on read-only media. Block-level snapshotting is almost always less space-efficient than direct file system support for snapshots. ZFS and Btrfs are notable examples of the latter: in both, snapshots are an integral component of the file system.
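The copy-on-write scheme described above can be sketched with a toy block device. This is an illustrative model under simplifying assumptions, not the implementation of any particular volume manager; the Volume and Snapshot classes and their methods are hypothetical names.

```python
class Snapshot:
    """Read-only view of a volume as it was when the snapshot was taken."""

    def __init__(self, volume):
        self._volume = volume
        self._saved = {}              # block index -> content at snapshot time

    def preserve(self, index):
        # Copy the old block aside only the first time it is overwritten.
        if index not in self._saved:
            self._saved[index] = self._volume.blocks[index]

    def read(self, index):
        if index in self._saved:
            return self._saved[index]
        return self._volume.blocks[index]   # unchanged since the snapshot

    def saved_blocks(self):
        return len(self._saved)


class Volume:
    """Toy block device with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # No data blocks are copied here, so the cost does not depend on volume size.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        for snap in self.snapshots:
            snap.preserve(index)      # save the old block just before overwriting
        self.blocks[index] = data


vol = Volume(["a", "b", "c", "d"])
snap = vol.snapshot()
vol.write(1, "B")
vol.write(1, "BB")                    # second overwrite copies nothing more

frozen = [snap.read(i) for i in range(4)]
print(frozen, vol.blocks, snap.saved_blocks())
```

The snapshot keeps returning the original four blocks while the live volume changes, and only the single overwritten block occupies extra storage.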
Read-write snapshots are sometimes called branching snapshots, because they implicitly create diverging versions of their data. In ZFS, they are called "clones".
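The branching behaviour can be illustrated with a toy writable view over a frozen base image. This sketch assumes a simple overlay model and does not describe how ZFS or any other system implements clones; the Clone class is a hypothetical name.

```python
class Clone:
    """Writable view over an immutable base image (toy model)."""

    def __init__(self, base):
        self._base = base             # tuple: the frozen snapshot contents
        self._overlay = {}            # private changes made through this clone

    def read(self, index):
        return self._overlay.get(index, self._base[index])

    def write(self, index, data):
        self._overlay[index] = data   # the base is never modified

    def contents(self):
        return [self.read(i) for i in range(len(self._base))]


base = ("a", "b", "c")
first = Clone(base)
second = Clone(base)

first.write(0, "x")
second.write(2, "z")

print(first.contents(), second.contents(), base)
```

Both clones start from the same snapshot and then diverge independently, while the snapshot itself stays unchanged.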
Shadow paging and write-ahead logging are similar snapshot-like mechanisms used internally by many databases to implement transactions.
The concept of a snapshot can also be applied to data structures held only in memory, for example in the implementation of software transactional memory. A "version" of a persistent data structure is effectively a snapshot.
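As an illustration of a persistent data structure, the following sketch implements an immutable singly linked stack. Every push returns a new version and leaves earlier versions intact; the names Node, push and to_list are chosen here for the example only.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    value: object
    next: Optional["Node"]


def push(head, value):
    # Returns a new version; the old version (head) is left untouched.
    return Node(value, head)


def to_list(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


v1 = push(push(None, 1), 2)   # version 1: 2 -> 1
v2 = push(v1, 3)              # version 2: 3 -> 2 -> 1, sharing v1's nodes

print(to_list(v1), to_list(v2), v2.next is v1)
```

Holding a reference to v1 is enough to keep a snapshot of the stack as it was before the later push, and the two versions share their common nodes rather than copying them.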
Some backup software provides this service as a separate option. For example, in Backup Exec, it is called Open File Option (OFO).
Microsoft provided a similar service at the system level, called Volume Shadow Copy Service (VSS), in Windows XP and Windows Server 2003. It is also included in Windows Vista as Shadow Copy.
Time Machine, developed by Apple and included with its Mac OS X v10.5 "Leopard" operating system, uses snapshots of all files on the volume so that users can retrieve deleted files or folders (or previous versions of them) directly from the Finder (the shell). It also allows retrieval of other deleted data (or previous versions of it), such as calendar entries, address book contacts, photos and email messages, from within individual applications such as Address Book, iPhoto and Mail.
[1] Harwood, Mike (2003-09-24). "Storage Basics: Backup Strategies".