Journaling file system

From Wikipedia, the free encyclopedia

A journaling file system is a file system that logs changes to a journal (usually a circular log in a specially allocated area) before actually writing them to the main file system. Such file systems are less likely to become corrupted in the event of a power failure or system crash.

File systems tend to be very large data structures; updating them to reflect changes to files and directories usually requires many separate write operations. This introduces a race condition, in which an interruption (like a power failure or system crash) can leave data structures in an invalid intermediate state.

For example, deleting a file on a Unix file system involves two steps:

  1. Removing its directory entry.
  2. Marking space for the file and its inode as free in the free space map.

If a crash occurs after step 1 but before step 2, the inode is orphaned and its space is leaked. If, on the other hand, step 2 is performed first and a crash occurs before step 1, the not-yet-deleted file will be marked free and may be overwritten by something else.
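
The two crash outcomes described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the dictionary-based "file system", the inode number 7, and the helper names make_fs, delete_file and check are all hypothetical and do not correspond to any real implementation.

```python
# Toy model: a directory maps names to inode numbers, and a free map
# records which inodes may be reused. All names here are hypothetical.

def make_fs():
    return {"directory": {"report.txt": 7}, "free_inodes": set()}

def delete_file(fs, name, crash_after=None, order=("unlink", "free")):
    """Run the two deletion steps in `order`, stopping after `crash_after`."""
    inode = fs["directory"][name]
    for step in order:
        if step == "unlink":
            del fs["directory"][name]      # step 1: remove the directory entry
        elif step == "free":
            fs["free_inodes"].add(inode)   # step 2: mark the inode as free
        if step == crash_after:
            return                         # simulated crash: later steps are lost

def check(fs, all_inodes=(7,)):
    """Return (leaked, dangling) inode sets for the toy file system."""
    referenced = set(fs["directory"].values())
    leaked = {i for i in all_inodes
              if i not in referenced and i not in fs["free_inodes"]}
    dangling = referenced & fs["free_inodes"]
    return leaked, dangling

# Crash after step 1 only: the inode is neither referenced nor free.
fs_a = make_fs()
delete_file(fs_a, "report.txt", crash_after="unlink")
print(check(fs_a))   # ({7}, set())

# Crash after step 2 only: the inode is still referenced but marked free.
fs_b = make_fs()
delete_file(fs_b, "report.txt", crash_after="free", order=("free", "unlink"))
print(check(fs_b))   # (set(), {7})
```

The first result corresponds to the storage leak, the second to a live file whose space could be handed to something else.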

Recovery in a non-journaled file system requires a complete walk of its data structures to find and correct any inconsistencies. This can be slow if the file system is large and little I/O bandwidth is available.

A journaled file system maintains a journal of the changes it intends to make, ahead of time. After a crash, recovery simply involves replaying changes in the journal until the file system is consistent again. The changes are thus said to be atomic (or indivisible) in that they either:

  • succeed (either they succeeded originally or they are replayed completely during recovery), or
  • are not replayed at all.
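
The all-or-nothing behaviour can be sketched as a toy write-ahead journal. The record layout ("begin", "update", "commit"), the key names, and the functions log_transaction and replay are assumptions made for illustration; real journals use on-disk formats specific to each file system.

```python
# Toy write-ahead journal. A transaction is replayed only if its commit
# record reached the journal; otherwise it is ignored in full.

def log_transaction(journal, txid, updates, crash_before_commit=False):
    journal.append(("begin", txid))
    for key, value in updates.items():
        journal.append(("update", txid, key, value))
    if crash_before_commit:
        return                      # simulated crash: no commit record written
    journal.append(("commit", txid))

def replay(journal, disk):
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "update" and rec[1] in committed:
            _, _, key, value = rec
            disk[key] = value       # idempotent: applying it twice is harmless
    return disk

journal = []
disk = {"dir:report.txt": 7, "free:7": False}

# Transaction 1 (delete report.txt) is fully logged.
log_transaction(journal, 1, {"dir:report.txt": None, "free:7": True})
# Transaction 2 is interrupted before its commit record is written.
log_transaction(journal, 2, {"dir:notes.txt": 9}, crash_before_commit=True)

replay(journal, disk)
print(disk)   # {'dir:report.txt': None, 'free:7': True}
```

Both updates of transaction 1 are applied together, while nothing from the uncommitted transaction 2 reaches the main structures.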

Some file systems allow the journal to grow, shrink and be re-allocated just as would a regular file; most, however, put the journal in a contiguous area or a special hidden file that is guaranteed not to move or change size while the file system is mounted.

A physical journal logs verbatim copies of the blocks that will be written later; ext3 uses this approach.[1] A logical journal logs information about the changes in a special, compact format; XFS uses this approach. A logical journal reduces the amount of data that needs to be read from and written to the journal in large, metadata-heavy operations (for example, deleting a large directory tree).
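
The difference in journal volume can be sketched as follows. The 4096-byte block size, the "patch" record format, and all function names are assumptions for illustration only; they are not the on-disk formats of ext3 or XFS.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes

def physical_record(block_no, block_bytes):
    # Physical journaling: log a verbatim copy of the whole modified block.
    assert len(block_bytes) == BLOCK_SIZE
    return ("block", block_no, bytes(block_bytes))

def logical_record(block_no, offset, new_bytes):
    # Logical journaling: log only a compact description of the change.
    return ("patch", block_no, offset, bytes(new_bytes))

def apply_record(record, disk):
    if record[0] == "block":
        _, block_no, data = record
        disk[block_no] = bytearray(data)
    else:
        _, block_no, offset, data = record
        disk[block_no][offset:offset + len(data)] = data

def payload_size(record):
    return len(record[-1])

# Change four bytes inside block 3, journaled both ways.
change = b"\x01\x02\x03\x04"
new_block = bytearray(BLOCK_SIZE)
new_block[100:104] = change

p = physical_record(3, new_block)
l = logical_record(3, 100, change)

disk_p = {3: bytearray(BLOCK_SIZE)}
disk_l = {3: bytearray(BLOCK_SIZE)}
apply_record(p, disk_p)
apply_record(l, disk_l)

print(disk_p == disk_l)                    # True
print(payload_size(p), payload_size(l))    # 4096 4
```

Replaying either record leaves the block in the same state, but the logical record carries far fewer bytes through the journal.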

Some UFS implementations avoid journaling and instead implement soft updates: they order their writes in such a way that the on-disk file system is never inconsistent, or is inconsistent only in ways that leak storage. To recover from these leaks, the free space map is reconciled against a full walk of the file system at next mount. This garbage collection is usually done in the background.[2]
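
The reconciliation step can be pictured as a simple reachability walk. The sketch below is not how any UFS implementation is written; the flat directory, the inode-to-block table, and the function reconcile_free_map are hypothetical simplifications.

```python
def reconcile_free_map(directory, inode_blocks, free_map, total_blocks):
    """Walk every reachable file and free any block that is neither
    reachable nor already recorded as free. Returns the leaked blocks."""
    in_use = set()
    for inode in directory.values():
        in_use.update(inode_blocks.get(inode, ()))
    leaked = {b for b in range(total_blocks)
              if b not in in_use and b not in free_map}
    free_map.update(leaked)
    return leaked

directory = {"a.txt": 1, "b.txt": 2}
# Inode 3 lost its directory entry before its blocks were freed.
inode_blocks = {1: [0, 1], 2: [4], 3: [2, 3]}
free_map = {5}

leaked = reconcile_free_map(directory, inode_blocks, free_map, total_blocks=6)
print(sorted(leaked))     # [2, 3]
print(sorted(free_map))   # [2, 3, 5]
```

Because a leak only wastes space and never endangers live files, a walk of this kind can run while the file system is already in use.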

Journaling can have a severe impact on performance because it requires that all data be written twice.[3] Metadata-only journaling is a compromise between reliability and performance that stores only changes to file metadata in the journal. This still ensures that the file system can recover quickly when next mounted, but leaves an opportunity for data corruption because unjournaled file data and journaled metadata can fall out of sync.

For example, appending to a file on a Unix file system involves three steps:

  1. Increasing the size of the file in its inode.
  2. Allocating space for the extension in the free space map.
  3. Writing the appended data to the newly allocated space.

In a metadata-only journal, step 3 would not be logged. If step 3 was not completed before a crash but steps 1 and 2 are replayed during recovery, the file will end with garbage in place of the appended data.
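
A small simulation makes the hazard concrete. Everything here is an assumed toy model: blocks are four bytes long, the newly allocated block still holds stale bytes from an earlier file, and the function run is hypothetical.

```python
def run(data_reached_disk):
    # Block 0 belongs to the file; block 1 is free but holds stale bytes.
    disk = [bytearray(b"abcd"), bytearray(b"OLD!")]
    inode = {"size": 4, "blocks": [0]}

    # Steps 1 and 2 are metadata changes, so they are journaled.
    journal = [("set_size", 8), ("allocate", 1)]

    # Step 3 is file data and is NOT journaled; it may or may not
    # reach the disk before the crash.
    if data_reached_disk:
        disk[1][:] = b"efgh"

    # Crash, then recovery replays the journaled metadata.
    for op, arg in journal:
        if op == "set_size":
            inode["size"] = arg
        elif op == "allocate":
            inode["blocks"].append(arg)

    content = b"".join(bytes(disk[b]) for b in inode["blocks"])
    return content[:inode["size"]]

print(run(data_reached_disk=True))    # b'abcdefgh'
print(run(data_reached_disk=False))   # b'abcdOLD!'
```

In the second case the metadata is perfectly consistent, yet the file now ends with whatever bytes happened to be in the allocated block.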

The write cache in most operating systems sorts its writes (with elevator sort or some similar scheme) to maximize throughput. To avoid an out-of-order write hazard with a metadata-only journal, writes for file data must be sorted so that they are committed to storage before their associated metadata. This can be tricky to implement because it requires coordination within the operating system kernel between the file system driver and write cache. An out-of-order write hazard can also exist if the underlying storage:

  • cannot write blocks atomically, or
  • re-sorts its writes, or
  • does not honor requests to flush its write cache.
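
One way to picture the required ordering is a flush plan that still sorts writes by block number for throughput, but places every data write and a cache flush ahead of any metadata write. This is a deliberately coarse sketch: the function plan_flush and the "flush_cache" marker are hypothetical, and real kernels track dependencies per file rather than splitting all pending writes into two classes.

```python
def plan_flush(pending):
    """pending: list of (block_no, kind) with kind 'data' or 'meta'.
    Sort within each class, but emit all data writes, then a cache
    flush, then the metadata writes."""
    data = sorted(block for block, kind in pending if kind == "data")
    meta = sorted(block for block, kind in pending if kind == "meta")
    plan = [("write", b) for b in data]
    plan.append(("flush_cache",))       # storage must honor this request
    plan += [("write", b) for b in meta]
    return plan

pending = [(3, "meta"), (90, "data"), (12, "data"), (1, "meta")]
plan = plan_flush(pending)
print(plan)
# [('write', 12), ('write', 90), ('flush_cache',), ('write', 1), ('write', 3)]
```

A plain sort by block number would have issued blocks 1 and 3 first, committing metadata before the data it describes. The plan is also only as good as the storage underneath it: if the device ignores the flush or re-sorts the writes, the hazard returns.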

In log-structured file systems, the write-twice penalty does not apply because the journal itself is the file system. Most Unix file systems are not log-structured, but some implement similar techniques in order to avoid the double-write penalty. In particular, Reiser4 can group many separate writes into a single contiguously written chunk, then extend the head of the journal to enclose the newly written chunk. The tail of the journal retracts from the chunk after it has been committed to storage.[4]
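
The idea that the journal itself is the file system can be sketched with an append-only store. The class LogStructuredStore below is a hypothetical toy, not a model of Reiser4 or of any particular log-structured file system: each value is written once, to the end of the log, and an in-memory index that can be rebuilt from the log records where the latest version lives.

```python
class LogStructuredStore:
    def __init__(self):
        self.log = []      # the only persistent structure in this toy
        self.index = {}    # in-memory map: key -> position in the log

    def write(self, key, value):
        self.index[key] = len(self.log)
        self.log.append((key, value))   # one write; no second copy elsewhere

    def read(self, key):
        return self.log[self.index[key]][1]

    def recover(self):
        # After a crash the index is rebuilt by scanning the log;
        # later entries for the same key override earlier ones.
        self.index = {key: pos for pos, (key, _) in enumerate(self.log)}

store = LogStructuredStore()
store.write("a", "v1")
store.write("b", "v1")
store.write("a", "v2")

store.index = {}       # simulate losing all in-memory state
store.recover()
print(store.read("a"), store.read("b"), len(store.log))   # v2 v1 3
```

Three logical writes produce exactly three log entries, in contrast to a journal that logs each change and then writes it again to its final location.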

  1. ^ Tweedie, Stephen C (1998), "Journaling the Linux ext2fs Filesystem", The Fourth Annual Linux Expo, <http://donner.cs.uni-magdeburg.de/bs/lehre/wise0001/bs2/journaling/journal-design.pdf>.
  2. ^ Seltzer, Margo I; Ganger, Gregory R & McKusick, M Kirk, "Journaling Versus Soft Updates: Asynchronous Meta-data Protection in File Systems", 2000 USENIX Annual Technical Conference (USENIX Association), <http://www.usenix.org/event/usenix2000/general/full_papers/seltzer/seltzer_html>.
  3. ^ Prabhakaran, Vijayan; Arpaci-Dusseau, Andrea C & Arpaci-Dusseau, Remzi H, "Analysis and Evolution of Journaling File Systems", 2005 USENIX Annual Technical Conference (USENIX Association), <https://www.usenix.org/events/usenix05/tech/general/full_papers/prabhakaran/prabhakaran.pdf>.
  4. ^ Reiser, Hans (October 2003), Reiser4 white paper, <http://namesys.com/v4/v4.html>. Retrieved on 27 July 2007.
