Reed–Solomon error correction
From Wikipedia, the free encyclopedia
Reed–Solomon error correction is an error-correcting code that works by oversampling a polynomial constructed from the data. The polynomial is evaluated at several points, and these values are sent or recorded. By sampling the polynomial more often than is necessary, the polynomial is over-determined. As long as "many" of the points are received correctly, the receiver can recover the original polynomial even in the presence of a "few" bad points.
Reed–Solomon codes are used in a wide variety of commercial applications, most prominently in CDs and DVDs, in data transmission technologies such as DSL and WiMAX, and in broadcast systems such as DVB and ATSC.
Reed-Solomon codes are block codes: a fixed block of input data is processed into a fixed block of output data. In the most commonly used R-S code, (255, 223), 223 Reed-Solomon input symbols (each eight bits long) are encoded into 255 output symbols.
- Most R-S ECC schemes are systematic.
- A systematic code is one in which some portion of the codeword contains the input data in unaltered form.
- A Reed-Solomon symbol size of eight bits was chosen because the decoders for larger symbol sizes would be difficult to implement with current technology. This design choice forces the longest codeword length to be 255 symbols.
- The standard (255, 223) Reed-Solomon code is capable of correcting up to 16 Reed-Solomon symbol errors in each codeword. Since each symbol is actually eight bits, this means that the code can correct up to 16 short bursts of errors produced by the inner convolutional decoder.
The Reed-Solomon code, like the convolutional code, is a transparent code. This means that if the channel symbols have been inverted somewhere along the line, the decoders will still operate. The result will be the complement of the original data. However, the Reed-Solomon code loses its transparency if virtual zero fill is used. For this reason it is mandatory that the sense of the data (i.e., true or complemented) be resolved before Reed-Solomon decoding.
In the case of the Voyager program, R-S codes reach near-optimal performance when concatenated with the (7, 1/2) convolutional (Viterbi) inner code. Since two check symbols are required for each error to be corrected, correcting 16 errors results in a total of 32 check symbols and 223 information symbols per codeword.
In addition, the Reed-Solomon codewords can be interleaved on a symbol basis before being convolutionally encoded. Since this separates the symbols in a codeword, it becomes less likely that a burst from the Viterbi decoder disturbs more than one Reed-Solomon symbol in any one codeword.
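The effect of symbol interleaving can be illustrated with a minimal sketch. It assumes a plain block interleaver (symbol 0 of every codeword, then symbol 1 of every codeword, and so on); the function names and the depth of 5 are hypothetical choices for illustration, not the parameters of any particular mission.

```python
def interleave(codewords):
    """Emit symbol 0 of every codeword, then symbol 1 of every codeword, etc."""
    return [cw[i] for i in range(len(codewords[0])) for cw in codewords]

def deinterleave(stream, depth):
    """Inverse of interleave() for `depth` equal-length codewords."""
    return [stream[j::depth] for j in range(depth)]

def worst_hits(depth, length, burst_start, burst_len):
    """Damage a contiguous burst of the interleaved stream and return the
    largest number of damaged symbols that land in any single codeword."""
    codewords = [[0] * length for _ in range(depth)]
    stream = interleave(codewords)
    for pos in range(burst_start, burst_start + burst_len):
        stream[pos] = 1  # 1 marks a damaged symbol
    return max(sum(cw) for cw in deinterleave(stream, depth))

# A 40-symbol burst exceeds the 16-error limit of one (255, 223) codeword,
# but spread over 5 interleaved codewords it costs each of them only 8 symbols.
print(worst_hits(depth=1, length=255, burst_start=100, burst_len=40))  # 40
print(worst_hits(depth=5, length=255, burst_start=100, burst_len=40))  # 8
```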
The key idea behind a Reed-Solomon code is that the data to be encoded is first viewed as a polynomial. The code relies on a theorem from linear algebra that states that any k distinct points uniquely determine a polynomial of degree at most k-1.
The polynomial is then "encoded" by its evaluation at various points, and these values are what is actually sent. During transmission, some of these values may become corrupted. Therefore, more than k points are actually sent. As long as sufficient values are received correctly, the receiver can deduce what the original polynomial was, and hence decode the original data.
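The idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions rather than a production codec: it works in the prime field GF(257) instead of the GF(2^8) arithmetic used by practical codes, uses the evaluation points 0, 1, ..., n-1, and the function names are hypothetical.

```python
P = 257  # a small prime; integers modulo P form the finite field GF(257)

def evaluate(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def encode(data, n):
    """Treat the k data symbols as coefficients and evaluate at x = 0 .. n-1."""
    return [(x, evaluate(data, x)) for x in range(n)]

def interpolate(points):
    """Lagrange interpolation: recover the k coefficients from k (x, y) points."""
    k = len(points)
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis = [1]  # running product of (x - xj) for all j != i
        denom = 1    # running product of (xi - xj) for all j != i
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            # Multiply basis by (x - xj): new[d] = basis[d-1] - xj * basis[d]
            basis = [(a - xj * b) % P for a, b in zip([0] + basis, basis + [0])]
            denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P  # modular inverse needs Python 3.8+
        for d, b in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * b) % P
    return coeffs

data = [72, 101, 108]           # k = 3 data symbols
sent = encode(data, 7)          # n = 7 values are transmitted
survivors = [sent[4], sent[1], sent[6]]  # any 3 correctly received points
print(interpolate(survivors) == data)    # True
```

Any three of the seven transmitted points determine the same degree-2 polynomial, which is why four of them can be lost without losing the data.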
Given a finite field F and polynomial ring F[x], let n and k be chosen such that 1 ≤ k ≤ n ≤ |F|. Pick n distinct elements of F, denoted {x_1, x_2, ..., x_n}. Then the codebook C is created from the tuples of values obtained by evaluating every polynomial (over F) of degree less than k at each x_i; that is,
$$\mathbf{C} = \left\{ \left( f(x_1), f(x_2), \ldots, f(x_n) \right) : f \in F[x],\ \deg(f) < k \right\}$$
C is a [n, k, n-k+1] code; in other words, it is a linear code of length n (over F) with dimension k and minimum distance n-k+1.
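For a small code the minimum distance can be checked by exhaustive enumeration. The sketch below assumes the prime field GF(7) with n = 6 and k = 2 (hypothetical values chosen only to keep the search tiny) and uses the fact that, for a linear code, the minimum distance equals the smallest Hamming weight of a nonzero codeword.

```python
from itertools import product

def codeword(coeffs, p, n):
    """Evaluate the polynomial with the given coefficients at x = 0 .. n-1 in GF(p)."""
    return tuple(sum(c * pow(x, d, p) for d, c in enumerate(coeffs)) % p
                 for x in range(n))

def min_distance(p, n, k):
    """Smallest Hamming weight over all nonzero codewords of the [n, k] code."""
    weights = [sum(symbol != 0 for symbol in codeword(f, p, n))
               for f in product(range(p), repeat=k) if any(f)]
    return min(weights)

print(min_distance(p=7, n=6, k=2))  # 5, which equals n - k + 1
```

The result follows from the fact that a nonzero polynomial of degree less than k has at most k-1 roots, so at most k-1 of the n evaluations can be zero.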
The data points are sent as encoded blocks. The total number of m-bit symbols in the encoded block is n = 2^m − 1. Thus a Reed-Solomon code operating on 8-bit symbols has n = 2^8 − 1 = 255 symbols per block. (This is a very popular value because of the prevalence of byte-oriented computer systems.) The number k, with k < n, of data symbols in the block is a design parameter. A commonly used code encodes k = 223 8-bit data symbols plus 32 8-bit parity symbols in an n = 255-symbol block; this is denoted as a (n,k) = (255,223) code, which is capable of correcting up to 16 symbol errors per block.
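The arithmetic behind these figures is small enough to write out. The helper below is a hypothetical convenience function, not part of any library; the key name `t` for the number of correctable symbol errors is likewise an assumption of this sketch.

```python
def rs_parameters(m, k):
    """Block parameters of a Reed-Solomon code on m-bit symbols with k data symbols."""
    n = 2**m - 1                 # natural block length in symbols
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    parity = n - k               # redundant (check) symbols per block
    return {"n": n, "k": k, "parity": parity, "t": parity // 2}

print(rs_parameters(m=8, k=223))
# {'n': 255, 'k': 223, 'parity': 32, 't': 16}
```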
The data symbols are represented by the coefficients of a polynomial over a finite field. The polynomial is then evaluated at numerous points over the field, and these values are sent as the block of the encoded message. The number of points evaluated is larger than the degree of the polynomial so that the polynomial is overdetermined; the coefficients can therefore be recovered from subsets of the plotted points. In the same sense that one can correct a curve by interpolating past a gap, a Reed-Solomon code can bridge a series of errors in a block of data to recover the coefficients of the polynomial that drew the original curve.
The error-correcting ability of any Reed-Solomon code is determined by n − k, the measure of redundancy in the block. If the locations of the errored symbols are not known in advance, then a Reed–Solomon code can correct up to
$$\left\lfloor \frac{n - k}{2} \right\rfloor$$
erroneous symbols, i.e., it can correct half as many errors as there are redundant symbols added to the block. Sometimes error locations are known in advance (e.g., "side information" in demodulator signal-to-noise ratios); these are called erasures. A Reed–Solomon code (like any linear code) is able to correct twice as many erasures as errors, and any combination of errors and erasures can be corrected as long as the inequality 2E + S ≤ n − k is satisfied, where E is the number of errors and S is the number of erasures in the block.
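The trade-off between errors and erasures can be demonstrated with a deliberately naive decoder. The sketch below is not how practical decoders work (they use algebraic methods such as the Berlekamp algorithm mentioned elsewhere in this article); it simply tries every set of k non-erased symbols, interpolates, and accepts a candidate once twice its error count plus the number of erasures fits within the n − k redundant symbols. The prime field GF(11), the evaluation points 0 .. n-1 and the function names are assumptions made for illustration.

```python
from itertools import combinations

P = 11  # prime field GF(11), chosen only to keep the example small

def lagrange_eval(points, x):
    """Value at x of the unique polynomial of degree < len(points) through points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def decode(received, k):
    """received[x] is the symbol for evaluation point x, or None for an erasure.
    Returns the corrected codeword, or None if no candidate fits the budget."""
    n = len(received)
    known = [(x, y) for x, y in enumerate(received) if y is not None]
    erasures = n - len(known)
    for subset in combinations(known, k):
        candidate = [lagrange_eval(subset, x) for x in range(n)]
        errors = sum(candidate[x] != y for x, y in known)
        if 2 * errors + erasures <= n - k:
            return candidate
    return None

# n = 7, k = 3, so n - k = 4 redundant symbols.
codeword = [(3 + 2 * x + x * x) % P for x in range(7)]

one_error_two_erasures = list(codeword)
one_error_two_erasures[1] = (codeword[1] + 1) % P   # E = 1
one_error_two_erasures[3] = None                    # S = 2
one_error_two_erasures[5] = None
print(decode(one_error_two_erasures, 3) == codeword)  # True: 2*1 + 2 = 4

two_errors = list(codeword)
two_errors[0] = (codeword[0] + 5) % P               # E = 2, S = 0
two_errors[4] = (codeword[4] + 2) % P
print(decode(two_errors, 3) == codeword)              # True: 2*2 + 0 = 4
```

The search is exponential in the block length, so it is only usable for toy parameters; its value is that the acceptance test mirrors the error and erasure budget directly.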
The properties of Reed-Solomon codes make them especially well-suited to applications where errors occur in bursts. This is because it does not matter to the code how many bits in a symbol are in error—if multiple bits in a symbol are corrupted it only counts as a single error. Conversely, if a data stream is not characterized by error bursts or drop-outs but by random single bit errors, a Reed-Solomon code is usually a poor choice.
Designers are not required to use the "natural" sizes of Reed-Solomon code blocks. A technique known as "shortening" can produce a smaller code of any desired size from a larger code. For example, the widely used (255,223) code can be converted to a (160,128) code by padding the unused portion of the block (usually the beginning) with 95 zero symbols and not transmitting them. At the decoder, the same portion of the block is loaded locally with zero symbols. The compact disc is an example of an application of shortened Reed-Solomon codes.
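Shortening can be illustrated with a small systematic encoder. The sketch below makes several assumptions that are not taken from any particular standard: GF(2^8) is built from the primitive polynomial 0x11D, the generator polynomial has roots α^0 through α^31, and codewords are written highest-degree symbol first. Under those assumptions it shows that the 32 check symbols of a 128-symbol message are the same whether or not the 95 leading zero symbols are present, which is why they need not be transmitted.

```python
# GF(2^8) antilog/log tables, assuming the primitive polynomial 0x11D.
EXP, LOG = [0] * 510, [0] * 256
value = 1
for power in range(255):
    EXP[power] = EXP[power + 255] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def generator_poly(nsym):
    """g(x) = (x - a^0)(x - a^1)...(x - a^(nsym-1)), highest degree first."""
    g = [1]
    for i in range(nsym):
        # Multiply by (x + a^i); subtraction is the same as XOR in GF(2^8).
        shifted = g + [0]
        scaled = [0] + [gf_mul(c, EXP[i]) for c in g]
        g = [s ^ t for s, t in zip(shifted, scaled)]
    return g

def parity(message, nsym):
    """Check symbols: remainder of message(x) * x^nsym divided by g(x)."""
    g = generator_poly(nsym)
    rem = list(message) + [0] * nsym
    for i in range(len(message)):
        coef = rem[i]
        if coef:  # a zero symbol leaves the remainder untouched
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], coef)
    return rem[len(message):]

def syndromes(codeword, nsym):
    """Evaluate the codeword at each root of g(x); all zeros means valid."""
    result = []
    for i in range(nsym):
        acc = 0
        for symbol in codeword:
            acc = gf_mul(acc, EXP[i]) ^ symbol
        result.append(acc)
    return result

message = list(range(128))                   # k = 128 data symbols
padded = [0] * 95 + message                  # the (255, 223) view of the same data
print(parity(padded, 32) == parity(message, 32))  # True
shortened = message + parity(message, 32)    # the 160 symbols actually sent
print(len(shortened), any(syndromes(shortened, 32)))  # 160 False
```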
In 1999, Madhu Sudan and Venkatesan Guruswami at MIT published "Improved Decoding of Reed-Solomon and Algebraic-Geometry Codes", introducing an algorithm that allowed for the correction of errors beyond half the minimum distance of the code. It applies to Reed–Solomon codes and more generally to algebraic geometry codes. This algorithm produces a list of codewords (it is a list-decoding algorithm) and is based on interpolation and factorization of polynomials over GF(2^m) and its extensions.
The code was invented in 1960 by Irving S. Reed and Gustave Solomon, who were then members of MIT Lincoln Laboratory. Their seminal article was "Polynomial Codes over Certain Finite Fields." When it was written, digital technology was not advanced enough to implement the concept. The key to application of Reed-Solomon codes was the invention of an efficient decoding algorithm by Elwyn Berlekamp, a professor of electrical engineering at the University of California, Berkeley. Today they are used in disk drives, CDs, telecommunication and digital broadcast protocols.
Reed-Solomon coding is very widely used in mass storage systems to correct the burst errors associated with media defects.
Reed-Solomon coding is a key component of the compact disc. It was the first use of strong error correction coding in a mass-produced consumer product, and DAT and DVD use similar schemes. In the CD, two layers of Reed-Solomon coding separated by a 28-way convolutional interleaver yield a scheme called Cross-Interleaved Reed-Solomon Coding (CIRC). The first element of a CIRC decoder is a relatively weak inner (32,28) Reed-Solomon code, shortened from a (255,251) code with 8-bit symbols. This code can correct up to 2 byte errors per 32-byte block. More importantly, it flags as erasures any uncorrectable blocks, i.e., blocks with more than 2 byte errors. The decoded 28-byte blocks, with erasure indications, are then spread by the deinterleaver to different blocks of the (28,24) outer code. Thanks to the deinterleaving, an erased 28-byte block from the inner code becomes a single erased byte in each of 28 outer code blocks. The outer code easily corrects this, since it can handle up to 4 such erasures per block.
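How the deinterleaver spreads an erased inner block can be sketched with a simplified model. It assumes that byte c of an outer codeword is delayed by c × 4 frames before inner encoding; the delay step of 4 and the function name are assumptions of this sketch, and the additional delay and scrambling stages of a real CIRC are ignored.

```python
from collections import Counter

def erasures_per_outer_codeword(erased_inner_frames, width=28, delay_step=4):
    """Count erased bytes per outer codeword after deinterleaving.

    Byte `column` of outer codeword f travels in inner frame f + column * delay_step,
    so an erased inner frame s hits outer codeword s - column * delay_step.
    """
    counts = Counter()
    for inner in erased_inner_frames:
        for column in range(width):
            counts[inner - column * delay_step] += 1
    return counts

single = erasures_per_outer_codeword([200])
print(len(single), max(single.values()))   # 28 1

burst = erasures_per_outer_codeword(range(200, 216))  # 16 consecutive frames
print(max(burst.values()))                 # 4
```

In this model one erased inner block costs each of 28 outer codewords a single byte, and even 16 consecutive erased blocks stay within the 4 erasures per block that the outer code can handle.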
The result is a CIRC that can completely correct error bursts up to 4000 bits, or about 2.5 mm on the disc surface. This code is so strong that most CD playback errors are almost certainly caused by tracking errors that cause the laser to jump track, not by uncorrectable error bursts.[citation needed]
Another product which incorporates Reed–Solomon coding is Nintendo's e-Reader. This is a video-game delivery system which uses a two-dimensional "barcode" printed on trading cards. The cards are scanned using a device which attaches to Nintendo's Game Boy Advance game system.
Reed-Solomon error correction is also used in Parchive files, which are commonly posted accompanying multimedia files on USENET.
Specialized forms of Reed-Solomon codes, specifically Cauchy-RS and Vandermonde-RS, can be used to overcome the unreliable nature of data transmission over erasure channels. The encoding process assumes an RS(N,K) code, which generates N codewords of length N symbols, each storing K symbols of data; these are then sent over an erasure channel.
Any combination of K codewords received at the other end is enough to reconstruct all of the N codewords. The code rate is generally set to 1/2 unless the channel's erasure likelihood can be adequately modeled and is seen to be lower. Consequently, N is usually 2K, meaning that at least half of all the codewords sent must be received in order to reconstruct all of the codewords sent.
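The "any K of N" property can be demonstrated with a Vandermonde-style sketch. It is an illustration under stated assumptions, not the scheme of any particular library: arithmetic is done in the prime field GF(257) rather than GF(2^8), each transmitted unit is modeled as a single (index, value) packet, and the function names are hypothetical.

```python
from itertools import combinations

P = 257  # prime field, assumed here so that ordinary modular arithmetic suffices

def vandermonde_row(x, k):
    return [pow(x, d, P) for d in range(k)]

def encode(data, n):
    """Packet x (for x = 1 .. n) carries the dot product of data with (1, x, x^2, ...)."""
    k = len(data)
    return [(x, sum(c * v for c, v in zip(data, vandermonde_row(x, k))) % P)
            for x in range(1, n + 1)]

def solve(matrix, rhs):
    """Gauss-Jordan elimination modulo P for a square, invertible system."""
    k = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse needs Python 3.8+
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[k] for row in aug]

def recover(packets, k):
    """Rebuild the K data symbols from any K received packets."""
    chosen = list(packets)[:k]
    matrix = [vandermonde_row(x, k) for x, _ in chosen]
    return solve(matrix, [y for _, y in chosen])

data = [10, 20, 30, 40]            # K = 4
packets = encode(data, 8)          # N = 2K = 8, i.e. code rate 1/2
print(all(recover(subset, 4) == data
          for subset in combinations(packets, 4)))  # True
```

Recovery works for every subset because a Vandermonde matrix built from distinct evaluation points is always invertible.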
Reed-Solomon codes are also used in xDSL systems and CCSDS's Space Communications Protocol Specifications as a form of Forward Error Correction.
Paper bar codes such as PostBar and MaxiCode use Reed–Solomon error correction to correct for encoding errors on paper.
One significant application of Reed–Solomon coding was to encode the digital pictures sent back by the Voyager space probe.
Voyager introduced Reed–Solomon coding in conjunction with ML convolutional codes, a practice that has since become very widespread in deep space and satellite (e.g., direct digital broadcasting) communications.
Viterbi decoders tend to produce errors in short bursts. Correcting these burst errors is a job best done by short or simplified Reed-Solomon codes.
Modern versions of concatenated Reed-Solomon/Viterbi-decoded convolutional coding were and are used on the Mars Pathfinder, Galileo, Mars Exploration Rover and Cassini missions, where they perform within about 1–1.5 dB of the ultimate limit imposed by the Shannon capacity.
These concatenated codes are now being replaced by more powerful turbo codes where the transmitted data does not need to be decoded immediately.