Bi-directional text

From Wikipedia, the free encyclopedia

Jump to: navigation, search
Unicode
Encodings
UCS
Mapping
Bi-directional text
BOM
Han unification
Unicode and HTML
Unicode and E-mail
Unicode typefaces

Some writing systems of the world, notably the Arabic (including variants such as Perso-Arabic or Nasta'liq) and Hebrew scripts, are written in a form known as right-to-left (RTL), in which writing begins at the right-hand side of a page and concludes at the left-hand side. This is different from the left-to-right (LTR) direction in which languages using the Latin alphabet (such as English) are written. When LTR text is mixed with RTL in the same paragraph, each type of text should be written in its own direction, which is known as bi-directional text. This can get rather complex when multiple levels of quotation are used.

Many computer programs fail to display bi-directional text correctly. For example, the Hebrew name Sarah (שרה) should be spelled shin (ש) resh (ר) heh (ה) from right to left. Some Web browsers may display the Hebrew text in this article in the opposite direction.

There are very few scripts that can be written in either direction. Such was the case with Egyptian hieroglyphics, where the signs had a distinct "head" that faced the beginning of a line and "tail" that faced the end. Chinese can also be written in either direction, especially in signs (but the orientation of the individual characters is never changed).

Another variety of writing style, called boustrophedon, was used in some ancient Greek inscriptions, Tuareg, and Hungarian runes. This method of writing alternates direction on each successive line.

Bidirectional script support is the capability of a computer system to correctly display bi-directional text. The term is often shortened to the jargon term BiDi or bidi.

Early computer installations were designed only to support a single writing system, typically for left-to-right scripts based on the Latin alphabet only. Adding new character sets and character encodings enabled a number of other left-to-right scripts to be supported, but did not easily support right-to-left scripts such as Arabic or Hebrew, and mixing the two was not practical. It is possible to simply flip the left-to-right display order to a right-to-left display order, but doing this sacrifices the ability to correctly display left-to-right scripts. With bidirectional script support, it is possible to mix scripts from different scripts on the same page, regardless of writing direction.

In particular, the Unicode standard provides foundations for complete BiDi support, with detailed rules as to how mixtures of left-to-right and right-to-left scripts are to be encoded and displayed.

In Unicode encoding, all non-punctuation characters are stored in writing order. This means that the writing direction of characters is stored within the characters. If this is the case, the character is called "strong". Punctuation characters however, can appear in both LTR and RTL languages. They are called "weak" characters because they do not contain any directional information. So it is up to the software to decide in which direction these "weak" characters will be placed. Sometimes (in mixed-directions text) this leads to display errors, caused by the bidi-algorithm that runs through the text and identifies LTR and RTL strong characters and assigns a direction to weak characters, according to the algorithm's rules.

In the algorithm, each sequence of concatenated strong characters is called a "run". A weak character that is located between two strong characters with the same orientation will inherit their orientation. A weak character that is located between two strong characters with a different writing direction, will inherit the main context's writing direction (in a LTR document the character will become LTR, in an RTL document, it will become RTL). If a "weak" character is followed by another "weak" character, the algorithm will look at the first neighbouring "strong" character. Sometimes this leads to unintentional display errors. To correct or prevent these errors, you can use "pseudo-strong" characters. These Unicode control characters are called "marks". The mark (\u200E LTR or \u200F RTL - Alt+8206 and Alt+8207 respectively) is to be inserted into a location to make an enclosed weak character inherit its writing direction.

For example, to have the trademark symbol (\u8482) for an English name brand (LTR) in an Arabic (RTL) passage display correctly, you need to add a LTR mark after the trademark symbol if the symbol is not followed by LTR text. This is because if you do not add the LTR mark, the weak character \u8482 will be neighboured by a strong LTR character and a strong RTL character. Hence, in an RTL context, it will be considered to be RTL, and displayed in an incorrect order.

Advanced Search
Included Web Search Engines


Safe Search

close

Top Matching Results

Occasionally Search.com will highlight specialized results that are based on the context of your query. Examples of specialized results include specific links to news, images, or video.

Top Matching Results may highlight information from other Search.com pages, content from the CNET Network of sites, or third party content. The listings are based purely on relevance. Search.com does not receive payment for listings in this section but our partners that provide this data may get paid for listing these products.

Sponsored Links

This section contains paid listings which have been purchased by companies that want to have their sites appear for specific search terms and related content. These listings are administered, sorted and maintained by a third party and are not endorsed by Search.com.

Search Results

Search.com sends your search query to several search engines at one time and integrates the results into one list which has been sorted by relevance using Search.com's proprietary algorithm. You can customize the list of search engines included in your metasearch from the preferences.

The search engines that are used in your metasearch may allow companies to pay to have their Web sites included within the results. To view the Paid Inclusion policy for a specific search engine, please visit their Web site. Search.com does not accept payment or share revenue with any search engine partner for listings in this section.