Mojibake

From Wikipedia, the free encyclopedia

The Japanese Wikipedia article for mojibake with improper encoding.

Mojibake is the phenomenon of incorrect, unreadable characters (garbage characters) shown when computer software fails to render a text correctly according to its associated character encoding. It is a loanword from Japanese.

The Japanese word 文字化け (mojibake) is composed of 文字 (moji), which means letter, character, and 化け (bake), from the verb 化ける (bakeru), which means to appear in disguise, to take the form of, to change for the worse. Literally, it means "character changing".

Mojibake is often caused by the forced display of writing systems or character encodings that are "foreign" to the user's computer system. If a computer lacks the software required to process a foreign language's characters, it attempts to process them in its default encoding, which usually results in gibberish. Messages transferred between different encodings of the same language can have the same problem. Japanese-language users, with several different encodings historically in use, encountered it relatively often. An improperly configured or badly written web browser may fail to distinguish a page encoded in EUC-JP from one encoded in Shift-JIS if the encoding is not declared explicitly, either in the HTTP headers sent with the document or in the HTML document's meta tags, which substitute for the HTTP headers when the server cannot be configured to send the proper ones. A well-defined dictionary can usually avoid this problem.
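The role of an explicit declaration can be illustrated with a minimal Python sketch. The helper name declared_charset and its regular expression are assumptions made for this illustration, not how any particular browser works; the sketch only shows that the same EUC-JP bytes decode correctly when the declared label is honored and incorrectly when Shift-JIS is assumed.

```python
import re

# Hypothetical pattern: matches a charset label in a Content-Type header
# value or in an HTML meta tag.
_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def declared_charset(content_type, html_head):
    """Return the declared charset, preferring the HTTP header over the meta tag."""
    for source in (content_type, html_head):
        match = _CHARSET_RE.search(source or "")
        if match:
            return match.group(1).lower()
    return None


# The bytes of a page written in EUC-JP.
page_bytes = "文字化け".encode("euc_jp")

# With a declaration, the bytes decode to the intended text.
charset = declared_charset("text/html; charset=EUC-JP", "")
decoded_ok = page_bytes.decode(charset)

# Without one, a wrong guess (here Shift-JIS) produces garbage.
decoded_wrong = page_bytes.decode("shift_jis", errors="replace")

print(decoded_ok)
print(decoded_wrong)
```

The header is consulted first and the meta tag only as a fallback, mirroring the substitution described above.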

As an example, the intended word "文字化け", encoded in Shift-JIS, might be incorrectly displayed as "•¶Žš‰»‚¯" in software that is not correctly configured to handle Japanese and instead interprets the bytes as Windows-1252.
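The garbled string quoted above is what results when the Shift-JIS bytes of the word are read as Windows-1252. The following Python sketch reproduces it using the standard codecs shift_jis and cp1252; the variable names are chosen only for illustration.

```python
original = "文字化け"

# The bytes as a Japanese system using Shift-JIS would store the word.
sjis_bytes = original.encode("shift_jis")

# A program configured for Western European text reads them as Windows-1252.
mojibake = sjis_bytes.decode("cp1252")

# Reversing the two steps recovers the text, because no bytes were lost.
repaired = mojibake.encode("cp1252").decode("shift_jis")

print(mojibake)
print(repaired)
```

The repair succeeds here only because every byte of this word happens to map to a defined Windows-1252 character; a few byte values are undefined in that code page, so the round trip is not possible for arbitrary text.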

In the mid 1990s, as this problem became common, several websites featured mojibake not as a problem to be tackled but simply for amusement. Words and even sentences were "deciphered" with meanings made up to deliver funny messages.

Mojibake can also occur between systems that use the same font set. It often occurs between Windows and Macintosh users because the font set has the same name on both systems, but each system adds its own extra characters to it. Many people are unaware that these extra characters are system-specific and use them in websites, e-mails, blogs, and so on as if they were common characters; as a result, mojibake occurs even within the same font set.

In Chinese, this phenomenon is called luanma (Simplified Chinese: 乱码; Traditional Chinese: 亂碼; pinyin: luànmǎ), literally "haphazard code".

In Hebrew it is usually called jibrish (ג'יבריש)[citation needed]. This word is originally from Yiddish.

Users of Central and Eastern European languages can also be affected. Because most computers were not connected to any network during the mid-to-late 1980s, different character encodings were in use for every language with diacritical characters.

Handwritten krakozyabry corrected by a postal employee.

In Russian, mojibake is called krakozyabry (кракозя́бры). During the 1990s, several different encodings for the Cyrillic alphabet (Unix KOI8-R, Windows CP-1251, DOS 866, standard ISO 8859-5, and several others) competed. Badly configured servers and lack of compatibility made garbled text a common and frustrating experience. Many e-mail servers stripped the 8th bit from the characters as permitted by earlier standards (which renders UTF-8 unreadable, as well as all of the above). For this reason many Cyrillic users resorted to Volapuk encoding. An even more frustrating problem emerged in the early 2000s, when the popular e-mail client Microsoft Outlook started to replace correctly entered Cyrillic characters with question marks when replying to or forwarding messages created in competing encodings.
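The effect of the competing Cyrillic encodings can be sketched in Python, assuming the codec names koi8_r, cp1251, cp866 and iso8859_5 correspond to the four encodings listed above. The helper names cross_decode and strip_eighth_bit are hypothetical; the second one imitates, in a simplified way, a mail server that clears the 8th bit of every byte.

```python
# Python codec names assumed to match the encodings named in the text.
CYRILLIC_CODECS = ["koi8_r", "cp1251", "cp866", "iso8859_5"]


def cross_decode(text, codecs):
    """Encode text in each codec and decode the bytes in every other codec."""
    results = {}
    for src in codecs:
        raw = text.encode(src)
        for dst in codecs:
            if src != dst:
                results[(src, dst)] = raw.decode(dst, errors="replace")
    return results


def strip_eighth_bit(raw):
    """Clear the high bit of every byte, as a 7-bit mail relay would."""
    return bytes(b & 0x7F for b in raw)


word = "кракозябры"
garbled = cross_decode(word, CYRILLIC_CODECS)
stripped_utf8 = strip_eighth_bit(word.encode("utf-8"))

for (src, dst), shown in garbled.items():
    print(f"{src} read as {dst}: {shown}")
print("UTF-8 after 8th-bit stripping:", stripped_utf8.decode("ascii"))
```

Every mismatched pair yields a different garbled string, and the stripped UTF-8 bytes decode only as unrelated ASCII characters.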

In Bulgarian, mojibake is often called maymunitsa (маймуница), meaning monkey's alphabet.

In Poland, every company selling early DOS computers created its own encoding and simply reprogrammed the EPROMs of the video cards (typically CGA, EGA or Hercules) with the corresponding character shapes. Additionally, users of then-popular home computers (such as the Amiga and Atari ST) invented their own encodings, incompatible with international standards (ISO 8859-2), vendor standards (IBM CP852, Windows CP1250) and locally agreed-upon PC/MS-DOS standards (Mazovia). The situation began to improve when, after pressure from academic and user groups, ISO 8859-2 succeeded as the "Internet standard" with limited support from the dominant vendor's software. Because of the numerous problems caused by the variety of encodings, even today some users refer to Polish diacritical characters as krzaki ("bushes").
