“Emojibake” considered harmful

Fredrick Brennan
Sep 5, 2021

Strictly defined, mojibake (Japanese: 文字化け) refers to the garbled characters displayed when a document written in one character encoding is viewed as if it were in another. You can easily create mojibake on the console:

$ echo うれしいこと 悲しいことも 全部まるめて | iconv -f latin1 -t utf-8  | (tr -d '[[\000-\037][\177-\237]]'; echo)
ããããã㨠æ²ãããã¨ã å¨é¨ã¾ããã¦
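The same mistake can be reproduced in Python, which makes the mechanism explicit: the text is encoded as UTF-8, and the resulting bytes are then decoded as Latin-1. This is a minimal sketch; the helper names `make_mojibake` and `strip_controls` are illustrative and do not come from any library. Unlike `tr`, which deletes raw bytes, `strip_controls` filters whole characters by their Unicode general category.

```python
import unicodedata

def make_mojibake(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")

def strip_controls(text: str) -> str:
    """Drop C0/C1 control characters (category Cc), roughly what the tr step does."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")

original = "うれしいこと 悲しいことも 全部まるめて"
garbled = make_mojibake(original)
print(strip_controls(garbled))

# Nothing is lost until control characters are stripped: undoing the mistake
# (Latin-1 back to bytes, then decode as UTF-8) recovers the original text.
assert garbled.encode("latin-1").decode("utf-8") == original
```

Because Latin-1 assigns a character to every one of the 256 possible byte values, the wrong decode never raises an error, which is why this kind of mojibake goes unnoticed until a human reads it.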

In common usage, though, mojibake means any occurrence of tofu (the blank boxes shown when no available font has a glyph for a character), garbled characters, or question marks, regardless of the technical cause.

Mojibake used to be incredibly common online, but has luckily mostly gone away with the widespread adoption of Unicode.

The 文字化け (mojibake) article on Wikipedia (© Wikimedia Foundation 2011, CC 3.0, via Commons)

I am proposing a new term, emojibake (絵文字化け), for a 21st-century form of mojibake. It occurs when a large Silicon Valley tech company decides to end-run the Unicode character proposal process and starts letting its users type characters that only display properly in its proprietary font, which places a glyph (or glyphs) in one of Unicode's Private Use Areas.

An example is Twitter’s Chirp font, which maps U+EA00 to a glyph best described as TWITTER LOGO:

Space-separated hex codepoint values: 46 72 65 64 72 69 63 6B 20 42 72 65 6E 6E 61 6E 20 EA00
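Decoding those values shows what actually gets stored and sent: the author's name, a space, and U+EA00, a code point whose Unicode general category is `Co` (private use) and which has no standard character name. A short Python sketch; `decode_codepoints` is a hypothetical helper written for this illustration.

```python
import unicodedata

# The space-separated hex code points quoted above.
HEX_CODEPOINTS = "46 72 65 64 72 69 63 6B 20 42 72 65 6E 6E 61 6E 20 EA00"

def decode_codepoints(hex_values: str) -> str:
    """Turn space-separated hexadecimal code points into a Python string."""
    return "".join(chr(int(value, 16)) for value in hex_values.split())

text = decode_codepoints(HEX_CODEPOINTS)
last = text[-1]
print(repr(text))
# Private-use code points have category "Co" and no name in the Unicode database.
print(f"U+{ord(last):04X}", unicodedata.category(last), unicodedata.name(last, "<no name>"))
# U+EA00 Co <no name>
```

Any system without the Chirp font has nothing meaningful to show for that last code point, which is exactly the interchange problem described below.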

This will lead to users seeing  all over the place. How it appears depends on your browser; on mine (Firefox under Arch Linux, FreeType 2.10.4), it looks like this:


This is bad for text interchange, bad for users, bad for the internet, bad for font authors, bad for everybody except Twitter.

Emojibake should be avoided. Fonts used to render plaintext submitted by users should not make use of the Private Use Areas at all. This is not their intended purpose.

The Private Use Areas are intended to let certain fonts support unencoded characters in documents that will never be interchanged as plain text. Social media posts do not qualify.
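A platform that wanted to avoid emojibake could, for example, flag private-use code points in user-submitted text before accepting it. The sketch below is one hypothetical way to do that; `find_private_use` and `is_private_use` are illustrative helpers, not part of any platform's API. It lists Unicode's three Private Use Areas explicitly; checking `unicodedata.category(ch) == "Co"` covers the same ranges.

```python
# Unicode's three Private Use Areas, as inclusive code point ranges.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area in the Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)

def is_private_use(ch: str) -> bool:
    """True if the single character ch lies in any Private Use Area."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)

def find_private_use(text: str) -> list[tuple[int, str]]:
    """Return (index, "U+XXXX") for every private-use code point in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if is_private_use(ch)]

print(find_private_use("Fredrick Brennan \uea00"))  # [(17, 'U+EA00')]
```

Whether to reject such text, strip the characters, or merely warn is a policy choice; the point is that private-use code points are easy to detect before they leak into plain-text interchange.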

Update 2021-09-08: You can download a font I made, TwitterEA00, which has only a single useful glyph, uea00, containing the Twitter logo.