Character encodings in HTML

HTML (Hypertext Markup Language) has been in use since 1991, but HTML 4.0 (December 1997) was the first standardized version where international characters were given reasonably complete treatment.

ASCII was one of the earliest character encoding standards (also called a character set). It defines 128 characters (codes 0 to 127): 95 printable characters and 33 control characters.

ASCII supports the digits (0-9), the English letters in upper and lower case (A-Z, a-z), and some special characters such as ! $ + - ( ) @ < > .
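As a rough illustration of the ASCII range, the following Python sketch checks whether a string stays within code points 0 to 127. The helper name is_ascii and the sample strings are made up for this example.

```python
# ASCII assigns code points 0 through 127, so every character fits in 7 bits.
def is_ascii(text: str) -> bool:
    """Return True if every character in text has a code point below 128."""
    return all(ord(ch) < 128 for ch in text)


sample = "A-Z 0-9 ! $ + - ( ) @ < >"
ascii_ok = is_ascii(sample)            # digits, letters, and punctuation are all ASCII
accented_ok = is_ascii("caf\u00e9")    # U+00E9 (e with acute accent) is outside ASCII

# Encoding to ASCII raises an error for characters the standard does not define.
try:
    "caf\u00e9".encode("ascii")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

print(ascii_ok, accented_ok, encode_failed)
```

Any text that fails this check needs either a larger character set or some form of escaping before it can be stored as ASCII.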

Windows-1252, often loosely called ANSI, was the original Windows character set. It is an 8-bit encoding and supports 256 different character codes.

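ISO-8859-1 was the default character set for HTML 4. It also supports 256 different character codes, but it differs from Windows-1252 in the range 0x80 to 0x9F, where ISO-8859-1 has control codes and Windows-1252 has printable characters.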
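The two 8-bit character sets agree on most codes but not all of them. A small Python sketch, using two arbitrarily chosen byte values, shows the same bytes decoding to different text depending on which character set is assumed:

```python
# Two raw bytes: 0x80 is where the character sets disagree, 0xE9 is where they agree.
raw = bytes([0x80, 0xE9])

as_cp1252 = raw.decode("cp1252")       # 0x80 -> U+20AC (euro sign), 0xE9 -> U+00E9
as_latin1 = raw.decode("iso-8859-1")   # 0x80 -> U+0080 (control code), 0xE9 -> U+00E9

# Show the resulting code points rather than the characters themselves.
print([hex(ord(ch)) for ch in as_cp1252])
print([hex(ord(ch)) for ch in as_latin1])
```

This is why a page must state which encoding it uses: the bytes alone do not say how they should be read.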

Because Windows-1252 and ISO-8859-1 were limited, the default character encoding was changed to UTF-8 in HTML5.

UTF-8, an encoding of Unicode, covers almost all of the characters and symbols in the world.
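UTF-8 reaches this coverage by using between one and four bytes per character. The Python sketch below, with four sample characters picked for illustration, counts the bytes each one needs:

```python
# Sample characters, written as escapes so the source file itself stays ASCII.
samples = [
    "A",            # U+0041, Latin capital letter A
    "\u00e9",       # U+00E9, e with acute accent
    "\u20ac",       # U+20AC, euro sign
    "\U0001F600",   # U+1F600, grinning face emoji
]

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

# ASCII characters keep their single-byte values, so ASCII text is already valid UTF-8.
ascii_compatible = "A".encode("utf-8") == "A".encode("ascii")

for ch, size in byte_lengths.items():
    print(f"U+{ord(ch):04X} takes {size} byte(s)")
```

Because plain ASCII text encodes to the same bytes in both, existing ASCII documents can be served as UTF-8 without conversion.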

When an HTML document includes special characters outside the range of seven-bit ASCII, two goals are worth considering: the integrity of the information and universal browser display.
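One way to serve both goals is to write non-ASCII characters as numeric character references, so the document itself contains only ASCII bytes. A minimal Python sketch, with a made-up sample sentence, might look like this:

```python
import html

# Sample text containing two non-ASCII characters (U+00E9 and U+20AC).
text = "Caf\u00e9 costs 5 \u20ac"

# The "xmlcharrefreplace" error handler swaps each unencodable character
# for a decimal numeric character reference such as &#233;.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")

# html.unescape resolves the references again, recovering the original text.
round_trip = html.unescape(ascii_bytes.decode("ascii"))

print(ascii_bytes)
print(round_trip == text)
```

The escaped form survives any ASCII-compatible transport, at the cost of being harder to read in the source.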

In HTML5: Unicode UTF-8

Because the character sets listed above are limited and not compatible with one another in multilingual environments, the Unicode Consortium developed the Unicode Standard.

The Unicode Standard covers (almost) all the characters, punctuation marks, and symbols in the world.

Unicode enables processing, storage, and transport of text, independent of platform and language.

The default character encoding in HTML5 is UTF-8.
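Even with UTF-8 as the default, pages usually declare the encoding explicitly with a meta element, and the HTML specification expects that declaration to fall within the first 1024 bytes of the file. The Python sketch below builds a small page (its contents are invented for this example), encodes it as UTF-8, and checks where the declaration lands:

```python
# A hypothetical minimal page; the title and body text are placeholders.
page = (
    "<!DOCTYPE html>\n"
    '<html lang="en">\n'
    "<head>\n"
    '<meta charset="utf-8">\n'
    "<title>Caf\u00e9</title>\n"
    "</head>\n"
    "<body><p>Price: 5 \u20ac</p></body>\n"
    "</html>\n"
)

# The bytes that would be written to disk or sent over the network.
payload = page.encode("utf-8")

# Locate the declaration and confirm it ends within the first 1024 bytes.
declaration = b'<meta charset="utf-8">'
offset = payload.find(declaration)
declared_early = offset >= 0 and offset + len(declaration) <= 1024

# Decoding with the declared encoding recovers the original text.
decoded = payload.decode("utf-8")

print(offset, declared_early, decoded == page)
```

The declared encoding has to match the encoding actually used to save the file; the meta element only labels the bytes and does not convert them.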

