UTF-8

From IndieWeb


UTF-8 is a way to encode Unicode characters using a variable number of bytes per character (one to four). This is known as a multi-byte encoding scheme. UTF-8 is the most widely used encoding scheme for HTML pages on the web.[1]
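To make the "variable number of bytes" point concrete, here is a minimal Python sketch that encodes a few sample characters and counts the resulting bytes. The sample characters and the `byte_lengths` name are chosen purely for illustration.

```python
# Sample characters, written as escapes so the source file's own encoding
# cannot affect the result: "A", e-acute, the euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Map each character to the number of bytes UTF-8 needs for it.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, length in byte_lengths.items():
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} takes {length} byte(s): {encoded.hex(' ')}")
```

Plain ASCII letters take one byte each, while accented letters, many symbols, and emoji take two, three, or four.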

Why

You should use UTF-8 in your text editor or whatever tools you use to create posts so you can more easily type and paste characters like curly quotes or names with accent marks and have them show up properly on your site.

How to

Using UTF-8

When writing your HTML, or code in a scripting language that generates the HTML (PHP, Python, etc.), set the encoding in your text editor to UTF-8. You then need to tell the browser, when it receives the HTML, that it is encoded as UTF-8. There are two ways of doing this. First, set the Content-Type HTTP response header, e.g.

Content-Type: text/html; charset=utf-8
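As one possible illustration, a Python script that generates HTML could send this header from a small WSGI application. This is a sketch rather than a recommended setup: the names `app` and `serve`, the port, and the sample markup are all assumptions made for the example.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Hypothetical WSGI app that serves one UTF-8 encoded HTML page."""
    html = "<!DOCTYPE html><title>Example</title><p>Caf\u00e9, \u201ccurly quotes\u201d</p>"
    # Encode the text explicitly so the bytes match the declared charset.
    body = html.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port=8000):
    """Run the app locally; call this by hand to try it in a browser."""
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Calling `serve()` and visiting `http://localhost:8000/` should show the accented and curly-quote characters intact, because the bytes sent and the charset declared in the header agree.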

Second, include the charset within the HTML document itself. The recommended way to do this in an HTML5 document is to use a meta tag early on like so:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    ...
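
If the HTML is generated by a script, the file should also be written out as UTF-8 so that the bytes match what the meta tag declares. A minimal Python sketch, assuming a hypothetical helper `write_html_page` and a throwaway file in the system temp directory:

```python
import html
import tempfile
from pathlib import Path


def write_html_page(path: Path, body_text: str) -> None:
    """Write a small HTML5 page to `path`, encoded as UTF-8."""
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "  <head>\n"
        '    <meta charset="UTF-8">\n'
        "    <title>Example</title>\n"
        "  </head>\n"
        f"  <body><p>{html.escape(body_text)}</p></body>\n"
        "</html>\n"
    )
    # Pass the encoding explicitly; the platform default may not be UTF-8.
    path.write_text(page, encoding="utf-8")


example_path = Path(tempfile.gettempdir()) / "utf8_example.html"
write_html_page(example_path, "\u201cCurly quotes\u201d and caf\u00e9")
raw = example_path.read_bytes()
```

Reading the file back as bytes (`raw`) and decoding it as UTF-8 returns the original text, including the curly quotes and the accented letter.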

Warning: if these charset values don’t match, the browser will prioritise the charset defined in the HTTP header over any charset defined within the document itself.
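
To see what such a mismatch looks like in practice, this short Python sketch decodes UTF-8 bytes as if they were windows-1252 (Python's `cp1252` codec), which is roughly what a browser does when the header names the wrong charset. The sample word and the choice of windows-1252 as the wrong charset are assumptions for illustration.

```python
# A word containing a curly apostrophe (U+2019), stored as UTF-8 bytes.
original = "don\u2019t"
utf8_bytes = original.encode("utf-8")

# Decoding with the wrong charset turns the 3-byte apostrophe into
# three unrelated characters.
misread = utf8_bytes.decode("cp1252")

# Decoding with the matching charset recovers the original text.
correct = utf8_bytes.decode("utf-8")

print(misread)  # donâ€™t
print(correct)  # don’t
```

Garbled sequences like this on a page are usually a sign that the declared charset and the actual file encoding disagree.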

References

See Also