Different Worlds: An Introduction to HTML

Languages & Special Characters

The LANG attribute

Not everyone uses English as their first language, and even on a page where English is the primary language, there are times when you might want to quote something in another language. It's helpful to indicate which language is being used, either for the page as a whole or for a specific section. This matters particularly for users (e.g. blind and visually impaired people) who use text-to-speech software to access your pages, since the software needs to know which language's pronunciation to apply.

You can do this using the LANG attribute. Its value is a two-letter code indicating the language being used. Some common values for this attribute are:

  • English = en
  • French = fr
  • German = de
  • Italian = it
  • Dutch = nl
  • Greek = el
  • Spanish = es
  • Portuguese = pt
  • Arabic = ar
  • Hebrew = he
  • Russian = ru
  • Chinese = zh
  • Japanese = ja
  • Hindi = hi
  • Urdu = ur
  • Sanskrit = sa

To indicate the language for the whole page, add this attribute to the opening HTML tag. For example, this page is in English, so the HTML tag has the attribute LANG="en", like this:

<html lang="en">
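For context, a complete page with the attribute in place might look like the minimal sketch below. It assumes the HTML 4.01 Strict DOCTYPE; the title and paragraph text are only placeholders.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<title>An English page</title>
</head>
<body>
<!-- Every element inside HTML inherits lang="en" unless it sets its own LANG -->
<p>This page is written in English.</p>
</body>
</html>
```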

If you want to quote something in a language other than the one used in the rest of your page, you can add the LANG attribute to the appropriate element (P, TD, BLOCKQUOTE, etc.). For example, if I wanted to quote a paragraph in French, I might add the LANG attribute to the opening BLOCKQUOTE tag, like this:

<blockquote lang="fr">
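The quotation itself and the closing tag follow that opening tag. A fuller sketch is shown below, using a well-known French sentence as placeholder text. In HTML 4.01 Strict a BLOCKQUOTE may only contain block-level elements, so the quotation is wrapped in a P.

```html
<p>Descartes summed up his philosophy in a single sentence:</p>
<!-- lang="fr" tells speech software to read the quotation as French -->
<blockquote lang="fr">
<p>Je pense, donc je suis.</p>
</blockquote>
<!-- After the closing tag, the page's own language applies again -->
<p>This sentence is back in English.</p>
```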

Using the SPAN and DIV elements

Since one of the purposes of this attribute is to help speech software pronounce words correctly, it is also useful for a word that derives from another language and is usually pronounced as it would be in that language. For example, in English we sometimes use the French word "voila" (e.g. "And 'Voila!', it's done!"). When coding this, no existing element encloses just the word "voila", so there is no obvious tag to add the LANG attribute to. In cases like this, we can use the SPAN element. Its purpose is simply to mark out part of a larger element (e.g. a paragraph) so that some formatting or other code can be applied to it. Using the sentence above as an example, we can indicate that the word "voila" is French like this:

<p>And "<span lang="fr">Voila!</span>", it's done!</p>

The SPAN element is an "inline" element, so it is used within the text of a block-level element such as P. To enclose one or more block-level elements in a similar way, we use the DIV element. When you view the source code of other people's pages, you will often see DIV used to align several consecutive block-level elements (the ALIGN attribute is deprecated in HTML 4.01 in favour of style sheets, but it is still widely used), for example:

<div align="center">
.....
</div>
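DIV is also handy with the LANG attribute itself: a single LANG on a DIV covers every element inside it, so you don't have to repeat it on each paragraph. A minimal sketch, with placeholder German text:

```html
<!-- One LANG attribute on the DIV applies to every element inside it -->
<div lang="de">
<h2>Willkommen</h2>
<p>Diese Seite ist auf Deutsch.</p>
<!-- A nested element can override the DIV's language with its own LANG -->
<p lang="en">This paragraph is in English.</p>
</div>
```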

Special characters

If you want your page to display characters other than the standard letters (A to Z and a to z), digits (0 to 9) and simple punctuation, you need to use special character codes or "entities". For example, the word "fiancée" contains the French "e acute". Similarly, to display examples of HTML code you need to be able to show the "<" and ">" symbols, but if you simply type those in directly, the user's browser will try to interpret them as code rather than displaying them.

A character code consists of an ampersand and a hash sign ( &# ) followed by a number that identifies the character you want, and ends with a semicolon ( ; ). A character entity is similar, but uses a unique alphabetic name instead of the hash sign and number. A full list of character and entity codes can be found in the HTML 4 specification at http://www.w3.org/TR/html.
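As an illustration, both paragraphs below should display the same word, "fiancée". The first uses the numeric code 233 (the number assigned to é in ISO 8859-1 and Unicode); the second uses the named entity.

```html
<!-- Character code: &# + number + ; -->
<p>fianc&#233;e</p>
<!-- Character entity: & + name + ; -->
<p>fianc&eacute;e</p>
```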

Some useful character entities that you should know about are:

  • < is &lt;
  • > is &gt;
  • & is &amp;
  • " is &quot;
  • é (e acute) is &eacute;
  • A non-breaking space is &nbsp; (useful if you don't want text to "break" in the middle of a name, for example)
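Putting a few of these together, the sketch below shows how you might display a snippet of HTML code and keep a name on one line. The sentences are only placeholders; the comments show what the browser should display.

```html
<!-- Displays: Use <p> to start a paragraph & </p> to end it. -->
<p>Use &lt;p&gt; to start a paragraph &amp; &lt;/p&gt; to end it.</p>
<!-- Displays: Attribute values go in quotes, e.g. lang="fr" -->
<p>Attribute values go in quotes, e.g. lang=&quot;fr&quot;</p>
<!-- &nbsp; keeps "Donna" and "Smillie" together on the same line -->
<p>Written by Donna&nbsp;Smillie.</p>
```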

To display the code for a character entity rather than the character itself (as I've done above), you have to use "&amp;" for the opening ampersand, followed by the rest of the entity code, like this:

&amp;eacute;

The browser will then display it as "&eacute;".

http://www.users.zetnet.co.uk/dms/htmlguide/html5.html
© 1998-2001 Donna Smillie <dms@zetnet.co.uk>