Wednesday, August 6, 2008

Character Formatting

HTML has two types of styles for individual words or sentences: logical and physical. Logical styles tag text according to its meaning, while physical styles indicate the specific appearance of a section. For example, in the preceding sentence, the words "logical styles" were tagged as "emphasis." The same effect (formatting those words in italics) could have been achieved via a different tag that tells your browser to "put these words in italics."
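
As a small illustration of that difference, the fragment below marks the same words first with the emphasis tag <EM> and then with the italic tag <I>. It is only a sketch; the sentence is taken from the paragraph above, and most browsers would be expected to show both versions in italics.

```html
<!-- Logical style: says the words are emphasized; the browser picks the look -->
<P><EM>Logical styles</EM> tag text according to its meaning.</P>

<!-- Physical style: asks directly for italics -->
<P><I>Logical styles</I> tag text according to its meaning.</P>
```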

Logical Versus Physical Styles

If physical and logical styles produce the same result on the screen, why are there both?

In the ideal SGML universe, content is divorced from presentation. Thus SGML tags a level-one heading as a level-one heading, but does not specify that the level-one heading should be, for instance, 24-point bold Times centered. The advantage of this approach (it's similar in concept to style sheets in many word processors) is that if you decide to change level-one headings to be 20-point left-justified Helvetica, all you have to do is change the definition of the level-one heading in your Web browser. Indeed, many browsers today let you define how you want the various HTML tags rendered on-screen using what are called cascading style sheets, or CSS. CSS is more advanced than HTML, though, and will not be covered in this Primer. (You can learn more about CSS at the World Wide Web Consortium CSS site.)

Another advantage of logical tags is that they help enforce consistency in your documents. It's easier to tag something as <H1> than to remember that level-one headings are 24-point bold Times centered or whatever. For example, consider the <STRONG> tag. Most browsers render it in bold text. However, it is possible that a reader would prefer that these sections be displayed in red instead. (This is possible using a local cascading style sheet on the reader's own computer.) Logical styles offer this flexibility.
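
A rule of the kind a reader might put in a local style sheet is sketched below, using <STRONG> as the logical tag. The page itself is hypothetical, and the STYLE block inside it only stands in for the reader's own style sheet, which would normally live in a separate file configured in the browser.

```html
<!-- Hypothetical page; the STYLE block stands in for a reader's local style sheet -->
<HTML>
<HEAD>
<TITLE>Logical style demo</TITLE>
<STYLE>
  /* Show strong emphasis in red instead of the usual bold */
  STRONG { color: red; font-weight: normal; }
</STYLE>
</HEAD>
<BODY>
<P><STRONG>NOTE:</STRONG> Always check your links.</P>
</BODY>
</HTML>
```

Because the rule names the logical tag, every strongly emphasized passage changes at once, and the document itself does not have to be edited.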

Of course, if you want something to be displayed in italics (for example) and do not want a browser's setting to display it differently, you should use physical styles. Physical styles, therefore, offer consistency in that something you tag a certain way will always be displayed that way for readers of your document.

Try to be consistent about which type of style you use. If you tag with physical styles, do so throughout a document. If you use logical styles, stick with them within a document. Keep in mind that future releases of HTML might not support certain logical styles, which could mean that browsers will not display your logical-style coding. (For example, the <DFN> tag, short for "definition" and typically displayed in italics, is not widely supported and will be ignored if the reader's browser does not understand it.)

Logical Styles

<DFN> for a word being defined. Typically displayed in italics. (NCSA Mosaic is a World Wide Web browser.)

<EM> for emphasis. Typically displayed in italics. (Consultants cannot reset your password unless you call the help line.)

<CITE> for titles of books, films, etc. Typically displayed in italics. (A Beginner's Guide to HTML)

<CODE> for computer code. Displayed in a fixed-width font. (The <stdio.h> header file)

<KBD> for user keyboard entry. Typically displayed in plain fixed-width font. (Enter passwd to change your password.)

<SAMP> for a sequence of literal characters. Displayed in a fixed-width font. (Segmentation fault: Core dumped.)

<STRONG> for strong emphasis. Typically displayed in bold. (NOTE: Always check your links.)

<VAR> for a variable, where you will replace the variable with specific information. Typically displayed in italics. (rm filename deletes the file.)
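
The sample sentences in the list above could be marked up roughly as follows. This is a sketch: the original markup is not preserved here, so which words carry each tag is an assumption, and the header name stdio.h is a placeholder.

```html
<P><DFN>NCSA Mosaic</DFN> is a World Wide Web browser.</P>
<P>Consultants <EM>cannot</EM> reset your password unless you call the help line.</P>
<P><CITE>A Beginner's Guide to HTML</CITE></P>
<!-- The angle brackets around the header name must be escaped -->
<P>The <CODE>&lt;stdio.h&gt;</CODE> header file</P>
<P>Enter <KBD>passwd</KBD> to change your password.</P>
<P><SAMP>Segmentation fault: Core dumped.</SAMP></P>
<P><STRONG>NOTE:</STRONG> Always check your links.</P>
<P><CODE>rm <VAR>filename</VAR></CODE> deletes the file.</P>
```
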

Physical Styles

<B> bold text

<I> italic text

<TT> typewriter text, i.e. fixed-width font.

Escape Sequences (a.k.a. Character Entities)

Character entities have two functions:

escaping special characters
displaying other characters not available in the plain ASCII character set (primarily characters with diacritical marks)

Three ASCII characters, the left angle bracket (<), the right angle bracket (>), and the ampersand (&), have special meanings in HTML and therefore cannot be used "as is" in text. (The angle brackets are used to indicate the beginning and end of HTML tags, and the ampersand is used to indicate the beginning of an escape sequence.) Double quote marks may be used as-is, but a character entity (&quot;) may also be used.

To use one of the three characters in an HTML document, you must enter its escape sequence instead:

&lt; is the escape sequence for <
&gt; is the escape sequence for >
&amp; is the escape sequence for &
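
A short fragment shows the three sequences &lt;, &gt;, and &amp; in use. The sample text is made up for illustration, and the expected on-screen result is given in the comments.

```html
<!-- Displays as: AT&T, and 3 < 5 > 1 -->
<P>AT&amp;T, and 3 &lt; 5 &gt; 1</P>

<!-- Displays as: Write <EM> to start emphasis. -->
<P>Write &lt;EM&gt; to start emphasis.</P>

<!-- To show an escape sequence itself, escape its ampersand. Displays as: &lt; -->
<P>&amp;lt;</P>
```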

Additional escape sequences support accented characters, such as:

&ouml; is a lowercase o with an umlaut: ö
&ntilde; is a lowercase n with a tilde: ñ
&Egrave; is an uppercase E with a grave accent: È

You can substitute other letters for the o, n, and E shown above. Visit the World Wide Web Consortium for a complete list of special characters.

NOTE: Unlike the rest of HTML, the escape sequences are case sensitive. You cannot, for instance, use &LT; instead of &lt;.
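
The fragment below sketches the accented-character sequences in use and shows why case matters: the capital or lowercase letter in the name selects the capital or lowercase character. The sample words are chosen only for illustration.

```html
<!-- Displays as: Köln and España -->
<P>K&ouml;ln and Espa&ntilde;a</P>

<!-- Substituting another letter: &uuml; is a lowercase u with an umlaut (ü) -->
<P>M&uuml;nchen</P>

<!-- Case selects the character: &Egrave; is È, &egrave; is è -->
<P>&Egrave; and &egrave;</P>
```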
