
Thursday, June 17, 2010

HTML Document Structure

In this post, I will talk about HTML document structure. It is not an HTML tutorial, for which you can find many useful links online; instead, it gives you an overview of what a typical HTML document looks like. So let us dive straight in.

What is HTML

Hyper Text Markup Language, or HTML, is a markup language (not a programming language) used for creating web pages. One may ask what a markup language is. A markup language uses tags to identify the different parts of a document. To better understand this, think of a person reviewing a printed document and marking (underlining, highlighting) any spelling, grammatical or technical mistakes. At the end of the review, the document will probably carry several markups. The same concept applies to a markup language, where tags mark up the contents of a webpage.

An HTML document is made up of parts known as elements. For example, to make some text bold, we use the <b></b> element. Each element can be divided into three parts: start tag, content and end tag. A tag is a ‘markup’ delimited by < and >. The end tag has an additional ‘/’ after the <. In the above example, the start tag is <b> and the end tag is </b>. The content of the element sits between the start and end tags. Some elements (such as the line break, written <br> in HTML and <br /> in XHTML) have no content and therefore do not need an end tag.
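
The three parts of an element are easier to see in actual markup. The fragment below is only a small sketch; the text inside the elements is made up for illustration, and the line break is written in the XHTML form to match Listing 1.

```html
<!-- Start tag, content, end tag -->
<b>This text is bold</b>

<!-- Elements can be nested; the inner element is closed first -->
<p>A paragraph with <b>bold</b> text inside it.</p>

<!-- An empty element has no content, so it has no separate end tag.
     HTML 4.01 writes <br>; XHTML writes the self-closing form <br /> -->
<p>First line<br />Second line</p>
```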

Each element can also have attributes. An attribute represents a property of the element. For example, an input element has a type attribute whose value specifies whether it is a button, a text box and so on. Attribute values should be delimited by single or double quotes; XHTML requires the quotes, while HTML 4.01 allows them to be omitted only for simple values. Attributes are only included in the start tag. Attribute names are case-insensitive in HTML (XHTML requires them to be lowercase), but their values can be case-sensitive.
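
As a rough illustration of attributes, here is the input element mentioned above with different type values. The name and value attribute values (username, guest and so on) are placeholders I made up for the example.

```html
<!-- type="text" gives a text box; name and value are further attributes -->
<input type="text" name="username" value="guest" />

<!-- Same element, different type value: this one is a push button -->
<input type="button" value="Click me" />

<!-- Single quotes work too, which helps when the value contains double quotes -->
<input type='text' name='quote' value='She said "hi"' />
```

Note that the attributes appear only in the start tag, and that each value is quoted.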

The following listing shows what a typical HTML document looks like:


Listing 1

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Untitled Page</title>
</head>
<body>
<h1>HTML Document</h1>
<p>This post talks about an HTML document</p>
</body>
</html>



Now let us explore the above document in greater detail.


!DOCTYPE Declaration

Each HTML document begins with a Document Type Declaration, or DOCTYPE, as required by the HTML standard. The DOCTYPE identifies the version of HTML the document is written in. Validators use this information to check the document’s syntax, and web browsers use it to decide how to render the page.

The DOCTYPE associates the HTML document with a particular Document Type Definition, or DTD, which defines the set of rules for a markup language. Browsers generally do not download the DTD or validate the page against it. Instead, the browser’s layout engine performs “DOCTYPE switching”: it inspects the DOCTYPE and picks a rendering mode, typically standards mode for a complete, recognized DOCTYPE and quirks mode for a missing or outdated one. This way, the browser knows which rules to apply when rendering the web page. Following are the different DOCTYPE definitions supported by HTML 4.01:

HTML 4.01 Strict: This DTD includes the standard HTML elements and attributes but excludes presentational and deprecated elements such as font. It does not allow framesets. The definition takes the following form:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

HTML 4.01 Transitional: This DTD includes the standard HTML elements and attributes as well as presentational and deprecated elements such as font. Framesets are not allowed. The definition takes the following form:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

HTML 4.01 Frameset: This DTD is the same as HTML 4.01 Transitional, with the addition of frameset support. The definition takes the following form:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

The above are the standard DTD types supported by HTML 4.01. With the release of XHTML (HTML which conforms to XML standards; more on this in my following post), the following additional DTD types have been introduced:

XHTML 1.0 Strict: The XHTML counterpart of HTML 4.01 Strict. The definition takes the following form:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

XHTML 1.0 Transitional: The XHTML counterpart of HTML 4.01 Transitional. The definition takes the following form:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

XHTML 1.0 Frameset: The XHTML counterpart of HTML 4.01 Frameset. The definition takes the following form:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

XHTML 1.1: Similar to XHTML 1.0 Strict, but defined as a set of modules so that additional modules can be added. The definition takes the following form:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
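
If you are curious how a browser reacts to a given DOCTYPE, one way to check is the document.compatMode property, which browsers expose to scripts. The page below is only a sketch to experiment with; the title and the id "mode" are names I picked for the example. Try removing the DOCTYPE line and reloading to compare the result.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>DOCTYPE check</title>
</head>
<body>
<p id="mode"></p>
<script type="text/javascript">
  // document.compatMode is "CSS1Compat" when the page is rendered in
  // standards mode and "BackCompat" when it is rendered in quirks mode.
  var label = document.createTextNode("Rendering mode: " + document.compatMode);
  document.getElementById("mode").appendChild(label);
</script>
</body>
</html>
```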



HTML Element

The HTML element is the root element and acts as the parent container for all other elements. The other elements are contained within the start and end tags of the HTML element, as shown in Listing 1. The two child elements of the HTML element are discussed in the following paragraphs.

HEAD Element

The HEAD element holds a set of elements which provide page-related information. Because it comes first in the document, this section is the first part the browser reads. The HEAD section can contain, among others, the following elements:

TITLE: Provides a brief title for the webpage, such as ‘HTML Tutorial’ or ‘Welcome to my homepage’.
STYLE: This element defines style rules for the webpage. A stylesheet defines the visual layout of a webpage, including colors, margins, backgrounds etc.
META: The META element allows defining information about the document. It is like information about information: it describes the webpage rather than forming part of its visible contents. For example, this information can include the author of the page, the page language, the creation date etc. It takes the form of name/value pairs and is also used by search engines while indexing the webpage.
SCRIPT: This element allows defining scripts, such as JavaScript or VBScript, for the webpage.

Amongst the above, only the TITLE element is visible to the user (in the browser’s title bar). The rest are non-visual elements that carry information about the page. As mentioned above, the browser reads the HEAD section first; that is why you can see the title even before the page is rendered.
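
To put the four elements together, here is a rough sketch of a HEAD section written in the XHTML style of Listing 1. The author name, description, style rule and the greet function are all invented for illustration.

```html
<head>
<!-- TITLE: shown in the browser's title bar -->
<title>Welcome to my homepage</title>

<!-- META: name/value pairs describing the document -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="author" content="Jane Doe" />
<meta name="description" content="An overview of HTML document structure" />

<!-- STYLE: visual layout such as colors and margins -->
<style type="text/css">
  body { margin: 20px; background-color: #ffffff; color: #333333; }
</style>

<!-- SCRIPT: script code for the page -->
<script type="text/javascript">
  function greet() { alert("Hello"); }
</script>
</head>
```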


BODY Element

The BODY element defines the contents of a webpage. These contents may include headings, tables, paragraphs, images, hyperlinks etc. In a frameset document, the FRAMESET element takes the place of the BODY element. Framesets are used to divide the browser window into different portions (frames), each of which displays its own document.
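
As a small sketch of the kind of content listed above, the BODY below contains a heading, a paragraph with a hyperlink, an image and a table. The link address, image file name and table data are placeholders, not real resources.

```html
<body>
<h1>My Page</h1>

<!-- A paragraph containing a hyperlink -->
<p>Visit the <a href="http://www.example.com/">example site</a> for more.</p>

<!-- An image; alt provides a text alternative -->
<p><img src="photo.jpg" alt="A photo" /></p>

<!-- A simple table with a header row and one data row -->
<table border="1">
<tr><th>Name</th><th>Role</th></tr>
<tr><td>Alice</td><td>Author</td></tr>
</table>
</body>
```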
