7.3.2 Static Web Documents
The basis of the Web is transferring
Web pages from server to client. In the simplest form, Web pages are static,
that is, they are just files sitting on some server waiting to be retrieved. In this
context, even a video is a static Web page because it is just a file. In this
section we will look at static Web pages in detail. In the next one, we will
examine dynamic content.
Web pages are currently written in a
language called HTML (HyperText Markup Language). HTML allows users to produce
Web pages that include text, graphics, and pointers to other Web pages. HTML is
a markup language, a language for describing how documents are to be formatted.
The term ''markup'' comes from the old days when copyeditors actually marked up
documents to tell the printer—in those days, a human being—which fonts to use,
and so on. Markup languages thus contain explicit commands for formatting. For
example, in HTML, <b> means start boldface mode, and </b> means leave boldface mode. The advantage of a markup
language over one with no explicit markup is that writing a browser for it is
straightforward: the browser simply has to understand the markup commands. TeX
and troff are other well-known examples of markup languages.
By embedding all the markup commands
within each HTML file and standardizing them, it becomes possible for any Web
browser to read and reformat any Web page. Being able to reformat Web pages
after receiving them is crucial because a page may have been produced in a 1600
x 1200 window with 24-bit color but may have to be displayed in a 640 x 320
window configured for 8-bit color.
Below we will give a brief
introduction to HTML, just to give an idea of what it is like. While it is
certainly possible to write HTML documents with any standard editor, and many
people do, it is also possible to use special HTML editors or word processors
that do most of the work (but correspondingly give the user less control over
all the details of the final result).
A Web page consists of a head and a
body, together enclosed by <html> and </html> tags (formatting commands), although most browsers do not
complain if these tags are missing. As can be seen from Fig. 7-26(a), the head is bracketed by the <head> and </head> tags and the body is bracketed by the <body> and </body> tags. The strings inside the tags are called directives.
Most HTML tags have this format, that is, they use <something> to mark the beginning of something and </something> to mark its end. Most browsers have a menu item VIEW SOURCE
or something like that. Selecting this item displays the current page's HTML
source, instead of its formatted output.
Tags can be in either lower case or
upper case. Thus, <head> and <HEAD> mean the same thing, but newer versions of the standard
require lower case only. Actual layout of the HTML document is irrelevant. HTML
parsers ignore extra spaces and carriage returns since they have to reformat
the text to make it fit the current display area. Consequently, white space can
be added at will to make HTML documents more readable, something most of them
are badly in need of. As another consequence, blank lines cannot be used to separate
paragraphs, as they are simply ignored. An explicit tag is required.
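The two claims above, that tag case does not matter and that layout white space is ignored, can be checked with a small sketch. It uses Python's standard html.parser module as a stand-in for a browser's parser; the class TagCollector, the function summarize, and the two sample pages are made up for this illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records tag names and the page text with white space collapsed."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names already converted to lower case.
        self.tags.append(tag)

    def handle_data(self, data):
        # Split on any run of spaces, tabs, or newlines, roughly what a
        # browser does before it reflows the text to fit the window.
        self.words.extend(data.split())


def summarize(source):
    """Return (list of tag names, text with single spaces between words)."""
    collector = TagCollector()
    collector.feed(source)
    collector.close()
    return collector.tags, " ".join(collector.words)


compact = "<head><title>Hi</title></head><body>one two three</body>"
spread_out = """
<HEAD>
    <TITLE>Hi</TITLE>
</HEAD>

<BODY>
    one
    two


    three
</BODY>
"""

# Both versions yield the same tags and the same text.
print(summarize(compact) == summarize(spread_out))
```

Note that the blank lines between ''two'' and ''three'' vanish along with the rest of the layout, which is why an explicit tag is needed to separate paragraphs.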
Some tags have (named) parameters,
called attributes. For example,
<img
src="abc" alt="foobar">
is a tag, <img>, with parameter src set equal to abc and parameter alt set
equal to foobar. For each tag, the HTML standard gives a list of what the
permitted parameters, if any, are, and what they mean. Because each parameter
is named, the order in which the parameters are given is not significant.
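As a minimal sketch of why the order of named parameters is not significant, the following Python fragment parses the <img> tag above twice, with its parameters in either order, and compares the results. It relies on the standard html.parser module; AttributeReader and read_tags are names invented for this example.

```python
from html.parser import HTMLParser


class AttributeReader(HTMLParser):
    """Collects the parameters of every start tag as a dictionary."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs in source order;
        # storing it as a dict keeps the names and drops the order.
        self.seen.append((tag, dict(attrs)))


def read_tags(source):
    """Return a list of (tag name, parameter dictionary) pairs."""
    reader = AttributeReader()
    reader.feed(source)
    reader.close()
    return reader.seen


first = read_tags('<img src="abc" alt="foobar">')
second = read_tags('<img alt="foobar" src="abc">')

# The two spellings describe the same tag.
print(first == second)
```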
Technically, HTML documents are
written in the ISO 8859-1 Latin-1 character set, but for users whose keyboards
support only ASCII, escape sequences are present for the special characters,
such as è. The list of special characters is given in the standard. All of them
begin with an ampersand and end with a semicolon. For example, &nbsp; produces a space, &egrave; produces
è and &eacute; produces é. Since <, >, and & have special
meanings, they can be expressed only with their escape sequences, &lt;,
&gt;, and &amp;, respectively.
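The escape mechanism can be tried out with Python's standard html module, which converts in both directions. This is only a sketch of the idea: the wrapper names to_html_text and from_html_text are invented here, and html.unescape recognizes many more named sequences than the few discussed in this section.

```python
import html


def to_html_text(text):
    """Replace the three special characters with their escape sequences."""
    return html.escape(text, quote=False)


def from_html_text(markup):
    """Turn escape sequences back into the characters they stand for."""
    return html.unescape(markup)


# The characters <, >, and & must be escaped to appear as ordinary text.
print(to_html_text("if a < b & b > c"))

# Named sequences start with an ampersand and end with a semicolon.
print(from_html_text("&egrave; and &eacute;"))
```

The sequence &nbsp; decodes to a nonbreaking space (character code 160), which looks like an ordinary space on the screen but is a different character.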
The main item in the head is the
title, delimited by <title> and </title>, but certain kinds of meta-information may also be present.
The title itself is not displayed on the page. Some browsers use it to label
the page's window.
Let us now take a look at some of
the other features illustrated in Fig. 7-26. All of the tags used in Fig. 7-26 and some others are shown in Fig. 7-27. Headings are generated by an <hn>
tag, where n is a digit in the range 1 to 6. Thus <h1> is the most important heading; <h6> is the least important one. It is up to the browser to
render these appropriately on the screen. Typically the lower numbered headings
will be displayed in a larger and heavier font. The browser may also choose to
use different colors for each level of heading. Typically <h1> headings are large and boldface with at least one blank
line above and below. In contrast, <h2> headings
are in a smaller font with less space above and below.
The tags <b> and <i> are used to enter boldface and italics mode, respectively.
If the browser is not capable of displaying boldface and italics, it must use
some other method of rendering them, for example, using a different color for
each or perhaps reverse video.
HTML provides various mechanisms for
making lists, including nested lists. Lists are started with <ul> or <ol>, with <li> used to mark the start of the items in both cases. The <ul> tag starts an unordered list. The individual items, which
are marked with the <li> tag in the source, appear with bullets (•) in front of
them. A variant of this mechanism is <ol>, which is
for ordered lists. When this tag is used, the <li> items are
numbered by the browser. Other than the use of different starting and ending
tags, <ul> and <ol> have the same syntax and similar results.
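To make the difference between the two kinds of lists concrete, here is a rough sketch of how a text-only browser might render them: bullets for <ul> items, numbers for <ol> items, and extra indentation for nested lists. It is written in Python with the standard html.parser module; ListRenderer, render, and the sample lists are hypothetical, and a real browser is free to choose a different presentation.

```python
from html.parser import HTMLParser


class ListRenderer(HTMLParser):
    """Renders <ul> and <ol> lists as indented lines of plain text."""

    def __init__(self):
        super().__init__()
        self.stack = []       # one [kind, items so far] entry per open list
        self.lines = []
        self.in_item = False

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.stack.append([tag, 0])
            self.in_item = False
        elif tag == "li" and self.stack:
            self.stack[-1][1] += 1
            kind, count = self.stack[-1]
            # Unordered items get a bullet; ordered items get a number.
            marker = "\u2022" if kind == "ul" else f"{count}."
            indent = "  " * (len(self.stack) - 1)
            self.lines.append(f"{indent}{marker} ")
            self.in_item = True

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.stack:
            self.stack.pop()
            self.in_item = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and self.in_item:
            self.lines[-1] += text


def render(source):
    """Return the rendered list as a list of text lines."""
    renderer = ListRenderer()
    renderer.feed(source)
    renderer.close()
    return renderer.lines


nested = "<ol><li>Fruit<ul><li>Apple<li>Pear</ul><li>Vegetables</ol>"
for line in render(nested):
    print(line)
```

The only change needed to switch a list from bullets to numbers is replacing <ul> ... </ul> with <ol> ... </ol>; the <li> items stay as they are.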
The <br>, <p>, and <hr> tags all indicate a boundary between sections of text. The
precise format can be determined by the style sheet (see below) associated with
the page. The <br> tag just forces a line break. Typically, browsers do not
insert a blank line after <br>. In contrast, <p> starts a
paragraph, which might, for example, insert a blank line and possibly some
indentation. (Theoretically, </p> exists to mark the end of a paragraph, but it is rarely
used; most HTML authors do not even know it exists.) Finally, <hr> forces a break and draws a horizontal line across the
screen.
HTML allows images to be included
in-line on a Web page. The <img> tag specifies that an image is to be displayed at the
current position in the page. It can have several parameters. The src parameter
gives the URL of the image. The HTML standard does not specify which graphic
formats are permitted. In practice, all browsers support GIF and JPEG files.
Browsers are free to support other formats, but this extension is a two-edged
sword. If a user is accustomed to a browser that supports, say, BMP files, he
may include these in his Web pages and later be surprised when other browsers
just ignore all of his wonderful art.
Other parameters of <img> are align, which controls the alignment of the image with
respect to the text baseline (top, middle, bottom), alt, which provides text to
use instead of the image when the user has disabled images, and ismap, a flag
indicating that the image is an active map (i.e., clickable picture).
Finally, we come to hyperlinks,
which use the <a> (anchor) and </a> tags.
Like <img>, <a> has various parameters, including href (the URL) and name
(the hyperlink's name). The text between the <a> and </a> is displayed. If it is selected, the hyperlink is followed
to a new page. It is also permitted to put an <img> image
there, in which case clicking on the image also activates the hyperlink.
As an example, consider the
following HTML fragment:
<a
href="http://www.nasa.gov"> NASA's home page </a>
When a page with this fragment is
displayed, what appears on the screen is
NASA's
home page
If the user subsequently clicks on
this text, the browser immediately fetches the page whose URL is http://www.nasa.gov
and displays it.
As a second example, now consider
<a
href="http://www.nasa.gov"> <img src="shuttle.gif"
alt="NASA"> </a>
When displayed, this page shows a
picture (e.g., of the space shuttle). Clicking on the picture switches to
NASA's home page, just as clicking on the underlined text did in the previous
example. If the user has disabled automatic image display, the text NASA will
be displayed where the picture belongs.
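The two NASA examples can be run through a small link extractor to see what a browser has to pull out of each hyperlink: the URL to follow and the label to show. The sketch below uses Python's standard html.parser module and assumes image display is disabled, so the alt text stands in for the picture; LinkExtractor and extract_links are names invented for this illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects a (URL, label) pair for every <a href=...> ... </a>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.href = None      # URL of the hyperlink being read, if any
        self.label = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.href = attributes["href"]
            self.label = []
        elif tag == "img" and self.href is not None:
            # Assumption: images are off, so alt replaces the picture.
            self.label.append(attributes.get("alt") or "")

    def handle_data(self, data):
        if self.href is not None:
            self.label.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            text = " ".join(" ".join(self.label).split())
            self.links.append((self.href, text))
            self.href = None


def extract_links(source):
    """Return the list of (URL, label) pairs found in source."""
    extractor = LinkExtractor()
    extractor.feed(source)
    extractor.close()
    return extractor.links


text_link = '<a href="http://www.nasa.gov"> NASA\'s home page </a>'
image_link = ('<a href="http://www.nasa.gov"> '
              '<img src="shuttle.gif" alt="NASA"> </a>')

print(extract_links(text_link))
print(extract_links(image_link))
```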
The <a> tag can
take a parameter name to plant a named anchor within a page, so that a hyperlink can point to
the middle of the page. For example, some Web pages start out with a clickable
table of contents. By clicking on an item in the table of contents, the user
jumps to the corresponding section of the page.
HTML keeps evolving. HTML 1.0 and
HTML 2.0 did not have tables, but they were added in HTML 3.0. An HTML table
consists of one or more rows, each consisting of one or more cells. Cells can
contain a wide range of material, including text, figures, icons, photographs,
and even other tables. Cells can be merged, so, for example, a heading can span
multiple columns. Page authors have limited control over the layout, including
alignment, border styles, and cell margins, but the browsers have the final say
in rendering tables.
An HTML table definition is listed
in Fig. 7-28(a) and a possible rendition is shown in
Fig. 7-28(b). This example just shows a few of
the basic features of HTML tables. Tables are started by the <table> tag. Additional information can be provided to describe
general properties of the table.
The <caption>
tag can be used to provide a figure caption. Each row begins with a <tr> (Table Row) tag. The individual cells are marked as <th> (Table Header) or <td> (Table
Data). The distinction is made to allow browsers to use different renditions
for them, as we have done in the example.
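The row-and-cell structure just described can be illustrated by reading a small table back into rows of cells. The table in the sketch below is a made-up example and not the one in Fig. 7-28; the parsing uses Python's standard html.parser module, and TableReader and read_table are hypothetical names.

```python
from html.parser import HTMLParser


class TableReader(HTMLParser):
    """Reads one HTML table into a caption and a list of rows."""

    def __init__(self):
        super().__init__()
        self.caption = ""
        self.rows = []        # each row is a list of (cell kind, text)
        self.current = None   # "caption", "th", "td", or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self.rows[-1].append((tag, ""))
            self.current = tag
        elif tag == "caption":
            self.current = "caption"

    def handle_endtag(self, tag):
        if tag in ("caption", "th", "td", "tr"):
            self.current = None

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return            # layout white space between tags
        if self.current == "caption":
            self.caption += text
        elif self.current in ("th", "td"):
            kind, old = self.rows[-1][-1]
            self.rows[-1][-1] = (kind, old + text)


def read_table(source):
    """Return (caption, rows) for the table in source."""
    reader = TableReader()
    reader.feed(source)
    reader.close()
    return reader.caption, reader.rows


sample = """
<table border="1">
<caption>Some sample data</caption>
<tr><th>Item</th><th>Price</th></tr>
<tr><td>Widget</td><td>10</td></tr>
</table>
"""

caption, rows = read_table(sample)
print(caption)
for row in rows:
    print(row)
```

Keeping the cell kind (th or td) next to the text is what lets a renderer display header cells differently from data cells.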
Numerous attributes are also allowed
in tables. They include ways to specify horizontal and vertical cell
alignments, justification within a cell, borders, grouping of cells, units, and
more.
In HTML 4.0, more new features were
added. These include accessibility features for handicapped users, object
embedding (a generalization of the <img> tag so
other objects can also be embedded in pages), support for scripting languages
(to allow dynamic content), and more.
When a Web site is complex,
consisting of many pages produced by multiple authors working for the same
company, it is often desirable to have a way to prevent different pages from
having a different appearance. This problem can be solved using style sheets.
When these are used, individual pages no longer use physical styles, such as
boldface and italics. Instead, page authors use logical styles such as <dfn> (definition), <em> (weak emphasis), <strong>
(strong emphasis), and <var> (program variables). The logical styles are defined in the
style sheet, which is referred to at the start of each page. In this way all
pages have the same style, and if the Webmaster decides to change <strong> from 14-point italics in blue to 18-point boldface in
shocking pink, all it requires is changing one definition to convert the entire
Web site. A style sheet can be compared to an #include file in a
C program: changing one macro definition there changes it in all the program
files that include the header.
HTML 1.0 was basically one-way.
Users could call up pages from information providers, but it was difficult to
send information back the other way. As more and more commercial organizations
began using the Web, there was a large demand for two-way traffic. For example,
many companies wanted to be able to take orders for products via their Web
pages, software vendors wanted to distribute software via the Web and have
customers fill out their registration cards electronically, and companies
offering Web searching wanted to have their customers be able to type in search
keywords.
These demands led to the inclusion
of forms starting in HTML 2.0. Forms contain boxes or buttons that allow users
to fill in information or make choices and then send the information back to
the page's owner. They use the <input> tag for this purpose. It has a
variety of parameters for determining the size, nature, and usage of the box displayed.
The most common forms are blank fields for accepting user text, boxes that can
be checked, active maps, and submit buttons. The example of Fig. 7-29 illustrates some of these choices.
Let us start our discussion of forms
by going over this example. Like all forms, this one is enclosed between the <form> and </form> tags. Text not enclosed in a tag is just displayed. All the
usual tags (e.g., <b>) are allowed in a form. Three kinds of input boxes are used
in this form.
The first kind of input box follows
the text ''Name''. The box is 46 characters wide and expects the user to type
in a string, which is then stored in the variable customer for later
processing. The <p> tag instructs the browser to display subsequent text and
boxes on the next line, even if there is room on the current line. By using <p> and other layout tags, the author of the page can control
the look of the form on the screen.
The next line of the form asks for
the user's street address, 40 columns wide, also on a line by itself. Then
comes a line asking for the city, state, and country. No <p> tags are used between the fields here, so the browser
displays them all on one line if they will fit. As far as the browser is
concerned, this paragraph just contains six items: three strings alternating
with three boxes. It displays them linearly from left to right, going over to a
new line whenever the current line cannot hold the next item. Thus, it is
conceivable that on a 1600 x 1200 screen all three strings and their
corresponding boxes will appear on the same line, but on a 1024 x 768 screen
they might be split over two lines. In the worst scenario, the word ''Country''
is at the end of one line and its box is at the beginning of the next line.
The next line asks for the credit
card number and expiration date. Transmitting credit card numbers over the
Internet should only be done when adequate security measures have been taken.
Following the expiration date we
encounter a new feature: radio buttons. These are used when a choice must be
made among two or more alternatives. The intellectual model here is a car radio
with half a dozen buttons for choosing stations. The browser displays these
boxes in a form that allows the user to select and deselect them by clicking on
them (or using the keyboard). Clicking on one of them turns off all the other
ones in the same group. The visual presentation is up to the browser. Widget
size also uses two radio buttons. The two groups are distinguished by their name
field, not by static scoping using something like <radiobutton> ... </radiobutton>.
The value parameters are used to
indicate which radio button was pushed. Depending on which of the credit card
options the user has chosen, the variable cc will be set to either the string
''mastercard'' or the string ''visacard''.
After the two sets of radio buttons,
we come to the shipping option, represented by a box of type checkbox. It can
be either on or off. Unlike radio buttons, where exactly one out of the set
must be chosen, each box of type checkbox can be on or off, independently of
all the others. For example, when ordering a pizza via Electropizza's Web page,
the user can choose sardines and onions and pineapple (if she can stand it),
but she cannot choose small and medium and large for the same pizza. The pizza
toppings would be represented by three separate boxes of type checkbox, whereas
the pizza size would be a set of radio buttons.
As an aside, for very long lists
from which a choice must be made, radio buttons are somewhat inconvenient.
Therefore, the <select> and </select> tags are provided to bracket a list of alternatives, but
with the semantics of radio buttons (unless the multiple parameter is given, in
which case the semantics are those of checkable boxes). Some browsers render
the items located between <select> and </select> as a drop-down menu.
We have now seen two of the built-in
types for the <input> tag: radio and checkbox. In fact, we have already
seen a third one as well: text. Because this type is the default, we did not
bother to include the parameter type = text, but we could have. Another type
is password. A password box is the same as a text box, except
that the characters are not displayed as they are typed. Multiline input is handled by a separate tag, <textarea>, whose box is
like a text box, except that it can contain multiple lines.
Getting back to the example of Fig. 7-29, we now come across an example of a submit
button. When this is clicked, the user information on the form is sent back to
the machine that provided the form. Like all the other types, submit is a
reserved word that the browser understands. The value string here is the label
on the button and is displayed. All boxes can have values; only here we needed
that feature. For text boxes, the contents of the value field are displayed
along with the form, but the user can edit or erase it. checkbox and radio
boxes can also be initialized, but with a field called checked (because value
just gives the text, but does not indicate a preferred choice).
When the user clicks the submit
button, the browser packages the collected information into a single long line
and sends it back to the server for processing. The & is used to separate
fields and + is used to represent space. For our example form, the line might
look like the contents of Fig. 7-30 (broken into three lines here because
the page is not wide enough):
The string would be sent back to the
server as one line, not three. If a checkbox is not selected, it is omitted
from the string. It is up to the server to make sense of this string.
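The packaging step can be sketched with Python's standard urllib.parse module, which applies the same conventions: & between fields, + for spaces, and nothing at all for a checkbox that is not selected. The field names customer and cc come from the discussion above; the other field names, all the values, and the helper encode_form are assumptions made for this illustration, so the result is not the string of Fig. 7-30.

```python
from urllib.parse import parse_qs, urlencode


def encode_form(text_fields, radio_choices, checkboxes):
    """Build the single line a browser sends when a form is submitted."""
    pairs = list(text_fields.items())
    pairs += list(radio_choices.items())
    # A selected checkbox with no value parameter is sent as name=on;
    # an unselected one is left out of the string entirely.
    pairs += [(name, "on") for name, checked in checkboxes.items() if checked]
    return urlencode(pairs)


line = encode_form(
    text_fields={"customer": "John Doe", "city": "White Plains"},
    radio_choices={"cc": "mastercard"},
    checkboxes={"express": True, "giftwrap": False},
)
print(line)

# On the server side, the string has to be taken apart again.
fields = parse_qs(line)
print(fields["customer"])
```

parse_qs returns a list for every field because the same name may legitimately occur more than once in the string, for example when several checkboxes share a name.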