
Wednesday, September 7, 2016

Static Web Documents



7.3.2 Static Web Documents

The basis of the Web is transferring Web pages from server to client. In the simplest form, Web pages are static, that is, are just files sitting on some server waiting to be retrieved. In this context, even a video is a static Web page because it is just a file. In this section we will look at static Web pages in detail. In the next one, we will examine dynamic content.
HTML—The HyperText Markup Language
Web pages are currently written in a language called HTML (HyperText Markup Language). HTML allows users to produce Web pages that include text, graphics, and pointers to other Web pages. HTML is a markup language, a language for describing how documents are to be formatted. The term "markup" comes from the old days when copyeditors actually marked up documents to tell the printer (in those days, a human being) which fonts to use, and so on. Markup languages thus contain explicit commands for formatting. For example, in HTML, <b> means start boldface mode, and </b> means leave boldface mode. The advantage of a markup language over one with no explicit markup is that writing a browser for it is straightforward: the browser simply has to understand the markup commands. TeX and troff are other well-known examples of markup languages.
By embedding all the markup commands within each HTML file and standardizing them, it becomes possible for any Web browser to read and reformat any Web page. Being able to reformat Web pages after receiving them is crucial because a page may have been produced in a 1600 x 1200 window with 24-bit color but may have to be displayed in a 640 x 320 window configured for 8-bit color.
Below we will give a brief introduction to HTML, just to give an idea of what it is like. While it is certainly possible to write HTML documents with any standard editor, and many people do, it is also possible to use special HTML editors or word processors that do most of the work (but correspondingly give the user less control over all the details of the final result).
A Web page consists of a head and a body, together enclosed by <html> and </html> tags (formatting commands), although most browsers do not complain if these tags are missing. As can be seen from Fig. 7-26(a), the head is bracketed by the <head> and </head> tags and the body is bracketed by the <body> and </body> tags. The strings inside the tags are called directives. Most HTML tags have this format; that is, they use <something> to mark the beginning of something and </something> to mark its end. Most browsers have a menu item VIEW SOURCE or something like that. Selecting this item displays the current page's HTML source instead of its formatted output.
Figure 7-26. (a) The HTML for a sample Web page. (b) The formatted page.
 
Tags can be in either lower case or upper case. Thus, <head> and <HEAD> mean the same thing, but XHTML, the stricter XML-based reformulation of HTML, requires lower case only. Actual layout of the HTML document is irrelevant. HTML parsers ignore extra spaces and carriage returns since they have to reformat the text to make it fit the current display area. Consequently, white space can be added at will to make HTML documents more readable, something most of them are badly in need of. As another consequence, blank lines cannot be used to separate paragraphs, as they are simply ignored. An explicit tag is required.
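The two rules just stated, that tag names are case-insensitive and that the layout of the source is ignored, can be checked with a short sketch. It uses the HTMLParser class from Python's standard library, which is not a browser but does fold tag names to lower case. The class name TagCollector, the helper parse, and both sample documents are made up for illustration, and the white-space collapsing is only a rough approximation of what a renderer does.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start-tag names and text chunks in the order they appear."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands over the tag name already folded to lower case.
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse(source):
    collector = TagCollector()
    collector.feed(source)
    collector.close()
    # Collapse every run of spaces, tabs, and newlines into one space.
    words = " ".join(" ".join(collector.text).split())
    return collector.tags, words


compact = "<body><h1>Widgets</h1><p>Buy now</p></body>"
spread_out = """
<BODY>
    <H1>Widgets</H1>
    <P>Buy
         now</P>
</BODY>
"""

print(parse(compact))
print(parse(spread_out))
```

Both calls report the same tags and the same text, even though one source is a single line in lower case and the other is indented, upper case, and broken across lines.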
Some tags have (named) parameters, called attributes. For example,
<img src="abc" alt="foobar">
is a tag, <img>, with parameter src set equal to abc and parameter alt set equal to foobar. For each tag, the HTML standard gives a list of what the permitted parameters, if any, are, and what they mean. Because each parameter is named, the order in which the parameters are given is not significant.
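That the order of named parameters does not matter can be illustrated with the same standard-library parser. This is only a sketch: AttributeReader and read_tags are hypothetical names, and storing the attributes in a dictionary is one convenient way of discarding their order, not a description of how any particular browser works.

```python
from html.parser import HTMLParser


class AttributeReader(HTMLParser):
    """Collects each start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs in source order;
        # turning it into a dict throws that order away.
        self.seen.append((tag, dict(attrs)))


def read_tags(source):
    reader = AttributeReader()
    reader.feed(source)
    reader.close()
    return reader.seen


first = read_tags('<img src="abc" alt="foobar">')
second = read_tags('<img alt="foobar" src="abc">')
print(first)
print(first == second)
```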
Technically, HTML documents are written in the ISO 8859-1 Latin-1 character set, but for users whose keyboards support only ASCII, escape sequences are provided for the special characters, such as è. The list of special characters is given in the standard. All of them begin with an ampersand and end with a semicolon. For example, &nbsp; produces a non-breaking space, &egrave; produces è, and &eacute; produces é. Since <, >, and & have special meanings, they can be expressed only with their escape sequences, &lt;, &gt;, and &amp;, respectively.
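The escape sequences can be tried out with the html module of Python's standard library, which converts in both directions. The sample strings below are invented; the sketch merely shows the mapping between the special characters and their escape sequences.

```python
import html

# Decoding: escape sequences in the source become the characters displayed.
decoded = html.unescape("caf&eacute; cr&egrave;me")
print(decoded)

# Encoding: the three reserved characters must be escaped before they
# can appear as ordinary text on a page.
encoded = html.escape("if a < b && b > c")
print(encoded)

# &nbsp; decodes to the non-breaking space character, U+00A0.
print(repr(html.unescape("&nbsp;")))
```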
The main item in the head is the title, delimited by <title> and </title>, but certain kinds of meta-information may also be present. The title itself is not displayed on the page. Some browsers use it to label the page's window.
Let us now take a look at some of the other features illustrated in Fig. 7-26. All of the tags used in Fig. 7-26 and some others are shown in Fig. 7-27. Headings are generated by an <hn> tag, where n is a digit in the range 1 to 6. Thus <h1> is the most important heading; <h6> is the least important one. It is up to the browser to render these appropriately on the screen. Typically the lower numbered headings will be displayed in a larger and heavier font. The browser may also choose to use different colors for each level of heading. Typically <h1> headings are large and boldface with at least one blank line above and below. In contrast, <h2> headings are in a smaller font with less space above and below.
Figure 7-27. A selection of common HTML tags. Some can have additional parameters.
The tags <b> and <i> are used to enter boldface and italics mode, respectively. If the browser is not capable of displaying boldface and italics, it must use some other method of rendering them, for example, using a different color for each or perhaps reverse video.
HTML provides various mechanisms for making lists, including nested lists. Lists are started with <ul> or <ol>, with <li> used to mark the start of the items in both cases. The <ul> tag starts an unordered list. The individual items, which are marked with the <li> tag in the source, appear with bullets (•) in front of them. A variant of this mechanism is <ol>, which is for ordered lists. When this tag is used, the <li> items are numbered by the browser. Other than the use of different starting and ending tags, <ul> and <ol> have the same syntax and similar results.
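Because the two kinds of list differ only in their starting and ending tags, a page generator can produce both with one routine. The following is a minimal sketch; the function name make_list and the sample items are hypothetical, and the closing </li> tags it emits are optional in HTML.

```python
import html


def make_list(items, ordered=False):
    """Return the HTML for a bulleted (ul) or numbered (ol) list."""
    tag = "ol" if ordered else "ul"
    lines = [f"<{tag}>"]
    for item in items:
        # Escape each item so that <, >, and & cannot be read as markup.
        lines.append(f"  <li>{html.escape(item)}</li>")
    lines.append(f"</{tag}>")
    return "\n".join(lines)


print(make_list(["Big widgets", "Little widgets"]))
print(make_list(["Order", "Pay", "Wait"], ordered=True))
```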
The <br>, <p>, and <hr> tags all indicate a boundary between sections of text. The precise format can be determined by the style sheet (see below) associated with the page. The <br> tag just forces a line break. Typically, browsers do not insert a blank line after <br>. In contrast, <p> starts a paragraph, which might, for example, insert a blank line and possibly some indentation. (Theoretically, </p> exists to mark the end of a paragraph, but it is rarely used; most HTML authors do not even know it exists.) Finally, <hr> forces a break and draws a horizontal line across the screen.
HTML allows images to be included in-line on a Web page. The <img> tag specifies that an image is to be displayed at the current position in the page. It can have several parameters. The src parameter gives the URL of the image. The HTML standard does not specify which graphic formats are permitted. In practice, all browsers support GIF and JPEG files. Browsers are free to support other formats, but this extension is a two-edged sword. If a user is accustomed to a browser that supports, say, BMP files, he may include these in his Web pages and later be surprised when other browsers just ignore all of his wonderful art.
Other parameters of <img> are align, which controls the alignment of the image with respect to the text baseline (top, middle, bottom), alt, which provides text to use instead of the image when the user has disabled images, and ismap, a flag indicating that the image is an active map (i.e., clickable picture).
Finally, we come to hyperlinks, which use the <a> (anchor) and </a> tags. Like <img>, <a> has various parameters, including href (the URL) and name (the hyperlink's name). The text between the <a> and </a> is displayed. If it is selected, the hyperlink is followed to a new page. It is also permitted to put an <img> image there, in which case clicking on the image also activates the hyperlink.
As an example, consider the following HTML fragment:
<a href="http://www.nasa.gov"> NASA's home page </a>
When a page with this fragment is displayed, what appears on the screen is
NASA's home page
If the user subsequently clicks on this text, the browser immediately fetches the page whose URL is http://www.nasa.gov and displays it.
As a second example, now consider
<a href="http://www.nasa.gov"> <img src="shuttle.gif" alt="NASA"> </a>
When displayed, this page shows a picture (e.g., of the space shuttle). Clicking on the picture switches to NASA's home page, just as clicking on the underlined text did in the previous example. If the user has disabled automatic image display, the text NASA will be displayed where the picture belongs.
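The two hyperlink examples can be run through a small link extractor to see what a program, as opposed to a person, gets out of them. This is a sketch built on Python's standard HTMLParser; the names LinkExtractor and extract are hypothetical, and falling back to the alt text for an image link simply mirrors what a browser with images disabled would show.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (href, label) pairs; label is link text or an image's alt."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._label = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._label = []
        elif tag == "img" and self._href is not None:
            # Use the alt text as the label of an image link.
            self._label.append(attrs.get("alt") or "")

    def handle_data(self, data):
        if self._href is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = " ".join("".join(self._label).split())
            self.links.append((self._href, label))
            self._href = None


def extract(source):
    extractor = LinkExtractor()
    extractor.feed(source)
    extractor.close()
    return extractor.links


print(extract('<a href="http://www.nasa.gov"> NASA\'s home page </a>'))
print(extract('<a href="http://www.nasa.gov"> <img src="shuttle.gif" alt="NASA"> </a>'))
```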
The <a> tag can take a parameter name to plant a hyperlink, to allow a hyperlink to point to the middle of a page. For example, some Web pages start out with a clickable table of contents. By clicking on an item in the table of contents, the user jumps to the corresponding section of the page.
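A hyperlink that points into the middle of a page names its target after a # sign at the end of the URL. The sketch below splits such a URL into its two parts using urldefrag from Python's standard library. The page missions.html and the anchor name apollo are invented for illustration.

```python
from urllib.parse import urldefrag

# Hypothetical page and anchor name; the target page would contain
# <a name="apollo"> at the start of the corresponding section.
link = "http://www.nasa.gov/missions.html#apollo"

page_url, anchor_name = urldefrag(link)
print(page_url)      # the page the browser has to fetch
print(anchor_name)   # the place within that page to jump to
```

Only the first part is needed to fetch the page; the anchor name is used by the browser afterward to position the display.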
HTML keeps evolving. HTML 1.0 and HTML 2.0 did not have tables, but they were added in HTML 3.0. An HTML table consists of one or more rows, each consisting of one or more cells. Cells can contain a wide range of material, including text, figures, icons, photographs, and even other tables. Cells can be merged, so, for example, a heading can span multiple columns. Page authors have limited control over the layout, including alignment, border styles, and cell margins, but the browsers have the final say in rendering tables.
An HTML table definition is listed in Fig. 7-28(a) and a possible rendition is shown in Fig. 7-28(b). This example just shows a few of the basic features of HTML tables. Tables are started by the <table> tag. Additional information can be provided to describe general properties of the table.
Figure 7-28. (a) An HTML table. (b) A possible rendition of this table.
 
The <caption> tag can be used to provide a figure caption. Each row begins with a <tr> (Table Row) tag. The individual cells are marked as <th> (Table Header) or <td> (Table Data). The distinction is made to allow browsers to use different renditions for them, as we have done in the example.
Numerous attributes are also allowed in tables. They include ways to specify horizontal and vertical cell alignments, justification within a cell, borders, grouping of cells, units, and more.
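The basic table tags fit together in a regular way, which makes tables easy to generate by program. The following sketch builds a table from a caption, one header row, and a list of data rows. The function make_table and the sample data are hypothetical and are not the table of Fig. 7-28; no attributes are emitted.

```python
import html


def make_table(caption, header, rows):
    """Return HTML for a table with one header row and several data rows."""
    def cells(tag, values):
        # Every cell is wrapped in <th> or <td>; the contents are escaped.
        return "".join(f"<{tag}>{html.escape(str(v))}</{tag}>" for v in values)

    out = ["<table>", f"  <caption>{html.escape(caption)}</caption>"]
    out.append(f"  <tr>{cells('th', header)}</tr>")
    for row in rows:
        out.append(f"  <tr>{cells('td', row)}</tr>")
    out.append("</table>")
    return "\n".join(out)


table = make_table(
    "Widget prices",
    ["Item", "Price"],
    [["Big widget", "$20"], ["Little widget", "$10"]],
)
print(table)
```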
In HTML 4.0, more new features were added. These include accessibility features for handicapped users, object embedding (a generalization of the <img> tag so other objects can also be embedded in pages), support for scripting languages (to allow dynamic content), and more.
When a Web site is complex, consisting of many pages produced by multiple authors working for the same company, it is often desirable to have a way to prevent different pages from having a different appearance. This problem can be solved using style sheets. When these are used, individual pages no longer use physical styles, such as boldface and italics. Instead, page authors use logical styles such as <dfn> (definition), <em> (weak emphasis), <strong> (strong emphasis), and <var> (program variables). The logical styles are defined in the style sheet, which is referred to at the start of each page. In this way all pages have the same style, and if the Webmaster decides to change <strong> from 14-point italics in blue to 18-point boldface in shocking pink, all it requires is changing one definition to convert the entire Web site. A style sheet can be compared to an #include file in a C program: changing one macro definition there changes it in all the program files that include the header.
Forms
HTML 1.0 was basically one-way. Users could call up pages from information providers, but it was difficult to send information back the other way. As more and more commercial organizations began using the Web, there was a large demand for two-way traffic. For example, many companies wanted to be able to take orders for products via their Web pages, software vendors wanted to distribute software via the Web and have customers fill out their registration cards electronically, and companies offering Web searching wanted to have their customers be able to type in search keywords.
These demands led to the inclusion of forms starting in HTML 2.0. Forms contain boxes or buttons that allow users to fill in information or make choices and then send the information back to the page's owner. They use the <input> tag for this purpose. It has a variety of parameters for determining the size, nature, and usage of the box displayed. The most common forms are blank fields for accepting user text, boxes that can be checked, active maps, and submit buttons. The example of Fig. 7-29 illustrates some of these choices.
Figure 7-29. (a) The HTML for an order form. (b) The formatted page.
 
Let us start our discussion of forms by going over this example. Like all forms, this one is enclosed between the <form> and </form> tags. Text not enclosed in a tag is just displayed. All the usual tags (e.g., <b>) are allowed in a form. Three kinds of input boxes are used in this form.
The first kind of input box follows the text "Name". The box is 46 characters wide and expects the user to type in a string, which is then stored in the variable customer for later processing. The <p> tag instructs the browser to display subsequent text and boxes on the next line, even if there is room on the current line. By using <p> and other layout tags, the author of the page can control the look of the form on the screen.
The next line of the form asks for the user's street address, 40 columns wide, also on a line by itself. Then comes a line asking for the city, state, and country. No <p> tags are used between the fields here, so the browser displays them all on one line if they will fit. As far as the browser is concerned, this paragraph just contains six items: three strings alternating with three boxes. It displays them linearly from left to right, going over to a new line whenever the current line cannot hold the next item. Thus, it is conceivable that on a 1600 x 1200 screen all three strings and their corresponding boxes will appear on the same line, but on a 1024 x 768 screen they might be split over two lines. In the worst scenario, the word "Country" is at the end of one line and its box is at the beginning of the next line.
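The left-to-right placement rule amounts to a greedy algorithm, sketched below. The item widths, measured in character cells, and the two line widths are invented for illustration; real browsers measure in pixels and take fonts and spacing into account.

```python
def flow(items, line_width):
    """Place (label, width) items left to right, wrapping greedily."""
    lines, current, used = [], [], 0
    for label, width in items:
        # Start a new line when the next item does not fit on this one.
        if current and used + width > line_width:
            lines.append(current)
            current, used = [], 0
        current.append(label)
        used += width
    if current:
        lines.append(current)
    return lines


# Three strings alternating with three boxes; the widths are assumptions.
items = [
    ("City", 5), ("[city box]", 20),
    ("State", 6), ("[state box]", 4),
    ("Country", 8), ("[country box]", 20),
]

print(flow(items, 80))   # wide window: everything fits on one line
print(flow(items, 45))   # narrow window: the last box wraps
```

With the narrower width the sketch reproduces the worst scenario: the string Country ends the first line and its box starts the second.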
The next line asks for the credit card number and expiration date. Transmitting credit card numbers over the Internet should only be done when adequate security measures have been taken.
Following the expiration date we encounter a new feature: radio buttons. These are used when a choice must be made among two or more alternatives. The intellectual model here is a car radio with half a dozen buttons for choosing stations. The browser displays these boxes in a form that allows the user to select and deselect them by clicking on them (or using the keyboard). Clicking on one of them turns off all the other ones in the same group. The visual presentation is up to the browser. Widget size also uses two radio buttons. The two groups are distinguished by their name field, not by static scoping using something like <radiobutton> ... </radiobutton>.
The value parameters are used to indicate which radio button was pushed. Depending on which of the credit card options the user has chosen, the variable cc will be set to either the string "mastercard" or the string "visacard".
After the two sets of radio buttons, we come to the shipping option, represented by a box of type checkbox. It can be either on or off. Unlike radio buttons, where exactly one out of the set must be chosen, each box of type checkbox can be on or off, independently of all the others. For example, when ordering a pizza via Electropizza's Web page, the user can choose sardines and onions and pineapple (if she can stand it), but she cannot choose small and medium and large for the same pizza. The pizza toppings would be represented by three separate boxes of type checkbox, whereas the pizza size would be a set of radio buttons.
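The difference between the two kinds of boxes can be captured in a few lines. In the sketch below, a radio group holds exactly one value per name, whereas each checkbox is toggled on its own. The class FormState and the names size, sardines, and onions are hypothetical, chosen to match the pizza example; only cc and its two values come from the form discussed above.

```python
class FormState:
    """A toy model of the radio buttons and checkboxes on one form."""

    def __init__(self):
        self.radio = {}        # group name -> currently selected value
        self.checked = set()   # names of the checkboxes that are on

    def click_radio(self, name, value):
        # Selecting a button replaces whatever was chosen in the same group.
        self.radio[name] = value

    def click_checkbox(self, name):
        # Each checkbox toggles independently of all the others.
        self.checked ^= {name}


form = FormState()
form.click_radio("cc", "mastercard")
form.click_radio("cc", "visacard")      # turns mastercard off
form.click_radio("size", "medium")      # a different group is unaffected
form.click_checkbox("sardines")
form.click_checkbox("onions")
form.click_checkbox("sardines")         # a second click turns it off again

print(form.radio)
print(form.checked)
```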
As an aside, for very long lists from which a choice must be made, radio buttons are somewhat inconvenient. Therefore, the <select> and </select> tags are provided to bracket a list of alternatives, but with the semantics of radio buttons (unless the multiple parameter is given, in which case the semantics are those of checkable boxes). Some browsers render the items located between <select> and </select> as a drop-down menu.
We have now seen two of the built-in types for the <input> tag: radio and checkbox. In fact, we have already seen a third one as well: text. Because this type is the default, we did not bother to include the parameter type = text, but we could have. Another type is password. A password box is the same as a text box, except that the characters are not displayed as they are typed. Multiline input is not an <input> type at all; it is provided by the separate <textarea> and </textarea> tags. A textarea box is otherwise the same as a text box, except that it can contain multiple lines.
Getting back to the example of Fig. 7-29, we now come across an example of a submit button. When this is clicked, the user information on the form is sent back to the machine that provided the form. Like all the other types, submit is a reserved word that the browser understands. The value string here is the label on the button and is displayed. All boxes can have values; only here did we need that feature. For text boxes, the contents of the value field are displayed along with the form, but the user can edit or erase them. Boxes of type checkbox and radio can also be initialized, but with a field called checked (because value just gives the text, but does not indicate a preferred choice).
When the user clicks the submit button, the browser packages the collected information into a single long line and sends it back to the server for processing. The & is used to separate fields and + is used to represent space. For our example form, the line might look like the contents of Fig. 7-30 (broken into three lines here because the page is not wide enough):
Figure 7-30. A possible response from the browser to the server with information
The string would be sent back to the server as one line, not three. If a checkbox is not selected, it is omitted from the string. It is up to the server to make sense of this string.
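Both halves of this exchange can be sketched with urlencode and parse_qs from Python's standard library, which use the same conventions: & between fields and + for a space. The field names customer and cc come from the form discussed above; the names address and express, the sample values, and the helper package are assumptions made for illustration, as is the value on, which browsers send for a checked box that has no value parameter. The result is not meant to reproduce Fig. 7-30.

```python
from urllib.parse import urlencode, parse_qs


def package(text_fields, radio_choices, checkboxes):
    """Build the single line a browser would send for these form contents."""
    pairs = list(text_fields.items()) + list(radio_choices.items())
    # Only checked boxes are transmitted; unchecked ones are left out.
    pairs += [(name, "on") for name, is_on in checkboxes.items() if is_on]
    return urlencode(pairs)


# Browser side: one line, fields joined by &, spaces turned into +.
line = package(
    {"customer": "John Doe", "address": "100 Main St."},
    {"cc": "mastercard"},
    {"express": False},
)
print(line)

# Server side: split the line back into named fields.
fields = parse_qs(line)
print(fields)
```

Note that express does not appear in the line at all, so the server has to treat a missing field as an unchecked box.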
