XML and XSL
HTML, with or without forms, does
not provide any structure to Web pages. It also mixes the content with the
formatting. As e-commerce and other applications become more common, there is
an increasing need for structuring Web pages and separating the content from
the formatting. For example, a program that searches the Web for the best price
for some book or CD needs to analyze many Web pages looking for the item's
title and price. With Web pages in HTML, it is very difficult for a program to
figure out where the title is and where the price is.
For this reason, the W3C has
developed an enhancement to HTML to allow Web pages to be structured for
automated processing. Two new languages have been developed for this purpose.
First, XML (eXtensible Markup Language) describes Web content in a structured
way; second, XSL (eXtensible Stylesheet Language) describes the formatting
independently of the content. Both of these are large and complicated topics,
so our brief introduction below just scratches the surface, but it should give
an idea of how they work.
Consider the example XML document of
Fig. 7-31. It defines a structure called book_list,
which is a list of books. Each book has three fields: the title, author, and year
of publication. These structures are extremely simple. It is permitted to have
structures with repeated fields (e.g., multiple authors), optional fields
(e.g., title of included CD-ROM), and alternative fields (e.g., URL of a
bookstore if it is in print or URL of an auction site if it is out of print).
In this example, each of the three
fields is an indivisible entity, but it is also permitted to further subdivide
the fields. For example, the author field could have been done as follows to
give finer-grained control over searching and formatting:
<author>
  <first_name> Andrew </first_name>
  <last_name> Tanenbaum </last_name>
</author>
Each field can be subdivided into
subfields and subsubfields arbitrarily deep.
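Because fields may nest to any depth, a program that processes such documents typically walks the tree recursively. The Python sketch below uses the standard xml.etree.ElementTree module to list every leaf field together with its path. The sample book record (including its title and year values) and the function name leaf_fields are hypothetical illustrations, not the contents of Fig. 7-31.

```python
import xml.etree.ElementTree as ET

# Hypothetical book record whose author field is subdivided as shown above.
# The title and year values are illustrative only.
BOOK = """
<book>
  <title> Computer Networks </title>
  <author>
    <first_name> Andrew </first_name>
    <last_name> Tanenbaum </last_name>
  </author>
  <year> 2003 </year>
</book>
"""

def leaf_fields(elem, prefix=""):
    """Yield (path, text) for every leaf field, however deeply it is nested."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    children = list(elem)
    if not children:
        # A leaf field: strip the spaces placed around values in the example.
        yield path, (elem.text or "").strip()
        return
    for child in children:
        # Recurse into subfields, subsubfields, and so on.
        yield from leaf_fields(child, path)

if __name__ == "__main__":
    root = ET.fromstring(BOOK)
    for path, value in leaf_fields(root):
        print(f"{path} = {value}")
```

Run as a script, it prints one line per leaf, for example book/author/last_name = Tanenbaum. A search program could then match on book/author/last_name alone, which is exactly the finer-grained control the subdivided author field is meant to allow.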
All the file of Fig. 7-31 does is define a book list containing
three books. It says nothing about how to display the Web page on the screen.
To provide the formatting information, we need a second file, book_list.xsl,
containing the XSL definition. This file is a style sheet that tells how to
display the page. (There are alternatives to style sheets, such as a way to
convert XML into HTML, but these alternatives are beyond the scope of this
book.)
A sample XSL file for formatting Fig. 7-31 is given in Fig. 7-32. After some necessary declarations,
including the URL of the XSL standard, the file contains tags starting with <html> and <body>. These define the start of the Web page, as
usual. Then comes a table
definition, including the headings for the three columns. Note that in addition
to the <th> tags there are </th> tags as
well, something we did not bother with so far. The XML and XSL specifications
are much stricter than the HTML specification. They state that rejecting
syntactically incorrect files is mandatory, even if the browser can determine
what the Web designer meant. A browser that accepts a syntactically incorrect
XML or XSL file and repairs the errors itself is not conformant and will be
rejected in a conformance test. Browsers are allowed to pinpoint the error,
however. This somewhat draconian measure is needed to deal with the immense
number of sloppy Web pages currently out there.
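The contrast between forgiving and strict parsing can be seen with Python's standard library, which happens to contain both kinds of parser. In the sketch below, html.parser.HTMLParser tokenizes a table row whose <th> tags are never closed without complaint, while xml.etree.ElementTree, an XML parser, refuses the same text with a ParseError. Treat this as an analogy only: HTMLParser is a tokenizer, not a browser's error-repair engine, and the sample markup and helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sloppy markup: the <th> tags are never closed.
SLOPPY = "<table><tr><th>Title<th>Author</tr></table>"
TIDY = "<table><tr><th>Title</th><th>Author</th></tr></table>"

class StartTagRecorder(HTMLParser):
    """Records start tags; HTMLParser does not reject malformed input."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

def lenient_parse(text):
    """Tokenize like a forgiving HTML reader; return the start tags seen."""
    recorder = StartTagRecorder()
    recorder.feed(text)
    recorder.close()
    return recorder.start_tags

def strict_parse(text):
    """Parse as XML: return None if well-formed, else the error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None

if __name__ == "__main__":
    print("lenient:", lenient_parse(SLOPPY))
    print("strict: ", strict_parse(SLOPPY))
    print("strict (tidy):", strict_parse(TIDY))
```

The XML parser reports where the mismatch occurs but does not guess what was meant, which mirrors the "pinpoint the error, don't repair it" rule described above.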
The statement
<xsl:for-each select="book_list/book">
is analogous to a for
statement in C. It causes the browser to iterate the loop body (ended by </xsl:for-each>) once for each book. Each iteration outputs five lines:
<tr>, the title, author, and year, and </tr>. After the loop, the closing tags </body> and </html> are output. The result of the browser's interpreting this
style sheet is the same as if the Web page contained the table in-line.
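Python's standard library has no XSLT processor (the third-party lxml package provides one), so the sketch below does not execute a style sheet. Instead it hand-codes what the loop just described does: emit the page header and the column headings, then five lines per book, then the closing tags. The book data and the function name render_table are hypothetical; only the element names follow the structure described for Fig. 7-31.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical data with the book_list / book / {title, author, year} structure.
BOOK_LIST = """<book_list>
  <book><title>Book A</title><author>Author X</author><year>1999</year></book>
  <book><title>Book B</title><author>Author Y</author><year>2001</year></book>
  <book><title>Book C</title><author>Author Z</author><year>2003</year></book>
</book_list>"""

def render_table(xml_text):
    """Produce HTML the way the style sheet would: headings, then a row per book."""
    root = ET.fromstring(xml_text)  # the <book_list> element
    lines = ["<html>", "<body>", "<table>",
             "<tr> <th>Title</th> <th>Author</th> <th>Year</th> </tr>"]
    # This loop plays the role of <xsl:for-each select="book_list/book">.
    for book in root.findall("book"):
        lines.append("<tr>")
        for field in ("title", "author", "year"):
            value = (book.findtext(field) or "").strip()
            # Re-escape text such as "&" so the generated HTML stays valid.
            lines.append(f"<td>{html.escape(value)}</td>")
        lines.append("</tr>")
    lines += ["</table>", "</body>", "</html>"]
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_table(BOOK_LIST))
```

The point of the exercise is the separation of concerns: the data file never changes, and only the rendering step (here a function, in the text a style sheet) decides how a book becomes a table row.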
However, in this
format, programs can analyze the XML
file and easily find books published after 2000, for example. It is worth
emphasizing that even though our XSL file contained a kind of loop, Web pages
in XML and XSL are still static since they simply contain instructions to the
browser about how to display the page, just as HTML pages do. Of course, to use
XML and XSL, the browser has to be able to interpret XML and XSL, but most of
them already have this capability. It is not yet clear whether XSL will take
over from traditional style sheets.
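Because the XML file labels the year explicitly, a query such as "books published after 2000" needs no guesswork about page layout. A minimal Python sketch, assuming a hypothetical book list with the structure of Fig. 7-31 (book_list, book, title, year) and a hypothetical function name titles_published_after:

```python
import xml.etree.ElementTree as ET

# Hypothetical data following the book_list / book / {title, author, year} layout.
BOOK_LIST = """<book_list>
  <book><title>Book A</title><author>Author X</author><year>1999</year></book>
  <book><title>Book B</title><author>Author Y</author><year>2001</year></book>
  <book><title>Book C</title><author>Author Z</author><year>2003</year></book>
</book_list>"""

def titles_published_after(xml_text, year):
    """Return the titles of all books whose <year> is later than `year`."""
    root = ET.fromstring(xml_text)
    titles = []
    for book in root.iter("book"):
        # A missing <year> is treated as 0, so such books are never selected.
        published = int((book.findtext("year") or "0").strip())
        if published > year:
            titles.append((book.findtext("title") or "").strip())
    return titles

if __name__ == "__main__":
    print(titles_published_after(BOOK_LIST, 2000))
```

With the sample data this returns the titles of the two books dated 2001 and 2003. The same search over an HTML table would have to infer which column held the year.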
We have not shown how to do this,
but XML allows the Web site designer to make up definition files in which the
structures are defined in advance. These definition files can be included,
making it possible to use them to build complex Web pages. For additional
information on this and the many other features of XML and XSL, see one of the
many books on the subject. Two examples are (Livingston, 2002; and Williamson,
2001).
Before ending our discussion of XML
and XSL, it is worth commenting on an ideological battle going on within the WWW
consortium and the Web designer community. The original goal of HTML was to
specify the structure of the document, not its appearance. For example,
<h1> Deborah's Photos </h1>
instructs the browser to emphasize
the heading, but does not say anything about the typeface, point size, or
color. That was left up to the browser, which knows the properties of the
display (e.g., how many pixels it has). However, many Web page designers wanted
absolute control over how their pages appeared, so new tags were added to HTML
to control appearance, such as
<font face="helvetica" size="24" color="red"> Deborah's Photos </font>
Also, ways were added to control
positioning on the screen accurately. The trouble with this approach is that it
is not portable. Although a page may render perfectly with the browser it is
developed on, with another browser or another release of the same browser or a
different screen resolution, it may be a complete mess. XML was in part an
attempt to go back to the original goal of specifying just the structure, not
the appearance of a document. XSL, in turn, is provided to manage the
appearance. Both formats can be misused, however. You can count on it.
XML can be used for purposes other
than describing Web pages. One growing use of it is as a language for
communication between application programs. In particular, SOAP (Simple Object
Access Protocol) is a way of performing RPCs between applications in a
language- and system-independent way. The client constructs the request as an
XML message and sends it to the server, using the HTTP protocol (described
below). The server sends back a reply as an XML formatted message. In this way,
applications on heterogeneous platforms can communicate.
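To make the request/reply pattern concrete, the sketch below uses xml.etree.ElementTree to build a SOAP 1.1-style request envelope on the client side and to extract a value from a reply. The envelope namespace is the one defined by SOAP 1.1; the application namespace urn:example:bookstore, the GetPrice operation, and its isbn and price elements are invented for illustration. Sending the message over HTTP is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Envelope namespace defined by SOAP 1.1.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical application namespace, invented for this example.
APP_NS = "urn:example:bookstore"

# Choose readable prefixes for serialized output.
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("b", APP_NS)

def q(ns, tag):
    """Qualified name in ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"

def build_request(isbn):
    """Client side: wrap a hypothetical GetPrice call in a SOAP envelope."""
    envelope = ET.Element(q(SOAP_ENV, "Envelope"))
    body = ET.SubElement(envelope, q(SOAP_ENV, "Body"))
    call = ET.SubElement(body, q(APP_NS, "GetPrice"))
    ET.SubElement(call, q(APP_NS, "isbn")).text = isbn
    return ET.tostring(envelope, encoding="unicode")

def parse_reply(xml_text):
    """Client side: pull the price out of a hypothetical GetPriceResponse."""
    root = ET.fromstring(xml_text)
    path = "/".join([q(SOAP_ENV, "Body"),
                     q(APP_NS, "GetPriceResponse"),
                     q(APP_NS, "price")])
    return float(root.findtext(path))

# A reply such as a server might send back (invented for illustration).
SAMPLE_REPLY = (
    f'<soap:Envelope xmlns:soap="{SOAP_ENV}"><soap:Body>'
    f'<b:GetPriceResponse xmlns:b="{APP_NS}"><b:price>42.50</b:price>'
    '</b:GetPriceResponse></soap:Body></soap:Envelope>'
)

if __name__ == "__main__":
    print(build_request("0000000000"))
    print(parse_reply(SAMPLE_REPLY))
```

Neither side needs to know the other's programming language or operating system; both only need to agree on the XML vocabulary, which is what makes this style of RPC platform-independent.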
HTML keeps evolving to meet new
demands. Many people in the industry feel that in the future, the majority of
Web-enabled devices will not be PCs, but wireless, handheld PDA-type devices.
These devices have limited memory for large browsers full of heuristics that
try to somehow deal with syntactically incorrect Web pages. Thus, the next step
after HTML 4 is a language that is Very Picky. It is called XHTML (eXtensible
HyperText Markup Language) rather than HTML 5 because it is essentially HTML 4
reformulated in XML. By this we mean that tags such as <h1> have no intrinsic meaning. To get the HTML 4 effect, a
definition is needed in the XSL file. XHTML is the new Web standard and should
be used for all new Web pages to achieve maximum portability across platforms
and browsers.
There are six major differences and
a variety of minor differences between XHTML and HTML 4. Let us now go over the
major differences. First, XHTML pages and browsers must strictly conform to the
standard. No more shoddy Web pages. This property was inherited from XML.
Second, all tags and attributes must
be in lower case. Tags like <HTML> are not valid in XHTML. The use of tags like <html> is now mandatory. Similarly, <img SRC="pic001.jpg"> is also forbidden because it contains an upper-case
attribute.
Third, closing tags are required,
even for </p>. For tags that have no natural closing tag, such as <br>, <hr>, and <img>, a slash must precede the closing ">", for example
<img src="pic001.jpg" />
Fourth, attributes must be contained
within quotation marks. For example,
<img src="pic001.jpg" height=500 />
is no longer allowed. The 500 has to
be enclosed in quotation marks, just like the name of the JPEG file, even
though 500 is just a number.
Fifth, tags must nest properly. In
the past, proper nesting was not required as long as the final state achieved
was correct. For example,
<center> <b> Vacation Pictures </center> </b>
used to be legal. In XHTML it is
not. Tags must be closed in the inverse order that they were opened.
Sixth, every document must specify
its document type. We saw this in Fig. 7-32, for example. For a discussion of all
the changes, major and minor, see www.w3.org.
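Several of these rules can be checked mechanically. The Python sketch below (the function name xhtml_problems is hypothetical) first hands a fragment to an XML parser, which catches missing closing tags, unquoted attribute values, and improper nesting, and then walks the tree to flag upper-case tag and attribute names. XML itself is case-sensitive but permits upper case, so that rule needs its own check. This is not a validator: it ignores the document type declaration, does not check which elements XHTML allows, and expects a fragment with a single outermost element.

```python
import xml.etree.ElementTree as ET

def xhtml_problems(fragment):
    """Report violations of some XHTML rules in a single-rooted fragment."""
    try:
        root = ET.fromstring(fragment)
    except ET.ParseError as err:
        # Missing closing tags, improper nesting, and unquoted attribute
        # values all make the text ill-formed XML, so checking stops here.
        return [f"not well-formed: {err}"]
    problems = []
    for elem in root.iter():
        # XML allows upper case in names; XHTML requires lower case.
        if elem.tag != elem.tag.lower():
            problems.append(f"upper-case tag <{elem.tag}>")
        for name in elem.attrib:
            if name != name.lower():
                problems.append(f"upper-case attribute {name} in <{elem.tag}>")
    return problems

if __name__ == "__main__":
    samples = [
        '<img src="pic001.jpg" />',
        '<img SRC="pic001.jpg" />',
        '<img src="pic001.jpg" height=500 />',
        '<center> <b> Vacation Pictures </center> </b>',
        '<p> line one <br> line two </p>',
        '<HTML><body></body></HTML>',
    ]
    for s in samples:
        print(s, "->", xhtml_problems(s) or "ok")
```

Only the first sample passes: the others break the lower-case, quoting, nesting, and closing-tag rules in turn.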