
Wednesday, September 7, 2016

XML and XSL




HTML, with or without forms, does not provide any structure to Web pages. It also mixes the content with the formatting. As e-commerce and other applications become more common, there is an increasing need for structuring Web pages and separating the content from the formatting. For example, a program that searches the Web for the best price for some book or CD needs to analyze many Web pages looking for the item's title and price. With Web pages in HTML, it is very difficult for a program to figure out where the title is and where the price is.
For this reason, the W3C has developed an enhancement to HTML to allow Web pages to be structured for automated processing. Two new languages have been developed for this purpose. First, XML (eXtensible Markup Language) describes Web content in a structured way. Second, XSL (eXtensible Stylesheet Language) describes the formatting independently of the content. Both of these are large and complicated topics, so our brief introduction below just scratches the surface, but it should give an idea of how they work.
Consider the example XML document of Fig. 7-31. It defines a structure called book_list, which is a list of books. Each book has three fields, the title, author, and year of publication. These structures are extremely simple. It is permitted to have structures with repeated fields (e.g., multiple authors), optional fields (e.g., title of included CD-ROM), and alternative fields (e.g., URL of a bookstore if it is in print or URL of an auction site if it is out of print).
Figure 7-31. A simple Web page in XML.
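Since the figure itself is not reproduced here, the following is a minimal sketch of what such a document could look like and how a program might read it. The titles, authors, and years are placeholders rather than the contents of Fig. 7-31, and the helper name read_books is hypothetical. Only the element names book_list, book, title, author, and year come from the text.

```python
import xml.etree.ElementTree as ET

# Placeholder document in the shape the text describes: a book_list
# holding books, each with a title, an author, and a year.
# The values are invented stand-ins, not the contents of Fig. 7-31.
BOOK_LIST_XML = """
<book_list>
  <book>
    <title> Example Title A </title>
    <author> Author One </author>
    <year> 1999 </year>
  </book>
  <book>
    <title> Example Title B </title>
    <author> Author Two </author>
    <year> 2001 </year>
  </book>
  <book>
    <title> Example Title C </title>
    <author> Author Three </author>
    <year> 2003 </year>
  </book>
</book_list>
"""


def read_books(xml_text):
    """Return one dict per <book>, with surrounding whitespace removed."""
    root = ET.fromstring(xml_text)  # the root element is <book_list>
    books = []
    for book in root.findall("book"):
        books.append({
            "title": book.findtext("title", default="").strip(),
            "author": book.findtext("author", default="").strip(),
            "year": int(book.findtext("year", default="0")),
        })
    return books


if __name__ == "__main__":
    for entry in read_books(BOOK_LIST_XML):
        print(entry["year"], entry["title"], "by", entry["author"])
```

Because every field is named by its tag, the program finds the title and the year directly instead of guessing at their position in the page.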
In this example, each of the three fields is an indivisible entity, but it is also permitted to further subdivide the fields. For example, the author field could have been done as follows to give a finer-grained control over searching and formatting:
<author>
  <first_name> Andrew </first_name>
  <last_name> Tanenbaum </last_name>
</author>
Each field can be subdivided into subfields and subsubfields arbitrarily deep.
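As a rough illustration of arbitrarily deep subdivision, the sketch below walks an element recursively and lists every leaf field together with its path. The function name leaf_fields is hypothetical. The author element is the one shown above.

```python
import xml.etree.ElementTree as ET

# The subdivided author field from the text.
AUTHOR_XML = """
<author>
  <first_name> Andrew </first_name>
  <last_name> Tanenbaum </last_name>
</author>
"""


def leaf_fields(element, prefix=""):
    """Yield (path, text) for every element that has no subfields."""
    path = prefix + element.tag
    children = list(element)
    if not children:
        yield path, (element.text or "").strip()
    for child in children:
        # Recurse: a subfield may itself contain subsubfields, and so on.
        yield from leaf_fields(child, path + "/")


if __name__ == "__main__":
    author = ET.fromstring(AUTHOR_XML)
    for path, text in leaf_fields(author):
        print(path, "=", text)
    # A single subfield can also be addressed directly by its tag.
    print(author.findtext("last_name").strip())
```

The same function works unchanged if last_name were split further, since the recursion follows whatever nesting the document contains.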
All the file of Fig. 7-31 does is define a book list containing three books. It says nothing about how to display the Web page on the screen. To provide the formatting information, we need a second file, book_list.xsl, containing the XSL definition. This file is a style sheet that tells how to display the page. (There are alternatives to style sheets, such as a way to convert XML into HTML, but these alternatives are beyond the scope of this book.)
A sample XSL file for formatting Fig. 7-31 is given in Fig. 7-32. After some necessary declarations, including the URL of the XSL standard, the file contains tags starting with <html> and <body>. These define the start of the Web page, as usual.
Figure 7-32. A style sheet in XSL.
Then comes a table definition, including the headings for the three columns. Note that in addition to the <th> tags there are </th> tags as well, something we did not bother with so far. The XML and XSL specifications are much stricter than the HTML specification. They state that rejecting syntactically incorrect files is mandatory, even if the browser can determine what the Web designer meant. A browser that accepts a syntactically incorrect XML or XSL file and repairs the errors itself is not conformant and will be rejected in a conformance test. Browsers are allowed to pinpoint the error, however. This somewhat draconian measure is needed to deal with the immense number of sloppy Web pages currently out there.
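The strictness rule can be seen with any conforming XML parser. The sketch below uses the parser in Python's standard library, which rejects a heading row whose </th> tags are missing and reports where the error is, rather than repairing it. The function name check_well_formed and the two sample rows are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "") if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser pinpoints the error (line and column) but does not fix it.
        return False, str(err)
    return True, ""


# A heading row with every <th> closed, as XML requires.
STRICT_ROW = "<tr><th>Title</th><th>Author</th><th>Year</th></tr>"
# The same row in the relaxed style HTML browsers tolerate.
SLOPPY_ROW = "<tr><th>Title<th>Author<th>Year</tr>"

if __name__ == "__main__":
    print(check_well_formed(STRICT_ROW))
    print(check_well_formed(SLOPPY_ROW))
```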
The statement
<xsl:for-each select="book_list/book">
is analogous to a for statement in C. It causes the browser to iterate the loop body (ended by </xsl:for-each>) one iteration for each book. Each iteration outputs five lines: <tr>, the title, author, and year, and </tr>. After the loop, the closing tags </body> and </html> are output. The result of the browser's interpreting this style sheet is the same as if the Web page contained the table in-line. However, in this format, programs can analyze the XML file and easily find books published after 2000, for example.
It is worth emphasizing that even though our XSL file contained a kind of a loop, Web pages in XML and XSL are still static since they simply contain instructions to the browser about how to display the page, just as HTML pages do. Of course, to use XML and XSL, the browser has to be able to interpret XML and XSL, but most of them already have this capability. It is not yet clear whether XSL will take over from traditional style sheets.
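The effect of the loop can be imitated in an ordinary program. The sketch below is a Python analogue of the for-each statement, not an XSL processor: it emits five lines per book inside a table, and a second function picks out the books published after a given year. The book data and the function names render_table and published_after are placeholders.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Placeholder data in the book_list structure described in the text.
BOOK_LIST_XML = """
<book_list>
  <book><title> Example Title A </title><author> Author One </author><year> 1999 </year></book>
  <book><title> Example Title B </title><author> Author Two </author><year> 2001 </year></book>
  <book><title> Example Title C </title><author> Author Three </author><year> 2003 </year></book>
</book_list>
"""


def render_table(xml_text):
    """Produce the HTML table that the style sheet's loop would generate."""
    root = ET.fromstring(xml_text)  # the root element is <book_list>
    lines = ["<html>", "<body>", "<table>",
             "<tr><th>Title</th><th>Author</th><th>Year</th></tr>"]
    # Counterpart of <xsl:for-each select="book_list/book">.
    for book in root.findall("book"):
        lines.append("<tr>")
        for field in ("title", "author", "year"):
            value = book.findtext(field, default="").strip()
            lines.append("  <td>" + escape(value) + "</td>")
        lines.append("</tr>")
    lines += ["</table>", "</body>", "</html>"]
    return "\n".join(lines)


def published_after(xml_text, year):
    """Return the titles of all books whose year is later than `year`."""
    root = ET.fromstring(xml_text)
    return [book.findtext("title").strip()
            for book in root.findall("book")
            if int(book.findtext("year")) > year]


if __name__ == "__main__":
    print(render_table(BOOK_LIST_XML))
    print(published_after(BOOK_LIST_XML, 2000))
```

The second function is the kind of analysis that is awkward against the rendered HTML table but straightforward against the XML source.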
We have not shown how to do this, but XML allows the Web site designer to make up definition files in which the structures are defined in advance. These definition files can be included, making it possible to use them to build complex Web pages. For additional information on this and the many other features of XML and XSL, see one of the many books on the subject. Two examples are Livingston (2002) and Williamson (2001).
Before ending our discussion of XML and XSL, it is worth commenting on an ideological battle going on within the WWW consortium and the Web designer community. The original goal of HTML was to specify the structure of the document, not its appearance. For example,
<h1> Deborah's Photos </h1>
instructs the browser to emphasize the heading, but does not say anything about the typeface, point size, or color. That was left up to the browser, which knows the properties of the display (e.g., how many pixels it has). However, many Web page designers wanted absolute control over how their pages appeared, so new tags were added to HTML to control appearance, such as
<font face="helvetica" size="24" color="red"> Deborah's Photos </font>
Also, ways were added to control positioning on the screen accurately. The trouble with this approach is that it is not portable. Although a page may render perfectly with the browser it is developed on, with another browser or another release of the same browser or a different screen resolution, it may be a complete mess. XML was in part an attempt to go back to the original goal of specifying just the structure, not the appearance of a document. However, XSL is also provided to manage the appearance. Both formats can be misused, though. You can count on it.
XML can be used for purposes other than describing Web pages. One growing use of it is as a language for communication between application programs. In particular, SOAP (Simple Object Access Protocol) is a way of performing RPCs (remote procedure calls) between applications in a language- and system-independent way. The client constructs the request as an XML message and sends it to the server, using the HTTP protocol (described below). The server sends back a reply as an XML-formatted message. In this way, applications on heterogeneous platforms can communicate.
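To make the idea concrete, here is a minimal sketch of how a client might build such a request and how a server might decode it. It assumes the SOAP 1.1 envelope namespace. The method name GetPrice, the service namespace http://example.com/books, and the helper names are hypothetical, and the HTTP transfer itself is left out.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope elements.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_request(method, service_ns, **params):
    """Wrap a procedure call and its parameters in a SOAP-style envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The call is an element named after the remote procedure.
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover the procedure name and parameters on the receiving side."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]
    method = call.tag.split("}")[-1]  # drop the "{namespace}" prefix
    return method, {child.tag: child.text for child in call}


if __name__ == "__main__":
    # Hypothetical call: ask a bookstore service for the price of a title.
    message = build_request("GetPrice", "http://example.com/books",
                            title="Example Title A")
    print(message)
    print(parse_request(message))
```

Nothing in the message depends on the programming language or operating system at either end, which is what allows heterogeneous platforms to interoperate.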
XHTML: The eXtensible HyperText Markup Language
HTML keeps evolving to meet new demands. Many people in the industry feel that in the future, the majority of Web-enabled devices will not be PCs, but wireless, handheld PDA-type devices. These devices have limited memory for large browsers full of heuristics that try to somehow deal with syntactically incorrect Web pages. Thus, the next step after HTML 4 is a language that is Very Picky. It is called XHTML (eXtensible HyperText Markup Language) rather than HTML 5 because it is essentially HTML 4 reformulated in XML. By this we mean that tags such as <h1> have no intrinsic meaning. To get the HTML 4 effect, a definition is needed in the XSL file. XHTML is the new Web standard and should be used for all new Web pages to achieve maximum portability across platforms and browsers.
There are six major differences and a variety of minor differences between XHTML and HTML 4. Let us now go over the major differences. First, XHTML pages and browsers must strictly conform to the standard. No more shoddy Web pages. This property was inherited from XML.
Second, all tags and attributes must be in lower case. Tags like <HTML> are not valid in XHTML. The use of tags like <html> is now mandatory. Similarly, <img SRC="pic001.jpg"> is also forbidden because it contains an upper-case attribute.
Third, closing tags are required, even for </p>. For tags that have no natural closing tag, such as <br>, <hr>, and <img>, a slash must precede the closing ">", for example
<img src="pic001.jpg" />
Fourth, attributes must be contained within quotation marks. For example,
<img src="pic001.jpg" height=500 />
is no longer allowed. The 500 has to be enclosed in quotation marks, just like the name of the JPEG file, even though 500 is just a number.
Fifth, tags must nest properly. In the past, proper nesting was not required as long as the final state achieved was correct. For example,
<center> <b> Vacation Pictures </center> </b>
used to be legal. In XHTML it is not. Tags must be closed in the reverse order in which they were opened.
Sixth, every document must specify its document type. We saw this in Fig. 7-32, for example. For a discussion of all the changes, major and minor, see www.w3.org.
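Several of these rules can be checked mechanically. The sketch below is an illustrative checker, not a conformance test: it relies on a standard XML parser to catch missing closing tags, unquoted attribute values, and improper nesting, and adds a simple test for upper-case tag and attribute names. It does not check the document type declaration or validate against the XHTML definition, and the function name xhtml_problems is hypothetical.

```python
import xml.etree.ElementTree as ET


def xhtml_problems(fragment):
    """Return a list of rule violations found in a markup fragment."""
    try:
        root = ET.fromstring(fragment)
    except ET.ParseError as err:
        # Covers unclosed tags, unquoted attributes, and bad nesting.
        return [f"not well formed: {err}"]
    problems = []
    for element in root.iter():
        if element.tag != element.tag.lower():
            problems.append(f"upper-case tag: {element.tag}")
        for name in element.attrib:
            if name != name.lower():
                problems.append(f"upper-case attribute: {name}")
    return problems


if __name__ == "__main__":
    samples = [
        '<img src="pic001.jpg" />',                        # acceptable
        '<img SRC="pic001.jpg" />',                        # second rule
        '<p> Line one <br> line two </p>',                 # third rule
        '<img src="pic001.jpg" height=500 />',             # fourth rule
        '<center> <b> Vacation Pictures </center> </b>',   # fifth rule
    ]
    for sample in samples:
        print(sample, "->", xhtml_problems(sample) or "ok")
```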
