UNIFORM RESOURCE LOCATORS (URL) AND UNIVERSAL RESOURCE IDENTIFIERS (URI)
Before turning to a description of the Hypertext Transfer Protocol (HTTP), we need to examine two important concepts: the Uniform Resource Locator (URL) and the Universal Resource Identifier (URI).
Uniform Resource Locator
A key concept in the operation of the World-Wide Web (WWW) is that of the Uniform Resource Locator (URL). In the defining documents (RFC 1738, 1808), the URL is characterized as follows:
A Uniform Resource Locator (URL) is a compact representation of the location and access method for a resource available via the Internet. URLs are used to locate resources by providing an abstract identification of the resource location. Having located a resource, a system may perform a variety of operations on the resource, as might be characterized by such words as access, update, replace, and find attributes. In general, only the access method needs to be specified for any URL scheme.
A resource is any object that can be accessed via the Internet, and includes file directories, files, documents, images, audio or video clips, and any other data that may be stored on an Internet-connected computer. The term resource in this context also includes electronic mail addresses, the results of a finger or archie command, USENET newsgroups, and individual messages in a USENET newsgroup.
With the exception of certain dynamic URLs, such as the email address, we can think of a URL as a networked extension of a filename. The URL provides a pointer to any object that is accessible on any machine connected to the Internet. Furthermore, because different objects are accessible in different ways (e.g., via the Web, FTP, or Gopher), the URL also indicates the access method that must be used to retrieve the object.
The general form of a URL is as follows:
    <scheme>:<scheme-specific-part>
The URL consists of the name of the access scheme being used, followed by a colon, and then by an identifier of a resource whose format is specific to the scheme being used.
Although the scheme-specific formats differ, they have a number of points in common, as we will see. In particular, many of the access schemes support the use of hierarchical structures, similar to the hierarchical directory and file structures common to file systems such as UNIX. For the URL, the components of the hierarchy are separated by a "/", similar to the UNIX approach.
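The split described above (scheme name, colon, scheme-specific part, with "/" separating hierarchy levels) can be sketched in a few lines of Python. This is an illustrative sketch, not code from RFC 1738: the helper names `split_url` and `path_components` are hypothetical, and the sketch assumes that hierarchical schemes begin their scheme-specific part with "//<host>".

```python
def split_url(url: str) -> tuple[str, str]:
    """Split a URL into (scheme, scheme-specific part) at the first colon."""
    scheme, sep, rest = url.partition(":")
    if not sep or not scheme:
        raise ValueError(f"no scheme in {url!r}")
    # Scheme names are case-insensitive, so normalize to lowercase.
    return scheme.lower(), rest


def path_components(scheme_specific: str) -> list[str]:
    """Return the '/'-separated hierarchy of a '//<host>/<path>' part."""
    if scheme_specific.startswith("//"):
        # Skip the '//<host>' prefix; what remains is the hierarchical path.
        _, _, path = scheme_specific[2:].partition("/")
    else:
        path = scheme_specific
    return [part for part in path.split("/") if part]


if __name__ == "__main__":
    scheme, rest = split_url("ftp://ftp.example.com/pub/docs/readme.txt")
    print(scheme, path_components(rest))  # ftp ['pub', 'docs', 'readme.txt']
    print(split_url("mailto:user@example.com"))  # ('mailto', 'user@example.com')
```

Note that only the first colon is significant: a mailto URL has no hierarchy at all, so its scheme-specific part is returned untouched.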
RFC 1738 defines URL formats for the following access schemes:
File Transfer Protocol (FTP)
The FTP URL scheme designates files and directories accessible using the FTP protocol. An FTP URL has the following format:
    ftp://<user>:<password>@<host>:<port>/<cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>
After the specification of the host, with an optional user-ID and password and an optional port number, a slash indicates the beginning of the file designation. Each of the <cwd> elements is a directory name or, more precisely, an argument to an FTP CWD (change working directory) command, analogous to the cd command in UNIX. The <name> value, if present, is the name of a file. Finally, the <typecode> value can be used to designate a particular type of file; otherwise, the type defaults in an implementation-dependent way.
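RFC 1738 writes the FTP form as ftp://<user>:<password>@<host>:<port>/<cwd1>/.../<cwdN>/<name>;type=<typecode>. Python's standard `urllib.parse.urlsplit` already exposes `username`, `password`, `hostname`, and `port`, but leaves the `;type=` suffix attached to the path, so the sketch below splits that off by hand. The names `FtpUrl` and `parse_ftp_url` are hypothetical; defaulting to port 21 and reading typecodes as `a` (ASCII), `i` (image), or `d` (directory) follow RFC 1738 rather than this text.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import unquote, urlsplit


@dataclass
class FtpUrl:
    host: str
    port: int
    user: Optional[str] = None
    password: Optional[str] = None
    cwd: list[str] = field(default_factory=list)  # one CWD argument per directory
    name: Optional[str] = None                    # file name, if present
    typecode: Optional[str] = None                # 'a', 'i' or 'd' per RFC 1738


def parse_ftp_url(url: str) -> FtpUrl:
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url!r}")
    # urlsplit keeps ';type=<typecode>' on the path, so split it off here.
    path, _, param = parts.path.partition(";")
    typecode = param[len("type="):] if param.startswith("type=") else None
    segments = [unquote(s) for s in path.lstrip("/").split("/")]
    return FtpUrl(
        host=parts.hostname or "",
        port=parts.port or 21,  # assume FTP's well-known port when none is given
        user=parts.username,
        password=parts.password,
        cwd=segments[:-1],      # every segment but the last is a directory
        name=segments[-1] or None,
        typecode=typecode,
    )


if __name__ == "__main__":
    print(parse_ftp_url("ftp://anonymous:guest@ftp.example.com:2121/pub/rfc/rfc1738.txt;type=a"))
```

A client would then issue one CWD per entry in `cwd` before retrieving `name`, which mirrors the directory-by-directory reading in the paragraph above.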
Hypertext Transfer Protocol (HTTP)
The HTTP URL scheme designates Internet resources accessible using the HTTP protocol and, in particular, designates web sites. In its simplest form, an HTTP URL has the following format:
Gopher Protocol
The Gopher URL scheme designates resources accessible using the Gopher protocol. A Gopher URL takes the form:
Selects the Gopher-accessible telephone directory at M.I.T. This directory is searchable by keyword. A user who accesses this directory can then interactively enter a keyword to initiate a search. Alternatively, the keyword can be part of the URL; for example:
Electronic Mail Address
The mailto URL scheme designates the Internet mailing address of an individual or service. When invoked by a web client, it triggers the creation of an email message to be sent by Internet electronic mail. For example,
USENET News
The news URL scheme designates either a newsgroup or the individual articles of USENET news. For example,
USENET News Using NNTP Access
The NNTP URL scheme is an alternative way of designating news articles, useful for specifying articles from NNTP servers. The general form is
Reference to Interactive Sessions (TELNET)
The TELNET URL scheme designates interactive services accessible by the TELNET protocol. Thus, this URL designates a service rather than a data object.
Wide Area Information Servers (WAIS)
The WAIS URL scheme designates WAIS databases, searches, or individual documents available from a WAIS database. A WAIS URL takes one of the following forms:
    wais://<host>:<port>/<database>
    wais://<host>:<port>/<database>?<search>
    wais://<host>:<port>/<database>/<wtype>/<wpath>
The first form designates a WAIS database. The second form designates a search submitted to a database. The third form designates a particular document within a database, where <wtype> is the WAIS designation of the document type and <wpath> is the WAIS document identifier.
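Because the three WAIS forms differ only in whether a query or two extra path segments are present, a client can tell them apart mechanically. The sketch below assumes the RFC 1738 forms wais://<host>:<port>/<database>, .../<database>?<search>, and .../<database>/<wtype>/<wpath>; the function name `classify_wais_url` and the dictionary keys it returns are hypothetical.

```python
from urllib.parse import unquote, urlsplit


def classify_wais_url(url: str) -> dict:
    """Classify a WAIS URL as a database, a search, or a document reference."""
    parts = urlsplit(url)
    if parts.scheme != "wais":
        raise ValueError(f"not a WAIS URL: {url!r}")
    segments = [unquote(s) for s in parts.path.lstrip("/").split("/")]
    if parts.query:
        # Second form: a search submitted to a database.
        return {"form": "search", "database": segments[0],
                "search": unquote(parts.query)}
    if len(segments) == 1:
        # First form: the database itself.
        return {"form": "database", "database": segments[0]}
    if len(segments) == 3:
        # Third form: a particular document, with its type and document ID.
        database, wtype, wpath = segments
        return {"form": "document", "database": database,
                "wtype": wtype, "wpath": wpath}
    raise ValueError(f"unrecognized WAIS path: {parts.path!r}")


if __name__ == "__main__":
    print(classify_wais_url("wais://wais.example.com/papers?hypertext%20protocols"))
```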
Host-Specific File Names
The file URL scheme differs from other URL schemes in that it does not designate an Internet-accessible object or service. It provides a way of uniquely identifying a directory or file on an Internet-addressable host, but does not designate an access protocol. Thus, it has limited utility in a network context.
Prospero Directory Service
The Prospero URL scheme designates resources that are accessed via the Prospero Directory Service. A Prospero URL takes the form
    prospero://<host>:<port>/<hsoname>;<field>=<value>
where <hsoname> is the host-specific object name in the Prospero protocol. The optional clause <field>=<value> serves to identify a particular target entry.
Universal Resource Identifier
Universal Resource Identifier (URI) is a term for a generic WWW identifier. The URI specification (RFC 1630) defines a syntax for encoding arbitrary naming or addressing schemes, and provides a list of such schemes. The concept of a URI is still evolving, particularly in its details. The URL is a type of URI in which an access protocol is designated and a specific Internet address is provided.
The potential advantage of the URI is that it decouples the name of a resource from its location and even from its access method. A URL designates a specific instance of a resource at a specific location. If there are multiple instances, and that specific instance is unavailable at the time of a request, then the requester must determine an alternative URL and try that. In principle, with a URI, this process could be automated. In practice, documents such as the HTTP specification refer to URIs, but current implementations use only URLs.
HYPERTEXT TRANSFER PROTOCOL (HTTP)
The Hypertext Transfer Protocol (HTTP) is the foundation protocol of the World-Wide Web (WWW) and can be used in any client/server application involving hypertext. The name is somewhat misleading in that HTTP is not a protocol for transferring hypertext; rather, it is a protocol for transmitting information with the efficiency necessary for making hypertext jumps. The data transferred by the protocol can be plain text, hypertext, audio, images, or any Internet-accessible information.
We begin with an overview of HTTP concepts and operation and then look at some of the details. A number of important terms defined in the HTTP specification are summarized in Table 19.11; these will be introduced as the discussion proceeds.
HTTP Overview
HTTP is a transaction-oriented client/server protocol. The most typical use of HTTP is between a web browser and a web server. To provide reliability, HTTP makes use of TCP. Nevertheless, HTTP is a "stateless" protocol: each transaction is treated independently. Accordingly, a typical implementation will create a new TCP connection between client and server for each transaction and then terminate the connection as soon as the transaction completes, although the specification does not dictate this one-to-one relationship between transaction and connection lifetimes.
The stateless nature of HTTP is well suited to its typical application. A normal session of a user with a web browser involves retrieving a sequence of web pages and documents. The sequence is, ideally, performed rapidly, and the various pages and documents may be located on a number of widely distributed servers.
Another important feature of HTTP is that it is flexible in the formats that it can handle. When a client issues a request to a server, it may include a prioritized list of formats that it can handle, and the server replies with the appropriate format. For example, a Lynx browser cannot handle images, so a web server need not transmit any images on web pages. This arrangement prevents the transmission of unnecessary information and provides the basis for extending the set of formats with new standardized and proprietary specifications.
Figure 19.13 illustrates three examples of HTTP operation. The simplest case is one in which a user agent establishes a direct connection with an origin server. The user agent is the client that initiates the request, such as a web browser being run on behalf of an end user. The origin server is the server on which a resource of interest resides; an example is a web server at which a desired web home page resides. For this case, the client opens a TCP connection that is end-to-end between the client and the server. The client then issues an HTTP request. The request consists of a specific command, referred to as a method, a URL, and a MIME-like message containing request parameters, information about the client, and perhaps some additional content information.
When the server receives the request, it attempts to perform the requested action and then returns an HTTP response. The response includes status information, a success/error code, and a MIME-like message containing information about the server, information about the response itself, and possible body content. The TCP connection is then closed.
The middle part of Figure 19.13 shows a case in which there is not an end-to-end TCP connection between the user agent and the origin server. Instead, there are one or more intermediate systems with TCP connections between logically adjacent systems. Each intermediate system acts as a relay, so that a request initiated by the client is relayed through the intermediate systems to the server, and the response from the server is relayed back to the client.
Three forms of intermediate systems are defined in the HTTP specification: proxy, gateway, and tunnel, all of which are illustrated in Figure 19.14.
Proxy
A proxy acts on behalf of other clients and presents requests from other clients to a server. The proxy acts as a server in interacting with a client, and as a client in interacting with a server. There are several scenarios that call for the use of a proxy:
1. Security intermediary. The client and server may be separated by a security intermediary such as a firewall, with the proxy on the client side of the firewall. Typically, the client is part of a network secured by a firewall, and the server is external to the secured network. In this case, the server must authenticate itself to the firewall to set up a connection with the proxy. The proxy accepts responses after they have passed through the firewall.
2. Different versions of HTTP. If the client and server are running different versions of HTTP, then the proxy can implement both versions and perform the required mapping.
In summary, a proxy is a forwarding agent, receiving a request for a URL object, modifying the request, and forwarding that request toward the server identified in the URL.
Gateway
A gateway is a server that appears to the client as if it were an origin server. It acts on behalf of other servers that may not be able to communicate directly with a client. There are several scenarios in which gateways can be used:
1. Security intermediary. The client and server may be separated by a security intermediary such as a firewall, with the gateway on the server side of the firewall. Typically, the server is connected to a network protected by a firewall, with the client external to the network. In this case, the client must authenticate itself to the gateway, which can then pass the request on to the server.
2. Non-HTTP server. Web browsers have built into them the capability to contact servers for protocols other than HTTP, such as FTP and Gopher servers. This capability can also be provided by a gateway. The client makes an HTTP request to a gateway server. The gateway server then contacts the relevant FTP or Gopher server to obtain the desired result. This result is then converted into a form suitable for HTTP and transmitted back to the client.
Tunnel
Unlike the proxy and the gateway, the tunnel performs no operations on HTTP requests and responses. Instead, a tunnel is simply a relay point between two TCP connections, and the HTTP messages are passed unchanged as if there were a single HTTP connection between user agent and origin server. Tunnels are used when there must be an intermediary system between client and server, but it is not necessary for that system to understand the contents of messages. An example is a firewall in which a client or server external to a protected network can establish an authenticated connection, and which can then maintain that connection for purposes of HTTP transactions.
Cache
Returning to Figure 19.13, the lowest portion of the figure shows an example of a cache. A cache is a facility that may store previous requests and responses for handling new requests. If a new request arrives that is the same as a stored request, then the cache can supply the stored response rather than accessing the resource indicated in the URL. The cache can operate on a client or server, or on an intermediate system other than a tunnel. In the figure, intermediary B has cached a request/response transaction, so that a corresponding new request from the client need not travel the entire chain to the origin server, but is handled by B.
Not all transactions can be cached, and a client or server can dictate that a certain transaction may be cached only for a given time limit.
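The caching behavior just described, where a repeated request is answered from a stored response until a time limit runs out, can be illustrated with a toy in-memory cache. This is a simplified sketch rather than how any particular proxy works: the class name `ResponseCache`, keying on (method, URI), and modeling a "must not be cached" transaction as `max_age=None` are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class ResponseCache:
    """Toy cache: stores responses keyed on (method, URI) until they expire."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[tuple[str, str], tuple[float, bytes]] = {}

    def store(self, method: str, uri: str, response: bytes,
              max_age: Optional[float]) -> None:
        if max_age is None:
            return  # transaction marked as not cachable: keep nothing
        expires_at = self._clock() + max_age
        self._entries[(method, uri)] = (expires_at, response)

    def lookup(self, method: str, uri: str) -> Optional[bytes]:
        entry = self._entries.get((method, uri))
        if entry is None:
            return None  # miss: the request must travel on toward the origin server
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[(method, uri)]  # stale entry: treat as a miss
            return None
        return response


if __name__ == "__main__":
    cache = ResponseCache()
    cache.store("GET", "http://www.example.com/", b"<html>...</html>", max_age=60)
    print(cache.lookup("GET", "http://www.example.com/") is not None)  # True
```

The injectable `clock` is only there so that expiry can be exercised without waiting in real time.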
Messages
The best way to describe the functionality of HTTP is to describe the individual elements of the HTTP message. HTTP consists of two types of messages: requests from clients to servers, and responses from servers to clients. The general structure of such messages is shown in Figure 19.15. More formally, using enhanced BNF (Backus-Naur Form) notation (Table 19.12), we have
    HTTP-Message    = Simple-Request | Simple-Response | Full-Request | Full-Response
    Full-Request    = Request-Line
                      *( General-Header | Request-Header | Entity-Header )
                      CRLF
                      [ Entity-Body ]
    Full-Response   = Status-Line
                      *( General-Header | Response-Header | Entity-Header )
                      CRLF
                      [ Entity-Body ]
    Simple-Request  = "GET" SP Request-URI CRLF
    Simple-Response = [ Entity-Body ]
The Simple-Request and Simple-Response messages were defined in HTTP/0.9. The request is a simple GET command with the requested URI; the response is simply a block containing the information identified in the URI. In HTTP/1.1, the use of these simple forms is discouraged because it prevents the client from using content negotiation and the server from identifying the media type of the returned entity.
With full requests and responses, the following fields are used:
Request-line. Identifies the message type and the requested resource.
Status-line. Provides status information about this response.
General-header. Contains fields that are applicable to both request and response messages, but which do not apply to the entity being transferred.
Request-header. Contains information about the request and the client.
Response-header. Contains information about the response.
Entity-header. Contains information about the resource identified by the request and information about the entity body.
Entity-body. The body of the message.
All of the HTTP headers consist of a sequence of fields, following the same generic format as RFC 822 (described in Section 19.3). Each field begins on a new line and consists of the field name followed by a colon and the field value.
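Reading that field format back is a matter of splitting each line at its first colon. The sketch below is illustrative, with a hypothetical function name `parse_header_fields`; it assumes the caller has already stopped at the empty CRLF line that ends the headers, and it also handles RFC 822 "folding", where a line that begins with a space or tab continues the previous field (a detail from RFC 822 itself rather than from this text).

```python
def parse_header_fields(lines: list[str]) -> list[tuple[str, str]]:
    """Parse RFC 822-style 'Name: value' lines into (name, value) pairs."""
    fields: list[tuple[str, str]] = []
    for line in lines:
        if line[:1] in (" ", "\t") and fields:
            # RFC 822 folding: leading whitespace continues the previous field.
            name, value = fields[-1]
            fields[-1] = (name, f"{value} {line.strip()}")
            continue
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"malformed header line: {line!r}")
        # Split at the first colon only, so colons inside the value survive.
        fields.append((name.strip(), value.strip()))
    return fields


if __name__ == "__main__":
    raw = ["Date: Tue, 15 Nov 1994 08:12:31 GMT", "Content-Type: text/html"]
    for name, value in parse_header_fields(raw):
        print(f"{name!r} -> {value!r}")
```

Pairs are returned in a list rather than a dictionary because a field name may legitimately appear more than once in a message.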
Although the basic transaction mechanism is simple, a large number of fields and parameters are defined in HTTP; these are listed in Table 19.13. In the remainder of this section, we look at the general header fields. Succeeding sections describe request headers, response headers, and entities.
General Header Fields
General header fields can be used in both request and response messages. These fields are applicable in both types of messages and contain information that does not directly apply to the entity being transferred. The fields are the following:
Cache-Control. Specifies directives that must be obeyed by any caching mechanisms along the request/response chain; the purpose is to prevent a cache from adversely interfering with this particular request or response.
Connection. Contains a list of keywords and header-field names that apply only to this TCP connection between the sender and the nearest non-tunnel recipient.
Date. Date and time at which the message originated.
Forwarded. Used by gateways and proxies to indicate intermediate steps along a request or response chain. Each gateway or proxy that handles a message may attach a Forwarded field that gives its URI.
Keep-Alive. May be present if the Keep-Alive keyword is present in an incoming Connection field, to provide information to the requester of the persistent connection. This field may indicate a maximum time that the sender will keep the connection open while waiting for the next request or the maximum number of additional requests that will be allowed on the current persistent connection.
MIME-Version. Indicates that the message complies with the indicated version of MIME.
Pragma. Contains implementation-specific directives that may apply to any recipient along the request/response chain.
Upgrade. Used in a request to specify what additional protocols the client supports and would like to use; used in a response to indicate which protocol will be used.
Two of these fields warrant further elaboration: Cache-Control and Connection.
Cache-Control
A Cache-Control field can be attached to either a request or a response. Any caching mechanisms that receive a message with this header must follow the directives in the header, which may mean deviating from the default caching action. This field has the following format:
That is, this field consists of the phrase "Cache-Control:" followed by one or more directives.
A cachable directive is included in a response to indicate that the server generating the response declares it to be cachable. Any caching mechanism that forwards this response may cache it for future use.
A max-age directive is used in a request to inform any caching mechanism en route that it may use a cached response to this message only if it has a cached response that is no older than the age specified. A server may include this directive in a response to inform any caching mechanism en route that it may cache this response for future requests up to the max-age time limit.
A private directive in a response indicates that parts of the response message are intended for a single user and must not be cached except within a non-shared cache controlled by the user agent. If no field names are listed, the entire message is private.
A no-cache directive in a request forces that request to be forwarded to the origin server and not answered by an intermediate cache. This directive allows a client to request an authoritative response or to refresh a suspect cache. The list of field names is not used in a request message. In a response, the no-cache directive indicates that part or all of the message must not be cached for future use.
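A cache that receives this field first has to turn the comma-separated directives into something it can act on. The sketch below handles the four directives described above (cachable, max-age, private, no-cache). The function names are hypothetical, and the assumption that private and no-cache carry their field names as a double-quoted, comma-separated list follows the HTTP/1.1 drafts rather than anything stated here, which is why commas inside quotes are not treated as directive separators.

```python
def split_outside_quotes(value: str) -> list[str]:
    """Split on commas that are not inside a double-quoted string."""
    items, current, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return [item.strip() for item in items if item.strip()]


def parse_cache_control(value: str) -> dict[str, object]:
    """Map each directive to True, an int (max-age), or a list of field names."""
    directives: dict[str, object] = {}
    for item in split_outside_quotes(value):
        name, sep, arg = item.partition("=")
        name = name.strip().lower()
        if not sep:
            directives[name] = True  # e.g. 'cachable', or 'no-cache' for the whole message
        elif name == "max-age":
            directives[name] = int(arg.strip())  # age limit in seconds
        else:
            # 'private' and 'no-cache' may name specific fields in a quoted list.
            names = arg.strip().strip('"')
            directives[name] = [n.strip() for n in names.split(",") if n.strip()]
    return directives


if __name__ == "__main__":
    print(parse_cache_control('max-age=3600, private="Set-Cookie, Authorization"'))
```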
Connection
A Connection field can be attached to either a request or a response. It is used to communicate from one endpoint of a TCP connection to the other endpoint. Thus, this field is not end-to-end at the HTTP level. When an intermediary system receives and forwards a message containing this field, that system must remove the field prior to forwarding.
The body of this field may include one or more field names for fields included in this message. These fields are to be processed by the recipient and not forwarded with the rest of the message. Alternatively, the body may consist of one or more keywords. At present, only the Keep-Alive keyword is defined in version 1.1 of HTTP; this indicates that the sender would like a persistent TCP connection (one that remains open beyond the current transaction).
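The forwarding rule for the Connection field can be sketched as a filter that an intermediary applies to a message's headers before relaying it: drop the Connection field itself and every field it names. The function name `strip_connection_fields` and the list-of-pairs header representation are assumptions for illustration. Because a Keep-Alive keyword appears in the same list, a Keep-Alive field that accompanies it is removed as well, which fits its role as information about this connection only.

```python
def strip_connection_fields(headers: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop Connection and every field it names before forwarding a message."""
    named = set()
    for name, value in headers:
        if name.lower() == "connection":
            # The body lists field names and/or keywords such as Keep-Alive.
            named.update(t.strip().lower() for t in value.split(",") if t.strip())
    return [
        (name, value)
        for name, value in headers
        if name.lower() != "connection" and name.lower() not in named
    ]


if __name__ == "__main__":
    incoming = [
        ("Connection", "Keep-Alive, X-Trace"),
        ("Keep-Alive", "timeout=15"),
        ("X-Trace", "abc"),
        ("Host", "www.example.com"),
    ]
    print(strip_connection_fields(incoming))  # [('Host', 'www.example.com')]
```

Field names are compared case-insensitively, as HTTP header names are.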
Request Messages
A full-request message consists of a request line followed by zero or more general, request, and entity headers, followed by an optional entity body.
Request Methods
A full request message always begins with a Request-Line, which has the following format:
    Request-Line = Method SP Request-URI SP HTTP-Version CRLF
The Method parameter indicates the actual request command, called a method in HTTP. Request-URI is the URI of the requested resource, and HTTP-Version is the version number of HTTP used by the sender.
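The Request-Line grammar, together with the Full-Request structure (header fields, then an empty CRLF line, then an optional body), maps directly onto string building. The sketch below is a minimal illustration under the assumption that headers are given as (name, value) pairs and are pure ASCII; the names `build_full_request` and `parse_request_line` are hypothetical.

```python
CRLF = "\r\n"


def build_full_request(method: str, request_uri: str,
                       headers: list[tuple[str, str]], body: bytes = b"",
                       version: str = "HTTP/1.1") -> bytes:
    """Serialize the Request-Line, header fields, the blank CRLF line, and a body."""
    request_line = f"{method} {request_uri} {version}{CRLF}"
    header_lines = "".join(f"{name}: {value}{CRLF}" for name, value in headers)
    return (request_line + header_lines + CRLF).encode("ascii") + body


def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split 'Method SP Request-URI SP HTTP-Version CRLF' into its three parts."""
    if not line.endswith(CRLF):
        raise ValueError("Request-Line must end with CRLF")
    parts = line[: -len(CRLF)].split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected three SP-separated parts, got {parts!r}")
    method, request_uri, version = parts
    return method, request_uri, version


if __name__ == "__main__":
    print(build_full_request("GET", "/index.html", [("Host", "www.example.com")]))
    # b'GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n'
```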
The following request methods are defined in HTTP/1.1:
OPTIONS. A request for information about the options available for the request/response chain identified by this URI.
GET. A request to retrieve the information identified in the URI and return it in an entity body. A GET is conditional if the If-Modified-Since header field is included, and is partial if a Range header field is included.
HEAD. This request is identical to a GET, except that the server's response must not include an entity body; all of the header fields in the response are the same as if the entity body were present. This enables a client to get information about a resource without transferring the entity body.
POST. A request to accept the attached entity as a new subordinate to the identified URI. The posted entity is subordinate to that URI in the same way that a file is subordinate to a directory containing it, a news article is subordinate to a newsgroup to which it is posted, or a record is subordinate to a database.
PUT. A request to accept the attached entity and store it under the supplied URI. This may be a new resource with a new URI, or a replacement of the contents of an existing resource with an existing URI.
PATCH. Similar to a PUT, except that the entity contains a list of differences from the content of the original resource identified in the URI.
COPY. Requests that the resource identified by the URI in the Request-Line be copied to the location(s) given in the URI-Header field in the Entity-Header of this message.
MOVE. Requests that the resource identified by the URI in the Request-Line be moved to the location(s) given in the URI-Header field in the Entity-Header of this message; equivalent to a COPY followed by a DELETE.
DELETE. Requests that the origin server delete the resource identified by the URI in the Request-Line.
LINK. Establishes one or more link relationships from the resource identified in the Request-Line. The links are defined in the Link field in the Entity-Header.
UNLINK. Removes one or more link relationships from the resource identified in the Request-Line. The links are defined in the Link field in the Entity-Header.
TRACE. Requests that the server return whatever is received as the entity body of the response; this can be used for testing and diagnostic purposes.
WRAPPED. Allows a client to send one or more encapsulated requests. The requests may be encrypted or otherwise processed. The server must unwrap the requests and process them accordingly.
Extension-method. Allows additional methods to be defined without changing the protocol, but these methods cannot be assumed to be recognizable by the recipient.
Request Header Fields
Request header fields function as request modifiers, providing additional information and parameters related to the request. The following fields are defined in HTTP/1.1:
Accept. A list of media types and ranges that are acceptable as a response to this request.
Accept-charset. A list of character sets acceptable for the response.
Accept-encoding. A list of acceptable content encodings for the entity body. Content encodings are primarily used to allow a document to be compressed or encrypted. Typically, the resource is stored in this encoding and decoded only before actual use.
Accept-language. Restricts the set of natural languages that are preferred for the response.
Authorization. Contains a field value, referred to as credentials, used by the client to authenticate itself to the server.
From. The Internet e-mail address for the human user who controls the requesting user agent.
Host. Specifies the Internet host of the resource being requested.
If-modified-since. Used with the GET method. This header includes a date/time parameter; the resource is to be transferred only if it has been modified since the date/time specified. This feature allows for efficient cache update. A caching mechanism can periodically issue GET messages to an origin server, and will receive only a small response message unless an update is needed.
Proxy-authorization. Allows the client to identify itself to a proxy that requires authentication.
Range. For future study. The intent is that, in a GET message, a client can request only a portion of the identified resource.
Referer. The URI of the resource from which the Request-URI was obtained. This enables a server to generate lists of back-links.
Unless. Similar in function to the If-Modified-Since field, with two differences: (1) it is not restricted to the GET method, and (2) comparison is based on any Entity-Header field value rather than a date/time value.
User-agent. Contains information about the user agent originating this request. This is used for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations.
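The If-Modified-Since behavior amounts to a single date comparison on the server side. The sketch below assumes the server knows the resource's last-modification time as a timezone-aware `datetime`, answers with status 304 (Not Modified, the "small response message") or 200 (OK), and treats an unparsable date as an ordinary GET; those two status codes and the fallback come from the HTTP specifications rather than this text. The function name `conditional_get_status` is hypothetical; `email.utils.parsedate_to_datetime` is the standard-library parser for RFC 822-style dates.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 (Not Modified) or 200 (OK) for a possibly conditional GET.

    last_modified is assumed to be timezone-aware.
    """
    if if_modified_since is None:
        return 200  # unconditional GET: send the entity
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable date: fall back to an ordinary GET
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are in GMT
    # Transfer the resource only if it changed after the requester's copy.
    return 304 if last_modified <= since else 200


if __name__ == "__main__":
    modified = datetime(1994, 11, 15, 8, 12, 31, tzinfo=timezone.utc)
    print(conditional_get_status(modified, "Tue, 15 Nov 1994 08:12:31 GMT"))  # 304
```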
Response Messages
A full-response message consists of a status line followed by zero or more general, response, and entity headers, followed by an optional entity body.
Status Codes
A full-response message always begins with a Status-Line, which has the following format:
    Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
The HTTP-Version value is the version number of HTTP used by the sender. The Status-Code is a 3-digit integer that indicates the response to a received request, and the Reason-Phrase provides a short textual explanation of the status code.
HTTP/1.1 defines a rather large number of status codes; these are listed in Table 19.14, together with a brief definition. The codes are organized into the following categories:
Informational. The request has been received and processing continues. No entity body accompanies this response.
Successful. The request was successfully received, understood, and accepted. The information returned in the response message depends on the request method, as follows:
  - GET: The contents of the entity body correspond to the requested resource.
  - HEAD: No entity body is returned.
  - POST: The entity describes or contains the result of the action.
  - TRACE: The entity contains the request message.
  - Other methods: The entity describes the result of the action.
Redirection. Further action is required to complete the request.
Client error. The request contains a syntax error or the request cannot be fulfilled.
Server error. The server failed to fulfill an apparently valid request.
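A client usually decides what to do with a response by parsing the Status-Line and then looking only at the category of the code. The sketch below assumes the standard convention that the first digit of the Status-Code selects the category (1xx Informational, 2xx Successful, 3xx Redirection, 4xx Client error, 5xx Server error), which comes from the HTTP specification rather than from the list above. The names `parse_status_line` and `status_category` are hypothetical.

```python
CATEGORIES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase CRLF'."""
    if not line.endswith("\r\n"):
        raise ValueError("Status-Line must end with CRLF")
    # maxsplit=2 keeps a multi-word Reason-Phrase in one piece.
    version, code, reason = line[:-2].split(" ", 2)
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"Status-Code must be three digits, got {code!r}")
    return version, int(code), reason


def status_category(code: int) -> str:
    """Map a status code to its category by its first digit."""
    category = CATEGORIES.get(code // 100)
    if category is None:
        raise ValueError(f"unknown status code class: {code}")
    return category


if __name__ == "__main__":
    version, code, reason = parse_status_line("HTTP/1.1 404 Not Found\r\n")
    print(code, reason, status_category(code))  # 404 Not Found Client error
```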
Response Header Fields
Response header fields provide additional information related to the response that cannot be placed in the Status-Line. The following fields are defined in HTTP/1.1:
Location. Defines the exact location of the resource identified by the Request-URI.
Proxy-authenticate. Included with a response that has a status code of Proxy Authentication Required. This field contains a "challenge" that indicates the authentication scheme and parameters required.
Public. Lists the non-standard methods supported by this server.
Retry-after. Included with a response that has a status code of Service Unavailable, and indicates how long the service is expected to be unavailable.
Server. Identifies the software product used by the origin server to handle the request.
WWW-authenticate. Included with a response that has a status code of Unauthorized. This field contains a challenge that indicates the authentication scheme and parameters required.
Entities
An entity consists of an entity header and an entity body in a request or response message. An entity may represent a data resource, or it may constitute other information supplied with a request or response.
Entity Header Fields
Entity header fields provide optional information about the entity body or, if no body is present, about the resource identified by the request. The following fields are defined in HTTP/1.1:
Allow. Lists methods supported by the resource identified in the Request-URI. This field must be included with a response that has a status code of Method Not Allowed and may be included in other responses.
Content-encoding. Indicates what content encodings have been applied to the resource. The only encoding currently defined is zip compression.
Content-language. Identifies the natural language(s) of the intended audience of the enclosed entity.
Content-length. The size of the entity body in octets.
Content-MD5. For future study. MD5 refers to the MD5 hash code function, described in Lesson 18.
Content-range. For future study. The intent is that this designation will indicate a portion of the identified resource that is included in this response.
Content-type. Indicates the media type of the entity body.
Content-version. A version tag associated with an evolving entity.
Derived-from. Indicates the version tag of the resource from which this entity was derived before modifications were made by the sender. This field and the Content-Version field can be used to manage multiple updates by a group of users.
Expires. Date/time after which the entity should be considered stale.
Last-modified. Date/time at which the sender believes the resource was last modified.
Link. Defines links to other resources.
Title. A textual title for the entity.
Transfer-encoding. Indicates what type of transformation has been applied to the message body to safely transfer it between the sender and the recipient. The only encoding defined in the standard is chunked. The chunked option defines a procedure for breaking an entity body into labeled chunks that are transmitted separately.
URI-header. Informs the recipient of other URIs by which the resource can be identified.
Extension-header. Allows additional fields to be defined without changing the protocol, but these fields cannot be assumed to be recognizable by the recipient.
Entity Body
An entity body consists of an arbitrary sequence of octets. HTTP is designed to be able to transfer any type of content, including text, binary data, audio, images, and video.
When an entity body is present in a message, the interpretation of the octets in the body is determined by the entity header fields Content-Encoding, Content-Type, and Transfer-Encoding. These define a three-layer, ordered encoding model:
    entity-body := Transfer-Encoding( Content-Encoding( Content-Type( data ) ) )
The data are the contents of a resource identified by a URI. The Content-Type field determines the way in which the data are interpreted. A Content-Encoding may be applied to the data and stored at the URI instead of the data. Finally, on transfer, a Transfer-Encoding may be applied to form the entity body of the message.
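The layering can be made concrete by applying the two encodings in order on the sending side and undoing them in reverse on the receiving side; Content-Type needs no transformation because it only says how to interpret the data. This is a sketch under two stated assumptions: gzip (via Python's `gzip` module) stands in for the compression content encoding, and the chunk layout (hexadecimal size, CRLF, data, CRLF, ending with a zero-size chunk and an empty line) is the chunked coding as specified in HTTP/1.1 itself. The function names are hypothetical, and the decoder ignores any trailer fields.

```python
import gzip


def chunked_encode(body: bytes, chunk_size: int = 8) -> bytes:
    """Transfer-Encoding layer: break the body into size-labeled chunks."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out += f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # zero-size last chunk, then the empty line
    return bytes(out)


def chunked_decode(encoded: bytes) -> bytes:
    """Reassemble the chunks; chunk extensions and trailers are ignored."""
    body = bytearray()
    pos = 0
    while True:
        line_end = encoded.index(b"\r\n", pos)
        size = int(encoded[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += encoded[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


def encode_entity(data: bytes) -> bytes:
    # Innermost layer first: content-encode (gzip assumed), then transfer-encode.
    return chunked_encode(gzip.compress(data))


def decode_entity(entity_body: bytes) -> bytes:
    # Undo the layers in reverse order.
    return gzip.decompress(chunked_decode(entity_body))


if __name__ == "__main__":
    print(chunked_encode(b"abcdefghij", 4))
    # b'4\r\nabcd\r\n4\r\nefgh\r\n2\r\nij\r\n0\r\n\r\n'
```

The point of the ordering is visible in `decode_entity`: a recipient can strip the transfer layer hop by hop while the content encoding stays attached to the resource, as the paragraph above describes.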
Access Authentication
HTTP/1.1 defines a simple challenge-response technique for authentication. This definition does not restrict HTTP clients and servers from using other forms of authentication, but the current standard covers only this simple form.
Two authentication exchanges are defined: one between a client and a server, and one between a client and a proxy. Both types of exchange use a challenge-response mechanism.
The challenge, issued by a server or proxy, is of the form
    challenge    = auth-scheme 1*SP realm *( "," auth-param )
    auth-scheme  = token
    auth-param   = token "=" quoted-string
    realm        = "realm" "=" realm-value
    realm-value  = quoted-string
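The grammar can be checked with a small Python parser. The parser is hypothetical. It approximates token with a character class and assumes that quoted-strings contain no escaped quotes. It sketches the grammar above and is not a complete HTTP header parser.

```python
import re

# Hypothetical parser for one challenge of the form
#   challenge = auth-scheme 1*SP realm *( "," auth-param )
# Simplifying assumptions: no escaped quotes inside quoted-strings, and
# optional whitespace is allowed around each "," separator.
TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
PARAM = re.compile(rf'\s*({TOKEN})="([^"]*)"\s*(?:,|$)')


def parse_challenge(header: str) -> tuple[str, dict[str, str]]:
    scheme_match = re.match(rf"({TOKEN}) +", header)  # auth-scheme 1*SP
    if scheme_match is None:
        raise ValueError("missing auth-scheme")
    scheme, pos = scheme_match.group(1), scheme_match.end()
    params: dict[str, str] = {}
    while pos < len(header):
        param = PARAM.match(header, pos)
        if param is None:
            raise ValueError(f"malformed auth-param at offset {pos}")
        name = param.group(1).lower()
        if not params and name != "realm":
            raise ValueError("the realm must come first")
        params[name] = param.group(2)
        pos = param.end()
    if "realm" not in params:
        raise ValueError("challenge carries no realm")
    return scheme, params


print(parse_challenge('Basic realm="WallyWorld"'))
```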
Auth-scheme is the name of a particular authentication scheme. The realm defines a particular protection space, which is simply a conceptual partition of the resource with its own authentication scheme and authorization database. For example, a resource may define several realms, one for end users and one for network managers. The latter realm may grant more privileges and require a more powerful authentication scheme.
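A protection space can be modeled as a realm name bound to its own scheme and authorization database. In this hypothetical Python sketch, URL path prefixes select the realm. The scheme name Strong is an invented placeholder for the more powerful scheme that the managers' realm would demand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionSpace:
    realm: str
    scheme: str
    authorized_users: frozenset[str]  # stand-in for the realm's authorization database


# Hypothetical layout: one realm for end users, one for network managers.
# "Strong" is a placeholder scheme name, not a real HTTP scheme.
SPACES = {
    "/docs/": ProtectionSpace("users", "Basic", frozenset({"alice", "bob"})),
    "/admin/": ProtectionSpace("managers", "Strong", frozenset({"bob"})),
}


def space_for(path: str) -> ProtectionSpace | None:
    """Return the protection space whose prefix is the longest match for path."""
    matches = [prefix for prefix in SPACES if path.startswith(prefix)]
    return SPACES[max(matches, key=len)] if matches else None


def challenge_for(path: str) -> str | None:
    """Build the challenge a server would send for path, if it is protected."""
    space = space_for(path)
    return None if space is None else f'{space.scheme} realm="{space.realm}"'


def is_authorized(path: str, user: str) -> bool:
    space = space_for(path)
    return space is None or user in space.authorized_users
```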
In response to an authentication challenge, a client must provide credentials. These are of the form
    credentials  = basic-credentials
                 | auth-scheme *( "," auth-param )
Basic credentials are covered below. In the general case, the user agent returns the name of the authentication scheme and a set of parameters required to authenticate itself.
Client-Server Authentication
A user agent that wishes to authenticate itself with a server may do so by including an Authorization field in the request header; an agent may do this when initially sending the request. An alternative, which may be more common, is that a client sends a Request message without an Authorization field and is then required by the server to return an authorization. Figure 19.16 illustrates this scenario, which involves three steps:
1. The client sends a request, such as a GET request, to the server with no Authorization field in the request header.
2. The server returns a response with a status code of Unauthorized (401) in the Status line and a WWW-Authenticate field in the response header. The WWW-Authenticate field consists of a challenge that indicates the type of authentication required and may include other parameters. No entity body is returned.
3. The client repeats the request but includes an Authorization field that contains the authorization data needed by the server.
If authentication succeeds, the server returns a response with some other status code and without a WWW-Authenticate field. If authentication fails, the server can initiate a new authentication sequence by returning a response with a status of Unauthorized and a WWW-Authenticate field containing the (possibly new) challenge. The entity body should explain the reason for the refusal.
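The three-step exchange and the failure case can be simulated in-process with Python. This is a toy with several assumptions. There is no network I/O, and the user table and realm are hypothetical. Credentials use the base64-encoded userid:password form of the basic scheme. Base64 is an encoding, not encryption.

```python
import base64

# Toy, in-process model of the exchange: requests and responses are plain values.
# USERS, the realm name, and the resource text are hypothetical.
USERS = {"alice": "wonderland"}
CHALLENGE = 'Basic realm="users"'

Response = tuple[int, dict[str, str], str]  # (status code, header fields, entity body)


def server(headers: dict[str, str]) -> Response:
    auth = headers.get("Authorization")
    if auth is None:
        # Step 2: Unauthorized (401) with a challenge and no entity body.
        return 401, {"WWW-Authenticate": CHALLENGE}, ""
    user = password = None
    if auth.startswith("Basic "):
        try:
            decoded = base64.b64decode(auth[len("Basic "):], validate=True).decode("utf-8")
            user, _, password = decoded.partition(":")
        except ValueError:  # invalid base64 or invalid UTF-8
            pass
    if user in USERS and USERS[user] == password:
        return 200, {}, "the protected resource"  # success: no WWW-Authenticate field
    # Failure: a new challenge, with an entity body explaining the refusal.
    return 401, {"WWW-Authenticate": CHALLENGE}, "invalid user ID or password"


def client_get(user: str, password: str) -> tuple[int, str]:
    status, headers, body = server({})  # step 1: no Authorization field
    if status == 401 and "WWW-Authenticate" in headers:
        cookie = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        status, headers, body = server({"Authorization": f"Basic {cookie}"})  # step 3
    return status, body
```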
In client-server authentication, any proxy or gateway must be transparent as far as authentication is concerned. That is, the WWW-Authenticate and Authorization fields must be forwarded unmodified, and the response to a request containing an Authorization field must not be cached. This latter requirement ensures that authentication always takes place between client and server, rather than an intermediary simply replaying the server's prior acceptance of authentication.
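The two transparency rules can be sketched as a toy shared cache in Python. The first rule is to forward authentication fields unmodified. The second is never to cache a response to a request that carries Authorization. The origin callable, the cache class, and the choice to store only 200 responses are assumptions made for illustration.

```python
from typing import Callable

Response = tuple[int, dict[str, str], str]  # (status code, header fields, entity body)


class TransparentCache:
    """Toy shared cache that stays transparent to client-server authentication."""

    def __init__(self, origin: Callable[[str, dict[str, str]], Response]):
        self.origin = origin
        self.store: dict[str, Response] = {}

    def get(self, path: str, headers: dict[str, str]) -> Response:
        # Field names are case-insensitive, so compare them in lower case.
        has_auth = any(name.lower() == "authorization" for name in headers)
        if not has_auth and path in self.store:
            return self.store[path]
        # Forward the request, Authorization included, without modification.
        # The response, including any WWW-Authenticate field, is returned as-is.
        response = self.origin(path, dict(headers))
        if not has_auth and response[0] == 200:
            self.store[path] = response  # never store replies to authenticated requests
        return response
```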
Proxy Authentication
A proxy may be configured so that a client must first authenticate itself to the proxy before being granted access to an origin server. The sequence is similar to that described for client-server authentication. In this case, the authentication information is carried in the Proxy-Authorization field in the request header. A client may authenticate itself when first issuing a Request message. Alternatively, a scenario similar to Figure 19.16 occurs:
1. The client sends a request, such as a GET request, to the server with no Proxy-Authorization field in the request header.
2. The proxy does not forward the request, but returns a response with a status code of Proxy Authentication Required (407) in the Status line and a Proxy-Authenticate field in the response header.
3. The client repeats the request but includes a Proxy-Authorization field that contains the authorization data needed by the proxy.
If the request is authenticated, then the proxy may forward the request to a server, but will omit the Proxy-Authorization field. The proxy could also return a cached response.
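A proxy that enforces its own authentication might look like the following Python sketch. The proxy user table, realm name, and forward callable are hypothetical. Header names are assumed to arrive in canonical capitalization, and credentials are assumed to be in Basic form. The 407 status code is the one HTTP/1.1 assigns to Proxy Authentication Required.

```python
from __future__ import annotations

import base64
from typing import Callable

PROXY_USERS = {"carol": "proxy-pass"}  # hypothetical proxy user database
Response = tuple[int, dict[str, str], str]  # (status code, header fields, entity body)


def credentials_ok(value: str | None) -> bool:
    """Check Basic-style credentials against the proxy's own user table."""
    if value is None or not value.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(value[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # invalid base64 or invalid UTF-8
        return False
    user, _, password = decoded.partition(":")
    return PROXY_USERS.get(user) == password


def proxy(headers: dict[str, str], forward: Callable[[dict[str, str]], Response]) -> Response:
    if not credentials_ok(headers.get("Proxy-Authorization")):
        # Step 2: do not forward; challenge with 407 Proxy Authentication Required.
        return 407, {"Proxy-Authenticate": 'Basic realm="proxy"'}, ""
    # Authenticated: forward to the server, omitting the Proxy-Authorization field.
    forwarded = {name: value for name, value in headers.items()
                 if name.lower() != "proxy-authorization"}
    return forward(forwarded)
```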
Basic Authentication Scheme
For the basic authentication scheme, a user agent authenticates itself within a particular realm by supplying a user ID and a password. This is the simplest form of authentication, comparable to logging on to a system. Within HTTP, there is no provision for protecting the user ID or password with encryption, so this method provides minimal security. The form of the credentials for basic authentication is