7.1 DNS: The Domain Name System
Although programs theoretically
could refer to hosts, mailboxes, and other resources by their network (e.g.,
IP) addresses, these addresses are hard for people to remember. Also, sending
e-mail to tana@128.111.24.41 means that if Tana's ISP or organization moves the
mail server to a different machine with a different IP address, her e-mail
address has to change. Consequently, ASCII names were introduced to decouple
machine names from machine addresses. In this way, Tana's address might be
something like tana@art.ucsb.edu. Nevertheless, the network itself understands
only numerical addresses, so some mechanism is required to convert the ASCII
strings to network addresses. In the following sections we will study how this
mapping is accomplished in the Internet.
In the early days of the ARPANET, there was
simply a file, hosts.txt, that listed all the hosts and their IP addresses.
Every night, all the hosts would fetch it from the site at which it was
maintained. For a network of a few hundred large timesharing machines, this
approach worked reasonably well.
However, when thousands of
minicomputers and PCs were connected to the net, everyone realized that this
approach could not continue to work forever. For one thing, the size of the
file would become too large. Even more important, host name conflicts
would occur constantly unless names were centrally managed, something
unthinkable in a huge international network due to the load and latency. To
solve these problems, DNS (the Domain Name System) was invented.
The essence of DNS is the invention
of a hierarchical, domain-based naming scheme and a distributed database system
for implementing this naming scheme. It is primarily used for mapping host
names and e-mail destinations to IP addresses but can also be used for other
purposes. DNS is defined in RFCs 1034 and 1035.
Very briefly, the way DNS is used is
as follows. To map a name onto an IP address, an application program calls a
library procedure called the resolver, passing it the name as a parameter. We
saw an example of a resolver, gethostbyname, in Fig. 6-6. The resolver sends a UDP packet to a
local DNS server, which then looks up the name and returns the IP address to
the resolver, which then returns it to the caller. Armed with the IP address,
the program can then establish a TCP connection with the destination or send it
UDP packets.
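
To make the resolver's UDP packet concrete, here is a minimal sketch of how such a query could be assembled by hand. The field layout follows RFC 1035; the function name build_query and the query ID 0x1234 are assumptions chosen for illustration, and nothing is actually sent on the network.

```python
import struct

def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of `name` (RFC 1035 layout)."""
    # Header: ID, flags (0x0100 = recursion desired), one question,
    # and zero answer, authority, and additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: every label is preceded by its length; a zero byte marks the root.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)

packet = build_query("art.ucsb.edu")
print(len(packet), packet.hex())
```

A real resolver would send these bytes to UDP port 53 of the local DNS server and parse the reply. In everyday Python code, the library call socket.gethostbyname plays the role of the resolver and hides all of this.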
Managing a large and constantly
changing set of names is a nontrivial problem. In the postal system, name management
is done by requiring letters to specify (implicitly or explicitly) the country,
state or province, city, and street address of the addressee. By using this
kind of hierarchical addressing, there is no confusion between the Marvin
Anderson on Main St. in White Plains, N.Y. and the Marvin Anderson on Main St.
in Austin, Texas. DNS works the same way.
Conceptually, the Internet is
divided into over 200 top-level domains, where each domain covers many hosts.
Each domain is partitioned into subdomains, and these are further partitioned,
and so on. All these domains can be represented by a tree, as shown in Fig. 7-1. The leaves of the tree represent
domains that have no subdomains (but do contain machines, of course). A leaf
domain may contain a single host, or it may represent a company and contain
thousands of hosts.
The top-level domains come in two
flavors: generic and country. The original generic domains were com
(commercial), edu (educational institutions), gov (the U.S. Federal
Government), int (certain international organizations), mil (the U.S. armed
forces), net (network providers), and org (nonprofit organizations). The
country domains include one entry for every country, as defined in ISO 3166.
In November 2000, ICANN approved
four new, general-purpose, top-level domains, namely, biz (businesses), info
(information), name (people's names), and pro (professions, such as doctors and
lawyers). In addition, three more specialized top-level domains were introduced
at the request of certain industries. These are aero (aerospace industry), coop
(co-operatives), and museum (museums). Other top-level domains will be added in
the future.
As an aside, as the Internet becomes
more commercial, it also becomes more contentious. Take pro, for example. It
was intended for certified professionals. But who is a professional? And
certified by whom? Doctors and lawyers clearly are professionals. But what
about freelance photographers, piano teachers, magicians, plumbers, barbers,
exterminators, tattoo artists, mercenaries, and prostitutes? Are these
occupations professional and thus eligible for pro domains? And if so, who
certifies the individual practitioners?
In general, getting a second-level
domain, such as name-of-company.com, is easy. It merely requires going to a
registrar for the corresponding top-level domain (com in this case) to check if
the desired name is available and not somebody else's trademark. If there are
no problems, the requester pays a small annual fee and gets the name. By now,
virtually every common (English) word has been taken in the com domain. Try
household articles, animals, plants, body parts, etc. Nearly all are taken.
Each domain is named by the path
upward from it to the (unnamed) root. The components are separated by periods
(pronounced "dot"). Thus, the engineering department at Sun Microsystems
might be eng.sun.com., rather than a UNIX-style name such as /com/sun/eng.
Notice that this hierarchical naming means that eng.sun.com. does not conflict
with a potential use of eng in eng.yale.edu., which might be used by the Yale
English department.
Domain names can be either absolute or
relative. An absolute domain name always ends with a period (e.g., eng.sun.com.),
whereas a relative one does not. Relative names have to be interpreted in some
context to uniquely determine their true meaning. In both cases, a named domain
refers to a specific node in the tree and all the nodes under it.
Domain names are case insensitive,
so edu, Edu, and EDU mean the same thing. Component names can be up to 63
characters long, and full path names must not exceed 255 characters.
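
The naming rules above (case insensitivity, the 63-character component limit, the 255-character overall limit, and the trailing period that marks an absolute name) can be sketched as two small helpers. The function names normalize and is_absolute are assumptions for illustration, not part of any DNS library.

```python
def is_absolute(name: str) -> bool:
    """An absolute domain name ends with a period; a relative one does not."""
    return name.endswith(".")

def normalize(name: str) -> str:
    """Lowercase a domain name after checking the length limits."""
    if len(name) > 255:
        raise ValueError("full path name exceeds 255 characters")
    labels = name.rstrip(".").split(".")
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError(f"component {label!r} must be 1 to 63 characters")
    # Case does not matter, so compare names in lowercase.
    return ".".join(labels).lower()

print(normalize("edu") == normalize("Edu") == normalize("EDU"))
print(is_absolute("eng.sun.com."), is_absolute("eng"))
```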
In principle, domains can be
inserted into the tree in two different ways. For example, cs.yale.edu could
equally well be listed under the us country domain as cs.yale.ct.us. In
practice, however, most organizations in the United States are under a generic
domain, and most outside the United States are under the domain of their
country. There is no rule against registering under two top-level domains, but
few organizations except multinationals do it (e.g., sony.com and sony.nl).
Each domain controls how it
allocates the domains under it. For example, Japan has domains ac.jp and co.jp
that mirror edu and com. The Netherlands does not make this distinction and
puts all organizations directly under nl. Thus, all three of the following are
university computer science departments:
- cs.yale.edu (Yale University, in the United States)
- cs.vu.nl (Vrije Universiteit, in The Netherlands)
- cs.keio.ac.jp (Keio University, in Japan)
To create a new domain, permission
is required of the domain in which it will be included. For example, if a VLSI
group is started at Yale and wants to be known as vlsi.cs.yale.edu, it has to
get permission from whoever manages cs.yale.edu. Similarly, if a new university
is chartered, say, the University of Northern South Dakota, it must ask the
manager of the edu domain to assign it unsd.edu. In this way, name conflicts
are avoided and each domain can keep track of all its subdomains. Once a new
domain has been created and registered, it can create subdomains, such as cs.unsd.edu,
without getting permission from anybody higher up the tree.
Naming follows organizational
boundaries, not physical networks. For example, if the computer science and
electrical engineering departments are located in the same building and share
the same LAN, they can nevertheless have distinct domains. Similarly, even if
computer science is split over Babbage Hall and Turing Hall, the hosts in both
buildings will normally belong to the same domain.
Every domain, whether it is a single
host or a top-level domain, can have a set of resource records associated with
it. For a single host, the most common resource record is just its IP address,
but many other kinds of resource records also exist. When a resolver gives a
domain name to DNS, what it gets back are the resource records associated with
that name. Thus, the primary function of DNS is to map domain names onto
resource records.
A resource record is a five-tuple.
Although they are encoded in binary for efficiency, in most expositions,
resource records are presented as ASCII text, one line per resource record. The
format we will use is as follows:
Domain_name Time_to_live Class Type Value
The Domain_name tells the domain to
which this record applies. Normally, many records exist for each domain and
each copy of the database holds information about multiple domains. This field
is thus the primary search key used to satisfy queries. The order of the
records in the database is not significant.
The Time_to_live field gives an indication
of how stable the record is. Information that is highly stable is assigned a
large value, such as 86400 (the number of seconds in 1 day). Information that
is highly volatile is assigned a small value, such as 60 (1 minute). We will
come back to this point later when we have discussed caching.
The third field of every resource
record is the Class. For Internet information, it is always IN. For
non-Internet information, other codes can be used, but in practice, these are
rarely seen.
The Type field tells what kind of
record this is. The most important types are listed in Fig. 7-2.
An SOA record provides the name of
the primary source of information about the name server's zone (described
below), the e-mail address of its administrator, a unique serial number, and
various flags and timeouts.
The most important record type is
the A (Address) record. It holds a 32-bit IP address for some host. Every
Internet host must have at least one IP address so that other machines can
communicate with it. Some hosts have two or more network connections, in which
case they will have one type A resource record per network connection (and thus
per IP address). DNS can be configured to cycle through these, returning the
first record on the first request, the second record on the second request, and
so on.
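
The cycling behavior can be sketched with a rotating queue. This is only an illustration of the idea, not how any particular name server implements it; the class name RoundRobin and the two addresses (taken from the documentation range 192.0.2.0/24) are assumptions.

```python
from collections import deque

class RoundRobin:
    """Hand out the A records of a multihomed host in rotating order."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self) -> str:
        first = self._addresses[0]
        # Move the record just returned to the back of the queue.
        self._addresses.rotate(-1)
        return first

host = RoundRobin(["192.0.2.1", "192.0.2.2"])  # hypothetical addresses
print([host.answer() for _ in range(3)])
```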
The next most important record type
is the MX record. It specifies the name of the host prepared to accept e-mail
for the specified domain. It is used because not every machine is prepared to
accept e-mail. If someone wants to send e-mail to, for example, bill@microsoft.com,
the sending host needs to find a mail server at microsoft.com that is willing
to accept e-mail. The MX record can provide this information.
The NS records specify name servers.
For example, every DNS database normally has an NS record for each of the
top-level domains, so, for example, e-mail can be sent to distant parts of the
naming tree. We will come back to this point later.
CNAME records allow aliases to be
created. For example, a person familiar with Internet naming in general and
wanting to send a message to someone whose login name is paul in the computer
science department at M.I.T. might guess that paul@cs.mit.edu will work.
Actually, this address will not work, because the domain for M.I.T.'s computer
science department is lcs.mit.edu. However, as a service to people who do not
know this, M.I.T. could create a CNAME entry to point people and programs in
the right direction. An entry like this one might do the job:
cs.mit.edu 86400 IN CNAME lcs.mit.edu
Like CNAME, PTR points to another
name. However, unlike CNAME, which is really just a macro definition, PTR is a
regular DNS datatype whose interpretation depends on the context in which it is
found. In practice, it is nearly always used to associate a name with an IP
address to allow lookups of the IP address and return the name of the
corresponding machine. These are called reverse lookups.
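
As a sketch of how a reverse lookup is keyed: for IPv4, the convention (from RFC 1035, not stated in the text above) is to reverse the four octets of the address and append in-addr.arpa; the PTR record stored under that name holds the host name. The helper name reverse_name is an assumption; Python's standard ipaddress module exposes the same string as the reverse_pointer attribute.

```python
import ipaddress

def reverse_name(address: str) -> str:
    """Build the domain name under which the PTR record for an IPv4 address lives."""
    octets = address.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"

print(reverse_name("128.111.24.41"))  # 41.24.111.128.in-addr.arpa
print(ipaddress.ip_address("128.111.24.41").reverse_pointer)
```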
HINFO records allow people to find
out what kind of machine and operating system a domain corresponds to. Finally,
TXT records allow domains to identify themselves in arbitrary ways. Both of
these record types are for user convenience. Neither is required, so programs
cannot count on getting them (and probably cannot deal with them if they do get
them).
Finally, we have the Value field.
This field can be a number, a domain name, or an ASCII string. The semantics
depend on the record type. A short description of the Value fields for each of
the principal record types is given in Fig. 7-2.
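
The five-tuple in its one-line ASCII presentation can be parsed with a few lines of code. The sketch below assumes whitespace-separated fields in the order given above; the names ResourceRecord and parse_record are illustrative, and the Value field is kept as an uninterpreted string because its meaning depends on the record type.

```python
from typing import NamedTuple

class ResourceRecord(NamedTuple):
    domain_name: str
    time_to_live: int   # seconds
    record_class: str   # IN for Internet information
    record_type: str    # e.g. A, MX, NS, CNAME
    value: str          # interpretation depends on record_type

def parse_record(line: str) -> ResourceRecord:
    # Split into at most five fields so that a Value containing spaces stays intact.
    name, ttl, rclass, rtype, value = line.split(None, 4)
    return ResourceRecord(name, int(ttl), rclass, rtype, value)

rr = parse_record("cs.mit.edu 86400 IN CNAME lcs.mit.edu")
print(rr)
```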
For an example of the kind of
information one might find in the DNS database of a domain, see Fig. 7-3. This figure depicts part of a
(semihypothetical) database for the cs.vu.nl domain shown in Fig. 7-1. The database contains seven types of
resource records.
The first noncomment line of Fig. 7-3 gives some basic information about the
domain, which will not concern us further. The next two lines give textual
information about where the domain is located. Then come two entries giving the
first and second places to try to deliver e-mail sent to person@cs.vu.nl. The zephyr
(a specific machine) should be tried first. If that fails, the top should be
tried as the next choice.
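
The ordering of mail servers can be sketched as a sort on the preference number carried in each MX value, lowest number first. The figure is not reproduced here, so the preference values 1 and 2 and the fully qualified host names below are assumptions for illustration, as is the function name delivery_order.

```python
def delivery_order(mx_values):
    """Return mail hosts sorted by ascending MX preference.

    Each value is assumed to look like '<preference> <host>'.
    """
    parsed = []
    for value in mx_values:
        preference, host = value.split()
        parsed.append((int(preference), host))
    return [host for _, host in sorted(parsed)]

# Hypothetical MX values for cs.vu.nl, listed out of order on purpose.
order = delivery_order(["2 top.cs.vu.nl", "1 zephyr.cs.vu.nl"])
print(order)
```

A sending host would try the hosts in this order and move on to the next one only if delivery to the previous one fails.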
After the blank line, added for
readability, come lines telling that the flits is a Sun workstation running
UNIX and giving both of its IP addresses. Then three choices are given for
handling e-mail sent to flits.cs.vu.nl. First choice is naturally the flits
itself, but if it is down, the zephyr and top are the second and third choices.
Next comes an alias, www.cs.vu.nl, so that this address can be used without
designating a specific machine. Creating this alias allows cs.vu.nl to change
its World Wide Web server without invalidating the address people use to get to
it. A similar argument holds for ftp.cs.vu.nl.
The next four lines contain a
typical entry for a workstation, in this case, rowboat.cs.vu.nl. The
information provided contains the IP address, the primary and secondary mail
drops, and information about the machine. Then comes an entry for a non-UNIX
system that is not capable of receiving mail itself, followed by an entry for a
laser printer that is connected to the Internet.
What are not shown (and are not in
this file) are the IP addresses used to look up the top-level domains. These
are needed to look up distant hosts, but since they are not part of the cs.vu.nl
domain, they are not in this file. They are supplied by the root servers, whose
IP addresses are present in a system configuration file and loaded into the DNS
cache when the DNS server is booted. There are about a dozen root servers
spread around the world, and each one knows the IP addresses of all the
top-level domain servers. Thus, if a machine knows the IP address of at least
one root server, it can look up any DNS name.
In theory at least, a single name
server could contain the entire DNS database and respond to all queries about
it. In practice, this server would be so overloaded as to be useless.
Furthermore, if it ever went down, the entire Internet would be crippled.
To avoid the problems associated
with having only a single source of information, the DNS name space is divided
into nonoverlapping zones. One possible way to divide the name space of Fig. 7-1 is shown in Fig. 7-4. Each zone contains some part of the
tree and also contains name servers holding the information about that zone.
Normally, a zone will have one primary name server, which gets its information
from a file on its disk, and one or more secondary name servers, which get
their information from the primary name server. To improve reliability, some
servers for a zone can be located outside the zone.
Where the zone boundaries are placed
within a zone is up to that zone's administrator. This decision is made in
large part based on how many name servers are desired, and where. For example,
in Fig. 7-4, Yale has a server for yale.edu that
handles eng.yale.edu but not cs.yale.edu, which is a separate zone with its own
name servers. Such a decision might be made when a department such as English
does not wish to run its own name server, but a department such as computer
science does. Consequently, cs.yale.edu is a separate zone but eng.yale.edu is
not.
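
One way to picture the zone boundaries is as a longest-suffix match: a name belongs to the most specific zone whose name is a suffix of it. The sketch below assumes a zone list containing edu, yale.edu, and cs.yale.edu, which matches the Yale example but is otherwise hypothetical; zone_for is an illustrative name.

```python
ZONES = ["edu", "yale.edu", "cs.yale.edu"]  # assumed zone cuts

def zone_for(name: str, zones=ZONES) -> str:
    """Return the most specific zone that contains `name`."""
    labels = name.lower().rstrip(".").split(".")
    # Try the longest suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    raise LookupError(f"no zone found for {name}")

print(zone_for("eng.yale.edu"))    # handled by the yale.edu zone
print(zone_for("ai.cs.yale.edu"))  # handled by the separate cs.yale.edu zone
```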
When a resolver has a query about a
domain name, it passes the query to one of the local name servers. If the
domain being sought falls under the jurisdiction of the name server, such as ai.cs.yale.edu
falling under cs.yale.edu, it returns the authoritative resource records. An authoritative
record is one that comes from the authority that manages the record and is thus
always correct. Authoritative records are in contrast to cached records, which
may be out of date.
If, however, the domain is remote
and no information about the requested domain is available locally, the name
server sends a query message to the top-level name server for the domain
requested. To make this process clearer, consider the example of Fig. 7-5. Here, a resolver on flits.cs.vu.nl
wants to know the IP address of the host linda.cs.yale.edu. In step 1, it sends
a query to the local name server, cs.vu.nl. This query contains the domain name
sought, the type (A) and the class (IN).
Let us suppose the local name server
has never had a query for this domain before and knows nothing about it. It may
ask a few other nearby name servers, but if none of them know, it sends a UDP
packet to the server for edu given in its database (see Fig. 7-5), edu-server.net. It is unlikely that
this server knows the address of linda.cs.yale.edu, and probably does not know cs.yale.edu
either, but it must know all of its own children, so it forwards the request to
the name server for yale.edu (step 3). In turn, this one forwards the request
to cs.yale.edu (step 4), which must have the authoritative resource records.
Since each request is from a client to a server, the resource record requested
works its way back in steps 5 through 8.
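
The chain of forwarded requests can be simulated with a toy model in which each server either answers from its own records or passes the query to the server responsible for a matching suffix. This is a sketch only: the class NameServer, its attributes, and the address 192.0.2.7 for linda.cs.yale.edu are assumptions, and no real DNS traffic is involved.

```python
class NameServer:
    """Toy name server that resolves recursively on behalf of the caller."""

    def __init__(self, zone, records=None, referrals=None):
        self.zone = zone
        self.records = records or {}      # name -> IP address (authoritative)
        self.referrals = referrals or {}  # domain suffix -> NameServer to ask

    def resolve(self, name, trace):
        trace.append(self.zone)  # record which servers were visited
        if name in self.records:
            return self.records[name]
        for suffix, server in self.referrals.items():
            if name == suffix or name.endswith("." + suffix):
                # Forward the request; the answer returns along the same path.
                return server.resolve(name, trace)
        raise LookupError(name)

cs_yale = NameServer("cs.yale.edu", records={"linda.cs.yale.edu": "192.0.2.7"})
yale = NameServer("yale.edu", referrals={"cs.yale.edu": cs_yale})
edu = NameServer("edu", referrals={"yale.edu": yale})
local = NameServer("cs.vu.nl", referrals={"edu": edu})

trace = []
address = local.resolve("linda.cs.yale.edu", trace)
print(address, trace)
```

The nested return of each resolve call mirrors the way the resource record works its way back to the originator.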
Once these records get back to the cs.vu.nl
name server, they will be entered into a cache there, in case they are needed
later. However, this information is not authoritative, since changes made at cs.yale.edu
will not be propagated to all the caches in the world that may know about it.
For this reason, cache entries should not live too long. This is the reason
that the Time_to_live field is included in each resource record. It tells
remote name servers how long to cache records. If a certain machine has had the
same IP address for years, it may be safe to cache that information for 1 day.
For more volatile information, it might be safer to purge the records after a
few seconds or a minute.
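
The role of Time_to_live in caching can be sketched as a dictionary whose entries carry an expiry time. The class name TtlCache and the injectable clock are assumptions made so that the example can be run without waiting; a real name server's cache is considerably more involved.

```python
import time

class TtlCache:
    """Cache that forgets each entry once its time to live has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (value, expiry time)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            # The record is stale: purge it and report a miss.
            del self._entries[name]
            return None
        return value

now = [0.0]                                  # fake clock for the demonstration
cache = TtlCache(clock=lambda: now[0])
cache.put("linda.cs.yale.edu", "192.0.2.7", ttl=60)  # hypothetical address
print(cache.get("linda.cs.yale.edu"))        # still fresh
now[0] = 61.0
print(cache.get("linda.cs.yale.edu"))        # expired, so None
```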
It is worth mentioning that the
query method described here is known as a recursive query, since each server
that does not have the requested information goes and finds it somewhere, then
reports back. An alternative form is also possible. In this form, when a query
cannot be satisfied locally, the query fails, but the name of the next server
along the line to try is returned. Some servers do not implement recursive
queries and always return the name of the next server to try.
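
The alternative form, in which each server names the next server to try instead of fetching the answer itself, can be sketched as a loop run by the client. The table SERVERS, the function names, and the address 192.0.2.7 are all assumptions for illustration.

```python
# Toy data: each server has answers it knows and referrals for suffixes it delegates.
SERVERS = {
    "edu": {"answers": {}, "referrals": {"yale.edu": "yale.edu"}},
    "yale.edu": {"answers": {}, "referrals": {"cs.yale.edu": "cs.yale.edu"}},
    "cs.yale.edu": {"answers": {"linda.cs.yale.edu": "192.0.2.7"}, "referrals": {}},
}

def query(server, name):
    """One non-recursive step: return ('answer', ip) or ('referral', next_server)."""
    data = SERVERS[server]
    if name in data["answers"]:
        return "answer", data["answers"][name]
    for suffix, next_server in data["referrals"].items():
        if name.endswith("." + suffix):
            return "referral", next_server
    raise LookupError(name)

def resolve_iteratively(name, start="edu"):
    """Follow referrals from server to server until one returns an answer."""
    server, contacted = start, []
    while True:
        contacted.append(server)
        kind, result = query(server, name)
        if kind == "answer":
            return result, contacted
        server = result  # the referral names the next server to try

print(resolve_iteratively("linda.cs.yale.edu"))
```

Here the client does the work of contacting every server in turn, whereas in the recursive form each server does that work on the client's behalf.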
It is also worth pointing out that
when a DNS client fails to get a response before its timer goes off, it
normally will try another server next time. The assumption here is that the
server is probably down, rather than that the request or reply got lost.
While DNS is extremely important to the correct functioning of the Internet,
all it really does is map symbolic names for machines onto their IP addresses. It does
not help locate people, resources, services, or objects in general. For
locating these things, another directory service has been defined, called LDAP
(Lightweight Directory Access Protocol). It is a simplified version of the OSI
X.500 directory service and is described in RFC 2251. It organizes information
as a tree and allows searches on different components. It can be regarded as a
"white pages" telephone book. We will not discuss it further in this book,
but for more information see (Weltman and Dahbura, 2000).