5.6.2
IP Addresses
Every host and router on the
Internet has an IP address, which encodes its network number and host number.
The combination is unique: in principle, no two machines on the Internet have
the same IP address. All IP addresses are 32 bits long and are used in the Source
address and Destination address fields of IP packets. It is important to note
that an IP address does not actually refer to a host. It really refers to a
network interface, so if a host is on two networks, it must have two IP
addresses. However, in practice, most hosts are on one network and thus have
one IP address.
For several decades, IP addresses
were divided into the five categories listed in Fig. 5-55. This allocation has come to be called classful
addressing. It is no longer used, but references to it in the literature are still
common. We will discuss the replacement of classful addressing shortly.
The class A, B, C, and D formats
allow for up to 128 networks with 16 million hosts each, 16,384 networks with
up to 64K hosts, and 2 million networks (e.g., LANs) with up to 256 hosts each
(although a few of these are special). Also supported is multicast, in which a
datagram is directed to multiple hosts. Addresses beginning with 1111 are
reserved for future use. Over 500,000 networks are now connected to the
Internet, and the number grows every year. Network numbers are managed by a
nonprofit corporation called ICANN (Internet Corporation for Assigned Names and
Numbers) to avoid conflicts. In turn, ICANN has delegated parts of the address
space to various regional authorities, which then dole out IP addresses to ISPs
and other companies.
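The class capacities quoted above follow from the widths of the network and host fields. The sketch below assumes the standard classful layout (7, 14, and 21 network bits for classes A, B, and C after the class prefix); the table name and function are illustrative only.

```python
# Assumed classful layout: (network bits after the class prefix, host bits).
CLASS_LAYOUT = {"A": (7, 24), "B": (14, 16), "C": (21, 8)}


def class_capacity(name: str) -> tuple[int, int]:
    """Return (number of networks, addresses per network) for a class."""
    network_bits, host_bits = CLASS_LAYOUT[name]
    return 2 ** network_bits, 2 ** host_bits


for name in CLASS_LAYOUT:
    networks, addresses = class_capacity(name)
    print(f"class {name}: {networks:,} networks, {addresses:,} addresses each")
```

The counts are raw field sizes; a few values in each class are special and not usable as ordinary addresses.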
Network addresses, which are 32-bit
numbers, are usually written in dotted decimal notation. In this format, each
of the 4 bytes is written in decimal, from 0 to 255. For example, the 32-bit
hexadecimal address C0290614 is written as 192.41.6.20. The lowest IP address
is 0.0.0.0 and the highest is 255.255.255.255.
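As a small illustration of dotted decimal notation, the following sketch converts between a 32-bit value and its dotted form. The function names are hypothetical, chosen only for this example.

```python
def to_dotted_decimal(value: int) -> str:
    """Write a 32-bit address as four decimal bytes, most significant first."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("an IPv4 address must fit in 32 bits")
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


def from_dotted_decimal(text: str) -> int:
    """Parse dotted decimal text back into a 32-bit integer."""
    parts = [int(part) for part in text.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError("expected four decimal bytes in the range 0 to 255")
    value = 0
    for part in parts:
        value = (value << 8) | part
    return value


print(to_dotted_decimal(0xC0290614))  # the hexadecimal example from the text
```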
The values 0 and -1 (all 1s) have
special meanings, as shown in Fig. 5-56. The value 0 means this network or this
host. The value of -1 is used as a broadcast address to mean all hosts on the
indicated network.
The IP address 0.0.0.0 is used by
hosts when they are being booted. IP addresses with 0 as network number refer
to the current network. These addresses allow machines to refer to their own
network without knowing its number (but they have to know its class to know how
many 0s to include). The address consisting of all 1s allows broadcasting on
the local network, typically a LAN. The addresses with a proper network number
and all 1s in the host field allow machines to send broadcast packets to distant
LANs anywhere in the Internet (although many network administrators disable
this feature). Finally, all addresses of the form 127.xx.yy.zz are reserved for
loopback testing. Packets sent to that address are not put out onto the wire;
they are processed locally and treated as incoming packets. This allows packets
to be sent to the local host without the sender knowing its address.
As we have seen, all the hosts in a
network must have the same network number. This property of IP addressing can
cause problems as networks grow. For example, consider a university that
started out with one class B network used by the Computer Science Dept. for the
computers on its Ethernet. A year later, the Electrical Engineering Dept.
wanted to get on the Internet, so they bought a repeater to extend the CS
Ethernet to their building. As time went on, many other departments acquired
computers and the limit of four repeaters per Ethernet was quickly reached. A
different organization was required.
Getting a second network address
would be hard to do since network addresses are scarce and the university
already had enough addresses for over 60,000 hosts. The problem is the rule
that a single class A, B, or C address refers to one network, not to a
collection of LANs. As more and more organizations ran into this situation, a
small change was made to the addressing system to deal with it.
The solution is to allow a network
to be split into several parts for internal use but still act like a single
network to the outside world. A typical campus network nowadays might look like
that of Fig. 5-57, with a main router connected to an ISP
or regional network and numerous Ethernets spread around campus in different
departments. Each of the Ethernets has its own router connected to the main
router (possibly via a backbone LAN, but the nature of the interrouter
connection is not relevant here).
In the Internet literature, the
parts of the network (in this case, Ethernets) are called subnets. This usage
conflicts with ''subnet'' to mean the set of all routers and communication
lines in a network. Hopefully, it will be clear from the context which meaning
is intended. In this section and the next one, the new definition will be the
one used exclusively.
When a packet comes into the main
router, how does it know which subnet (Ethernet) to give it to? One way would
be to have a table with 65,536 entries in the main router telling which router
to use for each host on campus. This idea would work, but it would require a
very large table in the main router and a lot of manual maintenance as hosts
were added, moved, or taken out of service.
Instead, a different scheme was
invented. Basically, instead of having a single class B address with 14 bits
for the network number and 16 bits for the host number, some bits are taken
away from the host number to create a subnet number. For example, if the
university has 35 departments, it could use a 6-bit subnet number and a 10-bit
host number, allowing for up to 64 Ethernets, each with a maximum of 1022 hosts
(0 and -1 are not available, as mentioned earlier). This split could be changed
later if it turns out to be the wrong one.
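The arithmetic behind this split can be checked with a few lines of code. The helper names below are hypothetical; the sketch assumes, as in the text, that the all-0s and all-1s host values are unavailable.

```python
def subnet_bits_needed(subnet_count: int) -> int:
    """Smallest number of bits that can number subnet_count subnets."""
    return max(subnet_count - 1, 0).bit_length()


def subnet_split(host_field_bits: int, subnet_bits: int) -> tuple[int, int]:
    """Return (number of subnets, usable hosts per subnet) for a split."""
    host_bits = host_field_bits - subnet_bits
    subnets = 2 ** subnet_bits
    usable_hosts = 2 ** host_bits - 2  # all 0s and all 1s are reserved
    return subnets, usable_hosts


bits = subnet_bits_needed(35)         # 35 departments
print(bits, subnet_split(16, bits))   # class B has a 16-bit host field
```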
To implement subnetting, the main
router needs a subnet mask that indicates the split between network + subnet
number and host, as shown in Fig. 5-58. Subnet masks are also written in
dotted decimal notation, with the addition of a slash followed by the number of
bits in the network + subnet part. For the example of Fig. 5-58, the subnet mask can be written as
255.255.252.0. An alternative notation is /22 to indicate that the subnet mask
is 22 bits long.
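The two notations for a subnet mask are easy to convert between. The following is a minimal sketch with illustrative function names; it assumes the mask is a contiguous run of 1 bits followed by 0 bits.

```python
def prefix_to_mask(prefix_len: int) -> str:
    """Convert a prefix length such as 22 to a dotted decimal mask."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def mask_to_prefix(mask_text: str) -> int:
    """Convert a dotted decimal mask back to its prefix length."""
    value = int.from_bytes(bytes(int(p) for p in mask_text.split(".")), "big")
    prefix_len = bin(value).count("1")
    # Reject masks whose 1 bits are not contiguous from the left.
    if value != (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF:
        raise ValueError("mask is not a contiguous prefix")
    return prefix_len
```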
Outside the network, the subnetting
is not visible, so allocating a new subnet does not require contacting ICANN or
changing any external databases. In this example, the first subnet might use IP
addresses starting at 130.50.4.1; the second subnet might start at 130.50.8.1;
the third subnet might start at 130.50.12.1; and so on. To see why the subnets
are counting by fours, note that the corresponding binary addresses are as
follows:
Subnet 1: 10000010 00110010 000001|00 00000001
Subnet 2: 10000010 00110010 000010|00 00000001
Subnet 3: 10000010 00110010 000011|00 00000001
Here the vertical bar (|) shows the
boundary between the subnet number and the host number. To its left is the
6-bit subnet number; to its right is the 10-bit host number.
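The counting by fours can be reproduced by placing the subnet number just above the 10-bit host field. The sketch below assumes the 130.50.0.0 network of the example; the function name is hypothetical.

```python
def first_host_of_subnet(network: str, subnet_number: int, host_bits: int = 10) -> str:
    """Address of host 1 on the given subnet of a network."""
    base = int.from_bytes(bytes(int(p) for p in network.split(".")), "big")
    # Shift the subnet number past the host field, then set host number 1.
    value = base | (subnet_number << host_bits) | 1
    return ".".join(str(byte) for byte in value.to_bytes(4, "big"))


for subnet in (1, 2, 3):
    print(subnet, first_host_of_subnet("130.50.0.0", subnet))
```

Because the host field is 10 bits wide, its top 2 bits fall in the third byte, so each increment of the subnet number adds 4 to that byte.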
To see how subnets work, it is
necessary to explain how IP packets are processed at a router. Each router has
a table listing some number of (network, 0) IP addresses and some number of
(this-network, host) IP addresses. The first kind tells how to get to distant
networks. The second kind tells how to get to local hosts. Associated with each
table entry is the network interface to use to reach the destination, and certain
other information.
When an IP packet arrives, its
destination address is looked up in the routing table. If the packet is for a
distant network, it is forwarded to the next router on the interface given in
the table. If it is a local host (e.g., on the router's LAN), it is sent
directly to the destination. If the network is not present, the packet is
forwarded to a default router with more extensive tables. This algorithm means
that each router only has to keep track of other networks and local hosts, not
(network, host) pairs, greatly reducing the size of the routing table.
When subnetting is introduced, the
routing tables are changed, adding entries of the form (this-network, subnet,
0) and (this-network, this-subnet, host). Thus, a router on subnet k knows how
to get to all the other subnets and also how to get to all the hosts on subnet k.
It does not have to know the details about hosts on other subnets. In fact, all
that needs to be changed is to have each router do a Boolean AND with the network's
subnet mask to get rid of the host number and look up the resulting address in
its tables (after determining which network class it is). For example, a packet
addressed to 130.50.15.6 and arriving at the main router is ANDed with the
subnet mask 255.255.252.0/22 to give the address 130.50.12.0. This address is
looked up in the routing tables to find out which output line to use to get to
the router for subnet 3. Subnetting thus reduces router table space by creating
a three-level hierarchy consisting of network, subnet, and host.
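The masking step in this example can be sketched with Python's standard ipaddress module. The function names are illustrative, and the 10-bit host field and 6-bit subnet field are those of the running example.

```python
import ipaddress


def subnet_address(destination: str, mask: str) -> str:
    """AND the destination with the subnet mask to drop the host number."""
    dest = int(ipaddress.IPv4Address(destination))
    mask_value = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(dest & mask_value))


def subnet_number(destination: str, host_bits: int = 10, subnet_bits: int = 6) -> int:
    """Extract the subnet number that sits just above the host field."""
    dest = int(ipaddress.IPv4Address(destination))
    return (dest >> host_bits) & ((1 << subnet_bits) - 1)


print(subnet_address("130.50.15.6", "255.255.252.0"))
print(subnet_number("130.50.15.6"))
```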
IP has been in heavy use for
decades. It has worked extremely well, as demonstrated by the exponential
growth of the Internet. Unfortunately, IP is rapidly becoming a victim of its
own popularity: it is running out of addresses. This looming disaster has
sparked a great deal of discussion and controversy within the Internet
community about what to do about it. In this section we will describe both the
problem and several proposed solutions.
Back in 1987, a few visionaries
predicted that some day the Internet might grow to 100,000 networks. Most
experts pooh-poohed this as being decades in the future, if ever. The 100,000th
network was connected in 1996. The problem, as mentioned above, is that the
Internet is rapidly running out of IP addresses. In principle, over 2 billion
addresses exist, but the practice of organizing the address space by classes
(see Fig. 5-55) wastes millions of them. In
particular, the real villain is the class B network. For most organizations, a
class A network, with 16 million addresses, is too big, and a class C network,
with 256 addresses, is too small. A class B network, with 65,536, is just right.
In Internet folklore, this situation is known as the three bears problem (as in
Goldilocks and the Three Bears).
In reality, a class B address is far
too large for most organizations. Studies have shown that more than half of all
class B networks have fewer than 50 hosts. A class C network would have done
the job, but no doubt every organization that asked for a class B address
thought that one day it would outgrow the 8-bit host field. In retrospect, it
might have been better to have had class C networks use 10 bits instead of
eight for the host number, allowing 1022 hosts per network. Had this been the
case, most organizations would have probably settled for a class C network, and
there would have been half a million of them (versus only 16,384 class B
networks).
It is hard to fault the Internet
designers for not having provided more (and smaller) class B addresses. At the
time the decision was made to create the three classes, the Internet was a
research network connecting the major research universities in the U.S. (plus a
very small number of companies and military sites doing networking research).
No one then perceived the Internet as becoming a mass market communication
system rivaling the telephone network. At the time, someone no doubt said:
''The U.S. has about 2000 colleges and universities. Even if all of them
connect to the Internet and many universities in other countries join, too, we
are never going to hit 16,000 since there are not that many universities in the
whole world. Furthermore, having the host number be an integral number of bytes
speeds up packet processing.''
However, if the split had allocated
20 bits to the class B network number, another problem would have emerged: the
routing table explosion. From the point of view of the routers, the IP address
space is a two-level hierarchy, with network numbers and host numbers. Routers
do not have to know about all the hosts, but they do have to know about all the
networks. If half a million class C networks were in use, every router in the
entire Internet would need a table with half a million entries, one per
network, telling which line to use to get to that network, as well as providing
other information.
The actual physical storage of half
a million entry tables is probably doable, although expensive for critical
routers that keep the tables in static RAM on I/O boards. A more serious
problem is that the complexity of various algorithms relating to management of
the tables grows faster than linear. Worse yet, much of the existing router
software and firmware was designed at a time when the Internet had 1000
connected networks and 10,000 networks seemed decades away. Design choices made
then often are far from optimal now.
In addition, various routing
algorithms require each router to transmit its tables periodically (e.g.,
distance vector protocols). The larger the tables, the more likely it is that
some parts will get lost underway, leading to incomplete data at the other end
and possibly routing instabilities.
The routing table problem could have
been solved by going to a deeper hierarchy. For example, having each IP address
contain a country, state/province, city, network, and host field might work.
Then each router would only need to know how to get to each country, the states
or provinces in its own country, the cities in its state or province, and the
networks in its city. Unfortunately, this solution would require considerably
more than 32 bits for IP addresses and would use addresses inefficiently
(Liechtenstein would have as many bits as the United States).
In short, some solutions solve one
problem but create a new one. The solution that was implemented and that gave
the Internet a bit of extra breathing room is CIDR (Classless InterDomain
Routing). The basic idea behind CIDR, which is described in RFC 1519, is to
allocate the remaining IP addresses in variable-sized blocks, without regard to
the classes. If a site needs, say, 2000 addresses, it is given a block of 2048
addresses on a 2048-byte boundary.
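Under CIDR, a request is rounded up to the next power of two, which fixes the prefix length. A minimal sketch of that calculation follows; the function name is hypothetical.

```python
def cidr_block_for(needed: int) -> tuple[int, int]:
    """Return (block size, prefix length) for a request of `needed` addresses."""
    if needed < 1:
        raise ValueError("at least one address must be requested")
    host_bits = (needed - 1).bit_length()  # bits required to number the block
    return 2 ** host_bits, 32 - host_bits


print(cidr_block_for(2000))
```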
Dropping the classes makes
forwarding more complicated. In the old classful system, forwarding worked like
this. When a packet arrived at a router, a copy of the IP address was shifted
right 28 bits to yield a 4-bit class number. A 16-way branch then sorted
packets into A, B, C, and D (if supported), with eight of the cases for class
A, four of the cases for class B, two of the cases for class C, and one each
for D and E. The code for each class then masked off the 8-, 16-, or 24-bit
network number and right aligned it in a 32-bit word. The network number was
then looked up in the A, B, or C table, usually by indexing for A and B
networks and hashing for C networks. Once the entry was found, the outgoing
line could be looked up and the packet forwarded.
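The classful decoding step described above might look roughly like this in software. It is a sketch rather than actual router code: the 16-way branch is written as a chain of comparisons, and the names are invented for the example.

```python
NETWORK_BITS = {"A": 8, "B": 16, "C": 24}


def address_class(address: int) -> str:
    """Classify a 32-bit address by its top 4 bits."""
    top4 = address >> 28
    if top4 < 8:
        return "A"  # 0xxx: eight of the 16 cases
    if top4 < 12:
        return "B"  # 10xx: four cases
    if top4 < 14:
        return "C"  # 110x: two cases
    if top4 == 14:
        return "D"  # 1110: multicast
    return "E"      # 1111: reserved


def classful_network_number(address: int) -> int:
    """Mask off and right-align the 8-, 16-, or 24-bit network number."""
    cls = address_class(address)
    if cls not in NETWORK_BITS:
        raise ValueError("class D and E addresses have no network number")
    return address >> (32 - NETWORK_BITS[cls])
```

For instance, 192.41.6.20 (hexadecimal C0290614) decodes as class C with network number C02906.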
With CIDR, this simple algorithm no
longer works. Instead, each routing table entry is extended by giving it a
32-bit mask. Thus, there is now a single routing table for all networks
consisting of an array of (IP address, subnet mask, outgoing line) triples.
When a packet comes in, its destination IP address is first extracted. Then
(conceptually) the routing table is scanned entry by entry, masking the
destination address and comparing it to the table entry looking for a match. It
is possible that multiple entries (with different subnet mask lengths) match,
in which case the longest mask is used. Thus, if there is a match for a /20
mask and a /24 mask, the /24 entry is used.
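The conceptual scan can be written down directly. The sketch below is a linear search over (IP address, subnet mask, outgoing line) triples, not the optimized algorithms real routers use; the table entries and line names are hypothetical.

```python
import ipaddress


def longest_prefix_match(table, destination):
    """Return the outgoing line of the matching entry with the longest mask."""
    dest = int(ipaddress.IPv4Address(destination))
    best_length, best_line = -1, None
    for base, mask, line in table:
        base_value = int(ipaddress.IPv4Address(base))
        mask_value = int(ipaddress.IPv4Address(mask))
        if dest & mask_value == base_value:
            length = bin(mask_value).count("1")  # number of 1 bits in the mask
            if length > best_length:
                best_length, best_line = length, line
    return best_line


# Hypothetical table with one /20 entry and one more specific /24 entry.
TABLE = [
    ("194.24.16.0", "255.255.240.0", "line 1"),
    ("194.24.17.0", "255.255.255.0", "line 2"),
]
print(longest_prefix_match(TABLE, "194.24.17.4"))
```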
Complex algorithms have been devised
to speed up the address matching process (Ruiz-Sanchez et al., 2001).
Commercial routers use custom VLSI chips with these algorithms embedded in
hardware.
To make the forwarding algorithm easier
to understand, let us consider an example in which millions of addresses are
available starting at 194.24.0.0. Suppose that Cambridge University needs 2048
addresses and is assigned the addresses 194.24.0.0 through 194.24.7.255, along
with mask 255.255.248.0. Next, Oxford University asks for 4096 addresses. Since
a block of 4096 addresses must lie on a 4096-byte boundary, they cannot be
given addresses starting at 194.24.8.0. Instead, they get 194.24.16.0 through
194.24.31.255 along with subnet mask 255.255.240.0. Now the University of
Edinburgh asks for 1024 addresses and is assigned addresses 194.24.8.0 through
194.24.11.255 and mask 255.255.252.0. These assignments are summarized in Fig. 5-59.
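The boundary rule that forces Oxford's block to start at 194.24.16.0 can be checked numerically. This sketch uses the standard ipaddress module; the function names are made up for illustration.

```python
import ipaddress


def is_aligned(first_address: str, size: int) -> bool:
    """True if a block of `size` addresses may start at first_address."""
    return int(ipaddress.IPv4Address(first_address)) % size == 0


def next_aligned_start(candidate: str, size: int) -> str:
    """Round candidate up to the next multiple of the block size."""
    value = int(ipaddress.IPv4Address(candidate))
    aligned = -(-value // size) * size  # ceiling division, then scale back
    return str(ipaddress.IPv4Address(aligned))


print(is_aligned("194.24.8.0", 4096))
print(next_aligned_start("194.24.8.0", 4096))
```

The gap left at 194.24.8.0 is still aligned for a smaller block, which is why Edinburgh's 1024 addresses can be placed there.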
The routing tables all over the
world are now updated with the three assigned entries. Each entry contains a
base address and a subnet mask. These entries (in binary) are:
   Address                               Mask
C: 11000010 00011000 00000000 00000000   11111111 11111111 11111000 00000000
E: 11000010 00011000 00001000 00000000   11111111 11111111 11111100 00000000
O: 11000010 00011000 00010000 00000000   11111111 11111111 11110000 00000000
Now consider what happens when a packet
comes in addressed to 194.24.17.4, which in binary is represented as the
following 32-bit string:
11000010 00011000 00010001 00000100
First it is Boolean ANDed with the
Cambridge mask to get
11000010 00011000 00010000 00000000
This value does not match the
Cambridge base address, so the original address is next ANDed with the
Edinburgh mask to get
11000010 00011000 00010000 00000000
This value does not match the
Edinburgh base address, so Oxford is tried next, yielding
11000010 00011000 00010000 00000000
This value does match the Oxford
base. If no longer matches are found farther down the table, the Oxford entry
is used and the packet is sent along the line named in it.
Now let us look at these three
universities from the point of view of a router in Omaha, Nebraska, that has
only four outgoing lines: Minneapolis, New York, Dallas, and Denver. When the
router software there gets the three new entries, it notices that it can
combine all three entries into a single aggregate entry 194.24.0.0/19 with a
binary address and submask as follows:
11000010 00011000 00000000 00000000 11111111 11111111 11100000 00000000
This entry sends all packets
destined for any of the three universities to New York. By aggregating the
three entries, the Omaha router has reduced its table size by two entries.
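One way to find such an aggregate is to widen a prefix until it covers every entry. The sketch below does this with the standard ipaddress module; it is an illustration of the idea, not a description of how router software actually performs aggregation, and the function name is hypothetical.

```python
import ipaddress


def covering_prefix(networks):
    """Smallest single prefix that contains all the given networks."""
    nets = [ipaddress.ip_network(n) for n in networks]
    aggregate = nets[0]
    # Shorten the prefix one bit at a time until every network fits inside.
    while not all(net.subnet_of(aggregate) for net in nets):
        aggregate = aggregate.supernet()
    return str(aggregate)


UNIVERSITIES = ["194.24.0.0/21", "194.24.16.0/20", "194.24.8.0/22"]
print(covering_prefix(UNIVERSITIES))
```

Note that the covering prefix also spans the unassigned range starting at 194.24.12.0, so the aggregate is only safe when all the covered addresses are reached over the same line.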
If New York has a single line to
London for all U.K. traffic, it can use an aggregated entry as well. However,
if it has separate lines for London and Edinburgh, then it has to have three
separate entries. Aggregation is heavily used throughout the Internet to reduce
the size of the router tables.
As a final note on this example, the
aggregate route entry in Omaha also sends packets for the unassigned addresses
to New York. As long as the addresses are truly unassigned, this does not
matter because they are not supposed to occur. However, if they are later
assigned to a company in California, an additional entry, 194.24.12.0/22, will
be needed to deal with them.
IP addresses are scarce. An ISP
might have a /16 (formerly class B) address, giving it 65,534 host numbers. If
it has more customers than that, it has a problem. For home customers with
dial-up connections, one way around the problem is to dynamically assign an IP
address to a computer when it calls up and logs in and take the IP address back
when the session ends. In this way, a single /16 address can handle up to
65,534 active users, which is probably good enough for an ISP with several
hundred thousand customers. When the session is terminated, the IP address is
reassigned to another caller. While this strategy works well for an ISP with a
moderate number of home users, it fails for ISPs that primarily serve business
customers.
The problem is that business
customers expect to be on-line continuously during business hours. Both small
businesses, such as three-person travel agencies, and large corporations have
multiple computers connected by a LAN. Some computers are employee PCs; others
may be Web servers. Generally, there is a router on the LAN that is connected
to the ISP by a leased line to provide continuous connectivity. This
arrangement means that each computer must have its own IP address all day long.
In effect, the total number of computers owned by all its business customers
combined cannot exceed the number of IP addresses the ISP has. For a /16
address, this limits the total number of computers to 65,534. For an ISP with
tens of thousands of business customers, this limit will quickly be exceeded.
To make matters worse, more and more
home users are subscribing to ADSL or Internet over cable. Two of the features
of these services are (1) the user gets a permanent IP address and (2) there is
no connect charge (just a monthly flat rate charge), so many ADSL and cable
users just stay logged in permanently. This development just adds to the
shortage of IP addresses. Assigning IP addresses on-the-fly as is done with
dial-up users is of no use because the number of IP addresses in use at any one
instant may be many times the number the ISP owns.
And just to make it a bit more
complicated, many ADSL and cable users have two or more computers at home,
often one for each family member, and they all want to be on-line all the time
using the single IP address their ISP has given them. The solution here is to
connect all the PCs via a LAN and put a router on it. From the ISP's point of
view, the family is now the same as a small business with a handful of
computers. Welcome to Jones, Inc.
The problem of running out of IP
addresses is not a theoretical problem that might occur at some point in the
distant future. It is happening right here and right now. The long-term
solution is for the whole Internet to migrate to IPv6, which has 128-bit
addresses. This transition is slowly occurring, but it will be years before the
process is complete. As a consequence, some people felt that a quick fix was
needed for the short term. This quick fix came in the form of NAT (Network
Address Translation), which is described in RFC 3022 and which we will
summarize below. For additional information, see (Dutcher, 2001).
The basic idea behind NAT is to
assign each company a single IP address (or at most, a small number of them)
for Internet traffic. Within the company, every computer gets a unique IP
address, which is used for routing intramural traffic. However, when a packet
exits the company and goes to the ISP, an address translation takes place. To
make this scheme possible, three ranges of IP addresses have been declared as
private. Companies may use them internally as they wish. The only rule is that
no packets containing these addresses may appear on the Internet itself. The
three reserved ranges are:
10.0.0.0 – 10.255.255.255/8 (16,777,216 hosts)
172.16.0.0 – 172.31.255.255/12 (1,048,576 hosts)
192.168.0.0 – 192.168.255.255/16 (65,536 hosts)
The first range provides for
16,777,216 addresses (except for 0 and -1, as usual) and is the usual choice of
most companies, even if they do not need so many addresses.
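A check for membership in the three private ranges can be sketched with the standard ipaddress module. The constant and function names are illustrative.

```python
import ipaddress

# The three ranges declared private, written as prefixes.
PRIVATE_RANGES = [
    ipaddress.ip_network(prefix)
    for prefix in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_private(address: str) -> bool:
    """True if the address falls in one of the three private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)


for network in PRIVATE_RANGES:
    print(network, network.num_addresses)
```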
The operation of NAT is shown in Fig. 5-60. Within the company premises, every
machine has a unique address of the form 10.x.y.z. However, when a packet
leaves the company premises, it passes through a NAT box that converts the
internal IP source address, 10.0.0.1 in the figure, to the company's true IP
address, 198.60.42.12 in this example. The NAT box is often combined in a
single device with a firewall, which provides security by carefully controlling
what goes into the company and what comes out. It is also possible to integrate
the NAT box into the company's router.
So far we have glossed over one tiny
little detail: when the reply comes back (e.g., from a Web server), it is
naturally addressed to 198.60.42.12, so how does the NAT box know which address
to replace it with? Herein lies the problem with NAT. If there were a spare
field in the IP header, that field could be used to keep track of who the real
sender was, but only 1 bit is still unused. In principle, a new option could be
created to hold the true source address, but doing so would require changing
the IP code on all the machines on the entire Internet to handle the new
option. This is not a promising alternative for a quick fix.
What actually happened is as
follows. The NAT designers observed that most IP packets carry either TCP or
UDP payloads. Both of these have headers containing a source
port and a destination port. Below we will just discuss TCP ports, but exactly
the same story holds for UDP ports. The ports are 16-bit integers that indicate
where the TCP connection begins and ends. These ports provide the field needed
to make NAT work.
When a process wants to establish a
TCP connection with a remote process, it attaches itself to an unused TCP port
on its own machine. This is called the source port and tells the TCP code where
to send incoming packets belonging to this connection. The process also
supplies a destination port to tell who to give the packets to on the remote
side. Ports 0–1023 are reserved for well-known services. For example, port 80
is the port used by Web servers, so remote clients can locate them. Each
outgoing TCP message contains both a source port and a destination port.
Together, these ports serve to identify the processes using the connection on
both ends.
An analogy may make the use of ports
clearer. Imagine a company with a single main telephone number. When people
call the main number, they reach an operator who asks which extension they want
and then puts them through to that extension. The main number is analogous to
the company's IP address and the extensions on both ends are analogous to the
ports. Ports are an extra 16 bits of addressing that identify which process
gets which incoming packet.
Using the Source port field, we can
solve our mapping problem. Whenever an outgoing packet enters the NAT box, the
10.x.y.z source address is replaced by the company's true IP address. In
addition, the TCP Source port field is replaced by an index into the NAT box's
65,536-entry translation table. This table entry contains the original IP
address and the original source port. Finally, both the IP and TCP header
checksums are recomputed and inserted into the packet. It is necessary to
replace the Source port because connections from machines 10.0.0.1 and 10.0.0.2
may both happen to use port 5000, for example, so the Source port alone is not
enough to identify the sending process.
When a packet arrives at the NAT box
from the ISP, the Destination port in the TCP header (which carries the index
that the NAT box wrote into the Source port of the outgoing packet) is extracted
and used as an index into the NAT box's mapping table. From the entry located,
the internal IP address and original TCP port are extracted and inserted into
the packet as its destination address and port. Then both the IP and TCP
checksums are recomputed and inserted into the packet. The packet is then passed
to the company router for normal delivery using the 10.x.y.z address.
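The translation table can be modeled in a few lines. The following is a simplified sketch, not a real NAT implementation: the class and method names are invented, checksum recomputation is omitted, entries never expire, and starting the index at 4096 is an assumption made here so that the reserved low ports are skipped.

```python
class NatBox:
    """Toy model of the NAT mapping between private endpoints and port indexes."""

    def __init__(self, public_ip: str, first_index: int = 4096):
        self.public_ip = public_ip
        self.next_index = first_index  # assumption: skip the reserved low ports
        self.table = {}    # index -> (private IP, original source port)
        self.reverse = {}  # (private IP, original source port) -> index

    def outgoing(self, src_ip: str, src_port: int) -> tuple[str, int]:
        """Rewrite an outgoing packet's source address and source port."""
        key = (src_ip, src_port)
        if key not in self.reverse:
            if self.next_index > 65535:
                raise RuntimeError("translation table is full")
            self.reverse[key] = self.next_index
            self.table[self.next_index] = key
            self.next_index += 1
        return self.public_ip, self.reverse[key]

    def incoming(self, port: int) -> tuple[str, int]:
        """Map the port carried by a reply back to the private endpoint."""
        return self.table[port]


nat = NatBox("198.60.42.12")
first = nat.outgoing("10.0.0.1", 5000)
second = nat.outgoing("10.0.0.2", 5000)  # same port, different machine
print(first, second, nat.incoming(first[1]))
```

Two machines that both chose port 5000 receive different indexes, which is what lets replies be told apart.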
NAT can also be used to alleviate
the IP shortage for ADSL and cable users. When the ISP assigns each user an
address, it uses 10.x.y.z addresses. When packets from user machines exit the
ISP and enter the main Internet, they pass through a NAT box that translates
them to the ISP's true Internet address. On the way back, packets undergo the
reverse mapping. In this respect, to the rest of the Internet, the ISP and its
home ADSL/cable users just look like a big company.
Although this scheme sort of solves
the problem, many people in the IP community regard it as an
abomination-on-the-face-of-the-earth. Briefly summarized, here are some of the
objections. First, NAT violates the architectural model of IP, which states
that every IP address uniquely identifies a single machine worldwide. The whole
software structure of the Internet is built on this fact. With NAT, thousands
of machines may (and do) use address 10.0.0.1.
Second, NAT changes the Internet
from a connectionless network to a kind of connection-oriented network. The
problem is that the NAT box must maintain information (the mapping) for each
connection passing through it. Having the network maintain connection state is
a property of connection-oriented networks, not connectionless ones. If the NAT
box crashes and its mapping table is lost, all its TCP connections are
destroyed. In the absence of NAT, router crashes have no effect on TCP. The
sending process just times out within a few seconds and retransmits all
unacknowledged packets. With NAT, the Internet becomes as vulnerable as a
circuit-switched network.
Third, NAT violates the most
fundamental rule of protocol layering: layer k may not make any assumptions
about what layer k + 1 has put into the payload field. This basic principle is
there to keep the layers independent. If TCP is later upgraded to TCP-2, with a
different header layout (e.g., 32-bit ports), NAT will fail. The whole idea of
layered protocols is to ensure that changes in one layer do not require changes
in other layers. NAT destroys this independence.
Fourth, processes on the Internet
are not required to use TCP or UDP. If a user on machine A decides to use some
new transport protocol to talk to a user on machine B (for example, for a
multimedia application), introduction of a NAT box will cause the application
to fail because the NAT box will not be able to locate the TCP Source port
correctly.
Fifth, some applications insert IP
addresses in the body of the text. The receiver then extracts these addresses
and uses them. Since NAT knows nothing about these addresses, it cannot replace
them, so any attempt to use them on the remote side will fail. FTP, the
standard File Transfer Protocol, works this way and can fail in the presence of
NAT unless special precautions are taken. Similarly, the H.323 Internet
telephony protocol has this property and can fail in the presence of NAT. It
may be possible to patch NAT to work with H.323, but having to patch the code
in the NAT box every time a new application comes along is not a good idea.
Sixth, since the TCP Source port
field is 16 bits, at most 65,536 machines can be mapped onto an IP address. Actually,
the number is slightly less because the first 4096 ports are reserved for
special uses. However, if multiple IP addresses are available, each one can
handle up to 61,440 machines.
These and other problems with NAT
are discussed in RFC 2993. In general, the opponents of NAT say that by fixing
the problem of insufficient IP addresses with a temporary and ugly hack, the
pressure to implement the real solution, that is, the transition to IPv6, is
reduced, and this is a bad thing.