Transport
Protocols
The transport protocol is the keystone of the whole concept of a computer
communications architecture.
Lower-layer protocols are needed, to be sure, but
they are less important for both pedagogical and design purposes.
For one thing, lower-level protocols are better understood and, on the
whole,
less complex than transport protocols. Also, standards have settled out quite
well
for most kinds of layer 1 to 3 transmission facilities, and there
is a large body
of
experience behind their use.
Viewed
from the other side, upper-level protocols depend heavily on the
transport
protocol. The transport protocol provides the basic end-to-end service of
transferring
data between users and relieves applications and other upper-layer
protocols
from the need to deal with the characteristics of intervening communications
networks
and services.
We
begin by looking at the services that one might expect from a transport
protocol.
Next, we examine the protocol mechanisms required to provide these services.
We
find that most of the complexity relates to connection-oriented services.
As
might be expected, the less the network service provides, the more the
transport
protocol
must do. The remainder of the lesson looks at two widely used transport
protocols:
transmission control protocol (TCP) and user datagram protocol (UDP).
Figure
17.1 highlights the position of these protocols within the TCP/IP protocol
suite.
TRANSPORT
SERVICES
We
begin by looking at the kinds of services that a transport protocol can or
should
provide
to higher-level protocols. Figure 17.2 places the concept of transport services
in
context. In a system, there is a transport entity that provides services to TS
users,
which might be an application process or a session-protocol entity. This local
transport
entity communicates with some remote-transport entity, using the services
of
some lower layer, such as the network layer.
We
have already mentioned that the general service provided by a transport
protocol
is the end-to-end transport of data in a way that shields the TS user from
the
details of the underlying communications systems. To be more specific, we must
consider
the specific services that a transport protocol can provide. The following
categories
of service are useful for describing the transport service:
* Type of service
* Quality of service
* Data transfer
* User interface
* Connection management
* Expedited delivery
* Status reporting
* Security
Type
of Service
Two
basic types of service are possible: connection-oriented and connectionless, or
datagram
service. A connection-oriented service provides for the establishment,
maintenance,
and termination of a logical connection between TS users. This has,
so
far, been the most common type of protocol service available and has a wide
variety
of
applications. The connection-oriented service generally implies that the
service
is
reliable.
The
strengths of the connection-oriented approach are clear. It allows for
connection-
related
features such as flow control, error control, and sequenced delivery.
Connectionless
service, however, is more appropriate in some contexts. At lower
layers
(internet, network), connectionless service is more robust (e.g., see
discussion
in
Section 9.1). In addition, it represents a "least common denominator"
of service
to
be expected at higher layers. Further, even at transport and above, there is
justification
for
a connectionless service. There are instances in which the overhead of
connection
establishment and maintenance is unjustified or even counterproductive.
Some examples follow:
* Inward data collection. Involves the periodic active or passive sampling of data sources, such as sensors, and automatic self-test reports from security equipment or network components. In a real-time monitoring situation, the loss of an occasional data unit would not cause distress, as the next report should arrive shortly.
* Outward data dissemination. Includes broadcast messages to network users, the announcement of a new node or the change of address of a service, and the distribution of real-time clock values.
* Request-response. Applications in which a transaction service is provided by a common server to a number of distributed TS users, and for which a single request-response sequence is typical. Use of the service is regulated at the application level, and lower-level connections are often unnecessary and cumbersome.
* Real-time applications. Such as voice and telemetry, involving a degree of redundancy and/or a real-time transmission requirement; these must not have connection-oriented functions, such as retransmission.
Thus,
there is a place at the transport level for both a connection-oriented and
a
connectionless type of service.
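The distinction shows up directly in the Berkeley sockets API. The sketch below is illustrative rather than taken from the text; it assumes a working loopback interface, and the addresses and payloads are invented. Note that the UDP exchange needs no connection establishment, which is exactly what makes the connectionless service attractive for request-response traffic:

```python
import socket

# Connection-oriented (stream) service: TCP
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Connectionless (datagram) service: UDP -- suited to request-response
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

client.sendto(b"request", server.getsockname())  # no handshake required
data, peer = server.recvfrom(1024)
server.sendto(b"response", peer)
reply, _ = client.recvfrom(1024)

for s in (tcp, client, server):
    s.close()
```

Over the loopback interface no datagrams are lost, so the exchange completes; over a real network, the application itself would have to cope with loss.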
Quality
of
Service
The
transport protocol entity should allow the TS user to specify the quality of
transmission
service to be provided. The transport entity will attempt to optimize
the
use of the underlying link, network, and internet resources to the best of its
ability,
so
as to provide the collective requested services.
Examples of services that might be requested are
* Acceptable error and loss levels
* Desired average and maximum delay
* Desired average and minimum throughput
* Priority levels
Of
course, the transport entity is limited to the inherent capabilities of the
underlying
service. For example, IP does provide a quality-of-service parameter. It
allows
for specification of eight levels of precedence or priority as well as a binary
specification
for normal or low delay, normal or high throughput, and normal or
high
reliability. Thus, the transport entity can "pass the buck" to the
internetwork
entity.
However, the internet protocol entity is itself limited; routers have some
freedom
to schedule items preferentially from buffers, but beyond that are still
dependent
on the underlying transmission facilities. Here is another example: X.25
provides
for throughput class negotiation as an optional user facility. The network
may
alter flow control parameters and the amount of network resources allocated
on
a virtual circuit to achieve desired throughput.
The
transport layer may also resort to other mechanisms to try to satisfy TS
user
requests, such as splitting one transport connection among multiple virtual
circuits
to
enhance throughput.
The TS user of the quality-of-service feature needs to recognize that
* Depending on the nature of the transmission facility, the transport entity will have varying degrees of success in providing a requested grade of service.
* There is bound to be a trade-off among reliability, delay, throughput, and cost of services.
Nevertheless,
certain applications would benefit from, or even require, certain
qualities
of service and, in a hierarchical or layered architecture, the easiest way for
an
application to extract this quality of service from a transmission facility is
to pass
the
request down to the transport protocol.
Examples of applications that might request particular qualities of service are as follows:
* A file transfer protocol might require high throughput. It may also require high reliability to avoid retransmissions at the file transfer level.
* A transaction protocol (e.g., web browser-web server) may require low delay.
* An electronic mail protocol may require multiple priority levels.
One
approach to providing a variety of qualities of service is to include a
quality-of-service
facility within the protocol; we have seen this with IP and will see
that
transport protocols typically follow the same approach. An alternative is to
provide
a different transport protocol for different classes of traffic; this is to
some
extent
the approach taken by the ISO-standard family of transport protocols.
Data Transfer
The
whole purpose, of course, of a transport protocol is to transfer data between
two
transport entities. Both user data and control data must be transferred, either
on
the same channel or separate channels. Full-duplex service must be provided.
Half-duplex
and simplex modes may also be offered to support peculiarities of particular
TS
users.
User Interface
It
is not clear that the exact mechanism of the user interface to the transport
protocol
should
be standardized. Rather, it should be optimized to the station environment.
As examples, a transport entity's services could be invoked by
* Procedure calls.
* Passing of data and parameters to a process through a mailbox.
* Use of direct memory access (DMA) between a host user and a front-end processor containing the transport entity.
A
few characteristics of the interface may be specified, however. For example,
a
mechanism is needed to prevent the TS user from swamping the transport entity
with
data. A similar mechanism is needed to prevent the transport entity from
swamping
a TS user with data. Another aspect of the interface has to do with the
timing
and significance of confirmations. Consider the following: A TS user passes
data
to a transport entity to be delivered to a remote TS user. The local transport
entity
can acknowledge receipt of the data immediately, or it can wait until the
remote
transport entity reports that the data have made it through to the other end.
Perhaps
the most useful interface is one that allows immediate acceptance or rejection
of
requests, with later confirmation of the end-to-end significance.
Connection Management
When
connection-oriented service is provided, the transport entity is responsible
for
establishing and terminating connections. A symmetric connection-establishment
procedure
should be provided, which allows either TS user to initiate connection
establishment.
An asymmetric procedure may also be provided to support simplex
connections.
Connection
termination can be either abrupt or graceful. With an abrupt
termination,
data
in transit may be lost. A graceful termination prevents either side
from
shutting down until all data have been delivered.
Expedited Delivery
A
service similar to that provided by priority classes is the expedited delivery
of
data.
Some data submitted to the transport service may supersede data submitted
previously.
The transport entity will endeavor to have the transmission facility
transfer
the data as rapidly as possible. At the receiving end, the transport entity
will
interrupt the TS user to notify it of the receipt of urgent data. Thus, the
expedited
data
service is in the nature of an interrupt mechanism, and is used to transfer
occasional
urgent data, such as a break character from a terminal or an alarm
condition.
In contrast, a priority service might dedicate resources and adjust parameters
such
that, on average, higher priority data are delivered more quickly.
Status Reporting
A
status reporting service allows the TS user to obtain or be notified of
information
concerning
the condition or attributes of the transport entity or a transport connection.
Examples of status information are
* Performance characteristics of a connection (e.g., throughput, mean delay)
* Addresses (network, transport)
* Class of protocol in use
* Current timer values
* State of protocol "machine" supporting a connection
* Degradation in requested quality of service
Security
The
transport entity may provide a variety of security services. Access control may
be
provided in the form of local verification of sender and remote verification of
receiver.
The transport service may also include encryptionldecryption of data on
demand.
Finally, the transport entity may be capable of routing through secure links
or
nodes if such a service is available from the transmission facility.
PROTOCOL
MECHANISMS
It
is the purpose of this section to make good on our claim that a transport
protocol
may
need to be very complex. For purposes of clarity, we present the transport
protocol
mechanisms
in an evolutionary fashion. We begin with a network service that
makes
life easy for the transport protocol, by guaranteeing the delivery of all
transport
data
units in order, as well as defining the required mechanisms. Then we will
look
at the transport protocol mechanisms required to cope with an unreliable network
service.
Reliable
Sequencing Network Service
In
this case, we assume that the network service will accept messages of arbitrary
length
and will, with virtually 100% reliability, deliver them in sequence to the
destination.
Examples of such networks follow:
* A highly reliable packet-switching network with an X.25 interface
* A frame relay network using the LAPF control protocol
* An IEEE 802.3 LAN using the connection-oriented LLC service
The assumption of a reliable sequencing network service allows the use of a quite simple transport protocol. Four issues need to be addressed:
* Addressing
* Multiplexing
* Flow control
* Connection establishment/termination
Addressing
The
issue concerned with addressing is simply this: A user of a given transport
entity
wishes to either establish a connection with or make a connectionless data
transfer
to a user of some other transport entity. The target user needs to be specified
by
all of the following:
* User identification
* Transport entity identification
* Station address
* Network number
The
transport protocol must be able to derive the information listed above
from
the TS user address. Typically, the user address is specified as station or
port.
The
port variable represents a particular TS user at the specified station;
in OSI, this
is
called a transport service access point (TSAP). Generally, there will be a
single
transport
entity at each station, so a transport entity identification is not needed. If
more
than one transport entity is present, there is usually only one of each type.
In
this
latter case, the address should include a designation of the type of transport
protocol
(e.g., TCP, UDP). In the case of a single network, station identifies an
attached
network device. In the case of an internet, station is a global internet
address.
In TCP, the combination of port and station is referred to as a socket.
Because
routing is not a concern of the transport layer, it simply passes the
station
portion of the address down to the network service. Port is included
in
a transport header, to be used at the destination by the destination transport
protocol.
One
question remains to be addressed: How does the initiating TS user know
the
address of the destination TS user? Two static and two dynamic strategies
suggest
themselves:
1.
The TS user must know the address it wishes to use ahead of time; this is
basically
a
system configuration function. For example, a process may be running
that
is only of concern to a limited number of TS users, such as a process that
collects
statistics on performance. From time to time, a central network management
routine
connects to the process to obtain the statistics. These processes
generally
are not, and should not be, well-known and accessible to all.
2.
Some
commonly used services are assigned "well-known addresses" (for
example,
time sharing and word processing).
3.
A
name server is provided. The TS user requests a service by some generic or
global
name. The request is sent to the name server, which does a directory
lookup
and returns an address. The transport entity then proceeds with the
connection.
This service is useful for commonly used applications that change
location
from time to time. For example, a data entry process may be moved
from
one station to another on a local network in order to balance load.
4.
In
some cases, the target user is to be a process that is spawned at request
time.
The initiating user can send a process request to a well-known address.
The
user at that address is a privileged system process that will spawn the new
process
and return an address. For example, a programmer has developed a
private
application (e.g., a simulation program) that will execute on a remote
mainframe
but be invoked from a local minicomputer. An RJE-type request
can
be issued to a remote job-management process that spawns the simulation
process.
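Strategies 2 and 3 can be sketched with the standard resolver calls, where the DNS plays the name server's role and the system services database supplies the well-known port. This fragment is illustrative and assumes an ordinarily configured host:

```python
import socket

# Strategy 3: a generic name is handed to a name server (here, the
# resolver), which returns the station address.
station = socket.gethostbyname("localhost")

# Strategy 2: a commonly used service has a well-known address (port).
port = socket.getservbyname("http", "tcp")

# In TCP terms, the combination of station and port is a socket.
sock_id = (station, port)
```

The transport entity then proceeds with connection establishment to that socket, exactly as if the TS user had known the address ahead of time.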
Multiplexing
We
now turn to the concept of multiplexing, which was discussed in general terms
in
Section 15.1. With respect to the interface between the transport protocol and
higher-level
protocols, the transport protocol performs a multiplexing/demultiplexing
function.
That is, multiple users employ the same transport protocol, and are
distinguished
by either port numbers or service access points.
The
transport entity may also perform a multiplexing function with respect to
the
network services that it uses. Recall that we defined upward multiplexing as
the
multiplexing
of multiple connections on a single lower-level connection, and downward
multiplexing
as the splitting of a single connection among multiple lower-level
connections.
Consider,
for example, a transport entity making use of an X.25 service. Why
should
the transport entity employ upward multiplexing? There are, after all, 4095
virtual
circuits available. In the typical case, this is more than enough to handle all
active
TS users. However, most X.25 networks base part of their charge on
virtual-circuit connect time, as each virtual circuit consumes some node buffer resources.
time, as each virtual circuit consumes some node buffer resources.
Thus,
if a single virtual circuit provides sufficient throughput for multiple TS
users,
upward
multiplexing is indicated.
On
the other hand, downward multiplexing or splitting might be used to
improve
throughput. For example, each X.25 virtual circuit is restricted to a 3-bit or
7-bit
sequence number. A larger sequence space might be needed for high-speed,
high-delay
networks. Of course, throughput can only be increased so far. If there is
a
single station-node link over which all virtual circuits are multiplexed, the
throughput
of a transport connection cannot exceed the data rate of that link.
Flow
Control
Whereas flow control is a relatively simple mechanism at the link layer, it is a rather complex mechanism at the transport layer, for two main reasons:
* Flow control at the transport level involves the interaction of TS users, transport entities, and the network service.
* The transmission delay between transport entities is generally long compared to actual transmission time, and, what is worse, it is variable.
Figure
17.3 illustrates the first point. TS user A wishes to send data to TS user
B
over a transport connection. We can view the situation as involving four
queues.
A generates data and queues it up to send. A must wait to send that data until
* It has permission from B (peer flow control).
* It has permission from its own transport entity (interface flow control).
As
data flow down from A to transport entity a, a queues the data until it has
permission
to send it on from b and the network service. The data are then handed
to
the network layer for delivery to b. The network service must queue the data
until
it receives permission from b to pass them on. Finally, b must await B's
permission
before
delivering the data to their destination.
To
see the effects of delay, consider the possible interactions depicted in Figure
17.4.
When
a TS
user
wishes to transmit data, it sends these data to its transport
entity
(e.g., using a Send call); this triggers two events. The transport entity
generates
one
or more transport-level protocol data units, which we will call segment^,^
and
passes these on to the network service. It also in some way acknowledges to the
TS
user that it has accepted the data for transmission. At this point, the
transport
entity
can exercise flow control across the user-transport interface by simply
withholding
its
acknowledgment. The transport entity is most likely to do this if the
entity
itself is being held up by a flow control exercised by either the network
service
or
the target transport entity.
In
any case, once the transport entity has accepted the data, it sends out a
segment.
Some
time later, it receives an acknowledgment that the data have been
received
at the remote end. It then sends a confirmation to the sender.
At
the receiving end, a segment arrives at the transport entity, which unwraps
the
data and sends them on (e.g., by an Indication primitive) to the destination TS
user.
When the TS user accepts the data, it issues an acknowledgment (e.g., in the
form
of a Response primitive). The TS user can exercise flow control over the
transport
entity
by withholding its response.
The
target transport entity has two choices regarding acknowledgment. Either
it
can issue an acknowledgment as soon as it has correctly received the segment
(the
usual
practice), or it can wait until it knows that the TS user has correctly
received
the
data before acknowledging; the latter course is the safer, where the
confirmation
is
in fact a confirmation that the destination TS user received the data. In the
former
case, the entity merely confirms that the data made it through to the remote
transport
entity.
With
the discussion above in mind, we can cite two reasons why one transport
entity
would want to restrain the rate of segment transmission over a connection
from
another transport entity:
* The
user of the receiving transport entity cannot keep up with the flow of
data.
* The
receiving transport entity itself cannot keep up with the flow of segments.
How
do such problems manifest themselves? Well, presumably a transport
entity
has a certain amount of buffer space, to which incoming segments are added.
Each
buffered segment is processed (i.e., the transport header is examined) and the
data
are sent to the TS user. Either of the two problems mentioned above will cause
the
buffer to fill up. Thus, the transport entity needs to take steps to stop or
slow
the
flow of segments so as to prevent buffer overflow. This requirement is not so
easy
to fulfill, because of the annoying time gap between sender and receiver. We
return
to this point in a moment. First, we present four ways of coping with the flow
control
requirement. The receiving transport entity can
1. Do nothing.
2. Refuse to accept further segments from the network service.
3. Use a fixed sliding-window protocol.
4. Use a credit scheme.
Alternative
1 means that the segments that overflow the buffer are discarded.
The
sending transport entity, failing to get an acknowledgment, will retransmit.
This
is
a shame, as the advantage of a reliable network is that one never has to
retransmit.
Furthermore,
the effect of this maneuver is to exacerbate the problem! The
sender
has increased its output to include new segments, plus retransmitted old
segments.
The
second alternative is a backpressure mechanism that relies on the network
service
to do the work. When a buffer of a transport entity is full, it refuses
additional
data from the network service. This triggers flow control procedures
within
the network that throttle the network service at the sending end. This service,
in
turn, refuses additional segments from its transport entity. It should be clear
that
this mechanism is clumsy and coarse-grained. For example, if multiple transport
connections
are multiplexed on a single network connection (virtual circuit),
flow
control is exercised only on the aggregate of all transport connections.
The
third alternative is already familiar to you from our discussions of link
layer
protocols. The key ingredients, recall, are
* The use of sequence numbers on data units.
* The use of a window of fixed size.
* The use of acknowledgments to advance the window.
With
a reliable network service, the sliding window technique would actually
work
quite well. For example, consider a protocol with a window size of 7. Whenever
the
sender receives an acknowledgment to a particular segment, it is automatically
authorized
to send the succeeding seven segments. (Of course, some may
already
have been sent.) Now, when the receiver's buffer capacity gets down to
seven
segments, it can withhold acknowledgment of incoming segments to avoid
overflow.
The sending transport entity can send, at most, seven additional segments
and
then must stop. Because the underlying network service is reliable, the sender
will
not time-out and retransmit. Thus, at some point, a sending transport entity
may
have a number of segments outstanding, for which no acknowledgment has
been
received. Because we are dealing with a reliable network, the sending transport
entity
can assume that the segments will get through and that the lack of
acknowledgment
is a flow control tactic. Such a strategy would not work well in an
unreliable
network, as the sending transport entity would not know whether the
lack
of acknowledgment is due to flow control or a lost segment.
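The window-of-7 behavior just described can be sketched as follows. This is a toy model, not a real protocol implementation; class and method names are invented, and sequence numbers are 3-bit (modulo 8) as in the example:

```python
class SlidingWindowSender:
    """Fixed window: at most WINDOW unacknowledged segments outstanding."""
    WINDOW = 7
    MOD = 8

    def __init__(self):
        self.next_seq = 0        # next sequence number to assign
        self.outstanding = 0     # segments sent but not yet acknowledged

    def can_send(self):
        return self.outstanding < self.WINDOW

    def send(self):
        seq = self.next_seq
        self.next_seq = (self.next_seq + 1) % self.MOD
        self.outstanding += 1
        return seq

    def ack(self, count):
        # With a reliable network the sender never times out, so a
        # withheld acknowledgment acts purely as flow control.
        self.outstanding -= count

sender = SlidingWindowSender()
sent = [sender.send() for _ in range(7)]   # window now fully used
blocked = not sender.can_send()            # sender must stop and wait
sender.ack(3)                              # cumulative ack reopens 3 slots
```

The model also shows why this fails over an unreliable network: nothing in it distinguishes a withheld acknowledgment from a lost one.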
The
fourth alternative, a credit scheme, provides the receiver with a greater
degree
of control over data flow. Although it is not strictly necessary with a
reliable
network
service, a credit scheme should result in a smoother traffic flow; further, it
is
a more effective scheme with an unreliable network service, as we shall see.
The
credit scheme decouples acknowledgment from flow control. In fixed
sliding-window
protocols, such as X.25 and HDLC, the two are synonymous. In a
credit
scheme, a segment may be acknowledged without granting new credit, and
vice
versa. Figure 17.5 illustrates the protocol (compare Figure 6.4). For
simplicity,
we
show a data flow in one direction only. In this example, data segments are
numbered
sequentially
modulo 8 (e.g., SN 0 = segment
with sequence number 0). Initially,
through
the connection-establishment process, the sending and receiving
sequence
numbers are synchronized, and A is granted a credit allocation of 7. A
advances
the trailing edge of its window each time that it transmits, and advances
the
leading edge only when it is granted credit.
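The decoupling in Figure 17.5 can be sketched as a toy sender model (names invented for illustration): the trailing edge of the window advances on each transmission, while the leading edge advances only when credit is granted, and an acknowledgment may carry zero new credit:

```python
class CreditSender:
    def __init__(self, credit):
        self.next_seq = 0       # sequence numbers run modulo 8, as in the figure
        self.credit = credit    # unused credit granted by the receiver

    def send(self):
        assert self.credit > 0, "window closed: no credit left"
        seq = self.next_seq % 8
        self.next_seq += 1
        self.credit -= 1        # trailing edge advances on each transmission
        return seq

    def ack_with_credit(self, new_credit):
        # Leading edge advances only when credit is granted; an
        # acknowledgment with new_credit=0 acknowledges without opening
        # the window at all.
        self.credit += new_credit

a = CreditSender(credit=7)      # granted at connection establishment
for _ in range(4):
    a.send()
a.ack_with_credit(0)            # pure acknowledgment: window stays shut
a.ack_with_credit(2)            # credit grant reopens part of the window
```

Contrast this with X.25 or HDLC, where every acknowledgment necessarily slides the window forward.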
Figure
17.6 shows the view of this mechanism from the sending and receiving
sides;
of course, both sides take both views because data may be exchanged in both
directions.
From the sending point of view, sequence numbers fall into four regions:
* Data sent and acknowledged. Beginning with the initial sequence number used on this connection through the last acknowledged number.
* Data sent but not yet acknowledged. Represents data that have already been transmitted, with the sender now awaiting acknowledgment.
* Permitted data transmission. The window of allowable transmissions, based on unused credit allocated from the other side.
* Unused and unusable numbers. Numbers above the window.
From
the receiving point of view, the concern is for received data and for the
window
of credit that has been allocated. Note that the receiver is not required to
immediately
acknowledge incoming segments, but may wait and issue a cumulative
acknowledgment
for a number of segments; this is true for both TCP and the ISO
transport
protocol.
In
both the credit allocation scheme and the sliding window scheme, the
receiver
needs to adopt some policy concerning the amount of data it permits the
sender
to transmit. The conservative approach is to only allow new segments up to
the
limit of available buffer space. If this policy were in effect in Figure 17.5,
the first
credit
message implies that B has five free buffer slots, and the second message that
B
has seven free slots.
A conservative flow control scheme may limit the
throughput of the transport
connection
in long-delay situations. The receiver could potentially increase
throughput
by optimistically granting credit for space it does not have. For example,
if
a receiver's buffer is full but it anticipates that it can release space for
two segments
within
a round-trip propagation time, it could immediately send a credit of 2.
If
the receiver can keep up with the sender, this scheme may increase throughput
and
can do no harm. If the sender is faster than the receiver, however, some
segments
may
be discarded, necessitating a retransmission. Because retransmissions
are
not otherwise necessary with a reliable network service, an optimistic flow
control
scheme
will complicate the protocol.
Connection
Establishment and Termination
Even
with a reliable network service, there is a need for connection establishment
and
termination procedures to support connection-oriented service. Connection
establishment
serves three main purposes:
* It allows each end to assure that the other exists.
* It allows negotiation of optional parameters (e.g., maximum segment size, maximum window size, quality of service).
* It triggers allocation of transport entity resources (e.g., buffer space, entry in connection table).
Connection
establishment is by mutual agreement and can be accomplished
by
a simple set of user commands and control segments, as shown in the state
diagram
of
Figure 17.7. To begin, a TS user is in an CLOSED state (i.e., it has no open
transport
connection). The TS user can signal that it will passively wait for a request
with
a Passive Open command. A server program, such as time sharing or a file
transfer
application, might do this. The TS user may change its mind by sending a
Close
command. After the Passive Open command is issued, the transport entity
creates
a connection object of some sort (i.e., a table entry) that is in the LISTEN
state.
From
the CLOSED state, the TS user may open a connection by issuing an
Active
Open command, which instructs the transport entity to attempt connection
establishment
with a designated user, which then triggers the transport entity to
send
a SYN (for synchronize) segment. This segment is carried to the receiving
transport
entity and interpreted as a request for connection to a particular port. If
the
destination transport entity is in the LISTEN state for that port, then a
connection
is
established through the following actions by the receiving transport entity:
* Signal the TS user that a connection is open.
* Send a SYN as confirmation to the remote transport entity.
* Put the connection object in an ESTAB (established) state.
When
the responding SYN is received by the initiating transport entity, it too
can
move the connection to an ESTAB state. The connection is prematurely
aborted
if either TS user issues a Close command.
Figure
17.8 shows the robustness of this protocol. Either side can initiate a
connection.
Further, if both sides initiate the connection at about the same time, it
is
established without confusion; this is because the SYN segment functions both
as
a
connection request and a connection acknowledgment.
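The simultaneous-open case can be traced through a toy state machine. The states (CLOSED, LISTEN, SYN SENT, ESTAB) come from Figure 17.7; the event names are invented for illustration:

```python
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open"):  "SYN SENT",  # entity emits a SYN segment
    ("LISTEN", "recv_syn"):     "ESTAB",     # responder confirms with a SYN
    ("SYN SENT", "recv_syn"):   "ESTAB",     # SYN doubles as acknowledgment
    ("LISTEN", "close"):        "CLOSED",
    ("SYN SENT", "close"):      "CLOSED",    # premature abort
}

def step(state, event):
    return TRANSITIONS[(state, event)]

# Both sides issue Active Open at about the same time; because each SYN
# is also a connection acknowledgment, no confusion results.
a = step("CLOSED", "active_open")
b = step("CLOSED", "active_open")
a = step(a, "recv_syn")
b = step(b, "recv_syn")
```

Both sides finish in ESTAB, which is the robustness property Figure 17.8 illustrates.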
The
reader may ask what happens if a SYN comes in while the requested TS
user
is idle (not listening). Three courses may be followed:
* The transport entity can reject the request by sending an RST (reset) segment back to the other transport entity.
* The request can be queued until a matching Open is issued by the TS user.
* The transport entity can interrupt or otherwise signal the TS user to notify it of a pending request.
Note
that if the latter mechanism is used, a Passive Open command is not
strictly
necessary, but may be replaced by an Accept command, which is a signal
from
the user to the transport entity that it accepts the request for connection.
Connection
termination is handled similarly. Either side, or both sides, may
initiate
a close. The connection is closed by mutual agreement. This strategy allows
for
either abrupt or graceful termination. To achieve the latter, a connection in
the
CLOSE
WAIT state must continue to accept data segments until a FIN (finish) segment
is
received.
Similarly,
the diagram defines the procedure for graceful termination. First,
consider
the side that initiates the termination procedure:
1.
In response to a TS user's Close primitive, a FIN segment is sent to the other
side
of the connection, requesting termination.
2.
Having
sent the FIN, the transport entity places the connection in the FIN
WAIT
state. In this state, the transport entity must continue to accept data
from
the other side and deliver that data to its user.
3.
When
a FIN is received in response, the transport entity informs its user and
closes
the connection.
From the point of view of the side that does not initiate a termination:
1. When a FIN segment is received, the transport entity informs its user of the termination request and places the connection in the CLOSE WAIT state. In this state, the transport entity must continue to accept data from its user and transmit it in data segments to the other side.
2. When the user issues a Close primitive, the transport entity sends a responding FIN segment to the other side and closes the connection.
This procedure ensures that both sides have received all outstanding data and that both sides agree to connection termination before actual termination.
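The two-sided procedure just described can be sketched as a tiny state machine. This is an illustrative sketch, not a real transport implementation; the class and method names are assumptions, and the state names (FIN_WAIT, CLOSE_WAIT) follow the diagram's states.

```python
# Minimal sketch of the graceful-close procedure described above.
# All names here are illustrative, not a real TCP implementation.

class TransportEntity:
    def __init__(self):
        self.state = "ESTABLISHED"

    def user_close(self):
        """TS user issues Close: send a FIN and enter FIN WAIT (initiating
        side), or send the responding FIN and finish (non-initiating side)."""
        if self.state == "ESTABLISHED":
            self.state = "FIN_WAIT"      # FIN sent; still accepting peer data
            return "FIN"
        if self.state == "CLOSE_WAIT":
            self.state = "CLOSED"        # responding FIN closes the connection
            return "FIN"
        return None

    def receive_fin(self):
        """Peer's FIN arrives."""
        if self.state == "ESTABLISHED":
            self.state = "CLOSE_WAIT"    # inform user; keep sending own data
        elif self.state == "FIN_WAIT":
            self.state = "CLOSED"        # our FIN was already sent; done

# A initiates the close; B responds:
a, b = TransportEntity(), TransportEntity()
a.user_close()       # A: ESTABLISHED -> FIN_WAIT, FIN sent
b.receive_fin()      # B: ESTABLISHED -> CLOSE_WAIT
b.user_close()       # B: CLOSE_WAIT -> CLOSED, responding FIN
a.receive_fin()      # A: FIN_WAIT -> CLOSED
print(a.state, b.state)   # CLOSED CLOSED
```

Note how each entity keeps accepting data in FIN WAIT and CLOSE WAIT; only the exchange of both FINs closes the connection on both sides.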
Unreliable Network Service
The most difficult case for a transport protocol is that of an unreliable network service. Examples of such networks are
• An internetwork using IP
• A frame relay network using only the LAPF core protocol
• An IEEE 802.3 LAN using the unacknowledged connectionless LLC service
The problem is not just that segments are occasionally lost, but that segments may arrive out of sequence due to variable transit delays. As we shall see, elaborate machinery is required to cope with these two interrelated network deficiencies. We shall also see that a discouraging pattern emerges: the combination of unreliability and nonsequencing creates problems with every mechanism we have discussed so far. Generally, the solution to each problem raises new problems, and although there are problems to be overcome for protocols at all levels, it seems that there are more difficulties with a reliable connection-oriented transport protocol than any other sort of protocol.
Seven issues need to be addressed:
• Ordered delivery
• Retransmission strategy
• Duplicate detection
• Flow control
• Connection establishment
• Connection termination
• Crash recovery
Ordered Delivery
With an unreliable network service, it is possible that segments, even if they are all delivered, may arrive out of order. The required solution to this problem is to number segments sequentially. We have seen that for data link control protocols, such as HDLC, and for X.25, each data unit (frame, packet) is numbered sequentially, with each successive sequence number being one more than the previous sequence number; this scheme is used in some transport protocols, such as the ISO transport protocols. However, TCP uses a somewhat different scheme in which each data octet that is transmitted is implicitly numbered. Thus, the first segment may have a sequence number of 0. If that segment has 1000 octets of data, then the second segment would have the sequence number 1000, and so on. For simplicity in the discussions of this section, we will assume that each successive segment's sequence number is one more than that of the previous segment.
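The octet-numbering scheme just described can be sketched in a few lines. The function name is illustrative, and TCP's 32-bit sequence-number wraparound is ignored here.

```python
# Sketch of TCP-style implicit octet numbering: each segment's sequence
# number is the number of the first data octet it carries.

def sequence_numbers(initial_seq, payload_lengths):
    """Return the sequence number of each successive segment."""
    seqs = []
    seq = initial_seq
    for length in payload_lengths:
        seqs.append(seq)
        seq += length          # next segment starts right after this payload
    return seqs

# First segment numbered 0 with 1000 octets of data, as in the text:
print(sequence_numbers(0, [1000, 1000, 500]))   # [0, 1000, 2000]
```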
Retransmission Strategy
Two events necessitate the retransmission of a segment. First, the segment may be damaged in transit but, nevertheless, could arrive at its destination. If a frame check sequence is included with the segment, the receiving transport entity can detect the error and discard the segment. The second contingency is that a segment fails to arrive. In either case, the sending transport entity does not know that the segment transmission was unsuccessful. To cover this contingency, we require that a positive acknowledgment (ACK) scheme be used: The receiver must acknowledge each successfully received segment. For efficiency, we do not require one ACK per segment. Rather, a cumulative acknowledgment can be used, as we have seen many times in this lesson. Thus, the receiver may receive segments numbered 1, 2, and 3, but send back only ACK 4. The sender must interpret ACK 4 to mean that number 3 and all previous segments have been successfully received.
If a segment does not arrive successfully, no ACK will be issued and a retransmission becomes necessary. To cope with this situation, there must be a timer associated with each segment as it is sent. If the timer expires before the segment is acknowledged, the sender must retransmit.
So, the addition of a timer solves the first problem. Next, at what value should the timer be set? If the value is too small, there will be many unnecessary retransmissions, thereby wasting network capacity. If the value is too large, the protocol will be sluggish in responding to a lost segment. The timer should be set at a value a bit longer than the round-trip delay (send segment, receive ACK). Of course, this delay is variable even under constant network load. Worse, the statistics of the delay will vary with changing network conditions.
Two strategies suggest themselves. A fixed timer value could be used, based on an understanding of the network's typical behavior; this suffers from an inability to respond to changing network conditions. If the value is set too high, the service will always be sluggish. If it is set too low, a positive feedback condition can develop, in which network congestion leads to more retransmissions, which increase congestion.
An adaptive scheme has its own problems. Suppose that the transport entity keeps track of the time taken to acknowledge data segments and sets its retransmission timer based on the average of the observed delays. This value cannot be trusted, for three reasons:
• The peer entity may not acknowledge a segment immediately; recall that we gave it the privilege of cumulative acknowledgments.
• If a segment has been retransmitted, the sender cannot know whether the received ACK is a response to the initial transmission or the retransmission.
• Network conditions may change suddenly.
Each of these problems is a cause for some further tweaking of the transport algorithm, but the problem admits of no complete solution. There will always be some uncertainty concerning the best value for the retransmission timer.
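One common adaptive approach keeps an exponentially weighted moving average of observed delays plus a variance term; TCP's standardized timer (RFC 6298) works this way. The sketch below is illustrative: the class and method names are assumptions, though the smoothing constants are the ones conventionally used for TCP.

```python
# Sketch of an adaptive retransmission timer: a smoothed average of
# observed round-trip delays plus a mean-deviation term. The constants
# follow those conventionally used for TCP (RFC 6298); the class and
# method names are illustrative.

class RetransmitTimer:
    def __init__(self):
        self.srtt = None     # smoothed round-trip time
        self.rttvar = None   # smoothed mean deviation

    def observe(self, rtt):
        """Feed one measured round-trip delay; return the new timer value."""
        if self.srtt is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        # Set the timer a bit longer than the expected round trip.
        return self.srtt + 4 * self.rttvar

t = RetransmitTimer()
for sample in (0.100, 0.120, 0.110):   # measured delays in seconds
    rto = t.observe(sample)
print(rto)
```

Note that the second difficulty listed above, the ambiguity of an ACK received after a retransmission, is conventionally handled by simply not sampling retransmitted segments (Karn's algorithm); this sketch omits that refinement.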
Incidentally, the retransmission timer is only one of a number of timers needed for proper functioning of a transport protocol; these are listed in Table 17.1, together with a brief explanation. Further discussion will be found in what follows.
Duplicate Detection
If a segment is lost and then retransmitted, no confusion will result. If, however, an ACK is lost, one or more segments will be retransmitted and, if they arrive successfully, will be duplicates of previously received segments. Thus, the receiver must be able to recognize duplicates. The fact that each segment carries a sequence number helps but, nevertheless, duplicate detection and handling is no easy thing. There are two cases:
• A duplicate is received prior to the close of the connection.
• A duplicate is received after the close of the connection.
Notice that we say "a" duplicate rather than "the" duplicate. From the sender's point of view, the retransmitted segment is the duplicate. However, the retransmitted segment may arrive before the original segment, in which case the receiver views the original segment as the duplicate. In any case, two tactics are needed to cope with a duplicate received prior to the close of a connection:
• The receiver must assume that its acknowledgment was lost and therefore must acknowledge the duplicate. Consequently, the sender must not get confused if it receives multiple ACKs to the same segment.
• The sequence number space must be long enough so as not to "cycle" in less than the maximum possible segment lifetime.
Figure 17.9 illustrates the reason for the latter requirement. In this example, the sequence space is of length 8. For simplicity, we assume a sliding-window protocol with a window size of 3. Suppose that A has transmitted data segments 0, 1, and 2 and receives no acknowledgments. Eventually, it times out and retransmits segment 0. B has received 1 and 2, but 0 is delayed in transit. Thus, B does not send any ACKs. When the duplicate segment 0 arrives, B acknowledges 0, 1, and 2. Meanwhile, A has timed out again and retransmits 1, which B acknowledges with another ACK 3. Things now seem to have sorted themselves out, and data transfer continues. When the sequence space is exhausted, A cycles back to sequence number 0 and continues. Alas, the old segment 0 makes a belated appearance and is accepted by B before the new segment 0 arrives.
It should be clear that the untimely emergence of the old segment would have caused no difficulty if the sequence numbers had not yet wrapped around. The problem is: How big must the sequence space be? This depends on, among other things, whether the network enforces a maximum packet lifetime, as well as the rate at which segments are being transmitted. Fortunately, each addition of a single bit to the sequence number field doubles the sequence space, so it is rather easy to select a safe size. As we shall see, the standard transport protocols allow stupendous sequence spaces.
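Ordering in a cyclic sequence space can be sketched with modular arithmetic, in the spirit of the serial-number comparison TCP uses on its 32-bit space. The function name is illustrative, and the length-8 space matches the Figure 17.9 example.

```python
# Sketch of sequence-number comparison in a cyclic space: with a space of
# 2**k numbers, "a precedes b" is decided modulo the space size, so numbers
# may wrap around safely as long as the space does not cycle within a
# maximum segment lifetime.

SPACE = 2 ** 3          # length-8 space, as in the Figure 17.9 example

def precedes(a, b):
    """True if a comes before b, assuming they are less than half the
    space apart (the condition the text's lifetime rule guarantees)."""
    return 0 < (b - a) % SPACE < SPACE // 2

print(precedes(6, 1))   # True: 1 follows 6 once the numbers wrap around
print(precedes(1, 6))   # False
```

With TCP's 32-bit space the same comparison is done modulo 2**32, which is why the text can speak of "stupendous" sequence spaces.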
Flow Control
The credit-allocation flow control mechanism described earlier is quite robust in the face of an unreliable network service and requires little enhancement. We assume that the credit-allocation scheme is tied to acknowledgments in the following way: To both acknowledge segments and grant credit, a transport entity sends a control segment of the form (ACK N, CREDIT M), where ACK N acknowledges all data segments through number N - 1, and CREDIT M allows segments number N through N + M - 1 to be transmitted. This mechanism is quite powerful. Consider that the last control segment issued by B was (ACK N, CREDIT M). Then,
• To increase or decrease credit to X when no additional segments have arrived, B can issue (ACK N, CREDIT X).
• To acknowledge a new segment without increasing credit, B can issue (ACK N + 1, CREDIT M - 1).
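The (ACK N, CREDIT M) rule can be made concrete with a short helper; the function name is illustrative.

```python
# Sketch of the (ACK N, CREDIT M) interpretation described above:
# ACK N acknowledges all segments through N - 1, and CREDIT M permits
# segments N through N + M - 1 to be sent.

def send_window(ack_n, credit_m):
    """Return the segment numbers the sender may now transmit."""
    return list(range(ack_n, ack_n + credit_m))

print(send_window(4, 3))   # [4, 5, 6]: three segments may be sent
print(send_window(4, 0))   # []: window temporarily closed
print(send_window(5, 2))   # [5, 6]: one segment acknowledged, edge unchanged
```

The last two calls correspond to the bulleted cases above: closing the window with CREDIT 0, and acknowledging one new segment without granting new credit, which advances ACK by 1 while reducing CREDIT by 1 so the upper window edge stays put.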
If an ACK/CREDIT segment is lost, little harm is done. Future acknowledgments will resynchronize the protocol. Further, if no new acknowledgments are forthcoming, the sender times out and retransmits a data segment, which triggers a new acknowledgment. However, it is still possible for deadlock to occur. Consider a situation in which B sends (ACK N, CREDIT 0), temporarily closing the window. Subsequently, B sends (ACK N, CREDIT M), but this segment is lost. A is awaiting the opportunity to send data, and B thinks that it has granted that opportunity. To overcome this problem, a window timer can be used. This timer is reset with each outgoing ACK/CREDIT segment. If the timer ever expires, the protocol entity is required to send an ACK/CREDIT segment, even if it duplicates a previous one. This breaks the deadlock and also assures the other end that the protocol entity is still alive.
An alternative or supplemental mechanism is to provide for acknowledgments to the ACK/CREDIT segment. With this mechanism in place, the window timer can have quite a large value without causing much difficulty.
Connection Establishment
As with other protocol mechanisms, connection establishment must take into account the unreliability of a network service. Recall that connection establishment calls for the exchange of SYNs, a procedure sometimes referred to as a two-way handshake. Suppose that A issues a SYN to B. It expects to get a SYN back, confirming the connection. Two things can go wrong: A's SYN can be lost or B's answering SYN can be lost. Both cases can be handled by use of a retransmit-SYN timer. After A issues a SYN, it will reissue the SYN when the timer expires.
This situation gives rise, potentially, to duplicate SYNs. If A's initial SYN was lost, there are no duplicates. If B's response was lost, then B may receive two SYNs from A. Further, if B's response was not lost, but simply delayed, A may get two responding SYNs; all of this means that A and B must simply ignore duplicate SYNs once a connection is established.
There are other problems with which to contend. Just as a delayed SYN or lost response can give rise to a duplicate SYN, a delayed data segment or lost acknowledgment can give rise to duplicate data segments, as we have seen in Figure 17.9. Such a delayed or duplicated data segment can interfere with connection establishment, as illustrated in Figure 17.10. Assume that with each new connection, each transport protocol entity begins numbering its data segments with sequence number 0. In the figure, a duplicate copy of segment 2 from an old connection arrives during the lifetime of a new connection and is delivered to B before delivery of the legitimate data segment number 2. One way of attacking this problem is to start each new connection with a different sequence number, far removed from the last sequence number of the most recent connection. For this purpose, the connection request is of the form SYN i, where i is the sequence number of the first data segment that will be sent on this connection.
Now, consider that a duplicate SYN i may survive past the termination of the connection. Figure 17.11 depicts the problem that may arise. An old SYN i arrives at B after the connection is terminated. B assumes that this is a fresh request and responds with SYN j. Meanwhile, A has decided to open a new connection with B and sends SYN k; B discards this as a duplicate. Now, both sides have transmitted and subsequently received a SYN segment, and therefore think that a valid connection exists. However, when A initiates data transfer with a segment numbered k, B rejects the segment as being out of sequence.
The way out of this problem is for each side to acknowledge explicitly the other's SYN and sequence number. The procedure is known as a three-way handshake. The revised connection state diagram, which is the one employed by TCP, is shown in the upper part of Figure 17.12. A new state (SYN RECEIVED) is added, in which the transport entity hesitates during connection opening to assure that the SYN segments sent by the two sides have both been acknowledged before the connection is declared established. In addition to the new state, there is a control segment (RST) to reset the other side when a duplicate SYN is detected.
Figure 17.13 illustrates typical three-way handshake operations. Transport entity A initiates the connection; a SYN includes the sending sequence number, i. The responding SYN acknowledges that number and includes the sequence number for the other side. A acknowledges the SYN/ACK in its first data segment. Next is shown a situation in which an old SYN i arrives at B after the close of the relevant connection. B assumes that this is a fresh request and responds with SYN j, ACK i. When A receives this message, it realizes that it has not requested a connection and therefore sends an RST, ACK j. Note that the ACK j portion of the RST message is essential so that an old duplicate RST does not abort a legitimate connection establishment. The final example shows a case in which an old SYN, ACK arrives in the middle of a new connection establishment. Because of the use of sequence numbers in the acknowledgments, this event causes no mischief.
The upper part of Figure 17.12 does not include transitions in which RST is sent. This was done for simplicity. The basic rule is to send an RST if the connection state is not yet OPEN and an invalid ACK (one that does not reference something that was sent) is received. The reader should try various combinations of events to see that this connection establishment procedure works in spite of any combination of old and lost segments.
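The normal three-way exchange can be traced in a few lines. This is a sketch of the message flow only, not of the full state diagram; the segment tuples and function name are illustrative.

```python
# Sketch of the three-way handshake: each side explicitly acknowledges
# the other's SYN and initial sequence number. Each segment is recorded
# as (direction, kind, seq, ack); the representation is illustrative.

def three_way_handshake(i, j):
    """A opens with initial sequence number i; B answers with j."""
    return [
        ("A->B", "SYN", i, None),   # SYN i: A's connection request
        ("B->A", "SYN", j, i),      # SYN j, ACK i: B confirms A's SYN
        ("A->B", "DATA", i, j),     # A's first data segment carries ACK j
    ]

for segment in three_way_handshake(100, 350):
    print(segment)

# An old SYN arriving after the close is refused: B answers SYN j, ACK i,
# but A, which requested no connection, replies RST, ACK j.
def reject_old_syn(old_i, j):
    return [("B->A", "SYN", j, old_i), ("A->B", "RST", None, j)]
```

As the text notes, the ACK j carried in the RST is what lets B verify that the reset refers to its own live SYN rather than to some old duplicate.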
Connection Termination
The state diagram of Figure 17.7 defines the use of a simple two-way handshake for connection establishment, which was found to be unsatisfactory in the face of an unreliable network service. Similarly, the two-way handshake defined in that diagram for connection termination is inadequate for an unreliable network service. The following scenario could be caused by a misordering of segments. A transport entity in the CLOSE WAIT state sends its last data segment, followed by a FIN segment, but the FIN segment arrives at the other side before the last data segment. The receiving transport entity will accept that FIN, close the connection, and lose the last segment of data. To avoid this problem, a sequence number can be associated with the FIN, which can be assigned the next sequence number after the last octet of transmitted data. With this refinement, the receiving transport entity, upon receiving a FIN, will wait if necessary for the late-arriving data before closing the connection.
A more serious problem is the potential loss of segments and the potential presence of obsolete segments. Figure 17.12 shows that the termination procedure adopts a solution similar to that used for connection establishment. Each side must explicitly acknowledge the FIN of the other, using an ACK with the sequence number of the FIN to be acknowledged. For a graceful close, a transport entity requires the following:
• It must send a FIN i and receive an ACK i.
• It must receive a FIN j and send an ACK j.
• It must wait an interval equal to twice the maximum expected segment lifetime.
Crash Recovery
When the system upon which a transport entity is running fails and subsequently restarts, the state information of all active connections is lost. The affected connections become half-open, as the side that did not fail does not yet realize the problem.
The still active side of a half-open connection can close the connection using a give-up timer. This timer measures the time the transport machine will continue to await an acknowledgment (or other appropriate reply) of a transmitted segment after the segment has been retransmitted the maximum number of times. When the timer expires, the transport entity assumes that either the other transport entity or the intervening network has failed. As a result, the transport entity closes the connection and signals an abnormal close to the TS user.
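The give-up logic amounts to bounding the number of retransmissions before declaring the connection dead. The sketch below is illustrative; the retry limit, function name, and return strings are assumptions, and real implementations track elapsed time rather than a simple attempt count.

```python
# Sketch of give-up behavior: after the initial send plus the maximum
# number of retransmissions without acknowledgment, assume the peer or
# the network has failed and close abnormally. The limit and the return
# strings are illustrative.

MAX_RETRANSMITS = 5

def transmit_with_give_up(acked, max_retries=MAX_RETRANSMITS):
    """acked(attempt) reports whether the segment is acknowledged yet."""
    for attempt in range(1 + max_retries):   # initial send + retries
        if acked(attempt):
            return "delivered"
    return "abnormal close"                  # give-up timer expired

print(transmit_with_give_up(lambda n: n == 2))   # delivered
print(transmit_with_give_up(lambda n: False))    # abnormal close
```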
In the event that a transport entity fails and quickly restarts, half-open connections can be terminated more quickly by the use of the RST segment. The failed side returns an RST i to every segment i that it receives. When the RST i reaches the other side, it must be checked for validity based on the sequence number i, as the RST could be in response to an old segment. If the reset is valid, the transport entity performs an abnormal termination.
These measures clean up the situation at the transport level. The decision as to whether to reopen the connection is up to the TS users. The problem is one of synchronization. At the time of failure, there may have been one or more outstanding segments in either direction. The TS user on the side that did not fail knows how much data it has received, but the other user may not if state information was lost. Thus, there is the danger that some user data will be lost or duplicated.