7.4.4
Internet Radio
Once it became possible to stream
audio over the Internet, commercial radio stations got the idea of broadcasting
their content over the Internet as well as over the air. Not so long after
that, college radio stations started putting their signal out over the
Internet. Then college students started their own radio stations. With current
technology, virtually anyone can start a radio station. The whole area of
Internet radio is very new and in a state of flux, but it is worth saying a
little bit about.
There are two general approaches to
Internet radio. In the first one, the programs are prerecorded and stored on
disk. Listeners can connect to the radio station's archives and pull up any
program and download it for listening. In fact, this is exactly the same as the
streaming audio we just discussed. It is also possible to store each program
just after it is broadcast live, so the archive runs only, say, half an
hour or less behind the live feed. The advantages of this approach are that it
is easy to do, all the techniques we have discussed work here too, and
listeners can pick and choose among all the programs in the archive.
The other approach is to broadcast
live over the Internet. Some stations broadcast over the air and over the
Internet simultaneously, but there are increasingly many radio stations that
are Internet only. Some of the techniques that are applicable to streaming
audio are also applicable to live Internet radio, but there are also some key
differences.
One point that is the same is the
need for buffering on the user side to smooth out jitter. By collecting 10 or
15 seconds worth of radio before starting the playback, the audio can be kept
going smoothly even in the face of substantial jitter over the network. As long
as all the packets arrive before they are needed, it does not matter when they
arrive.
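The buffering rule above can be sketched in a few lines. This is only an
illustration: it assumes packets are generated at a fixed interval, and the
function name and the arrival times are made up for the example.

```python
def late_packets(arrival_times, packet_interval, playout_delay):
    """Return indices of packets that arrive after their playback deadline.

    Packet i is generated at i * packet_interval and is played back at
    playout_delay + i * packet_interval (all times in seconds).
    """
    late = []
    for i, arrived in enumerate(arrival_times):
        deadline = playout_delay + i * packet_interval
        if arrived > deadline:
            late.append(i)
    return late


# Hypothetical arrival times for five 1-second packets with heavy jitter.
arrivals = [0.2, 1.1, 6.5, 3.4, 4.3]

print(late_packets(arrivals, packet_interval=1.0, playout_delay=1.0))
print(late_packets(arrivals, packet_interval=1.0, playout_delay=10.0))
```

With only 1 second of buffering, the packet that arrives at 6.5 sec misses its
deadline. With 10 seconds of buffering, every packet is on hand before it is
needed, no matter in what order or how late it arrived.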
One key difference is that streaming
audio can be pushed out at a rate greater than the playback rate since the
receiver can stop it when the high-water mark is hit. Potentially, this gives
it the time to retransmit lost packets, although this strategy is not commonly
used. In contrast, live radio is always broadcast at exactly the rate it is
generated and played back.
Another difference is that a live
radio station usually has hundreds or thousands of simultaneous listeners
whereas streaming audio is point to point. Under these circumstances, Internet
radio should use multicasting with the RTP/RTSP protocols. This is clearly the
most efficient way to operate.
In current practice, Internet radio
does not work like this. What actually happens is that the user establishes a
TCP connection to the station and the feed is sent over the TCP connection. Of
course, this creates various problems, such as the flow stopping when the
window is full, lost packets timing out and being retransmitted, and so on.
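To get a feeling for what unicasting costs the station compared with
multicasting, consider the rough calculation below. It is a sketch only: the
function name, the listener count, and the 128-kbps stream rate are assumptions
chosen for illustration, and protocol overhead is ignored.

```python
def server_output_kbps(listeners, stream_kbps, multicast):
    """Outgoing bandwidth the station's server must supply, in kbps.

    With unicast, the server sends one copy of the stream per listener.
    With multicast, it sends a single copy and the routers replicate it.
    """
    copies = 1 if multicast else listeners
    return copies * stream_kbps


# Assumed numbers: 1000 listeners, 128-kbps audio stream.
print(server_output_kbps(1000, 128, multicast=False))  # unicast
print(server_output_kbps(1000, 128, multicast=True))   # multicast
```

Under these assumptions the unicast server has to push out 128 Mbps, whereas a
multicast server would send a single 128-kbps stream.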
The reason TCP unicasting is used
instead of RTP multicasting is threefold. First, few ISPs support multicasting,
so that is not a practical option. Second, RTP is less well known than TCP and
radio stations are often small and have little computer expertise, so it is
just easier to use a protocol that is widely understood and supported by all
software packages. Third, many people listen to Internet radio at work, which,
in practice, often means behind a firewall. Most system administrators
configure their firewall to protect their LAN from unwelcome visitors. They
usually allow TCP connections from remote port 25 (SMTP for e-mail), UDP
packets from remote port 53 (DNS), and TCP connections from remote port 80
(HTTP for the Web). Almost everything else may be blocked, including RTP. Thus,
the only way to get the radio signal through the firewall is for the Web site
to pretend it is an HTTP server, at least to the firewall, and use HTTP
servers, which speak TCP. These severe measures, while providing only minimal
security, often force multimedia applications into drastically less efficient
modes of operation.
Since Internet radio is a new
medium, format wars are in full bloom. RealAudio, Windows Media Audio, and MP3
are aggressively competing in this market to become the dominant format for
Internet radio. A newcomer is Vorbis, which is technically similar to MP3 but
open source and different enough that it does not use the patents MP3 is based
on.
A typical Internet radio station has
a Web page listing its schedule, information about its DJs and announcers, and
many ads. There are also one or more icons listing the audio formats it
supports (or just LISTEN NOW if only one format is supported). These icons or
LISTEN NOW are linked to metafiles of the type we discussed above.
When a user clicks on one of the
icons, the short metafile is sent over. The browser uses its MIME type or file
extension to determine the appropriate helper (i.e., media player) for the
metafile. Then it writes the metafile to a scratch file on disk, starts the
media player, and hands it the name of the scratch file. The media player reads
the scratch file, sees the URL contained in it (usually with scheme http rather
than rtsp to get around the firewall problem and because some popular
multimedia applications work that way), contacts the server, and starts acting
like a radio. As an aside, audio has only one stream, so http works, but for
video, which has at least two streams, http fails and something like rtsp is
really needed.
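What the media player does with the scratch file can be sketched as follows.
The metafile format assumed here (a text file whose first nonblank line is the
URL), the function name, and the example URL are all hypothetical; real players
each have their own metafile formats.

```python
from urllib.parse import urlsplit


def stream_location(metafile_text):
    """Return (scheme, host, port, path) for the first URL in a metafile."""
    default_ports = {"http": 80, "rtsp": 554}
    for line in metafile_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        url = urlsplit(line)
        port = url.port or default_ports.get(url.scheme)
        return url.scheme, url.hostname, port, url.path
    raise ValueError("metafile contains no URL")


# Hypothetical metafile as written to the scratch file by the browser.
scratch = "http://radio.example.com:8000/live.mp3\n"
print(stream_location(scratch))
```

The player would then open a TCP connection to the host and port found this way
and request the path, just as a browser would for an ordinary Web page.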
Another interesting development in
the area of Internet radio is an arrangement in which anybody, even a student,
can set up and operate a radio station. The main components are illustrated in Fig. 7-63. The basis of the station is an
ordinary PC with a sound card and a microphone. The software consists of a
media player, such as Winamp or Freeamp, with a plug-in for audio capture and a
codec for the selected output format, for example, MP3 or Vorbis.
The audio stream generated by the
station is then fed over the Internet to a large server, which handles
distributing it to large numbers of TCP connections. The server typically
supports many small stations. It also maintains a directory of what stations it
has and what is currently on the air on each one. Potential listeners go to the
server, select a station, and get a TCP feed. There are commercial software
packages for managing all the pieces, as well as open source packages such as
icecast. There are also servers that are willing to handle the distribution for
a fee.
7.4.5
Voice over IP
Once upon a time, the public
switched telephone system was primarily used for voice traffic with a little
bit of data traffic here and there. But the data traffic grew and grew, and by
1999, the number of data bits moved equaled the number of voice bits (since
voice is in PCM on the trunks, it can be measured in bits/sec). By 2002, the
volume of data traffic was an order of magnitude more than the volume of voice
traffic and still growing exponentially, with voice traffic being almost flat
(5% growth per year).
As a consequence of these numbers,
many packet-switching network operators suddenly became interested in carrying
voice over their data networks. The amount of additional bandwidth required for
voice is minuscule since the packet networks are dimensioned for the data
traffic. However, the average person's phone bill is probably larger than his
Internet bill, so the data network operators saw Internet telephony as a way to
earn a large amount of additional money without having to put any new fiber in
the ground. Thus, Internet telephony (also known as voice over IP) was born.
One thing that was clear to everyone
from the start was that if each vendor designed its own protocol stack, the
system would never work. To avoid this problem, a number of interested parties
got together under ITU auspices to work out standards. In 1996 ITU issued
recommendation H.323 entitled ''Visual Telephone Systems and Equipment for
Local Area Networks Which Provide a Non-Guaranteed Quality of Service.'' Only
the telephone industry would think of such a name. The recommendation was
revised in 1998, and this revised H.323 was the basis for the first widespread
Internet telephony systems.
H.323 is more of an architectural
overview of Internet telephony than a specific protocol. It references a large
number of specific protocols for speech coding, call setup, signaling, data
transport, and other areas rather than specifying these things itself. The
general model is depicted in Fig. 7-64. At the center is a gateway that
connects the Internet to the telephone network. It speaks the H.323 protocols
on the Internet side and the PSTN protocols on the telephone side. The
communicating devices are called terminals. A LAN may have a gatekeeper, which
controls the end points under its jurisdiction, called a zone.
A telephone network needs a number
of protocols. To start with, there is a protocol for encoding and decoding speech.
The PCM system is defined in ITU recommendation G.711. It encodes a single
voice channel by sampling 8000 times per second with an 8-bit sample to give
uncompressed speech at 64 kbps. All H.323 systems must support G.711. However,
other speech compression protocols are also permitted (but not required). They
use different compression algorithms and make different trade-offs between
quality and bandwidth. For example, G.723.1 takes a block of 240 samples (30
msec of speech) and uses predictive coding to reduce it to either 24 bytes or
20 bytes. This algorithm gives an output rate of either 6.4 kbps or 5.3 kbps
(compression factors of 10 and 12), respectively, with little loss in perceived
quality. Other codecs are also allowed.
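The bit rates quoted above follow directly from the frame sizes, as the short
calculation below shows. The function and variable names are ours; the numbers
are the ones given in the text.

```python
SAMPLE_RATE = 8000  # samples per second, as used by G.711


def bit_rate_bps(frame_bytes, samples_per_frame):
    """Output rate of a codec that emits frame_bytes per samples_per_frame."""
    return frame_bytes * 8 * SAMPLE_RATE / samples_per_frame


g711 = bit_rate_bps(frame_bytes=1, samples_per_frame=1)         # one 8-bit sample
g723_high = bit_rate_bps(frame_bytes=24, samples_per_frame=240)
g723_low = bit_rate_bps(frame_bytes=20, samples_per_frame=240)

print(g711, g723_high, round(g723_low))
print(round(g711 / g723_high), round(g711 / g723_low))  # compression factors
```

A block of 240 samples at 8000 samples/sec covers 30 msec, so 24 bytes per
block is 6400 bps and 20 bytes per block is about 5333 bps, which the text
rounds to 5.3 kbps.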
Since multiple compression
algorithms are permitted, a protocol is needed to allow the terminals to negotiate
which one they are going to use. This protocol is called H.245. It also
negotiates other aspects of the connection such as the bit rate. RTCP is needed
for the control of the RTP channels. Also required is a protocol for
establishing and releasing connections, providing dial tones, making ringing
sounds, and the rest of the standard telephony. ITU Q.931 is used here. The
terminals need a protocol for talking to the gatekeeper (if present). For this
purpose, H.225 is used. The PC-to-gatekeeper channel it manages is called the RAS
(Registration/Admission/Status) channel. This channel allows terminals to join
and leave the zone, request and return bandwidth, and provide status updates,
among other things. Finally, a protocol is needed for the actual data
transmission. RTP is used for this purpose. It is managed by RTCP, as usual.
The positioning of all these protocols is shown in Fig. 7-65.
To see how these protocols fit
together, consider the case of a PC terminal on a LAN (with a gatekeeper)
calling a remote telephone. The PC first has to discover the gatekeeper, so it
broadcasts a UDP gatekeeper discovery packet to port 1718. When the gatekeeper
responds, the PC learns the gatekeeper's IP address. Now the PC registers with
the gatekeeper by sending it a RAS message in a UDP packet. After it has been
accepted, the PC sends the gatekeeper a RAS admission message requesting
bandwidth. Only after bandwidth has been granted may call setup begin. The idea
of requesting bandwidth in advance is to allow the gatekeeper to limit the
number of calls to avoid oversubscribing the outgoing line in order to help
provide the necessary quality of service.
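The admission decision made by the gatekeeper can be modeled very simply. The
class below is a toy model, not the RAS message format: the class and method
names, the 128-kbps line, and the terminal names are assumptions made for the
example.

```python
class Gatekeeper:
    """Toy admission control for one zone (not the RAS wire protocol)."""

    def __init__(self, outgoing_line_kbps):
        self.capacity_kbps = outgoing_line_kbps
        self.granted = {}  # terminal name -> kbps currently granted

    def admit(self, terminal, requested_kbps):
        """Grant the request only if the outgoing line is not oversubscribed."""
        in_use = sum(self.granted.values())
        if terminal in self.granted:
            return False  # this sketch allows one grant per terminal
        if in_use + requested_kbps > self.capacity_kbps:
            return False
        self.granted[terminal] = requested_kbps
        return True

    def release(self, terminal):
        """Return a terminal's bandwidth when its call ends."""
        self.granted.pop(terminal, None)


gk = Gatekeeper(outgoing_line_kbps=128)
print(gk.admit("pc1", 64), gk.admit("pc2", 64), gk.admit("pc3", 64))
gk.release("pc1")
print(gk.admit("pc3", 64))
```

The third request is refused because the line is full. Once the first terminal
releases its bandwidth, the same request is granted.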
The PC now establishes a TCP connection
to the gatekeeper to begin call setup. Call setup uses existing telephone
network protocols, which are connection oriented, so TCP is needed. In
contrast, the telephone system has nothing like RAS to allow telephones to
announce their presence, so the H.323 designers were free to use either UDP or
TCP for RAS, and they chose the lower-overhead UDP.
Now that it has bandwidth allocated,
the PC can send a Q.931 SETUP message over the TCP connection. This message
specifies the number of the telephone being called (or the IP address and port,
if a computer is being called). The gatekeeper responds with a Q.931 CALL
PROCEEDING message to acknowledge correct receipt of the request. The
gatekeeper then forwards the SETUP message to the gateway.
The gateway, which is half computer,
half telephone switch, then makes an ordinary telephone call to the desired
(ordinary) telephone. The end office to which the telephone is attached rings
the called telephone and also sends back a Q.931 ALERT message to tell the calling
PC that ringing has begun. When the person at the other end picks up the
telephone, the end office sends back a Q.931 CONNECT message to signal the PC
that it has a connection.
Once the connection has been
established, the gatekeeper is no longer in the loop, although the gateway is,
of course. Subsequent packets bypass the gatekeeper and go directly to the
gateway's IP address. At this point, we just have a bare tube running between
the two parties. This is just a physical layer connection for moving bits, no
more. Neither side knows anything about the other one.
The H.245 protocol is now used to
negotiate the parameters of the call. It uses the H.245 control channel, which
is always open. Each side starts out by announcing its capabilities, for
example, whether it can handle video (H.323 can handle video) or conference
calls, which codecs it supports, etc. Once each side knows what the other one
can handle, two unidirectional data channels are set up and a codec and other
parameters assigned to each one. Since each side may have different equipment,
it is entirely possible that the codecs on the forward and reverse channels are
different. After all negotiations are complete, data flow can begin using RTP.
It is managed using RTCP, which plays a role in congestion control. If video is
present, RTCP handles the audio/video synchronization. The various channels are
shown in Fig. 7-66. When either party hangs up, the Q.931
call signaling channel is used to tear down the connection.
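To see how the forward and reverse channels can end up with different codecs,
consider the sketch below. The selection rule (take the first codec in the
sender's preference order that the receiver can decode) and the capability
lists are assumptions for illustration; H.245 itself defines the actual
message exchange.

```python
def pick_codec(sender_can_encode, receiver_can_decode):
    """Pick the first codec in the sender's preference order that the
    receiver is able to decode, or None if there is no common codec."""
    for codec in sender_can_encode:
        if codec in receiver_can_decode:
            return codec
    return None


# Hypothetical capabilities announced by each side.
pc = {"encode": ["G.723.1", "G.711"], "decode": {"G.711", "G.723.1"}}
gateway = {"encode": ["G.711"], "decode": {"G.711", "G.723.1"}}

forward = pick_codec(pc["encode"], gateway["decode"])   # PC -> gateway
reverse = pick_codec(gateway["encode"], pc["decode"])   # gateway -> PC
print(forward, reverse)
```

Here the PC sends G.723.1 while the gateway sends G.711. Since all H.323
systems must support G.711, some common codec is always available.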
When the call is terminated, the
calling PC contacts the gatekeeper again with a RAS message to release the
bandwidth it has been assigned. Alternatively, it can make another call.
We have not said anything about
quality of service, even though this is essential to making voice over IP a
success. The reason is that QoS falls outside the scope of H.323. If the
underlying network is capable of producing a stable, jitter-free connection from
the calling PC to the gateway, then the QoS on the call will be good;
otherwise it will not be. The telephone part uses PCM and is always jitter
free.
H.323 was designed by ITU. Many
people in the Internet community saw it as a typical telco product: large,
complex, and inflexible. Consequently, IETF set up a committee to design a
simpler and more modular way to do voice over IP. The major result to date is
SIP (Session Initiation Protocol), which is described in RFC 3261. This
protocol describes how to set up Internet telephone calls, video conferences,
and other multimedia connections. Unlike H.323, which is a complete protocol
suite, SIP is a single module, but it has been designed to interwork well with
existing Internet applications. For example, it defines telephone numbers as
URLs, so that Web pages can contain them, allowing a click on a link to
initiate a telephone call (the same way the mailto scheme allows a click on a
link to bring up a program to send an e-mail message).
SIP can establish two-party sessions
(ordinary telephone calls), multiparty sessions (where everyone can hear and
speak), and multicast sessions (one sender, many receivers). The sessions may
contain audio, video, or data, the latter being useful for multiplayer
real-time games, for example. SIP just handles setup, management, and
termination of sessions. Other protocols, such as RTP/RTCP, are used for data
transport. SIP is an application-layer protocol and can run over UDP or TCP.
SIP supports a variety of services,
including locating the callee (who may not be at his home machine) and
determining the callee's capabilities, as well as handling the mechanics of
call setup and termination. In the simplest case, SIP sets up a session from
the caller's computer to the callee's computer, so we will examine that case
first.
Telephone numbers in SIP are
represented as URLs using the sip scheme, for example, sip:ilse@cs.university.edu
for a user named Ilse at the host specified by the DNS name cs.university.edu.
SIP URLs may also contain IPv4 addresses, IPv6 addresses, or actual telephone
numbers.
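Taking such a URL apart is straightforward. The function below is a minimal
sketch that handles only the simple user@host form used in the example above;
it ignores ports, parameters, and bracketed IPv6 addresses, and its name is our
own.

```python
def parse_sip_url(url):
    """Split a simple sip: URL into its user and host parts."""
    scheme, sep, rest = url.partition(":")
    if sep == "" or scheme.lower() != "sip":
        raise ValueError("not a sip URL: " + url)
    user, at, host = rest.rpartition("@")
    return {"user": user if at else None, "host": host}


print(parse_sip_url("sip:ilse@cs.university.edu"))
```

The host part is what the caller resolves (via DNS, if it is a name) to find
out where to send the request.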
The SIP protocol is a text-based
protocol modeled on HTTP. One party sends a message in ASCII text consisting of
a method name on the first line, followed by additional lines containing
headers for passing parameters. Many of the headers are taken from MIME to
allow SIP to interwork with existing Internet applications. The six methods
defined by the core specification are listed in Fig. 7-67.
To establish a session, the caller
either creates a TCP connection with the callee and sends an INVITE message
over it or sends the INVITE message in a UDP packet. In both cases, the headers
on the second and subsequent lines describe the structure of the message body,
which contains the caller's capabilities, media types, and formats. If the
callee accepts the call, it responds with an HTTP-type reply code (a
three-digit number using the groups of Fig. 7-42, 200 for acceptance). Following the
reply-code line, the callee also may supply information about its capabilities,
media types, and formats.
Connection is done using a three-way
handshake, so the caller responds with an ACK message to finish the protocol
and confirm receipt of the 200 message.
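The text-based nature of the exchange can be illustrated as follows. This is a
simplified sketch: the helper names are ours, and the messages carry only a
couple of headers, whereas a real SIP message needs additional mandatory
headers that are omitted here. The one-line body is a stand-in for the
description of the caller's media types and formats.

```python
def build_request(method, target, headers, body=""):
    """Build a request: method line, header lines, blank line, body."""
    lines = [f"{method} {target} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n" + body


def reply_code(response_text):
    """Extract the three-digit reply code from the first line of a response."""
    status_line = response_text.split("\r\n", 1)[0]
    _version, code, _reason = status_line.split(" ", 2)
    return int(code)


def caller_next_step(response_text, target):
    """Third leg of the handshake: send ACK if the callee replied 200."""
    if reply_code(response_text) == 200:
        return build_request("ACK", target, {"Content-Length": "0"})
    return None  # simplified: any other reply ends the attempt


target = "sip:ilse@cs.university.edu"
body = "m=audio 49170 RTP/AVP 0\r\n"  # placeholder media description
invite = build_request(
    "INVITE", target,
    {"Content-Type": "application/sdp", "Content-Length": str(len(body))},
    body,
)
print(invite)
print(caller_next_step("SIP/2.0 200 OK\r\n\r\n", target))
```

The same request text can be sent either over a TCP connection or in a single
UDP packet, as described above.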
Either party may request termination
of a session by sending a message containing the BYE method. When the other
side acknowledges it, the session is terminated.
The OPTIONS method is used to query
a machine about its own capabilities. It is typically used before a session is
initiated to find out if that machine is even capable of voice over IP or
whatever type of session is being contemplated.
The REGISTER method relates to SIP's
ability to track down and connect to a user who is away from home. This message
is sent to a SIP location server that keeps track of who is where. That server
can later be queried to find the user's current location. The operation of
redirection is illustrated in Fig. 7-68. Here the caller sends the INVITE message
to a proxy server to hide the possible redirection. The proxy then looks up
where the user is and sends the INVITE message there. It then acts as a relay
for the subsequent messages in the three-way handshake. The LOOKUP and REPLY
messages are not part of SIP; any convenient protocol can be used, depending on
what kind of location server is used.
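The division of labor between the location server and the proxy can be modeled
as below. This is a toy model under stated assumptions: the class and method
names and the host names are hypothetical, and the lookup is an ordinary method
call standing in for whatever LOOKUP/REPLY protocol is actually used.

```python
class LocationServer:
    """Remembers where each user was last seen (filled in by REGISTER)."""

    def __init__(self):
        self.bindings = {}  # user -> host where the user currently is

    def register(self, user, current_host):
        self.bindings[user] = current_host

    def lookup(self, user):
        return self.bindings.get(user)


class Proxy:
    """Receives INVITEs and decides where to forward them."""

    def __init__(self, location_server):
        self.location_server = location_server

    def route_invite(self, user, home_host):
        """Return the host the INVITE should be forwarded to."""
        return self.location_server.lookup(user) or home_host


locations = LocationServer()
proxy = Proxy(locations)

print(proxy.route_invite("ilse", "cs.university.edu"))   # not registered elsewhere
locations.register("ilse", "laptop.hotel.example.com")   # Ilse is traveling
print(proxy.route_invite("ilse", "cs.university.edu"))
```

The caller always sends the INVITE to the proxy and never needs to know that
the callee has moved.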
SIP has a variety of other features
that we will not describe here, including call waiting, call screening,
encryption, and authentication. It also has the ability to place calls from a
computer to an ordinary telephone, if a suitable gateway between the Internet
and telephone system is available.
H.323 and SIP have many similarities
but also some differences. Both allow two-party and multiparty calls using both
computers and telephones as end points. Both support parameter negotiation,
encryption, and the RTP/RTCP protocols. A summary of the similarities and
differences is given in Fig. 7-69.
Although the feature sets are
similar, the two protocols differ widely in philosophy. H.323 is a typical,
heavyweight, telephone-industry standard, specifying the complete protocol
stack and defining precisely what is allowed and what is forbidden. This approach
leads to very well defined protocols in each layer, easing the task of
interoperability. The price paid is a large, complex, and rigid standard that
is difficult to adapt to future applications.
In contrast, SIP is a typical
Internet protocol that works by exchanging short lines of ASCII text. It is a
lightweight module that interworks well with other Internet protocols but less
well with existing telephone system signaling protocols. Because the IETF model
of voice over IP is highly modular, it is flexible and can be adapted to new
applications easily. The downside is potential interoperability problems,
although these are addressed by frequent meetings where different implementers
get together to test their systems.
Voice over IP is an up-and-coming
topic. Consequently, there are several books on the subject already. A few
examples are (Collins, 2001; Davidson and Peters, 2000; Kumar et al., 2001; and
Wright, 2001). The May/June 2002 issue of Internet Computing has several
articles on this topic.