7.2 Electronic Mail
Electronic mail, or e-mail, as it is
known to its many fans, has been around for over two decades. Before 1990, it
was mostly used in academia. During the 1990s, it became known to the public at
large and grew exponentially to the point where the number of e-mails sent per
day now is vastly more than the number of snail mail (i.e., paper) letters.
E-mail, like most other forms of
communication, has its own conventions and styles. In particular, it is very
informal and has a low threshold of use. People who would never dream of
calling up or even writing a letter to a Very Important Person do not hesitate
for a second to send a sloppily-written e-mail.
E-mail is full of jargon such as BTW
(By The Way), ROTFL (Rolling On The Floor Laughing), and IMHO (In My Humble
Opinion). Many people also use little ASCII symbols called smileys or emoticons
in their e-mail. A few of the more interesting ones are reproduced in Fig. 7-6. For most, rotating the book 90 degrees
clockwise will make them clearer. For a minibook giving over 650 smileys, see
(Sanderson and Dougherty, 1993).
The first e-mail systems simply
consisted of file transfer protocols, with the convention that the first line
of each message (i.e., file) contained the recipient's address. As time went
on, the limitations of this approach became more obvious.
Some of the complaints were as
follows:
- Sending a message to a group of people was inconvenient. Managers often need this facility to send memos to all their subordinates.
- Messages had no internal structure, making computer processing difficult. For example, if a forwarded message was included in the body of another message, extracting the forwarded part from the received message was difficult.
- The originator (sender) never knew if a message arrived or not.
- If someone was planning to be away on business for several weeks and wanted all incoming e-mail to be handled by his secretary, this was not easy to arrange.
- The user interface was poorly integrated with the transmission system, requiring users first to edit a file, then leave the editor and invoke the file transfer program.
- It was not possible to create and send messages containing a mixture of text, drawings, facsimile, and voice.
As experience was gained, more
elaborate e-mail systems were proposed. In 1982, the ARPANET e-mail proposals
were published as RFC 821 (transmission protocol) and RFC 822 (message format).
Minor revisions, RFC 2821 and RFC 2822, have become Internet standards, but
everyone still refers to Internet e-mail as RFC 822.
In 1984, CCITT drafted its X.400
recommendation. After two decades of competition, e-mail systems based on RFC
822 are widely used, whereas those based on X.400 have disappeared. How a
system hacked together by a handful of computer science graduate students beat
an official international standard strongly backed by all the PTTs in the
world, many governments, and a substantial part of the computer industry brings
to mind the Biblical story of David and Goliath.
The reason for RFC 822's success is
not that it is so good, but that X.400 was so poorly designed and so complex
that nobody could implement it well. Given a choice between a simple-minded,
but working, RFC 822-based e-mail system and a supposedly truly wonderful, but
nonworking, X.400 e-mail system, most organizations chose the former. Perhaps
there is a lesson lurking in there somewhere. Consequently, our discussion of
e-mail will focus on the Internet e-mail system.
In this section we will provide an
overview of what e-mail systems can do and how they are organized. They
normally consist of two subsystems: the user agents, which allow people to read
and send e-mail, and the message transfer agents, which move the messages from
the source to the destination. The user agents are local programs that provide
a command-based, menu-based, or graphical method for interacting with the
e-mail system. The message transfer agents are typically system daemons, that
is, processes that run in the background. Their job is to move e-mail through
the system.
Typically, e-mail systems support
five basic functions. Let us take a look at them.
Composition refers to the process of
creating messages and answers. Although any text editor can be used for the
body of the message, the system itself can provide assistance with addressing
and the numerous header fields attached to each message. For example, when
answering a message, the e-mail system can extract the originator's address
from the incoming e-mail and automatically insert it into the proper place in
the reply.
Transfer refers to moving messages
from the originator to the recipient. In large part, this requires establishing
a connection to the destination or some intermediate machine, outputting the
message, and releasing the connection. The e-mail system should do this
automatically, without bothering the user.
Reporting has to do with telling the
originator what happened to the message. Was it delivered? Was it rejected? Was
it lost? Numerous applications exist in which confirmation of delivery is
important and may even have legal significance (''Well, Your Honor, my e-mail
system is not very reliable, so I guess the electronic subpoena just got lost
somewhere'').
Displaying incoming messages is
needed so people can read their e-mail. Sometimes conversion is required or a
special viewer must be invoked, for example, if the message is a PostScript
file or digitized voice. Simple conversions and formatting are sometimes
attempted as well.
Disposition is the final step and
concerns what the recipient does with the message after receiving it.
Possibilities include throwing it away before reading, throwing it away after
reading, saving it, and so on. It should also be possible to retrieve and
reread saved messages, forward them, or process them in other ways.
In addition to these basic services,
some e-mail systems, especially internal corporate ones, provide a variety of
advanced features. Let us just briefly mention a few of these. When people move
or when they are away for some period of time, they may want their e-mail
forwarded, so the system should be able to do this automatically.
Most systems allow users to create mailboxes
to store incoming e-mail. Commands are needed to create and destroy mailboxes,
inspect the contents of mailboxes, insert and delete messages from mailboxes,
and so on.
Corporate managers often need to
send a message to each of their subordinates, customers, or suppliers. This
gives rise to the idea of a mailing list, which is a list of e-mail addresses.
When a message is sent to the mailing list, identical copies are delivered to
everyone on the list.
Other advanced features are carbon
copies, blind carbon copies, high-priority e-mail, secret (i.e., encrypted)
e-mail, alternative recipients if the primary one is not currently available,
and the ability for secretaries to read and answer their bosses' e-mail.
E-mail is now widely used within
industry for intracompany communication. It allows far-flung employees to
cooperate on complex projects, even over many time zones. By eliminating most
cues associated with rank, age, and gender, e-mail debates tend to focus on
ideas, not on corporate status. With e-mail, a brilliant idea from a summer student
can have more impact than a dumb one from an executive vice president.
A key idea in e-mail systems is the
distinction between the envelope and its contents. The envelope encapsulates
the message. It contains all the information needed for transporting the
message, such as the destination address, priority, and security level, all of
which are distinct from the message itself. The message transfer agents use
the envelope for routing, just as the post office does.
The message inside the envelope
consists of two parts: the header and the body. The header contains control
information for the user agents. The body is entirely for the human recipient.
Envelopes and messages are illustrated in Fig. 7-7.
E-mail systems have two basic parts,
as we have seen: the user agents and the message transfer agents. In this
section we will look at the user agents. A user agent is normally a program
(sometimes called a mail reader) that accepts a variety of commands for
composing, receiving, and replying to messages, as well as for manipulating
mailboxes. Some user agents have a fancy menu- or icon-driven interface that requires
a mouse, whereas others expect 1-character commands from the keyboard.
Functionally, these are the same. Some systems are menu- or icon-driven but
also have keyboard shortcuts.
To send an e-mail message, a user
must provide the message, the destination address, and possibly some other
parameters. The message can be produced with a free-standing text editor, a
word processing program, or possibly with a specialized text editor built into
the user agent. The destination address must be in a format that the user agent
can deal with. Many user agents expect addresses of the form user@dns-address.
However, it is worth noting that
other forms of addressing exist. In particular, X.400 addresses look radically
different from DNS addresses. They are composed of attribute = value pairs
separated by slashes, for example,
/C=US/ST=MASSACHUSETTS/L=CAMBRIDGE/PA=360 MEMORIAL DR./CN=KEN SMITH/
This address specifies a country,
state, locality, personal address and a common name (Ken Smith). Many other
attributes are possible, so you can send e-mail to someone whose exact e-mail
address you do not know, provided you know enough other attributes (e.g.,
company and job title). Although X.400 names are considerably less convenient
than DNS names, most e-mail systems have aliases (sometimes called nicknames)
that allow users to enter or select a person's name and get the correct e-mail
address. Consequently, even with X.400 addresses, it is usually not necessary
to actually type in these strange strings.
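To make the attribute = value structure concrete, here is a minimal Python sketch that splits such an address into its pairs. The function name parse_x400 is our own invention for illustration, and the sketch assumes that no attribute value itself contains a slash; real X.400 addressing has many more rules than this.

```python
def parse_x400(address: str) -> dict[str, str]:
    """Split a slash-separated X.400-style address into attribute/value pairs."""
    fields = {}
    # Drop the leading and trailing slash, then split on the remaining ones.
    for part in address.strip("/").split("/"):
        name, _, value = part.partition("=")
        fields[name.strip()] = value.strip()
    return fields


addr = "/C=US/ST=MASSACHUSETTS/L=CAMBRIDGE/PA=360 MEMORIAL DR./CN=KEN SMITH/"
for name, value in parse_x400(addr).items():
    print(f"{name:3} {value}")
```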
Most e-mail systems support mailing
lists, so that a user can send the same message to a list of people with a
single command. If the mailing list is maintained locally, the user agent can
just send a separate message to each intended recipient. However, if the list
is maintained remotely, then messages will be expanded there. For example, if a
group of bird watchers has a mailing list called birders installed on meadowlark.arizona.edu,
then any message sent to birders@meadowlark.arizona.edu will be routed to the
University of Arizona and expanded there into individual messages to all the
mailing list members, wherever in the world they may be. Users of this mailing
list cannot tell that it is a mailing list. It could just as well be the
personal mailbox of Prof. Gabriel O. Birders.
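The expansion step can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the table MAILING_LISTS, the member addresses, and the function expand are hypothetical, and a real list server keeps its membership in a database rather than in a dictionary.

```python
# Hypothetical list table kept on the machine that hosts the list.
MAILING_LISTS = {
    "birders@meadowlark.arizona.edu": [
        "ann@example.org",
        "bob@example.com",
        "chris@example.net",
    ],
}


def expand(recipients, lists=MAILING_LISTS):
    """Replace each list address by its members; pass other addresses through."""
    expanded = []
    for address in recipients:
        for member in lists.get(address, [address]):
            if member not in expanded:  # deliver one copy per person
                expanded.append(member)
    return expanded


print(expand(["birders@meadowlark.arizona.edu", "dana@example.org"]))
```

Note that the sender supplies only the single address of the list; the individual members appear only after expansion at the hosting machine.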
Typically, when a user agent is
started up, it looks at the user's mailbox for incoming e-mail before
displaying anything on the screen. Then it may announce the number of messages
in the mailbox or display a one-line summary of each one and wait for a
command.
As an example of how a user agent
works, let us take a look at a typical mail scenario. After starting up the
user agent, the user asks for a summary of his e-mail. A display like that of Fig. 7-8 then appears on the screen. Each line
refers to one message. In this example, the mailbox contains eight messages.
Each line of the display contains
several fields extracted from the envelope or header of the corresponding
message. In a simple e-mail system, the choice of fields displayed is built
into the program. In a more sophisticated system, the user can specify which
fields are to be displayed by providing a user profile, a file describing the
display format. In this basic example, the first field is the message number.
The second field, Flags, can contain a K, meaning that the message is not new
but was read previously and kept in the mailbox; an A, meaning that the message
has already been answered; and/or an F, meaning that the message has been
forwarded to someone else. Other flags are also possible.
The third field tells how long the
message is, and the fourth one tells who sent the message. Since this field is
simply extracted from the message, it may contain first names, full
names, initials, login names, or whatever else the sender chooses to put there.
Finally, the Subject field gives a brief summary of what the message is about.
People who fail to include a Subject field often discover that responses to
their e-mail tend not to get the highest priority.
After the headers have been
displayed, the user can perform any of several actions, such as displaying a
message, deleting a message, and so on. The older systems were text based and
typically used one-character commands for performing these tasks, such as T
(type message), A (answer message), D (delete message), and F (forward
message). An argument specified the message in question. More recent systems
use graphical interfaces. Usually, the user selects a message with the mouse
and then clicks on an icon to type, answer, delete, or forward it.
E-mail has come a long way from the
days when it was just file transfer. Sophisticated user agents make managing a
large volume of e-mail possible. For people who receive and send thousands of
messages a year, such tools are invaluable.
Let us now turn from the user
interface to the format of the e-mail messages themselves. First we will look
at basic ASCII e-mail using RFC 822. After that, we will look at multimedia
extensions to RFC 822.
Messages consist of a primitive
envelope (described in RFC 821), some number of header fields, a blank line,
and then the message body. Each header field (logically) consists of a single
line of ASCII text containing the field name, a colon, and, for most fields, a
value. RFC 822 was designed decades ago and does not clearly distinguish the
envelope fields from the header fields. Although it was revised in RFC 2822,
completely redoing it was not possible due to its widespread usage. In normal
usage, the user agent builds a message and passes it to the message transfer
agent, which then uses some of the header fields to construct the actual
envelope, a somewhat old-fashioned mixing of message and envelope.
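The layout just described (header fields, a blank line, then the body) is easy to see by parsing a small message. The sketch below uses the email package from the Python standard library; the addresses and the text of the message are made up for the example.

```python
from email import message_from_string

# Header fields, a blank line, and then the body.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.org\n"
    "Subject: Lunch\n"
    "\n"
    "Noon at the usual place?\n"
)

msg = message_from_string(raw)
headers = dict(msg.items())   # field name -> value
body = msg.get_payload()      # everything after the blank line

for name, value in headers.items():
    print(f"{name}: {value}")
print("---")
print(body, end="")
```

Field names are matched without regard to case, so msg["subject"] and msg["Subject"] return the same value.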
The principal header fields related
to message transport are listed in Fig. 7-9. The To: field gives the DNS address of
the primary recipient. Having multiple recipients is also allowed. The Cc:
field gives the addresses of any secondary recipients. In terms of delivery,
there is no distinction between the primary and secondary recipients. It is
entirely a psychological difference that may be important to the people
involved but is not important to the mail system. The term Cc: (Carbon copy) is
a bit dated, since computers do not use carbon paper, but it is well
established. The Bcc: (Blind carbon copy) field is like the Cc: field, except that
this line is deleted from all the copies sent to the primary and secondary
recipients. This feature allows people to send copies to third parties without
the primary and secondary recipients knowing this.
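As a rough illustration of how the three fields feed the envelope, the following Python sketch collects the delivery addresses from To:, Cc:, and Bcc: and then removes the Bcc: line before the message is transmitted. The helper envelope_recipients and all the addresses are assumptions of ours, and the sketch simplifies matters by stripping the Bcc: line from every copy.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def envelope_recipients(msg):
    """Collect every address that should receive a copy of the message."""
    fields = (
        msg.get_all("To", [])
        + msg.get_all("Cc", [])
        + msg.get_all("Bcc", [])
    )
    return [address for _name, address in getaddresses(fields)]


msg = EmailMessage()
msg["From"] = "boss@example.com"
msg["To"] = "carol@example.org"
msg["Cc"] = "dave@example.org"
msg["Bcc"] = "auditor@example.net"
msg.set_content("Budget figures attached.")

recipients = envelope_recipients(msg)  # used for delivery only
del msg["Bcc"]                         # the header itself is not transmitted

print(recipients)
print(msg.as_string())
```

In terms of delivery all three addresses are treated alike; only the headers that the recipients see differ.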
The next two fields, From: and Sender:,
tell who wrote and sent the message, respectively. These need not be the same.
For example, a business executive may write a message, but her secretary may be
the one who actually transmits it. In this case, the executive would be listed
in the From: field and the secretary in the Sender: field. The From: field is
required, but the Sender: field may be omitted if it is the same as the From:
field. These fields are needed in case the message is undeliverable and must be
returned to the sender.
A line containing Received: is added
by each message transfer agent along the way. The line contains the agent's
identity, the date and time the message was received, and other information
that can be used for finding bugs in the routing system.
The Return-Path: field is added by
the final message transfer agent and was intended to tell how to get back to
the sender. In theory, this information can be gathered from all the Received:
headers (except for the name of the sender's mailbox), but it is rarely filled
in as such and typically just contains the sender's address.
In addition to the fields of Fig. 7-9, RFC 822 messages may also contain a
variety of header fields used by the user agents or human recipients. The most
common ones are listed in Fig. 7-10. Most of these are self-explanatory, so
we will not go into all of them in detail.
The Reply-To: field is sometimes
used when neither the person composing the message nor the person sending the
message wants to see the reply. For example, a marketing manager writes an e-mail
message telling customers about a new product. The message is sent by a
secretary, but the Reply-To: field lists the head of the sales department, who
can answer questions and take orders. This field is also useful when the sender
has two e-mail accounts and wants the reply to go to the other one.
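A user agent composing a reply might choose the destination along the following lines. This is a minimal sketch, not a description of any particular mail program; the function reply_address and the sample messages are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr


def reply_address(msg):
    """Prefer Reply-To: when present; otherwise fall back to From:."""
    target = msg.get("Reply-To") or msg.get("From", "")
    return parseaddr(target)[1]  # keep only the address part


announcement = message_from_string(
    "From: Marketing Manager <mgr@example.com>\n"
    "Sender: secretary@example.com\n"
    "Reply-To: Sales <sales@example.com>\n"
    "\n"
    "Announcing our new product.\n"
)
note = message_from_string(
    "From: Marketing Manager <mgr@example.com>\n"
    "\n"
    "See you at the meeting.\n"
)

print(reply_address(announcement))
print(reply_address(note))
```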
The RFC 822 document explicitly says
that users are allowed to invent new headers for their own private use,
provided that these headers start with the string X-. It is guaranteed that no
future headers will use names starting with X-, to avoid conflicts between
official and private headers. Sometimes wiseguy undergraduates make up fields
like X-Fruit-of-the-Day: or X-Disease-of-the-Week:, which are legal, although
not always illuminating.
After the headers comes the message
body. Users can put whatever they want here. Some people terminate their
messages with elaborate signatures, including simple ASCII cartoons, quotations
from greater and lesser authorities, political statements, and disclaimers of
all kinds (e.g., The XYZ Corporation is not responsible for my opinions; in
fact, it cannot even comprehend them).
In the early days of the ARPANET,
e-mail consisted exclusively of text messages written in English and expressed
in ASCII. For this environment, RFC 822 did the job completely: it specified
the headers but left the content entirely up to the users. Nowadays, on the
worldwide Internet, this approach is no longer adequate. The problems include
sending and receiving
- Messages in languages with accents (e.g., French and German).
- Messages in non-Latin alphabets (e.g., Hebrew and Russian).
- Messages in languages without alphabets (e.g., Chinese and Japanese).
- Messages not containing text at all (e.g., audio or images).
A solution was proposed in RFC 1341
and updated in RFCs 2045–2049. This solution, called MIME (Multipurpose
Internet Mail Extensions), is now widely used. We will now describe it. For
additional information about MIME, see the RFCs.
The basic idea of MIME is to
continue to use the RFC 822 format, but to add structure to the message body
and define encoding rules for non-ASCII messages. By not deviating from RFC
822, MIME messages can be sent using the existing mail programs and protocols.
All that has to be changed are the sending and receiving programs, which users
can do for themselves.
MIME defines five new message
headers, as shown in Fig. 7-11. The first of these simply tells the
user agent receiving the message that it is dealing with a MIME message, and
which version of MIME it uses. Any message not containing a MIME-Version:
header is assumed to be an English plaintext message and is processed as such.
The Content-Description: header is
an ASCII string telling what is in the message. This header is needed so the
recipient will know whether it is worth decoding and reading the message. If
the string says: ''Photo of Barbara's hamster'' and the person getting the
message is not a big hamster fan, the message will probably be discarded rather
than decoded into a high-resolution color photograph.
The Content-Id: header identifies
the content. It uses the same format as the standard Message-Id: header.
The Content-Transfer-Encoding: header tells
how the body is wrapped for transmission through a network that may object to
most characters other than letters, numbers, and punctuation marks. Five
schemes (plus an escape to new schemes) are provided. The simplest scheme is
just ASCII text. ASCII characters use 7 bits and can be carried directly by the
e-mail protocol provided that no line exceeds 1000 characters.
The next simplest scheme is the same
thing, but using 8-bit characters, that is, all values from 0 up to and
including 255. This encoding scheme violates the (original) Internet e-mail
protocol but is used by some parts of the Internet that implement some
extensions to the original protocol. While declaring the encoding does not make
it legal, having it explicit may at least explain things when something goes
wrong. Messages using the 8-bit encoding must still adhere to the standard
maximum line length.
Even worse are messages that use
binary encoding. These are arbitrary binary files that not only use all 8 bits
but also do not even respect the 1000-character line limit. Executable programs
fall into this category. No guarantee is given that messages in binary will
arrive correctly, but some people try anyway.
The correct way to encode binary
messages is to use base64 encoding, sometimes called ASCII armor. In this
scheme, groups of 24 bits are broken up into four 6-bit units, with each unit
being sent as a legal ASCII character. The coding is ''A'' for 0, ''B'' for 1,
and so on, followed by the 26 lower-case letters, the ten digits, and finally +
and / for 62 and 63, respectively. The == and = sequences indicate that the
last group contained only 8 or 16 bits, respectively. Carriage returns and line
feeds are ignored, so they can be inserted at will to keep the lines short
enough. Arbitrary binary text can be sent safely using this scheme.
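The encoding rule can be written out directly. The Python sketch below follows the description step by step and checks its result against the base64 module of the standard library; the function name b64_encode is ours, and in practice one would simply call the library routine.

```python
import base64

# 'A'..'Z' for 0-25, 'a'..'z' for 26-51, '0'..'9' for 52-61, then '+' and '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to three bytes into a 24-bit group, zero-filled on the right.
        group = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # Break the group into four 6-bit units.
        units = [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        chars = [ALPHABET[u] for u in units]
        # A final group of 8 bits keeps 2 characters and gets '==';
        # a final group of 16 bits keeps 3 characters and gets '='.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


for sample in (b"M", b"Ma", b"Man"):
    print(sample, b64_encode(sample))
```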
For messages that are almost
entirely ASCII but with a few non-ASCII characters, base64 encoding is somewhat
inefficient. Instead, an encoding known as quoted-printable encoding is used.
This is just 7-bit ASCII, with all the characters above 127 encoded as an equal
sign followed by the character's value as two hexadecimal digits.
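For a quick look at quoted-printable in action, the quopri module of the Python standard library can be used. The sample text and the choice of the Latin-1 character set are assumptions made for the example; in Latin-1 the letter e with an acute accent has the value E9 hexadecimal.

```python
import quopri

text = "Café au lait"
raw = text.encode("latin-1")           # one non-ASCII byte, 0xE9

encoded = quopri.encodestring(raw)     # ASCII only on the wire
decoded = quopri.decodestring(encoded).decode("latin-1")

print(encoded)
print(decoded)
```

Only the one accented character grows to three characters; the rest of the text passes through unchanged, which is why this scheme suits text that is almost entirely ASCII.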
In summary, binary data should be
sent encoded in base64 or quoted-printable form. When there are valid reasons
not to use one of these schemes, it is possible to specify a user-defined
encoding in the Content-Transfer-Encoding: header.
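One way a sending program might pick among these schemes is sketched below in Python. The function choose_encoding and, in particular, the 10 percent threshold for preferring quoted-printable are our own assumptions rather than anything laid down by MIME; the 1000-character limit is the one mentioned above.

```python
MAX_LINE = 1000  # line length limit mentioned in the text


def choose_encoding(data: bytes, qp_threshold: float = 0.1) -> str:
    """Suggest a Content-Transfer-Encoding for a message body."""
    lines = data.split(b"\n")
    high = sum(byte > 127 for byte in data)  # bytes outside 7-bit ASCII
    if high == 0 and all(len(line) <= MAX_LINE for line in lines):
        return "7bit"
    # Mostly ASCII (or ASCII with overlong lines): quoted-printable.
    if data and high / len(data) <= qp_threshold:
        return "quoted-printable"
    # Anything else is treated as binary data.
    return "base64"


print(choose_encoding(b"Hello, world\n"))
print(choose_encoding("Café au lait".encode("latin-1")))
print(choose_encoding(bytes(range(256))))
```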
The last header shown in Fig. 7-11 is really the most interesting one. It
specifies the nature of the message body. Seven types are defined in RFC 2045,
each of which has one or more subtypes. The type and subtype are separated by a
slash, as in
Content-Type: video/mpeg
The subtype must be given explicitly
in the header; no defaults are provided. The initial list of types and subtypes
specified in RFC 2045 is given in Fig. 7-12. Many new ones have been added since
then, and additional entries are being added all the time as the need arises.
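A receiving program has to split the header value at the slash before it can decide what to do with the body. As a small illustration, the email package in the Python standard library exposes the two halves separately; the header value here is the one from the example above.

```python
from email.message import Message

msg = Message()
msg["Content-Type"] = "video/mpeg"

maintype = msg.get_content_maintype()  # the part before the slash
subtype = msg.get_content_subtype()    # the part after the slash
print(maintype, subtype)

# A message with no Content-Type header is treated by this library as text/plain.
plain = Message()
print(plain.get_content_type())
```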
Let us now go briefly through the
list of types. The text type is for straight ASCII text. The text/plain
combination is for ordinary messages that can be displayed as received, with no
encoding and no further processing. This option allows ordinary messages to be
transported in MIME with only a few extra headers.
The text/enriched subtype allows a
simple markup language to be included in the text. This language provides a
system-independent way to express boldface, italics, smaller and larger point
sizes, indentation, justification, sub- and superscripting, and simple page
layout. The markup language is based on SGML, the Standard Generalized Markup Language
also used as the basis for the World Wide Web's HTML. For example, the message
The <bold> time </bold> has come the <italic> walrus </italic> said ...
would be displayed as
The time has come the walrus said ...
with ''time'' in boldface and ''walrus'' in italics.
It is up to the receiving system to choose
the appropriate rendition. If boldface and italics are available, they can be
used; otherwise, colors, blinking, underlining, reverse video, etc., can be
used for emphasis. Different systems can, and do, make different choices.
When the Web became popular, a new
subtype text/html was added (in RFC 2854) to allow Web pages to be sent in RFC
822 e-mail. A subtype for the extensible markup language, text/xml, is defined
in RFC 3023.
The next MIME type is image, which
is used to transmit still pictures. Many formats are widely used for storing
and transmitting images nowadays, both with and without compression. Two of
these, GIF and JPEG, are built into nearly all browsers, but many others exist
as well and have been added to the original list.
The audio and video types are for
sound and moving pictures, respectively. Please note that video includes only
the visual information, not the soundtrack. If a movie with sound is to be
transmitted, the video and audio portions may have to be transmitted
separately, depending on the encoding system used. The first video format
defined was the one devised by the modestly-named Moving Picture Experts Group
(MPEG), but others have been added since. In addition to audio/basic, a new
audio subtype, audio/mpeg, was added in RFC 3003 to allow people to e-mail MP3
audio files.
The application type is a catchall
for formats that require external processing not covered by one of the other
types. An octet-stream is just a sequence of uninterpreted bytes. Upon
receiving such a stream, a user agent should probably display it by suggesting
to the user that it be copied to a file and prompting for a file name.
Subsequent processing is then up to the user.
The other defined subtype is postscript,
which refers to the PostScript language defined by Adobe Systems and widely
used for describing printed pages. Many printers have built-in PostScript
interpreters. Although a user agent can just call an external PostScript
interpreter to display incoming PostScript files, doing so is not without some
danger. PostScript is a full-blown programming language. Given enough time, a
sufficiently masochistic person could write a C compiler or a database
management system in PostScript. Displaying an incoming PostScript message is
done by executing the PostScript program contained in it. In addition to
displaying some text, this program can read, modify, or delete the user's
files, and have other nasty side effects.
The message type allows one message
to be fully encapsulated inside another. This scheme is useful for forwarding
e-mail, for example. When a complete RFC 822 message is encapsulated inside an
outer message, the rfc822 subtype should be used.
The partial subtype makes it
possible to break an encapsulated message into pieces and send them separately
(for example, if the encapsulated message is too long). Parameters make it
possible to reassemble all the parts at the destination in the correct order.
Finally, the external-body subtype
can be used for very long messages (e.g., video films). Instead of including
the MPEG file in the message, an FTP address is given and the receiver's user
agent can fetch it over the network at the time it is needed. This facility is
especially useful when sending a movie to a mailing list of people, only a few
of whom are expected to view it (think about electronic junk mail containing
advertising videos).
The final type is multipart, which
allows a message to contain more than one part, with the beginning and end of
each part being clearly delimited. The mixed subtype allows each part to be
different, with no additional structure imposed. Many e-mail programs allow the
user to provide one or more attachments to a text message. These attachments
are sent using the multipart type.
In contrast to mixed, the alternative
subtype allows the same message to be included multiple times but expressed in
two or more different media. For example, a message could be sent in plain
ASCII, in enriched text, and in PostScript. A properly-designed user agent
getting such a message would display it in PostScript if possible. Second
choice would be enriched text. If neither of these were possible, the flat
ASCII text would be displayed. The parts should be ordered from simplest to most
complex to help recipients with pre-MIME user agents make some sense of the
message (e.g., even a pre-MIME user can read flat ASCII text).
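To see what such a message looks like, one can build a small one with the email package of the Python standard library. In this sketch we have substituted an HTML version for the enriched-text and PostScript versions discussed above, simply because the library makes that case easy; the subject and the wording are invented. The simplest version is added first, in keeping with the ordering rule.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Birthday greeting"

# Simplest rendition first: flat ASCII text.
msg.set_content("Happy birthday to you")
# A richer rendition of the same content follows it.
msg.add_alternative("<p><b>Happy birthday</b> to you</p>", subtype="html")

parts = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type())
print(parts)
print(msg.as_string())
```

Printing the message shows the software-generated boundary string that separates the parts.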
The alternative subtype can also be
used for multiple languages. In this context, the Rosetta Stone can be thought
of as an early multipart/alternative message.
A multimedia example is shown in Fig. 7-13. Here a birthday greeting is
transmitted both as text and as a song. If the receiver has an audio
capability, the user agent there will fetch the sound file, birthday.snd, and
play it. If not, the lyrics are displayed on the screen in stony silence. The
parts are delimited by two hyphens followed by a (software-generated) string
specified in the boundary parameter.
Note that the Content-Type header
occurs in three positions within this example. At the top level, it indicates
that the message has multiple parts. Within each part, it gives the type and
subtype of that part. Finally, within the body of the second part, it is
required to tell the user agent what kind of an external file it is to fetch.
To indicate this slight difference in usage, we have used lower case letters
here, although all headers are case insensitive. The content-transfer-encoding
is similarly required for any external body that is not encoded as 7-bit ASCII.
Getting back to the subtypes for
multipart messages, two more possibilities exist. The parallel subtype is used
when all parts must be ''viewed'' simultaneously. For example, movies often
have an audio channel and a video channel. Movies are more effective if these
two channels are played back in parallel, instead of consecutively.
Finally, the digest subtype is used
when many messages are packed together into a composite message. For example,
some discussion groups on the Internet collect messages from subscribers and
then send them out to the group as a single multipart/digest message.
The message transfer system is
concerned with relaying messages from the originator to the recipient. The
simplest way to do this is to establish a transport connection from the source
machine to the destination machine and then just transfer the message. After
examining how this is normally done, we will examine some situations in which
this does not work and what can be done about them.
Within the Internet, e-mail is
delivered by having the source machine establish a TCP connection to port 25 of
the destination machine. Listening to this port is an e-mail daemon that speaks
SMTP (Simple Mail Transfer Protocol). This daemon accepts incoming connections
and copies messages from them into the appropriate mailboxes. If a message
cannot be delivered, an error report containing the first part of the
undeliverable message is returned to the sender.
SMTP is a simple ASCII protocol.
After establishing the TCP connection to port 25, the sending machine,
operating as the client, waits for the receiving machine, operating as the
server, to talk first. The server starts by sending a line of text giving its
identity and telling whether it is prepared to receive mail. If it is not, the
client releases the connection and tries again later.
If the server is willing to accept
e-mail, the client announces whom the e-mail is coming from and whom it is
going to. If such a recipient exists at the destination, the server gives the
client the go-ahead to send the message. Then the client sends the message and
the server acknowledges it. No checksums are needed because TCP provides a
reliable byte stream. If there is more e-mail, that is now sent. When all the
e-mail has been exchanged in both directions, the connection is released. A
sample dialog for sending the message of Fig. 7-13, including the numerical codes used by
SMTP, is shown in Fig. 7-14. The lines sent by the client are
marked C:. Those sent by the server are marked S:.
A few comments about Fig. 7-14 may be helpful. The first command from
the client is indeed HELO. Of the various four-character abbreviations for HELLO,
this one has numerous advantages over its biggest competitor. Why all the
commands had to be four characters has been lost in the mists of time.
In Fig. 7-14, the message is sent to only one
recipient, so only one RCPT command is used. Several RCPT commands may be issued
to send a single message to multiple receivers. Each one is individually acknowledged
or rejected. Even if some recipients are rejected (because they do not exist at
the destination), the message can be sent to the other ones.
Finally, although the syntax of the
four-character commands from the client is rigidly specified, the syntax of the
replies is less rigid. Only the numerical code really counts. Each
implementation can put whatever string it wants after the code.
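Because only the numerical code counts, a client can ignore the text that follows it. The following is a minimal sketch in Python, not taken from any particular mail program. The helper names parse_reply and is_positive are made up for illustration, and the rule that codes in the 200s and 300s let the dialog continue comes from the SMTP specification rather than from the text above.

```python
def parse_reply(line):
    """Split one SMTP reply line into (code, text).

    Only the three-digit code matters to the client; the text after it
    is free-form and differs from one implementation to another.
    """
    line = line.rstrip("\r\n")
    code = int(line[:3])
    text = line[4:]  # skip the single separator character after the code
    return code, text


def is_positive(code):
    # Assumption from the SMTP specification: 2xx and 3xx replies mean
    # the dialog may continue, 4xx and 5xx replies signal a failure.
    return 200 <= code < 400


# Two servers may word the same reply differently; the code is what counts.
for reply in ("250 OK", "250 sender accepted, go ahead", "550 no such user here"):
    code, text = parse_reply(reply)
    print(code, is_positive(code), repr(text))
```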
To get a better feel for how SMTP
and some of the other protocols described in this chapter work, try them out.
In all cases, first go to a machine connected to the Internet. On a UNIX
system, in a shell, type
telnet mail.isp.com 25
substituting the DNS name of your
ISP's mail server for mail.isp.com. On a Windows system, click on Start, then
Run, and type the command in the dialog box. This command will establish a
telnet (i.e., TCP) connection to port 25 on that machine. Port 25 is the SMTP
port (see Fig. 6-27 for some common ports). You will
probably get a response something like this:
Trying 192.30.200.66...
Connected to mail.isp.com
Escape character is '^]'.
220 mail.isp.com Smail #74 ready at Thu, 25 Sept 2002 13:26 +0200
The first three lines are from
telnet telling you what it is doing. The last line is from the SMTP server on
the remote machine announcing its willingness to talk to you and accept e-mail.
To find out what commands it accepts, type
HELP
From this point on, a command
sequence such as the one in Fig. 7-14 is possible, starting with the client's
HELO command.
It is worth noting that the use of
lines of ASCII text for commands is not an accident. Most Internet protocols
work this way. Using ASCII text makes the protocols easy to test and debug.
They can be tested by sending commands manually, as we saw above, and dumps of
the messages are easy to read.
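Since the commands are plain lines of text, the client's half of an SMTP dialog is easy to generate. The sketch below lists, in order, the lines a client would send for one message. The server's replies between the lines are left out. The function name client_lines and the host and mailbox names are hypothetical, and the doubling of a leading period inside the message body is a detail of the SMTP specification that the text above does not discuss.

```python
def client_lines(client_host, sender, recipients, message_lines):
    """Return the lines an SMTP client sends for one message, in order.

    A real client waits for the server's reply after each command;
    that waiting is omitted here.
    """
    lines = ["HELO " + client_host, "MAIL FROM:<" + sender + ">"]
    for rcpt in recipients:
        # One RCPT command per recipient; each is acknowledged separately.
        lines.append("RCPT TO:<" + rcpt + ">")
    lines.append("DATA")
    for text in message_lines:
        # A body line starting with a period gets an extra period so the
        # server does not mistake it for the end-of-message marker.
        lines.append("." + text if text.startswith(".") else text)
    lines.append(".")      # a period on a line by itself ends the message
    lines.append("QUIT")
    return lines


for line in client_lines("abc.example", "elinor@abc.example",
                         ["carolyn@xyz.example"], ["Subject: Hi", "", "Hello."]):
    print("C:", line)
```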
Even though SMTP is
well defined, a few problems can still arise. One problem relates to
message length. Some older implementations cannot handle messages exceeding 64
KB. Another problem relates to timeouts. If the client and server have
different timeouts, one of them may give up while the other is still busy,
unexpectedly terminating the connection. Finally, in rare situations, infinite
mailstorms can be triggered. For example, if host 1 holds mailing list A and
host 2 holds mailing list B and each list contains an entry for the other one,
then a message sent to either list could generate a never-ending amount of
e-mail traffic unless somebody checks for it.
To get around some of these
problems, extended SMTP (ESMTP) has been defined in RFC 2821. Clients wanting
to use it should send an EHLO message instead of HELO initially. If this is
rejected, then the server is a regular SMTP server, and the client should
proceed in the usual way. If the EHLO is accepted, then new commands and
parameters are allowed.
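The fallback from EHLO to HELO can be sketched as follows. This is an illustration only. The function greet and its send_command parameter are hypothetical: send_command stands for whatever routine sends one command line to the server and returns the numerical reply code.

```python
def greet(send_command, client_host):
    """Try EHLO first and fall back to HELO if the server rejects it.

    send_command is a caller-supplied function (an assumption of this
    sketch) that sends one command line and returns the reply code.
    """
    code = send_command("EHLO " + client_host)
    if 200 <= code < 300:
        return "ESMTP"   # extended commands and parameters may be used
    code = send_command("HELO " + client_host)
    if 200 <= code < 300:
        return "SMTP"    # regular server: proceed in the usual way
    raise RuntimeError("server refused both EHLO and HELO")


def old_server(command):
    # Stand-in for a regular SMTP server that does not know EHLO.
    return 250 if command.startswith("HELO") else 500


print(greet(old_server, "abc.example"))
```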
Up until now, we have assumed that
all users work on machines that are capable of sending and receiving e-mail. As
we saw, e-mail is delivered by having the sender establish a TCP connection to
the receiver and then ship the e-mail over it. This model worked fine for
decades when all ARPANET (and later Internet) hosts were, in fact, on-line all
the time to accept TCP connections.
However, with the advent of people
who access the Internet by calling their ISP over a modem, it breaks down. The
problem is this: what happens when Elinor wants to send Carolyn e-mail and
Carolyn is not currently on-line? Elinor cannot establish a TCP connection to
Carolyn and thus cannot run the SMTP protocol.
One solution is to have a message transfer
agent on an ISP machine accept e-mail for its customers and store it in their
mailboxes on an ISP machine. Since this agent can be on-line all the time,
e-mail can be sent to it 24 hours a day.
Unfortunately, this solution creates
another problem: how does the user get the e-mail from the ISP's message
transfer agent? The solution to this problem is to create another protocol that
allows user agents (on client PCs) to contact the message transfer
agent (on the ISP's machine) and allows e-mail to be copied from the ISP to the
user. One such protocol is POP3 (Post Office Protocol Version 3), which is
described in RFC 1939.
The situation that used to hold
(both sender and receiver having a permanent connection to the Internet) is
illustrated in Fig. 7-15(a). A situation in which the sender is
(currently) on-line but the receiver is not is illustrated in Fig. 7-15(b).
Figure 7-15. (a) Sending and reading mail when the receiver
has a permanent Internet connection and the user agent runs on the same machine
as the message transfer agent. (b) Reading e-mail when the receiver has a
dial-up connection to an ISP.
POP3 begins when the user starts the
mail reader. The mail reader calls up the ISP (unless there is already a
connection) and establishes a TCP connection with the message transfer agent at
port 110. Once the connection has been established, the POP3 protocol goes
through three states in sequence:
- Authorization.
- Transaction.
- Update.
The authorization state deals with
having the user log in. The transaction state deals with the user collecting
the e-mails and marking them for deletion from the mailbox. The update state
actually causes the e-mails to be deleted.
This behavior can be observed by
typing something like:
telnet mail.isp.com 110
where mail.isp.com represents the
DNS name of your ISP's mail server. Telnet establishes a TCP connection to port
110, on which the POP3 server listens. Upon accepting the TCP connection, the
server sends an ASCII message announcing that it is present. Usually, it begins
with +OK followed by a comment. An example scenario is shown in Fig. 7-16 starting after the TCP connection has
been established. As before, the lines marked C: are from the client (user) and
those marked S: are from the server (message transfer agent on the ISP's
machine).
During the authorization state, the
client sends over its user name and then its password. After a successful
login, the client can then send over the LIST command, which causes the server to
list the contents of the mailbox, one message per line, giving the length of
that message. The list is terminated by a period.
Then the client can retrieve
messages using the RETR command and mark them for deletion with DELE. When all
messages have been retrieved (and possibly marked for deletion), the client
gives the QUIT command to terminate the transaction state and enter the update
state. When the server has deleted all the messages, it sends a reply and breaks
the TCP connection.
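The three states can be modeled as a small state machine. The sketch below is a toy model of the server side of one session, not a real POP3 server: the class name Pop3Session is made up, the mailbox is just a Python list, lengths are counted in characters rather than bytes, and only the commands mentioned above are handled. Note that DELE only marks a message; nothing is removed until QUIT moves the session into the update state.

```python
class Pop3Session:
    """Toy model of the server side of one POP3 session (not a real server)."""

    def __init__(self, user, password, mailbox):
        self.user = user
        self.password = password
        self.mailbox = mailbox        # list of message strings
        self.state = "AUTHORIZATION"
        self.claimed_user = None
        self.marked = set()           # message numbers marked by DELE

    def _number(self, arg):
        # Return arg as a valid, unmarked message number, or None.
        if arg.isdigit():
            n = int(arg)
            if 1 <= n <= len(self.mailbox) and n not in self.marked:
                return n
        return None

    def handle(self, line):
        verb, _, arg = line.partition(" ")
        verb = verb.upper()
        if self.state == "AUTHORIZATION":
            if verb == "USER":
                self.claimed_user = arg
                return "+OK"
            if verb == "PASS" and self.claimed_user == self.user and arg == self.password:
                self.state = "TRANSACTION"
                return "+OK"
        elif self.state == "TRANSACTION":
            if verb == "LIST":
                rows = ["%d %d" % (n, len(m))
                        for n, m in enumerate(self.mailbox, 1) if n not in self.marked]
                return "\n".join(["+OK"] + rows + ["."])
            if verb == "RETR" and self._number(arg):
                return "+OK\n" + self.mailbox[int(arg) - 1] + "\n."
            if verb == "DELE" and self._number(arg):
                self.marked.add(int(arg))
                return "+OK"
            if verb == "QUIT":
                # Entering the update state is what actually removes messages.
                self.state = "UPDATE"
                self.mailbox[:] = [m for n, m in enumerate(self.mailbox, 1)
                                   if n not in self.marked]
                return "+OK"
        return "-ERR"


session = Pop3Session("carolyn", "secret", ["first message", "second message"])
for command in ("USER carolyn", "PASS secret", "LIST", "RETR 1", "DELE 1", "QUIT"):
    print("C:", command)
    print("S:", session.handle(command).replace("\n", "\nS: "))
```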
While it is true that the POP3
protocol supports the ability to download a specific message or set of messages
and leave them on the server, most e-mail programs just download everything and
empty the mailbox. This behavior means that in practice, the only copy is on
the user's hard disk. If that crashes, all e-mail may be lost permanently.
Let us now briefly summarize how
e-mail works for ISP customers. Elinor creates a message for Carolyn using some
e-mail program (i.e., user agent) and clicks on an icon to send it. The e-mail
program hands the message over to the message transfer agent on Elinor's host.
The message transfer agent sees that it is directed to carolyn@xyz.com so it
uses DNS to look up the MX record for xyz.com (where xyz.com is Carolyn's ISP).
This query returns the DNS name of xyz.com's mail server. The message transfer
agent now looks up the IP address of this machine using DNS again, for example,
using gethostbyname. It then establishes a TCP connection to the SMTP server on
port 25 of this machine. Using an SMTP command sequence analogous to that of Fig. 7-14, it transfers the message to Carolyn's
mailbox and breaks the TCP connection.
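The two DNS lookups made by the sending message transfer agent can be sketched as follows. To keep the example self-contained, two small dictionaries stand in for DNS. Every host name and address in them is made up, and a real agent would query DNS instead.

```python
# Stand-in DNS data; every name and address here is invented for the example.
MX_RECORDS = {"xyz.example": "mail.xyz.example"}
ADDRESS_RECORDS = {"mail.xyz.example": "192.0.2.25"}
SMTP_PORT = 25


def smtp_endpoint(recipient):
    """Return (ip_address, port) of the server accepting mail for recipient."""
    _, _, domain = recipient.rpartition("@")
    mail_host = MX_RECORDS[domain]            # first lookup: the MX record
    ip_address = ADDRESS_RECORDS[mail_host]   # second lookup: that host's address
    return ip_address, SMTP_PORT


print(smtp_endpoint("carolyn@xyz.example"))
```

The pair returned is what the agent would use to open its TCP connection before starting the SMTP dialog.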
In due course of time, Carolyn boots
up her PC, connects to her ISP, and starts her e-mail program. The e-mail
program establishes a TCP connection to the POP3 server at port 110 of the
ISP's mail server machine. The DNS name or IP address of this machine is
typically configured when the e-mail program is installed or the subscription
to the ISP is made. After the TCP connection has been established, Carolyn's
e-mail program runs the POP3 protocol to fetch the contents of the mailbox to
her hard disk using commands similar to those of Fig. 7-16. Once all the e-mail has been
transferred, the TCP connection is released. In fact, the connection to the ISP
can also be broken now, since all the e-mail is on Carolyn's hard disk. Of
course, to send a reply, the connection to the ISP will be needed again, so it
is not generally broken right after fetching the e-mail.
For a user with one e-mail account
at one ISP that is always accessed from one PC, POP3 works fine and is widely
used due to its simplicity and robustness. However, it is a computer-industry
truism that as soon as something works well, somebody will start demanding more
features (and getting more bugs). That happened with e-mail, too. For example,
many people have a single e-mail account at work or school and want to access
it from work, from their home PC, from their laptop when on business trips, and
from cybercafes when on so-called vacation. While POP3 allows this, since it
normally downloads all stored messages at each contact, the result is that the
user's e-mail quickly gets spread over multiple machines, more or less at
random, some of them not even the user's.
This disadvantage gave rise to an
alternative final delivery protocol, IMAP (Internet Message Access Protocol),
which is defined in RFC 2060. Unlike POP3, which basically assumes that the
user will clear out the mailbox on every contact and work off-line after that,
IMAP assumes that all the e-mail will remain on the server indefinitely in
multiple mailboxes. IMAP provides extensive mechanisms for reading messages or
even parts of messages, a feature useful when using a slow modem to read the
text part of a multipart message with large audio and video attachments. Since
the working assumption is that messages will not be transferred to the user's
computer for permanent storage, IMAP provides mechanisms for creating,
destroying, and manipulating multiple mailboxes on the server. In this way a
user can maintain a mailbox for each correspondent and move messages there from
the inbox after they have been read.
IMAP has many features, such as the
ability to address mail not by arrival number as is done in Fig. 7-8, but by using attributes (e.g., give me
the first message from Bobbie). Unlike POP3, IMAP can also accept outgoing
e-mail for shipment to the destination as well as deliver incoming e-mail.
The general style of the IMAP
protocol is similar to that of POP3 as shown in Fig. 7-16, except that there are dozens of
commands. The IMAP server listens to port 143. A comparison of POP3 and IMAP is
given in Fig. 7-17. It should be noted, however, that not
every ISP supports both protocols and not every e-mail program supports both
protocols. Thus, when choosing an e-mail program, it is important to find out
which protocol(s) it supports and make sure the ISP supports at least one of
them.
Independently of whether POP3 or
IMAP is used, many systems provide hooks for additional processing of incoming
e-mail. An especially valuable feature for many e-mail users is the ability to
set up filters. These are rules that are checked when e-mail comes in or when
the user agent is started. Each rule specifies a condition and an action. For
example, a rule could say that any message received from the boss goes to
mailbox number 1, any message from a select group of friends goes to mailbox
number 2, and any message containing certain objectionable words in the Subject
line is discarded without comment.
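A rule set of this kind can be sketched as a list of (condition, action) pairs that are checked in order. The addresses, the objectionable word, and the helper names below are all hypothetical, and the choice that the first matching rule wins is an assumption of this sketch. Real mail programs differ in how they express and order rules.

```python
def from_any(*senders):
    # Condition: the message comes from one of the given addresses.
    return lambda msg: msg["From"] in senders


def subject_contains(*words):
    # Condition: the Subject line contains one of the given words.
    return lambda msg: any(w in msg["Subject"].lower() for w in words)


RULES = [
    (from_any("boss@example.com"), "mailbox 1"),
    (from_any("amy@example.com", "raj@example.com"), "mailbox 2"),
    (subject_contains("lottery"), "discard"),
]


def apply_rules(msg, rules=RULES, default="inbox"):
    """Return the action of the first rule whose condition matches msg."""
    for condition, action in rules:
        if condition(msg):
            return action
    return default


print(apply_rules({"From": "boss@example.com", "Subject": "Budget"}))
print(apply_rules({"From": "x@example.com", "Subject": "You won the LOTTERY"}))
```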
Some ISPs provide a filter that
automatically categorizes incoming e-mail as either important or spam (junk
e-mail) and stores each message in the corresponding mailbox. Such filters
typically work by first checking to see if the source is a known spammer. Then
they usually examine the subject line. If hundreds of users have just received
a message with the same subject line, it is probably spam. Other techniques are
also used for spam detection.
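The two checks just described can be sketched in a few lines. This is only an illustration of the idea, not how any particular ISP does it. The spammer address, the class name SpamFilter, and the cutoff of 100 recipients are assumptions chosen for the example.

```python
KNOWN_SPAMMERS = {"bulk@spam.example"}   # hypothetical blacklist
THRESHOLD = 100                          # assumed cutoff for "hundreds of users"


class SpamFilter:
    def __init__(self):
        self.recipients_by_subject = {}   # subject line -> set of recipients

    def classify(self, sender, recipient, subject):
        # First check: is the source a known spammer?
        if sender in KNOWN_SPAMMERS:
            return "spam"
        # Second check: how many users have received this subject line?
        seen = self.recipients_by_subject.setdefault(subject, set())
        seen.add(recipient)
        return "spam" if len(seen) >= THRESHOLD else "important"


f = SpamFilter()
labels = [f.classify("a@example.com", "user%d@isp.example" % i, "Cheap pills")
          for i in range(100)]
print(labels[0], labels[-1])
```

In this sketch the early copies of a mass mailing slip through before the count reaches the cutoff, which is one reason other techniques are used as well.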
Another delivery feature often
provided is the ability to (temporarily) forward incoming e-mail to a different
address. This address can even be a computer operated by a commercial paging
service, which then pages the user by radio or satellite, displaying the Subject:
line on his pager.
Still another common feature of
final delivery is the ability to install a vacation daemon. This is a program
that examines each incoming message and sends the sender an insipid reply such
as
Hi. I'm on vacation. I'll be back on the 24th of August. Have a nice summer.
Such replies can also specify how to
handle urgent matters in the interim, other people to contact for specific
problems, etc. Most vacation daemons keep track of whom they have sent canned
replies to and refrain from sending the same person a second reply. The good
ones also check to see if the incoming message was sent to a mailing list, and
if so, do not send a canned reply at all. (People who send messages to large
mailing lists during the summer probably do not want to get hundreds of replies
detailing everyone's vacation plans.)
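The bookkeeping done by a well-behaved vacation daemon can be sketched as follows. The class name VacationDaemon and the via_mailing_list flag are hypothetical. How a real daemon decides that a message arrived through a mailing list is not covered here.

```python
class VacationDaemon:
    def __init__(self, reply_text):
        self.reply_text = reply_text
        self.already_answered = set()   # senders who already got the reply

    def on_message(self, sender, via_mailing_list=False):
        """Return the canned reply for sender, or None to stay silent."""
        if via_mailing_list or sender in self.already_answered:
            return None
        self.already_answered.add(sender)
        return self.reply_text


daemon = VacationDaemon("Hi. I'm on vacation. I'll be back on the 24th of August.")
print(daemon.on_message("elinor@abc.example"))   # first message: reply sent
print(daemon.on_message("elinor@abc.example"))   # second message: no reply
```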
The author once ran into an extreme
form of delivery processing when he sent an e-mail message to a person who
claims to get 600 messages a day. His identity will not be disclosed here, lest
half the readers of this book also send him e-mail. Let us call him John.
John has installed an e-mail robot
that checks every incoming message to see if it is from a new correspondent. If
so, it sends back a canned reply explaining that John can no longer personally
read all his e-mail. Instead, he has produced a personal FAQ (Frequently Asked
Questions) document that answers many questions he is commonly asked. Normally,
newsgroups have FAQs, not people.
John's FAQ gives his address, fax,
and telephone numbers and tells how to contact his company. It explains how to
get him as a speaker and describes where to get his papers and other documents.
It also provides pointers to software he has written, a conference he is
running, a standard he is the editor of, and so on. Perhaps this approach is
necessary, but maybe a personal FAQ is the ultimate status symbol.
One final topic worth mentioning is
Webmail. Some Web sites, for example, Hotmail and Yahoo, provide e-mail service
to anyone who wants it. They work as follows. They have normal message transfer
agents listening to port 25 for incoming SMTP connections. To contact, say,
Hotmail, you have to acquire their DNS MX record, for example, by typing
host -a -v hotmail.com
on a UNIX system. Suppose that the
mail server is called mx10.hotmail.com. Then by typing
telnet mx10.hotmail.com 25
you can establish a TCP connection
over which SMTP commands can be sent in the usual way. So far, nothing unusual,
except that these big servers are often busy, so it may take several attempts
to get a TCP connection accepted.
The interesting part is how e-mail
is delivered. Basically, when the user goes to the e-mail Web page, a form is
presented in which the user is asked for a login name and password. When the
user clicks on Sign In, the login name and password are sent to the server,
which then validates them. If the login is successful, the server finds the
user's mailbox and builds a listing similar to that of Fig. 7-8, only formatted as a Web page in HTML.
The Web page is then sent to the browser for display. Many of the items on the
page are clickable, so messages can be read, deleted, and so on.
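The step in which the server turns a mailbox into a Web page can be sketched as follows. This is not how Hotmail or Yahoo actually build their pages. The function name mailbox_page, the message fields, and the /read link are invented for the example.

```python
from html import escape


def mailbox_page(messages):
    """Format a mailbox listing as an HTML page with one clickable row per message."""
    rows = []
    for number, msg in enumerate(messages, 1):
        rows.append(
            '<tr><td>{n}</td><td>{f}</td>'
            '<td><a href="/read?msg={n}">{s}</a></td></tr>'.format(
                n=number,
                f=escape(msg["from"]),      # escape text taken from the message
                s=escape(msg["subject"]),
            )
        )
    return "<html><body><table>\n" + "\n".join(rows) + "\n</table></body></html>"


print(mailbox_page([{"from": "elinor@abc.example", "subject": "Lunch on Friday?"}]))
```

Escaping the sender and subject matters because they come from the incoming e-mail and could otherwise inject markup into the page.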