ELECTRONIC MAIL: SMTP AND MIME
The most heavily used application in virtually any distributed system is electronic mail. From the start, the Simple Mail Transfer Protocol (SMTP) has been the workhorse of the TCP/IP protocol suite. However, SMTP has traditionally been limited to the delivery of simple text messages. In recent years, there has been a demand for mail to be able to contain various types of data, including voice, images, and video clips. To satisfy this requirement, a new electronic mail standard, which builds on SMTP, has been defined: the Multipurpose Internet Mail Extensions (MIME). In this section, we first examine SMTP, then look at MIME.
Simple Mail Transfer Protocol (SMTP)
SMTP is the standard protocol for transferring mail between hosts in the TCP/IP protocol suite; it is defined in RFC 821.
Although messages transferred by SMTP usually follow the format defined in RFC 822, described later, SMTP is not concerned with the format or content of messages themselves, with two exceptions. This concept is often expressed by saying that SMTP uses information written on the envelope of the mail (message header), but does not look at the contents (message body) of the envelope. The two exceptions are
1. SMTP standardizes the message character set as 7-bit ASCII.
2. SMTP adds log information to the start of the delivered message that indicates the path the message took.
Basic Electronic Mail Operation
Figure 19.9 illustrates the overall flow of mail in a typical system. Although much of this activity is outside the scope of SMTP, the figure illustrates the context within which SMTP typically operates.
To begin, mail is created by a user-agent program in response to user input. Each created message consists of a header that includes the recipient's email address and other information, and a body containing the message to be sent. These messages are then queued in some fashion and provided as input to an SMTP sender program, which is typically an always-present server program on the host. Although the structure of the outgoing-mail queue will differ depending on the host's operating system, each queued message, conceptually, has two parts:
1. The message text, consisting of
* The RFC 822 header: the message envelope, which includes an indication of the intended recipient or recipients.
* The body of the message, composed by the user.
2. A list of mail destinations.
The list of mail destinations for the message is derived by the user agent from the RFC 822 message header. In some cases, the destination, or destinations, are literally specified in the message header. In other cases, the user agent may need to expand mailing list names, remove duplicates, and replace mnemonic names with actual mailbox names. If any blind carbon copies (BCC) are indicated, the user agent needs to prepare messages that conform to this requirement. The basic idea is that the multiple formats and styles preferred by humans in the user interface are replaced by a standardized list suitable for the SMTP sender program.
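As a rough illustration of the user agent's job described above, the following Python sketch expands mailing-list or mnemonic names through an alias table, removes duplicates, and keeps BCC recipients on the destination list while dropping the Bcc field from the header that is transmitted. The alias table, the function names, and the choice to simply delete the Bcc field are assumptions for illustration (deleting the field is only one way of handling blind copies), not the behavior of any particular user agent.

```python
def destination_list(to, cc, bcc, aliases):
    """Turn the addresses a user typed into the list given to the SMTP sender.

    to, cc, bcc: lists of address strings from the message header.
    aliases: hypothetical table mapping mailing-list or mnemonic names
             to lists of member names (members may themselves be aliases).
    """
    result, seen = [], set()

    def add(name, depth=0):
        if depth > 10:                      # crude guard against alias loops
            raise ValueError(f"alias loop at {name!r}")
        if name in aliases:                 # expand list or mnemonic names
            for member in aliases[name]:
                add(member, depth + 1)
        elif name not in seen:              # remove duplicates
            seen.add(name)
            result.append(name)

    # BCC recipients still receive the message, so they join the list too.
    for name in [*to, *cc, *bcc]:
        add(name.strip())
    return result


def strip_bcc(headers):
    """Return a copy of the header dict without the Bcc field."""
    return {k: v for k, v in headers.items() if k.lower() != "bcc"}


if __name__ == "__main__":
    aliases = {"team": ["Jones@Beta.ARPA", "Brown@Beta.ARPA"]}
    print(destination_list(["team"], ["Jones@Beta.ARPA"], ["Green@Gamma.ARPA"], aliases))
```

The SMTP sender only ever sees the flat list, which is the point of the paragraph above: human-friendly names are resolved before transmission.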
The SMTP sender takes messages from the outgoing mail queue and transmits them to the proper destination host via SMTP transactions over one or more TCP connections to port 25 on the target hosts. A host may have multiple SMTP senders active simultaneously if it has a large volume of outgoing mail, and it should also have the capability of creating SMTP receivers on demand so that mail from one host cannot delay mail from another.
Whenever the SMTP sender completes delivery of a particular message to one or more users on a specific host, it deletes the corresponding destinations from that message's destination list. When all destinations for a particular message are processed, the message is deleted from the queue. In processing a queue, the SMTP sender can perform a variety of optimizations. If a particular message is sent to multiple users on a single host, the message text need be sent only once. If multiple messages are ready to send to the same host, the SMTP sender can open a TCP connection, transfer the multiple messages, and then close the connection, rather than opening and closing a connection for each message.
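Both optimizations can be sketched by grouping queued work by destination host, so that one connection per host carries every message bound for that host and each message text crosses that connection once. This is a hypothetical sketch: the queue layout and function name are illustrative, and it takes the part of the address after "@" as the host name, whereas a real SMTP sender would look up the mail servers for that domain.

```python
from collections import defaultdict


def plan_deliveries(queue):
    """Group queued work by destination host.

    queue: {message_id: [recipient addresses still to be served]}
    Returns {host: {message_id: [recipients on that host]}}, so the sender can
    open one TCP connection per host, transfer every message bound for it, and
    send each message text once, naming each local recipient separately.
    """
    plan = defaultdict(lambda: defaultdict(list))
    for message_id, recipients in queue.items():
        for rcpt in recipients:
            # Simplification: treat the domain part of the address as the host.
            host = rcpt.rpartition("@")[2].lower()
            plan[host][message_id].append(rcpt)
    return {host: dict(messages) for host, messages in plan.items()}


if __name__ == "__main__":
    queue = {"m1": ["Jones@Beta.ARPA", "Lee@Gamma.ARPA"], "m2": ["Brown@Beta.ARPA"]}
    for host, messages in plan_deliveries(queue).items():
        print(host, messages)
```

As each host's deliveries complete, the sender would remove those recipients from the queue entry and drop the message once its list is empty.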
The SMTP sender must deal with a variety of errors. The destination host may be unreachable or out of operation, or the TCP connection may fail while mail is being transferred. The sender can requeue the mail for later delivery but should give up after some period rather than keep the message in the queue indefinitely. A common error is a faulty destination address, which can occur due to user-input error or because the intended destination user has a new address on a different host. The SMTP sender must either redirect the message, if possible, or return an error notification to the message's originator.
The SMTP protocol is used to transfer a message from the SMTP sender to the SMTP receiver over a TCP connection. SMTP attempts to provide reliable operation but does not guarantee recovery from lost messages. No end-to-end acknowledgment is returned to a message's originator that a message has been successfully delivered to the message's recipient, and error indications are not guaranteed to be returned either. However, the SMTP-based mail system is generally considered reliable.
The SMTP receiver accepts each arriving message and either places it in the appropriate user mailbox or copies it to the local outgoing mail queue if forwarding is required. The SMTP receiver must be able to verify local mail destinations and deal with errors, including transmission errors and lack of disk file capacity.
The SMTP sender is responsible for a message up to the point where the SMTP receiver indicates that the transfer is complete; however, this simply means that the message has arrived at the SMTP receiver, not that the message has been delivered to and retrieved by the intended recipient. The SMTP receiver's error-handling responsibilities are generally limited to giving up on TCP connections that fail or are inactive for very long periods. Thus, the sender has most of the error-recovery responsibility. Errors in the completion indication (for example, a connection that fails after the receiver has accepted the message but before the sender receives the final reply) may cause duplicate, but not lost, messages.
In most cases, messages go directly from the mail originator's machine to the destination machine over a single TCP connection. However, mail will occasionally go through intermediate machines via an SMTP forwarding capability, in which case the message must traverse multiple TCP connections between source and destination. One way for this to happen is for the sender to specify a route to the destination in the form of a sequence of servers. A more common event is forwarding required because a user has moved.
It is important to note that the SMTP protocol is limited to the conversation that takes place between the SMTP sender and the SMTP receiver. SMTP's main function is the transfer of messages, although there are some ancillary functions dealing with mail destination verification and handling. The rest of the mail-handling apparatus depicted in Figure 19.9 is beyond the scope of SMTP and may differ from one system to another.
We now turn to a discussion of the main elements of SMTP.
SMTP Overview
The operation of SMTP consists of a series of commands and responses exchanged between the SMTP sender and receiver. The initiative is with the SMTP sender, which establishes the TCP connection. Once the connection is established, the SMTP sender sends commands over the connection to the receiver. Each command generates exactly one reply from the SMTP receiver.
Table 19.5 lists the SMTP commands. Each command consists of a single line of text, beginning with a four-letter command code followed in some cases by an argument field. Most replies are a single line, although multiple-line replies are possible. The table indicates those commands that all receivers must be able to recognize. The other commands are optional and may be ignored by the receiver.
SMTP replies are listed in Table 19.6. Each reply begins with a three-digit code and may be followed by additional information. The leading digit indicates the category of the reply:
2xx Positive Completion reply. The requested action has been successfully completed. A new request may be initiated.
3xx Positive Intermediate reply. The command has been accepted, but the requested action is being held in abeyance, pending receipt of further information. The sender-SMTP should send another command specifying this information. This reply is used in command sequence groups.
4xx Transient Negative Completion reply. The command was not accepted, and the requested action did not occur. However, the error condition is temporary and the action may be requested again.
5xx Permanent Negative Completion reply. The command was not accepted and the requested action did not occur.
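A small Python sketch of how a sender might interpret replies by their leading digit follows. The function names are hypothetical, and the parser handles only single-line replies of the form "250 OK"; the multiple-line replies mentioned above are not covered.

```python
# Reply categories keyed by the leading digit of the three-digit code.
CATEGORIES = {
    "2": "Positive Completion",
    "3": "Positive Intermediate",
    "4": "Transient Negative Completion",
    "5": "Permanent Negative Completion",
}


def parse_reply(line):
    """Split a single-line reply such as '250 OK' into (code, category, text)."""
    code, _, text = line.partition(" ")
    if len(code) != 3 or not code.isdigit() or code[0] not in CATEGORIES:
        raise ValueError(f"unrecognized reply: {line!r}")
    return int(code), CATEGORIES[code[0]], text


def may_retry_later(line):
    """A 4xx reply reports a temporary condition, so the mail can be requeued."""
    return parse_reply(line)[1] == "Transient Negative Completion"


if __name__ == "__main__":
    for reply in ("250 OK", "354 Start mail input", "421 Service Not Available",
                  "550 No such user here"):
        print(parse_reply(reply), may_retry_later(reply))
```

Keying on the first digit lets a sender decide between "continue", "send more", "requeue", and "give up" without knowing every individual code.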
Basic SMTP operation occurs in three phases: connection setup, exchange of one or more command-response pairs, and connection termination. We examine each phase in turn.
Connection Setup
An SMTP sender will attempt to set up a TCP connection with a target host when it has one or more mail messages to deliver to that host. The sequence is quite simple:
1. The sender opens a TCP connection with the receiver.
2. Once the connection is established, the receiver identifies itself with "220 Service Ready".
3. The sender identifies itself with the HELO command.
4. The receiver accepts the sender's identification with "250 OK".
If the mail service on the destination is unavailable, the destination host returns a "421 Service Not Available" reply in step 2, and the process is terminated.
Mail Transfer
Once a connection has been established, the SMTP sender may send one or more messages to the SMTP receiver. There are three logical phases to the transfer of a message:
1. A MAIL command identifies the originator of the message.
2. One or more RCPT commands identify the recipients for this message.
3. A DATA command transfers the message text.
The MAIL command gives the reverse-path, which can be used to report errors. If the receiver is prepared to accept messages from this originator, it returns a "250 OK" reply. Otherwise, the receiver returns a reply indicating failure to execute the command (codes 451, 452, 552) or an error in the command (codes 421, 500, 501).
The RCPT command identifies an individual recipient of the mail data; multiple recipients are specified by multiple use of this command. A separate reply is returned for each RCPT command, with one of the following possibilities:
1. The receiver accepts the destination with a 250 reply; this indicates that the designated mailbox is on the receiver's system.
2. The destination will require forwarding, and the receiver will forward (251).
3. The destination requires forwarding, but the receiver will not forward; the sender must resend to the forwarding address (551).
4. A mailbox does not exist for this recipient at this host (550).
5. The destination is rejected due to some other failure to execute (codes 450, 451, 452, 552, 553) or an error in the command (codes 421, 500, 501, 503).
The advantage of using a separate RCPT phase is that the sender will not send the message until it is assured that the receiver is prepared to receive the message for at least one recipient, thereby avoiding the overhead of sending an entire message only to learn that the destination is unknown. Once the SMTP receiver has agreed to receive the mail message for at least one recipient, the SMTP sender uses the DATA command to initiate the transfer of the message. If the SMTP receiver is still prepared to receive the message, it returns a 354 reply; otherwise, the receiver returns a reply indicating failure to execute the command (codes 451, 554) or an error in the command (codes 421, 500, 501, 503). If the 354 reply is returned, the SMTP sender proceeds to send the message over the TCP connection as a sequence of ASCII lines. The end of the message is indicated by a line containing only a period. The SMTP receiver responds with a "250 OK" reply if the message is accepted, or with the appropriate error code (451, 452, 552, 554).
An example, taken from RFC 821, illustrates the process:
S: MAIL FROM:<Smith@Alpha.ARPA>
R: 250 OK
S: RCPT TO:<Jones@Beta.ARPA>
R: 250 OK
S: RCPT TO:<Green@Beta.ARPA>
R: 550 No such user here
S: RCPT TO:<Brown@Beta.ARPA>
R: 250 OK
S: DATA
R: 354 Start mail input; end with <CRLF>.<CRLF>
S: Blah blah blah ...
S: ... etc. etc. etc.
S: <CRLF>.<CRLF>
R: 250 OK
The SMTP sender is transmitting mail that originates with the user Smith@Alpha.ARPA. The message is addressed to three users on machine Beta.ARPA, namely, Jones, Green, and Brown. The SMTP receiver indicates that it has mailboxes for Jones and Brown but does not have information on Green. Because at least one of the intended recipients has been verified, the sender proceeds to send the text message.
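The command sequence and the role of the separate RCPT phase can be simulated without a network. In the Python sketch below, `ToyReceiver` is a hypothetical in-memory stand-in for an SMTP receiver, and connection setup (HELO) and closing (QUIT) are omitted. The dot-stuffing step (the sender adds an extra period to any text line that begins with one, so it cannot be mistaken for the terminating line) comes from the transparency procedure of RFC 821 rather than from the text above.

```python
class ToyReceiver:
    """Hypothetical in-memory SMTP receiver with a fixed set of mailboxes."""

    def __init__(self, mailboxes):
        self.mailboxes = set(mailboxes)
        self.accepted = []       # recipients accepted in the current transaction
        self.delivered = {}      # mailbox -> list of delivered message texts

    def handle(self, command):
        if command.startswith("MAIL FROM:"):
            self.accepted = []   # a new transaction begins
            return "250 OK"
        if command.startswith("RCPT TO:"):
            address = command[len("RCPT TO:"):].strip("<>")
            if address in self.mailboxes:
                self.accepted.append(address)
                return "250 OK"
            return "550 No such user here"
        if command == "DATA":
            if not self.accepted:
                return "503 Bad sequence of commands"
            return "354 Start mail input; end with <CRLF>.<CRLF>"
        return "500 Syntax error, command unrecognized"

    def receive_data(self, lines):
        body = []
        for line in lines:
            if line == ".":      # a line holding only a period ends the text
                break
            body.append(line[1:] if line.startswith(".") else line)  # undo stuffing
        for address in self.accepted:
            self.delivered.setdefault(address, []).append("\r\n".join(body))
        return "250 OK"


def frame_data(text):
    """Lines to send after a 354 reply: dot-stuffed text plus the final '.'."""
    lines = text.splitlines()
    return ["." + line if line.startswith(".") else line for line in lines] + ["."]


def send_message(receiver, originator, recipients, text):
    """Run MAIL, RCPT..., DATA against the receiver; return the S:/R: transcript."""
    transcript = []

    def say(command):
        reply = receiver.handle(command)
        transcript.extend([("S", command), ("R", reply)])
        return reply

    if not say(f"MAIL FROM:<{originator}>").startswith("2"):
        return transcript
    accepted = 0
    for rcpt in recipients:
        if say(f"RCPT TO:<{rcpt}>").startswith("2"):
            accepted += 1
    if accepted == 0:            # no verified recipient: never send the text
        return transcript
    if say("DATA").startswith("354"):
        wire = frame_data(text)
        transcript.extend(("S", line) for line in wire)
        transcript.append(("R", receiver.receive_data(wire)))
    return transcript


if __name__ == "__main__":
    rx = ToyReceiver({"Jones@Beta.ARPA", "Brown@Beta.ARPA"})
    for side, line in send_message(rx, "Smith@Alpha.ARPA",
                                   ["Jones@Beta.ARPA", "Green@Beta.ARPA", "Brown@Beta.ARPA"],
                                   "Blah blah blah ...\n... etc. etc. etc."):
        print(f"{side}: {line}")
```

Run with the addresses from the RFC 821 example, the transcript shows the 550 reply for Green and delivery to Jones and Brown only.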
Connection Closing
The SMTP sender closes the connection in two steps. First, the sender sends a QUIT command and waits for a reply. The second step is to initiate a TCP close operation for the TCP connection. The receiver initiates its TCP close after sending its reply to the QUIT command.
RFC 822
RFC 822 defines a format for text messages that are sent using electronic mail. The SMTP standard adopts RFC 822 as the format for use in constructing messages for transmission via SMTP. In the RFC 822 context, messages are viewed as having an envelope and contents. The envelope contains whatever information is needed to accomplish transmission and delivery. The contents compose the object to be delivered to the recipient. The RFC 822 standard applies only to the contents. However, the content standard includes a set of header fields that may be used by the mail system to create the envelope, and the standard is intended to facilitate the acquisition of such information by programs.
An RFC 822 message consists of a sequence of lines of text and uses a general "memo" framework. That is, a message consists of some number of header lines, which follow a rigid format, followed by a body portion consisting of arbitrary text. A header line usually consists of a keyword, followed by a colon, followed by the keyword's arguments; the format allows a long line to be broken up into several lines. The most frequently used keywords are From, To, Subject, and Date. Here is an example message:
Date: Tue, 16 Jan 1996 10:37:17 (EST)
From: "William Stallings" <ws@host.com>
Subject: The Syntax in RFC 822
To: Smith@Other-host.com
Cc: Jones@Yet-Another-Host.com

Hello. This section begins the actual message body, which is delimited from the message heading by a blank line.
Another field that is commonly found in RFC 822 headers is Message-ID. This field contains a unique identifier associated with this message.
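To make the header/body structure concrete, here is a minimal Python sketch of a parser that splits a message at the first blank line and unfolds header lines that were broken across several lines. It is a simplification under stated assumptions: the function name is hypothetical, continuation lines are recognized by a leading space or tab, folded white space is collapsed to a single space, and no quoting or comment syntax is interpreted. Python's standard `email` package provides a complete parser.

```python
def parse_rfc822(text):
    """Split a message into (headers, body); headers is a list of (name, value)."""
    lines = text.replace("\r\n", "\n").split("\n")
    headers = []
    i = 0
    while i < len(lines) and lines[i] != "":        # headers end at the blank line
        line = lines[i]
        if line[:1] in (" ", "\t") and headers:
            # Continuation of a long header line: unfold it onto the previous field.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, colon, value = line.partition(":")
            if not colon:
                raise ValueError(f"not a header line: {line!r}")
            headers.append((name.strip(), value.strip()))
        i += 1
    body = "\n".join(lines[i + 1:])                 # everything after the blank line
    return headers, body


if __name__ == "__main__":
    sample = (
        "Date: Tue, 16 Jan 1996 10:37:17 (EST)\n"
        'From: "William Stallings" <ws@host.com>\n'
        "Subject: The Syntax in RFC 822\n"
        "To: Smith@Other-host.com\n"
        "Cc: Jones@Yet-Another-Host.com\n"
        "\n"
        "Hello. This section begins the actual message body, which is\n"
        "delimited from the message heading by a blank line.\n"
    )
    headers, body = parse_rfc822(sample)
    for name, value in headers:
        print(f"{name} -> {value}")
    print(body)
```

Because only the first colon on a line separates keyword from arguments, the colons inside the Date value are preserved.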
Multipurpose Internet Mail Extensions (MIME)
MIME is an extension to the RFC 822 framework that is intended to address some of the problems and limitations of the use of SMTP and RFC 822 for electronic mail. [MURP95] lists the following limitations of the SMTP/822 scheme:
1. SMTP cannot transmit executable files or other binary objects. A number of schemes are in use for converting binary files into a text form that can be used by SMTP mail systems, including the popular UNIX UUencode/UUdecode scheme. However, none of these is a standard or even a de facto standard.
2. SMTP cannot transmit text data that includes national language characters, as these are represented by 8-bit codes with values of 128 decimal or higher, and SMTP is limited to 7-bit ASCII.
3. SMTP servers may reject mail messages over a certain size.
4. SMTP gateways that translate between ASCII and the character code EBCDIC do not use a consistent set of mappings, resulting in translation problems.
5. SMTP gateways to X.400 electronic mail networks cannot handle non-textual data included in X.400 messages.
6. Some SMTP implementations do not adhere completely to the SMTP standards defined in RFC 821. Common problems include the following:
* Deletion, addition, or reordering of carriage return and linefeed.
* Truncating or wrapping lines longer than 76 characters.
* Removal of trailing white space (tab and space characters).
* Padding of lines in a message to the same length.
* Conversion of tab characters into multiple-space characters.
MIME is intended to resolve these problems in a manner that is compatible with existing RFC 822 implementations. The specification is provided in RFCs 1521 and 1522.
Overview
The MIME specification includes the following elements:
1. Five new message header fields are defined, which may be included in an RFC 822 header. These fields provide information about the body of the message.
2. A number of content formats are defined, thus standardizing representations that support multimedia electronic mail.
3. Transfer encodings are defined that enable the conversion of any content format into a form that is protected from alteration by the mail system.
In this subsection, we introduce the five message header fields. The next two subsections address content formats and transfer encodings.
The five header fields defined in MIME are
MIME-Version. Must have the parameter value 1.0. This field indicates that the message conforms to RFCs 1521 and 1522.
Content-Type. Describes the data contained in the body with sufficient detail that the receiving user agent can pick an appropriate agent or mechanism to represent the data to the user or otherwise handle the data in an appropriate manner.
Content-Transfer-Encoding. Indicates the type of transformation that has been used to represent the body of the message in a way that is acceptable for mail transport.
Content-ID. Used to uniquely identify MIME entities in multiple contexts.
Content-Description. A plain-text description of the object within the body; this is useful when the object is not readable (e.g., audio data).
Any or all of these fields may appear in a normal RFC 822 header. A compliant implementation must support the MIME-Version, Content-Type, and Content-Transfer-Encoding fields; the Content-ID and Content-Description fields are optional and may be ignored by the recipient implementation.
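For comparison with the field descriptions above, Python's standard `email` package (which implements RFC 2045, the successor to RFC 1521) can generate these headers. In the sketch below, the addresses, Content-ID, and description values are illustrative only; `EmailMessage.set_content()` supplies Content-Type and Content-Transfer-Encoding, and `EmailMessage` adds MIME-Version: 1.0 when it is missing.

```python
from email.message import EmailMessage

# Build a simple text message; the header values are illustrative only.
msg = EmailMessage()
msg["From"] = "Nathaniel Borenstein <nsb@bellcore.com>"
msg["To"] = "Ned Freed <ned@innosoft.com>"
msg["Subject"] = "Sample message"
# set_content() fills in Content-Type and Content-Transfer-Encoding;
# EmailMessage also adds MIME-Version: 1.0.
msg.set_content("This is plain ASCII text.\n")
# The two optional fields are ordinary headers that a recipient may ignore.
msg["Content-ID"] = "<part1@bellcore.example>"
msg["Content-Description"] = "A short plain-text greeting"

if __name__ == "__main__":
    print(msg.as_string())
```

The printed message shows the three mandatory MIME fields alongside the ordinary RFC 822 fields, which is exactly the compatibility property MIME relies on.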
MIME Content Types
The bulk of the MIME specification is concerned with the definition of a variety of content types; this reflects the need to provide standardized ways of dealing with a wide variety of information representations in a multimedia environment.
Table 19.7 lists the content types specified in RFC 1521. There are seven different major types of content and a total of 14 subtypes. In general, a content type declares the general type of data, and the subtype specifies a particular format for that type of data.
For the text type of body, no special software is required to get the full meaning of the text, aside from support of the indicated character set. RFC 1521 defines only one subtype, plain, which is simply a string of ASCII characters or ISO 8859 characters. An earlier version of the MIME specification included a richtext subtype, which allows greater formatting flexibility. It is expected that this subtype will reappear in a later RFC.
The multipart type indicates that the body contains multiple, independent parts. The Content-Type header field includes a parameter, called a boundary, that defines the delimiter between body parts. This boundary should not appear in any parts of the message. Each boundary starts on a new line and consists of two hyphens followed by the boundary value. The final boundary, which indicates the end of the last part, also has a suffix of two hyphens. Within each part, there may be an optional, ordinary MIME header.
Here is a simple example of a multipart message, containing two parts, both consisting of simple text (taken from RFC 1521):
From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Subject: Sample message
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="simple boundary"

This is the preamble. It is to be ignored, though it is a handy place for mail composers to include an explanatory note to non-MIME-conformant readers.
--simple boundary

This is implicitly-typed plain ASCII text.
It does NOT end with a linebreak.
--simple boundary
Content-type: text/plain; charset=us-ascii

This is explicitly-typed plain ASCII text.
It DOES end with a linebreak.

--simple boundary--
This is the epilogue. It is also to be ignored.
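The boundary rules can be turned into a small splitter. This is a hypothetical Python sketch, not a full MIME parser: it assumes CRLF line endings, treats a line that equals the delimiter (apart from trailing white space) as a boundary, and returns each part with its optional header still attached. Following RFC 1521, the line break just before a delimiter belongs to the delimiter rather than to the part, which is why the first part of the example above does not end with a line break and the second one does.

```python
def split_multipart(body, boundary):
    """Return (preamble, parts, epilogue) for a multipart body with CRLF lines."""
    delimiter = "--" + boundary
    preamble, parts, epilogue = [], [], []
    current = None            # None while still reading the preamble
    closed = False
    for line in body.split("\r\n"):
        stripped = line.rstrip(" \t")
        if closed:
            epilogue.append(line)
        elif stripped == delimiter + "--":        # final boundary
            if current is not None:
                parts.append("\r\n".join(current))
            closed = True
        elif stripped == delimiter:               # boundary between parts
            if current is not None:
                parts.append("\r\n".join(current))
            current = []
        elif current is None:
            preamble.append(line)
        else:
            current.append(line)
    return "\r\n".join(preamble), parts, "\r\n".join(epilogue)
```

Joining a part's lines with CRLF, without a trailing CRLF, is what attaches the final line break to the following delimiter.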
There are four subtypes of the multipart type, all of which have the same overall syntax. The multipart/mixed subtype is used when there are multiple independent body parts that need to be bundled in a particular order. For the multipart/parallel subtype, the order of the parts is not significant. If the recipient's system supports it, the multiple parts can be presented in parallel. For example, a picture or text part could be accompanied by a voice commentary that is played while the picture or text is displayed.
For the multipart/alternative subtype, the various parts are different representations of the same information. The following is an example:
From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42

--boundary42
Content-Type: text/plain; charset=us-ascii

... plain text version of message goes here ...
--boundary42
Content-Type: text/richtext

... RFC 1341 richtext version of same message goes here ...
--boundary42--
In this subtype, the body parts are ordered in terms of increasing preference. For this example, if the recipient system is capable of displaying the message in the richtext format, this is done; otherwise, the plain-text format is used.
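Because the parts are ordered by increasing preference, selecting among alternatives amounts to scanning them from last to first. In this hypothetical sketch, each part is a (content type, payload) pair and the reader's capabilities are a set of type strings; falling back to the first part when nothing matches is an assumption, not a rule stated above.

```python
def choose_alternative(parts, can_display):
    """Pick the most preferred part of a multipart/alternative body.

    parts: (content_type, payload) pairs in message order, i.e. in order of
           increasing preference.
    can_display: set of content types the recipient's system can render.
    """
    for content_type, payload in reversed(parts):
        if content_type.lower() in can_display:
            return content_type, payload
    # Assumed fallback: the first, simplest representation.
    return parts[0]


if __name__ == "__main__":
    parts = [("text/plain", "plain version"), ("text/richtext", "richtext version")]
    print(choose_alternative(parts, {"text/plain"}))
```

For the example message, a reader that understands richtext gets the second part; any other reader gets the plain text.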
The multipart/digest subtype is used when each of the body parts is interpreted as an RFC 822 message with headers. This subtype enables the construction of a message whose parts are individual messages. For example, the moderator of a group might collect email messages from participants, bundle these messages, and send them out in one encapsulating MIME message.
The message type provides a number of important capabilities in MIME. The message/rfc822 subtype indicates that the body is an entire message, including header and body. Despite the name of this subtype, the encapsulated message may be not only a simple RFC 822 message but any MIME message.
The message/partial subtype enables fragmentation of a large message into a number of parts, which must be reassembled at the destination. For this subtype, three parameters are specified in the Content-Type: message/partial field:
Id. A value that is common to each fragment of the same message, so that the fragments can be identified at the recipient for reassembly, but which is unique across different messages.
Number. A sequence number that indicates the position of this fragment in the original message. The first fragment is numbered 1, the second 2, and so on.
Total. The total number of parts. The last fragment is identified by having the same value for the number and total parameters.
The rules for fragmenting a message are as follows:
1. Divide the body of the original message into N parts.
2. The first fragment begins with a header that has no Content-Transfer-Encoding field; the default of 7-bit ASCII is used. The header has a Content-Type of message/partial, with a unique id, number = 1, and total = N. The remaining fields of the header are copied from the original message header.
3. The body of the first fragment is an encapsulated MIME message that has the Content-Type and Content-Transfer-Encoding of the original message body. The Message-ID field of the encapsulated header must differ from that of the enclosing header.
4. The remaining fragments include header fields from the outer header of the first fragment. The Message-ID field must be unique. The Content-Type field has the same id and total values as the outer header of the first fragment, as well as the appropriate number value. There is no Content-Transfer-Encoding field.
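The fragmentation rules can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a message is a dict of header fields plus a body string, the inner (encapsulated) header is taken to hold only Content-Type, Content-Transfer-Encoding, and Message-ID, the id and Message-ID values come from `uuid4` (a hypothetical choice), and the encapsulated text is cut into N roughly equal character slices without regard to line boundaries.

```python
import uuid

INNER_FIELDS = ("Content-Type", "Content-Transfer-Encoding", "Message-ID")


def fragment(headers, body, n):
    """Split a message into n message/partial fragments (simplified sketch).

    headers: dict of the original message's header fields.
    Returns a list of (outer_headers, fragment_body) pairs.
    """
    inner = {k: v for k, v in headers.items() if k in INNER_FIELDS}
    # Rule 2: other fields are copied into the outer header; no
    # Content-Transfer-Encoding appears there, so 7-bit is implied.
    outer_base = {k: v for k, v in headers.items() if k not in INNER_FIELDS}

    # Rule 3: the encapsulated message keeps the original Content-* fields.
    encapsulated = "".join(f"{k}: {v}\r\n" for k, v in inner.items()) + "\r\n" + body
    size = -(-len(encapsulated) // n)                 # ceiling division
    chunks = [encapsulated[i * size:(i + 1) * size] for i in range(n)]

    frag_id = uuid.uuid4().hex                        # shared by all fragments
    fragments = []
    for number, chunk in enumerate(chunks, start=1):
        outer = dict(outer_base)                      # Rule 4: same outer fields
        outer["Message-ID"] = f"<{uuid.uuid4().hex}@example.invalid>"  # unique
        outer["Content-Type"] = (
            f'message/partial; id="{frag_id}"; number={number}; total={n}')
        fragments.append((outer, chunk))
    return fragments
```

Only the first chunk carries the inner header, because it sits at the start of the encapsulated text.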
The rules for reassembly are as follows:
1. The fields for the header of the reassembled message are taken from the outer header of the first fragment, with the following exceptions: the Content-Type, Content-Transfer-Encoding, and Message-ID fields are taken from the inner header of the first fragment.
2. All of the header fields from the second and any subsequent fragments are ignored.
3. The body parts of the messages, not including the inner header of the first part, are reassembled in order to form the body of the reassembled message.
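The reassembly rules can be sketched the same way. In this hypothetical Python sketch, each fragment is a pair of an outer-header dict and a body string, the number and total values are read from the Content-Type parameters with a simple regular expression, fragments may arrive in any order, and the inner header is assumed to end at the first empty line of the joined text.

```python
import re

INNER_FIELDS = ("Content-Type", "Content-Transfer-Encoding", "Message-ID")


def _param(outer, name):
    """Read an integer parameter such as number=2 from the Content-Type field."""
    return int(re.search(rf"\b{name}=(\d+)", outer["Content-Type"]).group(1))


def reassemble(fragments):
    """Rebuild (headers, body) from message/partial fragments in any order."""
    ordered = sorted(fragments, key=lambda f: _param(f[0], "number"))
    total = _param(ordered[0][0], "total")
    if [_param(o, "number") for o, _ in ordered] != list(range(1, total + 1)):
        raise ValueError("missing or duplicate fragments")

    # Rule 3: join the bodies in order, then strip the first part's inner header.
    text = "".join(body for _, body in ordered)
    inner_text, _, body = text.partition("\r\n\r\n")
    inner = dict(line.split(": ", 1) for line in inner_text.split("\r\n") if line)

    # Rule 1: outer header of the first fragment, except the three inner fields.
    headers = {k: v for k, v in ordered[0][0].items() if k not in INNER_FIELDS}
    headers.update(inner)
    # Rule 2: headers of the second and later fragments are never consulted.
    return headers, body
```

Sorting by number before joining means the receiving system does not depend on the fragments arriving in sequence.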
Figure 19.10 illustrates a message that is transferred in two fragments.
The message/external-body subtype indicates that the actual data to be conveyed in this message are not contained in the body. Instead, the body contains the information needed to access the data. As with the other message types, the message/external-body subtype has an outer header and an encapsulated message with its own header. The only necessary field in the outer header is the Content-Type field, which identifies this as a message/external-body subtype. The inner header is the message header for the encapsulated message.
The Content-Type field in the outer header must include an access-type parameter, which has one of the following values:
FTP. The message body is accessible as a file using the file transfer protocol (FTP). For this access type, the following additional parameters are mandatory: name, indicating the name of the file; and site, indicating the domain name of the host where the file resides. Optional parameters are directory, the directory in which the file is located; and mode, which indicates how FTP should retrieve the file (e.g., ASCII, image). Before the file transfer can take place, the user will need to provide a user id and password; these are not transmitted with the message for security reasons.
TFTP. The message body is accessible as a file using the trivial file transfer protocol (TFTP). The same parameters as for FTP are used, and the user id and password must also be supplied.
Anon-FTP. Identical to FTP, except that the user is not asked to supply a user id and password. The parameter name supplies the name of the file.
Local-File. The message body is accessible as a file on the recipient's machine.
AFS. The message body is accessible as a file via the global AFS (Andrew File System). The parameter name supplies the name of the file.
Mail-Server. The message body is accessible by sending an email message to a mail server. A server parameter must be included that gives the email address of the server. The body of the original message, known as the phantom body, should contain the exact command to be sent to the mail server.
The image type indicates that the body contains a displayable image. The subtype, jpeg or gif, specifies the image format. In the future, more subtypes will be added to this list.
The video type indicates that the body contains a time-varying picture image, possibly with color and coordinated sound. The only subtype so far specified is mpeg.
The audio type indicates that the body contains audio data. The only subtype, basic, conforms to an ISDN service known as "64-kbps, 8-kHz Structured, Usable for Speech Information," with a digitized speech algorithm referred to as µ-law PCM (pulse-code modulation). This general type is the typical way of transmitting speech signals over a digital network. The term µ-law refers to the specific encoding technique; it is the standard technique used in North America and Japan. A competing system, known as A-law, is standard in Europe.
The application type refers to other kinds of data, typically either uninterpreted binary data or information to be processed by a mail-based application. The application/octet-stream subtype indicates general binary data in a sequence of octets. RFC 1521 recommends that the receiving implementation should offer to put the data in a file or use it as input to a program. The application/postscript subtype indicates the use of Adobe PostScript.
MIME Transfer Encodings
The other major component of the MIME specification, in addition to content-type specification, is a definition of transfer encodings for message bodies. The objective is to provide reliable delivery across the largest range of environments.
The MIME standard defines two methods of encoding data. The Content-Transfer-Encoding field can actually take on six values, as listed in Table 19.8. However, three of these values (7bit, 8bit, and binary) indicate that no encoding has been done, but they do provide some information about the nature of the data. For SMTP transfer, it is safe to use the 7bit form. The 8bit and binary forms may be usable in other mail-transport contexts. Another Content-Transfer-Encoding value is x-token, which indicates that some other encoding scheme is used, for which a name is to be supplied; this could be a vendor-specific or application-specific scheme. The two actual encoding schemes defined are quoted-printable and base64. Two schemes are defined to provide a choice between a transfer technique that is essentially human-readable and one that is safe for all types of data in a way that is reasonably compact.
The quoted-printable transfer encoding is useful when the data consist largely of octets that correspond to printable ASCII characters (see Table 2.1). In essence, it represents non-safe characters by the hexadecimal representation of their code and introduces reversible (soft) line breaks to limit message lines to 76 characters. The encoding rules are as follows:
1. General 8-bit representation: This rule is to be used when none of the other rules apply. Any character is represented by an equal sign, followed by a two-digit hexadecimal representation of the octet's value. For example, the ASCII form feed, which has an 8-bit value of decimal 12, is represented by "=0C".
2. Literal representation: Any character in the range decimal 33 ("!") through decimal 126 ("~"), except decimal 61 ("="), is represented as that ASCII character.
3. White space: Octets with the values 9 and 32 may be represented as ASCII tab and space characters, respectively, except at the end of a line. Any white space (tab or blank) at the end of a line must be represented by rule 1. On decoding, any trailing white space on a line is deleted; this eliminates any white space added by intermediate transport agents.
4. Line breaks: Any line break, regardless of its initial representation, is represented by the RFC 822 line break, which is a carriage-return/line-feed combination.
5. Soft line breaks: If an encoded line would be longer than 76 characters (excluding <CRLF>), a soft line break must be inserted at or before character position 75. A soft line break consists of the hexadecimal sequence 3D0D0A, which is the ASCII code for an equal sign followed by carriage return and line feed.
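The five rules translate into a short encoder. The Python sketch below is illustrative, not a drop-in implementation: the function name is hypothetical, it treats every LF or CRLF in the input as a text line break (rule 4), which suits text but not binary data, and it inserts soft line breaks only between encoded tokens so that an "=XX" triple is never split. Python's standard `quopri` module offers a complete implementation.

```python
def qp_encode(data: bytes) -> str:
    """Quoted-printable encoding following rules 1-5 (simplified sketch)."""
    out_lines = []
    # Rule 4: every input line break becomes an RFC 822 CRLF on output.
    for raw_line in data.replace(b"\r\n", b"\n").split(b"\n"):
        tokens = []
        for i, octet in enumerate(raw_line):
            at_end = i == len(raw_line) - 1
            if (33 <= octet <= 126 and octet != 61) or (octet in (9, 32) and not at_end):
                tokens.append(chr(octet))          # rules 2 and 3: literal
            else:
                tokens.append(f"={octet:02X}")     # rule 1: "=" plus two hex digits
        # Rule 5: keep each encoded line within 76 characters, counting the "="
        # that marks a soft line break.
        line = ""
        for token in tokens:
            if len(line) + len(token) > 75:
                out_lines.append(line + "=")
                line = ""
            line += token
        out_lines.append(line)
    return "\r\n".join(out_lines)


if __name__ == "__main__":
    print(qp_encode(b"Caf\xe9 menu: soup = 5\xa4 \n"))
```

Mostly-ASCII text stays readable after encoding, which is the advantage of quoted-printable over base64 for such data.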
The base64 transfer encoding, also known as radix-64 encoding, is a common one for encoding arbitrary binary data in such a way as to be invulnerable to processing by mail-transport programs. For example, both PGP (Pretty Good Privacy) and PEM (Privacy Enhanced Mail) secure electronic-mail schemes make use of base64; this technique maps arbitrary binary input into printable character output. The form of encoding has the following relevant characteristics:
1. The range of the function is a character set that is universally representable at all sites, not a specific binary encoding of that character set. Thus, the characters themselves can be encoded into whatever form is needed by a specific system. For example, the character "E" is represented in an ASCII-based system as hexadecimal 45 and in an EBCDIC-based system as hexadecimal C5.
2. The character set consists of 65 printable characters, one of which is used for padding. With 2^6 = 64 available characters, each character can be used to represent 6 bits of input.
3. No control characters are included in the set. Thus, a message encoded in radix 64 can traverse mail-handling systems that scan the data stream for control characters.
4. The hyphen character ("-") is not used. This character has significance in the RFC 822 format and should therefore be avoided.
Table 19.9 shows the mapping of 6-bit input values to characters. The character set consists of the alphanumeric characters plus "+" and "/". The "=" character is used as the padding character.
Figure 19.11 illustrates the simple mapping scheme. Binary input is processed in blocks of 3 octets, or 24 bits. Each set of 6 bits in the 24-bit block is mapped into a character. When each output character is stored as 8 bits, as in this typical case, each 24-bit input is expanded to 32 bits of output.
One important feature of this mapping is that the least significant 6 bits of the representation of these 65 characters are the same in all commonly used character sets. For example, as was mentioned, "E" in 7-bit ASCII is 0100 0101 and in 8-bit EBCDIC is 1100 0101. The rightmost 6 bits are the same in both cases. Thus, the reverse mapping from radix 64 to binary is simply a matter of extracting the least significant 6 bits of each character.
For example, the sequence "H52Q" in ASCII is .
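The 3-octet to 4-character mapping can be written directly. This Python sketch builds the alphabet of Table 19.9 in the order A-Z, a-z, 0-9, "+", "/"; the function names are hypothetical, the decoder inverts the mapping with a lookup table, and the encoder is cross-checked against the standard `base64` module.

```python
import base64  # used only to cross-check the hand-written encoder

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")
PAD = "="


def radix64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        block = data[i:i + 3]
        # Pack up to 3 octets into a 24-bit integer, zero-filling a short block.
        n = int.from_bytes(block.ljust(3, b"\0"), "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A 1-octet block yields 2 characters, a 2-octet block 3; pad the rest.
        used = len(block) + 1
        out.extend(chars[:used] + [PAD] * (4 - used))
    return "".join(out)


def radix64_decode(text: str) -> bytes:
    values = {c: i for i, c in enumerate(ALPHABET)}
    out = bytearray()
    for i in range(0, len(text), 4):
        quad = text[i:i + 4]
        n = 0
        for c in quad:
            n = (n << 6) | (values[c] if c != PAD else 0)   # 6 bits per character
        # Each "=" means one fewer real octet in this 24-bit group.
        out += n.to_bytes(3, "big")[:3 - quad.count(PAD)]
    return bytes(out)


if __name__ == "__main__":
    sample = b"any binary \x00\xff data"
    encoded = radix64_encode(sample)
    assert encoded == base64.b64encode(sample).decode("ascii")
    print(encoded, radix64_decode(encoded) == sample)
```

Every 3 input octets become 4 output characters, which is the 24-bit to 32-bit expansion described above.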
A Multipart Example
Figure 19.12, taken from RFC 1521, is the outline of a complex multipart message. The message has five parts to be displayed serially: two introductory plain text parts, an embedded multipart message, a richtext part, and a closing encapsulated text message in a non-ASCII character set. The embedded multipart message has two parts to be displayed in parallel: a picture and an audio fragment.