
Wednesday, October 5, 2016

ELECTRONIC MAIL-SMTP AND MIME


The most heavily used application in virtually any distributed system is electronic
mail. From the start, the Simple Mail Transfer Protocol (SMTP) has been the workhorse
of the TCP/IP protocol suite. However, SMTP has traditionally been limited
to the delivery of simple text messages. In recent years, there has been a demand
for mail to carry various types of data, including voice,
images, and video clips. To satisfy this requirement, a new electronic mail standard,
which builds on SMTP, has been defined: the Multipurpose Internet Mail Extensions
(MIME). In this section, we first examine SMTP, then look at MIME.
Simple Mail Transfer Protocol (SMTP)
SMTP is the standard protocol for transferring mail between hosts in the TCP/IP
protocol suite; it is defined in RFC 821.
Although messages transferred by SMTP usually follow the format defined in
RFC 822, described later, SMTP is not concerned with the format or content of
messages themselves, with two exceptions. This concept is often expressed by saying
that SMTP uses information written on the envelope of the mail (message
header), but does not look at the contents (message body) of the envelope. The two
exceptions are
1. SMTP standardizes the message character set as 7-bit ASCII.
2. SMTP adds log information to the start of the delivered message that indicates
the path the message took.
Basic Electronic Mail Operation
Figure 19.9 illustrates the overall flow of mail in a typical system. Although much of
this activity is outside the scope of SMTP, the figure illustrates the context within
which SMTP typically operates.
To begin, mail is created by a user-agent program in response to user input.
Each created message consists of a header that includes the recipient's email
address and other information, and a body containing the message to be sent. These
messages are then queued in some fashion and provided as input to an SMTP
Sender program, which is typically an always-present server program on the host.
Although the structure of the outgoing-mail queue will differ depending on
the host's operating system, each queued message, conceptually, has two parts:
1. The message text, consisting of
* The RFC 822 header: the message envelope, which includes an indication of
the intended recipient or recipients.
* The body of the message, composed by the user.
2. A list of mail destinations.
The list of mail destinations for the message is derived by the user agent from
the 822 message header. In some cases, the destination, or destinations, are literally
specified in the message header. In other cases, the user agent may need to expand
mailing list names, remove duplicates, and replace mnemonic names with actual
mailbox names. If any blind carbon copies (BCC) are indicated, the user agent
needs to prepare messages that conform to this requirement. The basic idea is that
the multiple formats and styles preferred by humans in the user interface are
replaced by a standardized list suitable for the SMTP send program.
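The derivation of the destination list can be sketched as follows. This is only an illustration under stated assumptions: the header is modeled as a dictionary of already-split address fields, and `mailing_lists` is a hypothetical lookup table from list names to member mailboxes; neither structure comes from SMTP or RFC 822.

```python
def destination_list(header, mailing_lists):
    """Expand mailing-list names and drop duplicates, keeping first-seen order.

    `header` maps field names to lists of names (hypothetical structure);
    `mailing_lists` maps a list name to its member mailboxes (hypothetical).
    """
    destinations = []
    for field in ("To", "Cc", "Bcc"):
        for name in header.get(field, []):
            # A mailing-list name expands to its members; anything else is
            # assumed to be a mailbox name already.
            for mailbox in mailing_lists.get(name, [name]):
                if mailbox not in destinations:
                    destinations.append(mailbox)
    return destinations


header = {
    "To": ["staff", "jones@beta.example"],
    "Cc": ["smith@beta.example"],
}
lists = {"staff": ["smith@beta.example", "brown@beta.example"]}
result = destination_list(header, lists)
```

Here `result` holds each mailbox once, even though smith appears both through the list and in the Cc field.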
The SMTP sender takes messages from the outgoing mail queue and transmits
them to the proper destination host via SMTP transactions over one or more TCP
connections to port 25 on the target hosts. A host may have multiple SMTP senders
active simultaneously if it has a large volume of outgoing mail, and it should also
have the capability of creating SMTP receivers on demand so that mail from one
host cannot delay mail from another.
Whenever the SMTP sender completes delivery of a particular message to one
or more users on a specific host, it deletes the corresponding destinations from that
message's destination list. When all destinations for a particular message are
processed, the message is deleted from the queue. In processing a queue, the SMTP
sender can perform a variety of optimizations. If a particular message is sent to multiple
users on a single host, the message text need be sent only once. If multiple messages
are ready to send to the same host, the SMTP sender can open a TCP connection,
transfer the multiple messages, and then close the connection, rather than
opening and closing a connection for each message.
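The queue handling described above might look like the following sketch. The queue entry layout (a dictionary with `text` and `destinations`) and both function names are assumptions made for illustration; real senders store their queues in system-specific ways.

```python
from collections import defaultdict


def group_by_host(destinations):
    """Group mailboxes by host so the text is sent once per host."""
    hosts = defaultdict(list)
    for mailbox in destinations:
        host = mailbox.rsplit("@", 1)[1].lower()
        hosts[host].append(mailbox)
    return dict(hosts)


def mark_delivered(entry, host):
    """Drop the destinations on `host`; return True when the entry is finished."""
    entry["destinations"] = [
        m for m in entry["destinations"]
        if m.rsplit("@", 1)[1].lower() != host
    ]
    return not entry["destinations"]


entry = {
    "text": "(message text)",
    "destinations": ["jones@beta.example", "brown@beta.example", "lee@gamma.example"],
}
plan = group_by_host(entry["destinations"])
done_after_beta = mark_delivered(entry, "beta.example")
done_after_gamma = mark_delivered(entry, "gamma.example")
```

Once `mark_delivered` returns True, every destination has been processed and the message can be removed from the queue.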
The SMTP sender must deal with a variety of errors. The destination host may
be unreachable or out of operation, or the TCP connection may fail while mail is
being transferred. The sender can requeue the mail for later delivery, but give up
after some period rather than keep the message in the queue indefinitely. A common
error is a faulty destination address, which can occur due to user-input error or
because the intended destination user has a new address on a different host. The
SMTP sender must either redirect the message, if possible, or return an error notification
to the message's originator.
The SMTP protocol is used to transfer a message from the SMTP sender to
the SMTP receiver over a TCP connection. SMTP attempts to provide reliable
operation but does not guarantee to recover from lost messages. No end-to-end
acknowledgment is returned to a message's originator that a message is successfully
delivered to the message's recipient, and error indications are not guaranteed to be
returned either. However, the SMTP-based mail system is generally considered
reliable.
The SMTP receiver accepts each arriving message and either places it in the
appropriate user mailbox or copies it to the local outgoing mail queue if forwarding
is required. The SMTP receiver must be able to verify local mail destinations and
deal with errors, including transmission errors and lack of disk file capacity.
The SMTP sender is responsible for a message up to the point where the
SMTP receiver indicates that the transfer is complete; however, this simply means
that the message has arrived at the SMTP receiver, not that the message has been
delivered to and retrieved by the intended recipient. The SMTP receiver's error-handling
responsibilities are generally limited to giving up on TCP connections that
fail or are inactive for very long periods. Thus, the sender has most of the error-recovery
responsibility. Errors during completion indication may cause duplicate,
but not lost, messages.
In most cases, messages go directly from the mail originator's machine to the
destination machine over a single TCP connection. However, mail will occasionally
go through intermediate machines via an SMTP forwarding capability, in which
case the message must traverse multiple TCP connections between source and destination;
one way for this to happen is for the sender to specify a route to the destination
in the form of a sequence of servers. A more common event is forwarding
required because a user has moved.
It is important to note that the SMTP protocol is limited to the conversation
that takes place between the SMTP sender and the SMTP receiver. SMTP's main
function is the transfer of messages, although there are some ancillary functions
dealing with mail destination verification and handling. The rest of the mail-handling
apparatus depicted in Figure 19.9 is beyond the scope of SMTP and may
differ from one system to another.
We now turn to a discussion of the main elements of SMTP.
SMTP Overview
The operation of SMTP consists of a series of commands and responses exchanged
between the SMTP sender and receiver. The initiative is with the SMTP sender,
which establishes the TCP connection. Once the connection is established, the SMTP
sender sends commands over the connection to the receiver. Each command generates
exactly one reply from the SMTP receiver.
Table 19.5 lists the SMTP commands. Each command consists of a single line
of text, beginning with a four-letter command code followed in some cases by an
argument field. Most replies are a single line, although multiple-line replies are possible.
The table indicates those commands that all receivers must be able to recognize.
The other commands are optional and may be ignored by the receiver.
SMTP replies are listed in Table 19.6. Each reply begins with a three-digit
code and may be followed by additional information. The leading digit indicates the
category of the reply:
* Positive Completion reply (leading digit 2). The requested action has been
successfully completed. A new request may be initiated.
* Positive Intermediate reply (leading digit 3). The command has been accepted, but the
requested action is being held in abeyance, pending receipt of further information.
The sender-SMTP should send another command specifying this
information. This reply is used in command sequence groups.
* Transient Negative Completion reply (leading digit 4). The command was not accepted, and
the requested action did not occur. However, the error condition is temporary
and the action may be requested again.
* Permanent Negative Completion reply (leading digit 5). The command was not accepted and
the requested action did not occur.
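A sender can act on the category without knowing every individual code. The sketch below is illustrative only; the digit-to-category mapping follows RFC 821 (2, 3, 4, and 5 for the four categories listed), and the function names are hypothetical.

```python
# Leading digit -> reply category, per RFC 821.
CATEGORIES = {
    "2": "positive completion",
    "3": "positive intermediate",
    "4": "transient negative completion",
    "5": "permanent negative completion",
}


def classify_reply(line):
    """Return (code, category) for a reply line such as '250 OK'."""
    code = line[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError("reply does not start with a three-digit code")
    return int(code), CATEGORIES.get(code[0], "unknown")


def should_retry(line):
    """Only a transient failure suggests requesting the action again later."""
    return classify_reply(line)[1] == "transient negative completion"
```

For example, `should_retry("421 Service Not Available")` is true, whereas a 550 reply signals a permanent failure that should be reported rather than retried.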
Basic SMTP operation occurs in three phases: connection setup, exchange of
one or more command-response pairs, and connection termination. We examine
each phase in turn.
Connection Setup
An SMTP sender will attempt to set up a TCP connection with a target host when it
has one or more mail messages to deliver to that host. The sequence is quite simple:
1. The sender opens a TCP connection with the receiver.
2. Once the connection is established, the receiver identifies itself with "220 Service
Ready".
3. The sender identifies itself with the HELO command.
4. The receiver accepts the sender's identification with "250 OK".
If the mail service on the destination is unavailable, the destination host
returns a "421 Service Not Available" reply in step 2, and the process is terminated.
Mail Transfer
Once a connection has been established, the SMTP sender may send one or more
messages to the SMTP receiver. There are three logical phases to the transfer of a
message:
1. A MAIL command identifies the originator of the message.
2. One or more RCPT commands identify the recipients for this message.
3. A DATA command transfers the message text.
The MAIL command gives the reverse-path, which can be used to report
errors. If the receiver is prepared to accept messages from this originator, it returns
a "250 OK" reply. Otherwise, the receiver returns a reply indicating failure to execute
the command (codes 451, 452, 552), or an error in the command (codes 421,
500, 501).
The RCPT command identifies an individual recipient of the mail data; multiple
recipients are specified by multiple use of this command. A separate reply is
returned for each RCPT command, with one of the following possibilities:
1. The receiver accepts the destination with a 250 reply; this indicates that the
designated mailbox is on the receiver's system.
2. The destination will require forwarding, and the receiver will forward (251).
3. The destination requires forwarding, but the receiver will not forward; the
sender must resend to the forwarding address (551).
4. A mailbox does not exist for this recipient at this host (550).
5. The destination is rejected due to some other failure to execute (codes 450,
451, 452, 552, 553), or an error in the command (codes 421, 500, 501, 503).
The advantage of using a separate RCPT phase is that the sender will not send
the message until it is assured that the receiver is prepared to receive the message
for at least one recipient, thereby avoiding the overhead of sending an entire message
only to learn that the destination is unknown. Once the SMTP receiver has
agreed to receive the mail message for at least one recipient, the SMTP sender uses
the DATA command to initiate the transfer of the message. If the SMTP receiver
is still prepared to receive the message, it returns a 354 message; otherwise, the
receiver returns a reply indicating failure to execute the command (codes 451, 554),
or an error in the command (codes 421, 500, 501, 503). If the 354 reply is returned,
the SMTP sender proceeds to send the message over the TCP connection as a
sequence of ASCII lines. The end of the message is indicated by a line containing
only a period. The SMTP receiver responds with a "250 OK" reply if the message
is accepted, or with the appropriate error code (451, 452, 552, 554).
An example, taken from RFC 821, illustrates the process:
S: MAIL FROM:<Smith@Alpha.ARPA>
R: 250 OK
S: RCPT TO:<Jones@Beta.ARPA>
R: 250 OK
S: RCPT TO:<Green@Beta.ARPA>
R: 550 No such user here
S: RCPT TO:<Brown@Beta.ARPA>
R: 250 OK
S: DATA
R: 354 Start mail input; end with <CRLF>.<CRLF>
S: Blah blah blah ...
S: ... etc. etc. etc.
S: <CRLF>.<CRLF>
R: 250 OK
The SMTP sender is transmitting mail that originates with the user
Smith@Alpha.ARPA. The message is addressed to three users on machine
Beta.ARPA, namely, Jones, Green, and Brown. The SMTP receiver indicates that
it has mailboxes for Jones and Brown but does not have information on Green.
Because at least one of the intended recipients has been verified, the sender proceeds
to send the text message.
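The three phases can be traced with a toy simulation. This sketch does not open a network connection; `simulate_transaction` and its `mailboxes` argument are hypothetical stand-ins for a receiver that knows only a fixed set of local mailboxes and always accepts the originator.

```python
def simulate_transaction(originator, recipients, text_lines, mailboxes):
    """Return the S:/R: transcript of one message sent to a toy receiver."""
    log = [f"S: MAIL FROM:<{originator}>", "R: 250 OK"]
    accepted = 0
    for rcpt in recipients:
        log.append(f"S: RCPT TO:<{rcpt}>")
        if rcpt in mailboxes:
            accepted += 1
            log.append("R: 250 OK")
        else:
            log.append("R: 550 No such user here")
    if accepted == 0:
        return log  # no recipient was accepted, so the text is never sent
    log += ["S: DATA", "R: 354 Start mail input; end with <CRLF>.<CRLF>"]
    log += [f"S: {line}" for line in text_lines]
    log += ["S: <CRLF>.<CRLF>", "R: 250 OK"]
    return log


transcript = simulate_transaction(
    "Smith@Alpha.ARPA",
    ["Jones@Beta.ARPA", "Green@Beta.ARPA", "Brown@Beta.ARPA"],
    ["Blah blah blah ...", "... etc. etc. etc."],
    mailboxes={"Jones@Beta.ARPA", "Brown@Beta.ARPA"},
)
```

With these inputs the transcript reproduces the RFC 821 exchange shown above; with an empty `mailboxes` set the function stops before DATA, which is the saving the separate RCPT phase is meant to provide.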
Connection Closing
The SMTP sender closes the connection in two steps. First, the sender sends a
QUIT command and waits for a reply. The second step is to initiate a TCP close
operation for the TCP connection. The receiver initiates its TCP close after sending
its reply to the QUIT command.
RFC 822
RFC 822 defines a format for text messages that are sent using electronic mail. The
SMTP standard adopts RFC 822 as the format for use in constructing messages for
transmission via SMTP. In the RFC 822 context, messages are viewed as having an
envelope and contents. The envelope contains whatever information is needed to
accomplish transmission and delivery. The contents compose the object to be delivered
to the recipient. The RFC 822 standard applies only to the contents. However,
the content standard includes a set of header fields that may be used by the mail system
to create the envelope, and the standard is intended to facilitate the acquisition
of such information by programs.
An RFC 822 message consists of a sequence of lines of text, and uses a general
"memo" framework. That is, a message consists of some number of header
lines, which follow a rigid format, followed by a body portion consisting of arbitrary
text.
A header line usually consists of a keyword, followed by a colon, followed by
the keyword's arguments; the format allows a long line to be broken up into several
lines. The most frequently used keywords are From, To, Subject, and Date. Here is
an example message:
Date: Tue, 16 Jan 1996 10:37:17 (EST)
From: "William Stallings" <ws@host.com>
Subject: The Syntax in RFC 822
To: Smith@Other-host.com
Cc: Jones@Yet-Another-Host.com

Hello. This section begins the actual message body, which is
delimited from the message heading by a blank line.
Another field that is commonly found in RFC 822 headers is Message-ID.
This field contains a unique identifier associated with this message.
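The header/body split is easy to see by parsing the example message. The sketch below uses Python's standard `email` package as one convenient parser; it is an illustration, not something RFC 822 prescribes.

```python
from email import message_from_string

RAW = (
    "Date: Tue, 16 Jan 1996 10:37:17 (EST)\n"
    'From: "William Stallings" <ws@host.com>\n'
    "Subject: The Syntax in RFC 822\n"
    "To: Smith@Other-host.com\n"
    "Cc: Jones@Yet-Another-Host.com\n"
    "\n"  # the blank line that separates the header from the body
    "Hello. This section begins the actual message body, which is\n"
    "delimited from the message heading by a blank line.\n"
)

msg = message_from_string(RAW)
field_names = list(msg.keys())   # header keywords in their original order
subject = msg["subject"]         # keyword lookup is case-insensitive
body = msg.get_payload()         # everything after the blank line
```

Everything before the first blank line is treated as header lines of the form keyword, colon, arguments; everything after it is the body.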
Multipurpose Internet Mail Extensions (MIME)
MIME is an extension to the RFC 822 framework that is intended to address some
of the problems and limitations of the use of SMTP and RFC 822 for electronic
mail. [MURP95] lists the following limitations of the SMTP/822 scheme:
1. SMTP cannot transmit executable files or other binary objects. A number of
schemes are in use for converting binary files into a text form that can be used
by SMTP mail systems, including the popular UNIX UUencode/UUdecode
scheme. However, none of these is a standard or even a de facto standard.
2. SMTP cannot transmit text data that includes national language characters, as
these are represented by 8-bit codes with values of 128 decimal or higher, and
SMTP is limited to 7-bit ASCII.
3. SMTP servers may reject mail messages over a certain size.
4. SMTP gateways that translate between ASCII and the character code
EBCDIC do not use a consistent set of mappings, resulting in translation
problems.
5. SMTP gateways to X.400 electronic mail networks cannot handle non-textual
data included in X.400 messages.
6. Some SMTP implementations do not adhere completely to the SMTP standards
defined in RFC 821. Common problems include the following:
* Deletion, addition, or reordering of carriage return and linefeed.
* Truncating or wrapping lines longer than 76 characters.
* Removal of trailing white space (tab and space characters).
* Padding of lines in a message to the same length.
* Conversion of tab characters into multiple-space characters.
MIME is intended to resolve these problems in a manner that is compatible
with existing RFC 822 implementations. The specification is provided in RFC 1521
and 1522.
Overview
The MIME specification includes the following elements:
1. Five new message header fields are defined, which may be included in an RFC
822 header. These fields provide information about the body of the message.
2. A number of content formats are defined, thus standardizing representations
that support multimedia electronic mail.
3. Transfer encodings are defined that enable the conversion of any content format
into a form that is protected from alteration by the mail system.
In this subsection, we introduce the five message header fields. The next two
subsections address content formats and transfer encodings.
The five header fields defined in MIME are
* MIME-version. Must have the parameter value 1.0. This field indicates that
the message conforms to RFC 1521 and 1522.
* Content-type. Describes the data contained in the body with sufficient detail
that the receiving user agent can pick an appropriate agent or mechanism to
represent the data to the user or otherwise handle the data in an appropriate
manner.
* Content-transfer-encoding. Indicates the type of transformation that has been
used to represent the body of the message in a way that is acceptable for mail
transport.
* Content-id. Used to uniquely identify MIME entities in multiple contexts.
* Content-description. A plain-text description of the object in the body; this
is useful when the object is not readable (e.g., audio data).
Any or all of these fields may appear in a normal RFC 822 header. A compliant
implementation must support the MIME-Version, Content-Type, and
Content-Transfer-Encoding fields; the Content-ID and Content-Description fields are
optional and may be ignored by the recipient implementation.
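As an illustration of how the five fields appear together, the sketch below builds a small text message with Python's standard `email.mime.text.MIMEText` class, which fills in the three mandatory fields itself. The Content-ID and Content-Description values are made up for the example.

```python
from email.mime.text import MIMEText

msg = MIMEText("Plain ASCII body text.\n")

# The two optional fields, with hypothetical values.
msg["Content-ID"] = "<part1@example.invalid>"
msg["Content-Description"] = "A short plain-text note"

mime_fields = dict(msg.items())
```

After this runs, `mime_fields` contains MIME-Version, Content-Type, and Content-Transfer-Encoding alongside the two optional fields, all as ordinary RFC 822 header lines.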
MIME Content Types
The bulk of the MIME specification is concerned with the definition of a variety of
content types; this reflects the need to provide standardized ways of dealing with a
wide variety of information representations in a multimedia environment.
Table 19.7 lists the content types specified in RFC 1521. There are seven different
major types of content and a total of 14 subtypes. In general, a content type
declares the general type of data, and the subtype specifies a particular format for
that type of data.
For the text type of body, no special software is required to get the full meaning
of the text, aside from support of the indicated character set. RFC 1521 defines
only one subtype: plain text, which is simply a string of ASCII characters or ISO
8859 characters. An earlier version of the MIME specification included a richtext
subtype that allows greater formatting flexibility. It is expected that this subtype
will reappear in a later RFC.
The multipart type indicates that the body contains multiple, independent
parts. The Content-Type header field includes a parameter, called a boundary, that
defines the delimiter between body parts. This boundary should not appear in any
parts of the message. Each boundary starts on a new line and consists of two
hyphens followed by the boundary value. The final boundary, which indicates the
end of the last part, also has a suffix of two hyphens. Within each part, there may
be an optional, ordinary MIME header.
Here is a simple example of a multipart message, containing two parts, both
consisting of simple text (taken from RFC 1521):
From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Subject: Sample message
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="simple boundary"

This is the preamble. It is to be ignored, though it is a handy place for mail
composers to include an explanatory note to non-MIME-conformant readers.
--simple boundary

This is implicitly-typed plain ASCII text. It does NOT end with a linebreak.
--simple boundary
Content-type: text/plain; charset=us-ascii

This is explicitly-typed plain ASCII text. It DOES end with a linebreak.

--simple boundary--
This is the epilogue. It is also to be ignored.
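The boundary rules can be exercised with a small hand-written splitter. This is a simplified sketch, not a conforming MIME parser: it compares whole lines exactly, and the function name `split_multipart` is hypothetical.

```python
def split_multipart(body, boundary):
    """Return the body parts found between boundary lines, as lists of lines."""
    delimiter = "--" + boundary      # two hyphens followed by the boundary value
    closing = delimiter + "--"       # the final boundary carries a "--" suffix
    parts, current = [], None        # current stays None while in the preamble
    for line in body.splitlines():
        if line == closing:
            if current is not None:
                parts.append(current)
            break                    # whatever follows is the epilogue
        if line == delimiter:
            if current is not None:
                parts.append(current)
            current = []
        elif current is not None:
            current.append(line)
    return parts


BODY = "\n".join([
    "This is the preamble.",
    "--simple boundary",
    "",
    "This is implicitly-typed plain ASCII text.",
    "--simple boundary",
    "Content-type: text/plain; charset=us-ascii",
    "",
    "This is explicitly-typed plain ASCII text.",
    "",
    "--simple boundary--",
    "This is the epilogue.",
])
parts = split_multipart(BODY, "simple boundary")
```

The preamble and epilogue are discarded, and the second part keeps its own optional header line ahead of the blank line that introduces its text.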
There are four subtypes of the multipart type, all of which have the same overall
syntax. The multipart/mixed subtype is used when there are multiple independent
body parts that need to be bundled in a particular order. For the multipart/parallel
subtype, the order of the parts is not significant. If the recipient's system is appropriate,
the multiple parts can be presented in parallel. For example, a picture or text
part could be accompanied by a voice commentary that is played while the picture
or text is displayed.
For the multipart/alternative subtype, the various parts are different representations
of the same information. The following is an example:
From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42

--boundary42
Content-Type: text/plain; charset=us-ascii

... plain-text version of message goes here ...

--boundary42
Content-Type: text/richtext

... RFC 1341 richtext version of same message goes here ...

--boundary42--
In this subtype, the body parts are ordered in terms of increasing preference.
For this example, if the recipient system is capable of displaying the message in the
richtext format, this is done; otherwise, the plain-text format is used.
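Because the parts appear in order of increasing preference, a recipient can simply scan them from last to first. A minimal sketch, with a hypothetical function name and the set of displayable types passed in by the caller:

```python
def choose_alternative(part_types, supported):
    """Pick the last (most preferred) part whose content type can be displayed."""
    for content_type in reversed(part_types):
        if content_type in supported:
            return content_type
    return None  # nothing displayable


parts = ["text/plain", "text/richtext"]
rich_capable = choose_alternative(parts, {"text/plain", "text/richtext"})
plain_only = choose_alternative(parts, {"text/plain"})
```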
The multipart/digest subtype is used when each of the body parts is interpreted
as an RFC 822 message with headers. This subtype enables the construction of a
message whose parts are individual messages. For example, the moderator of a
group might collect email messages from participants, bundle these messages, and
send them out in one encapsulating MIME message.
The message type provides a number of important capabilities in MIME. The
message/rfc822 subtype indicates that the body is an entire message, including
header and body. Despite the name of this subtype, the encapsulated message may
be not only a simple RFC 822 message, but any MIME message.
The message/partial subtype enables fragmentation of a large message into a
number of parts, which must be reassembled at the destination. For this subtype,
three parameters are specified in the Content-Type: Message/Partial field:
* Id. A value that is common to each fragment of the same message, so that the
fragments can be identified at the recipient for reassembly, but which is
unique across different messages.
* Number. A sequence number that indicates the position of this fragment in
the original message. The first fragment is numbered 1, the second 2, and so
on.
* Total. The total number of parts. The last fragment is identified by having the
same value for the number and total parameters.
The rules for fragmenting a message are as follows:
1. Divide the body of the original message into N parts.
2. The first fragment begins with a header that has no Content-Transfer-Encoding
field; the default of 7-bit ASCII is used. The header has a Content-Type of
Message/Partial, with a unique id, number = 1, and total = N. The remaining
fields of the header are copied from the original message header.
3. The body of the first fragment is an encapsulated MIME message that has the
Content-Type and Content-Transfer-Encoding of the original message body.
The Message-ID field of the encapsulated header must differ from that of the
enclosing header.
4. The remaining fragments include header fields from the outer header of the
first fragment. The Message-ID field must be unique. The Content-Type field
has the same id and total values as the outer header of the first fragment, as
well as the appropriate number value. There is no Content-Transfer-Encoding
field.
The rules for reassembly are as follows:
1. The fields for the header of the reassembled message are taken from the outer
header of the first fragment, with the following exceptions. The Content-Type,
Content-Transfer-Encoding, and Message-ID fields are taken from the inner
header of the first fragment.
2. All of the header fields from the second and any subsequent fragments are
ignored.
3. The body parts of the messages, not including the inner header of the first
part, are reassembled in order to form the body of the reassembled message.
Figure 19.10 illustrates a message that is transferred in two fragments.
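The role of the id, number, and total parameters can be sketched as follows. This simplified illustration handles only the body text and the three parameters; the header-copying rules above are deliberately left out, and the dictionary layout and function names are assumptions.

```python
def fragment(body, n, message_id):
    """Split body into n fragments tagged with id, number, and total."""
    size = -(-len(body) // n)  # ceiling division so n pieces cover the body
    return [
        {"id": message_id, "number": k + 1, "total": n,
         "body": body[k * size:(k + 1) * size]}
        for k in range(n)
    ]


def reassemble(fragments):
    """Join fragments in number order after checking that the set is complete."""
    ids = {f["id"] for f in fragments}
    total = fragments[0]["total"]
    numbers = sorted(f["number"] for f in fragments)
    if len(ids) != 1 or numbers != list(range(1, total + 1)):
        raise ValueError("fragments are missing, duplicated, or mixed")
    ordered = sorted(fragments, key=lambda f: f["number"])
    return "".join(f["body"] for f in ordered)


frags = fragment("abcdefghij", 3, "id-1")
restored = reassemble(list(reversed(frags)))  # arrival order does not matter
```

The last fragment is the one whose number equals its total, which is how the recipient knows the set is complete.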
The message/external-body subtype indicates that the actual data to be conveyed
in this message are not contained in the body. Instead, the body contains
the information needed to access the data. As with the other message types, the
message/external-body subtype has an outer header and an encapsulated message
with its own header. The only necessary field in the outer header is the
Content-Type field, which identifies this as a message/external-body subtype. The inner
header is the message header for the encapsulated message.
The Content-Type field in the outer header must include an access-type parameter,
which has one of the following values:
* FTP. The message body is accessible as a file using the file transfer protocol
(FTP). For this access type, the following additional parameters are mandatory:
name, indicating the name of the file; and site, indicating the domain
name of the host where the file resides. Optional parameters are directory, the
directory in which the file is located; and mode, which indicates how FTP
should retrieve the file (e.g., ASCII, image). Before the file transfer can take
place, the user will need to provide a user id and password; these are not
transmitted with the message for security reasons.
* TFTP. The message body is accessible as a file using the trivial file transfer
protocol (TFTP). The same parameters as for FTP are used, and the user id
and password must also be supplied.
* Anon-FTP. Identical to FTP, except that the user is not asked to supply a user
id and password. The parameter name supplies the name of the file.
* Local-File. The message body is accessible as a file on the recipient's machine.
* AFS. The message body is accessible as a file via the global AFS (Andrew File
System). The parameter name supplies the name of the file.
* Mail-Server. The message body is accessible by sending an email message to
a mail server. A server parameter must be included that gives the email
address of the server. The body of the original message, known as the phantom
body, should contain the exact command to be sent to the mail server.
The image type indicates that the body contains a displayable image. The subtype,
jpeg or gif, specifies the image format. In the future, more subtypes will be
added to this list.
The video type indicates that the body contains a time-varying picture image,
possibly with color and coordinated sound. The only subtype so far specified is
mpeg.
The audio type indicates that the body contains audio data. The only subtype,
basic, conforms to an ISDN service known as "64-kbps, 8-kHz Structured, Usable
for Speech Information," with a digitized speech algorithm referred to as μ-law
PCM (pulse-code modulation). This general type is the typical way of transmitting
speech signals over a digital network. The term μ-law refers to the specific encoding
technique; it is the standard technique used in North America and Japan. A
competing system, known as A-law, is standard in Europe.
The application type refers to other kinds of data, typically either uninterpreted
binary data or information to be processed by a mail-based application. The
application/octet-stream subtype indicates general binary data in a sequence of
octets. RFC 1521 recommends that the receiving implementation should offer to
put the data in a file or use it as input to a program.
The application/postscript subtype indicates the use of Adobe PostScript.
MIME Transfer Encodings
The other major component of the MIME specification, in addition to content-type
specification, is a definition of transfer encodings for message bodies. The objective
is to provide reliable delivery across the largest range of environments.
The MIME standard defines two methods of encoding data. The Content-Transfer-Encoding
field can actually take on six values, as listed in Table 19.8. However,
three of these values (7bit, 8bit, and binary) indicate that no encoding has
been done, but they do provide some information about the nature of the data. For
SMTP transfer, it is safe to use the 7bit form. The 8bit and binary forms may be
usable in other mail-transport contexts. Another Content-Transfer-Encoding value
is x-token, which indicates that some other encoding scheme is used, for which a
name is to be supplied; this could be a vendor-specific or application-specific
scheme. The two actual encoding schemes defined are quoted-printable and base64.
Two schemes are defined to provide a choice between a transfer technique that is
essentially human-readable and one that is safe for all types of data in a way that is
reasonably compact.
The quoted-printable transfer encoding is useful when the data consist largely
of octets that correspond to printable ASCII characters (see Table 2.1). In essence,
it represents non-safe characters by the hexadecimal representation of their code
and introduces reversible (soft) line breaks to limit message lines to 76 characters.
The encoding rules are as follows:
1. General 8-bit representation: This rule is to be used when none of the other
rules apply. Any character is represented by an equal sign, followed by a two-digit
hexadecimal representation of the octet's value. For example, the ASCII
form feed, which has an 8-bit value of decimal 12, is represented by "=0C".
2. Literal representation: Any character in the range decimal 33 ("!") through
decimal 126 ("~"), except decimal 61 ("="), is represented as that ASCII
character.
3. White space: Octets with the values 9 and 32 may be represented as ASCII tab
and space characters, respectively, except at the end of a line. Any white space
(tab or blank) at the end of a line must be represented by rule 1. On decoding,
any trailing white space on a line is deleted; this eliminates any white
space added by intermediate transport agents.
4. Line breaks: Any line break, regardless of its initial representation, is represented
by the RFC 822 line break, which is a carriage-return/line-feed combination.
5. Soft line breaks: If an encoded line would be longer than 76 characters
(excluding <CRLF>), a soft line break must be inserted at or before character
position 75. A soft line break consists of the hexadecimal sequence 3D0D0A,
which is the ASCII code for an equal sign followed by carriage return line
feed.
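Rules 1 through 3 can be sketched for a single line as follows. This is a partial illustration only: line breaks and soft line breaks (rules 4 and 5) are not handled, and `qp_encode_line` is a hypothetical helper. Python's standard `quopri` module is used here only to decode the result as a cross-check.

```python
import quopri


def qp_encode_line(line: bytes) -> bytes:
    """Encode one line (without its line break) using rules 1-3."""
    out = []
    last = len(line) - 1
    for i, octet in enumerate(line):
        if 33 <= octet <= 126 and octet != 61:
            out.append(chr(octet))        # rule 2: literal representation
        elif octet in (9, 32) and i != last:
            out.append(chr(octet))        # rule 3: tab or space inside the line
        else:
            out.append("=%02X" % octet)   # rule 1: "=" plus two hex digits
    return "".join(out).encode("ascii")


form_feed = qp_encode_line(b"\x0c")            # the form-feed example from rule 1
trailing = qp_encode_line(b"a=b ")             # "=" and a trailing space are quoted
round_trip = quopri.decodestring(qp_encode_line(b"caf\xe9 \t"))
```

Quoting the trailing space is what lets the decoder safely discard any white space that a transport agent appends to a line.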
The base64 transfer encoding, also known as radix-64 encoding, is a common
one for encoding arbitrary binary data in such a way as to be invulnerable to the
processing by mail-transport programs. For example, both PGP (Pretty Good Privacy)
and PEM (Privacy Enhanced Mail) secure electronic-mail schemes make use
of base64; this technique maps arbitrary binary input into printable character output.
The form of encoding has the following relevant characteristics:
1. The range of the function is a character set that is universally representable at
all sites, not a specific binary encoding of that character set. Thus, the characters
themselves can be encoded into whatever form is needed by a specific system.
For example, the character "E" is represented in an ASCII-based system
as hexadecimal 45 and in an EBCDIC-based system as hexadecimal C5.
2. The character set consists of 65 printable characters, one of which is used for
padding. With 2^6 = 64 available characters, each character can be used to represent
6 bits of input.
3. No control characters are included in the set. Thus, a message encoded in
radix 64 can traverse mail-handling systems that scan the data stream for control
characters.
4. The hyphen character ("-") is not used. This character has significance in the
RFC 822 format and should therefore be avoided.
Table 19.9 shows the mapping of 6-bit input values to characters. The character
set consists of the alphanumeric characters plus "+" and "/". The "=" character
is used as the padding character.
Figure 19.11 illustrates the simple mapping scheme. Binary input is processed
in blocks of 3 octets, or 24 bits. Each set of 6 bits in the 24-bit block is mapped into
a character. In the typical case, where each character is encoded as an 8-bit
quantity, each 24-bit input is expanded to 32 bits of output.
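The 3-octet to 4-character mapping can be written out directly. The sketch below is for illustration; the alphabet string is the standard base64 alphabet, and the handling of a short final block (zero-fill, then "=" padding) follows common base64 practice rather than anything spelled out above. Python's `base64` module serves as a reference for comparison.

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def radix64_encode(data: bytes) -> str:
    """Map each 24-bit block to four characters, 6 bits per character."""
    out = []
    for i in range(0, len(data), 3):
        block = data[i:i + 3]
        pad = 3 - len(block)                      # 0, 1, or 2 missing octets
        n = int.from_bytes(block + b"\x00" * pad, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if pad:
            chars[-pad:] = "=" * pad              # mark the missing octets
        out.append("".join(chars))
    return "".join(out)


encoded = radix64_encode(b"Man")
```

Four output characters carry three input octets, which is the 24-to-32-bit expansion described above.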
One important feature of this mapping is that it is defined on characters, not on
the bit patterns that encode them, so the encoded text survives translation between
character sets. For example, as was mentioned, "E" in 7-bit ASCII is 0100 0101 and in 8-bit
EBCDIC is 1100 0101, yet in both cases the character is "E" and stands for the same
6-bit value. Thus, the reverse mapping from radix 64 to binary is a matter of looking up
each character in Table 19.9 to recover its 6-bit value. The low-order bits of the
character code cannot be used directly, because they are not unique across the 65
characters (in ASCII, "q" and "1" have the same rightmost 6 bits).
For example, the sequence "H52Q" in ASCII is 01001000 00110101 00110010 01010001.
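The reverse mapping for one block of four characters can be sketched as a lookup in the standard base64 alphabet followed by regrouping into octets. The function name is hypothetical, and padding is not handled in this minimal version.

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def radix64_decode_block(chars: str) -> bytes:
    """Decode four radix-64 characters (no padding) into three octets."""
    n = 0
    for ch in chars:
        n = (n << 6) | ALPHABET.index(ch)   # each character contributes 6 bits
    return n.to_bytes(3, "big")


octets = radix64_decode_block("H52Q")
```

For "H52Q" the looked-up values are 7, 57, 54, and 16, that is 000111 111001 110110 010000, which regroup into the octets 1F 9D 90 (hexadecimal).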
A Multipart Example
Figure 19.12, taken from RFC 1521, is the outline of a complex multipart message.
The message has five parts to be displayed serially: two introductory plain text
parts, an embedded multipart message, a richtext part, and a closing encapsulated
text message in a non-ASCII character set. The embedded multipart message has
two parts to be displayed in parallel: a picture and an audio fragment.
