Table 2.2 ASCII control characters

ASCII character    Meaning
ACK                Acknowledgement
BEL                Bell
BS                 Backspace
CAN                Cancel
CR                 Carriage Return
DC1                Device Control 1
DC2                Device Control 2
DC3                Device Control 3
DC4                Device Control 4
DEL                Delete
DLE                Data Link Escape
EM                 End of Medium
ENQ                Enquiry
EOT                End of Transmission
ESC                Escape
ETB                End of Transmission Block
ETX                End of Text
FF                 Form Feed
FS                 File Separator
GS                 Group Separator
HT                 Horizontal Tab
LF                 Line Feed
NAK                Negative Acknowledgement
NUL                Null
RS                 Record Separator
SI                 Shift In
SO                 Shift Out
SOH                Start of Header
STX                Start of Text
SUB                Substitute Character
SYN                Synchronisation character
US                 Unit Separator
VT                 Vertical Tab

also a range of new control characters as needed to govern the flow of data in and around
the computers.
An adapted 8-bit version of the code, developed by IBM and sometimes called extended
ASCII, is now standard in computer systems. The most commonly used variant is the 8-bit
ASCII code corresponding to the DOS (disk operating system) code page 437. This is the
default character set loaded for use in standard PC keyboards, unless an alternative national
character set is loaded by re-setting to a different code page.
The extended 8-bit ASCII code (international alphabet IA5 and code page 437) is illustrated
in Table 2.3. The ‘coded bits’ representing a particular character, number or control signal
are numbered 1 through 8 (top left-hand corner of Table 2.3), representing the least
significant bit (LSB, number 1) through the most significant bit (MSB, number 8). Each
alphanumeric character, however, is usually written most significant bit (i.e., bit number 8)
first. Thus the letter C is written ‘01000011’. But to confuse matters further, the least significant
bit is transmitted first. Thus the order of transmission for the letter ‘C’ is ‘11000010’ and for
the word ‘ASCII’ is as shown in Figure 2.2.
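By way of illustration, the short Python sketch below (not part of the original text; the function name is ours) reproduces this ordering: each character's 8-bit code is written most significant bit first, then reversed so that the least significant bit goes to line first.

```python
def transmission_order(text):
    """Return, for each character, the 8-bit ASCII code in the order the bits
    are sent to line: least significant bit first."""
    return [format(ord(ch), "08b")[::-1] for ch in text]

# The letter 'C' is written 01000011 but transmitted as 11000010.
print(transmission_order("C"))      # ['11000010']
print(transmission_order("ASCII"))  # one LSB-first byte per letter
```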

Table 2.3 Extended ASCII code (as developed for the IBM PC; code page 437)

Figure 2.2 Transmission of bits to line, least significant bit first.


Table 2.3 shows the 256 characters of the standard set, listing the binary value (the ‘coded
bits’) used to represent each character. The decimal value of each character (in the top left-hand corner of each box) appears for reference as well, since some protocol specification
documents refer to it rather than the binary value.
The hexadecimal values shown in Table 2.3 have the same numerical value as the binary
and decimal equivalents in the same ‘cell’ of the table, but are simply expressed in base 16
(as opposed to base 2 for binary or base 10 for decimal). Computer and data communications
specialists often prefer to talk in terms of hexadecimal values, because they are easier to
remember and much shorter than the binary number equivalent. It is also very easy to convert
from a binary number to its equivalent hexadecimal value and vice versa¹. Hexadecimal
notation is used widely in Internet protocol suite specifications. Hexadecimal number values
are usually prefixed with ‘0x’ or suffixed with an ‘H’. Thus the binary value 0100 1111B
(decimal value 79 and ASCII character ‘O’) can be written as a hexadecimal value either as
0x4F or as 4FH.
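These equivalences are easy to check; a minimal Python sketch (illustrative only, not from the text) showing the same value in its different notations, together with the digit-by-digit hex-to-binary replacement described in footnote 1:

```python
# The same value written in binary, decimal and hexadecimal notation.
value = 0b01001111          # the 'coded bits' for the ASCII character 'O'
print(value)                # 79
print(hex(value))           # 0x4f
print(chr(value))           # O

# Hex-to-binary conversion digit by digit: each hex digit becomes four bits,
# e.g. 9F -> 1001 1111 (see footnote 1).
print(" ".join(format(int(d, 16), "04b") for d in "9F"))   # 1001 1111
```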
It was convenient to extend the original ASCII code (which used a standard length of
7 bits) to 8 bits (of the extended ASCII code) for three reasons: First, it allowed additional
characters to be incorporated corresponding to the various European language variations on
the roman character set (e.g., ä, å, â, à, ç, é, è, ñ, ø, ö, ü, ß, etc.). Second, it also allowed
the addition of a number of graphical characters to control the formatting of characters (e.g.,
‘bold’, ‘italics’, ‘underlined’, ‘fontstyle’, etc.) as well as enabling simple tables to be created.
Third, it is convenient for computer designers and programmers to work with the standard
character length of 1 byte (8 bits). The memory and processing capabilities of a computer are
designed and expressed in terms of bytes, kilobytes (1 kbyte = 1024 bytes = 8192 bits)
and Megabytes (1 Mbyte = 1024 × 1024 bytes = 8 388 608 bits).
As well as the 8-bit IBM PC-version of ASCII, there are a number of other 8-bit extended
ASCII codes including the text/html code (ISO 8859-1) which we shall encounter in Chapter 11
(Table 11.7) and the Microsoft Windows Latin-1 character set (code page 1252) detailed
in Appendix 1. The different character sets are adapted for slightly different purposes. In
addition, a 16-bit (2 byte) character set which is based on ASCII but correctly called unicode
has also been developed as a general purpose character set. Unicode allows computer programs
to represent all the possible worldwide characters without having to change between code sets
(Arabic, Chinese, Greek, Hebrew, Roman, Russian, Japanese, etc.) and is sometimes used in
multilingual computer programs. But under normal circumstances, an 8-bit version of ASCII
is used instead of unicode in order to keep the storage needed for each character down to
only 8 bits.
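The storage difference is easy to demonstrate; a minimal Python sketch (illustrative only: the sample word is arbitrary, and UTF-16 is used here as a concrete 2-byte unicode encoding) comparing the number of bytes needed by 8-bit code pages and by a 16-bit encoding:

```python
text = "Grüße"   # arbitrary sample containing accented characters

# One byte per character in an 8-bit extended ASCII code page ...
print(len(text.encode("cp437")))      # 5 bytes (IBM PC code page 437)
print(len(text.encode("cp1252")))     # 5 bytes (Windows Latin-1, code page 1252)

# ... but two bytes per character in a 16-bit unicode encoding.
print(len(text.encode("utf-16-be")))  # 10 bytes
```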

2.5 EBCDIC and extended forms of ASCII
EBCDIC (extended binary coded decimal interchange code) (pronounced ebb-si-dick) is an
alternative 8-bit scheme for encoding characters, and is the main character set used by IBM
mainframe computers. The 8-bit EBCDIC code existed before the ASCII code was extended
to 8 bits and afforded the early IBM mainframes the scope of 128 extra control characters.
¹ In the hexadecimal (or base 16) numbering scheme the digits have values equivalent to the decimal values
0–15. These digits are given the signs 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F respectively. Thus ‘A’
represents the decimal value ‘ten’, ‘B’ represents ‘eleven’ and so on, up to ‘F’ which represents ‘fifteen’. The
conversion of hexadecimal digits into binary values is easy, since each digit in turn can be progressively replaced
by four binary digits. Thus the value 0H (hexadecimal) = 0000B (binary), 1H = 0001B, 2H = 0010B, 3H =
0011B, 4H = 0100B, 5H = 0101B, 6H = 0110B, 7H = 0111B, 8H = 1000B, 9H = 1001B, AH = 1010B,
BH = 1011B, CH = 1100B, DH = 1101B, EH = 1110B and FH = 1111B. Surprisingly, perhaps, even
multiple-digit hex (hexadecimal) numbers can be converted easily to binary. So, for example, the hex value 9F is
equivalent to the binary number 1001 1111.


In the case that an IBM mainframe receives an ASCII-coded file from another computer, the
character set needs to be converted using a translation program. This is the presentation layer
functionality we discussed in Chapter 1.
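As a rough sketch of such a character-set translation (not from the text; it uses Python's built-in codec for code page 037, one common EBCDIC variant, whereas real mainframe transfers use dedicated translation software):

```python
ascii_bytes = b"Greetings"

# Translate the 8-bit ASCII/Latin-1 representation into EBCDIC (code page 037) ...
ebcdic_bytes = ascii_bytes.decode("latin-1").encode("cp037")
print(ebcdic_bytes.hex())            # c7998585a3899587a2 (EBCDIC byte values)

# ... and back again, recovering the original characters.
print(ebcdic_bytes.decode("cp037"))  # Greetings
```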

2.6 Use of the binary code to convey graphical images
Besides representing numerical and alphabetical (or textual) characters, the binary code is also
used to transmit pictorial and graphical images as well as complex formatting information.
This information is typically processed by the computer graphics card.
Pictures are sent as binary information by sending an 8-bit number (a value between 0
and 255) to represent the particular colour and shade (from a choice of 256) of a minuscule
dot making up a part of the picture. The picture itself is an array of dots. Typically a video
graphics array (VGA) card supports an array 640 dots wide and 480 dots high (a so-called
resolution of 640 × 480 picture elements or pixels²). Put all the coloured dots together again in
the right pattern (like an impressionist painting) and the picture reappears. This is the principle
on which computer images are communicated. Send a series of pictures, one after the other,
at a rate of 25 Hz (25 picture frames per second) and you have a video signal.
Figure 2.3 illustrates the principle of creating a computer graphic. In our example, the letters
‘VGA’ appear in a graphic array of 25 dots width and 16 dots height. Each dot (correctly
called a pixel — a picture element) in our example is either the colour ‘black’ or ‘white’. We
could thus represent each pixel with a single bit, value 1 for black and 0 for white. For each
of the 400 pixels in turn, we code and send a single bit, according to the colour of the picture
we wish to send. We start at the top left-hand corner and scan across each line in turn. So that
in binary (usually called digital code) our picture becomes:
00000 00000 00000 00000 00000
00000 00000 00000 00000 00000
01000 00100 11111 00000 10000
01000 00101 00000 10000 10000
01000 00101 00000 10000 10000
00100 01001 00000 00001 01000
00100 01001 00000 00001 01000
00100 01001 00000 00001 01000
00010 10001 00000 00010 00100
00010 10001 00111 10011 11100
00010 10001 00000 10010 00100
00001 00001 00000 10100 00010
00001 00001 00000 10100 00010
00001 00000 11111 00100 00010
00000 00000 00000 00000 00000
00000 00000 00000 00000 00000
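A minimal Python sketch (not from the text) shows how such a picture is 'reassembled' at the receiving end: the continuous bit stream is scanned row by row, and a block is printed for each 'black' (value 1) pixel.

```python
def render(bitstream, width=25):
    """Reassemble a 1-bit-per-pixel image sent as one continuous bit string:
    scan the stream row by row, printing '#' for black and '.' for white."""
    for i in range(0, len(bitstream), width):
        row = bitstream[i:i + width]
        print("".join("#" if b == "1" else "." for b in row))

# Example: the first three rows of the 'VGA' picture above (75 of the 400 bits).
render("0" * 50 + "0100000100111110000010000")
```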

In the binary code format the image is much harder to pick out than on the ‘screen’ of
Figure 2.3! In fact, it would have been even harder if we were not to have typed the bits
in the convenient array fashion, but instead had printed them as a continuous line of 400
characters — which is as they appear on the ‘transmission line’ between the PC graphics card
and the screen.
² Alternative, commonly used VGA picture sizes are 800 × 600 pixels and 1024 × 768 pixels.

Figure 2.3 An example of a video graphic array (VGA) comprising 25 × 16 picture elements (pixels).

On reflection, it might seem rather strange, given our previous discussion about the ASCII
code — in which the three letters ‘VGA’ could be represented by 24 bits (01010110 01000111
01000001), that we should now find ourselves requiring 400 bits to represent the same three
letters! How have we come to need an extra 376 bits, you might ask? The reason is that the
form in which we wish to present the three letters (on a video screen, to a computer user)
requires the 400-bit format. The conversion of the 24-bit format into the 400-bit format is
an example of the OSI model presentation layer function, as discussed in Chapter 1. Such a
conversion is typically part of the function carried out by the computer graphics card. But in
reality, most modern computer graphics images are not a mere ‘black and white’ image of only
400 bits for a 25 × 16 pixel matrix. Instead the colour of each pixel is usually represented
by between 1 and 4 bytes of code (representing between 256 and around 4300 million colours) and
most computer screens nowadays are arrays of either 640 × 480 pixels, 800 × 600 pixels or
1024 × 768 pixels, so that a single ‘screenshot image’ may correspond to 3 Mbytes (1024 ×
768 × 4 = 3.1 million bytes) of data.
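As a rough check of that arithmetic, a small Python sketch (illustrative only) computing the uncompressed size of one screen image for the resolutions mentioned, at 1 and 4 bytes per pixel:

```python
# Uncompressed size of one screen image at various resolutions and colour depths.
for width, height in [(640, 480), (800, 600), (1024, 768)]:
    for bytes_per_pixel in (1, 4):   # 256 colours vs around 4300 million colours
        size = width * height * bytes_per_pixel
        print(f"{width} x {height} at {bytes_per_pixel} byte(s)/pixel: "
              f"{size:,} bytes ({size / 2**20:.1f} Mbytes)")
```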

2.7 Decoding binary messages — the need for synchronisation
and for avoiding errors
Next, we consider the challenge posed by the decoding of a binary message at the receiver end
of a connection, and also the problems caused by errors introduced during the communication.
Let us consider sending a short message, containing the single word ‘Greetings’. Coding
each of the letters into ASCII using Table 2.3, we derive a bit sequence as follows,
where the right-hand bit should be sent first:
   s        g        n        i        t        e        e        r        G
01110011 01100111 01101110 01101001 01110100 01100101 01100101 01110010 01000111
last bit to be sent                                         first bit to be sent

All well and good: easy to transmit, receive and decode back to the original message. But
what happens if the receiver incorrectly interprets the ‘idle’ signal which is sent on the line
(value ‘0’) prior to the first ‘1’ bit of the real message as a leading bit of value ‘0’? In this
case, an extra ‘0’ appears at the right-hand end of our bit string and all the other bit values
are shifted one position to the left. The pattern decoded by the receiver will be as below. Now
the meaning of the decoded message is gibberish!

11100110 11001110 11011100 11010010 11101000 11001010 11001010 11100100 10001110
last bit received                                             first bit received
[last bit lost]                                          [extra ‘0’ assumed]
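To make the framing slip concrete, here is a minimal Python sketch (not from the text) that encodes ‘Greetings’ least significant bit first, lets the receiver mistake one idle ‘0’ for a data bit, and then decodes the shifted bytes using code page 437:

```python
word = "Greetings"

# Bits as they are sent to line: each 8-bit ASCII code, least significant bit first.
sent = "".join(format(ord(ch), "08b")[::-1] for ch in word)

# The receiver wrongly takes one 'idle' 0 before the real message as a data bit,
# so every byte it assembles is shifted by one bit position.
received = "0" + sent[:-1]          # the final bit of the message is lost

# Reassemble bytes: the first bit received in each group is the least significant.
decoded = bytes(int(received[i:i + 8][::-1], 2) for i in range(0, len(received), 8))
print(decoded.decode("cp437"))      # gibberish instead of 'Greetings'
```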

Our example illustrates perfectly the need for maintaining synchronisation between the transmitter and the receiver, in order that both take the same bit as the first of each byte. We shall
return to the various methods of ensuring synchronism later in the chapter. But first, let us
also consider the effect of errors.
Errors are bits which have changed their value during conveyance across a network. They
may arise for a large number of different reasons, some of which we shall consider later in
the chapter. If the three bit errors indicated below occur in the original code for ‘Greetings’, then
the received message is corrupted. Unfortunately, the result may not obviously be corrupted
gibberish, but may instead appear to be a ‘valid’ message. In this example, rather than pleasing
our recipient with ‘Greetings’, we end up insulting him with the message ‘Greedy~gs’!
   s        g        ~        y        d        e        e        r        G
01110011 01100111 01111110 01111001 01100100 01100101 01100101 01110010 01000111
                     ^        ^        ^     (the three errored bits)
last bit to be received                                   first bit to be received

There is a clear need to minimise errors. We do this by ensuring that the quality of the
transmission lines we use is very high. The quality we measure in terms of the bit error ratio
(BER). In our example we had three bit errors in a total of 9 × 8 = 72 bits, a bit error ratio
(BER) of about 4%. This would be an unacceptably high BER for a modern data network, most of
which operate in the range BER = 10⁻⁷ to 10⁻⁹ (1 error in 10 million or 1000 million bits
sent). In addition to using very high quality lines, data protocols also usually include means for
detecting and correcting errors. These methods are called error detection or error correction
codes. The simple fact is that we cannot afford any corruptions in our data!

2.8 Digital transmission
We have learned how we can code textual, graphic, video and other types of computer data
into binary code — in particular into 8-bit blocks of binary code which we call bytes. And we
have seen how we can convey this data across a communications medium by means of digital
transmission — essentially turning the electricity or light on the line either ‘on’ (to represent
binary value ‘1’) or ‘off’ (to represent binary value ‘0’). Digital transmission media which
operate according to this basic principle include:
• the serial ports of computers and the local connection lines connected to them;
• local area networks (LANs);
• digital leaselines (including all PDH (plesiochronous digital hierarchy), SDH (synchronous
digital hierarchy) and SONET (synchronous optical network) type lines ... e.g., lines
conforming to RS-232, V.24, X.21, G.703, ‘64 kbit/s’, ‘128 kbit/s’, ‘E1’, ‘T1’, ‘E3’, ‘T3’,
‘STM-1’, ‘OC-3’, etc.);
• digital radio links;
• digital satellite connections;
• point-to-point fibre optic transmission.
In reality, however, the transmission on digital line systems is rarely a simple two-state ‘on-off’
process. For the purpose of line synchronisation and error avoidance it is instead normal to use