
7.4 µ-Law and A-Law Companding


encoder uses the logarithmic expression

$$
\operatorname{sgn}(x)\,\frac{\ln(1+\mu|x|)}{\ln(1+\mu)},
\quad\text{where}\quad
\operatorname{sgn}(x)=
\begin{cases}
+1, & x>0,\\
0, & x=0,\\
-1, & x<0,
\end{cases}
$$

(and µ is a positive integer), to compute and output an 8-bit code in the same interval

[−1, +1]. The output is then scaled to the range [−256, +255]. Figure 7.8 shows this

output as a function of the input for the three µ values 25, 255, and 2555. It is clear that

large values of µ cause coarser quantization for larger amplitudes. Such values allocate

more bits to the smaller, more important, amplitudes. The G.711 standard recommends

the use of µ = 255. The diagram shows only the nonnegative values of the input (i.e.,

from 0 to 8191). The negative side of the diagram has the same shape but with negative

inputs and outputs.

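The A-law encoder uses the similar expression

$$
\begin{cases}
\operatorname{sgn}(x)\,\dfrac{A|x|}{1+\ln(A)}, & \text{for } 0\le|x|<\dfrac{1}{A},\\[2ex]
\operatorname{sgn}(x)\,\dfrac{1+\ln(A|x|)}{1+\ln(A)}, & \text{for } \dfrac{1}{A}\le|x|<1.
\end{cases}
$$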

The value used with the G.711 standard is A = 87.6.

The following simple calculations illustrate the nonlinear nature of the µ-law. The two (normalized) input samples 0.15 and 0.16 are transformed by µ-law to outputs 0.6618 and 0.6732. The difference between the outputs is 0.0114. On the other hand, the two input samples 0.95 and 0.96 (bigger inputs but with the same difference) are transformed to 0.9908 and 0.9927. The difference between these two outputs is 0.0019; much smaller.
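The four outputs quoted above can be reproduced with a few lines of Python. This is only a sketch of the continuous µ-law expression with µ = 255; the function name `mu_law` is an illustrative choice, not part of any standard.

```python
import math

def mu_law(x, mu=255):
    """Continuous mu-law compression of a normalized sample x in [-1, 1]."""
    sign = (x > 0) - (x < 0)                      # sgn(x): +1, 0, or -1
    return sign * math.log(1 + mu * abs(x)) / math.log(1 + mu)

# Two pairs of inputs with the same spacing (0.01) but different magnitudes.
for a, b in [(0.15, 0.16), (0.95, 0.96)]:
    ya, yb = mu_law(a), mu_law(b)
    print(f"{a:.2f} -> {ya:.4f}, {b:.2f} -> {yb:.4f}, difference {yb - ya:.4f}")
```

The same input spacing produces a much smaller output spacing near full scale, which is the coarser quantization of large amplitudes described in the text.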

Bigger samples are decoded with more noise, and smaller samples are decoded with

less noise. However, the signal-to-noise ratio (SNR, Section 4.2.2) is constant because

both the µ-law and the SNR use logarithmic expressions.

Logarithms are slow to calculate, so the µ-law encoder performs much simpler calculations that produce an approximation. The output specified by the G.711 standard is an 8-bit codeword whose format is shown in Figure 7.9.

Bit P in Figure 7.9 is the sign bit of the output (same as the sign bit of the 14-bit signed input sample). Bits S2, S1, and S0 are the segment code, and bits Q3 through Q0 are the quantization code. The encoder determines the segment code by (1) adding a bias of 33 to the absolute value of the input sample, (2) determining the bit position of the most significant 1-bit among bits 5 through 12 of the biased value, and (3) subtracting 5 from that position. The 4-bit quantization code is set to the four bits following the bit position determined in step 2. The encoder ignores the remaining bits of the input sample, and it inverts (1's complements) the codeword before it is output.

[Plot: output (0 to 128) against input (0 to 8000), one curve for each of µ = 25, 255, and 2555.]

Figure 7.8: The µ-Law for µ Values of 25, 255, and 2555.

dat=linspace(0,1,1000);
mu=255;
plot(dat*8159,128*log(1+mu*dat)/log(1+mu));

Matlab code for Figure 7.8. Notice how the input is normalized to the range [0, 1] before the calculations, and how the output is scaled from the interval [0, 1] to [0, 128].

P  S2  S1  S0  Q3  Q2  Q1  Q0

Figure 7.9: G.711 µ-Law Codeword.

                       Q3 Q2 Q1 Q0
bit:        0  0  0  1  0  1  0  1  1  0  0  0  1
position:  12 11 10  9  8  7  6  5  4  3  2  1  0

Figure 7.10: Encoding Input Sample −656.

We use the input sample −656 as an example. The sample is negative, so bit P becomes 1. Adding 33 to the absolute value of the input yields 689 = 0001010110001₂ (Figure 7.10). The most significant 1-bit in positions 5 through 12 is found at position 9. The segment code is thus 9 − 5 = 4. The quantization code is the four bits 0101 at positions 8–5, and the remaining five bits 10001 are ignored. The 8-bit codeword (which is later inverted) becomes

P  S2  S1  S0  Q3  Q2  Q1  Q0
1   1   0   0   0   1   0   1
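The encoding steps can be sketched in Python as follows. The function name `mulaw_encode` is illustrative, and clipping the magnitude at 8158 (so that the biased value never needs more than bits 0 through 12) is an assumption of this sketch rather than something stated in the text.

```python
BIAS = 33  # bias added to the magnitude before the segment search

def mulaw_encode(sample):
    """Encode a 14-bit signed sample; returns the codeword BEFORE the final inversion."""
    p = 1 if sample < 0 else 0
    # Assumption: clip so that the leading 1-bit of the biased value stays within bits 5..12.
    biased = min(abs(sample), 8158) + BIAS
    msb = biased.bit_length() - 1            # position of the most significant 1-bit
    segment = msb - 5                        # step (3): subtract 5 from that position
    quant = (biased >> (msb - 4)) & 0x0F     # the four bits following the leading 1-bit
    return (p << 7) | (segment << 4) | quant

codeword = mulaw_encode(-656)
print(f"{codeword:08b}")                     # fields P, S2 S1 S0, Q3 Q2 Q1 Q0
transmitted = codeword ^ 0xFF                # the 1's complement that is actually output
```

Because the bias is at least 33, the biased value always has a 1-bit at position 5 or higher, so the segment code is never negative.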

The µ-law decoder inputs an 8-bit codeword and inverts it. It then decodes it as follows:

1. Multiply the quantization code by 2 and add 33 (the bias) to the result.

2. Multiply the result by 2 raised to the power of the segment code.

3. Decrement the result by the bias.

4. Use bit P to determine the sign of the result.

Applying these steps to our example produces

1. The quantization code is 0101₂ = 5, so 5×2 + 33 = 43.

2. The segment code is 100₂ = 4, so 43×2⁴ = 688.

3. Decrement by the bias: 688 − 33 = 655.

4. Bit P is 1, so the final result is −655. Thus, the quantization error (the noise) is 1; very small.
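The four decoding steps translate almost line for line into code. The sketch below assumes the codeword has already been inverted back to the P/segment/quantization layout of Figure 7.9; the name `mulaw_decode` is illustrative.

```python
BIAS = 33

def mulaw_decode(codeword):
    """Decode an 8-bit mu-law codeword (already un-inverted) following steps 1-4."""
    p = (codeword >> 7) & 1
    segment = (codeword >> 4) & 0x07
    quant = codeword & 0x0F
    value = 2 * quant + BIAS        # step 1: double the quantization code, add the bias
    value <<= segment               # step 2: multiply by 2 to the power of the segment code
    value -= BIAS                   # step 3: remove the bias
    return -value if p else value   # step 4: apply the sign

print(mulaw_decode(0b11000101))
```

For the codeword of the example this yields −655, one unit away from the original sample −656.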

Figure 7.11a illustrates the nature of the µ-law midtread quantization. Zero is one

of the valid output values, and the quantization steps are centered at the input value of

zero. The steps are organized in eight segments of 16 steps each. The steps within each

segment have the same width, but they double in width from one segment to the next.

If we denote the segment number by i (where i = 0, 1, . . . , 7) and the index of a step within a segment by k (where k = 1, 2, . . . , 16), then the breakpoints of the steps in Figure 7.11a (i.e., the points labeled xj) are given by

$$
x(16i + k) = T(i) + k\times D(i), \tag{7.3}
$$

where the constants T(i) and D(i) are the initial value and the step size for segment i, respectively. They are given by

i       0     1     2     3     4     5     6     7
T(i)    1    35   103   239   511  1055  2143  4319
D(i)    2     4     8    16    32    64   128   256

Table 7.12 lists some values of the breakpoints (points xj ) and outputs (points yj ) shown

in Figure 7.11a.
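The breakpoints and outputs of the µ-law quantizer can be regenerated from T(i) and D(i). In the sketch below the step index inside a segment is taken as zero-based (k = 0, ..., 15), because that indexing reproduces the tabulated values x1 = 1, x17 = 35, and x128 = 8159; this offset relative to the k = 1, ..., 16 of Equation (7.3) is an assumption of the sketch. The outputs are assumed to lie midway between consecutive breakpoints, which agrees with entries such as y16 = 33 and y127 = 8031.

```python
T = [1, 35, 103, 239, 511, 1055, 2143, 4319]   # initial value of each segment
D = [2, 4, 8, 16, 32, 64, 128, 256]            # step size of each segment

def mu_breakpoint(j):
    """Breakpoint x_j for j = 1..128, numbered as in Table 7.12."""
    i, k = divmod(j - 1, 16)       # segment i and zero-based step k inside it
    return T[i] + k * D[i]

def mu_output(j):
    """Output y_j for j = 1..127, taken midway between x_j and x_(j+1)."""
    return (mu_breakpoint(j) + mu_breakpoint(j + 1)) // 2

print([mu_breakpoint(j) for j in (1, 16, 17, 32, 113, 128)])
print([mu_output(j) for j in (1, 16, 17, 112, 127)])
```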

The operation of the A-law encoder is similar, except that the quantization (Figure 7.11b) is of the midriser variety. The breakpoints xj are given by Equation (7.3), but the initial value T(i) and the step size D(i) for segment i are different from those used by the µ-law encoder and are given by

i       0     1     2     3     4     5     6     7
T(i)    0    32    64   128   256   512  1024  2048
D(i)    2     2     4     8    16    32    64   128

Figure 7.11: (a) µ-Law Midtread Quantization. (b) A-law Midriser Quantization.

segment 0
  break points:   x1 = 1, x2 = 3, x3 = 5, x4 = 7, ···, x15 = 29, x16 = 31
  output values:  y0 = 0, y1 = 2, y2 = 4, y3 = 6, ···, y15 = 28

segment 1
  break points:   x17 = 35, x18 = 39, x19 = 43, x20 = 47, ···, x31 = 91, x32 = 95
  output values:  y16 = 33, y17 = 37, y18 = 41, y19 = 45, ···, y31 = 93

···

segment 7
  break points:   x113 = 4319, x114 = 4575, x115 = 4831, x116 = 5087, ···, x127 = 7903, x128 = 8159
  output values:  y112 = 4191, y113 = 4447, y114 = 4703, y115 = 4959, ···, y127 = 8031

Table 7.12: Specification of the µ-Law Quantizer.

Table 7.13 lists some values of the breakpoints (points xj ) and outputs (points yj ) shown

in Figure 7.11b.
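The A-law breakpoints follow from the A-law values of T(i) and D(i) in the same way. In this sketch the breakpoints are numbered from x0, as in Table 7.13, and each output is assumed to lie midway between the two breakpoints that enclose it; both conventions are inferred from the tabulated values rather than stated in the text.

```python
T_A = [0, 32, 64, 128, 256, 512, 1024, 2048]   # initial value of each segment
D_A = [2, 2, 4, 8, 16, 32, 64, 128]            # step size of each segment

def a_breakpoint(j):
    """Breakpoint x_j for j = 0..128, numbered as in Table 7.13."""
    i, k = divmod(j, 16)
    if i == 8:                 # x_128 is the upper end of segment 7
        i, k = 7, 16
    return T_A[i] + k * D_A[i]

def a_output(j):
    """Output y_j for j = 1..128, taken midway between x_(j-1) and x_j."""
    return (a_breakpoint(j - 1) + a_breakpoint(j)) // 2

print([a_breakpoint(j) for j in (0, 15, 16, 31, 112, 128)])
print([a_output(j) for j in (1, 16, 17, 32, 113, 116)])
```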

The A-law encoder generates an 8-bit codeword with the same format as the µ-law encoder. It sets the P bit to the sign of the input sample. It then determines the segment code by

1. Determining the bit position of the most significant 1-bit among the seven most significant bits of the input.

2. If such a 1-bit is found, the segment code becomes that position minus 4. Otherwise, the segment code becomes zero.

The 4-bit quantization code is set to the four bits following the bit position determined in step 1, or to half the input value if the segment code is zero. The encoder ignores the remaining bits of the input sample, and it inverts bit P and the even-numbered bits of the codeword before it is output.

segment 0
  break points:   x0 = 0, x1 = 2, x2 = 4, x3 = 6, ···, x15 = 30
  output values:  y1 = 1, y2 = 3, y3 = 5, y4 = 7, ···, y16 = 31

segment 1
  break points:   x16 = 32, x17 = 34, x18 = 36, x19 = 38, ···, x31 = 62
  output values:  y17 = 33, y18 = 35, y19 = 37, y20 = 39, ···, y32 = 63

···

segment 7
  break points:   x112 = 2048, x113 = 2176, x114 = 2304, x115 = 2432, ···, x128 = 4096
  output values:  y113 = 2112, y114 = 2240, y115 = 2368, y116 = 2496, ···, y127 = 4032

Table 7.13: Specification of the A-Law Quantizer.

The A-law decoder decodes an 8-bit codeword into a 13-bit audio sample as follows:

1. It inverts bit P and the even-numbered bits of the codeword.

2. If the segment code is nonzero, the decoder multiplies the quantization code by 2 and increments this by the bias (33). The result is then multiplied by 2 raised to the power of (segment code minus 1). If the segment code is zero, the decoder outputs twice the quantization code, plus 1.

3. Bit P is then used to determine the sign of the output.
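The A-law encoder and decoder described above can be sketched as a matching pair. The sketch assumes a 13-bit signed input (magnitude 0 to 4095, so the seven most significant magnitude bits are bits 5 through 11) and assumes that "bit P and the even-numbered bits" corresponds to the mask 0xD5; the bit-numbering convention behind that mask is not given in the text, so treat it as a placeholder. The function names are illustrative.

```python
INVERT_MASK = 0xD5   # assumed: bit P (0x80) plus the alternate bits 0x55

def alaw_encode(sample):
    """Encode a 13-bit signed sample into an 8-bit A-law codeword."""
    p = 1 if sample < 0 else 0
    mag = min(abs(sample), 4095)
    if mag >> 5:                              # a 1-bit exists among bits 5..11
        msb = mag.bit_length() - 1
        segment = msb - 4
        quant = (mag >> (msb - 4)) & 0x0F     # the four bits after the leading 1-bit
    else:
        segment = 0
        quant = mag >> 1                      # half the input value
    return ((p << 7) | (segment << 4) | quant) ^ INVERT_MASK

def alaw_decode(codeword):
    """Decode an 8-bit A-law codeword following steps 1-3."""
    codeword ^= INVERT_MASK                   # step 1: undo the inversion
    p = (codeword >> 7) & 1
    segment = (codeword >> 4) & 0x07
    quant = codeword & 0x0F
    if segment:                               # step 2
        mag = (2 * quant + 33) << (segment - 1)
    else:
        mag = 2 * quant + 1
    return -mag if p else mag                 # step 3: apply the sign

print(alaw_decode(alaw_encode(2048)), alaw_decode(alaw_encode(5)))
```

Decoding the codeword for 2048 gives 2112, the first output value of segment 7 in Table 7.13, and an input of 5 is returned unchanged because it already sits on an output level of segment 0.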

Normally, the output codewords are generated by the encoder at the rate of 64 Kbps. The G.711 standard also provides for two other rates, as follows:

1. To achieve an output rate of 48 Kbps, the encoder masks out the two least-significant bits of each codeword. This works, since 6/8 = 48/64.

2. To achieve an output rate of 56 Kbps, the encoder masks out the least-significant bit of each codeword. This works, since 7/8 = 56/64 = 0.875.

This applies to both the µ-law and the A-law. The decoder typically fills up the masked bit positions with zeros before decoding a codeword.
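The masking used for the two reduced rates is a single bitwise AND. The helper below is a hypothetical illustration (`reduce_rate` is not a name from the standard), and the text does not say whether the mask is applied before or after the codeword inversion, so the sketch simply operates on whatever 8-bit codeword it is given.

```python
def reduce_rate(codeword, kbps):
    """Zero the low-order bits of an 8-bit codeword for the 56 or 48 Kbps rates."""
    dropped = {64: 0, 56: 1, 48: 2}[kbps]     # number of least-significant bits masked out
    return codeword & (0xFF << dropped) & 0xFF

for rate in (64, 56, 48):
    print(rate, f"{reduce_rate(0b11000111, rate):08b}")
```

Since the masked positions already hold zeros, a decoder that fills them with zeros can pass the result straight to the normal decoding steps.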


As always, compression is possible only because sound, and thus audio samples, tend to have redundancies. Adjacent audio samples tend to be similar in much the same way that neighboring pixels in an image tend to have similar colors. The simplest way to exploit this redundancy is to subtract adjacent samples and code the differences, which tend to be small numbers. Any audio compression method based on this principle is called DPCM (differential pulse code modulation). Such methods, however, are inefficient, since they do not adapt themselves to the varying magnitudes of the audio stream. Better results are achieved by an adaptive version, and any such version is called ADPCM [ITU-T 90].

Similar to predictive image compression, ADPCM uses the previous sample (or several previous samples) to predict the current sample. It then computes the difference between the current sample and its prediction, and quantizes the difference. For each input sample X[n], the output C[n] of the encoder is simply a certain number of quantization levels. The decoder multiplies this number by the quantization step (and may add half the quantization step, for better precision) to obtain the reconstructed audio sample. The method is efficient because the quantization step is modified all the time, by both encoder and decoder, in response to the varying magnitudes of the input samples. It is also possible to modify adaptively the prediction algorithm.

Various ADPCM methods differ in the way they predict the current sound sample and in the way they adapt to the input (by changing the quantization step size and/or the prediction method).

In addition to the quantized values, an ADPCM encoder can provide the decoder with side information. This information increases the size of the compressed stream, but this degradation is acceptable to the users, since it makes the compressed audio data more useful. Typical applications of side information are (1) helping the decoder recover from errors and (2) signaling an entry point into the compressed stream. An original audio stream may be recorded in compressed form on a medium such as a CD-ROM. If the user (listener) wants to listen to song 5, the decoder can use the side information to quickly find the start of that song.

Figure 7.14a,b shows the general organization of the ADPCM encoder and decoder. Notice that they share two functional units, a feature that helps in both software and hardware implementations. The encoder forms the difference D[n] between the current input sample X[n] and the prediction Xp[n − 1]. The quantizer computes and outputs the quantized code C[n] of X[n]. The same code is sent to the adaptive dequantizer (the same dequantizer used by the decoder), which produces the next dequantized difference value Dq[n]. This value is added to the previous predictor output Xp[n − 1], and the sum Xp[n] is sent to the predictor to be used in the next step.

Better prediction would be obtained by feeding the actual input X[n] to the predictor. However, the decoder wouldn't be able to mimic that, since it does not have X[n].

We see that the basic ADPCM encoder is simple, but the decoder is even simpler. It inputs a code C[n], dequantizes it to a difference Dq[n], which is added to the preceding predictor output Xp[n − 1] to form the next output Xp[n]. The next output is also fed into the predictor, to be used in the next step.
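The way the encoder carries a copy of the decoder inside its loop can be seen in a small sketch. To keep it short, the quantizer here is a hypothetical uniform 4-bit quantizer with a fixed step of 16 and the predictor is just the previous reconstructed sample, so the sketch shows the shared dequantizer and predictor but not the adaptation itself; all names are illustrative.

```python
def quantize(diff, step):
    """Hypothetical uniform quantizer: number of steps, limited to a 4-bit signed range."""
    return max(-8, min(7, round(diff / step)))

def dequantize(code, step):
    return code * step

def adpcm_encode(samples, step=16):
    xp, codes = 0, []                     # xp plays the role of Xp[n-1]
    for x in samples:
        c = quantize(x - xp, step)        # C[n] from D[n] = X[n] - Xp[n-1]
        codes.append(c)
        xp += dequantize(c, step)         # Xp[n] = Xp[n-1] + Dq[n], exactly as the decoder does
    return codes

def adpcm_decode(codes, step=16):
    xp, out = 0, []
    for c in codes:
        xp += dequantize(c, step)         # same update as inside the encoder
        out.append(xp)
    return out

samples = [0, 30, 70, 90, 60]
codes = adpcm_encode(samples)
print(codes, adpcm_decode(codes))
```

Because the encoder updates its prediction from the dequantized value rather than from X[n], its state never drifts away from the decoder's, and the quantization error does not accumulate from sample to sample.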

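[Block diagrams. (a) Encoder: X[n] and Xp[n−1] are combined into D[n], which the quantizer turns into C[n]; a dequantizer and a predictor close the loop. (b) Decoder: C[n] passes through the dequantizer to give Dq[n], which is added to Xp[n−1] to give Xp[n]; Xp[n] feeds the predictor.]

Figure 7.14: (a) ADPCM Encoder and (b) Decoder.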

This is the ADPCM algorithm of the Interactive Multimedia Association (IMA). The IMA is a consortium of computer hardware and software manufacturers, established to develop standards for multimedia applications. The goal of the IMA in developing its audio compression standard was to have a public domain method that is simple and fast enough such that a 20-MHz 386-class personal computer would be able to decode, in real time, sound recorded in stereo at 44,100 16-bit samples per second (this is 88,200 16-bit samples per second).

The encoder quantizes each 16-bit audio sample into a 4-bit code. The compression

factor is thus a constant 4.

The “secret” of the IMA algorithm is the simplicity of its predictor. The predicted

value Xp[n − 1] that is output by the predictor is simply the decoded value Xp[n] of the

preceding input X[n]. The predictor just stores Xp[n] for one cycle (one audio sample

interval), then outputs it as Xp[n − 1]. It does not use any of the preceding values Xp[i]

to obtain better prediction. Thus, the predictor is not adaptive (but the quantizer is).

Also, no side information is generated by the encoder.

Figure 7.15a is a block diagram of the IMA quantizer. It is both simple and adaptive, varying the quantization step size based on both the current step size and the previous quantizer output. The adaptation is done by means of two table lookups, so it is fast. The quantizer outputs 4-bit codes where the leftmost bit is a sign and the remaining three bits are the number of quantization levels computed for the current audio sample. These three bits are used as an index to the first table. The item found in this table is added to the previously stored index, and the sum, after being checked for proper range, is used as the index for the second table lookup. The sum is then stored, and it becomes the stored index used in the next adaptation step. The item found in the second table becomes the new quantization step size. Figure 7.15b illustrates this process, and Tables 7.17 and 7.18 list the two tables. Table 7.16 shows the 4-bit output produced by the quantizer as a function of the sample size. For example, if the sample is in the range [1.5ss, 1.75ss), where ss is the step size, then the output is 0|110.

Table 7.17 adjusts the index by bigger steps when the quantized magnitude is bigger. Table 7.18 is constructed such that the ratio between successive entries is about 1.1.
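The quantizer and the two table lookups fit in a few lines. The sketch below follows the comparisons of Figure 7.15a (step size, half the step size, a quarter of the step size) and the adaptation of Figure 7.15b; the two tables are copied as printed in Tables 7.17 and 7.18, and the function names `ima_quantize` and `adapt` are illustrative.

```python
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]      # Table 7.17, indexed by the 3 magnitude bits
STEP_SIZES = [                                   # Table 7.18 as printed, indices 0..88
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22358, 24633, 27086, 29794, 32767,
]

def ima_quantize(diff, step):
    """4-bit code (sign bit plus 3 magnitude bits) for a difference value."""
    code = 0
    if diff < 0:
        code, diff = 8, -diff                    # bit3 is the sign
    if diff >= step:
        code |= 4
        diff -= step
    if diff >= step / 2:
        code |= 2
        diff -= step / 2
    if diff >= step / 4:
        code |= 1
    return code

def adapt(index, code):
    """New stored index and new step size after the quantizer emits `code`."""
    index = max(0, min(88, index + INDEX_ADJUST[code & 7]))   # keep the index in [0, 88]
    return index, STEP_SIZES[index]

print(f"{ima_quantize(25, 16):04b}", adapt(10, 0b0110))
```

With a step size of 16, a difference of 25 lies in [1.5ss, 1.75ss) and is coded as 0110, matching the example in the text; the code then moves the stored index up by 6, so the next step size is larger.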

ADPCM: Short for Adaptive Differential Pulse Code Modulation, a form of pulse code modulation (PCM) that produces a digital signal with a lower bit rate than standard PCM. ADPCM produces a lower bit rate by recording only the difference between samples and adjusting the coding scale dynamically to accommodate large and small differences.

—From Webopedia.com

7.6 MLP Audio

Note. The MLP audio compression method described in this section is different from and unrelated to the MLP (multilevel progressive) image compression method of Section 4.21. The identical acronyms are an unfortunate coincidence.

Meridian [Meridian 03] is a British company specializing in high-quality audio products, such as CD and DVD players, loudspeakers, radio tuners, and surround stereo amplifiers. Good-quality digitized sound normally employs two channels (stereo sound), each sampled at 44.1 kHz with 16-bit samples (Section 7.2). This is, for example, the sound quality of an audio CD. A typical high-quality digitized sound, on the other hand, may use six channels (i.e., the sound is originally recorded by six microphones, for surround sound), sampled at the high rate of 96 kHz (to ensure that all the nuances of the performance can be delivered), with 24-bit samples (to get the highest possible dynamic range). This kind of audio data is represented by 6×96000×24 = 13.824 Mbps (that's megabits, not megabytes). In contrast, a DVD (digital versatile disc) holds 4.7 Gbytes, which at 13.824 Mbps in uncompressed form translates to only 45 min of playing. (Recall that even CDs, which have a much smaller capacity, hold 74 min of play time. This is an industry standard.) Also, the maximum data transfer rate for DVD-A (audio) is 9.6 Mbps, much lower than 13.824 Mbps. It is obvious that compression is the key to achieving a practical DVD-A format, but the high quality (as opposed to just good quality) requires lossless compression.
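The arithmetic behind these figures is easy to check. The short sketch below assumes that 1 Gbyte means 10^9 bytes, which is the convention under which a 4.7-Gbyte disc yields roughly 45 minutes.

```python
channels, rate_hz, bits_per_sample = 6, 96_000, 24

bitrate = channels * rate_hz * bits_per_sample   # uncompressed bits per second
dvd_bits = 4.7e9 * 8                             # 4.7 Gbytes, assuming 1 Gbyte = 10**9 bytes
minutes = dvd_bits / bitrate / 60                # playing time without compression

print(f"{bitrate / 1e6:.3f} Mbps, about {minutes:.0f} min; exceeds 9.6 Mbps: {bitrate > 9.6e6}")
```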

The algorithm that has been selected as the compression standard for DVD-A (audio) is MLP (Meridian Lossless Packing). This algorithm is patented and some of its details are still kept secret, which is reflected in the information provided in this section. The term "packing" has a dual meaning. It refers to (1) removing redundancy from the original data in order to "pack" it densely, and (2) the audio samples are encoded in packets.

MLP operates by reducing or completely removing redundancies in the digitized sound, without any quantization or other loss of data. Notice that high-quality audio formats such as 96 kHz with 24-bit samples carry more information than is strictly necessary for the human listener (or more than is available from modern microphone and converter techniques). Thus, such audio formats contain much redundancy and can be compressed efficiently. MLP can handle up to 63 audio channels and sampling rates of up to 192 kHz.

[Figure 7.15: (a) flowchart of the IMA quantizer, which sets bit3 from the sign of the sample and then sets bit2, bit1, and bit0 by comparing the remaining magnitude with the step size, half the step size, and a quarter of the step size; (b) step-size adaptation: the low 3 bits of the quantizer output index the first table, the item found is added to the stored index, the sum is limited to [0, 88] and saved, and the second table lookup gives the new step size.]

If sample is in range    4-bit quant      If sample is in range    4-bit quant
[1.75ss, ∞)              0|111            [−∞, −1.75ss)            1|111
[1.5ss, 1.75ss)          0|110            [−1.75ss, −1.5ss)        1|110
[1.25ss, 1.5ss)          0|101            [−1.5ss, −1.25ss)        1|101
[1ss, 1.25ss)            0|100            [−1.25ss, −1ss)          1|100
[.75ss, 1ss)             0|011            [−1ss, −.75ss)           1|011
[.5ss, .75ss)            0|010            [−.75ss, −.5ss)          1|010
[.25ss, .5ss)            0|001            [−.5ss, −.25ss)          1|001
[0, .25ss)               0|000            [−.25ss, 0)              1|000

Table 7.16: Step Size and 4-Bit Quantizer Outputs.

three bits quantized magnitude    index adjustment
000                               −1
001                               −1
010                               −1
011                               −1
100                                2
101                                4
110                                6
111                                8

Table 7.17: First Table for IMA ADPCM.

Index  Step Size    Index  Step Size    Index  Step Size    Index  Step Size
  0        7         22       60         44      494         66     4,026
  1        8         23       66         45      544         67     4,428
  2        9         24       73         46      598         68     4,871
  3       10         25       80         47      658         69     5,358
  4       11         26       88         48      724         70     5,894
  5       12         27       97         49      796         71     6,484
  6       13         28      107         50      876         72     7,132
  7       14         29      118         51      963         73     7,845
  8       16         30      130         52    1,060         74     8,630
  9       17         31      143         53    1,166         75     9,493
 10       19         32      157         54    1,282         76    10,442
 11       21         33      173         55    1,411         77    11,487
 12       23         34      190         56    1,552         78    12,635
 13       25         35      209         57    1,707         79    13,899
 14       28         36      230         58    1,878         80    15,289
 15       31         37      253         59    2,066         81    16,818
 16       34         38      279         60    2,272         82    18,500
 17       37         39      307         61    2,499         83    20,350
 18       41         40      337         62    2,749         84    22,358
 19       45         41      371         63    3,024         85    24,633
 20       50         42      408         64    3,327         86    27,086
 21       55         43      449         65    3,660         87    29,794
                                                             88    32,767

Table 7.18: Second Table for IMA ADPCM.

The main features of MLP are as follows:

1. At least 4 bits/sample of compression for both average and peak data rates.

2. Easy transformation between fixed-rate and variable-rate data streams.

3. Careful and economical handling of mixed input sample rates.

4. Simple, fast decoding.

5. It is cascadable. An audio stream can be encoded and decoded multiple times in succession and the output will always be an exact copy of the original. With MLP, there is no generation loss.

The term "variable data rate" is important. An uncompressed data stream consists of audio samples and each second of sound requires the same number of samples. Such a stream has a fixed data rate. In contrast, the compressed stream generated by a lossless audio encoder has a variable data rate; each second of sound occupies a different number of bits in this stream, depending on the nature of the sound. A second of silence occupies very few bits, whereas a second of random sound will not compress and will require the same number of bits in the compressed stream as in the original file. Most lossless audio compression methods are designed to reduce the average data rate, but MLP has the important feature that it reduces the instantaneous peak data rate by a known amount. This feature makes it possible to record 74 min of any kind of nonrandom sound on a 4.7-Gbyte DVD-A.

In addition to being lossless (which means that the original data is delivered bit-for-bit at the playback), MLP is also robust. It does not include any error-correcting code but has error-protection features. It uses check bits to make sure that each packet decompressed by the decoder is identical to that compressed by the encoder. The compressed stream contains restart points, placed at intervals of 10–30 ms. When the decoder notices an error, it simply skips to the next restart point, with a minimal loss of sound. This is another meaning of the term "high-quality sound." For the first time, a listener hears exactly what the composer/performer intended, bit-for-bit and note-for-note.

With lossy audio compression, the amount of compression is measured by the number of bits per second of sound in the compressed stream, regardless of the audio-sample size. With lossless compression, a large sample size (which really means more least-significant bits) must be losslessly compressed, so it increases the size of the compressed stream, and the extra LSBs typically have little redundancy and are thus harder to compress. This is why lossless audio compression should be measured by the number of bits saved in each audio sample, a relative measure of compression.

MLP reduces the audio samples from their original size (typically 24 bits) depending

on the sampling rate. For average data rates, the reduction is as follows: For sampling

rates of 44.1 kHz and 48 kHz, a sample is reduced by 5 to 11 bits. At 88.2 kHz and

96 kHz, the reduction increases to 9 to 13 bits. At 192 kHz, MLP can sometimes reduce

a sample by 14 bits. Even more important are the savings for peak data rates. They

are 4 bits for 44.1 kHz, 8 bits for 96 kHz, and 9 bits for 192 kHz samples. These peak

data rate savings amount to a virtual guarantee, and they are one of the main reasons

for the adoption of MLP as the DVD-A compression standard.

The remainder of this section covers some of the details of MLP. It is based on

[Stuart et al. 99]. The techniques used by MLP to compress audio samples include:

1. It looks for blocks of consecutive audio samples that are small, i.e., have several
