7.4 µ-Law and A-Law Companding
encoder uses the logarithmic expression
$$\operatorname{sgn}(x)\,\frac{\ln(1+\mu|x|)}{\ln(1+\mu)}, \quad\text{where}\quad \operatorname{sgn}(x)=\begin{cases}+1, & x>0,\\ 0, & x=0,\\ -1, & x<0,\end{cases}$$
(and µ is a positive integer), to compute and output an 8-bit code in the same interval
[−1, +1]. The output is then scaled to the range [−128, +127]. Figure 7.8 shows this
output as a function of the input for the three µ values 25, 255, and 2555. It is clear that
large values of µ cause coarser quantization for larger amplitudes. Such values allocate
more bits to the smaller, more important, amplitudes. The G.711 standard recommends
the use of µ = 255. The diagram shows only the nonnegative values of the input (i.e.,
from 0 to 8191). The negative side of the diagram has the same shape but with negative
inputs and outputs.
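The A-law encoder uses the similar expression
$$\begin{cases}\operatorname{sgn}(x)\,\dfrac{A|x|}{1+\ln A}, & \text{for } 0\le|x|<\dfrac{1}{A},\\[1ex] \operatorname{sgn}(x)\,\dfrac{1+\ln(A|x|)}{1+\ln A}, & \text{for } \dfrac{1}{A}\le|x|<1.\end{cases}$$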
The G.711 standard recommends the use of A = 87.6.
The following simple calculations illustrate the nonlinear nature of the µ-law. The
two (normalized) input samples 0.15 and 0.16 are transformed by µ-law to outputs
0.6618 and 0.6732. The diﬀerence between the outputs is 0.0114. On the other hand,
the two input samples 0.95 and 0.96 (bigger inputs but with the same diﬀerence) are
transformed to 0.9908 and 0.9927. The diﬀerence between these two outputs is 0.0019;
much smaller.
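The arithmetic above can be checked with a few lines of Python. This is a minimal sketch of the continuous µ-law expression only, not of the G.711 codeword format; the function name `mu_law` is an illustrative choice.

```python
import math

def mu_law(x, mu=255):
    """Continuous mu-law compression of a normalized sample x in [-1, 1]."""
    sign = (x > 0) - (x < 0)          # sgn(x): +1, 0, or -1
    return sign * math.log(1 + mu * abs(x)) / math.log(1 + mu)

# Two pairs of inputs with the same spacing (0.01) but different magnitudes.
for lo, hi in [(0.15, 0.16), (0.95, 0.96)]:
    a, b = mu_law(lo), mu_law(hi)
    print(f"{lo:.2f} -> {a:.4f}, {hi:.2f} -> {b:.4f}, difference {b - a:.4f}")
```

The small inputs are spread over a much wider output interval than the large ones, which is the companding effect described in the text.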
Bigger samples are decoded with more noise, and smaller samples are decoded with
less noise. However, the signal-to-noise ratio (SNR, Section 4.2.2) is constant because
both the µ-law and the SNR use logarithmic expressions.
Logarithms are slow to calculate, so the µ-law encoder performs much simpler calculations that produce an approximation. The output speciﬁed by the G.711 standard
is an 8-bit codeword whose format is shown in Figure 7.9.
Bit P in Figure 7.9 is the sign bit of the output (same as the sign bit of the 14-bit
signed input sample). Bits S2, S1, and S0 are the segment code, and bits Q3 through
Q0 are the quantization code. The encoder determines the segment code by (1) adding
a bias of 33 to the absolute value of the input sample, (2) determining the bit position
of the most signiﬁcant 1-bit among bits 5 through 12 of the input, and (3) subtracting
5 from that position. The 4-bit quantization code is set to the four bits following the
bit position determined in step 2. The encoder ignores the remaining bits of the input
sample, and it inverts (1’s complements) the codeword before it is output.
We use the input sample −656 as an example. The sample is negative, so bit P
becomes 1. Adding 33 to the absolute value of the input yields $689 = 0001010110001_2$
(Figure 7.10). The most significant 1-bit in positions 5 through 12 is found at position
9. The segment code is thus 9 − 5 = 4. The quantization code is the four bits 0101 at
7. Audio Compression
[Figure 7.8: The µ-Law for µ Values of 25, 255, and 2555. Three curves, labeled 25, 255, and 2555; horizontal axis (input) from 0 to 8000, vertical axis (output) from 0 to 128.]
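    % Normalized nonnegative inputs in [0, 1]
    dat = linspace(0, 1, 1000);
    mu = 255;   % change to 25 or 2555 for the other two curves
    % Input axis rescaled to 0..8159; output scaled from [0, 1] to [0, 128]
    plot(dat*8159, 128*log(1 + mu*dat)/log(1 + mu));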
Matlab code for Figure 7.8. Notice how the input is normalized to the range
[0, 1] before the calculations, and how the output is scaled from the interval
[0, 1] to [0, 128].
    P  S2 S1 S0 Q3 Q2 Q1 Q0
Figure 7.9: G.711 µ-Law Codeword.
    bit position: 12 11 10  9  8  7  6  5  4  3  2  1  0
    bit value:     0  0  0  1  0  1  0  1  1  0  0  0  1
                              Q3 Q2 Q1 Q0
Figure 7.10: Encoding Input Sample −656.
positions 8–5, and the remaining ﬁve bits 10001 are ignored. The 8-bit codeword (which
is later inverted) becomes
    P  S2 S1 S0 Q3 Q2 Q1 Q0
    1  1  0  0  0  1  0  1
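The encoder steps and the −656 example can be traced with a short Python sketch. The function name `mu_law_encode` is illustrative, and the clamp of the biased magnitude to 13 bits is an assumption of this sketch, since the text does not discuss overflow.

```python
BIAS = 33

def mu_law_encode(sample):
    """Encode a 14-bit signed sample; return (codeword, inverted codeword)."""
    p = 1 if sample < 0 else 0                 # sign bit P
    # Step 1: add the bias to the magnitude.
    # Assumption: clamp to 13 bits so the search in step 2 stays in range.
    mag = min(abs(sample) + BIAS, 0x1FFF)
    # Step 2: position of the most significant 1-bit. Because mag >= 33,
    # this position is always between 5 and 12.
    pos = mag.bit_length() - 1
    segment = pos - 5                          # step 3
    quant = (mag >> (pos - 4)) & 0x0F          # the four bits following pos
    codeword = (p << 7) | (segment << 4) | quant
    return codeword, codeword ^ 0xFF           # 1's complement before output

code, sent = mu_law_encode(-656)
print(f"{code:08b} {sent:08b}")
```

For the input −656 the sketch yields the codeword 11000101 from the example, and 00111010 after inversion.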
The µ-law decoder inputs an 8-bit codeword and inverts it. It then decodes it as
follows:
1. Multiply the quantization code by 2 and add 33 (the bias) to the result.
2. Multiply the result by 2 raised to the power of the segment code.
3. Decrement the result by the bias.
4. Use bit P to determine the sign of the result.
Applying these steps to our example produces
1. The quantization code is $101_2 = 5$, so 5×2 + 33 = 43.
2. The segment code is $100_2 = 4$, so $43\times 2^4 = 688$.
3. Decrement by the bias: 688 − 33 = 655.
4. Bit P is 1, so the final result is −655. Thus, the quantization error (the noise) is 1;
very small.
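The four decoder steps translate almost directly into code. The following is a minimal Python sketch; the function name `mu_law_decode` is illustrative, and the input is assumed to be the inverted codeword as produced by the encoder.

```python
BIAS = 33

def mu_law_decode(received):
    """Decode an inverted 8-bit mu-law codeword into a signed sample."""
    codeword = received ^ 0xFF                 # undo the encoder's inversion
    p = (codeword >> 7) & 1                    # sign bit P
    segment = (codeword >> 4) & 0x07           # bits S2 S1 S0
    quant = codeword & 0x0F                    # bits Q3..Q0
    value = quant * 2 + BIAS                   # step 1
    value <<= segment                          # step 2: multiply by 2**segment
    value -= BIAS                              # step 3
    return -value if p else value              # step 4

print(mu_law_decode(0b00111010))
```

Feeding in 00111010 (the inverted codeword of the example) reproduces the value −655 computed above.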
Figure 7.11a illustrates the nature of the µ-law midtread quantization. Zero is one
of the valid output values, and the quantization steps are centered at the input value of
zero. The steps are organized in eight segments of 16 steps each. The steps within each
segment have the same width, but they double in width from one segment to the next.
If we denote the segment number by i (where i = 0, 1, . . . , 7) and the step number within a segment
by k (where k = 1, 2, . . . , 16), then the middle of the tread of each step in Figure 7.11a
(i.e., the points labeled xj ) is given by
$$x(16i + k) = T(i) + k\times D(i), \tag{7.3}$$
where the constants T (i) and D(i) are the initial value and the step size for segment i,
respectively. They are given by
    i      0    1    2    3    4     5     6     7
    T(i)   1   35  103  239  511  1055  2143  4319
    D(i)   2    4    8   16   32    64   128   256
Table 7.12 lists some values of the breakpoints (points xj ) and outputs (points yj ) shown
in Figure 7.11a.
The operation of the A-law encoder is similar, except that the quantization (Figure 7.11b) is of the midriser variety. The breakpoints xj are given by Equation (7.3),
but the initial value T (i) and the step size D(i) for segment i are diﬀerent from those
used by the µ-law encoder and are given by
    i      0    1    2    3    4     5     6     7
    T(i)   0   32   64  128  256   512  1024  2048
    D(i)   2    2    4    8   16    32    64   128
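As an illustration, Equation (7.3) can be evaluated for the A-law constants with a short Python sketch. The names `A_LAW_T`, `A_LAW_D`, and `breakpoints` are illustrative, and k is taken to run from 1 to 16 as stated in the text.

```python
# A-law constants T(i) and D(i) for segments i = 0..7, as listed above.
A_LAW_T = [0, 32, 64, 128, 256, 512, 1024, 2048]
A_LAW_D = [2, 2, 4, 8, 16, 32, 64, 128]

def breakpoints(T, D):
    """Return {j: x_j} with x(16i + k) = T(i) + k*D(i), k = 1..16."""
    x = {}
    for i in range(8):
        for k in range(1, 17):
            x[16 * i + k] = T[i] + k * D[i]
    return x

x = breakpoints(A_LAW_T, A_LAW_D)
print(x[1], x[16], x[17], x[113], x[128])
```

The last breakpoint of each segment coincides with the initial value T(i + 1) of the next one, so the segments join without gaps.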
[Figure 7.11: (a) µ-Law Midtread Quantization. (b) A-law Midriser Quantization. Each panel plots the y output against the x input as a staircase, with breakpoints x1, x2, x3, x4 marked on the input axis and output levels up to y4 on the output axis; panel (a) includes the output level y0 at zero.]
    segment 0              segment 1               ···   segment 7
    break     output       break      output             break        output
    points    values       points     values             points       values
              y0 = 0                  y16 = 33                        y112 = 4191
    x1 = 1                 x17 = 35                      x113 = 4319
              y1 = 2                  y17 = 37                        y113 = 4447
    x2 = 3                 x18 = 39                      x114 = 4575
              y2 = 4                  y18 = 41                        y114 = 4703
    x3 = 5                 x19 = 43                      x115 = 4831
              y3 = 6                  y19 = 45                        y115 = 4959
    x4 = 7                 x20 = 47                      x116 = 5087
     ···                    ···                           ···
    x15 = 29               x31 = 91                      x127 = 7903
              y15 = 30                y31 = 93                        y127 = 8031
    x16 = 31               x32 = 95                      x128 = 8159
Table 7.12: Specification of the µ-Law Quantizer.
Table 7.13 lists some values of the breakpoints (points xj ) and outputs (points yj ) shown
in Figure 7.11b.
The A-law encoder generates an 8-bit codeword with the same format as the µ-law
encoder. It sets the P bit to the sign of the input sample. It then determines the segment
code by
1. Determining the bit position of the most signiﬁcant 1-bit among the seven most
signiﬁcant bits of the input.
2. If such a 1-bit is found, the segment code becomes that position minus 4. Otherwise,
the segment code becomes zero.
The 4-bit quantization code is set to the four bits following the bit position determined in step 1, or to half the input value if the segment code is zero. The encoder ignores
the remaining bits of the input sample, and it inverts bit P and the even-numbered bits
of the codeword before it is output.

    segment 0              segment 1               ···   segment 7
    break     output       break      output             break        output
    points    values       points     values             points       values
    x0 = 0                 x16 = 32                      x112 = 2048
              y1 = 1                  y17 = 33                        y113 = 2112
    x1 = 2                 x17 = 34                      x113 = 2176
              y2 = 3                  y18 = 35                        y114 = 2240
    x2 = 4                 x18 = 36                      x114 = 2304
              y3 = 5                  y19 = 37                        y115 = 2368
    x3 = 6                 x19 = 38                      x115 = 2432
              y4 = 7                  y20 = 39                        y116 = 2496
     ···                    ···                           ···
    x15 = 30               x31 = 62
              y16 = 31                y32 = 63                        y128 = 4032
                                                         x128 = 4096
Table 7.13: Specification of the A-Law Quantizer.
The A-law decoder decodes an 8-bit codeword into a 13-bit audio sample as follows:
1. It inverts bit P and the even-numbered bits of the codeword.
2. If the segment code is nonzero, the decoder multiplies the quantization code by 2 and
increments this by the bias (33). The result is then multiplied by 2 raised to the
power of (segment code minus 1). If the segment code is zero, the decoder outputs
twice the quantization code, plus 1.
3. Bit P is then used to determine the sign of the output.
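Steps 2 and 3 of the A-law decoder can be sketched in Python as follows. The sketch assumes that step 1 (the bit inversion) has already been undone and that the codeword has been split into its three fields; the function name `a_law_decode_fields` and the convention that P = 1 marks a negative sample are assumptions made for illustration.

```python
BIAS = 33

def a_law_decode_fields(p, segment, quant):
    """Decode the P, segment, and quantization fields of an A-law codeword.

    Assumes the inversion of step 1 has already been applied.
    """
    if segment == 0:
        magnitude = 2 * quant + 1                       # twice the code, plus 1
    else:
        magnitude = (2 * quant + BIAS) << (segment - 1) # times 2**(segment - 1)
    return -magnitude if p else magnitude               # step 3: apply the sign

print(a_law_decode_fields(0, 0, 0),
      a_law_decode_fields(0, 1, 0),
      a_law_decode_fields(0, 7, 0))
```

The three printed values, 1, 33, and 2112, are the first output values of segments 0, 1, and 7.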
Normally, the output codewords are generated by the encoder at the rate of 64 Kbps.
The G.711 standard also provides for two other rates, as follows:
1. To achieve an output rate of 48 Kbps, the encoder masks out the two least-signiﬁcant
bits of each codeword. This works, since 6/8 = 48/64.
2. To achieve an output rate of 56 Kbps, the encoder masks out the least-significant bit
of each codeword. This works, since 7/8 = 56/64 = 0.875.
This applies to both the µ-law and the A-law. The decoder typically ﬁlls up the
masked bit positions with zeros before decoding a codeword.
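The masking itself is a single bitwise AND. The following Python sketch shows one way to express it; the table `RATE_MASKS` and the function `mask_codeword` are illustrative names, and the sketch says nothing about when the masking is applied relative to the bit inversion.

```python
# Bit masks for the three output rates (Kbps): keep 8, 7, or 6 bits.
RATE_MASKS = {64: 0b11111111, 56: 0b11111110, 48: 0b11111100}

def mask_codeword(codeword, rate_kbps):
    """Zero the least-significant bits that are not sent at this rate."""
    return codeword & RATE_MASKS[rate_kbps]

for rate, mask in RATE_MASKS.items():
    kept = bin(mask).count("1")
    print(rate, kept, f"{mask_codeword(0b11000101, rate):08b}")
```

The number of bits kept per codeword (8, 7, or 6) is proportional to the output rate, which is the relation 7/8 = 56/64 and 6/8 = 48/64 noted in the list.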
7.5 ADPCM Audio Compression
As always, compression is possible only because sound, and thus audio samples, tend to
have redundancies. Adjacent audio samples tend to be similar in much the same way that
neighboring pixels in an image tend to have similar colors. The simplest way to exploit
this redundancy is to subtract adjacent samples and code the diﬀerences, which tend
to be small numbers. Any audio compression method based on this principle is called
DPCM (diﬀerential pulse code modulation). Such methods, however, are ineﬃcient,
since they do not adapt themselves to the varying magnitudes of the audio stream.
Better results are achieved by an adaptive version, and any such version is called ADPCM
[ITU-T 90].
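The basic idea of coding differences can be illustrated with a few lines of Python. This is a deliberately simplified sketch: it codes the exact differences (no quantization and no adaptation), starts from an assumed initial prediction of 0, and uses illustrative function names and made-up sample values.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    prev = 0                      # assumed initial prediction
    diffs = []
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    prev = 0
    out = []
    for d in diffs:
        prev += d
        out.append(prev)
    return out

samples = [100, 102, 105, 103]    # made-up, slowly varying samples
diffs = dpcm_encode(samples)
print(diffs, dpcm_decode(diffs))
```

After the first value, the differences are much smaller than the samples themselves, which is what makes them cheaper to code.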
Similar to predictive image compression, ADPCM uses the previous sample (or
several previous samples) to predict the current sample. It then computes the diﬀerence between the current sample and its prediction, and quantizes the diﬀerence. For
each input sample X[n], the output C[n] of the encoder is simply a certain number of
quantization levels. The decoder multiplies this number by the quantization step (and
may add half the quantization step, for better precision) to obtain the reconstructed
audio sample. The method is eﬃcient because the quantization step is modiﬁed all the
time, by both encoder and decoder, in response to the varying magnitudes of the input
samples. It is also possible to modify adaptively the prediction algorithm.
Various ADPCM methods diﬀer in the way they predict the current sound sample
and in the way they adapt to the input (by changing the quantization step size and/or
the prediction method).
In addition to the quantized values, an ADPCM encoder can provide the decoder
with side information. This information increases the size of the compressed stream,
but this degradation is acceptable to the users, since it makes the compressed audio data
more useful. Typical applications of side information are (1) help the decoder recover
from errors and (2) signal an entry point into the compressed stream. An original audio
stream may be recorded in compressed form on a medium such as a CD-ROM. If the
user (listener) wants to listen to song 5, the decoder can use the side information to
quickly ﬁnd the start of that song.
Figure 7.14a,b shows the general organization of the ADPCM encoder and decoder.
Notice that they share two functional units, a feature that helps in both software and
hardware implementations. The adaptive quantizer receives the diﬀerence D[n] between
the current input sample X[n] and the prediction Xp[n − 1]. The quantizer computes
and outputs the quantized code C[n] of X[n]. The same code is sent to the adaptive
dequantizer (the same dequantizer used by the decoder), which produces the next dequantized diﬀerence value Dq[n]. This value is added to the previous predictor output
Xp[n − 1], and the sum Xp[n] is sent to the predictor to be used in the next step.
Better prediction would be obtained by feeding the actual input X[n] to the predictor. However, the decoder wouldn’t be able to mimic that, since it does not have X[n].
We see that the basic ADPCM encoder is simple, but the decoder is even simpler. It
inputs a code C[n], dequantizes it to a diﬀerence Dq[n], which is added to the preceding
predictor output Xp[n − 1] to form the next output Xp[n]. The next output is also fed
into the predictor, to be used in the next step.
The following describes the particular ADPCM algorithm adopted by the Interactive
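[Figure 7.14: (a) ADPCM Encoder and (b) Decoder. In (a), the prediction Xp[n−1] is subtracted from the input X[n] to give D[n], which the adaptive quantizer converts to the code C[n]; an adaptive dequantizer turns C[n] into Dq[n], which is added to Xp[n−1] to give Xp[n], the input of the adaptive predictor. In (b), the adaptive dequantizer turns C[n] into Dq[n], which is added to Xp[n−1] to give the output Xp[n]; Xp[n] is also fed to the adaptive predictor.]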
Multimedia Association (IMA). The IMA is a consortium of computer hardware and
software manufacturers, established to develop standards for multimedia applications.
The goal of the IMA in developing its audio compression standard was to have a public
domain method that is simple and fast enough such that a 20-MHz 386-class personal
computer would be able to decode, in real time, sound recorded in stereo at 44,100 16-bit
samples per second (this is 88,200 16-bit samples per second).
The encoder quantizes each 16-bit audio sample into a 4-bit code. The compression
factor is thus a constant 4.
The “secret” of the IMA algorithm is the simplicity of its predictor. The predicted
value Xp[n − 1] that is output by the predictor is simply the decoded value Xp[n] of the
preceding input X[n]. The predictor just stores Xp[n] for one cycle (one audio sample
interval), then outputs it as Xp[n − 1]. It does not use any of the preceding values Xp[i]
to obtain better prediction. Thus, the predictor is not adaptive (but the quantizer is).
Also, no side information is generated by the encoder.
Figure 7.15a is a block diagram of the IMA quantizer. It is both simple and adaptive,
varying the quantization step size based on both the current step size and the previous
quantizer output. The adaptation is done by means of two table lookups, so it is fast.
The quantizer outputs 4-bit codes where the leftmost bit is a sign and the remaining
three bits are the number of quantization levels computed for the current audio sample.
These three bits are used as an index to the ﬁrst table. The item found in this table
serves as an index adjustment to the second table. The index adjustment is added to a
previously stored index, and the sum, after being checked for proper range, is used as
the index for the second table lookup. The sum is then stored, and it becomes the stored
index used in the next adaptation step. The item found in the second table becomes
the new quantization step size. Figure 7.15b illustrates this process, and Tables 7.17
and 7.18 list the two tables. Table 7.16 shows the 4-bit output produced by the quantizer
as a function of the sample size. For example, if the sample is in the range [1.5ss, 1.75ss),
where ss is the step size, then the output is 0|110.
Table 7.17 adjusts the index by bigger steps when the quantized magnitude is bigger.
Table 7.18 is constructed such that the ratio between successive entries is about 1.1.
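The quantizer and the step-size adaptation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the IMA reference code: the names `ima_quantize` and `adapt` are invented here, the thresholds use exact division by 2 and 4 (integer implementations would typically use shifts), and the step sizes are those of the widely published 89-entry IMA ADPCM table.

```python
# First table: index adjustment for each 3-bit quantized magnitude.
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]

# Second table: step sizes for indexes 0..88.
STEP_SIZES = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37,
    41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173,
    190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
    724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, 5894,
    6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 15289,
    16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]

def ima_quantize(sample, step):
    """Return the 4-bit code (sign bit plus 3 magnitude bits) for a sample."""
    code = 0
    if sample < 0:                 # bit 3 is the sign
        code |= 0b1000
        sample = -sample
    if sample >= step:             # bit 2
        code |= 0b0100
        sample -= step
    if sample >= step / 2:         # bit 1
        code |= 0b0010
        sample -= step / 2
    if sample >= step / 4:         # bit 0
        code |= 0b0001
    return code

def adapt(index, code):
    """Update the stored index from a 4-bit code; return (index, step size)."""
    index += INDEX_ADJUST[code & 0b0111]   # first table lookup
    index = max(0, min(88, index))         # limit the index to [0, 88]
    return index, STEP_SIZES[index]        # second table lookup

step = 16
code = ima_quantize(1.6 * step, step)      # a sample in [1.5ss, 1.75ss)
print(f"{code:04b}", adapt(8, code))
```

A sample of 1.6 step sizes produces the code 0|110; because the quantized magnitude is large, the index is then raised by 6, so the next step size is larger.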
ADPCM: Short for Adaptive Diﬀerential Pulse Code Modulation, a form of pulse
code modulation (PCM) that produces a digital signal with a lower bit rate than
standard PCM. ADPCM produces a lower bit rate by recording only the diﬀerence
between samples and adjusting the coding scale dynamically to accommodate large
and small diﬀerences.
—From Webopedia.com
7.6 MLP Audio
Note. The MLP audio compression method described in this section is diﬀerent from
and unrelated to the MLP (multilevel progressive) image compression method of Section 4.21. The identical acronyms are an unfortunate coincidence.
Meridian [Meridian 03] is a British company specializing in high-quality audio products, such as CD and DVD players, loudspeakers, radio tuners, and surround stereo
ampliﬁers. Good-quality digitized sound normally employs two channels (stereo sound),
each sampled at 44.1 kHz with 16-bit samples (Section 7.2). This is, for example, the
sound quality of an audio CD. A typical high-quality digitized sound, on the other hand,
may use six channels (i.e., the sound is originally recorded by six microphones, for surround sound), sampled at the high rate of 96 kHz (to ensure that all the nuances of the
performance can be delivered), with 24-bit samples (to get the highest possible dynamic
range). This kind of audio data is represented by 6×96000×24 = 13.824 Mbps (that’s
megabits, not megabytes). In contrast, a DVD (digital versatile disc) holds 4.7 Gbytes,
which at 13.824 Mbps in uncompressed form translates to only 45 min of playing. (Recall that even CDs, which have a much smaller capacity, hold 74 min of play time. This
is an industry standard.) Also, the maximum data transfer rate for DVD-A (audio)
is 9.6 Mbps, much lower than 13.824 Mbps. It is obvious that compression is the key
to achieving a practical DVD-A format, but the high quality (as opposed to just good
quality) requires lossless compression.
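The figures quoted in this paragraph follow from simple arithmetic, reproduced here as a small Python sketch. It assumes that 1 Gbyte means 10^9 bytes; the variable names are illustrative.

```python
channels, sampling_rate_hz, bits_per_sample = 6, 96_000, 24

# Uncompressed bit rate of the high-quality format, in bits per second.
bit_rate = channels * sampling_rate_hz * bits_per_sample

# Playing time of a 4.7-Gbyte disc at that rate (1 Gbyte = 10**9 bytes assumed).
dvd_bits = 4.7e9 * 8
minutes = dvd_bits / bit_rate / 60

# Factor by which the stream exceeds the 9.6 Mbps DVD-A transfer limit.
excess = bit_rate / 9.6e6

print(bit_rate / 1e6, round(minutes, 1), round(excess, 2))
```

The uncompressed stream exceeds the maximum transfer rate by a factor of about 1.44, so compression is needed for playback as well as for capacity.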
The algorithm that has been selected as the compression standard for DVD-A (audio) is MLP (Meridian Lossless Packing). This algorithm is patented and some of its
details are still kept secret, which is reﬂected in the information provided in this section.
The term “packing” has a dual meaning. It refers to (1) removing redundancy from the
original data in order to “pack” it densely, and (2) encoding the audio samples in
packets.
MLP operates by reducing or completely removing redundancies in the digitized
sound, without any quantization or other loss of data. Notice that high-quality audio
formats such as 96 kHz with 24-bit samples carry more information than is strictly
necessary for the human listener (or more than is available from modern microphone
and converter techniques). Thus, such audio formats contain much redundancy and can
be compressed eﬃciently. MLP can handle up to 63 audio channels and sampling rates
of up to 192 kHz.
[Figure 7.15: (a) IMA ADPCM Quantization. (b) Step Size Adaptation.]

(a) Flowchart of the quantizer:
    start
    sample < 0 ?             yes: bit3←1, sample←−sample               no: bit3←0
    sample ≥ step size ?     yes: bit2←1, sample←sample−step size      no: bit2←0
    sample ≥ step size/2 ?   yes: bit1←1, sample←sample−step size/2    no: bit1←0
    sample ≥ step size/4 ?   yes: bit0←1                               no: bit0←0
    done

(b) Flow of the step size adaptation: ls 3 bits of quantizer output → first table lookup → index adjust → + (added to the saved index) → limit index to [0,88] → second table lookup → new step size. The limited index is saved for the next adaptation.

    If sample            4-Bit       If sample             4-Bit
    is in range          quant       is in range           quant
    [1.75ss, ∞)          0|111       [−∞, −1.75ss)         1|111
    [1.5ss, 1.75ss)      0|110       [−1.75ss, −1.5ss)     1|110
    [1.25ss, 1.5ss)      0|101       [−1.5ss, −1.25ss)     1|101
    [1ss, 1.25ss)        0|100       [−1.25ss, −1ss)       1|100
    [.75ss, 1ss)         0|011       [−1ss, −.75ss)        1|011
    [.5ss, .75ss)        0|010       [−.75ss, −.5ss)       1|010
    [.25ss, .5ss)        0|001       [−.5ss, −.25ss)       1|001
    [0, .25ss)           0|000       [−.25ss, 0)           1|000
Table 7.16: Step Size and 4-Bit Quantizer Outputs.
    three bits
    quantized      index
    magnitude      adjust
    000            −1
    001            −1
    010            −1
    011            −1
    100             2
    101             4
    110             6
    111             8
Table 7.17: First Table for IMA ADPCM.
    Index  Step Size    Index  Step Size    Index  Step Size    Index  Step Size
      0        7         22       60         44      494         66     4,026
      1        8         23       66         45      544         67     4,428
      2        9         24       73         46      598         68     4,871
      3       10         25       80         47      658         69     5,358
      4       11         26       88         48      724         70     5,894
      5       12         27       97         49      796         71     6,484
      6       13         28      107         50      876         72     7,132
      7       14         29      118         51      963         73     7,845
      8       16         30      130         52    1,060         74     8,630
      9       17         31      143         53    1,166         75     9,493
     10       19         32      157         54    1,282         76    10,442
     11       21         33      173         55    1,411         77    11,487
     12       23         34      190         56    1,552         78    12,635
     13       25         35      209         57    1,707         79    13,899
     14       28         36      230         58    1,878         80    15,289
     15       31         37      253         59    2,066         81    16,818
     16       34         38      279         60    2,272         82    18,500
     17       37         39      307         61    2,499         83    20,350
     18       41         40      337         62    2,749         84    22,385
     19       45         41      371         63    3,024         85    24,623
     20       50         42      408         64    3,327         86    27,086
     21       55         43      449         65    3,660         87    29,794
                                                                 88    32,767
Table 7.18: Second Table for IMA ADPCM.
The main features of MLP are as follows:
1. At least 4 bits/sample of compression for both average and peak data rates.
2. Easy transformation between ﬁxed-rate and variable-rate data streams.
3. Careful and economical handling of mixed input sample rates.
4. Simple, fast decoding.
5. It is cascadable. An audio stream can be encoded and decoded multiple times
in succession and the output will always be an exact copy of the original. With MLP,
there is no generation loss.
The term “variable data rate” is important. An uncompressed data stream consists
of audio samples and each second of sound requires the same number of samples. Such a
stream has a ﬁxed data rate. In contrast, the compressed stream generated by a lossless
audio encoder has a variable data rate; each second of sound occupies a diﬀerent number
of bits in this stream, depending on the nature of the sound. A second of silence occupies
very few bits, whereas a second of random sound will not compress and will require the
same number of bits in the compressed stream as in the original ﬁle. Most lossless audio
compression methods are designed to reduce the average data rate, but MLP has the
important feature that it reduces the instantaneous peak data rate by a known amount.
This feature makes it possible to record 74 min of any kind of nonrandom sound on a
4.7-Gbyte DVD-A.
In addition to being lossless (which means that the original data is delivered bit-for-bit at the playback), MLP is also robust. It does not include any error-correcting code
but has error-protection features. It uses check bits to make sure that each packet decompressed by the decoder is identical to that compressed by the encoder. The compressed
stream contains restart points, placed at intervals of 10–30 ms. When the decoder notices an error, it simply skips to the next restart point, with a minimal loss of sound.
This is another meaning of the term “high-quality sound.” For the ﬁrst time, a listener
hears exactly what the composer/performer intended—bit-for-bit and note-for-note.
With lossy audio compression, the amount of compression is measured by the number of bits per second of sound in the compressed stream, regardless of the audio-sample
size. With lossless compression, a larger sample size (which really means more least-significant bits) increases the size of the compressed
stream, because the extra LSBs must also be preserved, yet they typically have little redundancy and are thus harder to compress. This is why lossless audio compression should be measured by the number of bits
saved in each audio sample, a relative measure of compression.
MLP reduces the audio samples from their original size (typically 24 bits) depending
on the sampling rate. For average data rates, the reduction is as follows: For sampling
rates of 44.1 kHz and 48 kHz, a sample is reduced by 5 to 11 bits. At 88.2 kHz and
96 kHz, the reduction increases to 9 to 13 bits. At 192 kHz, MLP can sometimes reduce
a sample by 14 bits. Even more important are the savings for peak data rates. They
are 4 bits for 44.1 kHz, 8 bits for 96 kHz, and 9 bits for 192 kHz samples. These peak
data rate savings amount to a virtual guarantee, and they are one of the main reasons
for the adoption of MLP as the DVD-A compression standard.
The remainder of this section covers some of the details of MLP. It is based on
[Stuart et al. 99]. The techniques used by MLP to compress audio samples include:
1. It looks for blocks of consecutive audio samples that are small, i.e., have several