<?xml version='1.0'?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>
<?rfc toc="yes" symrefs="yes" ?>

<rfc ipr="trust200902" category="info" docName="draft-valin-celt-codec-00">

<front>
<title abbrev="CELT codec">Constrained-Energy Lapped Transform (CELT) Codec</title>


<author initials="J-M" surname="Valin" fullname="Jean-Marc Valin">
<organization>Octasic Semiconductor</organization>
<address>
<postal>
<street>4101, Molson Street, suite 300</street>
<city>Montreal</city>
<region>Quebec</region>
<code>H1Y 3L1</code>
<country>Canada</country>
</postal>
<email>jean-marc.valin@octasic.com</email>
</address>
</author>

<author initials="T" surname="Terriberry" fullname="Timothy B. Terriberry">
<organization>Xiph.Org Foundation</organization>
<address>
<postal>
<street></street>
<city></city>
<region></region>
<code></code>
<country></country>
</postal>
<email>tterribe@xiph.org</email>
</address>
</author>

<author initials="G" surname="Maxwell" fullname="Gregory Maxwell">
<organization>Juniper Networks</organization>
<address>
<postal>
<street>2251 Corporate Park Drive, Suite 100</street>
<city>Herndon</city>
<region>VA</region>
<code>20171-1817</code>
<country>USA</country>
</postal>
<email>gmaxwell@juniper.net</email>
</address>
</author>

<!-- <author initials="et" surname="al." fullname="et al.">
<organization></organization>
</author>
-->

<date day="8" month="June" year="2009" />

<area>General</area>

<workgroup>AVT Working Group</workgroup>
<keyword>audio codec</keyword>
<keyword>low delay</keyword>
<keyword>Internet-Draft</keyword>
<keyword>CELT</keyword>

<abstract>
<t>
CELT <xref target="celt-website"/> is an open-source voice codec suitable for use in very low delay 
Voice over IP (VoIP) type applications.  This document describes the encoding
and decoding process. 
</t>
</abstract>
</front>

<middle>

<section anchor="Introduction" title="Introduction">
<t>
This document describes the CELT codec, which is designed for transmitting full-bandwidth
audio with very low delay. It is suitable for encoding both
speech and music at rates starting at 32 kbit/s. It is primarily designed for transmission
over packet networks and protocols such as RTP <xref target="rfc3550"/>, but also includes
a certain amount of robustness to bit errors, where this could be done at no significant
cost. The codec features are:
</t>

<t>
<list style="symbols">
<t>Ultra-low algorithmic delay (typically 3 to 9 ms)</t>
<t>Full audio bandwidth (44.1 kHz and 48 kHz)</t>
<t>Support for both voice and music</t>
<t>Stereo support</t>
<t>Packet loss concealment</t>
<t>Constant bit-rates from 32 kbps to 128 kbps and above</t>
<t>Free software/open-source/royalty-free</t>
</list>
</t>

<t>The novel aspect of CELT compared to most other codecs is its very low delay,
below 10 ms. There are two main advantages to having a very low delay audio link.
The lower delay itself is important for some interactions, such as playing music
remotely. Another advantage is the behaviour in the presence of acoustic echo. When
the round-trip audio delay is sufficiently low, acoustic echo is no longer
perceived as a distinct repetition, but as extra reverberation. Applications
of CELT include:</t>
<t>
<list style="symbols">
<t>Live network music performance</t>
<t>High-quality teleconferencing</t>
<t>Wireless audio equipment</t>
<t>Low-delay links for broadcast applications</t>
</list>
</t>

<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 <xref target="rfc2119"/>.
</t>
</section>

<section anchor="Overview of the CELT Codec" title="Overview of the CELT Codec">

<t>
CELT stands for <spanx style="emph">Constrained Energy Lapped Transform</spanx>. This is
the fundamental principle of the codec: the quantization process is designed in such a way
as to preserve the energy in a certain number of bands. The theoretical aspects of the
codec are described in greater detail in <xref target="celt-tasl"/> and 
<xref target="celt-eusipco"/>. Although these papers describe a slightly older version of
the codec (version 0.3.2 and 0.5.1, respectively), the principles remain the same.
</t>

<t>CELT is a transform codec, based on the Modified Discrete Cosine Transform 
<xref target="mdct"/>, which is based on a DCT-IV, with overlap and time-domain
aliasing cancellation.</t>


</section>

<section anchor="CELT Modes" title="CELT Modes">
<t>
The operation of both the encoder and decoder depends on the 
mode data. This data includes:
<list style="symbols">
<t>Frame size</t>
<t>Sampling rate</t>
<t>Windowing overlap</t>
<t>Number of channels</t>
<t>Definition of the bands</t>
<t>Definition of the <spanx style="emph">pitch bands</spanx></t>
<t>Decay coefficients of the Laplace distributions for coarse energy</t>
<t>Fine energy allocation data</t>
<t>Pulse allocation data</t>
</list>
</t>
</section>

<section anchor="CELT Encoder" title="CELT Encoder">

<t>Insert encoder overview</t>

<t>The top-level function for encoding a CELT frame in the reference implementation is
celt_encode() (<xref target="celt.c">celt.c</xref>).
</t>

<figure>
<artwork>
+-----------------+---------------------+------------------------------+
|  Feature flags  | (pitch period if P) | (transient scalefactor if S) |
+-----------------+---------------------+------------------------------+
|  (transient time if scalefactor == 3) |  coarse energy               |
+----------------+----------------------+-------+----------------------+
|  fine energy   |  PVQ indices  for all bands  |  (more fine energy)  |
+----------------+------------------------------+----------------------+
</artwork>
<postamble>Fields within parentheses are not included in every packet</postamble>
</figure>

<section anchor="pre-emphasis" title="Pre-emphasis">

<t>The input audio first goes through a pre-emphasis filter, which attenuates the
<spanx style="emph">spectral tilt</spanx>. The filter has the transfer function A(z)=1-alpha_p*z^-1, with
alpha_p=0.8. Although it is not a requirement, no part of the reference encoder operates
on the non-pre-emphasised signal. The inverse of the pre-emphasis is applied at the decoder.</t>
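<t>
The pre-emphasis filter and its inverse can be sketched in C as follows. This is a minimal
floating-point illustration, not the reference code; the function names preemphasis() and
deemphasis() and the filter-memory arguments are hypothetical. The memory arguments carry the
filter state across frame boundaries, so that processing frame by frame gives the same result
as filtering the whole signal at once.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Pre-emphasis coefficient alpha_p from the text. */
#define ALPHA_P 0.8f

/* Applies A(z) = 1 - alpha_p*z^-1 in place. *mem holds the last input
 * sample of the previous call (hypothetical API for illustration). */
void preemphasis(float *x, size_t n, float *mem)
{
    size_t i;
    assert(x != NULL && mem != NULL);
    for (i = 0; i < n; i++) {
        float in = x[i];
        x[i] = in - ALPHA_P * *mem;
        *mem = in;
    }
}

/* Applies the inverse filter 1/A(z) = 1/(1 - alpha_p*z^-1) in place.
 * *mem holds the last output sample of the previous call. */
void deemphasis(float *y, size_t n, float *mem)
{
    size_t i;
    assert(y != NULL && mem != NULL);
    for (i = 0; i < n; i++) {
        y[i] = y[i] + ALPHA_P * *mem;
        *mem = y[i];
    }
}
```
]]></artwork>
</figure>
<t>
Running deemphasis() on the output of preemphasis(), each starting from a zero memory,
returns the original samples up to floating-point rounding.
</t>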

</section> <!-- pre-emphasis -->

<section anchor="range-coder" title="Range Coder">
<t>
derf?
</t>
</section>

<section anchor="Encoder Feature Selection" title="Encoder Feature Selection">

<t>
The CELT codec has several optional features that can be switched on or off, some of which are mutually exclusive. The four main flags are intra-frame energy (I), pitch (P), short blocks (S), and folding (F). These are described in more detail below. There are eight valid combinations of these four features, and they are encoded first into the stream using a variable length code (<xref target="flags-encoding"></xref>). It is left to the implementor to decide which flags to enable, with the only restriction that the combination of the four flags must correspond to a valid entry in <xref target="flags-encoding"></xref>.
</t>

<texttable anchor="flags-encoding">
        <preamble>Encoding of the feature flags</preamble>
        <ttcol align='center'>I</ttcol>
        <ttcol align='center'>P</ttcol>
        <ttcol align='center'>S</ttcol>
        <ttcol align='center'>F</ttcol>
        <ttcol align='center'>Encoding</ttcol>
        <c>0</c><c>0</c><c>0</c><c>1</c><c>00</c>
        <c>0</c><c>1</c><c>0</c><c>1</c><c>01</c>
        <c>1</c><c>0</c><c>0</c><c>1</c><c>110</c>
        <c>1</c><c>0</c><c>1</c><c>1</c><c>111</c>
		
        <c>0</c><c>0</c><c>0</c><c>0</c><c>1000</c>
        <c>0</c><c>0</c><c>1</c><c>1</c><c>1001</c>
        <c>0</c><c>1</c><c>0</c><c>0</c><c>1010</c>
        <c>1</c><c>0</c><c>0</c><c>0</c><c>1011</c>
</texttable>
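<t>
As an illustration of how the prefix code in <xref target="flags-encoding"/> can be mapped and
parsed, the following C sketch stores the table and decodes a code one bit at a time. It is
hypothetical: the struct and function names are not taken from the reference implementation, and
the bitstream is modelled as an array of individual bits instead of being written through the
range coder. Because no code in the table is a prefix of another, a decoder can stop as soon as
the bits read so far match an entry.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* One row of the feature-flag table: flag values and the prefix code. */
struct flag_code {
    int intra, pitch, short_blocks, fold;
    unsigned code;   /* code bits, most significant bit sent first */
    int nbits;       /* code length in bits */
};

static const struct flag_code flag_table[8] = {
    {0, 0, 0, 1, 0x0, 2},  /* 00   */
    {0, 1, 0, 1, 0x1, 2},  /* 01   */
    {1, 0, 0, 1, 0x6, 3},  /* 110  */
    {1, 0, 1, 1, 0x7, 3},  /* 111  */
    {0, 0, 0, 0, 0x8, 4},  /* 1000 */
    {0, 0, 1, 1, 0x9, 4},  /* 1001 */
    {0, 1, 0, 0, 0xA, 4},  /* 1010 */
    {1, 0, 0, 0, 0xB, 4}   /* 1011 */
};

/* Encoder side: returns the table row for a flag combination,
 * or -1 if the combination is not one of the eight valid entries. */
int flags_to_row(int intra, int pitch, int short_blocks, int fold)
{
    int i;
    for (i = 0; i < 8; i++) {
        const struct flag_code *f = &flag_table[i];
        if (f->intra == intra && f->pitch == pitch &&
            f->short_blocks == short_blocks && f->fold == fold)
            return i;
    }
    return -1;
}

/* Decoder side: reads bits[*pos], bits[*pos+1], ... (one bit per element)
 * until they form a complete code. Returns the row, or -1 on a bad stream. */
int decode_flags(const unsigned char *bits, int nbits_avail, int *pos)
{
    unsigned value = 0;
    int len, i;
    assert(bits != 0 && pos != 0);
    for (len = 1; len <= 4 && *pos < nbits_avail; len++) {
        value = (value << 1) | (bits[(*pos)++] & 1u);
        for (i = 0; i < 8; i++)
            if (flag_table[i].nbits == len && flag_table[i].code == value)
                return i;
    }
    return -1;
}
```
]]></artwork>
</figure>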

<section anchor="intra" title="Intra-frame energy (I)">
<t>
CELT uses prediction to encode the energy in each frequency band. In order to make frames independent, however, it is possible to disable the part of the prediction that depends on previous frames. This is called <spanx style="emph">intra-frame energy</spanx>; it is enabled with the <spanx style="emph">I</spanx> bit (see <xref target="flags-encoding"/>) and costs around 12 additional bits per frame. The use of intra energy is OPTIONAL and the decision method is left to the implementor. The reference code describes one way of deciding which frames would benefit most from having their energy encoded without prediction. The intra_decision() (<xref target="quant_bands.c">quant_bands.c</xref>) function looks for frames where the log-spectral distance between consecutive frames is more than 9 dB. When such a difference is found between two frames, the next frame (not the one for which the difference is detected) is encoded with intra energy. The reason for the one-frame delay is to ensure that if the frame where a transient happens is lost, then the next frame will be decoded with no error.
</t>
</section>

<section anchor="pitch" title="Pitch prediction (P)">
<t>
CELT can use a pitch predictor (also known as a long-term predictor) to improve the voice quality at lower bit-rates. While the pitch period can be estimated by any method, it is RECOMMENDED for performance reasons to estimate it using a frequency-domain correlation between the current frame and the history buffer, as implemented in find_spectral_pitch() (<xref target="pitch.c">pitch.c</xref>). When the <spanx style="emph">P</spanx> bit is set, the pitch period is encoded after the flag bits. The value encoded is an integer in the range [0, 1024-N-overlap-1].
</t>
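<t>
For illustration only, the following sketch searches the history buffer with a plain time-domain
normalised cross-correlation. This is a deliberately simpler technique than the frequency-domain
correlation RECOMMENDED above and is typically more expensive for long buffers. The segment length
(assumed here to be N+overlap, so that a 1024-sample history yields offsets in
[0, 1024-N-overlap-1]) and the function name find_pitch_offset_td() are assumptions.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <math.h>

/* Returns the offset i in [0, hist_len-len-1] whose history segment
 * hist[i .. i+len-1] best matches x[0 .. len-1]. With hist_len = 1024 and
 * len = N+overlap (an assumption) this is the range stated in the text. */
int find_pitch_offset_td(const float *hist, int hist_len, const float *x, int len)
{
    int i, j, best = 0;
    float best_score = -1.0f;
    assert(len > 0 && hist_len > len);
    for (i = 0; i < hist_len - len; i++) {
        float xy = 0.0f, yy = 0.0f, score;
        for (j = 0; j < len; j++) {
            xy += x[j] * hist[i + j];
            yy += hist[i + j] * hist[i + j];
        }
        /* Signed squared normalised correlation; ||x||^2 is the same for
         * every offset, so it is left out of the comparison. */
        score = (yy > 0.0f) ? xy * fabsf(xy) / yy : 0.0f;
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```
]]></artwork>
</figure>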
</section>

<section anchor="short-blocks" title="Short blocks (S)">
<t>
To improve audio quality during transients, CELT can use a <spanx style="emph">short blocks</spanx> multiple-MDCT transform. Unlike in other transform codecs, the multiple MDCTs are jointly quantised as if the coefficients were obtained from a single MDCT. For that reason, it is better to consider the short blocks case as using a different transform of the same length rather than as multiple independent MDCTs. In the reference implementation, the decision to use short blocks is made by transient_analysis() (<xref target="celt.c">celt.c</xref>) based on the pre-emphasized signal's peak values, but other methods can be used. When the <spanx style="emph">S</spanx> bit is set, a 2-bit transient scalefactor is encoded directly after the flag bits. If the scalefactor is 0, then the multiple-MDCT output is unmodified. If the scalefactor is 1 or 2, then the output of the MDCTs that follow the transient is scaled down by 2^scalefactor. If the scalefactor is equal to 3, then a time-domain window is applied <spanx style="strong">before</spanx> computing the MDCTs and no further scaling is applied to the MDCT output. The window value is 1 from the beginning of the frame to 16 samples before the transient time, follows a Hann window shape over those 16 samples up to the transient time, and is 1/8 from the transient time to the end of the frame. The Hann window part is defined as:
</t>

<figure>
<artwork><![CDATA[
static const float transientWindow[16] = {
   0.0085135, 0.0337639, 0.0748914, 0.1304955,
   0.1986827, 0.2771308, 0.3631685, 0.4538658,
   0.5461342, 0.6368315, 0.7228692, 0.8013173,
   0.8695045, 0.9251086, 0.9662361, 0.9914865};
]]></artwork>
</figure>

<t>When the scalefactor is 3, the transient time is encoded as an integer in the range [0, N+overlap-1] directly after the scalefactor.</t>


<t>
In the case where the scalefactor is 1 or 2 and the mode is defined to use more than 2 MDCTs, then the last MDCT to which the scaling is <spanx style="strong">not</spanx> applied is encoded using an integer in the range [0, B-2], where B is the number of short MDCTs used for the mode. 
</t>
</section>

<section anchor="folding" title="Spectral folding (F)">
<t>
The last encoding feature in CELT is spectral folding. It is designed to prevent <spanx style="emph">birdie</spanx> artefacts caused by the sparse spectra often generated by low-bitrate transform codecs. When folding is enabled, a copy of the low-frequency spectrum is added to the higher-frequency bands (above ~6400 Hz). The folding operation is described in more detail in <xref target="pvq"></xref>.
</t>
</section>

</section>

<section anchor="forward-mdct" title="Forward MDCT">

<t>The MDCT implementation has no special characteristic. The
input is a windowed signal (after pre-emphasis) of 2*N samples and the output is N
frequency-domain samples. A <spanx style="emph">low-overlap</spanx> window is used to reduce the algorithmic delay. 
It is derived from a basic (full-overlap) window that is the same as the one used in the Vorbis codec: W(n)=sin(pi/2*[sin(pi/2*(n+.5)/L)]^2), where L is the overlap length (this defines the rising half of the window; the falling half is its mirror image). The low-overlap window is created by zero-padding the basic window and inserting ones in the middle, such that the resulting window still satisfies power complementarity. The MDCT is computed in mdct_forward() 
(<xref target="mdct.c">mdct.c</xref>), which includes the windowing operation.
</t>
</section>

<section anchor="normalization" title="Bands and Normalization">
<t>
The MDCT output is divided into bands that are designed to match the ear's critical bands,
with the exception that they have to be at least 3 bins wide. For each band, the encoder
computes the energy, which will later be encoded. Each band is then normalized by the 
square root of the <spanx style="strong">unquantized</spanx> energy, such that each band now forms a unit vector X.
The energy and the normalization are computed by compute_band_energies()
and normalise_bands() (<xref target="bands.c">bands.c</xref>), respectively.
</t>
</section>

<section anchor="energy-quantization" title="Energy Envelope Quantization">

<t>
It is important to quantize the energy with sufficient resolution because
any quantization error in the energy cannot be compensated for at a later
stage. Regardless of the resolution used for encoding the shape of a band,
it is perceptually important to preserve the energy in each band. We use a
coarse-fine strategy for encoding the energy in the base-2 log domain, 
as implemented in <xref target="quant_bands.c">quant_bands.c</xref></t>

<section anchor="coarse-energy" title="Coarse energy quantization">
<t>
The coarse quantization of the energy uses a fixed resolution of
6 dB and is the only place where entropy coding is used.
To minimise the bitrate, prediction is applied both in time (using the previous frame)
and in frequency (using the previous bands). The 2-D z-transform of
the prediction filter is: A(z_l, z_b)=(1-alpha*z_l^-1)*(1-z_b^-1)/(1-beta*z_b^-1),
where l is the frame index and b is the band index, so that z_l^-1 denotes a delay of one
frame and z_b^-1 a delay of one band. The prediction coefficients are
alpha=0.8 and beta=0.7 when not using intra energy and alpha=beta=0 when using intra energy. 
The prediction is applied on the quantized log-energy. We approximate the ideal 
probability distribution of the prediction error using a Laplace distribution. The
coarse energy quantisation is performed by quant_coarse_energy() and 
quant_coarse_energy_mono() (<xref target="quant_bands.c">quant_bands.c</xref>).
</t>

<t>
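The Laplace distribution for each band is defined by a 16-bit decay parameter in Q14 (decay/16384 is the ratio between the probabilities of successive magnitudes), with probabilities expressed out of a total of 32768.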
Thus, the value 0 has a probability of p[0]=2*(16384*(16384-decay)/(16384+decay)). The 
values +/- i each have a probability p[i] = (p[i-1]*decay)>>14. The value of p[i] is always
rounded down (to avoid exceeding 32768 as the sum of all probabilities), so it is possible
for the sum to be less than 32768. In that case additional values with a probability of 1 are encoded. The signed values corresponding to symbols 0, 1, 2, 3, 4, ... 
are [0, +1, -1, +2, -2, ...]. The encoding of the Laplace-distributed values is 
implemented in ec_laplace_encode() (<xref target="laplace.c">laplace.c</xref>).
</t>
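<t>
The interval computation described above can be sketched as follows, as an illustration rather
than a copy of ec_laplace_encode(). The function name is hypothetical, the range coder call that
would consume the interval [fl, fh) out of a total of 32768 is not shown, and clamping values
that lie beyond the last representable symbol is an assumption of this sketch.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

#define LAPLACE_FT 32768   /* total frequency */

/* Computes [*fl, *fh) for the signed value *value, with decay in Q14.
 * Symbol order is 0, +1, -1, +2, -2, ... If the value is beyond the
 * representable tail, it is clamped and *value is updated (assumption). */
void laplace_interval(int *value, int decay, int *fl, int *fh)
{
    int val = *value, s = 0, i, lo;
    int fs = 2 * (16384 * (16384 - decay) / (16384 + decay));  /* p[0] */
    assert(decay > 0 && decay < 16384);
    if (val < 0) {
        s = 1;
        val = -val;
    }
    lo = -fs;
    for (i = 0; i < val; i++) {
        int prev_lo = lo, prev_fs = fs;
        lo += 2 * fs;                  /* skip the +i and -i slots */
        fs = (fs * decay) >> 14;       /* p[i+1] = (p[i]*decay)>>14 */
        if (fs == 0) {
            if (lo + 2 <= LAPLACE_FT) {
                fs = 1;                /* tail values get probability 1 */
            } else {
                lo = prev_lo;          /* no room left: clamp to +/-i */
                fs = prev_fs;
                *value = s ? -i : i;
                break;
            }
        }
    }
    if (lo < 0)
        lo = 0;                        /* value 0 starts at frequency 0 */
    if (s)
        lo += fs;                      /* -i follows +i */
    *fl = lo;
    *fh = lo + fs;
}
```
]]></artwork>
</figure>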
<!-- FIXME: bit budget consideration -->
</section> <!-- coarse energy -->

<section anchor="fine-energy" title="Fine energy quantization">
<t>
After the coarse 
</t>
</section> <!-- fine energy -->


</section> <!-- Energy quant -->

<section anchor="allocation" title="Bit Allocation">
<t>Bit allocation is performed based only on information available to both
the encoder and decoder. The same calculations are performed in a bit-exact
manner in both the encoder and decoder to ensure that the result is always
exactly the same. Any mismatch would cause an error in the decoded output.
The allocation is computed by compute_allocation() (<xref target="rate.c">rate.c</xref>),
which is used in both the encoder and the decoder.</t>

<t>For a given band, the bit allocation is nearly constant across
frames that use the same number of bits for Q1, yielding a
predefined signal-to-mask ratio (SMR) for each band. Because the
bands have a width of approximately one Bark, this is equivalent to modelling the
masking occurring within each critical band, while ignoring inter-band
masking and tone-vs-noise characteristics. While this is not an
optimal bit allocation, it provides good results without requiring the
transmission of any allocation information.
</t>

</section>

<section anchor="pitch-prediction" title="Pitch Prediction">
<t>
The pitch period is computed by find_spectral_pitch()
(<xref target="pitch.c">pitch.c</xref>) and the pitch gain is computed by
compute_pitch_gain() (<xref target="bands.c">bands.c</xref>).
</t>

</section>

<section anchor="pvq" title="Spherical Vector Quantization">
<t>CELT uses a Pyramid Vector Quantization (PVQ) <xref target="PVQ"></xref>
codebook for quantising the details of the spectrum in each band that have not
been predicted by the pitch predictor. The PVQ codebook consists of all sums
of K signed pulses in a vector of N samples, where two pulses at the same position
are required to have the same sign. We can thus say that the codebook includes 
all codevectors y of N dimensions that satisfy sum(abs(y(j))) = K.
</t>

<t>
In bands where neither pitch prediction nor folding is used, the PVQ is used directly to encode
the unit vector that results from the normalisation in 
<xref target="normalization"></xref>. Given a PVQ codevector y, the unit vector X is
obtained as X = y/||y||, where ||.|| denotes the L2 norm. In the case where a pitch
prediction or a folding vector P is used, the unit vector X becomes:
</t>
<t>X = P + g_f * y,</t>
<t>where g_f = ( sqrt( (y^T*P)^2 + ||y||^2*(1-||P||^2) ) - y^T*P ) / ||y||^2, i.e., the non-negative solution of ||P + g_f*y|| = 1, which exists whenever ||P|| &lt;= 1.</t>

<t>This is described in mix_pitch_and_residual() (<xref target="vq.c">vq.c</xref>).</t>


<t>
The search for the best codevector y is performed by alg_quant()
(<xref target="vq.c">vq.c</xref>). There are several possible approaches to the 
search with a tradeoff between quality and complexity. The method used in the reference
implementation consists of first projecting the residual signal R = X - P onto the codebook
pyramid. 
</t>

<section anchor="Index Encoding" title="Index Encoding">
<t>
The best PVQ codeword is encoded by encode_pulses() (<xref target="cwrs.c">cwrs.c</xref>).
The codeword is converted to a unique index in the same way as specified in 
<xref target="PVQ"></xref>. The indexing is based on the calculation of V(N,K) (denoted N(L,K) in <xref target="PVQ"></xref>), which is the number of possible combinations of K pulses 
in N samples. The number of combinations can be computed recursively as 
V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), with V(N,0) = 1 and V(0,K) = 0 for K != 0. 
There are many different ways to compute V(N,K), including precomputed tables and direct
use of the recursive formulation. The reference implementation applies the recursive
formulation one line (or column) at a time to save on memory use.
</t>
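<t>
The row-at-a-time evaluation of the recursion can be sketched as below. Only one row of
V(n,k) values for k = 0..K is kept, and it is updated in place as n increases; the previous row's
value V(n-1,k-1) is saved in a temporary before it is overwritten. The function name and the
PVQ_MAX_K bound are hypothetical, and 64-bit counts are used to delay overflow.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

#define PVQ_MAX_K 128   /* illustrative bound for this sketch */

/* V(n,k): number of integer vectors of dimension n with sum|y[j]| == k. */
uint64_t pvq_count(int n, int k)
{
    uint64_t row[PVQ_MAX_K + 1];
    int i, j;
    assert(n >= 0 && k >= 0 && k <= PVQ_MAX_K);
    /* Row n = 0: V(0,0) = 1 and V(0,j) = 0 for j > 0. */
    row[0] = 1;
    for (j = 1; j <= k; j++)
        row[j] = 0;
    for (i = 1; i <= n; i++) {
        uint64_t prev_old = row[0];   /* V(i-1, j-1) for j = 1 */
        /* row[0] stays 1 because V(i,0) = 1. */
        for (j = 1; j <= k; j++) {
            uint64_t old = row[j];                 /* V(i-1, j) */
            row[j] = old + row[j - 1] + prev_old;  /* + V(i, j-1) + V(i-1, j-1) */
            prev_old = old;
        }
    }
    return row[k];
}
```
]]></artwork>
</figure>
<t>
For example, V(2,2) = 8, since the two-dimensional codevectors with two pulses are (+/-2, 0),
(0, +/-2) and the four sign combinations of (+/-1, +/-1).
</t>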
</section>

</section>

<section anchor="synthesis" title="Synthesis">
<t>
After all the quantisation is completed, the quantised energy is used along with the 
quantised normalised band data to resynthesise the MDCT spectrum. The inverse MDCT (<xref target="inverse-mdct"></xref>) and the weighted overlap-add are applied and the signal is stored in the <spanx style="emph">synthesis buffer</spanx> so it can be used for pitch prediction. 
The encoder MAY omit this step of the processing if it knows that it will not be using
the pitch predictor for the next few frames.
</t>
</section>


</section>

<section anchor="CELT Decoder" title="CELT Decoder">

<t>
As in most audio codecs, the CELT decoder is less complex than the encoder.
</t>

<t>
If, during the decoding process, a decoded integer value is out of the specified range
(which can happen because of the small amount of redundancy introduced when encoding large
integers with the range coder), then the decoder knows there has been an error in the coding, 
decoding or transmission and SHOULD take measures to conceal the error and/or report
that a problem has occurred.
</t>

<section anchor="Range Decoder" title="Range Decoder">
<t>
derf?
</t>
</section>

<section anchor="Energy Envelope Decoding" title="Energy Envelope Decoding">
<t>

</t>
</section>

<section anchor="Spherical VQ Decoder" title="Spherical VQ Decoder">
<t>
The spherical codebook is decoded by alg_unquant() (<xref target="vq.c">vq.c</xref>).
The index of the PVQ entry is obtained from the range coder and converted to 
a pulse vector by decode_pulses() (<xref target="cwrs.c">cwrs.c</xref>). Derf??
</t>

<t>
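When a pitch or folding vector is used, the decoded pulse vector is combined with it as described in <xref target="pvq"/>, using mix_pitch_and_residual() (<xref target="vq.c">vq.c</xref>).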
</t>
</section>

<section anchor="Index Decoding" title="Index Decoding">
</section>

<section anchor="Denormalization" title="Denormalization">
<t>
Just like each band was normalised in the encoder, the last step of the decoder before
the inverse MDCT is to denormalize the bands. Each decoded normalized band is
multiplied by the square root of the decoded energy. This is done by denormalise_bands()
(<xref target="bands.c">bands.c</xref>).
</t>
</section>

<section anchor="inverse-mdct" title="Inverse MDCT">
<t>The inverse MDCT implementation has no special characteristic. The
input is N frequency-domain samples and the output is 2*N time-domain 
samples. The output is windowed using the same <spanx style="emph">low-overlap</spanx> window 
as the encoder. The IMDCT and windowing are performed by mdct_backward()
(<xref target="mdct.c">mdct.c</xref>). After the overlap-add process, 
the signal is de-emphasised using the inverse of the pre-emphasis filter 
used in the encoder: 1/A(z)=1/(1-alpha_p*z^-1).
</t>
</section>

<section anchor="Packet Loss Concealment" title="Packet Loss Concealment (PLC)">
<t>
Packet loss concealment (PLC) is an optional decoder-side feature which 
SHOULD be included when transmitting over an unreliable channel. Because 
PLC is not part of the bit-stream, there are several possible ways to 
implement PLC with different complexity/quality trade-offs. The PLC in
the reference implementation simply finds a periodicity in the decoded
signal and repeats the windowed waveform using the pitch offset. Care
must be taken to preserve the time-domain aliasing cancellation property
of the inverse MDCT. This is implemented in celt_decode_lost() 
(<xref target="celt.c">celt.c</xref>).
</t>
</section>

</section>


<section anchor="Security Considerations" title="Security Considerations">

<t>
A potential denial-of-service threat exists for data encodings using
compression techniques that have non-uniform receiver-end
computational load.  The attacker can inject pathological datagrams
into the stream which are complex to decode and cause the receiver to
be overloaded.  However, this encoding does not exhibit any
significant non-uniformity.
</t>

</section> 

<section anchor="Evaluation of CELT Implementations" title="Evaluation of CELT Implementations">

<t>
Insert some text here.
</t>

</section>


<section anchor="Issues that need to be addressed" title="Issues that need to be addressed">

<t>
<list>
<t>Dynamic bit allocation</t>
<t>Stereo coupling</t>
</list>
</t>

</section>


<section anchor="Acknowledgments" title="Acknowledgments">

<t>
The authors would also like to thank the following members of the 
CELT and AVT communities for their input:
</t>
</section> 

</middle>

<back>

<references title="Normative References">

<reference anchor="rfc2119">
<front>
<title>Key words for use in RFCs to Indicate Requirement Levels </title>
<author initials="S." surname="Bradner" fullname="Scott Bradner"><organization/></author>
</front>
<seriesInfo name="RFC" value="2119" />
</reference> 

<reference anchor="rfc3550">
<front>
<title>RTP: A Transport Protocol for Real-Time Applications</title>
<author initials="H." surname="Schulzrinne" fullname=""><organization/></author>
<author initials="S." surname="Casner" fullname=""><organization/></author>
<author initials="R." surname="Frederick" fullname=""><organization/></author>
<author initials="V." surname="Jacobson" fullname=""><organization/></author>
</front>
<seriesInfo name="RFC" value="3550" />
</reference> 


</references> 

<references title="Informative References">

<reference anchor="celt-tasl">
<front>
<title>A High-Quality Speech and Audio Codec With Less Than 10 ms delay</title>
<author initials="JM" surname="Valin" fullname="Jean-Marc Valin"><organization/></author>
<author initials="T. B." surname="Terriberry" fullname="Timothy Terriberry"><organization/></author>
<author initials="C." surname="Montgomery" fullname="Christopher Montgomery"><organization/></author>
<author initials="G." surname="Maxwell" fullname="Gregory Maxwell"><organization/></author>
</front>
<seriesInfo name="To appear in IEEE Transactions on Audio, Speech and Language Processing" value="2009" />
</reference> 

<reference anchor="celt-eusipco">
<front>
<title>A Full-Bandwidth Audio Codec with Low Complexity and Very Low Delay</title>
<author initials="JM" surname="Valin" fullname="Jean-Marc Valin"><organization/></author>
<author initials="T. B." surname="Terriberry" fullname="Timothy Terriberry"><organization/></author>
<author initials="G." surname="Maxwell" fullname="Gregory Maxwell"><organization/></author>
</front>
<seriesInfo name="Accepted for EUSIPCO" value="2009" />
</reference> 

<reference anchor="celt-website">
<front>
<title>The CELT ultra-low delay audio codec</title>
<author><organization/></author>
</front>
<seriesInfo name="CELT website" value="http://www.celt-codec.org/" />
</reference> 

<reference anchor="mdct">
<front>
<title>Modified Discrete Cosine Transform</title>
<author><organization/></author>
</front>
<seriesInfo name="MDCT" value="http://en.wikipedia.org/wiki/Modified_discrete_cosine_transform" />
</reference> 

<reference anchor="PVQ">
<front>
<title>A Pyramid Vector Quantizer</title>
<author initials="T." surname="Fischer" fullname=""><organization/></author>
<date month="July" year="1986" />
</front>
<seriesInfo name="IEEE Transactions on Information Theory" value="Vol. 32, No. 4, pp. 568-583" />
</reference> 

</references>

<section anchor="Reference Implementation" title="Reference Implementation">

<t>This appendix contains the complete source code for a reference
implementation of the CELT codec written in C. This implementation
can be compiled for either floating-point or fixed-point machines.
Floating-point is the default and fixed-point can be enabled by
defining FIXED_POINT when compiling.
</t>

<t>The implementation can be compiled with either a C89 or a C99
compiler. It is reasonably optimized for most platforms such that
only architecture-specific optimizations are likely to be useful.
The FFT used is a slightly modified version of the KISS-FFT package,
but it is easy to substitute any other FFT library.
</t>

<t>
The testcelt executable can be used to test the encoding and decoding
process:
<list style="empty">
<t><![CDATA[ testcelt <rate> <channels> <frame size> <bytes per packet> [<complexity> [packet loss rate]] <input> <output> ]]></t>
</list>
where "rate" is the sampling rate in Hz, "channels" is the number of
channels (1 or 2), "frame size" is the number of samples in a frame 
(64 to 512) and "bytes per packet" is the number of bytes desired for each
compressed frame. The input and output files are assumed to be 16-bit
PCM files in the machine's native endianness. The optional "complexity" argument
can select the quality vs complexity tradeoff (0-10) and the "packet loss rate"
argument simulates random packet loss (the argument is in tenths of a percent).
</t>

<?rfc include="xml_source/testcelt.c"?>
<?rfc include="xml_source/celt.h"?>
<?rfc include="xml_source/celt.c"?>
<?rfc include="xml_source/modes.h"?>
<?rfc include="xml_source/modes.c"?>
<?rfc include="xml_source/bands.h"?>
<?rfc include="xml_source/bands.c"?>
<?rfc include="xml_source/cwrs.h"?>
<?rfc include="xml_source/cwrs.c"?>
<?rfc include="xml_source/vq.h"?>
<?rfc include="xml_source/vq.c"?>
<?rfc include="xml_source/pitch.h"?>
<?rfc include="xml_source/pitch.c"?>
<?rfc include="xml_source/rate.h"?>
<?rfc include="xml_source/rate.c"?>
<?rfc include="xml_source/psy.h"?>
<?rfc include="xml_source/psy.c"?>
<?rfc include="xml_source/mdct.h"?>
<?rfc include="xml_source/mdct.c"?>
<?rfc include="xml_source/ecintrin.h"?>
<?rfc include="xml_source/entcode.h"?>
<?rfc include="xml_source/entcode.c"?>
<?rfc include="xml_source/entenc.h"?>
<?rfc include="xml_source/entenc.c"?>
<?rfc include="xml_source/entdec.h"?>
<?rfc include="xml_source/entdec.c"?>
<?rfc include="xml_source/mfrngcod.h"?>
<?rfc include="xml_source/rangeenc.c"?>
<?rfc include="xml_source/rangedec.c"?>
<?rfc include="xml_source/laplace.h"?>
<?rfc include="xml_source/laplace.c"?>
<?rfc include="xml_source/quant_bands.h"?>
<?rfc include="xml_source/quant_bands.c"?>
<?rfc include="xml_source/arch.h"?>
<?rfc include="xml_source/mathops.h"?>
<?rfc include="xml_source/os_support.h"?>
<?rfc include="xml_source/float_cast.h"?>
<?rfc include="xml_source/stack_alloc.h"?>
<?rfc include="xml_source/celt_types.h"?>
<?rfc include="xml_source/_kiss_fft_guts.h"?>
<?rfc include="xml_source/kiss_fft.h"?>
<?rfc include="xml_source/kiss_fft.c"?>
<?rfc include="xml_source/kiss_fftr.h"?>
<?rfc include="xml_source/kiss_fftr.c"?>
<?rfc include="xml_source/kfft_single.h"?>
<?rfc include="xml_source/kfft_double.h"?>
<?rfc include="xml_source/Makefile"?>

</section>


</back>

</rfc>