
gitlab.xiph.org/xiph/opus.git
author    Timothy B. Terriberry <tterribe@xiph.org>  2012-07-17 00:17:27 +0400
committer Timothy B. Terriberry <tterribe@xiph.org>  2012-07-17 00:31:11 +0400
commit    b3744613b7aa0d56913950740bc24db75c807a68
tree      c338d2fcc94b2ca272479d410e6d99fc0532cbb9
parent    3527f9d4c4a15e2303a19d0a72564a32383d3a50
Updates from mailing list and other small fixes.
* Bump the document date.
* Mandate that the ID header must complete on the first page (to remove any ambiguities about this requirement in RFC 3533).
* Remove redundant wording that rillian forgot to remove in 360a4117.
* Split the "Granule Position" section into subsections.
* Move the first paragraph of the "Other Implementation Notes" section into the "Granule Position" section, add general seeking implementation guidance, and be specific about the interaction between pre-roll and pre-skip.
* Retitle the remaining contents of the "Other Implementation Notes" section to "Packet Size Limits".
* Specify that all the header fields are REQUIRED (and add a description of the Channel Mapping Table as a whole, so we can say when it is REQUIRED).
* Specify that implementations MUST NOT reject headers with extra data if they have an unknown minor version number.
* Add a reference to RFC 3629 (UTF-8).
* Minor formatting adjustments to vorbis-trim and vorbis-mapping cites.
* Eliminate semicolons and terrible "Else, if" constructs.
-rw-r--r-- doc/draft-terriberry-oggopus.xml 122
1 file changed, 94 insertions, 28 deletions
diff --git a/doc/draft-terriberry-oggopus.xml b/doc/draft-terriberry-oggopus.xml
index 82341feb..ce8cddcf 100644
--- a/doc/draft-terriberry-oggopus.xml
+++ b/doc/draft-terriberry-oggopus.xml
@@ -51,7 +51,7 @@
</address>
</author>
-<date day="3" month="July" year="2012"/>
+<date day="16" month="July" year="2012"/>
<area>RAI</area>
<workgroup>codec</workgroup>
@@ -141,7 +141,7 @@ The first packet in the logical Ogg bitstream MUST contain the identification
(ID) header, which uniquely identifies a stream as Opus audio.
The format of this header is defined in <xref target="id_header"/>.
It MUST be placed alone (without any other packet data) on the first page of
- the logical Ogg bitstream.
+ the logical Ogg bitstream, and must complete on that page.
This page MUST have its 'beginning of stream' flag set.
</t>
<t>
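The new requirement that the ID header complete on the first page can be checked from the Ogg page header alone (RFC 3533 layout: "OggS", version, header-type flags, 64-bit granule position, serial, sequence, CRC, segment count at octet 26, then the lacing table). The C sketch below is illustrative only: the function name is hypothetical and CRC verification is omitted. It relies on the Ogg lacing rule that a lacing value below 255 ends a packet, so the ID header sits alone on the page and completes there exactly when every lacing value except the last is 255 and the last is below 255.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `page` (one Ogg page of `page_len`
   bytes) has the BOS flag set and carries exactly one packet, starting
   with "OpusHead", that completes on this page; returns 0 otherwise.
   The page CRC is NOT verified in this sketch. */
int first_page_holds_complete_id_header(const unsigned char *page,
                                        size_t page_len)
{
    if (page_len < 27 || memcmp(page, "OggS", 4) != 0 || page[4] != 0)
        return 0;
    unsigned flags = page[5];
    if (!(flags & 0x02) || (flags & 0x01))  /* need BOS, no continuation */
        return 0;
    size_t nsegs = page[26];
    size_t header_len = 27 + nsegs;
    if (nsegs == 0 || page_len < header_len)
        return 0;
    size_t packet_len = 0;
    for (size_t i = 0; i < nsegs; i++) {
        unsigned lace = page[27 + i];
        packet_len += lace;
        /* Only the final lacing value may be < 255. An earlier one would
           start a second packet on the page; a final 255 means the packet
           continues onto the next page. */
        if ((lace < 255) != (i == nsegs - 1))
            return 0;
    }
    if (packet_len < 8 || page_len < header_len + packet_len)
        return 0;
    return memcmp(page + header_len, "OpusHead", 8) == 0;
}
```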
@@ -164,9 +164,9 @@ The value N is specified in the ID header (see
logical Ogg bitstream.
</t>
<t>
-The first N-1 Opus packets, if any, are packed one after another in sequence
- into the Ogg packet, using the self-delimiting framing from Appendix&nbsp;B
- of <xref target="RFCOpus"/>.
+The first N-1 Opus packets, if any, are packed one after another into the Ogg
+ packet, using the self-delimiting framing from Appendix&nbsp;B of
+ <xref target="RFCOpus"/>.
The remaining Opus packet is packed at the end of the Ogg packet using the
regular, undelimited framing from Section&nbsp;3 of <xref target="RFCOpus"/>.
All of the Opus packets in a single Ogg packet MUST be constrained to have the
@@ -244,6 +244,7 @@ In order to support capturing a stream that uses discontinuous transmission
not transmitted.
</t>
+<section anchor="preskip" title="Pre-skip">
<t>
There is some amount of latency introduced during the decoding process, to
allow for overlap in the MDCT modes, stereo mixing in the LP modes, and
@@ -269,7 +270,9 @@ It may also be used to perform sample-accurate cropping of existing encoded
This amount need not be a multiple of 2.5&nbsp;ms, may be smaller than a single
packet, or may span the contents of several packets.
</t>
+</section>
+<section anchor="pcm_sample_position" title="PCM Sample Position">
<t>
The PCM sample position is determined from the granule position using the
formula
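A minimal sketch of the pre-skip and PCM sample position handling, assuming the relation PCM sample position = granule position - pre-skip, with both values counted in 48&nbsp;kHz samples (the form used in the later RFC 7845). Both function names are hypothetical. The second helper shows how a pre-skip smaller than one packet, or spanning several packets, is consumed frame by frame.

```c
#include <stdint.h>

/* Assumes PCM position = granule position - pre-skip (48 kHz samples).
   A negative result means the position lies in the pre-skip region and
   is never played. Callers must filter out the special granule position
   -1 first, which Ogg uses for pages on which no packet completes. */
int64_t pcm_sample_position(int64_t granule_position, uint16_t pre_skip)
{
    return granule_position - (int64_t)pre_skip;
}

/* Hypothetical helper: `*remaining` starts at the ID header's pre-skip
   value. For each decoded frame, returns how many leading samples of
   that frame to discard and updates the count still to be skipped. */
int pre_skip_discard(int *remaining, int frame_samples)
{
    int drop = *remaining < frame_samples ? *remaining : frame_samples;
    *remaining -= drop;
    return drop;
}
```

For example, with a pre-skip of 1000 and 20&nbsp;ms (960-sample) frames, the first frame is discarded entirely and only the first 40 samples of the second frame are dropped.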
@@ -306,7 +309,7 @@ In this case, the PCM sample position of the first audio sample to be played
<t>
Vorbis streams use a granule position smaller than the number of audio samples
contained in the first audio data page to indicate that some of those samples
- must be trimmed from the output. See <xref target="vorbis-trim"/>.
+ must be trimmed from the output (see <xref target="vorbis-trim"/>).
However, to do so, Vorbis requires that the first audio data page contains
exactly two packets, in order to allow the decoder to perform PCM position
adjustments before needing to return any PCM data.
@@ -315,7 +318,9 @@ Opus uses the pre-skip mechanism for this purpose instead, since the encoder
large packets in streams with a very large number of channels might not fit on
a single page.
</t>
+</section>
+<section title="end_trimming" title="End Trimming">
<t>
The page with the 'end of stream' flag set MAY have a granule position that
indicates the page contains less audio data than would normally be returned by
@@ -330,7 +335,10 @@ The remaining samples are discarded.
The number of discarded samples SHOULD be no larger than the number decoded
from the last packet.
</t>
+</section>
+<section anchor="start_granpos_restrictions"
+ title="Restrictions on the Initial Granule Position">
<t>
The granule position of the first audio data page with a completed packet MAY
be larger than the number of samples contained in packets that complete on
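The end-trimming rule can be sketched as follows, assuming the decoder knows the granule position of the page before the final ('end of stream') page and how many 48&nbsp;kHz samples it decoded from packets completing on that final page. The function name is hypothetical, and the sketch does not cover a stream whose first and last audio pages coincide.

```c
#include <stdint.h>

/* Hypothetical helper: returns how many samples to drop from the end of
   the output decoded from the final page. The granule position delta is
   the number of samples that page is allowed to contribute. */
int64_t end_trim_samples(int64_t prev_granpos, int64_t eos_granpos,
                         int64_t decoded_on_eos_page)
{
    int64_t keep = eos_granpos - prev_granpos;
    if (keep < 0)
        keep = 0;               /* malformed stream: keep nothing */
    if (keep >= decoded_on_eos_page)
        return 0;               /* no end trimming signalled */
    return decoded_on_eos_page - keep;
}
```

For example, if the preceding page ends at granule position 96000, the final page's granule position is 96500, and its single 20&nbsp;ms packet decodes to 960 samples, the helper returns 460. A decoder could also compare the result against the last packet's sample count to detect streams that violate the SHOULD above.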
@@ -367,6 +375,32 @@ This would indicate that more samples should be skipped from the initial
</t>
</section>
+<section anchor="seeking_and_preroll" title="Seeking and Pre-roll">
+<t>
+Seeking in Ogg files is best performed using a bisection search for a page
+ whose granule position corresponds to a PCM position at or before the seek
+ target.
+With appropriately weighted bisection, accurate seeking can be performed with
+ just three or four bisections even in multi-gigabyte files.
+See <xref target="seeking"/> for general implementation guidance.
+</t>
+
+<t>
+When seeking within an Ogg Opus stream, the decoder SHOULD start decoding (and
+ discarding the output) at least 3840&nbsp;samples (80&nbsp;ms) prior to the
+ seek target in order to ensure that the output audio is correct by the time it
+ reaches the seek target.
+This 'pre-roll' is separate from, and unrelated to, the 'pre-skip' used at the
+ beginning of the stream.
+If the point 80&nbsp;ms prior to the seek target comes before the initial PCM
+ sample position, the decoder SHOULD start decoding from the beginning of the
+ stream, applying pre-skip as normal, regardless of whether the pre-skip is
+ larger or smaller than 80&nbsp;ms.
+</t>
+</section>
+
+</section>
+
<section anchor="headers" title="Header Packets">
<t>
An Opus stream contains exactly two mandatory header packets.
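The seeking guidance above can be sketched with plain (unweighted) bisection over an in-memory index of page granule positions. This is a simplification: a real implementation bisects over byte offsets in the file, and the "appropriately weighted" variant the text mentions converges faster. The page index, the `initial_pcm` parameter, and both function names are hypothetical; the 3840-sample pre-roll comes from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define PREROLL_SAMPLES 3840  /* 80 ms at 48 kHz */

/* `granpos` holds page granule positions in file order, assumed
   non-decreasing with -1 entries removed. Returns the index of the last
   page whose PCM position (granpos - pre_skip) is <= target, or -1. */
ptrdiff_t last_page_at_or_before(const int64_t *granpos, size_t npages,
                                 uint16_t pre_skip, int64_t target)
{
    ptrdiff_t lo = -1, hi = (ptrdiff_t)npages; /* lo: ok, hi: too late */
    while (hi - lo > 1) {
        ptrdiff_t mid = lo + (hi - lo) / 2;
        if (granpos[mid] - pre_skip <= target)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* Returns -1 when decoding should restart from the beginning of the
   stream with normal pre-skip handling. Otherwise returns the page after
   which decoding resumes; output is discarded until `target`. */
ptrdiff_t seek_start_page(const int64_t *granpos, size_t npages,
                          uint16_t pre_skip, int64_t initial_pcm,
                          int64_t target)
{
    int64_t start = target - PREROLL_SAMPLES;
    if (start < initial_pcm)
        return -1;
    return last_page_at_or_before(granpos, npages, pre_skip, start);
}
```

Returning -1 for both "pre-roll point precedes the initial PCM sample position" and "no page precedes the pre-roll point" keeps the caller's logic to a single branch: restart from the beginning and apply pre-skip, whichever of the two is larger.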
@@ -473,12 +507,12 @@ The original sample rate of the encoder input is not preserved by the lossy
An Ogg Opus player SHOULD select the playback sample rate according to the
following procedure:
<list style="numbers">
-<t>If the hardware supports 48&nbsp;kHz playback, decode at 48&nbsp;kHz;</t>
-<t>Else, if the hardware's highest available sample rate is a supported
- rate, decode at this sample rate;</t>
-<t>Else, if the hardware's highest available sample rate is less than
- 48&nbsp;kHz, decode at the highest supported rate above this and resample;</t>
-<t>Else, decode at 48&nbsp;kHz and resample.</t>
+<t>If the hardware supports 48&nbsp;kHz playback, decode at 48&nbsp;kHz.</t>
+<t>Otherwise, if the hardware's highest available sample rate is a supported
+ rate, decode at this sample rate.</t>
+<t>Otherwise, if the hardware's highest available sample rate is less than
+ 48&nbsp;kHz, decode at the highest supported rate above this and resample.</t>
+<t>Otherwise, decode at 48&nbsp;kHz and resample.</t>
</list>
However, the 'Input Sample Rate' field allows the encoder to pass the sample
rate of the original input stream as metadata.
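The four-step playback rate procedure can be sketched as below. The rates 8, 12, 16, 24, and 48&nbsp;kHz are the ones the Opus decoder can output directly (RFC 6716). The function name and its parameters are hypothetical, and step 3's "highest supported rate above this" is read here as the next higher supported rate, which is how the later RFC 7845 words it.

```c
#include <stddef.h>

/* Output rates the Opus decoder can produce directly (RFC 6716). */
static const int opus_rates[] = { 8000, 12000, 16000, 24000, 48000 };
#define N_OPUS_RATES (sizeof opus_rates / sizeof opus_rates[0])

/* Hypothetical helper: returns the decoder rate and sets *resample when
   the decoded audio must be resampled for the device. */
int choose_decode_rate(int hw_supports_48k, int hw_max_rate, int *resample)
{
    size_t i;
    *resample = 0;
    if (hw_supports_48k)
        return 48000;                          /* step 1 */
    for (i = 0; i < N_OPUS_RATES; i++)
        if (opus_rates[i] == hw_max_rate)
            return hw_max_rate;                /* step 2 */
    *resample = 1;
    if (hw_max_rate < 48000)
        for (i = 0; i < N_OPUS_RATES; i++)
            if (opus_rates[i] > hw_max_rate)
                return opus_rates[i];          /* step 3: next rate up */
    return 48000;                              /* step 4 */
}
```

For a device limited to 22.05&nbsp;kHz, this decodes at 24&nbsp;kHz and resamples down; a 44.1&nbsp;kHz device gets 48&nbsp;kHz output resampled.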
@@ -542,9 +576,28 @@ Each possible value of this octet indicates a mapping family, which defines a
allowed channel count.
The details are described in <xref target="channel_mapping"/>.
</t>
+<t><spanx style="strong">Channel Mapping Table</spanx>:
+This table defines the mapping from encoded streams to output channels.
+It is omitted when the channel mapping family is 0, but REQUIRED otherwise.
+Its contents are specified in <xref target="channel_mapping"/>.
+</t>
</list>
</t>
+<t>
+All fields in the ID headers are REQUIRED, except for the channel mapping
+ table, which is omitted when the channel mapping family is 0.
+Implementations SHOULD reject ID headers which do not contain enough data for
+ these fields, even if they contain a valid Magic Signature.
+Future versions of this specification, even backwards-compatible versions,
+ might include additional fields in the ID header.
+If an ID header has a compatible major version, but a larger minor version,
+ an implementation MUST NOT reject it for containing additional data not
+ specified here.
+However, implementations MAY reject streams in which the ID header does not
+ complete on the first page.
+</t>
+
<section anchor="channel_mapping" title="Channel Mapping">
<t>
An Ogg Opus stream allows mapping one number of Opus streams (N) to a possibly
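The new REQUIRED-field and minor-version rules translate into a length check like the following C sketch. The octet layout is an assumption taken from the published RFC 7845: an 8-octet magic signature, version, channel count, 16-bit pre-skip, 32-bit input sample rate, 16-bit output gain, and mapping family (19 octets), plus stream count, coupled count, and one mapping octet per channel when the family is non-zero. Treating the upper four bits of the version octet as the major version, with 0 as the only recognized major version, is likewise an assumption. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define ID_HEADER_MIN 19  /* assumed size with channel mapping family 0 */

/* Returns 0 if the ID header is acceptable, -1 otherwise. Trailing
   bytes are always tolerated, which satisfies the MUST NOT for headers
   with a larger minor version. */
int check_id_header(const unsigned char *pkt, size_t len)
{
    if (len < ID_HEADER_MIN || memcmp(pkt, "OpusHead", 8) != 0)
        return -1;
    if ((pkt[8] >> 4) != 0)     /* assumed: unknown major version */
        return -1;
    unsigned channels = pkt[9];
    unsigned family = pkt[18];
    /* A non-zero family REQUIRES the channel mapping table. */
    if (family != 0 && len < 21 + (size_t)channels)
        return -1;
    return 0;
}
```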
@@ -658,9 +711,8 @@ When the 'channel mapping family' octet has this value, the channel mapping
<vspace blankLines="1"/>
Allowed numbers of channels: 1...8.<vspace/>
Channel meanings depend on the number of channels.
-See <xref target="vorbis-mapping">the
- Vorbis mapping</xref> for the assignments from output channel number to
- specific speaker locations.
+See <xref target="vorbis-mapping"/> for the assignments from output channel
+ number to specific speaker locations.
<vspace blankLines="1"/>
</t>
<t>Family&nbsp;255 (no defined channel meaning):
@@ -756,13 +808,13 @@ It MUST NOT indicate that the vendor string is longer than the rest of the
<t><spanx style="strong">Vendor String</spanx> (variable length, UTF-8 vector):
<vspace blankLines="1"/>
This is a simple human-readable tag for vendor information, encoded as a UTF-8
- string.
+ string&nbsp;<xref target="RFC3629"/>.
No terminating NUL octet is required.
<vspace blankLines="1"/>
This tag is intended to identify the codec encoder and encapsulation
- implementations, for tracing differences in technical behavior. The
- user-facing encoding application can use the 'ENCODER' user commment
- tag name to identify themselves.
+ implementations, for tracing differences in technical behavior.
+The user-facing encoding application can use the 'ENCODER' user commment tag
+ name to identify themselves.
<vspace blankLines="1"/>
</t>
<t><spanx style="strong">User Comment List Length</spanx> (32 bits, unsigned,
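The vendor string now cites RFC 3629 for its UTF-8 encoding. This diff does not say whether a demuxer must validate it, but one that chooses to could use a check like the following sketch (the name `utf8_valid` is hypothetical). It rejects the byte sequences RFC 3629 rules out: overlong encodings, UTF-16 surrogates (U+D800 to U+DFFF), and code points above U+10FFFF.

```c
#include <stddef.h>

/* Returns 1 if s[0..len) is well-formed UTF-8 per RFC 3629, else 0. */
int utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned c = s[i];
        size_t n;               /* number of continuation bytes */
        unsigned long cp;
        if (c < 0x80) { i++; continue; }
        else if (c >= 0xC2 && c <= 0xDF) { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return 0;          /* 0x80-0xC1 or 0xF5-0xFF never lead */
        if (len - i <= n)
            return 0;           /* sequence truncated */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;           /* overlong, out of range, or surrogate */
        i += n + 1;
    }
    return 1;
}
```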
@@ -795,6 +847,17 @@ There is one for each user comment indicated by the 'user comment list length'
</t>
<t>
+The vendor string length and user comment list length are REQUIRED, and
+ implementations SHOULD reject comment headers that do not contain enough data
+ for these fields, or that do not contain enough data for the corresponding
+ vendor string or user comments they describe.
+Making this check before allocating the associated memory to contain the data
+ may help prevent a possible Denial-of-Service (DoS) attack from small comment
+ headers that claim to contain strings longer than the entire packet or more
+ user comments than than could possibly fit in the packet.
+</t>
+
+<t>
The user comment strings follow the NAME=value format described by
<xref target="vorbis-comment"/> with the same recommended tag names.
One new comment tag is introduced for Ogg Opus:
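The DoS guidance above amounts to checking every length field against the bytes actually present before allocating anything. A minimal C sketch, assuming the comment header layout of an 8-octet "OpusTags" magic signature followed by little-endian 32-bit length fields (as in Vorbis comments); the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the number of user comments if every length field fits in the
   packet, or -1 if the header is truncated or claims more data than it
   holds. Nothing is allocated. */
long check_comment_header(const unsigned char *pkt, size_t len)
{
    size_t pos = 8;
    if (len < 16 || memcmp(pkt, "OpusTags", 8) != 0)
        return -1;
    uint32_t vendor_len = read_le32(pkt + pos);
    pos += 4;
    if (vendor_len > len - pos || len - pos - vendor_len < 4)
        return -1;              /* vendor string or comment count missing */
    pos += vendor_len;
    uint32_t ncomments = read_le32(pkt + pos);
    pos += 4;
    /* Each comment needs at least a 4-byte length field, so this bound
       rejects absurd counts before any loop or allocation. */
    if (ncomments > (len - pos) / 4)
        return -1;
    for (uint32_t i = 0; i < ncomments; i++) {
        uint32_t clen = read_le32(pkt + pos);
        pos += 4;
        if (clen > len - pos)
            return -1;
        pos += clen;
        if (i + 1 < ncomments && len - pos < 4)
            return -1;          /* next length field missing */
    }
    return (long)ncomments;
}
```

Only after this check succeeds would a demuxer allocate storage for the vendor string and the comment array, sized from lengths that are now known to fit in the packet.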
@@ -836,19 +899,11 @@ There is no Opus comment tag corresponding to REPLAYGAIN_ALBUM_GAIN.
That information should instead be stored in the ID header's 'output gain'
field.
</t>
-
</section>
</section>
-<section anchor="other_implementation_notes"
- title="Other Implementation Notes">
-<t>
-When seeking within an Ogg Opus stream, the decoder should start decoding (and
- discarding the output) at least 3840&nbsp;samples (80&nbsp;ms) prior to the
- seek point in order to ensure that the output audio is correct at the seek
- point.
-</t>
+<section anchor="packet_size_limits" title="Packet Size Limits">
<t>
Technically valid Opus packets can be arbitrarily large due to the padding
format, although the amount of non-padding data they can contain is bounded.
@@ -978,6 +1033,7 @@ The authors agree to grant third parties the irrevocable right to copy, use,
<references title="Normative References">
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>
+<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml"?>
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3533.xml"?>
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.5334.xml"?>
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6381.xml"?>
@@ -1034,6 +1090,16 @@ The authors agree to grant third parties the irrevocable right to copy, use,
<!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml"?-->
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4732.xml"?>
+<reference anchor="seeking"
+ target="http://wiki.xiph.org/Seeking">
+<front>
+<title>Granulepos Encoding and How Seeking Really Works</title>
+<author initials="S." surname="Pfeiffer" fullname="Silvia Pfeiffer"/>
+<author initials="C." surname="Parker" fullname="Conrad Parker"/>
+<author initials="G." surname="Maxwell" fullname="Greg Maxwell"/>
+</front>
+</reference>
+
<reference anchor="replay-gain"
target="http://wiki.xiph.org/VorbisComment#Replay_Gain">
<front>