author | Timothy B. Terriberry <tterribe@xiph.org> | 2012-07-17 00:17:27 +0400 |
---|---|---|
committer | Timothy B. Terriberry <tterribe@xiph.org> | 2012-07-17 00:31:11 +0400 |
commit | b3744613b7aa0d56913950740bc24db75c807a68 (patch) | |
tree | c338d2fcc94b2ca272479d410e6d99fc0532cbb9 | |
parent | 3527f9d4c4a15e2303a19d0a72564a32383d3a50 (diff) |
Updates from mailing list and other small fixes.
* Bump the document date.
* Mandate that the ID header must complete on the first page (to
remove any ambiguities about this requirement in RFC 3533).
* Remove redundant wording that rillian forgot to remove in 360a4117.
* Split the "Granule Position" section into subsections.
* Move the first paragraph of the "Other Implementation Notes"
section into the "Granule Position" section, add general seeking
implementation guidance, and be specific about the interaction
between pre-roll and pre-skip.
* Retitle the remaining contents of the "Other Implementation Notes"
section to "Packet Size Limits".
* Specify that all the header fields are REQUIRED (and add a
description of the Channel Mapping Table as a whole, so we can
say when it is REQUIRED).
* Specify that implementations MUST NOT reject headers with extra
data if they have an unknown minor version number.
* Add a reference to RFC 3629 (UTF-8).
* Minor formatting adjustments to vorbis-trim and vorbis-mapping
cites.
* Eliminate semicolons and terrible "Else, if" constructs.
-rw-r--r-- | doc/draft-terriberry-oggopus.xml | 122 |
1 files changed, 94 insertions, 28 deletions
diff --git a/doc/draft-terriberry-oggopus.xml b/doc/draft-terriberry-oggopus.xml index 82341feb..ce8cddcf 100644 --- a/doc/draft-terriberry-oggopus.xml +++ b/doc/draft-terriberry-oggopus.xml @@ -51,7 +51,7 @@ </address> </author> -<date day="3" month="July" year="2012"/> +<date day="16" month="July" year="2012"/> <area>RAI</area> <workgroup>codec</workgroup> @@ -141,7 +141,7 @@ The first packet in the logical Ogg bitstream MUST contain the identification (ID) header, which uniquely identifies a stream as Opus audio. The format of this header is defined in <xref target="id_header"/>. It MUST be placed alone (without any other packet data) on the first page of - the logical Ogg bitstream. + the logical Ogg bitstream, and must complete on that page. This page MUST have its 'beginning of stream' flag set. </t> <t> @@ -164,9 +164,9 @@ The value N is specified in the ID header (see logical Ogg bitstream. </t> <t> -The first N-1 Opus packets, if any, are packed one after another in sequence - into the Ogg packet, using the self-delimiting framing from Appendix B - of <xref target="RFCOpus"/>. +The first N-1 Opus packets, if any, are packed one after another into the Ogg + packet, using the self-delimiting framing from Appendix B of + <xref target="RFCOpus"/>. The remaining Opus packet is packed at the end of the Ogg packet using the regular, undelimited framing from Section 3 of <xref target="RFCOpus"/>. All of the Opus packets in a single Ogg packet MUST be constrained to have the @@ -244,6 +244,7 @@ In order to support capturing a stream that uses discontinuous transmission not transmitted. 
</t> +<section anchor="preskip" title="Pre-skip"> <t> There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and @@ -269,7 +270,9 @@ It may also be used to perform sample-accurate cropping of existing encoded This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets. </t> +</section> +<section anchor="pcm_sample_position" title="PCM Sample Position"> <t> The PCM sample position is determined from the granule position using the formula @@ -306,7 +309,7 @@ In this case, the PCM sample position of the first audio sample to be played <t> Vorbis streams use a granule position smaller than the number of audio samples contained in the first audio data page to indicate that some of those samples - must be trimmed from the output. See <xref target="vorbis-trim"/>. + must be trimmed from the output (see <xref target="vorbis-trim"/>). However, to do so, Vorbis requires that the first audio data page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. @@ -315,7 +318,9 @@ Opus uses the pre-skip mechanism for this purpose instead, since the encoder large packets in streams with a very large number of channels might not fit on a single page. </t> +</section> +<section title="end_trimming" title="End Trimming"> <t> The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by @@ -330,7 +335,10 @@ The remaining samples are discarded. The number of discarded samples SHOULD be no larger than the number decoded from the last packet. 
</t> +</section> +<section anchor="start_granpos_restrictions" + title="Restrictions on the Initial Granule Position"> <t> The granule position of the first audio data page with a completed packet MAY be larger than the number of samples contained in packets that complete on @@ -367,6 +375,32 @@ This would indicate that more samples should be skipped from the initial </t> </section> +<section anchor="seeking_and_preroll" title="Seeking and Pre-roll"> +<t> +Seeking in Ogg files is best performed using a bisection search for a page + whose granule position corresponds to a PCM position at or before the seek + target. +With appropriately weighted bisection, accurate seeking can be performed with + just three or four bisections even in multi-gigabyte files. +See <xref target="seeking"/> for general implementation guidance. +</t> + +<t> +When seeking within an Ogg Opus stream, the decoder SHOULD start decoding (and + discarding the output) at least 3840 samples (80 ms) prior to the + seek target in order to ensure that the output audio is correct by the time it + reaches the seek target. +This 'pre-roll' is separate from, and unrelated to, the 'pre-skip' used at the + beginning of the stream. +If the point 80 ms prior to the seek target comes before the initial PCM + sample position, the decoder SHOULD start decoding from the beginning of the + stream, applying pre-skip as normal, regardless of whether the pre-skip is + larger or smaller than 80 ms. +</t> +</section> + +</section> + <section anchor="headers" title="Header Packets"> <t> An Opus stream contains exactly two mandatory header packets. 
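The pre-roll rule added in the "Seeking and Pre-roll" hunk above can be sketched as follows. This is a minimal illustration in Python and is not part of the commit. The names `plan_seek`, `SeekPlan`, and `initial_pcm` are hypothetical, positions are assumed to be PCM sample positions at 48 kHz, and packet boundaries are ignored (a real decoder would start at the last packet boundary at or before `decode_from`).

```python
from typing import NamedTuple

# 80 ms at 48 kHz, the pre-roll recommended by the draft text.
PRE_ROLL_SAMPLES = 3840


class SeekPlan(NamedTuple):
    from_start: bool   # True if decoding restarts at the beginning of the stream
    decode_from: int   # PCM sample position at which decoding begins
    discard: int       # decoded samples to drop before output resumes


def plan_seek(target: int, pre_skip: int, initial_pcm: int = 0) -> SeekPlan:
    """Decide where to start decoding for a seek to PCM position `target`.

    `initial_pcm` is the PCM sample position of the first sample that is
    played (hypothetical parameter; 0 for a stream that starts at zero).
    """
    if target < initial_pcm:
        raise ValueError("seek target precedes the start of the stream")

    start = target - PRE_ROLL_SAMPLES
    if start < initial_pcm:
        # The pre-roll point falls before the stream start: decode from the
        # beginning and apply pre-skip as normal, whatever its size. The
        # decoder emits `pre_skip` samples before reaching `initial_pcm`.
        return SeekPlan(True, initial_pcm, pre_skip + (target - initial_pcm))

    # Normal case: decode and discard 80 ms of audio before the target.
    return SeekPlan(False, start, PRE_ROLL_SAMPLES)
```

For a target well into the stream the plan discards exactly 3840 samples. For a target in the first 80 ms it discards the pre-skip plus the distance from the stream start, so pre-roll and pre-skip never need to be combined.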
@@ -473,12 +507,12 @@ The original sample rate of the encoder input is not preserved by the lossy An Ogg Opus player SHOULD select the playback sample rate according to the following procedure: <list style="numbers"> -<t>If the hardware supports 48 kHz playback, decode at 48 kHz;</t> -<t>Else, if the hardware's highest available sample rate is a supported - rate, decode at this sample rate;</t> -<t>Else, if the hardware's highest available sample rate is less than - 48 kHz, decode at the highest supported rate above this and resample;</t> -<t>Else, decode at 48 kHz and resample.</t> +<t>If the hardware supports 48 kHz playback, decode at 48 kHz.</t> +<t>Otherwise, if the hardware's highest available sample rate is a supported + rate, decode at this sample rate.</t> +<t>Otherwise, if the hardware's highest available sample rate is less than + 48 kHz, decode at the highest supported rate above this and resample.</t> +<t>Otherwise, decode at 48 kHz and resample.</t> </list> However, the 'Input Sample Rate' field allows the encoder to pass the sample rate of the original input stream as metadata. @@ -542,9 +576,28 @@ Each possible value of this octet indicates a mapping family, which defines a allowed channel count. The details are described in <xref target="channel_mapping"/>. </t> +<t><spanx style="strong">Channel Mapping Table</spanx>: +This table defines the mapping from encoded streams to output channels. +It is omitted when the channel mapping family is 0, but REQUIRED otherwise. +Its contents are specified in <xref target="channel_mapping"/>. +</t> </list> </t> +<t> +All fields in the ID headers are REQUIRED, except for the channel mapping + table, which is omitted when the channel mapping family is 0. +Implementations SHOULD reject ID headers which do not contain enough data for + these fields, even if they contain a valid Magic Signature. 
+Future versions of this specification, even backwards-compatible versions, + might include additional fields in the ID header. +If an ID header has a compatible major version, but a larger minor version, + an implementation MUST NOT reject it for containing additional data not + specified here. +However, implementations MAY reject streams in which the ID header does not + complete on the first page. +</t> + <section anchor="channel_mapping" title="Channel Mapping"> <t> An Ogg Opus stream allows mapping one number of Opus streams (N) to a possibly @@ -658,9 +711,8 @@ When the 'channel mapping family' octet has this value, the channel mapping <vspace blankLines="1"/> Allowed numbers of channels: 1...8.<vspace/> Channel meanings depend on the number of channels. -See <xref target="vorbis-mapping">the - Vorbis mapping</xref> for the assignments from output channel number to - specific speaker locations. +See <xref target="vorbis-mapping"/> for the assignments from output channel + number to specific speaker locations. <vspace blankLines="1"/> </t> <t>Family 255 (no defined channel meaning): @@ -756,13 +808,13 @@ It MUST NOT indicate that the vendor string is longer than the rest of the <t><spanx style="strong">Vendor String</spanx> (variable length, UTF-8 vector): <vspace blankLines="1"/> This is a simple human-readable tag for vendor information, encoded as a UTF-8 - string. + string <xref target="RFC3629"/>. No terminating NUL octet is required. <vspace blankLines="1"/> This tag is intended to identify the codec encoder and encapsulation - implementations, for tracing differences in technical behavior. The - user-facing encoding application can use the 'ENCODER' user commment - tag name to identify themselves. + implementations, for tracing differences in technical behavior. +The user-facing encoding application can use the 'ENCODER' user commment tag + name to identify themselves. 
<vspace blankLines="1"/> </t> <t><spanx style="strong">User Comment List Length</spanx> (32 bits, unsigned, @@ -795,6 +847,17 @@ There is one for each user comment indicated by the 'user comment list length' </t> <t> +The vendor string length and user comment list length are REQUIRED, and + implementations SHOULD reject comment headers that do not contain enough data + for these fields, or that do not contain enough data for the corresponding + vendor string or user comments they describe. +Making this check before allocating the associated memory to contain the data + may help prevent a possible Denial-of-Service (DoS) attack from small comment + headers that claim to contain strings longer than the entire packet or more + user comments than than could possibly fit in the packet. +</t> + +<t> The user comment strings follow the NAME=value format described by <xref target="vorbis-comment"/> with the same recommended tag names. One new comment tag is introduced for Ogg Opus: @@ -836,19 +899,11 @@ There is no Opus comment tag corresponding to REPLAYGAIN_ALBUM_GAIN. That information should instead be stored in the ID header's 'output gain' field. </t> - </section> </section> -<section anchor="other_implementation_notes" - title="Other Implementation Notes"> -<t> -When seeking within an Ogg Opus stream, the decoder should start decoding (and - discarding the output) at least 3840 samples (80 ms) prior to the - seek point in order to ensure that the output audio is correct at the seek - point. -</t> +<section anchor="packet_size_limits" title="Packet Size Limits"> <t> Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. 
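The comment header check added earlier in this diff (validate each declared length against the data actually present before allocating anything) might look like the sketch below. It is an illustration in Python, not code from the commit. It assumes the comment header layout described in the draft: an 8-octet "OpusTags" signature followed by little-endian 32-bit length fields. The function name `parse_comment_header` is hypothetical.

```python
import struct


def parse_comment_header(packet: bytes) -> tuple[str, list[str]]:
    """Parse a comment header, rejecting any length that exceeds the packet."""
    if packet[:8] != b"OpusTags":
        raise ValueError("missing OpusTags signature")
    pos = 8

    def read_u32() -> int:
        nonlocal pos
        if len(packet) - pos < 4:
            raise ValueError("not enough data for a length field")
        (value,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        return value

    def read_string() -> str:
        nonlocal pos
        length = read_u32()
        # Check the claimed length against the remaining data *before*
        # slicing or allocating anything of that size.
        if length > len(packet) - pos:
            raise ValueError("string length exceeds remaining packet data")
        data = packet[pos:pos + length]
        pos += length
        return data.decode("utf-8")

    vendor = read_string()
    count = read_u32()
    # Every user comment needs at least its own 4-octet length field, so a
    # count larger than remaining/4 cannot possibly fit in the packet.
    if count > (len(packet) - pos) // 4:
        raise ValueError("user comment count exceeds remaining packet data")
    comments = [read_string() for _ in range(count)]
    return vendor, comments
```

Both checks use only the packet size, so a small header claiming a multi-gigabyte vendor string or billions of comments is rejected without any large allocation.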
@@ -978,6 +1033,7 @@ The authors agree to grant third parties the irrevocable right to copy, use, <references title="Normative References"> <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?> +<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml"?> <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3533.xml"?> <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.5334.xml"?> <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6381.xml"?> @@ -1034,6 +1090,16 @@ The authors agree to grant third parties the irrevocable right to copy, use, <!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml"?--> <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4732.xml"?> +<reference anchor="seeking" + target="http://wiki.xiph.org/Seeking"> +<front> +<title>Granulepos Encoding and How Seeking Really Works</title> +<author initials="S." surname="Pfeiffer" fullname="Silvia Pfeiffer"/> +<author initials="C." surname="Parker" fullname="Conrad Parker"/> +<author initials="G." surname="Maxwell" fullname="Greg Maxwell"/> +</front> +</reference> + <reference anchor="replay-gain" target="http://wiki.xiph.org/VorbisComment#Replay_Gain"> <front> |