diff options
Diffstat (limited to 'doc/isoflac.txt')
-rw-r--r-- | doc/isoflac.txt | 666 |
1 files changed, 666 insertions, 0 deletions
diff --git a/doc/isoflac.txt b/doc/isoflac.txt new file mode 100644 index 0000000..888aed7 --- /dev/null +++ b/doc/isoflac.txt @@ -0,0 +1,666 @@ +Encapsulation of FLAC in ISO Base Media File Format +Version 0.0.4 + +Table of Contents +1 Scope +2 Supporting Normative References +3 Design Rules of Encapsulation + 3.1 File Type Identification + 3.2 Overview of Track Structure + 3.3 Definition of FLAC sample + 3.3.1 Sample entry format + 3.3.2 FLAC Specific Box + 3.3.3 Sample format + 3.3.4 Duration of FLAC sample + 3.3.5 Sub-sample + 3.3.6 Random Access + 3.3.6.1 Random Access Point + 3.4 Basic Structure (informative) + 3.4.1 Initial Movie + 3.5 Example of Encapsulation (informative) +4 Acknowledgements +5 Author's Address + +1 Scope + + This document specifies the normative mapping for encapsulation of + FLAC coded audio bitstreams in ISO Base Media file format and its + derivatives. The encapsulation of FLAC coded bitstreams in + QuickTime file format is outside the scope of this specification. + +2 Supporting Normative References + + [1] ISO/IEC 14496-12:2012 Corrected version + + Information technology — Coding of audio-visual objects — Part + 12: ISO base media file format + + [2] ISO/IEC 14496-12:2012/Amd.1:2013 + + Information technology — Coding of audio-visual objects — Part + 12: ISO base media file format AMENDMENT 1: Various + enhancements including support for large metadata + + [3] FLAC format specification + + https://xiph.org/flac/format.html + + Definition of the FLAC Audio Codec stream format + + [4] FLAC-in-Ogg mapping specification + + https://xiph.org/flac/ogg_mapping.html + + Ogg Encapsulation for the FLAC Audio Codec + + [5] Matroska specification + +3 Design Rules of Encapsulation + + 3.1 File Type Identification + + This specification does not define any brand to declare files + which conform to this specification. Files which conform to + this specification shall contain at least one brand which + supports the requirements and the requirements described in + this clause without contradiction in the compatible brands + list of the File Type Box. The minimal support of the + encapsulation of FLAC bitstreams in ISO Base Media file format + requires the 'isom' brand. + + 3.2 Overview of Track Structure + + FLAC coded audio shall be encapsulated into the ISO Base + Media File Format as media data within an audio track. + + + The handler_type field in the Handler Reference Box + shall be set to 'soun'. + + + The Media Information Box shall contain the Sound Media + Header Box. + + + The codingname of the sample entry is 'fLaC'. + + This specification does not define any encapsulation + using MP4AudioSampleEntry with objectTypeIndication + specified by the MPEG-4 Registration Authority + (http://www.mp4ra.org/). See section 'Sample entry + format' for the definition of the sample entry. + + + The 'dfLa' box is added to the sample entry to convey + initializing information for the decoder. + + See section 'FLAC Specific Box' for the definition of + the box contents. + + + A FLAC sample is exactly one FLAC frame as described + in the format specification[3]. See section + 'Sample format' for details of the frame contents. + + + Every FLAC sample is a sync sample. No pre-roll or + lapping is required. See section 'Random Access' for + further details. + + 3.3 Definition of a FLAC sample + + 3.3.1 Sample entry format + + For any track containing one or more FLAC bitstreams, a + sample entry describing the corresponding FLAC bitstream + shall be present inside the Sample Table Box. This version + of the specification defines only one sample entry format + named FLACSampleEntry whose codingname is 'fLaC'. This + sample entry includes exactly one FLAC Specific Box + defined in section 'FLAC specific box' as a mandatory box + and indicates that FLAC samples described by this sample + entry are stored by the sample format described in section + 'Sample format'. + + The syntax and semantics of the FLACSampleEntry is shown + as follows. The data fields of this box and native + FLAC[3] structures encoded within FLAC blocks are both + stored in big-endian format, though for purposes of the + ISO BMFF container, FLAC native metadata and data blocks + are treated as unstructured octet streams. + + class FLACSampleEntry() extends AudioSampleEntry ('fLaC'){ + FLACSpecificBox(); + } + + The fields of the AudioSampleEntry portion shall be set as + follows: + + + channelcount: + + The channelcount field shall be set equal to the + channel count specified by the FLAC bitstream's native + METADATA_BLOCK_STREAMINFO header as described in [3]. + Note that the FLAC FRAME_HEADER structure that begins + each FLAC sample redundantly encodes channel number; + the number of channels declared in each FRAME_HEADER + MUST match the number of channels declared here and in + the METADATA_BLOCK_STREAMINFO header. + + + samplesize: + + The samplesize field shall be set equal to the bits + per sample specified by the FLAC bitstream's native + METADATA_BLOCK_STREAMINFO header as described in [3]. + Note that the FLAC FRAME_HEADER structure that begins + each FLAC sample redundantly encodes the number of + bits per sample; the bits per sample declared in each + FRAME_HEADER MUST match the samplesize declared here + and the bits per sample field declared in the + METADATA_BLOCK_STREAMINFO header. + + + samplerate: + + When possible, the samplerate field shall be set + equal to the sample rate specified by the FLAC + bitstream's native METADATA_BLOCK_STREAMINFO header + as described in [3], left-shifted by 16 bits to + create the appropriate 16.16 fixed-point + representation. + + When the bitstream's native sample rate is greater + than the maximum expressible value of 65535 Hz, + the samplerate field shall hold the greatest + expressible regular division of that rate. I.e. + the samplerate field shall hold 48000.0 for + native sample rates of 96 and 192 kHz. In the + case of unusual sample rates which do not have + an expressible regular division, the maximum value + of 65535.0 Hz should be used. + + High-rate FLAC bitstreams are common, and the native + value from the METADATA_BLOCK_STREAMINFO header in + the FLACSpecificBox MUST be read to determine the + correct sample rate of the bitstream. + + Note that the FLAC FRAME_HEADER structure that begins + each FLAC sample redundantly encodes the sample rate; + the sample rate declared in each FRAME_HEADER MUST + match the sample rate declared in the + METADATA_BLOCK_STREAMINFO header, and here in the + AudioSampleEntry portion of the FLACSampleEntry + as much as is allowed by the encoding restrictions + described above. + + Finally, the FLACSpecificBox carries codec headers: + + + FLACSpecificBox + + This box contains initializing information for the + decoder as defined in section 'FLAC specific box'. + + 3.3.2 FLAC Specific Box + + Exactly one FLAC Specific Box shall be present in each + FLACSampleEntry. This specification defines version 0 + of this box. If incompatible changes occur in future + versions of this specification, another version number + will be defined. The data fields of this box and native + FLAC[3] structures encoded within FLAC blocks are both + stored in big-endian format, though for purposes of the + ISO BMFF container, FLAC native metadata and data blocks + are treated as unstructured octet streams. + + The syntax and semantics of the FLAC Specific Box is shown + as follows. + + class FLACMetadataBlock { + unsigned int(1) LastMetadataBlockFlag; + unsigned int(7) BlockType; + unsigned int(24) Length; + unsigned int(8) BlockData[Length]; + } + + aligned(8) class FLACSpecificBox + extends FullBox('dfLa', version=0, 0){ + for (i=0; ; i++) { // to end of box + FLACMetadataBlock(); + } + } + + + Version: + + The Version field shall be set to 0. + + In the future versions of this specification, this + field may be set to other values. And without support + of those values, the reader shall not read the fields + after this within the FLACSpecificBox. + + + Flags: + + The Flags field shall be set to 0. + + After the FullBox header, the box contains a sequence of + FLAC[3] native-metadata block structures that fill the + remainder of the box. + + Each FLACMetadataBlock structure consists of three fields + filling a total of four bytes that form a FLAC[3] native + METADATA_BLOCK_HEADER, followed by raw octet bytes that + comprise the FLAC[3] native METADATA_BLOCK_DATA. + + + LastMetadataBlockFlag: + + The LastMetadataBlockFlag field maps semantically to + the FLAC[3] native METADATA_BLOCK_HEADER + Last-metadata-block flag as defined in the FLAC[3] + file specification. + + The LastMetadataBlockFlag is set to 1 if this + MetadataBlock is the last metadata block in the + FLACSpecificBox. It is set to 0 otherwise. + + + BlockType: + + The BlockType field maps semantically to the FLAC[3] + native METADATA_BLOCK_HEADER BLOCK_TYPE field as + defined in the FLAC[3] file specification. + + The BlockType is set to a valid FLAC[3] BLOCK_TYPE + value that identifies the type of this native metadata + block. The BlockType of the first FLACMetadataBlock + must be set to 0, signifying this is a FLAC[3] native + METADATA_BLOCK_STREAMINFO block. + + + Length: + + The Length field maps semantically to the FLAC[3] + native METADATA_BLOCK_HEADER Length field as + defined in the FLAC[3] file specification. + + The length field specifies the number of bytes of + MetadataBlockData to follow. + + + BlockData + + The BlockData field maps semantically to the FLAC[3] + native METADATA_BLOCK_HEADER METADATA_BLOCK_DATA as + defined in the FLAC[3] file specification. + + Taken together, the bytes of the FLACMetadataBlock form a + complete FLAC[3] native METADATA_BLOCK structure. + + Note that a minimum of a single FLACMetadataBlock, + consisting of a FLAC[3] native METADATA_BLOCK_STREAMINFO + structure, is required. Should the FLACSpecificBox + contain more than a single FLACMetadataBlock structure, + the FLACMetadataBlock containing the FLAC[3] native + METADATA_BLOCK_STREAMINFO must occur first in the list. + + Other containers that package FLAC audio streams, such as + Ogg[4] and Matroska[5], wrap FLAC[3] native metadata without + modification similar to this specification. When + repackaging or remuxing FLAC[3] streams from another + format that contains FLAC[3] native metadata into an ISO + BMFF file, the complete FLAC[3] native metadata should be + preserved in the ISO BMFF stream as described above. It + is also allowed to parse this native metadata and include + contextually redundant ISO BMFF-native repackagings and/or + reparsings of FLAC[3] native metadata, so long as the + native metadata is also preserved. + + 3.3.3 Sample format + + A FLAC sample is exactly one FLAC audio FRAME (as defined + in the FLAC[3] file specification) belonging to a FLAC + bitstreams. The FLAC sample data begins with a complete + FLAC FRAME_HEADER, followed by one FLAC SUBFRAME per + channel, any necessary bit padding, and ends with the + usual FLAC FRAME_FOOTER. + + Note that the FLAC native FRAME_HEADER structure that + begins each FLAC sample redundantly encodes channel count, + sample rate, and sample size. The values of these fields + must agree both with the values declared in the FLAC + METADATA_BLOCK_STREAMINFO structure as well as the + FLACSampleEntry box. + + 3.3.4 Duration of a FLAC sample + + The duration of any given FLAC sample is determined by + dividing the decoded block size of a FLAC frame, as + encoded in the FLAC FRAME's FRAME_HEADER structure, by the + value of the timescale field in the Media Header Box. + FLAC samples are permitted to have variable durations + within a given audio stream. FLAC does not use padding + values. + + 3.3.5 Sub-sample + + Sub-samples are not defined for FLAC samples in this + specification. + + 3.3.6 Random Access + + This subclause describes the nature of the random access + of FLAC sample. + + 3.3.6.1 Random Access Point + + All FLAC samples can be independently decoded + i.e. every FLAC sample is a sync sample. The Sync + Sample Box shall not be present as long as there are + no samples other than FLAC samples in the same + track. The sample_is_non_sync_sample field for FLAC + samples shall be set to 0. + + 3.4 Basic Structure (informative) + + 3.4.1 Initial Movie + + This subclause shows a basic structure of the Movie Box as follows: + + +----+----+----+----+----+----+----+----+------------------------------+ + |moov| | | | | | | | Movie Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | |mvhd| | | | | | | Movie Header Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | |trak| | | | | | | Track Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | |tkhd| | | | | | Track Header Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | |edts|* | | | | | Edit Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | |elst|* | | | | Edit List Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | |mdia| | | | | | Media Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | |mdhd| | | | | Media Header Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | |hdlr| | | | | Handler Reference Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | |minf| | | | | Media Information Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | | |smhd| | | | Sound Media Header Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | | |dinf| | | | Data Information Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | | | |dref| | | Data Reference Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | | | | |url | | DataEntryUrlBox | + +----+----+----+----+----+----+ or +----+------------------------------+ + | | | | | | |urn | | DataEntryUrnBox | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | | |stbl| | | | Sample Table | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | | | |stsd| | | Sample Description Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | | | | |fLaC| | FLACSampleEntry | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | | | | | |dfLa| FLAC Specific Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | | | |stts| | | Decoding Time to Sample Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | | | |stsc| | | Sample To Chunk Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | | | |stsz| | | Sample Size Box | + +----+----+----+----+----+ or +----+----+------------------------------+ + | | | | | |stz2| | | Compact Sample Size Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | | | | |stco| | | Chunk Offset Box | + +----+----+----+----+----+ or +----+----+------------------------------+ + | | | | | |co64| | | Chunk Large Offset Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | |mvex|* | | | | | | Movie Extends Box | + +----+----+----+----+----+----+----+----+------------------------------+ + | | |trex|* | | | | | Track Extends Box | + +----+----+----+----+----+----+----+----+------------------------------+ + + Figure 1 - Basic structure of Movie Box + + It is strongly recommended that the order of boxes should + follow the above structure. Boxes marked with an asterisk + (*) may or may not be present depending on context. For + most boxes listed above, the definition is as is defined + in ISO/IEC 14496-12 [1]. The additional boxes and the + additional requirements, restrictions and recommendations + to the other boxes are described in this specification. + + 3.5 Example of Encapsulation (informative) + [File] + size = 17790 + [ftyp: File Type Box] + position = 0 + size = 24 + major_brand = mp42 : MP4 version 2 + minor_version = 0 + compatible_brands + brand[0] = mp42 : MP4 version 2 + brand[1] = isom : ISO Base Media file format + [moov: Movie Box] + position = 24 + size = 757 + [mvhd: Movie Header Box] + position = 32 + size = 108 + version = 0 + flags = 0x000000 + creation_time = UTC 2014/12/12, 18:41:19 + modification_time = UTC 2014/12/12, 18:41:19 + timescale = 48000 + duration = 33600 (00:00:00.700) + rate = 1.000000 + volume = 1.000000 + reserved = 0x0000 + reserved = 0x00000000 + reserved = 0x00000000 + transformation matrix + | a, b, u | | 1.000000, 0.000000, 0.000000 | + | c, d, v | = | 0.000000, 1.000000, 0.000000 | + | x, y, w | | 0.000000, 0.000000, 1.000000 | + pre_defined = 0x00000000 + pre_defined = 0x00000000 + pre_defined = 0x00000000 + pre_defined = 0x00000000 + pre_defined = 0x00000000 + pre_defined = 0x00000000 + next_track_ID = 2 + [iods: Object Descriptor Box] + position = 140 + size = 33 + version = 0 + flags = 0x000000 + [tag = 0x10: MP4_IOD] + expandableClassSize = 16 + ObjectDescriptorID = 1 + URL_Flag = 0 + includeInlineProfileLevelFlag = 0 + reserved = 0xf + ODProfileLevelIndication = 0xff + sceneProfileLevelIndication = 0xff + audioProfileLevelIndication = 0xfe + visualProfileLevelIndication = 0xff + graphicsProfileLevelIndication = 0xff + [tag = 0x0e: ES_ID_Inc] + expandableClassSize = 4 + Track_ID = 1 + [trak: Track Box] + position = 173 + size = 608 + [tkhd: Track Header Box] + position = 181 + size = 92 + version = 0 + flags = 0x000007 + Track enabled + Track in movie + Track in preview + creation_time = UTC 2014/12/12, 18:41:19 + modification_time = UTC 2014/12/12, 18:41:19 + track_ID = 1 + reserved = 0x00000000 + duration = 33600 (00:00:00.700) + reserved = 0x00000000 + reserved = 0x00000000 + layer = 0 + alternate_group = 0 + volume = 1.000000 + reserved = 0x0000 + transformation matrix + | a, b, u | | 1.000000, 0.000000, 0.000000 | + | c, d, v | = | 0.000000, 1.000000, 0.000000 | + | x, y, w | | 0.000000, 0.000000, 1.000000 | + width = 0.000000 + height = 0.000000 + [mdia: Media Box] + position = 273 + size = 472 + [mdhd: Media Header Box] + position = 281 + size = 32 + version = 0 + flags = 0x000000 + creation_time = UTC 2014/12/12, 18:41:19 + modification_time = UTC 2014/12/12, 18:41:19 + timescale = 48000 + duration = 34560 (00:00:00.720) + language = und + pre_defined = 0x0000 + [hdlr: Handler Reference Box] + position = 313 + size = 51 + version = 0 + flags = 0x000000 + pre_defined = 0x00000000 + handler_type = soun + reserved = 0x00000000 + reserved = 0x00000000 + reserved = 0x00000000 + name = Xiph Audio Handler + [minf: Media Information Box] + position = 364 + size = 381 + [smhd: Sound Media Header Box] + position = 372 + size = 16 + version = 0 + flags = 0x000000 + balance = 0.000000 + reserved = 0x0000 + [dinf: Data Information Box] + position = 388 + size = 36 + [dref: Data Reference Box] + position = 396 + size = 28 + version = 0 + flags = 0x000000 + entry_count = 1 + [url : Data Entry Url Box] + position = 412 + size = 12 + version = 0 + flags = 0x000001 + location = in the same file + [stbl: Sample Table Box] + position = 424 + size = 321 + [stsd: Sample Description Box] + position = 432 + size = 79 + version = 0 + flags = 0x000000 + entry_count = 1 + [fLaC: Audio Description] + position = 448 + size = 63 + reserved = 0x000000000000 + data_reference_index = 1 + reserved = 0x0000 + reserved = 0x0000 + reserved = 0x00000000 + channelcount = 2 + samplesize = 16 + pre_defined = 0 + reserved = 0 + samplerate = 48000.000000 + [dfLa: FLAC Specific Box] + position = 484 + size = 50 + version = 0 + flags = 0x000000 + [FLACMetadataBlock] + LastMetadataBlockFlag = 1 + BlockType = 0 + Length = 34 + BlockData[34]; + [stts: Decoding Time to Sample Box] + position = 492 + size = 24 + version = 0 + flags = 0x000000 + entry_count = 1 + entry[0] + sample_count = 18 + sample_delta = 1920 + [stsc: Sample To Chunk Box] + position = 516 + size = 40 + version = 0 + flags = 0x000000 + entry_count = 2 + entry[0] + first_chunk = 1 + samples_per_chunk = 13 + sample_description_index = 1 + entry[1] + first_chunk = 2 + samples_per_chunk = 5 + sample_description_index = 1 + [stsz: Sample Size Box] + position = 556 + size = 92 + version = 0 + flags = 0x000000 + sample_size = 0 (variable) + sample_count = 18 + entry_size[0] = 977 + entry_size[1] = 938 + entry_size[2] = 939 + entry_size[3] = 938 + entry_size[4] = 934 + entry_size[5] = 945 + entry_size[6] = 948 + entry_size[7] = 956 + entry_size[8] = 955 + entry_size[9] = 930 + entry_size[10] = 933 + entry_size[11] = 934 + entry_size[12] = 972 + entry_size[13] = 977 + entry_size[14] = 958 + entry_size[15] = 949 + entry_size[16] = 962 + entry_size[17] = 848 + [stco: Chunk Offset Box] + position = 648 + size = 24 + version = 0 + flags = 0x000000 + entry_count = 2 + chunk_offset[0] = 686 + chunk_offset[1] = 12985 + [free: Free Space Box] + position = 672 + size = 8 + [mdat: Media Data Box] + position = 680 + size = 17001 + +4 Acknowledgements + + This spec draws heavily from the Opus-in-ISOBMFF specification + work done by Yusuke Nakamura <muken.the.vfrmaniac |at| gmail.com> + + Thank you to Tim Terriberry, David Evans, and Yusuke Nakamura + for valuable feedback. Thank you to Ralph Giles for editorial + help. + +5 Author Address + + Monty Montgomery <cmontgomery@mozilla.com> |