Diffstat (limited to 'doc/isoflac.txt')
-rw-r--r-- doc/isoflac.txt 666
1 files changed, 666 insertions, 0 deletions
diff --git a/doc/isoflac.txt b/doc/isoflac.txt
new file mode 100644
index 0000000..888aed7
--- /dev/null
+++ b/doc/isoflac.txt
@@ -0,0 +1,666 @@
+Encapsulation of FLAC in ISO Base Media File Format
+Version 0.0.4
+
+Table of Contents
+1 Scope
+2 Supporting Normative References
+3 Design Rules of Encapsulation
+ 3.1 File Type Identification
+ 3.2 Overview of Track Structure
+ 3.3 Definition of a FLAC sample
+ 3.3.1 Sample entry format
+ 3.3.2 FLAC Specific Box
+ 3.3.3 Sample format
+ 3.3.4 Duration of a FLAC sample
+ 3.3.5 Sub-sample
+ 3.3.6 Random Access
+ 3.3.6.1 Random Access Point
+ 3.4 Basic Structure (informative)
+ 3.4.1 Initial Movie
+ 3.5 Example of Encapsulation (informative)
+4 Acknowledgements
+5 Author's Address
+
+1 Scope
+
+ This document specifies the normative mapping for encapsulation of
+ FLAC coded audio bitstreams in ISO Base Media file format and its
+ derivatives. The encapsulation of FLAC coded bitstreams in
+ QuickTime file format is outside the scope of this specification.
+
+2 Supporting Normative References
+
+ [1] ISO/IEC 14496-12:2012 Corrected version
+
+ Information technology — Coding of audio-visual objects — Part
+ 12: ISO base media file format
+
+ [2] ISO/IEC 14496-12:2012/Amd.1:2013
+
+ Information technology — Coding of audio-visual objects — Part
+ 12: ISO base media file format AMENDMENT 1: Various
+ enhancements including support for large metadata
+
+ [3] FLAC format specification
+
+ https://xiph.org/flac/format.html
+
+ Definition of the FLAC Audio Codec stream format
+
+ [4] FLAC-in-Ogg mapping specification
+
+ https://xiph.org/flac/ogg_mapping.html
+
+ Ogg Encapsulation for the FLAC Audio Codec
+
+ [5] Matroska specification
+
+3 Design Rules of Encapsulation
+
+ 3.1 File Type Identification
+
+ This specification does not define any brand to declare files
+ which conform to this specification. Files which conform to
+ this specification shall contain, in the compatible brands
+ list of the File Type Box, at least one brand whose
+ requirements can be supported together with the requirements
+ described in this clause without contradiction. The minimal
+ support of the encapsulation of FLAC bitstreams in ISO Base
+ Media file format requires the 'isom' brand.
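+
+ As an illustration, the brand requirement can be checked
+ with a few lines of Python. This is a minimal sketch, assuming
+ the File Type Box payload layout from [1] (major_brand,
+ minor_version, then a list of four-character compatible
+ brands). The function names are hypothetical and not part of
+ this specification.
+
```python
def parse_ftyp_payload(payload: bytes):
    """Split an 'ftyp' payload (the bytes after the 8-byte box header)."""
    if len(payload) < 8 or len(payload) % 4:
        raise ValueError("malformed ftyp payload")
    major_brand = payload[0:4].decode("latin-1")
    minor_version = int.from_bytes(payload[4:8], "big")
    # The rest of the payload is a list of 4-character compatible brands.
    compatible = [payload[i:i + 4].decode("latin-1")
                  for i in range(8, len(payload), 4)]
    return major_brand, minor_version, compatible


def declares_isom(payload: bytes) -> bool:
    """True if 'isom' appears in the compatible brands list."""
    _, _, compatible = parse_ftyp_payload(payload)
    return "isom" in compatible
```
+
+ Only the compatible brands list is inspected here; the major
+ brand alone does not satisfy the requirement in this sketch.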
+
+ 3.2 Overview of Track Structure
+
+ FLAC coded audio shall be encapsulated into the ISO Base
+ Media File Format as media data within an audio track.
+
+ + The handler_type field in the Handler Reference Box
+ shall be set to 'soun'.
+
+ + The Media Information Box shall contain the Sound Media
+ Header Box.
+
+ + The codingname of the sample entry is 'fLaC'.
+
+ This specification does not define any encapsulation
+ using MP4AudioSampleEntry with objectTypeIndication
+ specified by the MPEG-4 Registration Authority
+ (http://www.mp4ra.org/). See section 'Sample entry
+ format' for the definition of the sample entry.
+
+ + The 'dfLa' box is added to the sample entry to convey
+ initializing information for the decoder.
+
+ See section 'FLAC Specific Box' for the definition of
+ the box contents.
+
+ + A FLAC sample is exactly one FLAC frame as described
+ in the format specification[3]. See section
+ 'Sample format' for details of the frame contents.
+
+ + Every FLAC sample is a sync sample. No pre-roll or
+ lapping is required. See section 'Random Access' for
+ further details.
+
+ 3.3 Definition of a FLAC sample
+
+ 3.3.1 Sample entry format
+
+ For any track containing one or more FLAC bitstreams, a
+ sample entry describing the corresponding FLAC bitstream
+ shall be present inside the Sample Table Box. This version
+ of the specification defines only one sample entry format
+ named FLACSampleEntry whose codingname is 'fLaC'. This
+ sample entry includes exactly one FLAC Specific Box
+ defined in section 'FLAC Specific Box' as a mandatory box
+ and indicates that FLAC samples described by this sample
+ entry are stored by the sample format described in section
+ 'Sample format'.
+
+ The syntax and semantics of the FLACSampleEntry are shown
+ below. The data fields of this box and native
+ FLAC[3] structures encoded within FLAC blocks are both
+ stored in big-endian format, though for purposes of the
+ ISO BMFF container, FLAC native metadata and data blocks
+ are treated as unstructured octet streams.
+
+ class FLACSampleEntry() extends AudioSampleEntry ('fLaC'){
+     FLACSpecificBox();
+ }
+
+ The fields of the AudioSampleEntry portion shall be set as
+ follows:
+
+ + channelcount:
+
+ The channelcount field shall be set equal to the
+ channel count specified by the FLAC bitstream's native
+ METADATA_BLOCK_STREAMINFO header as described in [3].
+ Note that the FLAC FRAME_HEADER structure that begins
+ each FLAC sample redundantly encodes channel number;
+ the number of channels declared in each FRAME_HEADER
+ MUST match the number of channels declared here and in
+ the METADATA_BLOCK_STREAMINFO header.
+
+ + samplesize:
+
+ The samplesize field shall be set equal to the bits
+ per sample specified by the FLAC bitstream's native
+ METADATA_BLOCK_STREAMINFO header as described in [3].
+ Note that the FLAC FRAME_HEADER structure that begins
+ each FLAC sample redundantly encodes the number of
+ bits per sample; the bits per sample declared in each
+ FRAME_HEADER MUST match the samplesize declared here
+ and the bits per sample field declared in the
+ METADATA_BLOCK_STREAMINFO header.
+
+ + samplerate:
+
+ When possible, the samplerate field shall be set
+ equal to the sample rate specified by the FLAC
+ bitstream's native METADATA_BLOCK_STREAMINFO header
+ as described in [3], left-shifted by 16 bits to
+ create the appropriate 16.16 fixed-point
+ representation.
+
+ When the bitstream's native sample rate is greater
+ than the maximum expressible value of 65535 Hz,
+ the samplerate field shall hold the greatest
+ expressible regular division of that rate. For
+ example, the samplerate field shall hold 48000.0
+ for native sample rates of 96 and 192 kHz. In the
+ case of unusual sample rates which do not have
+ an expressible regular division, the maximum value
+ of 65535.0 Hz should be used.
+
+ High-rate FLAC bitstreams are common, so readers MUST
+ take the native value from the
+ METADATA_BLOCK_STREAMINFO header in the
+ FLACSpecificBox to determine the correct sample rate
+ of the bitstream.
+
+ Note that the FLAC FRAME_HEADER structure that begins
+ each FLAC sample redundantly encodes the sample rate;
+ the sample rate declared in each FRAME_HEADER MUST
+ match the sample rate declared in the
+ METADATA_BLOCK_STREAMINFO header, and here in the
+ AudioSampleEntry portion of the FLACSampleEntry
+ as much as is allowed by the encoding restrictions
+ described above.
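+
+ The samplerate rule can be sketched in Python as follows.
+ This sketch assumes that "regular division" means repeated
+ halving of the native rate. That reading reproduces the
+ 48000.0 value given above for 96 and 192 kHz, but it is an
+ interpretation rather than something this text states. The
+ function name is hypothetical.
+
```python
MAX_EXPRESSIBLE_RATE = 65535  # largest integer part of a 16.16 value


def sample_entry_rate(native_rate: int) -> int:
    """Return the 16.16 fixed-point samplerate field for a native rate."""
    rate = native_rate
    while rate > MAX_EXPRESSIBLE_RATE:
        if rate % 2:
            # Assumption: no expressible regular division exists,
            # so fall back to the maximum value.
            rate = MAX_EXPRESSIBLE_RATE
            break
        rate //= 2
    return rate << 16
```
+
+ Because the field may hold a divided rate, a reader that
+ needs the true rate still has to consult the STREAMINFO
+ block, as required above.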
+
+ Finally, the FLACSpecificBox carries codec headers:
+
+ + FLACSpecificBox
+
+ This box contains initializing information for the
+ decoder as defined in section 'FLAC Specific Box'.
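+
+ The channelcount, samplesize and native sample rate values
+ discussed above all come from the STREAMINFO block. A minimal
+ Python sketch of extracting them, assuming the 34-byte
+ METADATA_BLOCK_STREAMINFO layout given in [3]; the function
+ name and the dictionary keys are illustrative only:
+
```python
import struct


def parse_streaminfo(data: bytes) -> dict:
    """Decode the BlockData of a STREAMINFO block (34 bytes, big-endian)."""
    if len(data) != 34:
        raise ValueError("STREAMINFO block data must be 34 bytes")
    min_block, max_block = struct.unpack(">HH", data[0:4])
    min_frame = int.from_bytes(data[4:7], "big")
    max_frame = int.from_bytes(data[7:10], "big")
    # Next 64 bits: 20-bit rate, 3-bit (channels - 1),
    # 5-bit (bits per sample - 1), 36-bit total sample count.
    packed = int.from_bytes(data[10:18], "big")
    return {
        "min_block_size": min_block,
        "max_block_size": max_block,
        "min_frame_size": min_frame,
        "max_frame_size": max_frame,
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x07) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
        "md5": data[18:34],
    }
```
+
+ A writer would copy "channels" into channelcount and
+ "bits_per_sample" into samplesize, and derive samplerate from
+ "sample_rate" under the restrictions described above.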
+
+ 3.3.2 FLAC Specific Box
+
+ Exactly one FLAC Specific Box shall be present in each
+ FLACSampleEntry. This specification defines version 0
+ of this box. If incompatible changes occur in future
+ versions of this specification, another version number
+ will be defined. The data fields of this box and native
+ FLAC[3] structures encoded within FLAC blocks are both
+ stored in big-endian format, though for purposes of the
+ ISO BMFF container, FLAC native metadata and data blocks
+ are treated as unstructured octet streams.
+
+ The syntax and semantics of the FLAC Specific Box are shown
+ below.
+
+ class FLACMetadataBlock {
+     unsigned int(1) LastMetadataBlockFlag;
+     unsigned int(7) BlockType;
+     unsigned int(24) Length;
+     unsigned int(8) BlockData[Length];
+ }
+
+ aligned(8) class FLACSpecificBox
+     extends FullBox('dfLa', version=0, 0){
+     for (i=0; ; i++) { // to end of box
+         FLACMetadataBlock();
+     }
+ }
+
+ + Version:
+
+ The Version field shall be set to 0.
+
+ Future versions of this specification may set this
+ field to other values. A reader that does not support
+ a given value shall not read the fields that follow
+ it within the FLACSpecificBox.
+
+ + Flags:
+
+ The Flags field shall be set to 0.
+
+ After the FullBox header, the box contains a sequence of
+ FLAC[3] native-metadata block structures that fill the
+ remainder of the box.
+
+ Each FLACMetadataBlock structure consists of three fields
+ filling a total of four bytes that form a FLAC[3] native
+ METADATA_BLOCK_HEADER, followed by the raw octets that
+ comprise the FLAC[3] native METADATA_BLOCK_DATA.
+
+ + LastMetadataBlockFlag:
+
+ The LastMetadataBlockFlag field maps semantically to
+ the FLAC[3] native METADATA_BLOCK_HEADER
+ Last-metadata-block flag as defined in the FLAC[3]
+ file specification.
+
+ The LastMetadataBlockFlag is set to 1 if this
+ FLACMetadataBlock is the last metadata block in the
+ FLACSpecificBox. It is set to 0 otherwise.
+
+ + BlockType:
+
+ The BlockType field maps semantically to the FLAC[3]
+ native METADATA_BLOCK_HEADER BLOCK_TYPE field as
+ defined in the FLAC[3] file specification.
+
+ The BlockType is set to a valid FLAC[3] BLOCK_TYPE
+ value that identifies the type of this native metadata
+ block. The BlockType of the first FLACMetadataBlock
+ must be set to 0, signifying this is a FLAC[3] native
+ METADATA_BLOCK_STREAMINFO block.
+
+ + Length:
+
+ The Length field maps semantically to the FLAC[3]
+ native METADATA_BLOCK_HEADER Length field as
+ defined in the FLAC[3] file specification.
+
+ The Length field specifies the number of bytes of
+ BlockData to follow.
+
+ + BlockData:
+
+ The BlockData field maps semantically to the FLAC[3]
+ native METADATA_BLOCK_DATA as defined in the FLAC[3]
+ file specification.
+
+ Taken together, the bytes of the FLACMetadataBlock form a
+ complete FLAC[3] native METADATA_BLOCK structure.
+
+ Note that a minimum of a single FLACMetadataBlock,
+ consisting of a FLAC[3] native METADATA_BLOCK_STREAMINFO
+ structure, is required. Should the FLACSpecificBox
+ contain more than a single FLACMetadataBlock structure,
+ the FLACMetadataBlock containing the FLAC[3] native
+ METADATA_BLOCK_STREAMINFO must occur first in the list.
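+
+ The box layout and the constraints above can be illustrated
+ with a small Python reader. This is a sketch, not a reference
+ implementation. It takes the bytes that follow the 8-byte
+ 'dfLa' box header, and the names parse_dfla_payload and
+ FlacMetadataBlock are hypothetical.
+
```python
from typing import List, NamedTuple


class FlacMetadataBlock(NamedTuple):
    last: bool        # LastMetadataBlockFlag
    block_type: int   # BlockType (0 = STREAMINFO)
    data: bytes       # BlockData, kept as raw octets


def parse_dfla_payload(payload: bytes) -> List[FlacMetadataBlock]:
    """Parse FullBox version/flags followed by FLACMetadataBlocks."""
    if len(payload) < 4:
        raise ValueError("truncated FullBox header")
    version = payload[0]
    flags = int.from_bytes(payload[1:4], "big")
    if version != 0 or flags != 0:
        raise ValueError("unsupported dfLa version or flags")

    blocks = []
    pos = 4
    while pos < len(payload):
        if len(payload) - pos < 4:
            raise ValueError("truncated metadata block header")
        first = payload[pos]
        length = int.from_bytes(payload[pos + 1:pos + 4], "big")
        pos += 4
        if pos + length > len(payload):
            raise ValueError("metadata block overruns the box")
        blocks.append(FlacMetadataBlock(bool(first & 0x80), first & 0x7F,
                                        payload[pos:pos + length]))
        pos += length

    if not blocks or blocks[0].block_type != 0:
        raise ValueError("first block must be STREAMINFO")
    # Only the final block may carry the last-metadata-block flag.
    if [b.last for b in blocks] != [False] * (len(blocks) - 1) + [True]:
        raise ValueError("LastMetadataBlockFlag set incorrectly")
    return blocks
```
+
+ The BlockData is returned untouched, which matches the
+ treatment of native metadata as unstructured octet streams.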
+
+ Other containers that package FLAC audio streams, such as
+ Ogg[4] and Matroska[5], wrap FLAC[3] native metadata without
+ modification, as this specification does. When
+ repackaging or remuxing FLAC[3] streams from another
+ format that contains FLAC[3] native metadata into an ISO
+ BMFF file, the complete FLAC[3] native metadata should be
+ preserved in the ISO BMFF stream as described above. It
+ is also allowed to parse this native metadata and include
+ contextually redundant ISO BMFF-native repackagings and/or
+ reparsings of FLAC[3] native metadata, so long as the
+ native metadata is also preserved.
+
+ 3.3.3 Sample format
+
+ A FLAC sample is exactly one FLAC audio FRAME (as defined
+ in the FLAC[3] file specification) belonging to a FLAC
+ bitstream. The FLAC sample data begins with a complete
+ FLAC FRAME_HEADER, followed by one FLAC SUBFRAME per
+ channel, any necessary bit padding, and ends with the
+ usual FLAC FRAME_FOOTER.
+
+ Note that the FLAC native FRAME_HEADER structure that
+ begins each FLAC sample redundantly encodes channel count,
+ sample rate, and sample size. The values of these fields
+ must agree both with the values declared in the FLAC
+ METADATA_BLOCK_STREAMINFO structure as well as the
+ FLACSampleEntry box.
+
+ 3.3.4 Duration of a FLAC sample
+
+ The duration of any given FLAC sample is determined by
+ dividing the decoded block size of a FLAC frame, as
+ encoded in the FLAC FRAME's FRAME_HEADER structure, by the
+ value of the timescale field in the Media Header Box.
+ FLAC samples are permitted to have variable durations
+ within a given audio stream. FLAC does not use padding
+ values.
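+
+ A hedged Python sketch of this calculation follows. It reads
+ the block size from the FRAME_HEADER using the block size
+ codes listed in [3] and divides it by a timescale supplied by
+ the caller. The result is in seconds on the assumption that
+ the Media Header Box timescale equals the native sample rate,
+ as in the informative example in section 3.5. The function
+ names are illustrative.
+
```python
def flac_block_size(frame: bytes) -> int:
    """Return the block size (samples per channel) of one FLAC frame."""
    # 14-bit sync code 0b11111111111110 followed by a reserved 0 bit.
    if len(frame) < 6 or frame[0] != 0xFF or (frame[1] & 0xFE) != 0xF8:
        raise ValueError("missing FLAC frame sync code")
    code = frame[2] >> 4
    if code == 0:
        raise ValueError("reserved block size code")
    if code == 1:
        return 192
    if code <= 5:
        return 576 << (code - 2)
    if code >= 8:
        return 256 << (code - 8)
    # Codes 6 and 7: (block size - 1) is stored after the UTF-8-style
    # coded frame/sample number, which starts at byte 4.
    lead = frame[4]
    ones = 0
    while lead & 0x80:
        ones += 1
        lead = (lead << 1) & 0xFF
    pos = 4 + max(ones, 1)
    width = 1 if code == 6 else 2
    if pos + width > len(frame):
        raise ValueError("truncated frame header")
    return int.from_bytes(frame[pos:pos + width], "big") + 1


def sample_duration_seconds(frame: bytes, timescale: int) -> float:
    """Block size divided by the Media Header Box timescale."""
    return flac_block_size(frame) / timescale
```
+
+ With the values from section 3.5 (a block size of 1920 and a
+ timescale of 48000), each sample lasts 0.04 seconds, so 18
+ samples span the 0.720 seconds reported in the Media Header
+ Box.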
+
+ 3.3.5 Sub-sample
+
+ Sub-samples are not defined for FLAC samples in this
+ specification.
+
+ 3.3.6 Random Access
+
+ This subclause describes the random access properties of
+ FLAC samples.
+
+ 3.3.6.1 Random Access Point
+
+ All FLAC samples can be independently decoded,
+ i.e. every FLAC sample is a sync sample. The Sync
+ Sample Box shall not be present as long as there are
+ no samples other than FLAC samples in the same
+ track. The sample_is_non_sync_sample field for FLAC
+ samples shall be set to 0.
+
+ 3.4 Basic Structure (informative)
+
+ 3.4.1 Initial Movie
+
+ This subclause shows a basic structure of the Movie Box as follows:
+
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |moov|    |    |    |    |    |    |    | Movie Box                    |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |mvhd|    |    |    |    |    |    | Movie Header Box             |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |trak|    |    |    |    |    |    | Track Box                    |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |tkhd|    |    |    |    |    | Track Header Box             |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |edts|*   |    |    |    |    | Edit Box                     |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |elst|*   |    |    |    | Edit List Box                |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |mdia|    |    |    |    |    | Media Box                    |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |mdhd|    |    |    |    | Media Header Box             |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |hdlr|    |    |    |    | Handler Reference Box        |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |minf|    |    |    |    | Media Information Box        |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |    |smhd|    |    |    | Sound Media Header Box       |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |    |dinf|    |    |    | Data Information Box         |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |    |    |dref|    |    | Data Reference Box           |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |    |    |    |url |    | DataEntryUrlBox              |
+ +----+----+----+----+----+----+ or +----+------------------------------+
+ |    |    |    |    |    |    |urn |    | DataEntryUrnBox              |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |    |stbl|    |    |    | Sample Table                 |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |    |    |stsd|    |    | Sample Description Box       |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |    |    |    |fLaC|    | FLACSampleEntry              |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |    |    |    |    |dfLa| FLAC Specific Box            |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |    |    |stts|    |    | Decoding Time to Sample Box  |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |    |    |stsc|    |    | Sample To Chunk Box          |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |    |    |stsz|    |    | Sample Size Box              |
+ +----+----+----+----+----+ or +----+----+------------------------------+
+ |    |    |    |    |    |stz2|    |    | Compact Sample Size Box      |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |    |    |    |stco|    |    | Chunk Offset Box             |
+ +----+----+----+----+----+ or +----+----+------------------------------+
+ |    |    |    |    |    |co64|    |    | Chunk Large Offset Box       |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |mvex|*   |    |    |    |    |    | Movie Extends Box            |
+ +----+----+----+----+----+----+----+----+------------------------------+
+ |    |    |trex|*   |    |    |    |    | Track Extends Box            |
+ +----+----+----+----+----+----+----+----+------------------------------+
+
+ Figure 1 - Basic structure of Movie Box
+
+ It is strongly recommended that the order of boxes
+ follow the above structure. Boxes marked with an asterisk
+ (*) may or may not be present depending on context. For
+ most boxes listed above, the definition is as given
+ in ISO/IEC 14496-12 [1]. The additional boxes and the
+ additional requirements, restrictions and recommendations
+ for the other boxes are described in this specification.
+
+ 3.5 Example of Encapsulation (informative)
+
+ [File]
+     size = 17790
+     [ftyp: File Type Box]
+         position = 0
+         size = 24
+         major_brand = mp42 : MP4 version 2
+         minor_version = 0
+         compatible_brands
+             brand[0] = mp42 : MP4 version 2
+             brand[1] = isom : ISO Base Media file format
+     [moov: Movie Box]
+         position = 24
+         size = 757
+         [mvhd: Movie Header Box]
+             position = 32
+             size = 108
+             version = 0
+             flags = 0x000000
+             creation_time = UTC 2014/12/12, 18:41:19
+             modification_time = UTC 2014/12/12, 18:41:19
+             timescale = 48000
+             duration = 33600 (00:00:00.700)
+             rate = 1.000000
+             volume = 1.000000
+             reserved = 0x0000
+             reserved = 0x00000000
+             reserved = 0x00000000
+             transformation matrix
+                 | a, b, u |   | 1.000000, 0.000000, 0.000000 |
+                 | c, d, v | = | 0.000000, 1.000000, 0.000000 |
+                 | x, y, w |   | 0.000000, 0.000000, 1.000000 |
+             pre_defined = 0x00000000
+             pre_defined = 0x00000000
+             pre_defined = 0x00000000
+             pre_defined = 0x00000000
+             pre_defined = 0x00000000
+             pre_defined = 0x00000000
+             next_track_ID = 2
+         [iods: Object Descriptor Box]
+             position = 140
+             size = 33
+             version = 0
+             flags = 0x000000
+             [tag = 0x10: MP4_IOD]
+                 expandableClassSize = 16
+                 ObjectDescriptorID = 1
+                 URL_Flag = 0
+                 includeInlineProfileLevelFlag = 0
+                 reserved = 0xf
+                 ODProfileLevelIndication = 0xff
+                 sceneProfileLevelIndication = 0xff
+                 audioProfileLevelIndication = 0xfe
+                 visualProfileLevelIndication = 0xff
+                 graphicsProfileLevelIndication = 0xff
+                 [tag = 0x0e: ES_ID_Inc]
+                     expandableClassSize = 4
+                     Track_ID = 1
+         [trak: Track Box]
+             position = 173
+             size = 608
+             [tkhd: Track Header Box]
+                 position = 181
+                 size = 92
+                 version = 0
+                 flags = 0x000007
+                     Track enabled
+                     Track in movie
+                     Track in preview
+                 creation_time = UTC 2014/12/12, 18:41:19
+                 modification_time = UTC 2014/12/12, 18:41:19
+                 track_ID = 1
+                 reserved = 0x00000000
+                 duration = 33600 (00:00:00.700)
+                 reserved = 0x00000000
+                 reserved = 0x00000000
+                 layer = 0
+                 alternate_group = 0
+                 volume = 1.000000
+                 reserved = 0x0000
+                 transformation matrix
+                     | a, b, u |   | 1.000000, 0.000000, 0.000000 |
+                     | c, d, v | = | 0.000000, 1.000000, 0.000000 |
+                     | x, y, w |   | 0.000000, 0.000000, 1.000000 |
+                 width = 0.000000
+                 height = 0.000000
+             [mdia: Media Box]
+                 position = 273
+                 size = 472
+                 [mdhd: Media Header Box]
+                     position = 281
+                     size = 32
+                     version = 0
+                     flags = 0x000000
+                     creation_time = UTC 2014/12/12, 18:41:19
+                     modification_time = UTC 2014/12/12, 18:41:19
+                     timescale = 48000
+                     duration = 34560 (00:00:00.720)
+                     language = und
+                     pre_defined = 0x0000
+                 [hdlr: Handler Reference Box]
+                     position = 313
+                     size = 51
+                     version = 0
+                     flags = 0x000000
+                     pre_defined = 0x00000000
+                     handler_type = soun
+                     reserved = 0x00000000
+                     reserved = 0x00000000
+                     reserved = 0x00000000
+                     name = Xiph Audio Handler
+                 [minf: Media Information Box]
+                     position = 364
+                     size = 381
+                     [smhd: Sound Media Header Box]
+                         position = 372
+                         size = 16
+                         version = 0
+                         flags = 0x000000
+                         balance = 0.000000
+                         reserved = 0x0000
+                     [dinf: Data Information Box]
+                         position = 388
+                         size = 36
+                         [dref: Data Reference Box]
+                             position = 396
+                             size = 28
+                             version = 0
+                             flags = 0x000000
+                             entry_count = 1
+                             [url : Data Entry Url Box]
+                                 position = 412
+                                 size = 12
+                                 version = 0
+                                 flags = 0x000001
+                                 location = in the same file
+                     [stbl: Sample Table Box]
+                         position = 424
+                         size = 321
+                         [stsd: Sample Description Box]
+                             position = 432
+                             size = 79
+                             version = 0
+                             flags = 0x000000
+                             entry_count = 1
+                             [fLaC: Audio Description]
+                                 position = 448
+                                 size = 63
+                                 reserved = 0x000000000000
+                                 data_reference_index = 1
+                                 reserved = 0x0000
+                                 reserved = 0x0000
+                                 reserved = 0x00000000
+                                 channelcount = 2
+                                 samplesize = 16
+                                 pre_defined = 0
+                                 reserved = 0
+                                 samplerate = 48000.000000
+                                 [dfLa: FLAC Specific Box]
+                                     position = 484
+                                     size = 50
+                                     version = 0
+                                     flags = 0x000000
+                                     [FLACMetadataBlock]
+                                         LastMetadataBlockFlag = 1
+                                         BlockType = 0
+                                         Length = 34
+                                         BlockData[34];
+                         [stts: Decoding Time to Sample Box]
+                             position = 492
+                             size = 24
+                             version = 0
+                             flags = 0x000000
+                             entry_count = 1
+                             entry[0]
+                                 sample_count = 18
+                                 sample_delta = 1920
+                         [stsc: Sample To Chunk Box]
+                             position = 516
+                             size = 40
+                             version = 0
+                             flags = 0x000000
+                             entry_count = 2
+                             entry[0]
+                                 first_chunk = 1
+                                 samples_per_chunk = 13
+                                 sample_description_index = 1
+                             entry[1]
+                                 first_chunk = 2
+                                 samples_per_chunk = 5
+                                 sample_description_index = 1
+                         [stsz: Sample Size Box]
+                             position = 556
+                             size = 92
+                             version = 0
+                             flags = 0x000000
+                             sample_size = 0 (variable)
+                             sample_count = 18
+                             entry_size[0] = 977
+                             entry_size[1] = 938
+                             entry_size[2] = 939
+                             entry_size[3] = 938
+                             entry_size[4] = 934
+                             entry_size[5] = 945
+                             entry_size[6] = 948
+                             entry_size[7] = 956
+                             entry_size[8] = 955
+                             entry_size[9] = 930
+                             entry_size[10] = 933
+                             entry_size[11] = 934
+                             entry_size[12] = 972
+                             entry_size[13] = 977
+                             entry_size[14] = 958
+                             entry_size[15] = 949
+                             entry_size[16] = 962
+                             entry_size[17] = 848
+                         [stco: Chunk Offset Box]
+                             position = 648
+                             size = 24
+                             version = 0
+                             flags = 0x000000
+                             entry_count = 2
+                             chunk_offset[0] = 686
+                             chunk_offset[1] = 12985
+     [free: Free Space Box]
+         position = 672
+         size = 8
+     [mdat: Media Data Box]
+         position = 680
+         size = 17001
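+
+ The position and size columns of a listing like the one above
+ can be reproduced with a short box walker. The Python sketch
+ below assumes the generic box header from [1]: a 32-bit size
+ and a four-character type, with size 1 meaning a 64-bit
+ largesize follows and size 0 meaning the box runs to the end
+ of the enclosing data. The function name and the set of
+ container types are illustrative choices. The walker does not
+ descend into 'stsd', whose entries are preceded by additional
+ fields.
+
```python
import struct

# Plain container boxes from Figure 1 whose payload is a list of boxes.
CONTAINERS = {b"moov", b"trak", b"edts", b"mdia",
              b"minf", b"dinf", b"stbl", b"mvex"}


def walk_boxes(data: bytes, start: int = 0, end: int = None, depth: int = 0):
    """Yield (depth, type, position, size) for each box in data[start:end]."""
    if end is None:
        end = len(data)
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # 64-bit largesize follows the type field.
            if pos + 16 > end:
                raise ValueError("truncated largesize at offset %d" % pos)
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:
            # Box extends to the end of the enclosing data.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError("bad box size at offset %d" % pos)
        yield depth, box_type.decode("latin-1"), pos, size
        if box_type in CONTAINERS:
            yield from walk_boxes(data, pos + header, pos + size, depth + 1)
        pos += size
```
+
+ Printing each tuple with an indent proportional to its depth
+ gives an outline similar to the listing, without the decoded
+ field values.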
+
+4 Acknowledgements
+
+ This spec draws heavily from the Opus-in-ISOBMFF specification
+ work done by Yusuke Nakamura <muken.the.vfrmaniac |at| gmail.com>.
+
+ Thank you to Tim Terriberry, David Evans, and Yusuke Nakamura
+ for valuable feedback. Thank you to Ralph Giles for editorial
+ help.
+
+5 Author's Address
+
+ Monty Montgomery <cmontgomery@mozilla.com>