---
title: OPUS-A2DP-0.5 specification
author: Pauli Virtanen <pav@iki.fi>
date: Jun 4, 2022
---
# OPUS-A2DP-0.5 specification
This document specifies a way to use Opus as an A2DP vendor codec,
referred to here as "OPUS-A2DP-0.5". To my knowledge, no previous
public specification for using Opus as an A2DP vendor codec exists,
which is why this one is needed.

[[_TOC_]]
# Media Codec Capabilities
The Media Codec Specific Information Elements ([AVDTP v1.3], §8.21.5)
capability and configuration structure is as follows:
| Octet | Bits | Meaning |
|-------|------|-----------------------------------------------|
| 0-5 | 0-7 | Vendor ID Part |
| 6-7 | 0-7 | Channel Configuration |
| 8-11 | 0-7 | Audio Location Configuration |
| 12-14 | 0-7 | Limits Configuration |
| 15-16 | 0-7 | Return Direction Channel Configuration |
| 17-20 | 0-7 | Return Direction Audio Location Configuration |
| 21-23 | 0-7 | Return Direction Limits Configuration |

All integer fields and multi-byte bitfields are laid out in **little
endian** order. All integer fields are unsigned.

A field may have a different meaning in a capability than in a
configuration. Below, CAP marks the meaning in a capability and SEL
the meaning of the value selected by SRC.

Bits in fields marked RFA (Reserved For Additions) shall be set to
zero.

> **Note**
>
> See `a2dp-codec-caps.h` for definition as C structs.
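
As an illustration of the layout above, the following Python sketch unpacks the 24-octet element with the standard `struct` module. The names `Direction` and `parse_caps` are hypothetical and not part of this specification. The offsets follow the octet table at the start of this section, and the C structs in `a2dp-codec-caps.h` remain the authoritative definition.

```python
import struct
from dataclasses import dataclass

VENDOR_ID = 0x05F1   # A2DP Vendor ID, 4 octets on the wire
CODEC_ID = 0x1005    # A2DP Vendor Codec ID, 2 octets on the wire


@dataclass
class Direction:
    """Fields of one direction (hypothetical helper type)."""
    channels: int          # Channel Count
    coupled_streams: int   # Coupled Stream Count
    location: int          # Audio Location bitfield
    frame_durations: int   # frame duration bitfield
    max_bitrate: int       # in units of 1024 bits per second


def parse_caps(data: bytes):
    """Return (forward, backward) directions parsed from a 24-octet element."""
    if len(data) != 24:
        raise ValueError("expected 24 octets")
    # "<" selects little endian without padding: one 4-octet and one 2-octet integer.
    vendor, codec = struct.unpack_from("<IH", data, 0)
    if (vendor, codec) != (VENDOR_ID, CODEC_ID):
        raise ValueError("not an OPUS-A2DP-0.5 element")
    # Each direction occupies 1 + 1 + 4 + 1 + 2 = 9 octets.
    forward = Direction(*struct.unpack_from("<BBIBH", data, 6))
    backward = Direction(*struct.unpack_from("<BBIBH", data, 15))
    return forward, backward


example = bytes.fromhex(
    "f1050000" "0510"                 # Vendor ID, Vendor Codec ID
    "02" "01" "03000000"              # 2 channels, 1 coupled stream, FL | FR
    "0c" "0002"                       # 10 ms and 20 ms frames, 512 * 1024 bit/s
    "00" "00" "00000000" "00" "0000"  # no return direction
)
forward, backward = parse_caps(example)
print(forward)
print(backward.channels)
```

The example element is made up for illustration: a stereo forward direction with one coupled stream and no return direction.
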
## Vendor ID Part
The Vendor ID Part has the fixed value:

| Octet | Bits | Meaning |
|-------|------|-------------------------------|
| 0-3 | 0-7 | A2DP Vendor ID (0x05F1) |
| 4-5 | 0-7 | A2DP Vendor Codec ID (0x1005) |
> **Note**
>
> The Vendor ID is that of the Linux Foundation, and we are using it
> here unofficially.
## Channel Configuration
The channel configuration consists of the channel count, and the count
of coupled streams. The latter indicates which channels are encoded as
left/right pairs, as defined in Sec. 5.1.1 of Opus Ogg Encapsulation [RFC7845].
| Octet | Bits | Meaning |
|-------|------|------------------------------------------------------------|
| 6 | 0-7 | Channel Count. CAP: maximum number supported. SEL: actual. |
| 7 | 0-7 | Coupled Stream Count. CAP: 0. SEL: actual. |

The Channel Count indicates the number of logical channels encoded in
the data stream.

The Coupled Stream Count indicates the number of streams that encode a
coupled (left & right) channel pair. The count shall satisfy
`(Channel Count) >= 2*(Coupled Stream Count)`.

The Stream Count is `(Channel Count) - (Coupled Stream Count)`.

The logical channels are identified by a Channel Index *j* such that
`0 <= j < (Channel Count)`. The channels `0 <= j < 2*(Coupled Stream Count)`
are encoded in the *k*-th stream of the payload, where `k = floor(j/2)` and
`j mod 2` determines which of the two channels of the stream the logical
channel is. The channels `2*(Coupled Stream Count) <= j < (Channel Count)`
are encoded in the *k*-th stream of the payload, where `k = j - (Coupled Stream Count)`.

> **Note**
>
> The prescription here is identical to [RFC7845] with channel mapping
> `mapping[j] = j`. We do not want to include the mapping table in the
> A2DP capabilities, so it is assumed to be trivial.
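
The index arithmetic above can be sketched in a few lines of Python. The function name `stream_for_channel` is hypothetical. It returns the stream index *k* together with the position of the channel inside that stream (0 or 1 for coupled streams, always 0 for uncoupled ones).

```python
def stream_for_channel(j, channel_count, coupled_streams):
    """Map Channel Index j to (stream index k, channel within the stream)."""
    if channel_count < 2 * coupled_streams:
        raise ValueError("Channel Count must be >= 2 * Coupled Stream Count")
    if not 0 <= j < channel_count:
        raise ValueError("Channel Index out of range")
    if j < 2 * coupled_streams:
        # Coupled streams come first and carry two channels each.
        return j // 2, j % 2
    # The remaining channels each occupy their own uncoupled stream.
    return j - coupled_streams, 0


# 6 channels with 2 coupled streams use 6 - 2 = 4 streams in total.
for j in range(6):
    print(j, stream_for_channel(j, 6, 2))
```
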
## Audio Location Configuration
The semantic meaning of each channel is determined by the Audio
Location bitfield.

| Octet | Bits | Meaning |
|-------|------|------------------------------------------------------|
| 8-11 | 0-7 | Audio Location bitfield. CAP: available. SEL: actual |

The values specified in CAP are informative, and SEL may contain bits
that were not set in CAP. SNK shall handle unsupported audio
locations. It may do this, for example, by ignoring unsupported channels
or via suitable up/downmixing. Hence, SRC may transmit channels with
audio locations that are not marked supported by SNK.

The audio location bit values are:

| Channel Order | Bitmask | Audio Location |
|---------------|------------|-------------------------|
| 0 | 0x00000001 | Front Left |
| 1 | 0x00000002 | Front Right |
| 2 | 0x00000400 | Side Left |
| 3 | 0x00000800 | Side Right |
| 4 | 0x00000010 | Back Left |
| 5 | 0x00000020 | Back Right |
| 6 | 0x00000040 | Front Left of Center |
| 7 | 0x00000080 | Front Right of Center |
| 8 | 0x00001000 | Top Front Left |
| 9 | 0x00002000 | Top Front Right |
| 10 | 0x00040000 | Top Side Left |
| 11 | 0x00080000 | Top Side Right |
| 12 | 0x00010000 | Top Back Left |
| 13 | 0x00020000 | Top Back Right |
| 14 | 0x00400000 | Bottom Front Left |
| 15 | 0x00800000 | Bottom Front Right |
| 16 | 0x01000000 | Front Left Wide |
| 17 | 0x02000000 | Front Right Wide |
| 18 | 0x04000000 | Left Surround |
| 19 | 0x08000000 | Right Surround |
| 20 | 0x00000004 | Front Center |
| 21 | 0x00000100 | Back Center |
| 22 | 0x00004000 | Top Front Center |
| 23 | 0x00008000 | Top Center |
| 24 | 0x00100000 | Top Back Center |
| 25 | 0x00200000 | Bottom Front Center |
| 26 | 0x00000008 | Low Frequency Effects 1 |
| 27 | 0x00000200 | Low Frequency Effects 2 |
| 28 | 0x10000000 | RFA |
| 29 | 0x20000000 | RFA |
| 30 | 0x40000000 | RFA |
| 31 | 0x80000000 | RFA |

Each bit value is associated with a Channel Order. The bits set in
the bitfield define audio locations for the streams present in the
payload. The set bit with the smallest Channel Order value defines the
audio location for the Channel Index *j=0*, the bit with the next
lowest Channel Order value defines the audio location for the Channel
Index *j=1*, and so forth.

When the Channel Count is larger than the number of bits set in the
Audio Location bitfield, the audio locations of the remaining channels
are unspecified. Implementations may handle them as appropriate for
their use case, considering them as AUX0–AUXN, or in the case of
Channel Count = 1, as the single mono audio channel.

When the Channel Count is smaller than the number of bits set in the
Audio Location bitfield, the audio locations for the channels are
assigned as above, and remaining excess bits shall be ignored.
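
The assignment rules above can be sketched as follows. The short location names and the function `channel_locations` are hypothetical abbreviations chosen for this example. The treatment of unassigned channels (AUX names, or MONO for a single channel) is one of the options the text allows, not a requirement.

```python
# (bitmask, short name) pairs listed in Channel Order 0..27; RFA bits are omitted.
CHANNEL_ORDER = [
    (0x00000001, "FL"), (0x00000002, "FR"), (0x00000400, "SL"), (0x00000800, "SR"),
    (0x00000010, "BL"), (0x00000020, "BR"), (0x00000040, "FLC"), (0x00000080, "FRC"),
    (0x00001000, "TFL"), (0x00002000, "TFR"), (0x00040000, "TSL"), (0x00080000, "TSR"),
    (0x00010000, "TBL"), (0x00020000, "TBR"), (0x00400000, "BFL"), (0x00800000, "BFR"),
    (0x01000000, "FLW"), (0x02000000, "FRW"), (0x04000000, "LS"), (0x08000000, "RS"),
    (0x00000004, "FC"), (0x00000100, "BC"), (0x00004000, "TFC"), (0x00008000, "TC"),
    (0x00100000, "TBC"), (0x00200000, "BFC"), (0x00000008, "LFE1"), (0x00000200, "LFE2"),
]


def channel_locations(location_bits, channel_count):
    """Return one location name per Channel Index j = 0 .. channel_count - 1."""
    # Walk the table in Channel Order and keep the locations whose bit is set.
    named = [name for mask, name in CHANNEL_ORDER if location_bits & mask]
    named = named[:channel_count]  # excess set bits are ignored
    missing = channel_count - len(named)
    if channel_count == 1 and missing == 1:
        return ["MONO"]  # single channel without a location: treat as mono
    return named + [f"AUX{i}" for i in range(missing)]


print(channel_locations(0x0000003F, 6))
print(channel_locations(0x00000003, 4))
```
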
> **Note**
>
> The channel audio location specification is similar to the location
> bitfield of the `Audio_Channel_Allocation` LTV structure in Bluetooth
> SIG [Assigned Numbers, Generic Audio] used in the LE Audio, and the
> bitmasks defined above are the same.
>
> The channel ordering differs from LE Audio, and is defined here to be
> compatible with the internal stream ordering in the reference Opus
> Multistream surround encoder Mapping Family 0 and 1 output. This
> allows making use of its surround masking and LFE handling
> capabilities. The stream ordering of the reference Opus surround
> encoder, although being unchanged since its addition in 2013, is an
> internal detail of the encoder. Implementations using the surround
> encoder need to check that the mapping table used by the encoder
> corresponds to the above channel ordering.
>
> For reference, we list the Audio Location bitfield values
> corresponding to the different channel counts in Opus Mapping Family 0
> and 1 surround encoder output, and the expected mapping table:
>
> | Mapping Family | Channel Count | Audio Location Value | Stream Ordering | Mapping Table |
> |----------------|---------------|----------------------|---------------------------------|--------------------------|
> | 0 | 1 | 0x00000000 | mono | {0} |
> | 0 | 2 | 0x00000003 | FL, FR | {0, 1} |
> | 1 | 1 | 0x00000000 | mono | {0} |
> | 1 | 2 | 0x00000003 | FL, FR | {0, 1} |
> | 1 | 3 | 0x00000007 | FL, FR, FC | {0, 2, 1} |
> | 1 | 4 | 0x00000033 | FL, FR, BL, BR | {0, 1, 2, 3} |
> | 1 | 5 | 0x00000037 | FL, FR, BL, BR, FC | {0, 4, 1, 2, 3} |
> | 1 | 6 | 0x0000003f | FL, FR, BL, BR, FC, LFE | {0, 4, 1, 2, 3, 5} |
> | 1 | 7 | 0x00000d0f | FL, FR, SL, SR, FC, BC, LFE | {0, 4, 1, 2, 3, 5, 6} |
> | 1 | 8 | 0x00000c3f | FL, FR, SL, SR, BL, BR, FC, LFE | {0, 6, 1, 2, 3, 4, 5, 7} |
>
> The Mapping Table column lists the mapping table selected by
> `opus_multistream_surround_encoder_create` (Opus 1.3.1). If the
> encoder outputs a different mapping table in a future Opus encoder
> release, the channel ordering will be incorrect, and the surround
> encoder cannot be used. We expect that the probability of the Opus
> encoder authors making such changes is negligible.
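
One way to perform the check described in the note is to invert the mapping table and compare the resulting stream ordering with the Channel Order. The sketch below does not call the Opus library: it takes the mapping table as a plain list and assumes the output channel order of [RFC7845] Sec. 5.1.1.2 (Vorbis order) for 2 to 8 channels. The names `RANK`, `VORBIS_ORDER`, and `mapping_matches` are hypothetical.

```python
# Channel Order values from the Audio Location table, restricted to the
# locations that occur in Mapping Family 1 layouts.
RANK = {"FL": 0, "FR": 1, "SL": 2, "SR": 3, "BL": 4, "BR": 5,
        "FC": 20, "BC": 21, "LFE": 26}

# Output channel order per channel count (assumed from RFC 7845 Sec. 5.1.1.2).
VORBIS_ORDER = {
    2: ["FL", "FR"],
    3: ["FL", "FC", "FR"],
    4: ["FL", "FR", "BL", "BR"],
    5: ["FL", "FC", "FR", "BL", "BR"],
    6: ["FL", "FC", "FR", "BL", "BR", "LFE"],
    7: ["FL", "FC", "FR", "SL", "SR", "BC", "LFE"],
    8: ["FL", "FC", "FR", "SL", "SR", "BL", "BR", "LFE"],
}


def mapping_matches(mapping):
    """Return True if the mapping table yields the Channel Order stream ordering."""
    names = VORBIS_ORDER[len(mapping)]
    # mapping[i] is the index of the decoded channel that feeds output
    # channel i, so inverting it gives the ordering of the decoded channels.
    stream_order = [None] * len(mapping)
    for out_index, decoded_index in enumerate(mapping):
        stream_order[decoded_index] = names[out_index]
    return stream_order == sorted(names, key=RANK.__getitem__)


print(mapping_matches([0, 4, 1, 2, 3, 5]))  # table listed above for 6 channels
print(mapping_matches([0, 1, 2, 3, 4, 5]))  # identity table for comparison
```
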
## Limits Configuration
The limits for allowed frame durations and maximum bitrate can also be
configured.

| Octet | Bits | Meaning                                              |
|-------|------|------------------------------------------------------|
| 12    | 0    | Frame duration 2.5 ms. CAP: supported, SEL: selected |
| 12    | 1    | Frame duration 5 ms. CAP: supported, SEL: selected   |
| 12    | 2    | Frame duration 10 ms. CAP: supported, SEL: selected  |
| 12    | 3    | Frame duration 20 ms. CAP: supported, SEL: selected  |
| 12    | 4    | Frame duration 40 ms. CAP: supported, SEL: selected  |
| 12    | 5-7  | RFA                                                  |

| Octet | Bits | Meaning                                        |
|-------|------|------------------------------------------------|
| 13-14 | 0-7  | Maximum bitrate. CAP: supported, SEL: selected |

The maximum bitrate is given in units of 1024 bits per second.
The maximum bitrate field in CAP may contain the value 0 to indicate
that any bitrate is supported.
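
A minimal sketch of decoding the two limits fields, assuming bit 0 is the least significant bit of the octet. The function name `decode_limits` is hypothetical, and returning `None` for a zero bitrate field is just one way to represent "no limit".

```python
FRAME_DURATIONS_MS = [2.5, 5, 10, 20, 40]  # bits 0..4 of the frame duration octet


def decode_limits(frame_duration_bits, max_bitrate_field):
    """Return (list of frame durations in ms, maximum bitrate in bit/s or None)."""
    durations = [ms for bit, ms in enumerate(FRAME_DURATIONS_MS)
                 if frame_duration_bits & (1 << bit)]
    # The field counts units of 1024 bit/s; 0 in CAP means any bitrate is supported.
    max_bps = max_bitrate_field * 1024 if max_bitrate_field else None
    return durations, max_bps


print(decode_limits(0x0C, 512))
print(decode_limits(0x1F, 0))
```
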
## Bidirectional Audio Configuration
Bidirectional audio may be supported. The Channel Configuration, Audio
Location Configuration, and Limits Configuration of the return
direction have the same form as those of the forward direction, and
are represented by identical structures.

Namely:

| Octet | Bits | Meaning                                            |
|-------|------|----------------------------------------------------|
| 15-16 | 0-7  | Channel Configuration fields, for return direction |
| 17-20 | 0-7  | Audio Location fields, for return direction        |
| 21-23 | 0-7  | Limits Configuration fields, for return direction  |

If no return channel is supported or selected, the return direction
Channel Count is set to 0 in CAP or SEL, respectively.

> **Note**
>
> This is a nonstandard extension to A2DP. The return direction audio
> data is simply sent back via the underlying L2CAP connection, which
> is bidirectional, in the same format as the forward direction audio.
> This is similar to what aptX-LL and FastStream do.
# Packet Structure
Each packet consists of an RTP header, an RTP payload header, and a
payload containing Opus Multistream data.
| Octet | Bits | Meaning |
|-------|------|--------------------------|
| 0-11 | 0-7 | RTP header |
| 12 | 0-7 | RTP payload header |
| 13-N | 0-7 | Opus Multistream payload |

For each Bluetooth packet, the payload shall contain exactly one Opus
Multistream packet, or a fragment of one. The Opus Multistream packet
may be fragmented across several consecutive Bluetooth packets.

The format of the Multistream data is the same as in the audio packets
of [RFC7845], that is, as produced and consumed by the Opus Multistream API.

> **Note**
>
> We DO NOT follow [RFC7587], as we want fragmentation and multichannel support.
## RTP Header
See [RFC3550].
The RTP payload type is pt=96 (dynamic).
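
For orientation, a minimal sketch of packing the 12-octet fixed RTP header of [RFC3550] with payload type 96. Unlike the codec capability fields, RTP header fields are in network (big endian) byte order. The function name `pack_rtp_header` and the choice of no padding, no extension, no CSRC entries, and a cleared marker bit are assumptions of this example.

```python
import struct


def pack_rtp_header(seq, timestamp, ssrc=0, payload_type=96):
    """Pack a fixed RTP header: version 2, no padding/extension/CSRC, marker 0."""
    byte0 = 2 << 6                 # V=2 in the top two bits, P=0, X=0, CC=0
    byte1 = payload_type & 0x7F    # marker bit (top bit) left cleared
    return struct.pack(">BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


header = pack_rtp_header(seq=1, timestamp=480)
print(header.hex())
```
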
## RTP Payload Header
The RTP payload header is used to indicate if and how the Opus
Multistream packet is fragmented across several consecutive Bluetooth
packets.

| Octet | Bits | Meaning           |
|-------|------|-------------------|
| 0     | 0-3  | Frame Count       |
| 0     | 4    | RFA               |
| 0     | 5    | Is Last Fragment  |
| 0     | 6    | Is First Fragment |
| 0     | 7    | Is Fragmented     |


In each packet, Frame Count indicates how many Bluetooth packets are
still to be received (including the present packet) before the Opus
Multistream packet is complete.

The Is Fragmented flag indicates whether the present packet contains
fragmented payload.

The Is Last Fragment flag indicates whether the present packet is the
last part of fragmented payload.

The Is First Fragment flag indicates whether the present packet is the
first part of fragmented payload.

In non-fragmented packets, Frame Count shall be 1, and the other bits
in the header shall be zero.
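
The header octet can be packed and unpacked as sketched below, assuming bit 0 is the least significant bit of the octet. The function names are hypothetical.

```python
def pack_payload_header(frame_count, fragmented=False, first=False, last=False):
    """Build the single RTP payload header octet."""
    if not 0 <= frame_count <= 15:
        raise ValueError("Frame Count must fit in 4 bits")
    return (frame_count
            | (int(last) << 5)         # Is Last Fragment
            | (int(first) << 6)        # Is First Fragment
            | (int(fragmented) << 7))  # Is Fragmented


def unpack_payload_header(octet):
    """Return (frame_count, fragmented, first, last) from the header octet."""
    return (octet & 0x0F,
            bool(octet & 0x80),
            bool(octet & 0x40),
            bool(octet & 0x20))


print(hex(pack_payload_header(1)))
print(unpack_payload_header(0xC3))
```
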
## Opus Payload
The Opus payload is a single Opus Multistream packet, or its fragment.
In case of fragmentation, as indicated by the RTP payload header,
concatenating the payloads of the fragment Bluetooth packets shall
yield the total Opus Multistream packet.
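
Fragmentation and reassembly can be sketched as follows, with each fragment represented as a (header octet, payload) pair. The helper names `fragment` and `reassemble` and the `max_payload` parameter are hypothetical, and a real receiver would also need to handle packet loss across Bluetooth packets.

```python
def fragment(opus_packet, max_payload):
    """Split one Opus Multistream packet into (header octet, payload) pairs."""
    chunks = [opus_packet[i:i + max_payload]
              for i in range(0, len(opus_packet), max_payload)]
    if len(chunks) <= 1:
        return [(0x01, opus_packet)]  # not fragmented: Frame Count 1, flags zero
    if len(chunks) > 15:
        raise ValueError("Frame Count is only 4 bits wide")
    result = []
    for i, chunk in enumerate(chunks):
        header = (len(chunks) - i) | 0x80  # packets remaining, Is Fragmented
        if i == 0:
            header |= 0x40                 # Is First Fragment
        if i == len(chunks) - 1:
            header |= 0x20                 # Is Last Fragment
        result.append((header, chunk))
    return result


def reassemble(fragments):
    """Concatenate fragment payloads after checking the Frame Count countdown."""
    expected = len(fragments)
    data = bytearray()
    for header, payload in fragments:
        if header & 0x0F != expected:
            raise ValueError("fragment missing or out of order")
        data += payload
        expected -= 1
    return bytes(data)


parts = fragment(bytes(range(10)), max_payload=4)
print([hex(h) for h, _ in parts])
print(reassemble(parts) == bytes(range(10)))
```
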

The SRC should choose encoder parameters such that Bluetooth bandwidth
limitations are not exceeded.

The SRC may include FEC (forward error correction) data. The SNK may
use it to recover lost packets instead of relying on PLC (packet loss
concealment).
# References
1. Bluetooth [AVDTP v1.3]
2. IETF [RFC3550]
3. IETF [RFC7587]
4. IETF [RFC7845]
5. Bluetooth [Assigned Numbers, Generic Audio]

[AVDTP v1.3]: https://www.bluetooth.com/specifications/specs/a-v-distribution-transport-protocol-1-3/
[RFC3550]: https://datatracker.ietf.org/doc/html/rfc3550
[RFC7587]: https://datatracker.ietf.org/doc/html/rfc7587
[RFC7845]: https://datatracker.ietf.org/doc/html/rfc7845
[Assigned Numbers, Generic Audio]: https://www.bluetooth.com/specifications/assigned-numbers/