summaryrefslogtreecommitdiffstats
path: root/third_party/libwebrtc/modules/audio_processing/test/conversational_speech/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'third_party/libwebrtc/modules/audio_processing/test/conversational_speech/README.md')
-rw-r--r--third_party/libwebrtc/modules/audio_processing/test/conversational_speech/README.md74
1 files changed, 74 insertions, 0 deletions
diff --git a/third_party/libwebrtc/modules/audio_processing/test/conversational_speech/README.md b/third_party/libwebrtc/modules/audio_processing/test/conversational_speech/README.md
new file mode 100644
index 0000000000..0fa66669e6
--- /dev/null
+++ b/third_party/libwebrtc/modules/audio_processing/test/conversational_speech/README.md
@@ -0,0 +1,74 @@
+# Conversational Speech generator tool
+
+Tool to generate multiple-end audio tracks to simulate conversational speech
+with two or more participants.
+
+The input to the tool is a directory containing a number of audio tracks and
+a text file indicating how to time the sequence of speech turns (see the Example
+section).
+
+Since the timing of the speaking turns is specified by the user, the generated
+tracks may not be suitable for testing scenarios in which there is unpredictable
+network delay (e.g., end-to-end RTC assessment).
+
+Instead, the generated pairs can be used when the delay is constant (obviously
+including the case in which there is no delay).
+For instance, echo cancellation in the APM module can be evaluated using two-end
+audio tracks as input and reverse input.
+
+By indicating negative and positive time offsets, one can reproduce cross-talk
+(aka double-talk) and silence in the conversation.
+
+### Example
+
+For each end, there is a set of audio tracks, e.g., a1, a2 and a3 (speaker A)
+and b1, b2 (speaker B).
+The text file with the timing information may look like this:
+
+```
+A a1 0
+B b1 0
+A a2 100
+B b2 -200
+A a3 0
+A a4 0
+```
+
+The first column indicates the speaker name, the second contains the audio track
+file names, and the third the offsets (in milliseconds) used to concatenate the
+chunks. An optional fourth column contains positive or negative integral gains
+in dB that will be applied to the tracks. It's possible to specify the gain for
+some turns but not for others. If the gain is left out, no gain is applied.
+
+Assume that all the audio tracks in the example above are 1000 ms long.
+The tool will then generate two tracks (A and B) that look like this:
+
+**Track A**
+```
+ a1 (1000 ms)
+ silence (1100 ms)
+ a2 (1000 ms)
+ silence (800 ms)
+ a3 (1000 ms)
+ a4 (1000 ms)
+```
+
+**Track B**
+```
+ silence (1000 ms)
+ b1 (1000 ms)
+ silence (900 ms)
+ b2 (1000 ms)
+ silence (2000 ms)
+```
+
+The two tracks can be also visualized as follows (one characheter represents
+100 ms, "." is silence and "*" is speech).
+
+```
+t: 0 1 2 3 4 5 6 (s)
+A: **********...........**********........********************
+B: ..........**********.........**********....................
+ ^ 200 ms cross-talk
+ 100 ms silence ^
+```