# Debugging AudioInput

Mumble does quite a bit of signal processing on the raw microphone input, so if something breaks it may not be immediately apparent _where_ it breaks.

For this reason, the `--dump-input-streams` option was added to help tap into various parts of the DSP chain and find where the issue is. Consider
it the digital equivalent of probing the signal path of analog audio gear with an oscilloscope.

As the option was introduced to debug the echo canceller, the default tap points are at the input and output of that algorithm. But if you are going
to debug C++ code, you should have no trouble moving a `write()` to an `ofstream` here and there if you need to, right?

## How to use `--dump-input-streams`

You'll need to run Mumble from the command line. The dumped files will be written to the directory you run it from.

```
$ ./mumble --dump-input-streams
```

Then log into a server as usual and start using Mumble. It's usually good enough to run it for 10 to 20 seconds and then quit. Unless your bug
happens only after some time or occurs at random, there's no need to accumulate gigabytes of dumped audio. It's also best to make reproducible tests,
such as playing the same video or speaking the same phrase, so that you can compare results.

After closing Mumble, there should be 3 new files in the directory you launched it from:

* `raw_microphone_dump`
* `speaker_dump`
* `processed_microphone_dump`

Please note that if you run Mumble again, those files will be overwritten. They are also overwritten whenever the `AudioInput` class is
reinstantiated, such as when going through the audio wizard. If you find it difficult to get the data you want, for example because closing the audio
wizard clears your files, terminate Mumble with Ctrl-C at any moment and the files won't be erased.

### Opening the files

These files contain the raw PCM streams that have been sampled: no header, no file format, nothing but data.
This keeps the dumping code as simple as possible, and since there is no header, you don't have to change one every time you tap a point with a
different sample rate or encoding.

To open the raw files, you can use Audacity. Select `File > Import > Raw Data`.

Since there's no metadata, Audacity will ask you what's in those files:

* Encoding is `Signed 16 bit PCM` for the default tap points (i.e. if you haven't moved the `write()` calls). Mumble's signal path is partly 16 bit
  and partly float, so remember to select `32 bit float` if you move the tap points to a float part of the Mumble audio path.
* Byte order is `Little-endian` if you're on an x86 CPU, which you most likely are.
* Channels is always `1` for the microphone signal path, but may be more for the speaker readback if you use multichannel echo cancellation.
* Sample rate is `48000` for the default tap points, as Mumble's audio chain resamples everything to 48 kHz regardless of what your audio card is
  configured to. Change it accordingly when tapping before the resampler.

In Audacity you can open multiple tracks and mute them individually, so it's usually a good idea to open all three files to compare them.

## Debugging the echo canceller

The audio dumps have an additional property that is fundamental for debugging the echo canceller: they're synchronous. If you open them all in
Audacity, you'll be able to see not only what gets passed to the echo canceller, but also the relative timing between the signals.
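
Assuming, as stated above, that the dumps are synchronous, sample _n_ of `speaker_dump` and sample _n_ of `raw_microphone_dump` describe the same
instant, so the relative timing can also be estimated numerically instead of by eye. The following is a minimal sketch, not part of Mumble: the
function names are hypothetical, and it assumes the default tap points (mono, signed 16-bit little-endian PCM at 48000 Hz), i.e. a mono speaker
dump as produced with mixed echo cancellation. It searches for the lag that maximizes the cross-correlation between the two streams.

```cpp
// Hypothetical helpers, not part of Mumble: estimate the delay between the
// speaker dump and the microphone dump by cross-correlation.
// Assumes mono, signed 16-bit little-endian PCM (the default tap points).
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <limits>
#include <vector>

// Reads a headerless dump such as speaker_dump or raw_microphone_dump.
std::vector<int16_t> loadRawS16LE(const char *path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());
    std::vector<int16_t> samples(bytes.size() / 2);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        // Least significant byte first, independent of the host byte order.
        const auto u = static_cast<uint16_t>(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        samples[i] = static_cast<int16_t>(u);
    }
    return samples;
}

// Returns the lag in samples, within [-maxLag, maxLag], that maximizes
// sum(speaker[i] * mic[i + lag]). A positive lag means the echo shows up in
// the microphone stream after the speaker data that caused it.
long bestLag(const std::vector<int16_t> &speaker, const std::vector<int16_t> &mic, long maxLag) {
    assert(maxLag >= 0);
    const long n = static_cast<long>(speaker.size());
    const long m = static_cast<long>(mic.size());
    long best = 0;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (long lag = -maxLag; lag <= maxLag; ++lag) {
        double score = 0.0;
        for (long i = 0; i < n; ++i) {
            const long j = i + lag;
            if (j >= 0 && j < m)
                score += static_cast<double>(speaker[i]) * mic[j];
        }
        if (score > bestScore) {
            bestScore = score;
            best = lag;
        }
    }
    return best;
}

// Converts a lag in samples to milliseconds (48000 Hz by default).
double samplesToMs(long samples, double sampleRate = 48000.0) {
    return 1000.0 * static_cast<double>(samples) / sampleRate;
}
```

For example, `samplesToMs(bestLag(loadRawS16LE("speaker_dump"), loadRawS16LE("raw_microphone_dump"), 4800))` searches a window of ±100 ms. The
brute-force search costs roughly (2 × maxLag + 1) × length multiply-adds, so compile with optimizations or trim the dumps to a few seconds. A positive
result means the speaker data precedes its echo in the microphone stream; a negative one means the echo appears in the microphone before the speaker
data that caused it.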

Relative timing matters because an echo canceller can break simply because the microphone data arrives before the speaker data (how could the echo
canceller predict an echo from the future?), or because the speaker data is so far ahead that the delay exceeds the canceller's limited filter length.

### The `--print-echocancel-queue` option

Since the echo canceller requires well-aligned inputs, this is a good place to introduce the `--print-echocancel-queue` option. When Mumble is run
with this option, it prints on the command line the current state of the queue that the Resynchronizer class uses to align the microphone and
speaker readback streams. Dropped packets are printed as well; dropping packets is necessary to keep the signals aligned when the OS, PulseAudio
or the audio card play tricks on us.

### The Resynchronizer class

The Resynchronizer class is documented in a comment in the `AudioInput.h` file, but it doesn't hurt to repeat that documentation here, also because
the state machine design is an image and doesn't fit in a C++ comment.

According to https://www.speex.org/docs/manual/speex-manual/node7.html,
"It is important that, at any time, any echo that is present in the input
has already been sent to the echo canceller as echo_frame."
Thus, we artificially introduce a small lag in the microphone signal by means of
a queue, so as to be sure the speaker data always precedes the microphone data.

There are conflicting requirements for the queue:

* it has to be small enough not to cause a noticeable lag in the voice
* it has to be large enough not to force us to drop packets frequently
  when the `addMic()` and `addEcho()` callbacks are called in a jittery way
* its fill level must be controlled so that it does not operate near zero
  elements, as that would not provide the lag required for the
  echo canceller to work properly.
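
The requirements above can be illustrated with a deliberately simplified sketch. It is not Mumble's Resynchronizer, whose actual control logic is
the state machine described below; the `DelayQueue` class, the `Chunk` type, the `pushMic()`/`pushSpeaker()` names and the drop policy are all
hypothetical. Microphone chunks wait in a bounded queue and are only paired with a speaker chunk when enough of them are queued, so the microphone
signal reaches the echo canceller with a small, bounded lag. It needs C++17 for `std::optional`.

```cpp
// Simplified illustration, NOT Mumble's Resynchronizer: the class, the method
// names and the drop policy are hypothetical.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <utility>
#include <vector>

// One chunk of mono audio, e.g. 10 ms = 480 samples at 48 kHz.
using Chunk = std::vector<int16_t>;

class DelayQueue {
public:
    static constexpr std::size_t kCapacity = 5; // upper bound on queued microphone chunks
    static constexpr std::size_t kMinFill = 2;  // chunks kept queued as deliberate lag

    // Microphone chunk received. If the queue is full, the oldest chunk is
    // dropped so that the lag cannot grow without bound.
    void pushMic(Chunk mic) {
        if (m_queue.size() == kCapacity) {
            m_queue.pop_front();
            ++m_droppedMic;
        }
        m_queue.push_back(std::move(mic));
        assert(m_queue.size() <= kCapacity);
    }

    // Speaker chunk received. Returns a (microphone, speaker) pair for the echo
    // canceller, or nothing if popping would leave fewer than kMinFill chunks
    // queued; in that case the speaker chunk is dropped.
    std::optional<std::pair<Chunk, Chunk>> pushSpeaker(Chunk speaker) {
        if (m_queue.size() <= kMinFill) {
            ++m_droppedSpeaker;
            return std::nullopt;
        }
        Chunk mic = std::move(m_queue.front());
        m_queue.pop_front();
        return std::make_pair(std::move(mic), std::move(speaker));
    }

    std::size_t fill() const { return m_queue.size(); }
    std::size_t droppedMic() const { return m_droppedMic; }
    std::size_t droppedSpeaker() const { return m_droppedSpeaker; }

private:
    std::deque<Chunk> m_queue;
    std::size_t m_droppedMic = 0;
    std::size_t m_droppedSpeaker = 0;
};
```

With chunks arriving strictly alternately (m, s, m, s, …), this sketch drops the first two speaker chunks while the lag builds up, then settles with
the queue alternating between 3 and 2 elements. Real callbacks are jittery, which is what the state machine described next is designed to handle.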

The current implementation uses a 5-element queue, with a control
state machine that drops packets to keep the fill level
at least 2 (plus or minus one) and below 4 elements.
With 10 ms chunks, this queue should introduce a ~20 ms lag to the voice.

![](../media/images/AudioInputDebugging_Fsm.png)

Here _m_ means a microphone chunk was received, _s_ means a speaker chunk was received, and the number in each state is the queue fill level. The
design tries to keep the queue's add/remove pattern in a limit cycle between 1 and 4 elements, preventing it from settling into a limit cycle between
0 and 1 elements (queue too empty: the speaker data risks arriving after the microphone data) or between 4 and 5 elements (too full: we're wasting
precious filter length that could cancel real echo, just because some delay accumulated).

### A reproducible test for verifying the correct operation of the echo canceller

To avoid introducing regressions in the echo cancellation feature, it helps to have a controlled, easily reproducible test that checks whether the
echo canceller works.

You will need:

* Low-quality headphones that cause echo. The in-ear type used with smartphones, with a combined microphone/headphone jack, works best. If
  you don't have them or your PC lacks a combined microphone/headphone jack, do the test with your speakers, but keep the volume relatively low.
  Some echo is unavoidable at high volume levels, especially if it makes the microphone clip.
* A quiet Mumble server to connect to. Just join an empty room with no other users.

Here's the step-by-step guide:

1. Make sure Mumble echo cancellation is enabled. You may want to run this test twice, once with mixed and once with multichannel echo cancellation.
2. Run Mumble with the `--dump-input-streams` option.
3. Join the quiet server.
4. Play the first 15 or so seconds of a YouTube video that contains both a relatively periodic note and voice, such as this one:
   https://www.youtube.com/watch?v=im9z8NT96Iw
5. Say a phrase, such as "Testing 1 2 3".
6. Close Mumble.
7. Open the three dumped streams in Audacity. Don't forget to select the correct number of channels for the `speaker_dump` when testing multichannel
   echo cancellation.
8. Play the raw microphone stream. You should hear the echo of the YouTube video clearly above the noise, and it should be quieter than your voice
   saying "Testing 1 2 3". If you hear no echo, increase your headphone volume, switch to worse headphones or use your speakers, and repeat. If the
   echo is as loud as or louder than your voice, reduce your audio volume and repeat.
9. Now listen to the processed microphone stream: the echo, both the note and the voice coming from YouTube, should be almost gone, while your voice
   should remain. It is acceptable for the first part of the echo to reappear after a silence gap, but it should quickly be cancelled. If not,
   there's a bug in Mumble.
10. Play the speaker dump. It should sound just like the YouTube video itself. If not, there's a bug in Mumble.
11. As a final check, take a transition from silence to noise as a reference and zoom in on it in Audacity: the speaker dump should precede the
    microphone dump by 20 ms or so (0 to 50 ms is acceptable). If not, there's a bug in Mumble.

Example of an echo canceller bug: the speaker data lags behind the microphone data. As a result, only the note is cancelled, but the voice is not.

![](../media/images/AudioInputDebugging_Bug.png)

Example of a working echo canceller:

![](../media/images/AudioInputDebugging_Fix.png)
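
The listening checks on the raw and processed microphone streams can be complemented with a number. A common figure of merit is the echo return loss
enhancement (ERLE): the ratio, in dB, between the power of the microphone signal before and after echo cancellation, measured over a stretch where
only echo is present. The sketch below is a hypothetical helper, not part of Mumble. It assumes both streams are mono 16-bit at 48000 Hz (the default
tap points) and already loaded into `std::vector<int16_t>`, and it leaves choosing the time window to you. If other processing sits between the two
tap points, it will affect the figure too.

```cpp
// Hypothetical helpers, not part of Mumble: a rough echo return loss
// enhancement (ERLE) figure from the raw and processed microphone dumps.
// Assumes mono 16-bit samples at 48000 Hz, already loaded into vectors.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mean power of samples [begin, end) of a mono stream.
double meanPower(const std::vector<int16_t> &s, std::size_t begin, std::size_t end) {
    assert(begin < end && end <= s.size());
    double acc = 0.0;
    for (std::size_t i = begin; i < end; ++i)
        acc += static_cast<double>(s[i]) * s[i];
    return acc / static_cast<double>(end - begin);
}

// ERLE in dB over [startSec, endSec): how much weaker the processed microphone
// stream is than the raw one. Only meaningful for a stretch where the
// microphone picks up echo but nobody is speaking.
double erleDb(const std::vector<int16_t> &raw, const std::vector<int16_t> &processed,
              double startSec, double endSec, double sampleRate = 48000.0) {
    const auto begin = static_cast<std::size_t>(startSec * sampleRate);
    const auto end = static_cast<std::size_t>(endSec * sampleRate);
    const double kEpsilon = 1e-9; // avoids dividing by zero on digital silence
    return 10.0 * std::log10((meanPower(raw, begin, end) + kEpsilon) /
                             (meanPower(processed, begin, end) + kEpsilon));
}
```

Pick the window from the part of the recording where the YouTube video plays and you are not speaking: your own voice is supposed to survive
cancellation, so including it would pull the figure down. Comparing the value before and after a code change gives a rough regression check
alongside listening.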