Update README.md

This commit is contained in:
Alexander Veysov
2020-12-24 13:58:41 +03:00
committed by GitHub
parent b1b2c2d4f8
commit 6d3f7f282b


@@ -203,12 +203,12 @@ Since our VAD (only VAD, other networks are more flexible) was trained on chunks
### How VAD Works
- Audio is split into 250 ms chunks;
+ Audio is split into 250 ms chunks (you can choose any chunk size, but quality with chunks shorter than 100 ms will suffer: there will be more false positives and "unnatural" pauses);
- VAD keeps a record of the previous chunk (or zeros at the beginning of the stream);
- Then this 500 ms audio (250 ms + 250 ms) is split into N (typically 4 or 8) windows and the model is applied to this batch of windows. Each window is 250 ms long, so the windows naturally overlap;
- The speech probability is then averaged across these windows;
- Though pauses in speech are typically 300 ms or longer (pauses shorter than 200 - 300 ms are usually not meaningful), it is hard to confidently classify speech vs noise / music on very short chunks (i.e. 30 - 50 ms);
- We are working on lifting this limitation, so that you can use 100 - 125 ms windows;
+ ~~We are working on lifting this limitation, so that you can use 100 - 125 ms windows~~;
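
The chunking, previous-chunk buffering, and overlapping-window averaging described above can be sketched as follows. This is an illustrative sketch only, not the actual silero-vad implementation: `speech_prob` is a hypothetical stand-in for the neural VAD model, and the 16 kHz sample rate and energy threshold are assumptions.

```python
import numpy as np

SAMPLE_RATE = 16000                       # assumed sample rate
CHUNK = SAMPLE_RATE * 250 // 1000        # 250 ms chunk = 4000 samples

def speech_prob(window: np.ndarray) -> float:
    """Hypothetical stand-in for the VAD model: returns a speech
    probability for one 250 ms window. The real model is a neural
    network; this naive energy threshold is illustrative only."""
    return float(np.mean(np.abs(window)) > 0.01)

def vad_step(prev_chunk: np.ndarray, chunk: np.ndarray,
             n_windows: int = 4) -> float:
    """Concatenate the previous and current 250 ms chunks into 500 ms
    of audio, split it into N overlapping 250 ms windows, apply the
    model to each window, and average the probabilities."""
    audio = np.concatenate([prev_chunk, chunk])            # 500 ms
    starts = np.linspace(0, len(audio) - CHUNK, n_windows).astype(int)
    probs = [speech_prob(audio[s:s + CHUNK]) for s in starts]
    return float(np.mean(probs))

# Stream processing: keep the previous chunk (zeros at stream start)
prev = np.zeros(CHUNK, dtype=np.float32)
for chunk in np.random.randn(10, CHUNK).astype(np.float32) * 0.1:
    p = vad_step(prev, chunk)
    prev = chunk
```

With `n_windows = 4` the window starts are spaced evenly across the 500 ms buffer, so adjacent windows share most of their samples, which is what smooths the per-chunk probability estimate.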
### VAD Quality Metrics Methodology