From c44d85e8b241295463e690991ba0752b48345468 Mon Sep 17 00:00:00 2001
From: Dimitrii Voronin <36505480+adamnsandle@users.noreply.github.com>
Date: Wed, 20 Jan 2021 15:18:10 +0200
Subject: [PATCH] Update README.md

---
 README.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index e9f3c08..d216ea8 100644
--- a/README.md
+++ b/README.md
@@ -24,8 +24,7 @@
 
 # Silero VAD
-
-![image](https://user-images.githubusercontent.com/36505480/102872739-ce099280-4448-11eb-967b-724440165eb5.png)
+![image](https://user-images.githubusercontent.com/36505480/105179755-5eafbd00-5b32-11eb-963d-1eb7461144fb.png)
 
 **Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier.**
 Enterprise-grade Speech Products made refreshingly simple (see our [STT](https://github.com/snakers4/silero-models) models).
@@ -309,7 +308,9 @@ Since our VAD (only VAD, other networks are more flexible) was trained on chunks
 [Webrtc](https://github.com/wiseman/py-webrtcvad) splits audio into frames; each frame is assigned a corresponding number (0 **or** 1). We use 30 ms frames for webrtc, so each 250 ms chunk is split into 8 frames, whose **mean** value is used as the threshold for the plot.
 
-![image](https://user-images.githubusercontent.com/36505480/102872739-ce099280-4448-11eb-967b-724440165eb5.png)
+[Auditok](https://github.com/amsehili/auditok) - same logic as Webrtc, but we use 50 ms frames.
+
+![image](https://user-images.githubusercontent.com/36505480/105179755-5eafbd00-5b32-11eb-963d-1eb7461144fb.png)
 
 ## FAQ
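
For context on the paragraph the patch touches, below is a minimal sketch (not part of the patch or the repository) of the frame-averaging it describes: run py-webrtcvad on 30 ms frames and average the 0/1 per-frame decisions over each ~250 ms chunk. The function name `chunk_speech_ratio` and the 16 kHz / 16-bit mono PCM settings are illustrative assumptions, not taken from the repo.

```python
# Illustrative sketch only: reproduces the aggregation described in the
# README paragraph above. Assumes py-webrtcvad is installed and the audio
# is 16 kHz, 16-bit mono PCM; `chunk_speech_ratio` is a hypothetical name.
import webrtcvad

SAMPLE_RATE = 16000                                # assumed sample rate
FRAME_MS = 30                                      # webrtc accepts 10/20/30 ms
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # 2 bytes per 16-bit sample

def chunk_speech_ratio(chunk: bytes, aggressiveness: int = 3) -> float:
    """Mean of the per-frame 0/1 speech decisions for one ~250 ms chunk."""
    vad = webrtcvad.Vad(aggressiveness)
    frames = [chunk[i:i + FRAME_BYTES]
              for i in range(0, len(chunk) - FRAME_BYTES + 1, FRAME_BYTES)]
    decisions = [vad.is_speech(frame, SAMPLE_RATE) for frame in frames]
    return sum(decisions) / len(decisions) if decisions else 0.0
```

Under these assumptions, a 250 ms chunk is 8000 bytes (4000 samples), which splits into the 8 frames of 30 ms each that the README mentions.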