From 8996d5e519f7d4e0c936091a010bb332dc41bdab Mon Sep 17 00:00:00 2001
From: snakers41
Date: Tue, 15 Dec 2020 14:51:16 +0000
Subject: [PATCH 1/4] Update readme, add skeleton for FAQ

---
 README.md | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index f073463..392c277 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,11 @@
   - [Metrics](#metrics)
     - [Performance Metrics](#performance-metrics)
     - [Quality Metrics](#quality-metrics)
+  - [FAQ](#faq)
+    - [How VAD Works](#how-vad-works)
+    - [VAD Quality Metrics Methodology](#vad-quality-metrics-methodology)
+    - [How Number Detector Works](#how-number-detector-works)
+    - [How Language Classifier Works](#how-language-classifier-works)
   - [Contact](#contact)
     - [Get in Touch](#get-in-touch)
     - [Commercial Inquiries](#commercial-inquiries)
@@ -57,7 +62,7 @@ Currently we provide the following functionality:
 
 | Version | Date        | Comment                                           |
 |---------|-------------|---------------------------------------------------|
-| `v1`    | 2020-12-15  | initial release                                   |
+| `v1`    | 2020-12-15  | Initial release                                   |
 | `v2`    | coming soon | Add Number Detector or Language Classifier heads  |
 
 ### PyTorch
@@ -90,6 +95,24 @@ Speed metrics here.
 
 ### Quality Metrics
 
 Quality metrics here.
 
+## FAQ
+
+### How VAD Works
+
+In short: 300 ms chunks, ~15 ms latency on a single CPU thread; see the examples (naive and streaming).
+
+### VAD Quality Metrics Methodology
+
+TBD
+
+### How Number Detector Works
+
+TBD
+
+### How Language Classifier Works
+
+TBD
+
 ## Contact
 
 ### Get in Touch

From dd2d7ff70e03f79f72016afb37fcbd13d9338d0f Mon Sep 17 00:00:00 2001
From: Dimitrii Voronin <36505480+adamnsandle@users.noreply.github.com>
Date: Tue, 15 Dec 2020 17:12:15 +0200
Subject: [PATCH 2/4] Update README.md

---
 README.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 392c277..af50d00 100644
--- a/README.md
+++ b/README.md
@@ -93,7 +93,13 @@
 
 ### Quality Metrics
 
-Quality metrics here.
+We validate on random 0.25-second audio chunks. The speech to non-speech ratio among chunks is roughly 50/50; speech chunks are carved from real audio in four different languages (English, Russian, Spanish, German), and random background noise is then applied to some of them.
+
+Since our models were trained on chunks of the same length, the model's output is a single float from 0 to 1 - the **speech probability**. We use the speech probabilities as thresholds for the precision-recall curve.
+
+Webrtc splits audio into frames, and each frame gets a corresponding number (0 **or** 1). We use 30 ms frames for the Webrtc predictions, so each 0.25-second chunk is split into 8 frames; their **mean** value is used as the threshold for the plot.
+
+![image](https://user-images.githubusercontent.com/36505480/102233150-9f476580-3ef8-11eb-87fb-ae6f1edfe10f.png)
 
 ## FAQ

From 97fc53a8395e70cb9a591f5831c51a3ccba6d341 Mon Sep 17 00:00:00 2001
From: Dimitrii Voronin <36505480+adamnsandle@users.noreply.github.com>
Date: Tue, 15 Dec 2020 17:13:37 +0200
Subject: [PATCH 3/4] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index af50d00..02474d9 100644
--- a/README.md
+++ b/README.md
@@ -97,7 +97,7 @@ We validate on random 0.25-second audio chunks. The speech to non-speech ratio
 
 Since our models were trained on chunks of the same length, the model's output is a single float from 0 to 1 - the **speech probability**. We use the speech probabilities as thresholds for the precision-recall curve.
 
-Webrtc splits audio into frames, and each frame gets a corresponding number (0 **or** 1). We use 30 ms frames for the Webrtc predictions, so each 0.25-second chunk is split into 8 frames; their **mean** value is used as the threshold for the plot.
+[Webrtc](https://github.com/wiseman/py-webrtcvad) splits audio into frames, and each frame gets a corresponding number (0 **or** 1). We use 30 ms frames for the Webrtc predictions, so each 0.25-second chunk is split into 8 frames; their **mean** value is used as the threshold for the plot.
 
 ![image](https://user-images.githubusercontent.com/36505480/102233150-9f476580-3ef8-11eb-87fb-ae6f1edfe10f.png)

From 14d7cbc3b114e486a74d680fd41ed5b17b438235 Mon Sep 17 00:00:00 2001
From: Dimitrii Voronin <36505480+adamnsandle@users.noreply.github.com>
Date: Tue, 15 Dec 2020 17:28:00 +0200
Subject: [PATCH 4/4] Update README.md

---
 README.md | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 50 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 02474d9..9eade58 100644
--- a/README.md
+++ b/README.md
@@ -70,21 +70,66 @@
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb)
 [![Open on Torch Hub](https://img.shields.io/badge/Torch-Hub-red?logo=pytorch&style=for-the-badge)](https://pytorch.org/hub/snakers4_silero-vad/) (coming soon)
 
-```python
-TBD
-```
+```python
+import torch
+torch.set_num_threads(1)
+from pprint import pprint
+
+model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
+                              model='silero_vad',
+                              force_reload=True)
+
+(get_speech_ts,
+ _, read_audio,
+ _, _, _) = utils
+
+files_dir = torch.hub.get_dir() + '/snakers4_silero-vad_master/files'
+
+wav = read_audio(f'{files_dir}/en.wav')
+
+# get speech timestamps from the full audio file
+speech_timestamps = get_speech_ts(wav, model,
+                                  num_steps=4)
+pprint(speech_timestamps)
+```
 
 ### ONNX
 
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb)
 
 You can run our model anywhere you can import the ONNX model or run the ONNX runtime.
 
-```python
-TBD
-```
+```python
+import torch
+import onnxruntime
+from pprint import pprint
+
+_, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
+                          model='silero_vad',
+                          force_reload=True)
+
+(get_speech_ts,
+ _, read_audio,
+ _, _, _) = utils
+
+files_dir = torch.hub.get_dir() + '/snakers4_silero-vad_master/files'
+
+def init_onnx_model(model_path: str):
+    return onnxruntime.InferenceSession(model_path)
+
+def validate_onnx(model, inputs):
+    with torch.no_grad():
+        ort_inputs = {'input': inputs.cpu().numpy()}
+        outs = model.run(None, ort_inputs)
+        outs = [torch.Tensor(x) for x in outs]
+    return outs
+
+model = init_onnx_model(f'{files_dir}/model.onnx')
+wav = read_audio(f'{files_dir}/en.wav')
+
+# get speech timestamps from the full audio file
+speech_timestamps = get_speech_ts(wav, model, num_steps=4, run_function=validate_onnx)
+pprint(speech_timestamps)
+```
 
 ## Metrics
 
 ### Performance Metrics
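
The validation methodology described in PATCH 2/4 and PATCH 3/4 - one speech probability per 0.25-second chunk for our model versus the mean of the binary 30 ms [Webrtc](https://github.com/wiseman/py-webrtcvad) frame decisions per chunk - can be illustrated with a minimal sketch. This is not the evaluation code from the patches: it assumes 16-bit mono PCM chunks at 16 kHz, the `py-webrtcvad` package, and `scikit-learn` for the precision-recall curve, and the `chunks` and `labels` variables are hypothetical placeholders for the validation set.

```python
# Minimal sketch of the per-chunk scoring described above (not the authors' code).
# Assumes 16 kHz, 16-bit mono PCM chunks of 0.25 s.
import numpy as np
import webrtcvad
from sklearn.metrics import precision_recall_curve

SAMPLE_RATE = 16000
FRAME_MS = 30
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 480 samples * 2 bytes per sample

def webrtc_chunk_score(chunk_pcm16: bytes, aggressiveness: int = 3) -> float:
    """Mean of the binary 30 ms frame decisions inside one chunk (8 full frames per 0.25 s)."""
    vad = webrtcvad.Vad(aggressiveness)
    decisions = [vad.is_speech(chunk_pcm16[i:i + FRAME_BYTES], SAMPLE_RATE)
                 for i in range(0, len(chunk_pcm16) - FRAME_BYTES + 1, FRAME_BYTES)]
    return float(np.mean(decisions)) if decisions else 0.0

# `chunks` (raw PCM byte strings) and `labels` (0/1 ground truth per chunk) are
# hypothetical placeholders for the validation data described in the README:
# webrtc_scores = [webrtc_chunk_score(c) for c in chunks]
# precision, recall, thresholds = precision_recall_curve(labels, webrtc_scores)
# For our model, the score is simply its speech probability for the chunk,
# passed to precision_recall_curve the same way.
```

Averaging the frame decisions turns Webrtc's hard 0/1 outputs into a graded score, so both systems can be compared on the same precision-recall plot.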
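
As a follow-up to the PyTorch and ONNX examples added in PATCH 4/4, a small usage sketch may help show what to do with the returned timestamps. It is illustrative only: it assumes `get_speech_ts` returns a list of dicts with `start`/`end` sample indices and that the bundled `en.wav` is sampled at 16 kHz - neither detail is stated in the patches above, so check the repo utilities if they differ.

```python
# Hedged usage sketch: convert the `speech_timestamps` from the examples above
# into (start, end) pairs in seconds. Assumes dicts with 'start'/'end' sample
# indices and a 16 kHz sampling rate (assumptions, not stated in the patches).
SAMPLE_RATE = 16000

segments_seconds = [(ts['start'] / SAMPLE_RATE, ts['end'] / SAMPLE_RATE)
                    for ts in speech_timestamps]
for start_s, end_s in segments_seconds:
    print(f'speech from {start_s:.2f} s to {end_s:.2f} s')
```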