Merge branch 'master' of github.com:snakers4/silero-vad into adamnsandle

2026-02-05 18:09:22 +08:00 · 2021-12-21 11:01:25 +00:00
parent 45d72863b6 f40cc128a4
commit 2392ea33f4
2 changed files with 12 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -29,13 +29,13 @@ https://user-images.githubusercontent.com/36505480/144874384-95f80f6d-a4f1-42cc-
 <h2 align="center">Key Features</h2>
 <br/>

- **High accuracy**
+- **Stellar accuracy**

  Silero VAD has [excellent results](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics#vs-other-available-solutions) on speech detection tasks.
  
 - **Fast**

-  One audio chunk (30+ ms) [takes](https://github.com/snakers4/silero-vad/wiki/Performance-Metrics#silero-vad-performance-metrics) around **1ms** to be processed on a single CPU thread. Using batching or GPU can also improve performance considerably.
+  One audio chunk (30+ ms) [takes](https://github.com/snakers4/silero-vad/wiki/Performance-Metrics#silero-vad-performance-metrics) around **1ms** to be processed on a single CPU thread. Using batching or GPU can also improve performance considerably. Under certain conditions ONNX may even run up to 2-3x faster. 

 - **Lightweight**

@@ -47,12 +47,20 @@ https://user-images.githubusercontent.com/36505480/144874384-95f80f6d-a4f1-42cc-

 - **Flexible sampling rate**

-  Silero VAD [supports](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics#sample-rate-comparison)  **8000 Hz** and **16000 Hz** [sampling rates](https://en.wikipedia.org/wiki/Sampling_(signal_processing)#Sampling_rate).
+  Silero VAD [supports](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics#sample-rate-comparison)  **8000 Hz** and **16000 Hz** (PyTorch JIT) and **16000 Hz** (ONNX) [sampling rates](https://en.wikipedia.org/wiki/Sampling_(signal_processing)#Sampling_rate).

 - **Flexible chunk size**

  Model was trained on audio chunks of different lengths. **30 ms**, **60 ms** and **100 ms** long chunks are supported directly, others may work as well.

+- **Highly Portable**
+
+  Silero VAD reaps benefits from the rich ecosystems built around **PyTorch** and **ONNX** running everywhere where these runtimes are available.
+
+- **No Strings Attached**
+
+   Published under permissive license (MIT) Silero VAD has zero strings attached - no telemetry, no keys, no registration, no built-in expiration, no keys or vendor lock.
+
 <br/>
 <h2 align="center">Typical Use Cases</h2>
 <br/>
--- a/utils_vad.py
+++ b/utils_vad.py
@@ -191,7 +191,7 @@ def get_speech_timestamps(audio: torch.Tensor,
        step = 1

    if sampling_rate == 8000 and window_size_samples > 768:
-        warnings.warn('window_size_samples is too big for 8000 sampling_rate! Better set window_size_samples to 256, 512 or 1536 for 8000 sample rate!')
+        warnings.warn('window_size_samples is too big for 8000 sampling_rate! Better set window_size_samples to 256, 512 or 768 for 8000 sample rate!')
    if window_size_samples not in [256, 512, 768, 1024, 1536]:
        warnings.warn('Unusual window_size_samples! Supported window_size_samples:\n - [512, 1024, 1536] for 16000 sampling_rate\n - [256, 512, 768] for 8000 sampling_rate')