diff --git a/README.md b/README.md
index 0ae9751..a0418c8 100644
--- a/README.md
+++ b/README.md
@@ -29,13 +29,13 @@ https://user-images.githubusercontent.com/36505480/144874384-95f80f6d-a4f1-42cc-

Key Features


-- **High accuracy**
+- **Stellar accuracy**
 
   Silero VAD has [excellent results](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics#vs-other-available-solutions) on speech detection tasks.
 
 - **Fast**
 
-  One audio chunk (30+ ms) [takes](https://github.com/snakers4/silero-vad/wiki/Performance-Metrics#silero-vad-performance-metrics) around **1ms** to be processed on a single CPU thread. Using batching or GPU can also improve performance considerably.
+  One audio chunk (30+ ms) [takes](https://github.com/snakers4/silero-vad/wiki/Performance-Metrics#silero-vad-performance-metrics) around **1ms** to be processed on a single CPU thread. Using batching or GPU can also improve performance considerably. Under certain conditions ONNX may even run up to 2-3x faster.
 
 - **Lightweight**
@@ -47,12 +47,20 @@ https://user-images.githubusercontent.com/36505480/144874384-95f80f6d-a4f1-42cc-
 
 - **Flexible sampling rate**
 
-  Silero VAD [supports](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics#sample-rate-comparison) **8000 Hz** and **16000 Hz** [sampling rates](https://en.wikipedia.org/wiki/Sampling_(signal_processing)#Sampling_rate).
+  Silero VAD [supports](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics#sample-rate-comparison) **8000 Hz** and **16000 Hz** (PyTorch JIT) and **16000 Hz** (ONNX) [sampling rates](https://en.wikipedia.org/wiki/Sampling_(signal_processing)#Sampling_rate).
 
 - **Flexible chunk size**
 
   Model was trained on audio chunks of different lengths. **30 ms**, **60 ms** and **100 ms** long chunks are supported directly, others may work as well.
 
+- **Highly Portable**
+
+  Silero VAD reaps benefits from the rich ecosystems built around **PyTorch** and **ONNX** running everywhere where these runtimes are available.
+
+- **No Strings Attached**
+
+  Published under permissive license (MIT) Silero VAD has zero strings attached - no telemetry, no keys, no registration, no built-in expiration, no keys or vendor lock.
+

Typical Use Cases


diff --git a/utils_vad.py b/utils_vad.py
index fca2d82..eccf618 100644
--- a/utils_vad.py
+++ b/utils_vad.py
@@ -191,7 +191,7 @@ def get_speech_timestamps(audio: torch.Tensor,
         step = 1
 
     if sampling_rate == 8000 and window_size_samples > 768:
-        warnings.warn('window_size_samples is too big for 8000 sampling_rate! Better set window_size_samples to 256, 512 or 1536 for 8000 sample rate!')
+        warnings.warn('window_size_samples is too big for 8000 sampling_rate! Better set window_size_samples to 256, 512 or 768 for 8000 sample rate!')
     if window_size_samples not in [256, 512, 768, 1024, 1536]:
         warnings.warn('Unusual window_size_samples! Supported window_size_samples:\n - [512, 1024, 1536] for 16000 sampling_rate\n - [256, 512, 768] for 8000 sampling_rate')
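
Note on the `utils_vad.py` change: the corrected message now agrees with the guard above it (`window_size_samples > 768`), since 768 samples at 8000 Hz and 1536 samples at 16000 Hz both span 96 ms. The sketch below reproduces the two checks from this hunk in isolation so they can be exercised without loading the model. The helper names `window_ms` and `check_window_size` are hypothetical and are not part of `utils_vad.py`, and the message strings are shortened for illustration.

```python
import warnings
from typing import List


def window_ms(sampling_rate: int, window_size_samples: int) -> float:
    """Duration of one window in milliseconds."""
    return 1000.0 * window_size_samples / sampling_rate


def check_window_size(sampling_rate: int, window_size_samples: int) -> List[str]:
    """Hypothetical stand-alone version of the two checks shown in the hunk.

    Emits the same kind of warnings and also returns the messages,
    so the outcome is easy to inspect in a test.
    """
    messages = []
    # Check 1: at 8000 Hz anything above 768 samples is considered too big.
    if sampling_rate == 8000 and window_size_samples > 768:
        messages.append(
            'window_size_samples is too big for 8000 sampling_rate! '
            'Better set window_size_samples to 256, 512 or 768.'
        )
    # Check 2: sizes outside the known set are flagged as unusual.
    if window_size_samples not in [256, 512, 768, 1024, 1536]:
        messages.append('Unusual window_size_samples!')
    for message in messages:
        warnings.warn(message)
    return messages


if __name__ == '__main__':
    print(window_ms(8000, 768), window_ms(16000, 1536))
    print(check_window_size(8000, 1536))
```

As in the hunk, the second check is not rate-specific: a value such as 256 at 16000 Hz passes silently even though the message lists it only for 8000 Hz.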