mirror of
https://github.com/snakers4/silero-vad.git
synced 2026-02-05 01:49:22 +08:00
Merge branch 'master' of github.com:snakers4/silero-vad into adamnsandle
14
README.md
@@ -29,13 +29,13 @@ https://user-images.githubusercontent.com/36505480/144874384-95f80f6d-a4f1-42cc-
 <h2 align="center">Key Features</h2>
 <br/>

-- **High accuracy**
+- **Stellar accuracy**

 Silero VAD has [excellent results](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics#vs-other-available-solutions) on speech detection tasks.

 - **Fast**

-One audio chunk (30+ ms) [takes](https://github.com/snakers4/silero-vad/wiki/Performance-Metrics#silero-vad-performance-metrics) around **1ms** to be processed on a single CPU thread. Using batching or GPU can also improve performance considerably.
+One audio chunk (30+ ms) [takes](https://github.com/snakers4/silero-vad/wiki/Performance-Metrics#silero-vad-performance-metrics) around **1ms** to be processed on a single CPU thread. Using batching or GPU can also improve performance considerably. Under certain conditions ONNX may even run up to 2-3x faster.

 - **Lightweight**
@@ -47,12 +47,20 @@ https://user-images.githubusercontent.com/36505480/144874384-95f80f6d-a4f1-42cc-
 - **Flexible sampling rate**

-Silero VAD [supports](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics#sample-rate-comparison) **8000 Hz** and **16000 Hz** [sampling rates](https://en.wikipedia.org/wiki/Sampling_(signal_processing)#Sampling_rate).
+Silero VAD [supports](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics#sample-rate-comparison) **8000 Hz** and **16000 Hz** (PyTorch JIT) and **16000 Hz** (ONNX) [sampling rates](https://en.wikipedia.org/wiki/Sampling_(signal_processing)#Sampling_rate).

 - **Flexible chunk size**

 Model was trained on audio chunks of different lengths. **30 ms**, **60 ms** and **100 ms** long chunks are supported directly, others may work as well.

 - **Highly Portable**

 Silero VAD reaps benefits from the rich ecosystems built around **PyTorch** and **ONNX** running everywhere where these runtimes are available.

 - **No Strings Attached**

 Published under permissive license (MIT) Silero VAD has zero strings attached - no telemetry, no keys, no registration, no built-in expiration, no keys or vendor lock.

 <br/>
 <h2 align="center">Typical Use Cases</h2>
 <br/>
@@ -191,7 +191,7 @@ def get_speech_timestamps(audio: torch.Tensor,
         step = 1

     if sampling_rate == 8000 and window_size_samples > 768:
-        warnings.warn('window_size_samples is too big for 8000 sampling_rate! Better set window_size_samples to 256, 512 or 1536 for 8000 sample rate!')
+        warnings.warn('window_size_samples is too big for 8000 sampling_rate! Better set window_size_samples to 256, 512 or 768 for 8000 sample rate!')
     if window_size_samples not in [256, 512, 768, 1024, 1536]:
         warnings.warn('Unusual window_size_samples! Supported window_size_samples:\n - [512, 1024, 1536] for 16000 sampling_rate\n - [256, 512, 768] for 8000 sampling_rate')
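The window sizes in the warning text map onto the chunk lengths mentioned in the README: at either sampling rate they come out to roughly 30, 60 and 100 ms. The sketch below is illustrative only and is not part of this commit. `SUPPORTED_WINDOWS`, `window_duration_ms` and `check_window_size` are hypothetical names, and the check is stricter than the one in the hunk because it looks up the list for the given sampling rate rather than testing one combined list.

```python
import warnings

# Window sizes (in samples) as listed in the warning text of the hunk above.
SUPPORTED_WINDOWS = {
    16000: (512, 1024, 1536),
    8000: (256, 512, 768),
}


def window_duration_ms(window_size_samples: int, sampling_rate: int) -> float:
    """Return the duration of one window in milliseconds."""
    return 1000.0 * window_size_samples / sampling_rate


def check_window_size(window_size_samples: int, sampling_rate: int) -> bool:
    """Hypothetical helper: warn and return False for an unlisted window size."""
    supported = SUPPORTED_WINDOWS.get(sampling_rate, ())
    if window_size_samples not in supported:
        warnings.warn(
            f"Unusual window_size_samples={window_size_samples} "
            f"for sampling_rate={sampling_rate}; listed sizes: {list(supported)}"
        )
        return False
    return True


if __name__ == "__main__":
    # Print the duration of every listed window size at each sampling rate.
    for rate, sizes in SUPPORTED_WINDOWS.items():
        for size in sizes:
            print(f"{rate} Hz, {size} samples: {window_duration_ms(size, rate):.0f} ms")
```

Under these assumptions, 1536 samples at 8000 Hz would last 192 ms, well beyond the longest chunk length the README lists. That is consistent with the corrected warning recommending 768 instead.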