mirror of
https://github.com/snakers4/silero-vad.git
synced 2026-02-05 18:09:22 +08:00
git pushMerge branch 'master' of github.com:snakers4/silero-vad
102
README.md
@@ -1,6 +1,6 @@
|
|||||||
[](mailto:hello@silero.ai) [](https://t.me/joinchat/Bv9tjhpdXTI22OUgpOIIDg) [](https://github.com/snakers4/silero-models/blob/master/LICENSE)
|
[](mailto:hello@silero.ai) [](https://t.me/joinchat/Bv9tjhpdXTI22OUgpOIIDg) [](https://github.com/snakers4/silero-vad/blob/master/LICENSE)
|
||||||
|
|
||||||
[](https://pytorch.org/hub/snakers4_silero-vad/)
|
[](https://pytorch.org/hub/snakers4_silero-vad/) (coming soon)
|
||||||
|
|
||||||
[](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb)
|
[](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb)
|
||||||
|
|
||||||
@@ -16,24 +16,23 @@
|
|||||||
- [Contact](#contact)
|
- [Contact](#contact)
|
||||||
- [Get in Touch](#get-in-touch)
|
- [Get in Touch](#get-in-touch)
|
||||||
- [Commercial Inquiries](#commercial-inquiries)
|
- [Commercial Inquiries](#commercial-inquiries)
|
||||||
- [History](#history)
|
|
||||||
|
|
||||||
|
|
||||||
# Silero VAD
|
# Silero VAD
|
||||||
|
|
||||||
`Single Image Why our VAD is better than WebRTC`
|
`Single Image Why our VAD is better than WebRTC`
|
||||||
|
|
||||||
Silero VAD: pre-trained enterprise-grade Voice Activity and Number Detector.
|
Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier.
|
||||||
Enterprise-grade Speech Products made refreshingly simple (also see our [STT](https://github.com/snakers4/silero-models)).
|
Enterprise-grade Speech Products made refreshingly simple (also see our [STT](https://github.com/snakers4/silero-models)).
|
||||||
|
|
||||||
Currently, there are hardly any high quality / modern / free / public voice activity detectors except for WebRTC Voice Activity Detector ([link](https://github.com/wiseman/py-webrtcvad)).
|
Currently, there are hardly any high quality / modern / free / public voice activity detectors except for WebRTC Voice Activity Detector ([link](https://github.com/wiseman/py-webrtcvad)).
|
||||||
|
|
||||||
Also in enterprise it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically personal data is considered to be private / sensitive if it contains (i) a name (ii) some private ID. Name recognition is highly subjective and would depend on location, but Voice Activity and Number detections are quite general tasks.
|
Also in enterprise it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically personal data is considered to be private / sensitive if it contains (i) a name (ii) some private ID. Name recognition is highly subjective and would depend on locale and business case, but Voice Activity and Number detections are quite general tasks.
|
||||||
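As an illustration of the anonymization use case described above, here is a minimal sketch of muting flagged spans in a waveform. It assumes that some detector has already produced `(start_sample, end_sample)` pairs for the spans to remove. That segment format and the helper names `merge_segments` and `mute_segments` are hypothetical and are not part of this repository's API.

```python
from typing import List, Sequence, Tuple

# Hypothetical segment format: (start_sample, end_sample), end exclusive.
Segment = Tuple[int, int]


def merge_segments(segments: Sequence[Segment]) -> List[Segment]:
    """Sort segments and merge any that overlap or touch."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def mute_segments(samples: Sequence[float], segments: Sequence[Segment]) -> List[float]:
    """Return a copy of `samples` with every flagged segment replaced by silence."""
    cleaned = list(samples)
    for start, end in merge_segments(segments):
        # Clamp to the valid sample range so out-of-bounds segments are harmless.
        start = max(0, start)
        end = min(len(cleaned), end)
        for i in range(start, end):
            cleaned[i] = 0.0
    return cleaned


if __name__ == "__main__":
    audio = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    # Pretend a detector flagged samples 1..2 as containing a spoken number.
    print(mute_segments(audio, [(1, 3)]))
```

Replacing the span with silence keeps the timing of the recording intact, which may matter if transcripts or other annotations are aligned to it. Cutting the span out instead is an equally valid choice when alignment is not needed.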
|
|
||||||
**Key advantages:**
|
**Key advantages / features:**
|
||||||
|
|
||||||
- Modern, portable;
|
- Modern, portable;
|
||||||
- Small memory footprint (?);
|
- Small memory footprint;
|
||||||
- Trained on huge spoken corpora and noise / sound libraries;
|
- Trained on huge spoken corpora and noise / sound libraries;
|
||||||
- Slower than WebRTC, but sufficiently fast for IOT / edge / mobile applications;
|
- Slower than WebRTC, but sufficiently fast for IOT / edge / mobile applications;
|
||||||
- Superior metrics to WebRTC;
|
- Superior metrics to WebRTC;
|
||||||
@@ -44,53 +43,28 @@ Also in enterprise it is crucial to be able to anonymize large-scale spoken corp
|
|||||||
- Voice detection for IOT / edge / mobile use cases;
|
- Voice detection for IOT / edge / mobile use cases;
|
||||||
- Data cleaning and preparation, number and voice detection in general;
|
- Data cleaning and preparation, number and voice detection in general;
|
||||||
|
|
||||||
|
|
||||||
Key features / differences:
|
|
||||||
|
|
||||||
## Getting Started
|
## Getting Started
|
||||||
|
|
||||||
All of the provided models are listed in the [models.yml](https://github.com/snakers4/silero-models/blob/master/models.yml) file.
|
The models are small enough to be included directly into this repository. Newer models will supersede older models directly.
|
||||||
Any meta-data and newer versions will be added there.
|
|
||||||
|
|
||||||
Currently we provide the following checkpoints:
|
Currently we provide the following models:
|
||||||
|
|
||||||
| | PyTorch | ONNX | Quantization | Languages | Colab |
|
| | Released |PyTorch | ONNX | VAD | Number Detector | Language Classifier | Languages | Colab |
|
||||||
|-----------------|--------------------|--------------------|--------------|---------|-------|
|
|----|------------|-------------------|--------------------|---------------------| --------------------|---------------------|-------------------------|-------|
|
||||||
| VAD v1 (vad_v1) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | `ru`, `en`, `de`, `es` |
|
| v1 | 2020-12-15 |:heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | | | `ru`, `en`, `de`, `es` | [](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb) |
|
||||||
[](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb) |
|
|
||||||
|
|
||||||
|
Version history:
|
||||||
|
|
||||||
|
- v1, 2020-12-15, initial release, no Number Detector or Language Classifier heads yet;
|
||||||
|
|
||||||
### PyTorch
|
### PyTorch
|
||||||
|
|
||||||
[](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb)
|
[](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb)
|
||||||
|
|
||||||
[](https://pytorch.org/hub/snakers4_silero-vad/)
|
[](https://pytorch.org/hub/snakers4_silero-vad/) (coming soon)
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import torch
|
TBD
|
||||||
import zipfile
|
|
||||||
import torchaudio
|
|
||||||
from glob import glob
|
|
||||||
|
|
||||||
device = torch.device('cpu') # gpu also works, but our models are fast enough for CPU
|
|
||||||
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
|
|
||||||
model='silero_stt',
|
|
||||||
language='en', # also available 'de', 'es'
|
|
||||||
device=device)
|
|
||||||
(read_batch, split_into_batches,
|
|
||||||
read_audio, prepare_model_input) = utils # see function signature for details
|
|
||||||
|
|
||||||
# download a single file, any format compatible with TorchAudio (soundfile backend)
|
|
||||||
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
|
|
||||||
dst ='speech_orig.wav', progress=True)
|
|
||||||
test_files = glob('speech_orig.wav')
|
|
||||||
batches = split_into_batches(test_files, batch_size=10)
|
|
||||||
input = prepare_model_input(read_batch(batches[0]),
|
|
||||||
device=device)
|
|
||||||
|
|
||||||
output = model(input)
|
|
||||||
for example in output:
|
|
||||||
print(decoder(example.cpu()))
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### ONNX
|
### ONNX
|
||||||
@@ -100,42 +74,7 @@ for example in output:
|
|||||||
You can run our model anywhere you can import the ONNX model or run ONNX Runtime.
|
You can run our model anywhere you can import the ONNX model or run ONNX Runtime.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import onnx
|
TBD
|
||||||
import torch
|
|
||||||
import onnxruntime
|
|
||||||
from omegaconf import OmegaConf
|
|
||||||
|
|
||||||
language = 'en' # also available 'de', 'es'
|
|
||||||
|
|
||||||
# load provided utils
|
|
||||||
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
|
|
||||||
(read_batch, split_into_batches,
|
|
||||||
read_audio, prepare_model_input) = utils
|
|
||||||
|
|
||||||
# see available models
|
|
||||||
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
|
|
||||||
models = OmegaConf.load('models.yml')
|
|
||||||
available_languages = list(models.stt_models.keys())
|
|
||||||
assert language in available_languages
|
|
||||||
|
|
||||||
# load the actual ONNX model
|
|
||||||
torch.hub.download_url_to_file(models.stt_models.en.latest.onnx, 'model.onnx', progress=True)
|
|
||||||
onnx_model = onnx.load('model.onnx')
|
|
||||||
onnx.checker.check_model(onnx_model)
|
|
||||||
ort_session = onnxruntime.InferenceSession('model.onnx')
|
|
||||||
|
|
||||||
# download a single file, any format compatible with TorchAudio (soundfile backend)
|
|
||||||
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)
|
|
||||||
test_files = ['speech_orig.wav']
|
|
||||||
batches = split_into_batches(test_files, batch_size=10)
|
|
||||||
input = prepare_model_input(read_batch(batches[0]))
|
|
||||||
|
|
||||||
# actual onnx inference and decoding
|
|
||||||
onnx_input = input.detach().cpu().numpy()
|
|
||||||
ort_inputs = {'input': onnx_input}
|
|
||||||
ort_outs = ort_session.run(None, ort_inputs)
|
|
||||||
decoded = decoder(torch.Tensor(ort_outs[0])[0])
|
|
||||||
print(decoded)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## Metrics
|
## Metrics
|
||||||
@@ -152,11 +91,8 @@ Quality metrics here.
|
|||||||
|
|
||||||
### Get in Touch
|
### Get in Touch
|
||||||
|
|
||||||
Try our models, create an [issue](https://github.com/snakers4/silero-models/issues/new), join our [chat](https://t.me/joinchat/Bv9tjhpdXTI22OUgpOIIDg), [email](mailto:hello@silero.ai) us.
|
Try our models, create an [issue](https://github.com/snakers4/silero-vad/issues/new), start a [discussion](https://github.com/snakers4/silero-vad/discussions/new), join our telegram [chat](https://t.me/joinchat/Bv9tjhpdXTI22OUgpOIIDg), [email](mailto:hello@silero.ai) us.
|
||||||
|
|
||||||
### Commercial Inquiries
|
### Commercial Inquiries
|
||||||
|
|
||||||
Please see our [wiki](https://github.com/snakers4/silero-models/wiki) and [tiers](https://github.com/snakers4/silero-models/wiki/Licensing-and-Tiers) for relevant information and [email](mailto:hello@silero.ai) us.
|
Please see our [wiki](https://github.com/snakers4/silero-models/wiki) and [tiers](https://github.com/snakers4/silero-models/wiki/Licensing-and-Tiers) for relevant information and [email](mailto:hello@silero.ai) us directly.
|
||||||
|
|
||||||
# History
|
|
||||||
|
|
||||||
|
|||||||