
Mailing list : test License: CC BY-NC 4.0

Open on Torch Hub Open on TF Hub

Open In Colab


Silero VAD

Why our VAD is better than WebRTC

Silero VAD: a pre-trained, enterprise-grade Voice Activity and Number Detector. Enterprise-grade Speech Products made refreshingly simple (also see our STT).

Currently, there are hardly any high-quality / modern / free / public voice activity detectors except for the WebRTC Voice Activity Detector (link).

Also, in enterprise settings it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically, personal data is considered private / sensitive if it contains (i) a name or (ii) some private ID. Name recognition is highly subjective and location-dependent, but voice activity and number detection are quite general tasks.
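For illustration, once a detector yields segment timestamps, anonymization reduces to silencing those samples. A minimal sketch, assuming segments come as dicts with 'start' and 'end' sample indices (the helper name and the timestamp format here are illustrative assumptions, not the library API):

```python
import numpy as np

def silence_segments(waveform, segments):
    """Zero out the given [start, end) sample ranges in a 1-D waveform copy."""
    out = waveform.copy()
    for seg in segments:
        out[seg['start']:seg['end']] = 0.0
    return out

# toy stand-in: one second of noise at 16 kHz with one detected "number" segment
audio = np.random.randn(16000).astype(np.float32)
detected = [{'start': 4000, 'end': 8000}]
anonymized = silence_segments(audio, detected)
```

The same masking works whether the segments come from the number detector (to redact IDs) or from the VAD inverted (to keep only non-speech).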

Key advantages:

  • Modern, portable;
  • Small memory footprint (?);
  • Trained on huge spoken corpora and noise / sound libraries;
  • Slower than WebRTC, but sufficiently fast for IoT / edge / mobile applications;
  • Superior metrics to WebRTC;

Typical use cases:

  • Spoken corpora anonymization;
  • Voice detection for IoT / edge / mobile use cases;
  • Data cleaning and preparation, number and voice detection in general;

Key features / differences:

Getting Started

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.
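Conceptually, models.yml is a nested mapping from model family to language, version, and checkpoint URLs. A minimal sketch of the lookup logic, using a plain dict standing in for the parsed YAML (the keys and URLs below are illustrative assumptions; the real file is the authoritative source):

```python
# stand-in for the parsed models.yml (e.g. via OmegaConf.load)
models = {
    'stt_models': {
        'en': {'latest': {'jit': 'https://example.com/en_v1.jit',
                          'onnx': 'https://example.com/en_v1.onnx'}},
        'de': {'latest': {'jit': 'https://example.com/de_v1.jit'}},
    },
}

def checkpoint_url(models, language, fmt='jit', version='latest'):
    """Resolve a checkpoint URL; raises KeyError for unknown combinations."""
    return models['stt_models'][language][version][fmt]

url = checkpoint_url(models, 'en', fmt='onnx')
```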

Currently we provide the following checkpoints:

Model            PyTorch  ONNX  Quantization  Languages       Colab
VAD v1 (vad_v1)  ✔️       ✔️    ✔️            ru, en, de, es  Open In Colab

PyTorch

Open In Colab

Open on Torch Hub

import torch
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file, any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
input_batch = prepare_model_input(read_batch(batches[0]),
                                  device=device)

output = model(input_batch)
for example in output:
    print(decoder(example.cpu()))
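For reference, `split_into_batches` above simply chunks the file list into fixed-size groups before audio is read and padded. A behavioral sketch (an assumption about what the helper does, not the library source):

```python
def split_into_batches_sketch(files, batch_size=10):
    """Chunk a list of file paths into batches of at most batch_size items."""
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]

batches = split_into_batches_sketch([f'clip_{i}.wav' for i in range(25)],
                                    batch_size=10)
# 25 files with batch_size=10 yield batches of 10, 10 and 5 files
```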

ONNX

Open In Colab

You can run our model anywhere you can import an ONNX model or run the ONNX Runtime.

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model for the selected language
torch.hub.download_url_to_file(models.stt_models[language].latest.onnx, 'model.onnx', progress=True)
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file, any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
model_input = prepare_model_input(read_batch(batches[0]))

# actual ONNX inference and decoding
onnx_input = model_input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
print(decoded)

Metrics

Performance Metrics

Speed metrics here.

Quality Metrics

Quality metrics here.

Contact

Get in Touch

Try our models, create an issue, join our chat, or email us.

Commercial Inquiries

Please see our wiki and tiers for relevant information and email us.
