
Mailing list : test License: CC BY-NC 4.0

Open on Torch Hub Open on TF Hub

Open In Colab


Silero VAD

Why our VAD is better than WebRTC

Silero VAD: a pre-trained, enterprise-grade Voice Activity and Number Detector. Enterprise-grade Speech Products made refreshingly simple (also see our STT).

Currently, there are hardly any high-quality / modern / free / public voice activity detectors except for the WebRTC Voice Activity Detector (link).

Also, in enterprise settings it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically, personal data is considered private / sensitive if it contains (i) a name or (ii) some private ID. Name recognition is highly subjective and location-dependent, but voice activity and number detection are quite general tasks.
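For illustration, once a detector yields segment timestamps, anonymization reduces to silencing those samples. A minimal sketch, assuming segments come as dicts with 'start' and 'end' sample indices (the helper name and the timestamp format here are illustrative assumptions, not the library API):

```python
import numpy as np

def silence_segments(waveform, segments):
    """Zero out the given [start, end) sample ranges in a 1-D waveform copy."""
    out = waveform.copy()
    for seg in segments:
        out[seg['start']:seg['end']] = 0.0
    return out

# toy stand-in: one second of noise at 16 kHz with one detected "number" segment
audio = np.random.randn(16000).astype(np.float32)
detected = [{'start': 4000, 'end': 8000}]
anonymized = silence_segments(audio, detected)
```

The same masking works whether the segments come from the number detector (to redact IDs) or from the VAD inverted (to keep only non-speech).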

Key advantages:

  • Modern, portable;
  • Small memory footprint (?);
  • Trained on huge spoken corpora and noise / sound libraries;
  • Slower than WebRTC, but sufficiently fast for IoT / edge / mobile applications;
  • Superior metrics to WebRTC;

Typical use cases:

  • Spoken corpora anonymization;
  • Voice detection for IoT / edge / mobile use cases;
  • Data cleaning and preparation, number and voice detection in general;

Key features / differences:

Getting Started

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.
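Conceptually, models.yml is a nested mapping from model family to language, version, and checkpoint URLs. A minimal sketch of the lookup logic, using a plain dict standing in for the parsed YAML (the keys and URLs below are illustrative assumptions; the real file is the authoritative source):

```python
# stand-in for the parsed models.yml (e.g. via OmegaConf.load)
models = {
    'stt_models': {
        'en': {'latest': {'jit': 'https://example.com/en_v1.jit',
                          'onnx': 'https://example.com/en_v1.onnx'}},
        'de': {'latest': {'jit': 'https://example.com/de_v1.jit'}},
    },
}

def checkpoint_url(models, language, fmt='jit', version='latest'):
    """Resolve a checkpoint URL; raises KeyError for unknown combinations."""
    return models['stt_models'][language][version][fmt]

url = checkpoint_url(models, 'en', fmt='onnx')
```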

Currently we provide the following checkpoints:

Model            PyTorch  ONNX  Quantization  Languages       Colab
VAD v1 (vad_v1)  ✔️       ✔️    ✔️            ru, en, de, es  Open In Colab

PyTorch

Open In Colab

Open on Torch Hub

import torch
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file, any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
input_batch = prepare_model_input(read_batch(batches[0]),
                                  device=device)

output = model(input_batch)
for example in output:
    print(decoder(example.cpu()))
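For reference, `split_into_batches` above simply chunks the file list into fixed-size groups before audio is read and padded. A behavioral sketch (an assumption about what the helper does, not the library source):

```python
def split_into_batches_sketch(files, batch_size=10):
    """Chunk a list of file paths into batches of at most batch_size items."""
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]

batches = split_into_batches_sketch([f'clip_{i}.wav' for i in range(25)],
                                    batch_size=10)
# 25 files with batch_size=10 yield batches of 10, 10 and 5 files
```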

ONNX

Open In Colab

You can run our model anywhere you can import an ONNX model or run the ONNX Runtime.

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model for the selected language
torch.hub.download_url_to_file(models.stt_models[language].latest.onnx, 'model.onnx', progress=True)
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file, any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
model_input = prepare_model_input(read_batch(batches[0]))

# actual ONNX inference and decoding
onnx_input = model_input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
print(decoded)

Metrics

Performance Metrics

Speed metrics here.

Quality Metrics

Quality metrics here.

Contact

Get in Touch

Try our models, create an issue, join our chat, or email us.

Commercial Inquiries

Please see our wiki and tiers for relevant information and email us.
