diff --git a/README.md b/README.md
index a580da5..3406030 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
-[![Mailing list : test](http://img.shields.io/badge/Email-gray.svg?style=for-the-badge&logo=gmail)](mailto:hello@silero.ai) [![Mailing list : test](http://img.shields.io/badge/Telegram-blue.svg?style=for-the-badge&logo=telegram)](https://t.me/joinchat/Bv9tjhpdXTI22OUgpOIIDg) [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-MIT-lightgrey.svg?style=for-the-badge)](https://github.com/snakers4/silero-models/blob/master/LICENSE)
+[![Mailing list : test](http://img.shields.io/badge/Email-gray.svg?style=for-the-badge&logo=gmail)](mailto:hello@silero.ai) [![Mailing list : test](http://img.shields.io/badge/Telegram-blue.svg?style=for-the-badge&logo=telegram)](https://t.me/joinchat/Bv9tjhpdXTI22OUgpOIIDg) [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-MIT-lightgrey.svg?style=for-the-badge)](https://github.com/snakers4/silero-vad/blob/master/LICENSE)
-[![Open on Torch Hub](https://img.shields.io/badge/Torch-Hub-red?logo=pytorch&style=for-the-badge)](https://pytorch.org/hub/snakers4_silero-vad/)
+[![Open on Torch Hub](https://img.shields.io/badge/Torch-Hub-red?logo=pytorch&style=for-the-badge)](https://pytorch.org/hub/snakers4_silero-vad/) (coming soon)
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb)
@@ -16,24 +16,23 @@
 - [Contact](#contact)
 - [Get in Touch](#get-in-touch)
 - [Commercial Inquiries](#commercial-inquiries)
-- [History](#history)
 # Silero VAD
 `Single Image Why our VAD is better than WebRTC`
-Silero VAD: pre-trained enterprise-grade Voice Activity and Number Detector.
+Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier.
 Enterprise-grade Speech Products made refreshingly simple (all see our [STT](https://github.com/snakers4/silero-models)).
 Currently, there are hardly any high quality / modern / free / public voice activity detectors except for WebRTC Voice Activity Detector ([link](https://github.com/wiseman/py-webrtcvad)).
-Also in enterprise it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically personal data is considered to be private / sensitive if it contains (i) a name (ii) some private ID. Name recognition is highly subjective and would depend on location, but Voice Activity and Number detections are quite general tasks.
+Also in enterprise it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically personal data is considered to be private / sensitive if it contains (i) a name (ii) some private ID. Name recognition is highly subjective and would depend on locale and business case, but Voice Activity and Number detections are quite general tasks.
-**Key advantages:**
+**Key advantages / features:**
 - Modern, portable;
-- Small memory footprint (?);
+- Small memory footprint;
 - Trained on huge spoken corpora and noise / sound libraries;
 - Slower than WebRTC, but sufficiently fast for IOT / edge / mobile applications;
 - Superior metrics to WebRTC;
@@ -44,53 +43,28 @@ Also in enterprise it is crucial to be able to anonymize large-scale spoken corp
 - Voice detection for IOT / edge / mobile use cases;
 - Data cleaning and preparation, number and voice detection in general;
-
-Key features / differences:
-
 ## Getting Started
-All of the provided models are listed in the [models.yml](https://github.com/snakers4/silero-models/blob/master/models.yml) file.
-Any meta-data and newer versions will be added there.
+The models are small enough to be included directly into this repository. Newer models will supersede older models directly.
-Currently we provide the following checkpoints:
+Currently we provide the following models:
-| | PyTorch | ONNX | Quantization | Languages | Colab |
-|-----------------|--------------------|--------------------|--------------|---------|-------|
-| VAD v1 (vad_v1) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | `ru`, `en`, `de`, `es` |
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb) |
+| | Released |PyTorch | ONNX | VAD | Number Detector | Language Classifier | Languages | Colab |
+|----|------------|-------------------|--------------------|---------------------| --------------------|---------------------|-------------------------|-------|
+| v1 | 2020-12-15 |:heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | | | `ru`, `en`, `de`, `es` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb) |
+Version history:
+
+- v1, 2020-12-15, initial release, no Number Detector or Language Classifier heads yet;
 ### PyTorch
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb)
-[![Open on Torch Hub](https://img.shields.io/badge/Torch-Hub-red?logo=pytorch&style=for-the-badge)](https://pytorch.org/hub/snakers4_silero-vad/)
+[![Open on Torch Hub](https://img.shields.io/badge/Torch-Hub-red?logo=pytorch&style=for-the-badge)](https://pytorch.org/hub/snakers4_silero-vad/) (coming soon)
 ```python
-import torch
-import zipfile
-import torchaudio
-from glob import glob
-
-device = torch.device('cpu') # gpu also works, but our models are fast enough for CPU
-model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
-                                       model='silero_stt',
-                                       language='en', # also available 'de', 'es'
-                                       device=device)
-(read_batch, split_into_batches,
- read_audio, prepare_model_input) = utils # see function signature for details
-
-# download a single file, any format compatible with TorchAudio (soundfile backend)
-torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
-                               dst ='speech_orig.wav', progress=True)
-test_files = glob('speech_orig.wav')
-batches = split_into_batches(test_files, batch_size=10)
-input = prepare_model_input(read_batch(batches[0]),
-                            device=device)
-
-output = model(input)
-for example in output:
-    print(decoder(example.cpu()))
+TBD
 ```
 ### ONNX
@@ -100,42 +74,7 @@ for example in output:
 You can run our model everywhere, where you can import the ONNX model or run ONNX runtime.
 ```python
-import onnx
-import torch
-import onnxruntime
-from omegaconf import OmegaConf
-
-language = 'en' # also available 'de', 'es'
-
-# load provided utils
-_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
-(read_batch, split_into_batches,
- read_audio, prepare_model_input) = utils
-
-# see available models
-torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
-models = OmegaConf.load('models.yml')
-available_languages = list(models.stt_models.keys())
-assert language in available_languages
-
-# load the actual ONNX model
-torch.hub.download_url_to_file(models.stt_models.en.latest.onnx, 'model.onnx', progress=True)
-onnx_model = onnx.load('model.onnx')
-onnx.checker.check_model(onnx_model)
-ort_session = onnxruntime.InferenceSession('model.onnx')
-
-# download a single file, any format compatible with TorchAudio (soundfile backend)
-torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)
-test_files = ['speech_orig.wav']
-batches = split_into_batches(test_files, batch_size=10)
-input = prepare_model_input(read_batch(batches[0]))
-
-# actual onnx inference and decoding
-onnx_input = input.detach().cpu().numpy()
-ort_inputs = {'input': onnx_input}
-ort_outs = ort_session.run(None, ort_inputs)
-decoded = decoder(torch.Tensor(ort_outs[0])[0])
-print(decoded)
+TBD
 ```
 ## Metrics
@@ -152,11 +91,8 @@ Quality metrics here.
 ### Get in Touch
-Try our models, create an [issue](https://github.com/snakers4/silero-models/issues/new), join our [chat](https://t.me/joinchat/Bv9tjhpdXTI22OUgpOIIDg), [email](mailto:hello@silero.ai) us.
+Try our models, create an [issue](https://github.com/snakers4/silero-vad/issues/new), start a [discussion](https://github.com/snakers4/silero-vad/discussions/new), join our telegram [chat](https://t.me/joinchat/Bv9tjhpdXTI22OUgpOIIDg), [email](mailto:hello@silero.ai) us.
 ### Commercial Inquiries
-Please see our [wiki](https://github.com/snakers4/silero-models/wiki) and [tiers](https://github.com/snakers4/silero-models/wiki/Licensing-and-Tiers) for relevant information and [email](mailto:hello@silero.ai) us.
-
-# History
-
+Please see our [wiki](https://github.com/snakers4/silero-models/wiki) and [tiers](https://github.com/snakers4/silero-models/wiki/Licensing-and-Tiers) for relevant information and [email](mailto:hello@silero.ai) us directly.