A collection of VAD models ready to use with FastRTC. Click on the tags below to find the VAD model you're looking for!
pytorch
-
🗣️{ .lg .middle }👀{ .lg .middle } Your VAD Model {: data-tags="pytorch"}
Description
Install Instructions
Usage
[:octicons-arrow-right-24: Demo](Your demo here)
[:octicons-code-16: Repository](Code here)
How to add your own VAD model
- Your model can be implemented in any framework you want, but it must implement the `PauseDetectionModel` protocol.

    ```python
    from typing import Any, Protocol, TypeAlias

    import numpy as np
    from numpy.typing import NDArray

    # AudioChunk is FastRTC's chunk type: a dictionary with `start` and `end` fields.
    ModelOptions: TypeAlias = Any


    class PauseDetectionModel(Protocol):
        def vad(
            self,
            audio: tuple[int, NDArray[np.int16] | NDArray[np.float32]],
            options: ModelOptions,
        ) -> tuple[float, list[AudioChunk]]: ...

        def warmup(
            self,
        ) -> None: ...
    ```

    - The `vad` method should take the audio data and return a tuple of the form `(speech_duration, list[AudioChunk])`, where `speech_duration` is the duration of the human speech in the audio and each `AudioChunk` is a dictionary with the fields `start` and `end`: the start and end times of a span of human speech in the audio array.
    - The `audio` tuple should be of the form `(sample_rate, audio_array)`, where `sample_rate` is the sample rate of the audio array and `audio_array` is a numpy array of the audio data. It can be of type `np.int16` or `np.float32`.
    - The `warmup` method is optional but recommended to warm up the model when the server starts.
- Once you have your model implemented, you can use it in the `ReplyOnPause` class by passing in the model and any options you need.

    ```python
    from fastrtc import ReplyOnPause, Stream
    from your_model import YourModel, YourModelOptions


    def echo(audio):
        yield audio


    model = YourModel()  # implements the PauseDetectionModel protocol
    reply_on_pause = ReplyOnPause(
        echo,
        model=model,
        # Keyword assumed to be `model_options`; check the ReplyOnPause
        # signature of your installed FastRTC version.
        model_options=YourModelOptions(),
    )
    stream = Stream(reply_on_pause, mode="send-receive", modality="audio")
    stream.ui.launch()
    ```

- Open a PR to add your model to the gallery! Ideally, your model package should be pip-installable so others can try it out easily.