Add some utils fns, add moshi to cookbook, fix querySelector, support async functions in ReplyOnPause (#29)

* add

* add code
Freddy Boulton
2024-12-04 15:14:19 -05:00
committed by GitHub
parent c85c117576
commit 868e0bfa64
9 changed files with 158 additions and 10 deletions


@@ -24,6 +24,18 @@
[:octicons-code-16: Code](https://huggingface.co/spaces/freddyaboulton/talk-to-claude/blob/main/app.py)
- :speaking_head:{ .lg .middle } __Kyutai Moshi__
---
Kyutai's Moshi is a novel speech-to-speech model for modeling human conversations.
<video width=98% src="https://github.com/user-attachments/assets/becc7a13-9e89-4a19-9df2-5fb1467a0137" controls style="text-align: center"></video>
[:octicons-arrow-right-24: Demo](https://huggingface.co/spaces/freddyaboulton/talk-to-moshi)
[:octicons-code-16: Code](https://huggingface.co/spaces/freddyaboulton/talk-to-moshi/blob/main/app.py)
- :robot:{ .lg .middle } __Llama Code Editor__
---


@@ -22,7 +22,4 @@ pip install gradio_webrtc[vad]
```
## Examples
1. [Object Detection from Webcam with YOLOv10](https://huggingface.co/spaces/freddyaboulton/webrtc-yolov10n) 📷
2. [Streaming Object Detection from Video with RT-DETR](https://huggingface.co/spaces/freddyaboulton/rt-detr-object-detection-webrtc) 🎥
3. [Text-to-Speech](https://huggingface.co/spaces/freddyaboulton/parler-tts-streaming-webrtc) 🗣️
4. [Conversational AI](https://huggingface.co/spaces/freddyaboulton/omni-mini-webrtc) 🤖🗣️
See the [cookbook](/cookbook)

docs/utils.md Normal file

@@ -0,0 +1,54 @@
# Utils
## `audio_to_bytes`
Convert an audio tuple containing sample rate and numpy array data into bytes.
Useful for sending data to external APIs from a `ReplyOnPause` handler.
Parameters
```
audio : tuple[int, np.ndarray]
A tuple containing:
- sample_rate (int): The audio sample rate in Hz
- data (np.ndarray): The audio data as a numpy array
```
Returns
```
bytes
The audio data encoded as bytes, suitable for transmission or storage
```
Example
```python
>>> import numpy as np
>>> sample_rate = 44100
>>> audio_data = np.array([0.1, -0.2, 0.3]) # Example audio samples
>>> audio_tuple = (sample_rate, audio_data)
>>> audio_bytes = audio_to_bytes(audio_tuple)
```
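To make the tuple-to-bytes idea concrete, here is a minimal sketch of one way such a conversion could work, using only the standard-library `wave` module and NumPy. This is not the implementation of `audio_to_bytes`: the helper name `tuple_to_wav_bytes`, the WAV container, the mono layout, and the 16-bit scaling are all assumptions made for illustration.

```python
import io
import wave

import numpy as np


def tuple_to_wav_bytes(audio: tuple[int, np.ndarray]) -> bytes:
    """Hypothetical helper: encode a (sample_rate, samples) tuple as WAV bytes."""
    sample_rate, data = audio
    # Assumption: float samples lie in [-1.0, 1.0] and are scaled to 16-bit PCM.
    if np.issubdtype(data.dtype, np.floating):
        data = np.clip(data, -1.0, 1.0) * 32767
    pcm = data.astype("<i2")  # little-endian int16, as WAV expects
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)  # assumption: mono audio
        wav_file.setsampwidth(2)  # 2 bytes per sample (16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return buffer.getvalue()


wav_bytes = tuple_to_wav_bytes((44100, np.array([0.1, -0.2, 0.3])))
```

The resulting bytes start with a standard RIFF header, so they can be attached directly to an HTTP request body or written to disk unchanged.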
## `audio_to_file`
Save an audio tuple containing sample rate and numpy array data to a file.
Parameters
```
audio : tuple[int, np.ndarray]
A tuple containing:
- sample_rate (int): The audio sample rate in Hz
- data (np.ndarray): The audio data as a numpy array
```
Returns
```
str
The path to the saved audio file
```
Example
```python
>>> import numpy as np
>>> sample_rate = 44100
>>> audio_data = np.array([0.1, -0.2, 0.3]) # Example audio samples
>>> audio_tuple = (sample_rate, audio_data)
>>> file_path = audio_to_file(audio_tuple)
>>> print(f"Audio saved to: {file_path}")
```
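For readers curious what saving an audio tuple to disk might involve, the sketch below writes mono 16-bit samples to a temporary WAV file and returns its path. It is an illustration only, not the implementation of `audio_to_file`: the helper name `save_int16_wav`, the `.wav` format, and the use of a temporary file are assumptions.

```python
import tempfile
import wave

import numpy as np


def save_int16_wav(audio: tuple[int, np.ndarray]) -> str:
    """Hypothetical helper: write mono int16 samples to a temporary WAV file."""
    sample_rate, data = audio
    # delete=False keeps the file on disk after the handle is closed,
    # so the returned path stays valid for the caller.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        path = tmp.name
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # assumption: mono audio
        wav_file.setsampwidth(2)  # 2 bytes per sample (16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(data.astype("<i2").tobytes())
    return path


# 160 silent samples at 16 kHz, i.e. 10 ms of audio.
file_path = save_int16_wav((16000, np.zeros(160, dtype=np.int16)))
```

Because the file is created with `delete=False`, the caller is responsible for removing it once it is no longer needed.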