@@ -140,4 +140,21 @@ You can control the button color and pulse color with `icon_button_color` and `pulse_color`

``` python
webrtc = WebRTC(
    label="Video Chat",
    icon_button_color="white",  # reconstructed illustrative value; any valid CSS color works
    pulse_color="black",
)
```

<img src="https://github.com/user-attachments/assets/39e9bb0b-53fb-448e-be44-d37f6785b4b6">
## Changing the Button Text

You can supply a `button_labels` dictionary to change the text of the `Start`, `Stop`, and `Waiting` buttons shown in the UI.
The keys must be `"start"`, `"stop"`, and `"waiting"`.
``` python
webrtc = WebRTC(
    label="Video Chat",
    modality="audio-video",
    mode="send-receive",
    button_labels={"start": "Start Talking to Gemini"},
)
```

<img src="https://github.com/user-attachments/assets/04be0b95-189c-4b4b-b8cc-1eb598529dd3" />
@@ -1,16 +1,16 @@

<div class="grid cards" markdown>

- :speaking_head:{ .lg .middle }:eyes:{ .lg .middle } __Gemini Audio Video Chat__

    ---

    Stream BOTH your webcam video and audio feeds to Google Gemini. You can also upload images to augment your conversation!

    <video width=98% src="https://github.com/user-attachments/assets/9636dc97-4fee-46bb-abb8-b92e69c08c71" controls style="text-align: center"></video>

    [:octicons-arrow-right-24: Demo](https://huggingface.co/spaces/freddyaboulton/gemini-audio-video-chat)

    [:octicons-code-16: Code](https://huggingface.co/spaces/freddyaboulton/gemini-audio-video-chat/blob/main/app.py)

- :speaking_head:{ .lg .middle } __Google Gemini Real Time Voice API__
@@ -24,6 +24,18 @@

    [:octicons-code-16: Code](https://huggingface.co/spaces/freddyaboulton/gemini-voice/blob/main/app.py)

- :speaking_head:{ .lg .middle } __OpenAI Real Time Voice API__

    ---

    Talk to ChatGPT in real time using OpenAI's voice API.

    <video width=98% src="https://github.com/user-attachments/assets/41a63376-43ec-496a-9b31-4f067d3903d6" controls style="text-align: center"></video>

    [:octicons-arrow-right-24: Demo](https://huggingface.co/spaces/freddyaboulton/openai-realtime-voice)

    [:octicons-code-16: Code](https://huggingface.co/spaces/freddyaboulton/openai-realtime-voice/blob/main/app.py)

- :speaking_head:{ .lg .middle } __Hello Llama: Stop Word Detection__

    ---

@@ -419,6 +419,34 @@ Set up a server-to-client stream to stream video from an arbitrary user interaction

2. Set `mode="receive"` to only receive audio from the server.
3. The `trigger` parameter is the Gradio event that will start the stream, in this case the button's click event (see the sketch after this list).
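
As a rough illustration of how `mode="receive"` and `trigger` fit together, here is a minimal sketch. The `generation` function, its slider input, and all component values are illustrative assumptions, not code from this repository:

``` python
import numpy as np
import gradio as gr
from gradio_webrtc import WebRTC

def generation(num_steps):
    """Illustrative audio source: one second of a 440 Hz tone per step."""
    sample_rate = 24000
    t = np.linspace(0, 1, sample_rate, endpoint=False)
    tone = (np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
    for _ in range(int(num_steps)):
        # receive-mode streams yield (sample_rate, audio_array) chunks
        yield (sample_rate, tone.reshape(1, -1))

with gr.Blocks() as demo:
    num_steps = gr.Slider(1, 10, value=5, step=1, label="Seconds of audio")
    audio = WebRTC(label="Stream", mode="receive", modality="audio")
    button = gr.Button("Start Stream")
    # `trigger` ties the stream to a Gradio event, here the button click
    audio.stream(
        fn=generation,
        inputs=[num_steps],
        outputs=[audio],
        trigger=button.click,
    )

demo.launch()
```
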
## Audio-Video Streaming

You can stream audio and video simultaneously to/from a server using `AudioVideoStreamHandler` or `AsyncAudioVideoStreamHandler`.
They are identical to the audio `StreamHandler`s, with the addition of `video_receive` and `video_emit` methods, which take and return a `numpy` array, respectively.

Here is an example of the video-handling functions for connecting to the Gemini multimodal API. In this case, we simply reflect the webcam feed back to the user, but once per second we send the latest webcam frame (and an additional image component) to the Gemini server.

Please see the "Gemini Audio Video Chat" example in the [cookbook](/cookbook) for the complete code.
``` python title="Async Gemini Video Handling"
    async def video_receive(self, frame: np.ndarray):
        """Send video frames to the server"""
        if self.session:
            # send image every 1 second
            # otherwise we flood the API
            if time.time() - self.last_frame_time > 1:
                self.last_frame_time = time.time()
                await self.session.send(encode_image(frame))
                if self.latest_args[2] is not None:
                    await self.session.send(encode_image(self.latest_args[2]))
        # reflect the webcam feed back to the client
        self.video_queue.put_nowait(frame)

    async def video_emit(self) -> VideoEmitType:
        """Return video frames to the client"""
        return await self.video_queue.get()
```
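
For context on how such a handler attaches to the UI, here is a minimal mounting sketch. The `GeminiHandler` class name, the extra `api_key` and `image` inputs, and the `time_limit` value are assumptions modeled on the cookbook example, not verbatim code:

``` python
import gradio as gr
from gradio_webrtc import WebRTC

# hypothetical: the AsyncAudioVideoStreamHandler subclass that defines
# the video_receive/video_emit methods shown above
from app import GeminiHandler

with gr.Blocks() as demo:
    api_key = gr.Textbox(label="API Key", type="password")
    image = gr.Image(label="Extra image to send to Gemini", type="numpy")
    webrtc = WebRTC(
        label="Video Chat",
        modality="audio-video",
        mode="send-receive",
    )
    # with inputs ordered like this, the image component arrives as
    # self.latest_args[2] inside the handler
    webrtc.stream(
        GeminiHandler(),
        inputs=[webrtc, api_key, image],
        outputs=[webrtc],
        time_limit=90,
    )

demo.launch()
```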

## Additional Outputs