mirror of
https://github.com/HumanAIGC-Engineering/gradio-webrtc.git
synced 2026-02-04 17:39:23 +08:00
# Audio-Video Streaming
You can stream audio and video simultaneously using `AudioVideoStreamHandler` or `AsyncAudioVideoStreamHandler`.

They are identical to the audio `StreamHandlers`, with the addition of `video_receive` and `video_emit` methods, which take and return a `numpy` array, respectively.

Here is an example of the video handling functions for connecting with the Gemini multimodal API. In this case, we simply reflect the webcam feed back to the user, but at most once per second we also send the latest webcam frame (and an additional image component, if one is set) to the Gemini server.

Please see the "Gemini Audio Video Chat" example in the [cookbook](../../cookbook) for the complete code.
``` python title="Async Gemini Video Handling"
# Excerpt: methods of the handler class. `time` and `numpy as np` are imported
# at module level; see the cookbook for `encode_image` and the rest of the class.
async def video_receive(self, frame: np.ndarray):
    """Send video frames to the server"""
    if self.session:
        # send image every 1 second
        # otherwise we flood the API
        if time.time() - self.last_frame_time > 1:
            self.last_frame_time = time.time()
            await self.session.send(encode_image(frame))
            # also send the additional image component, if one was provided
            if self.latest_args[2] is not None:
                await self.session.send(encode_image(self.latest_args[2]))
    # always queue the frame so it is reflected back to the user
    self.video_queue.put_nowait(frame)

async def video_emit(self) -> VideoEmitType:
    """Return video frames to the client"""
    return await self.video_queue.get()
```
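
The throttle-and-echo pattern in the example above can be tried without Gemini or a webcam. The following is a minimal sketch, not the library's API: `ThrottledEcho`, its `sent` list (standing in for `self.session.send`), and the injectable `clock` argument are all hypothetical names introduced for illustration, and plain strings stand in for `numpy` frames. It also defaults to `time.monotonic` instead of `time.time()`, which is an assumption made here so the interval is not affected by system clock changes.

```python
import asyncio
import time


class ThrottledEcho:
    """Hypothetical stand-in for the video half of a handler (illustration only)."""

    def __init__(self, min_interval: float = 1.0, clock=None):
        # `clock` is injectable so the throttle can be tested without sleeping.
        self.clock = clock or time.monotonic
        self.min_interval = min_interval
        self.last_frame_time = float("-inf")  # so the first frame is always sent
        self.sent = []  # stands in for frames forwarded to the remote session
        self.video_queue: asyncio.Queue = asyncio.Queue()

    async def video_receive(self, frame):
        """Forward at most one frame per `min_interval`, but echo every frame."""
        now = self.clock()
        if now - self.last_frame_time > self.min_interval:
            self.last_frame_time = now
            self.sent.append(frame)
        self.video_queue.put_nowait(frame)

    async def video_emit(self):
        """Return the next queued frame to the client."""
        return await self.video_queue.get()


async def demo():
    fake_now = [0.0]  # manually advanced fake clock
    handler = ThrottledEcho(clock=lambda: fake_now[0])
    echoed = []
    for i in range(6):
        fake_now[0] = i * 0.5  # frames arrive every half second
        await handler.video_receive(f"frame-{i}")
        echoed.append(await handler.video_emit())
    return handler.sent, echoed


if __name__ == "__main__":
    sent, echoed = asyncio.run(demo())
    print("forwarded:", sent)
    print("echoed:", echoed)
```

With frames arriving every 0.5 seconds, every frame is echoed back, while only a subset is forwarded. Because the comparison is a strict `>`, a frame that arrives exactly one interval after the last forwarded frame is skipped.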