gradio-webrtc/cookbook.md at ee049cd4bc93c040374c5e1f218cdf5e1406f47a

mirror of https://github.com/HumanAIGC-Engineering/gradio-webrtc.git synced 2026-02-05 01:49:23 +08:00

Freddy Boulton 6517a93472 Clean up cancelled generators (#124 )

Co-authored-by: Freddy Boulton <freddyboulton@hf-freddy.local>

2025-03-04 18:08:10 -05:00

A collection of applications built with FastRTC. Click on the tags below to find the app you're looking for!

Tags: audio, video, llm, computer-vision, real-time-api, voice chat, code generation, stopword, transcription, SambaNova, Groq, ElevenLabs, Kyutai, Agentic

🗣️{ .lg .middle }👀{ .lg .middle } Gemini Audio Video Chat {: data-tags="audio,video,real-time-api"}

Stream BOTH your webcam video and audio feeds to Google Gemini. You can also upload images to augment your conversation!

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code
🗣️{ .lg .middle } Google Gemini Real Time Voice API {: data-tags="audio,real-time-api,voice-chat"}

Talk to Gemini in real time using Google's voice API.

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code
🗣️{ .lg .middle } OpenAI Real Time Voice API {: data-tags="audio,real-time-api,voice-chat"}

Talk to ChatGPT in real time using OpenAI's voice API.

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code
🤖{ .lg .middle } Hello Computer {: data-tags="llm,stopword,sambanova"}

Say "computer" before asking your question!

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code
🤖{ .lg .middle } Llama Code Editor {: data-tags="audio,llm,code-generation,groq,stopword"}

Create and edit HTML pages with just your voice! Powered by Groq!

:octicons-arrow-right-24: Demo

:octicons-code-16: Code
🗣️{ .lg .middle } SmolAgents with Voice {: data-tags="audio,llm,voice-chat,agentic"}

Build a voice-based smolagent to find a coworking space!

:octicons-arrow-right-24: Demo

:octicons-code-16: Code
🗣️{ .lg .middle } Talk to Claude {: data-tags="audio,llm,voice-chat"}

Use the Anthropic and Play.Ht APIs to have an audio conversation with Claude.

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code
🎵{ .lg .middle } LLM Voice Chat {: data-tags="audio,llm,voice-chat,groq,elevenlabs"}

Talk to an LLM with ElevenLabs!

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code
🎵{ .lg .middle } Whisper Transcription {: data-tags="audio,transcription,groq"}

Have Whisper transcribe your speech in real time!

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code
🤖{ .lg .middle } Talk to SambaNova {: data-tags="llm,stopword,sambanova"}

Talk to Llama 3.2 with the SambaNova API.

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code
🗣️{ .lg .middle } Hello Llama: Stop Word Detection {: data-tags="audio,llm,code-generation,stopword,sambanova"}

A code editor built with Llama 3.3 70B that is triggered by the phrase "Hello Llama". Build a Siri-like coding assistant in 100 lines of code!

:octicons-arrow-right-24: Demo

:octicons-code-16: Code
🗣️{ .lg .middle } Audio Input/Output with mini-omni2 {: data-tags="audio,llm,voice-chat"}

Build a GPT-4o-like experience with mini-omni2, an audio-native LLM.

:octicons-arrow-right-24: Demo

:octicons-code-16: Code
🗣️{ .lg .middle } Kyutai Moshi {: data-tags="audio,llm,voice-chat,kyutai"}

Kyutai's Moshi is a novel speech-to-speech model for modeling human conversations.

:octicons-arrow-right-24: Demo

:octicons-code-16: Code
🗣️{ .lg .middle } Talk to Ultravox {: data-tags="audio,llm,voice-chat"}

Talk to Fixie.AI's audio-native Ultravox LLM with the transformers library.

:octicons-arrow-right-24: Demo

:octicons-code-16: Code
🗣️{ .lg .middle } Talk to Llama 3.2 3B {: data-tags="audio,llm,voice-chat"}

Use the Lepton API to make Llama 3.2 talk back to you!

:octicons-arrow-right-24: Demo

:octicons-code-16: Code
🤖{ .lg .middle } Talk to Qwen2-Audio {: data-tags="audio,llm,voice-chat"}

Qwen2-Audio is a state-of-the-art audio-to-text LLM developed by Alibaba.

:octicons-arrow-right-24: Demo

:octicons-code-16: Code
📷{ .lg .middle } YOLOv10 Object Detection {: data-tags="video,computer-vision"}

Run the YOLOv10 model on a user webcam stream in real time!

:octicons-arrow-right-24: Demo

:octicons-code-16: Code
📷{ .lg .middle } Video Object Detection with RT-DETR {: data-tags="video,computer-vision"}

Upload a video and stream out frames with detected objects, powered by the RT-DETR model.

:octicons-arrow-right-24: Demo

:octicons-code-16: Code
🔈{ .lg .middle } Text-to-Speech with Parler {: data-tags="audio"}

Stream out audio generated by Parler TTS!

:octicons-arrow-right-24: Demo

:octicons-code-16: Code
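
Several of the apps above (Hello Computer, Llama Code Editor, Hello Llama) only respond after a trigger phrase such as "computer" or "Hello Llama" is spoken. As a rough illustration of that idea, the sketch below gates on a stop word in an already-transcribed string. The function name `extract_command` and the transcript-based approach are assumptions made for this example; they are not FastRTC's API, and the linked demos work on live audio rather than plain text.

```python
import re
from typing import Optional


def extract_command(transcript: str, stop_word: str = "computer") -> Optional[str]:
    """Return the text spoken after the stop word, or None if it was not said.

    Hypothetical helper for illustration only; not part of FastRTC.
    """
    # Match the stop word as a whole word, ignoring case, and swallow any
    # whitespace or punctuation that immediately follows it.
    pattern = re.compile(
        rf"\b{re.escape(stop_word)}\b[\s,.!?:;]*",
        re.IGNORECASE,
    )
    match = pattern.search(transcript)
    if match is None:
        # The trigger phrase was never spoken, so the app should stay silent.
        return None
    # Everything after the trigger phrase is treated as the user's request.
    return transcript[match.end():].strip()


if __name__ == "__main__":
    print(extract_command("Computer, what is the weather?"))
    print(extract_command("Hello Llama, make the button blue", stop_word="hello llama"))
    print(extract_command("what is the weather?"))
```

Matching on word boundaries means that "computers are great" does not trigger the assistant, while "Computer, ..." does. A real app would run a check like this on each new chunk of transcribed speech before forwarding the request to the LLM.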
