gradio-webrtc/cookbook.md at aafd7b82eb94653b7ae66f8be66bb349d43a86ff

mirror of https://github.com/HumanAIGC-Engineering/gradio-webrtc.git synced 2026-02-05 01:49:23 +08:00

Freddy Boulton 853d6a06b5 Rebrand to FastRTC (#60)

2025-02-24 01:13:42 -05:00

A collection of applications built with FastRTC. Click on the tags below to find the app you're looking for!

audio, video, llm, computer-vision, real-time-api, voice chat, code generation, stopword, transcription, SambaNova, Groq, ElevenLabs, Kyutai

🗣️{ .lg .middle }👀{ .lg .middle } Gemini Audio Video Chat {: data-tags="audio,video,real-time-api"}

Stream BOTH your webcam video and audio feeds to Google Gemini. You can also upload images to augment your conversation!

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code

🗣️{ .lg .middle } Google Gemini Real Time Voice API {: data-tags="audio,real-time-api,voice-chat"}

Talk to Gemini in real time using Google's voice API.

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code

🗣️{ .lg .middle } OpenAI Real Time Voice API {: data-tags="audio,real-time-api,voice-chat"}

Talk to ChatGPT in real time using OpenAI's voice API.

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code

🤖{ .lg .middle } Hello Computer {: data-tags="llm,stopword,sambanova"}

Say "computer" before asking your question!

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code

🤖{ .lg .middle } Llama Code Editor {: data-tags="audio,llm,code-generation,groq,stopword"}

Create and edit HTML pages with just your voice! Powered by Groq!

:octicons-arrow-right-24: Demo

:octicons-code-16: Code

🗣️{ .lg .middle } Talk to Claude {: data-tags="audio,llm,voice-chat"}

Use the Anthropic and Play.Ht APIs to have an audio conversation with Claude.

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code

🎵{ .lg .middle } LLM Voice Chat {: data-tags="audio,llm,voice-chat,groq,elevenlabs"}

Talk to an LLM with ElevenLabs!

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code

🎵{ .lg .middle } Whisper Transcription {: data-tags="audio,transcription,groq"}

Have Whisper transcribe your speech in real time!

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code

🤖{ .lg .middle } Talk to SambaNova {: data-tags="llm,stopword,sambanova"}

Talk to Llama 3.2 with the SambaNova API.

:octicons-arrow-right-24: Demo

:octicons-arrow-right-24: Gradio UI

:octicons-code-16: Code

🗣️{ .lg .middle } Hello Llama: Stop Word Detection {: data-tags="audio,llm,code-generation,stopword,sambanova"}

A code editor built with Llama 3.3 70B that is triggered by the phrase "Hello Llama". Build a Siri-like coding assistant in 100 lines of code!

:octicons-arrow-right-24: Demo

:octicons-code-16: Code

🗣️{ .lg .middle } Audio Input/Output with mini-omni2 {: data-tags="audio,llm,voice-chat"}

Build a GPT-4o-like experience with mini-omni2, an audio-native LLM.

:octicons-arrow-right-24: Demo

:octicons-code-16: Code

🗣️{ .lg .middle } Kyutai Moshi {: data-tags="audio,llm,voice-chat,kyutai"}

Kyutai's Moshi is a novel speech-to-speech model for modeling human conversations.

:octicons-arrow-right-24: Demo

:octicons-code-16: Code

🗣️{ .lg .middle } Talk to Ultravox {: data-tags="audio,llm,voice-chat"}

Talk to Fixie.AI's audio-native Ultravox LLM with the transformers library.

:octicons-arrow-right-24: Demo

:octicons-code-16: Code

🗣️{ .lg .middle } Talk to Llama 3.2 3B {: data-tags="audio,llm,voice-chat"}

Use the Lepton API to make Llama 3.2 talk back to you!

:octicons-arrow-right-24: Demo

:octicons-code-16: Code

🤖{ .lg .middle } Talk to Qwen2-Audio {: data-tags="audio,llm,voice-chat"}

Qwen2-Audio is a state-of-the-art audio-to-text LLM developed by Alibaba.

:octicons-arrow-right-24: Demo

:octicons-code-16: Code

📷{ .lg .middle } YOLOv10 Object Detection {: data-tags="video,computer-vision"}

Run the YOLOv10 model on a user's webcam stream in real time!

:octicons-arrow-right-24: Demo

:octicons-code-16: Code

📷{ .lg .middle } Video Object Detection with RT-DETR {: data-tags="video,computer-vision"}

Upload a video and stream out frames with detected objects, powered by the RT-DETR model.

:octicons-arrow-right-24: Demo

:octicons-code-16: Code

🔈{ .lg .middle } Text-to-Speech with Parler {: data-tags="audio"}

Stream out audio generated by Parler TTS!

:octicons-arrow-right-24: Demo

:octicons-code-16: Code
