mirror of
https://github.com/HumanAIGC-Engineering/gradio-webrtc.git
synced 2026-02-05 01:49:23 +08:00
* Add code * add code * add code * Rename messages * rename * add code * Add demo * docs + demos + bug fixes * add code * styles * user guide * Styles * Add code * misc docs updates * print nit * whisper + pr * url for images * whsiper update * Fix bugs * remove demo files * version number * Fix pypi readme * Fix * demos * Add llama code editor * Update llama code editor and object detection cookbook * Add more cookbook demos * add code * Fix links for PR deploys * add code * Fix the install * add tts * TTS docs * Typo * Pending bubbles for reply on pause * Stream redesign (#63) * better error handling * Websocket error handling * add code --------- Co-authored-by: Freddy Boulton <freddyboulton@hf-freddy.local> * remove docs from dist * Some docs typos * more typos * upload changes + docs * docs * better phone * update docs * add code * Make demos better * fix docs + websocket start_up * remove mention of FastAPI app * fastphone tweaks * add code * ReplyOnStopWord fixes * Fix cookbook * Fix pypi readme * add code * bump versions * sambanova cookbook * Fix tags * Llm voice chat * kyutai tag * Add error message to all index.html * STT module uses Moonshine * Not required from typing extensions * fix llm voice chat * Add vpn warning * demo fixes * demos * Add more ui args and gemini audio-video * update cookbook * version 9 --------- Co-authored-by: Freddy Boulton <freddyboulton@hf-freddy.local>
460 lines
16 KiB
Markdown
# Connecting via API

Before continuing, select the `modality` and `mode` of your `Stream`, and whether you're connecting over `WebRTC` or `WebSocket`.

<div class="config-selector">
    <div class="select-group">
        <label for="connection">Connection</label>
        <select id="connection" onchange="updateDocs()">
            <option value="webrtc">WebRTC</option>
            <option value="websocket">WebSocket</option>
        </select>
    </div>
    <div class="select-group">
        <label for="modality">Modality</label>
        <select id="modality" onchange="updateDocs()">
            <option value="audio">Audio</option>
            <option value="video">Video</option>
            <option value="audio-video">Audio-Video</option>
        </select>
    </div>
    <div class="select-group">
        <label for="mode">Mode</label>
        <select id="mode" onchange="updateDocs()">
            <option value="send-receive">Send-Receive</option>
            <option value="receive">Receive</option>
            <option value="send">Send</option>
        </select>
    </div>

</div>

### Sample Code
<div id="docs"></div>

### Message Format

Over both WebRTC and WebSocket, the server can send messages of the following format:

```json
{
    "type": "send_input" | "fetch_output" | "stopword" | "error" | "warning" | "log",
    "data": string | object
}
```

- `send_input`: Send any input data for the handler to the server. See [`Additional Inputs`](#additional-inputs) for more details.
- `fetch_output`: An instance of [`AdditionalOutputs`](#additional-outputs) is available on the server and can be fetched.
- `stopword`: The stopword has been detected. See [`ReplyOnStopWords`](../audio/#reply-on-stopwords) for more details.
- `error`: An error occurred. The `data` will be a string containing the error message.
- `warning`: A warning occurred. The `data` will be a string containing the warning message.
- `log`: A log message. The `data` will be a string containing the log message.

The `ReplyOnPause` handler can also send the following `log` messages:

```json
{
    "type": "log",
    "data": "pause_detected" | "response_starting"
}
```

!!! tip
    When using WebRTC, the messages will be encoded as strings, so parse them as JSON before use.
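
As a rough illustration of the message format above, the sketch below parses one message and dispatches on its `type`. It assumes a Python client that receives each message either as a JSON string or as an already decoded dictionary. The names `parse_message`, `describe`, and `KNOWN_TYPES` are hypothetical and not part of the library.

```python
import json

# Message types listed in the Message Format section.
KNOWN_TYPES = {"send_input", "fetch_output", "stopword", "error", "warning", "log"}


def parse_message(raw):
    """Return (type, data) for one server message (hypothetical helper)."""
    # Messages may arrive as JSON strings, so decode them first.
    message = json.loads(raw) if isinstance(raw, (str, bytes)) else raw
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    msg_type = message.get("type")
    if msg_type not in KNOWN_TYPES:
        raise ValueError(f"unexpected message type: {msg_type!r}")
    return msg_type, message.get("data")


def describe(raw):
    """Turn a message into a short human-readable description."""
    msg_type, data = parse_message(raw)
    if msg_type == "send_input":
        return "server is ready for updated inputs"
    if msg_type == "fetch_output":
        return "new additional output is available"
    if msg_type == "stopword":
        return "stopword detected"
    # error, warning and log carry a string payload
    return f"{msg_type}: {data}"
```

For example, `describe('{"type": "log", "data": "pause_detected"}')` yields `"log: pause_detected"`, which a client could use to update its UI when `ReplyOnPause` detects a pause.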

### Additional Inputs

When the `send_input` message is received, update the inputs of your handler however you like by using the `set_input` method of the `Stream` object.

A common pattern is to use a `POST` request to send the updated data. The first argument to the `set_input` method is the `webrtc_id` of the handler.

```python
from pydantic import BaseModel, Field

# `app` is your FastAPI app and `stream` is your `Stream` object.


class InputData(BaseModel):
    webrtc_id: str
    conf_threshold: float = Field(ge=0, le=1)


@app.post("/input_hook")
async def _(data: InputData):
    stream.set_input(data.webrtc_id, data.conf_threshold)
```

The updated data will be passed to the handler on the **next** call.
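
On the client side, the matching request could be built as in the sketch below. It assumes a Python client using only the standard library and the `/input_hook` route and `InputData` fields from the example above. `build_input_request` and the `base_url` parameter are hypothetical names introduced for illustration.

```python
import json
import urllib.request


def build_input_request(base_url, webrtc_id, conf_threshold):
    """Build (but do not send) the POST request for the /input_hook route."""
    # Mirror the server-side validation (0 <= conf_threshold <= 1).
    if not 0 <= conf_threshold <= 1:
        raise ValueError("conf_threshold must be between 0 and 1")
    body = json.dumps(
        {"webrtc_id": webrtc_id, "conf_threshold": conf_threshold}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/input_hook",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. Building the request separately keeps the payload easy to inspect before anything goes over the network.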

### Additional Outputs

The `fetch_output` message is sent to the client whenever an instance of [`AdditionalOutputs`](../streams/#additional-outputs) is available. You can access the latest output data by calling the `fetch_latest_output` method of the `Stream` object.

However, rather than fetching each output manually, a common pattern is to fetch the entire stream of output data by calling the `output_stream` method.

Here is an example:
```python
from fastapi.responses import StreamingResponse

@app.get("/updates")
async def stream_updates(webrtc_id: str):
    async def output_stream():
        async for output in stream.output_stream(webrtc_id):
            # Output is the AdditionalOutputs instance
            # Be sure to serialize it however you would like
            yield f"data: {output.args[0]}\n\n"

    return StreamingResponse(
        output_stream(),
        media_type="text/event-stream"
    )
```
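
A client consuming the `/updates` route receives `text/event-stream` data, where each event is a `data: ...` line followed by a blank line. The sketch below shows one way to recover the payloads from such lines. It assumes a Python client that already has the response as an iterable of text lines. `iter_sse_data` is a hypothetical helper, and it only handles the `data` field used in the example above.

```python
def iter_sse_data(lines):
    """Yield the payload of each server-sent event found in `lines`."""
    buffer = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            # Drop the field name and one optional leading space.
            value = line[len("data:"):]
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            # A blank line terminates the current event.
            yield "\n".join(buffer)
            buffer = []
```

Each yielded string corresponds to one serialized `AdditionalOutputs` value, in whatever format the server chose when writing the `data:` line.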

### Handling Errors

When connecting via `WebRTC`, the server will respond to the `/webrtc/offer` route with a JSON response. If there are too many connections, the server will respond with a 429 error.

```json
{
    "status": "failed",
    "meta": {
        "error": "concurrency_limit_reached",
        "limit": 10
    }
}
```

Over `WebSocket`, the server will send the same message before closing the connection.
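
A client can check for this payload before continuing with the connection. The following is a minimal sketch, assuming a Python client that has already decoded the JSON response into a dictionary. `ConcurrencyLimitReached` and `check_offer_response` are hypothetical names, not library APIs.

```python
class ConcurrencyLimitReached(RuntimeError):
    """Raised when the server reports that its connection limit is reached."""

    def __init__(self, limit):
        super().__init__(f"concurrency limit reached (limit={limit})")
        self.limit = limit


def check_offer_response(payload):
    """Return the payload unchanged, or raise if the server reported a failure."""
    if payload.get("status") == "failed":
        meta = payload.get("meta") or {}
        if meta.get("error") == "concurrency_limit_reached":
            raise ConcurrencyLimitReached(meta.get("limit"))
        raise RuntimeError(f"connection failed: {meta.get('error')}")
    return payload
```

Raising a dedicated exception lets the caller decide whether to retry later or surface the limit to the user.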

<style>
.config-selector {
    margin: 1em 0;
    display: flex;
    gap: 2em;
}

.select-group {
    display: flex;
    flex-direction: column;
    gap: 0.5em;
}

.select-group label {
    font-size: 0.8em;
    font-weight: 600;
    color: var(--md-default-fg-color--light);
}

.select-group select {
    padding: 0.5em;
    border: 1px solid var(--md-default-fg-color--lighter);
    border-radius: 4px;
    background-color: var(--md-code-bg-color);
    color: var(--md-code-fg-color);
    width: 150px;
    font-size: 0.9em;
}

/* Style code blocks to match site theme */
.rendered-content pre {
    background-color: var(--md-code-bg-color) !important;
    color: var(--md-code-fg-color) !important;
    padding: 1em;
    border-radius: 4px;
}

.rendered-content code {
    font-family: var(--md-code-font-family);
    background-color: var(--md-code-bg-color) !important;
    color: var(--md-code-fg-color) !important;
}
</style>

<script>

// doT.js
// 2011-2014, Laura Doktorova, https://github.com/olado/doT
// Licensed under the MIT license.


var doT = {
    name: "doT",
    version: "1.1.1",
    templateSettings: {
        evaluate: /\{\{([\s\S]+?(\}?)+)\}\}/g,
        interpolate: /\{\{=([\s\S]+?)\}\}/g,
        encode: /\{\{!([\s\S]+?)\}\}/g,
        use: /\{\{#([\s\S]+?)\}\}/g,
        useParams: /(^|[^\w$])def(?:\.|\[[\'\"])([\w$\.]+)(?:[\'\"]\])?\s*\:\s*([\w$\.]+|\"[^\"]+\"|\'[^\']+\'|\{[^\}]+\})/g,
        define: /\{\{##\s*([\w\.$]+)\s*(\:|=)([\s\S]+?)#\}\}/g,
        defineParams: /^\s*([\w$]+):([\s\S]+)/,
        conditional: /\{\{\?(\?)?\s*([\s\S]*?)\s*\}\}/g,
        iterate: /\{\{~\s*(?:\}\}|([\s\S]+?)\s*\:\s*([\w$]+)\s*(?:\:\s*([\w$]+))?\s*\}\})/g,
        varname: "it",
        strip: false,
        append: true,
        selfcontained: false,
        doNotSkipEncoded: false
    },
    template: undefined, //fn, compile template
    compile: undefined, //fn, for express
    log: true
}, _globals;

doT.encodeHTMLSource = function (doNotSkipEncoded) {
    var encodeHTMLRules = { "&": "&#38;", "<": "&#60;", ">": "&#62;", '"': "&#34;", "'": "&#39;", "/": "&#47;" },
        matchHTML = doNotSkipEncoded ? /[&<>"'\/]/g : /&(?!#?\w+;)|<|>|"|'|\//g;
    return function (code) {
        return code ? code.toString().replace(matchHTML, function (m) { return encodeHTMLRules[m] || m; }) : "";
    };
};

_globals = (function () { return this || (0, eval)("this"); }());

/* istanbul ignore else */
if (typeof module !== "undefined" && module.exports) {
    module.exports = doT;
} else if (typeof define === "function" && define.amd) {
    define(function () { return doT; });
} else {
    _globals.doT = doT;
}

var startend = {
    append: { start: "'+(", end: ")+'", startencode: "'+encodeHTML(" },
    split: { start: "';out+=(", end: ");out+='", startencode: "';out+=encodeHTML(" }
}, skip = /$^/;

function resolveDefs(c, block, def) {
    return ((typeof block === "string") ? block : block.toString())
        .replace(c.define || skip, function (m, code, assign, value) {
            if (code.indexOf("def.") === 0) {
                code = code.substring(4);
            }
            if (!(code in def)) {
                if (assign === ":") {
                    if (c.defineParams) value.replace(c.defineParams, function (m, param, v) {
                        def[code] = { arg: param, text: v };
                    });
                    if (!(code in def)) def[code] = value;
                } else {
                    new Function("def", "def['" + code + "']=" + value)(def);
                }
            }
            return "";
        })
        .replace(c.use || skip, function (m, code) {
            if (c.useParams) code = code.replace(c.useParams, function (m, s, d, param) {
                if (def[d] && def[d].arg && param) {
                    var rw = (d + ":" + param).replace(/'|\\/g, "_");
                    def.__exp = def.__exp || {};
                    def.__exp[rw] = def[d].text.replace(new RegExp("(^|[^\\w$])" + def[d].arg + "([^\\w$])", "g"), "$1" + param + "$2");
                    return s + "def.__exp['" + rw + "']";
                }
            });
            var v = new Function("def", "return " + code)(def);
            return v ? resolveDefs(c, v, def) : v;
        });
}

function unescape(code) {
    return code.replace(/\\('|\\)/g, "$1").replace(/[\r\t\n]/g, " ");
}

doT.template = function (tmpl, c, def) {
    c = c || doT.templateSettings;
    var cse = c.append ? startend.append : startend.split, needhtmlencode, sid = 0, indv,
        str = (c.use || c.define) ? resolveDefs(c, tmpl, def || {}) : tmpl;

    str = ("var out='" + (c.strip ? str.replace(/(^|\r|\n)\t* +| +\t*(\r|\n|$)/g, " ")
        .replace(/\r|\n|\t|\/\*[\s\S]*?\*\//g, "") : str)
        .replace(/'|\\/g, "\\$&")
        .replace(c.interpolate || skip, function (m, code) {
            return cse.start + unescape(code) + cse.end;
        })
        .replace(c.encode || skip, function (m, code) {
            needhtmlencode = true;
            return cse.startencode + unescape(code) + cse.end;
        })
        .replace(c.conditional || skip, function (m, elsecase, code) {
            return elsecase ?
                (code ? "';}else if(" + unescape(code) + "){out+='" : "';}else{out+='") :
                (code ? "';if(" + unescape(code) + "){out+='" : "';}out+='");
        })
        .replace(c.iterate || skip, function (m, iterate, vname, iname) {
            if (!iterate) return "';} } out+='";
            sid += 1; indv = iname || "i" + sid; iterate = unescape(iterate);
            return "';var arr" + sid + "=" + iterate + ";if(arr" + sid + "){var " + vname + "," + indv + "=-1,l" + sid + "=arr" + sid + ".length-1;while(" + indv + "<l" + sid + "){"
                + vname + "=arr" + sid + "[" + indv + "+=1];out+='";
        })
        .replace(c.evaluate || skip, function (m, code) {
            return "';" + unescape(code) + "out+='";
        })
        + "';return out;")
        .replace(/\n/g, "\\n").replace(/\t/g, '\\t').replace(/\r/g, "\\r")
        .replace(/(\s|;|\}|^|\{)out\+='';/g, '$1').replace(/\+''/g, "");
    //.replace(/(\s|;|\}|^|\{)out\+=''\+/g,'$1out+=');

    if (needhtmlencode) {
        if (!c.selfcontained && _globals && !_globals._encodeHTML) _globals._encodeHTML = doT.encodeHTMLSource(c.doNotSkipEncoded);
        str = "var encodeHTML = typeof _encodeHTML !== 'undefined' ? _encodeHTML : ("
            + doT.encodeHTMLSource.toString() + "(" + (c.doNotSkipEncoded || '') + "));"
            + str;
    }
    try {
        return new Function(c.varname, str);
    } catch (e) {
        /* istanbul ignore else */
        if (typeof console !== "undefined") console.log("Could not create a template function: " + str);
        throw e;
    }
};

doT.compile = function (tmpl, def) {
    return doT.template(tmpl, null, def);
};

// WebRTC template

const webrtcTemplate = doT.template(`
To connect to the server, you need to create a new RTCPeerConnection object and call the \`setupWebRTC\` function below.
{{? it.mode === "send-receive" || it.mode === "receive" }}
This code snippet assumes there is an HTML element with an id of \`{{=it.modality}}_output_component_id\` where the output will be displayed. It should be {{? it.modality === "audio"}}an \`<audio>\`{{??}}a \`<video>\`{{?}} element.
{{?}}

\`\`\`javascript
// pass any rtc_configuration params here
const pc = new RTCPeerConnection();
{{? it.mode === "send-receive" || it.mode === "receive" }}
const {{=it.modality.replace("-", "_")}}_output_component = document.getElementById("{{=it.modality}}_output_component_id");
{{?}}
async function setupWebRTC(peerConnection) {
    {{? it.mode === "send-receive" || it.mode === "send" }}
    // Get the {{=it.modality}} stream from the local device
    const stream = await navigator.mediaDevices.getUserMedia({
        {{? it.modality === "audio-video" }}audio: true,
        video: true,{{??}}{{=it.modality}}: true,{{?}}
    });
    // Send the {{=it.modality}} stream to the server
    stream.getTracks().forEach((track) => {
        peerConnection.addTrack(track, stream);
    });
    {{?? it.mode === "receive" }}
    // Receive the {{=it.modality}} stream from the server
    {{? it.modality === "audio-video" }}
    peerConnection.addTransceiver("audio", { direction: "recvonly" });
    peerConnection.addTransceiver("video", { direction: "recvonly" });
    {{??}}
    peerConnection.addTransceiver("{{=it.modality}}", { direction: "recvonly" });
    {{?}}
    {{?}}
    {{? it.mode === "send-receive" || it.mode === "receive" }}
    peerConnection.addEventListener("track", (evt) => {
        if ({{=it.modality.replace("-", "_")}}_output_component &&
            {{=it.modality.replace("-", "_")}}_output_component.srcObject !== evt.streams[0]) {
            {{=it.modality.replace("-", "_")}}_output_component.srcObject = evt.streams[0];
        }
    });
    {{?}}
    // Create data channel (needed!)
    const dataChannel = peerConnection.createDataChannel("text");

    // Create and send offer
    const offer = await peerConnection.createOffer();
    await peerConnection.setLocalDescription(offer);

    // Send offer to server
    const response = await fetch('/webrtc/offer', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            sdp: offer.sdp,
            type: offer.type,
            webrtc_id: Math.random().toString(36).substring(7)
        })
    });

    // Handle server response
    const serverResponse = await response.json();
    await peerConnection.setRemoteDescription(serverResponse);
}
\`\`\`
`);

// WebSocket template
const wsTemplate = doT.template(`
{{? it.modality !== "audio" || it.mode !== "send-receive" }}
WebSocket connections are currently only supported for audio in send-receive mode.
{{??}}

To connect to the server via WebSocket, you'll need to establish a WebSocket connection and handle audio processing. The code below assumes there is an HTML audio element for output playback.

\`\`\`javascript
// Setup audio context and stream
const audioContext = new AudioContext();
const stream = await navigator.mediaDevices.getUserMedia({
    audio: true
});

// Create WebSocket connection
const ws = new WebSocket(\`\${window.location.protocol === 'https:' ? 'wss:' : 'ws:'}//\${window.location.host}/websocket/offer\`);

ws.onopen = () => {
    // Send initial start message with unique ID
    ws.send(JSON.stringify({
        event: "start",
        websocket_id: generateId()  // Implement your own ID generator
    }));

    // Setup audio processing
    const source = audioContext.createMediaStreamSource(stream);
    const processor = audioContext.createScriptProcessor(2048, 1, 1);
    source.connect(processor);
    processor.connect(audioContext.destination);

    processor.onaudioprocess = (e) => {
        const inputData = e.inputBuffer.getChannelData(0);
        // convertToMulaw is not defined here; implement your own mu-law encoder
        const mulawData = convertToMulaw(inputData, audioContext.sampleRate);
        const base64Audio = btoa(String.fromCharCode.apply(null, mulawData));

        if (ws.readyState === WebSocket.OPEN) {
            ws.send(JSON.stringify({
                event: "media",
                media: {
                    payload: base64Audio
                }
            }));
        }
    };
};
\`\`\`
{{?}}
`);

function updateDocs() {
    // Get selected values
    const modality = document.getElementById('modality').value;
    const mode = document.getElementById('mode').value;
    const connection = document.getElementById('connection').value;

    // Context for templates
    const context = {
        modality: modality,
        mode: mode,
        additional_inputs: true,
        additional_outputs: true
    };

    // Choose template based on connection type
    const template = connection === 'webrtc' ? webrtcTemplate : wsTemplate;

    // Render docs with syntax highlighting
    const html = template(context);
    const docsDiv = document.getElementById('docs');
    docsDiv.innerHTML = marked.parse(html);
    docsDiv.className = 'rendered-content';

    // Initialize any code blocks that were just added
    document.querySelectorAll('pre code').forEach((block) => {
        hljs.highlightElement(block);
    });
}

// Initial render
document.addEventListener('DOMContentLoaded', updateDocs);
</script>