Files
gradio-webrtc/frontend/shared/VideoChat/helpers/processor.ts
neil.xh f476f9cf29 GS chat integration
This review adds and refines the GS video-chat feature, including front-end and back-end interface definitions, state management, and UI components, and introduces new dependency libraries to support more interactive features.
Link: https://code.alibaba-inc.com/xr-paas/gradio_webrtc/codereview/21273476
* Update the Python part

* Merge the videochat front-end part

* Merge branch 'feature/update-fastrtc-0.0.19' of http://gitlab.alibaba-inc.com/xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Replace audiowave

* Fix import paths

* Merge websocket mode logic

* feat: gaussian avatar chat

* Add input parameters for other renderers

* feat: ws connection and usage

* Merge branch 'feature/update-fastrtc-0.0.19' of http://gitlab.alibaba-inc.com/xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Shift left when the right edge exceeds the container width

* Pass configuration through

* Merge branch 'feature/update-fastrtc-0.0.19' of gitlab.alibaba-inc.com:xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Fix Gaussian package exception

* Sync webrtc_utils

* Update webrtc_utils

* Support on_chat_datachannel

* Fix the device name list not displaying correctly

* Pass webrtc_id on copy

* Merge branch 'feature/update-fastrtc-0.0.19' of gitlab.alibaba-inc.com:xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Ensure the websocket connects only after WebRTC setup completes

* feat: audio expression data integration

* Upload dist

* Hide canvas

* feat: expose Gaussian file download progress

* Merge branch 'feature/update-fastrtc-0.0.19' of http://gitlab.alibaba-inc.com/xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Fix failure to acquire permissions

* Merge branch 'feature/update-fastrtc-0.0.19' of gitlab.alibaba-inc.com:xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Acquire permissions before enumerating devices

* fix: do not process ws data before GS assets finish downloading

* fix: merge

* Adjust prompt wording

* Merge branch 'feature/update-fastrtc-0.0.19' of gitlab.alibaba-inc.com:xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Fix the device reverting to the default after switching and restarting a chat

* Merge branch 'feature/update-fastrtc-0.0.19' of http://gitlab.alibaba-inc.com/xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Update localvideo dimensions

* Merge branch 'feature/update-fastrtc-0.0.19' of gitlab.alibaba-inc.com:xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Must not default to "default"

* Fix audio permission issue

* Update build output

* fix: tie the chat button state to GS assets; remove dead code

* fix: merge

* feat: import the GS rendering module from an npm package

* fix

* Add chat history

* Merge branch 'feature/update-fastrtc-0.0.19' of http://gitlab.alibaba-inc.com/xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Style changes

* Update packages

* fix: GS avatar initial position and mute state

* Scroll chat history to the bottom

* At least 100% height

* Merge branch 'feature/update-fastrtc-0.0.19' of gitlab.alibaba-inc.com:xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Nudge the text box up slightly

* Clear chat history when a connection starts

* fix: update gs render npm

* Merge branch 'feature/update-fastrtc-0.0.19' of http://gitlab.alibaba-inc.com/xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Logic safeguard

* Merge branch 'feature/update-fastrtc-0.0.19' of gitlab.alibaba-inc.com:xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* feat: configurable initial audio mute state

* Reposition actionsbar when subtitles are shown

* Merge branch 'feature/update-fastrtc-0.0.19' of http://gitlab.alibaba-inc.com/xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Style polish

* feat: add readme

* fix: asset images

* fix: docs

* fix: update gs render sdk

* fix: frame position calculation in GS mode

* fix: update readme

* Device detection; handle narrow viewports

* Merge branch 'feature/update-fastrtc-0.0.19' of gitlab.alibaba-inc.com:xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* Separate the permission check from the device check

* feat: split the GS download and load hooks

* Merge branch 'feature/update-fastrtc-0.0.19' of http://gitlab.alibaba-inc.com/xr-paas/gradio_webrtc into feature/update-fastrtc-0.0.19

* fix: update gs render sdk

* Replace

* dist

* Upload files

* del
2025-04-16 19:09:04 +08:00

611 lines
19 KiB
TypeScript


import EventEmitter from "eventemitter3";
import PQueue from "p-queue";
import { mergeBlob, unpack } from "../binary_utils";
import {
EventTypes,
PlayerEventTypes,
ProcessorEventTypes,
} from "../interface/eventType";
import { Player } from "./player";
export type IPayload = Record<string, string | number | object | Blob>;
interface IDataRecords {
channel_names?: string[];
data_id: number;
data_offset: number;
data_type: string;
sample_rate: number;
shape: number[];
}
interface IEvent {
avatar_status?: string;
event_type: string;
speech_id: string;
}
interface IParsedData {
batch_id?: number;
batch_name?: string;
data_records: Record<string, IDataRecords>;
end_of_batch: boolean;
events: IEvent[];
}
interface IAvatarMotionData {
// Total binary size; present on the first package
binary_size: number;
// Whether this is the first package
first_package: boolean;
// Data slice; present on non-first packages
motion_data_slice?: Blob;
// Number of slices; present on the first package
segment_num?: number;
// Slice index; present on non-first packages
slice_index?: number;
// Whether binary frames are used; present on the first package
use_binary_frame?: boolean;
// Whether audio starts muted
is_audio_mute?: boolean;
}
interface IAvatarMotionGroupBase {
arkitFaceArrayBufferArray?: ArrayBuffer[];
batch_id?: number;
batch_name?: string;
binSize?: number;
jsonSize?: number;
merged_motion_data: Uint8Array;
motion_data_slices: Blob[];
player?: Player;
tts2faceArrayBufferArray?: ArrayBuffer[];
}
interface IAvatarMotionGroup extends IAvatarMotionGroupBase {
binary_size: number;
first_package: boolean;
segment_num?: number;
use_binary_frame?: boolean;
}
const InputCodecs: Record<string, "Int8" | "Int16" | "Int32" | "Float32"> = {
int16: "Int16",
int32: "Int32",
float32: "Float32",
};
const TypedArrays: Record<
string,
typeof Int16Array | typeof Int32Array | typeof Float32Array
> = {
int16: Int16Array,
int32: Int32Array,
float32: Float32Array,
};
export class Processor {
private ee: EventEmitter;
private _motionDataGroupHandlerQueue = new PQueue({
concurrency: 1,
});
private _motionDataGroups: IAvatarMotionGroup[] = [];
private _arkit_face_sample_rate?: number;
private _arkit_face_channel_names?: string[];
private _tts2face_sample_rate?: number;
private _tts2face_channel_names?: string[];
private _maxBatchId?: number;
private _arkitFaceShape?: number;
private _tts2FaceShape?: number;
constructor(ee: EventEmitter) {
this.ee = ee;
}
add(payload: IPayload) {
const { avatar_motion_data } = payload;
this._motionDataGroupHandlerQueue.add(
async () =>
await this._motionDataGroupHandler(
avatar_motion_data as IAvatarMotionData,
),
);
}
clear() {
this._motionDataGroups.forEach((group) => {
group.player?.destroy();
});
this._motionDataGroups = [];
}
setMute(isMute: boolean) {
this._motionDataGroups.forEach((group) => {
group.player?.setMute(isMute);
});
}
getArkitFaceFrame() {
return {
arkitFace: this._getArkitFaceFrame(),
};
}
getLastBatchId() {
let batch_id = undefined;
this._motionDataGroups.forEach((group) => {
if (group.batch_id) {
batch_id = group.batch_id;
}
});
return batch_id;
}
getTts2FaceFrame() {
return {
tts2Face: this._getTts2FaceFrame(),
};
}
interrupt() {
this._motionDataGroups.forEach((group) => {
if (group.batch_id) {
this._maxBatchId = group.batch_id;
}
group.player?.destroy();
});
this._motionDataGroups = [];
}
private _getArkitFaceFrame() {
if (!this._motionDataGroups.length) {
return null;
}
const targetMotion = this._motionDataGroups.find(
(_motion) => _motion.player,
);
if (!targetMotion) {
return null;
}
const { arkitFaceArrayBufferArray, player } = targetMotion!;
if (
player &&
player._firstStartAbsoluteTime &&
arkitFaceArrayBufferArray &&
arkitFaceArrayBufferArray.length > 0 &&
this._arkitFaceShape &&
this._arkit_face_sample_rate
) {
const offsetTime = Date.now() - player._firstStartAbsoluteTime;
let lastIndex = 0;
let firstSampleStartTime: number | undefined;
player.samplesList.forEach((item, index) => {
if (
firstSampleStartTime === undefined &&
item.startTime !== undefined
) {
firstSampleStartTime = item.startTime;
}
if (
item.startTime !== undefined &&
item.startTime - firstSampleStartTime <= offsetTime / 1000
) {
lastIndex = index;
}
});
const samples = player.samplesList[lastIndex];
const subOffsetTime = offsetTime - samples.startTime! * 1000;
const offset = Math.floor(
(subOffsetTime / 1000) * this._arkit_face_sample_rate,
);
const arkitFaceFloat32ArrayArray = new Float32Array(
arkitFaceArrayBufferArray[lastIndex],
);
const subData = arkitFaceFloat32ArrayArray?.slice(
offset * this._arkitFaceShape,
offset * this._arkitFaceShape + this._arkitFaceShape,
);
if (subData?.length) {
const result = {};
const channelNames = this._arkit_face_channel_names || [];
channelNames.forEach((channelName, index) => {
Object.assign(result, {
[channelName]: subData[index],
});
});
return result;
}
return null;
}
return null;
}
private _getTts2FaceFrame() {
if (!this._motionDataGroups.length) {
return null;
}
const targetMotion = this._motionDataGroups.find(
(_motion) => _motion.player,
);
if (!targetMotion) {
return null;
}
const { tts2faceArrayBufferArray, player } = targetMotion!;
if (
player &&
player._firstStartAbsoluteTime &&
tts2faceArrayBufferArray &&
tts2faceArrayBufferArray.length > 0 &&
this._tts2FaceShape &&
this._tts2face_sample_rate
) {
const offsetTime = Date.now() - player._firstStartAbsoluteTime;
let lastIndex = 0;
let firstSampleStartTime: number | undefined;
player.samplesList.forEach((item, index) => {
if (
firstSampleStartTime === undefined &&
item.startTime !== undefined
) {
firstSampleStartTime = item.startTime;
}
if (
item.startTime !== undefined &&
item.startTime - firstSampleStartTime <= offsetTime / 1000
) {
lastIndex = index;
}
});
const samples = player.samplesList[lastIndex];
const subOffsetTime = offsetTime - samples.startTime! * 1000;
const offset = Math.floor(
(subOffsetTime / 1000) * this._tts2face_sample_rate,
);
const tts2faceFloat32ArrayArray = new Float32Array(
tts2faceArrayBufferArray[lastIndex],
);
const subData = tts2faceFloat32ArrayArray?.slice(
offset * this._tts2FaceShape,
offset * this._tts2FaceShape + this._tts2FaceShape,
);
if (subData?.length) {
return subData;
}
return null;
}
return null;
}
private async _motionDataGroupHandler(avatar_motion_data: IAvatarMotionData) {
try {
const {
first_package,
motion_data_slice,
segment_num,
binary_size,
use_binary_frame,
is_audio_mute
} = avatar_motion_data;
if (first_package) {
const lastMotionGroup =
this._motionDataGroups[this._motionDataGroups.length - 1];
if (lastMotionGroup) {
// Check whether the previous group received all of its slices
if (
lastMotionGroup.segment_num !==
lastMotionGroup.motion_data_slices.length
) {
// Packet loss: emit an error
this.ee.emit(EventTypes.ErrorReceived, "lost data packets");
}
}
this._motionDataGroups.push({
first_package,
binary_size,
segment_num,
use_binary_frame,
motion_data_slices: [],
merged_motion_data: new Uint8Array(binary_size),
});
} else {
if (this._motionDataGroups.length === 0) {
return;
}
if (!motion_data_slice) {
return;
}
const lastMotionGroup =
this._motionDataGroups[this._motionDataGroups.length - 1];
const prevMotionGroup =
this._motionDataGroups[this._motionDataGroups.length - 2];
lastMotionGroup.motion_data_slices.push(motion_data_slice);
if (
lastMotionGroup.motion_data_slices.length ===
lastMotionGroup.segment_num
) {
// A single-segment batch has only one slice, so mergeBlob is unnecessary;
// assign the slice directly to stay compatible with the logic below.
const blob = lastMotionGroup.motion_data_slices[0];
// const blob = mergeBlob(
//   lastMotionGroup.motion_data_slices,
//   lastMotionGroup.merged_motion_data,
// );
const { parsedData, jsonSize, binSize } = await unpack(blob);
lastMotionGroup.jsonSize = jsonSize;
lastMotionGroup.binSize = binSize;
const bin = blob.slice(12 + lastMotionGroup.jsonSize!);
if (bin.size !== lastMotionGroup.binSize) {
this.ee.emit(ProcessorEventTypes.Chat_BinsizeError);
}
const batchCheckResult = this._connectBatch(
parsedData,
lastMotionGroup,
prevMotionGroup,
);
if (!batchCheckResult) {
return;
}
await this._handleArkitFaceConfig(
parsedData,
lastMotionGroup,
prevMotionGroup,
bin,
);
// await this._handletts2faceConfig(
// parsedData,
// lastMotionGroup,
// prevMotionGroup,
// bin,
// );
await this._handleAudioConfig(
parsedData,
lastMotionGroup,
prevMotionGroup,
bin,
is_audio_mute || false
);
this._handleEvents(parsedData);
}
}
} catch (err: unknown) {
console.error("err", err);
this.ee.emit(EventTypes.ErrorReceived, (err as Error).message);
}
}
private async _handleAudioConfig(
parsedData: IParsedData,
lastMotionGroup: IAvatarMotionGroup,
prevMotionGroup: IAvatarMotionGroup,
bin: Blob,
isPlayerMute: boolean
) {
const { data_records = {}, end_of_batch } = parsedData;
const { audio } = data_records;
if (audio) {
const { sample_rate, shape, data_offset, data_type } = audio;
const inputCodec = InputCodecs[data_type];
const targetTypedArray = TypedArrays[data_type];
if (lastMotionGroup.player === undefined) {
if (
prevMotionGroup &&
prevMotionGroup.player &&
prevMotionGroup.batch_id === lastMotionGroup.batch_id
) {
lastMotionGroup.player = prevMotionGroup.player;
} else if (sample_rate) {
lastMotionGroup.player = new Player(
{
inputCodec,
channels: 1,
sampleRate: sample_rate,
fftSize: 1024,
isMute: isPlayerMute,
onended: (option) => {
if (!option) {
return;
}
const {
end_of_batch: innerEndOfBatch,
lastMotionGroup: innerLastMotion,
} = option;
if (innerEndOfBatch) {
const { batch_id, player } =
innerLastMotion as IAvatarMotionGroup;
this.ee.emit(PlayerEventTypes.Player_EndSpeaking, player);
this._motionDataGroups = this._motionDataGroups.filter(
(item) => item.batch_id! > batch_id!,
);
if (
this._motionDataGroups.length &&
this._motionDataGroups[0].player
) {
this._motionDataGroups[0].player.updateAutoPlay(true);
} else {
this.ee.emit(PlayerEventTypes.Player_NoLegacy);
}
}
},
},
this.ee,
);
}
if (end_of_batch) {
const originEnded = lastMotionGroup.player!.option.onended;
lastMotionGroup.player!.option.onended = () => {
originEnded({
end_of_batch,
lastMotionGroup,
});
};
}
}
const shapeLength = shape.reduce(
(acc: number, cur: number) => acc * cur,
inputCodec === "Int16" ? 2 : 4,
);
const audioBlobSliceStart = data_offset;
const audioBlobSliceEnd = data_offset + shapeLength;
const audioBlob = bin.slice(audioBlobSliceStart, audioBlobSliceEnd);
const audioArrayBuffer = await audioBlob.arrayBuffer();
// If the next segment arrives before the previous one finishes playing, it must not autoplay
const prevHasPlayerMotionDataGroup = this._motionDataGroups.find(
(item) => item.player,
);
if (
this._motionDataGroups.length &&
lastMotionGroup.player &&
prevHasPlayerMotionDataGroup &&
prevHasPlayerMotionDataGroup.player !== lastMotionGroup.player
) {
lastMotionGroup.player.autoPlay = false;
}
if (lastMotionGroup.player) {
lastMotionGroup.player.feed({
audio: new targetTypedArray(audioArrayBuffer),
end_of_batch,
});
}
} else {
// Special event motions without audio reuse the previous group's player
if (
prevMotionGroup &&
prevMotionGroup.player &&
lastMotionGroup.batch_id === prevMotionGroup.batch_id
) {
lastMotionGroup.player = prevMotionGroup.player;
}
}
}
private async _handleArkitFaceConfig(
parsedData: IParsedData,
lastMotionGroup: IAvatarMotionGroup,
prevMotionGroup: IAvatarMotionGroup,
bin: Blob,
) {
const { data_records = {} } = parsedData;
const { arkit_face } = data_records;
if (arkit_face) {
const { channel_names, shape, data_offset, sample_rate } =
arkit_face as IDataRecords;
if (channel_names && !this._arkit_face_channel_names) {
this._arkit_face_channel_names = channel_names;
this._arkit_face_sample_rate = sample_rate;
}
if (lastMotionGroup.arkitFaceArrayBufferArray === undefined) {
if (
prevMotionGroup &&
prevMotionGroup.arkitFaceArrayBufferArray &&
prevMotionGroup.batch_id === lastMotionGroup.batch_id
) {
lastMotionGroup.arkitFaceArrayBufferArray =
prevMotionGroup.arkitFaceArrayBufferArray;
} else {
lastMotionGroup.arkitFaceArrayBufferArray = [];
}
const shapeLength = shape.reduce(
(acc: number, cur: number) => acc * cur,
4,
);
this._arkitFaceShape = shape[1];
const arkitFaceBlob = bin.slice(data_offset, data_offset + shapeLength);
const arkitFaceArrayBuffer = await arkitFaceBlob.arrayBuffer();
lastMotionGroup.arkitFaceArrayBufferArray.push(arkitFaceArrayBuffer);
}
} else {
if (
prevMotionGroup &&
prevMotionGroup.arkitFaceArrayBufferArray &&
lastMotionGroup.batch_id === prevMotionGroup.batch_id
) {
lastMotionGroup.arkitFaceArrayBufferArray =
prevMotionGroup.arkitFaceArrayBufferArray;
}
}
}
private async _handletts2faceConfig(
parsedData: IParsedData,
lastMotionGroup: IAvatarMotionGroup,
prevMotionGroup: IAvatarMotionGroup,
bin: Blob,
) {
const { data_records = {} } = parsedData;
const { tts2face } = data_records;
if (tts2face) {
const { channel_names, shape, data_offset, sample_rate } =
tts2face as IDataRecords;
if (channel_names && !this._tts2face_channel_names) {
this._tts2face_channel_names = channel_names;
this._tts2face_sample_rate = sample_rate;
}
if (lastMotionGroup.tts2faceArrayBufferArray === undefined) {
if (
prevMotionGroup &&
prevMotionGroup.tts2faceArrayBufferArray &&
prevMotionGroup.batch_id === lastMotionGroup.batch_id
) {
lastMotionGroup.tts2faceArrayBufferArray =
prevMotionGroup.tts2faceArrayBufferArray;
} else {
lastMotionGroup.tts2faceArrayBufferArray = [];
}
const shapeLength = shape.reduce(
(acc: number, cur: number) => acc * cur,
4,
);
this._tts2FaceShape = shape[1];
const tts2faceBlob = bin.slice(data_offset, data_offset + shapeLength);
const tts2faceArrayBuffer = await tts2faceBlob.arrayBuffer();
lastMotionGroup.tts2faceArrayBufferArray.push(tts2faceArrayBuffer);
}
} else {
if (
prevMotionGroup &&
prevMotionGroup.tts2faceArrayBufferArray &&
lastMotionGroup.batch_id === prevMotionGroup.batch_id
) {
lastMotionGroup.tts2faceArrayBufferArray =
prevMotionGroup.tts2faceArrayBufferArray;
}
}
}
private _handleEvents(parsedData: IParsedData) {
const { events } = parsedData;
if (events && events.length) {
events.forEach((e) => {
switch (e.event_type) {
case "interrupt_speech":
// console.log('HandleEvents: interrupt_speech')
break;
case "change_status":
// console.log('HandleEvents: change_status')
this.ee.emit(ProcessorEventTypes.Change_Status, e);
break;
default:
break;
}
});
}
}
private _connectBatch(
parsedData: IParsedData,
lastMotionGroup: IAvatarMotionGroup,
prevMotionGroup: IAvatarMotionGroup,
) {
let batchCheckResult = true;
// Record the batch_id from the binary payload
if (parsedData.batch_id && lastMotionGroup.batch_id === undefined) {
lastMotionGroup.batch_id = parsedData.batch_id;
}
// Special event motions without a batch_id inherit the previous group's batch_id
if (
!lastMotionGroup.batch_id &&
prevMotionGroup &&
prevMotionGroup.batch_id
) {
lastMotionGroup.batch_id = prevMotionGroup.batch_id;
}
// Special event motions without a batch_name may likewise pick up this batch_name
if (parsedData.batch_name && lastMotionGroup.batch_name === undefined) {
lastMotionGroup.batch_name = parsedData.batch_name;
}
// If motion data from a pre-interrupt batch still arrives after an interrupt, destroy it again
if (
this._maxBatchId &&
lastMotionGroup.batch_id &&
lastMotionGroup.batch_id <= this._maxBatchId
) {
this.clear();
batchCheckResult = false;
}
return batchCheckResult;
}
}
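The core of `Processor.add` / `_motionDataGroupHandler` above is a chunked-transfer protocol: a first package announces `binary_size` and `segment_num`, subsequent packages deliver data slices, and the group is processed only once every expected slice has arrived (a shortfall is reported as packet loss). A minimal standalone sketch of that reassembly, with a hypothetical `SliceAssembler` class and `Uint8Array` standing in for `Blob`:

```typescript
// Field names mirror IAvatarMotionData; the class is illustrative only.
interface MotionPacket {
  first_package: boolean;
  binary_size?: number; // total payload size, first package only
  segment_num?: number; // expected slice count, first package only
  motion_data_slice?: Uint8Array; // payload slice, non-first packages
}

class SliceAssembler {
  private slices: Uint8Array[] = [];
  private expected = 0;
  private size = 0;

  // Returns the reassembled payload once all slices have arrived, else null.
  push(pkt: MotionPacket): Uint8Array | null {
    if (pkt.first_package) {
      // First package opens a new group and records the expected totals.
      this.slices = [];
      this.expected = pkt.segment_num ?? 0;
      this.size = pkt.binary_size ?? 0;
      return null;
    }
    if (pkt.motion_data_slice) {
      this.slices.push(pkt.motion_data_slice);
    }
    if (this.slices.length !== this.expected) {
      return null; // still waiting for more slices
    }
    // All slices received: concatenate into one buffer of binary_size bytes.
    const merged = new Uint8Array(this.size);
    let offset = 0;
    for (const s of this.slices) {
      merged.set(s, offset);
      offset += s.length;
    }
    return merged;
  }
}
```

In the real `Processor`, the reassembled buffer is then `unpack`ed into a JSON header plus a binary section, from which the audio and arkit_face records are sliced by `data_offset` and `shape`.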