metadata
license: mit
language:
- en
- zh
base_model:
- Wan-AI/Wan2.2-S2V-14B
pipeline_tag: any-to-any
RealVideo
RealVideo is a WebSocket-based video calling system that accepts text input. It uses the GLM-4.5-AirX and GLM-TTS models to generate spoken responses, and autoregressive diffusion to generate the matching video frames. The codebase is modular and cleanly structured. See the project blog for more details.
Features
- Text Input: Supports text message input.
- AI Voice Response: Integrates GLM-4.5-AirX and GLM-TTS models to generate voice responses.
- Lip Sync: Generates real-time conversational video based on any input image and audio.
- Real-time Communication: WebSocket-based real-time bidirectional communication.
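The features above form a pipeline: a text message comes in, a reply is generated, the reply is turned into speech, and the speech drives video frames that are sent back to the client. The following is a minimal sketch of that flow, not RealVideo's actual code. Every function name, the JSON message fields, and the `chunk_size` parameter are hypothetical, and the model calls are replaced by trivial stand-ins so the example runs with the standard library alone.

```python
import asyncio
import json

# All names and message fields below are hypothetical placeholders,
# not RealVideo's actual API or wire format.

async def generate_reply(text: str) -> str:
    """Stand-in for the language model (GLM-4.5-AirX in RealVideo)."""
    return f"echo: {text}"

async def synthesize_speech(reply: str) -> bytes:
    """Stand-in for text-to-speech (GLM-TTS in RealVideo); returns fake audio bytes."""
    return reply.encode("utf-8")

async def render_frames(audio: bytes, chunk_size: int = 8) -> list[bytes]:
    """Stand-in for video generation: one fake frame per audio chunk."""
    return [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)]

async def handle_message(raw: str) -> list[str]:
    """Turn one incoming JSON text message into a list of outgoing JSON messages."""
    request = json.loads(raw)
    reply = await generate_reply(request["text"])
    audio = await synthesize_speech(reply)
    frames = await render_frames(audio)
    outgoing = [json.dumps({"type": "audio", "num_bytes": len(audio)})]
    outgoing += [
        json.dumps({"type": "frame", "index": i, "num_bytes": len(frame)})
        for i, frame in enumerate(frames)
    ]
    return outgoing

# Simulate one message arriving over the socket.
messages = asyncio.run(handle_message(json.dumps({"type": "text", "text": "hello"})))
for message in messages:
    print(message)
```

In a real deployment, `handle_message` would sit inside a WebSocket handler and send each outgoing message as soon as it is ready instead of collecting them in a list.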
Quick Start
See our GitHub repository to get started.
Technical Highlights
- Model Integration: Supports quick, convenient voice cloning and generates audio output from text input.
- Modular Design: Clear code structure, easy to maintain and extend.
- Real-time Performance: Optimized audio processing and real-time video generation algorithms.
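One general way an autoregressive generator can support real-time output is to work chunk by chunk: each audio chunk produces a short run of frames, and the last frame of one chunk conditions the next. The toy sketch below illustrates only that control flow. It is not RealVideo's algorithm; the function names are hypothetical, a "frame" is reduced to a single float, and the "model" is a simple smoothing rule standing in for a diffusion step.

```python
from typing import Iterable, Iterator, List

def split_audio(samples: List[float], chunk_len: int) -> Iterator[List[float]]:
    """Cut an audio signal into fixed-length chunks."""
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]

def toy_generate_chunk(audio_chunk: List[float], prev_frame: float,
                       frames_per_chunk: int) -> List[float]:
    """Hypothetical stand-in for a model call.

    Each new frame moves halfway from the previous frame toward the
    mean absolute level of the audio chunk.
    """
    level = sum(abs(s) for s in audio_chunk) / len(audio_chunk)
    frames = []
    frame = prev_frame
    for _ in range(frames_per_chunk):
        frame = 0.5 * frame + 0.5 * level
        frames.append(frame)
    return frames

def stream_frames(audio_chunks: Iterable[List[float]], first_frame: float,
                  frames_per_chunk: int = 2) -> Iterator[List[float]]:
    """Yield frames chunk by chunk, conditioning each chunk on the previous one."""
    prev = first_frame
    for chunk in audio_chunks:
        frames = toy_generate_chunk(chunk, prev, frames_per_chunk)
        prev = frames[-1]  # carry state forward: the autoregressive step
        yield frames       # frames can be sent before later audio arrives

audio = [0.0, 0.0, 1.0, 1.0]
chunks = list(stream_frames(split_audio(audio, 2), first_frame=0.0))
print(chunks)  # [[0.0, 0.0], [0.5, 0.75]]
```

Because `stream_frames` is a generator, a caller can forward each chunk of frames as soon as it is produced, which is what keeps latency bounded by the chunk length instead of the full utterance.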
Acknowledgements
This project utilizes the following open-source libraries: