- Use the Sieve client package to call multiple existing functions in Python
- Combine these functions to create a customized app that meets our requirements
Introduction
As mentioned above, to create an app that can dub videos, we need several models to work together:
- WhisperX: an audio transcription model
- SeamlessT2T: a text-to-text translation model
- XTTS-V1: a text-to-speech model
- Sieve Video Retalker: an optimized version of video retalker for lipsyncing
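As a rough illustration of how these four models could be looked up from Python, the sketch below maps each pipeline stage to a function slug and fetches a handle with sieve.function.get from the Sieve client. The slugs are assumptions and not taken from this guide, so check the exact names on the Sieve dashboard. The sketch also assumes the client is installed (the package is believed to be sievedata) and that an API key is configured.

```python
# Hypothetical "owner/name" slugs for the four models listed above.
# These are assumptions: confirm the exact names on the Sieve dashboard.
MODEL_SLUGS = {
    "transcribe": "sieve/whisperx",
    "translate": "sieve/seamless_text2text",
    "synthesize": "sieve/xtts-v1",
    "lipsync": "sieve/video_retalking",
}


def load_models(slugs=MODEL_SLUGS):
    """Return one remote function handle per pipeline stage."""
    # Imported lazily so the mapping above can be inspected without the
    # Sieve client installed or an API key configured.
    import sieve

    return {stage: sieve.function.get(slug) for stage, slug in slugs.items()}
```

Keeping the slugs in one dictionary means a model can be swapped without touching the rest of the pipeline.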
Building the app from scratch
1
Set up folder and Python file
Create a folder and Python file named video_dubbing.py with the following command:
2
Set up pipeline
Paste the following code into pipeline.py. The high-level logic of this code is as follows:
- Extract audio from the video
- Transcribe the audio
- Translate the transcript
- Generate new audio from the translated text
- Combine audio and video with our lipsyncer
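The five steps above can be sketched as a plain Python function that chains one callable per stage. This is only an outline of the control flow, not the pipeline code from this guide: DubbingStages, dub_video, and every stage signature are hypothetical, and in the real app each stage would wrap a call to the corresponding Sieve function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingStages:
    """One callable per step; all signatures here are illustrative assumptions."""
    extract_audio: Callable[[str], str]    # video path -> audio path
    transcribe: Callable[[str], str]       # audio path -> transcript text
    translate: Callable[[str, str], str]   # (text, target language) -> translated text
    synthesize: Callable[[str], str]       # translated text -> new audio path
    lipsync: Callable[[str, str], str]     # (video path, audio path) -> dubbed video path


def dub_video(video_path: str, target_language: str, stages: DubbingStages) -> str:
    """Run the five stages in order and return the dubbed video path."""
    audio = stages.extract_audio(video_path)
    transcript = stages.transcribe(audio)
    translated = stages.translate(transcript, target_language)
    new_audio = stages.synthesize(translated)
    # The original video frames are paired with the newly generated audio.
    return stages.lipsync(video_path, new_audio)
```

Passing the stages in as callables keeps the ordering logic testable with simple stand-ins before any remote model is involved.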
3
Run the pipeline
Make sure you’re in the video_dubbing directory before running the pipeline.
Calling video_dubbing.run, where video_dubbing is the local Sieve function, will deploy the Sieve function and then run a job.
4
View your dubbed file!
Open your video in your file explorer with the following command:
Or just view the video on the Sieve dashboard!
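For the local route, the command that opens a file with its default application differs by operating system. The sketch below, which is an assumption and not the command from this guide, picks a common opener for each platform. The helper names and the dubbed.mp4 filename in the usage note are hypothetical.

```python
import subprocess
import sys
from typing import List


def opener_command(path: str, platform: str = sys.platform) -> List[str]:
    """Build the command that opens `path` with the default application."""
    if platform == "darwin":
        return ["open", path]
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    # Most Linux desktops provide xdg-open.
    return ["xdg-open", path]


def open_video(path: str) -> None:
    """Open the video in the system's default player."""
    subprocess.run(opener_command(path), check=True)
```

For example, open_video("dubbed.mp4") would open a downloaded output file of that name, if one exists in the current directory.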