Example: Video Dubbing
Build a video dubbing Sieve app
In this example, we will explore a more complex use case of Sieve. We’ll create an application that automatically dubs a video of a person speaking into another language! We’ll take an input video, transcribe it, translate that text to another language, generate text-to-speech of that translated text, and have the original video lipsync to the new audio.
By following this, you will learn how to:
- Use the Sieve client package to call multiple existing functions in Python
- Combine these functions to create a customized app that meets our requirements
Introduction
As mentioned above, to create an app that can dub videos, we need several models to work together:
- WhisperX: an audio transcription model
- SeamlessT2T: a text-to-text translation model
- XTTS-V1: a text-to-speech model
- Sieve Video Retalker: an optimized version of video retalker for lipsyncing
Building the app from scratch
Set up folder and Python file
Create a folder and Python file named video_dubbing.py with the following command:
Set up pipeline
Paste the following code into pipeline.py. The high-level logic of this code is as follows:
- Extract audio from the video
- Transcribe the audio
- Translate the transcript
- Generate new audio from the translated text
- Combine audio and video with our lipsyncer
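The order of the steps above can be sketched in plain Python, independent of any particular model API. This is a minimal illustration and not the Sieve implementation: `DubbingStages`, `dub_video`, and the stub lambdas are hypothetical names, and passing the original audio to the text-to-speech step as a voice reference is an assumption. In a real app, each callable would wrap a call to the corresponding Sieve function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingStages:
    """One callable per pipeline step (all names are hypothetical)."""
    extract_audio: Callable[[str], str]    # video path -> audio path
    transcribe: Callable[[str], str]       # audio path -> transcript text
    translate: Callable[[str, str], str]   # (text, target language) -> translated text
    synthesize: Callable[[str, str], str]  # (translated text, reference audio) -> new audio path
    lipsync: Callable[[str, str], str]     # (video path, new audio path) -> dubbed video path


def dub_video(video_path: str, target_language: str, stages: DubbingStages) -> str:
    """Run the five steps in order and return the dubbed video path."""
    audio_path = stages.extract_audio(video_path)
    transcript = stages.transcribe(audio_path)
    translated = stages.translate(transcript, target_language)
    # Assumption: the TTS step receives the original audio as a voice reference.
    dubbed_audio = stages.synthesize(translated, audio_path)
    return stages.lipsync(video_path, dubbed_audio)


# Stand-in stages so the sketch runs without any model or network access.
stub_stages = DubbingStages(
    extract_audio=lambda video: video.rsplit(".", 1)[0] + ".wav",
    transcribe=lambda audio: "hello world",
    translate=lambda text, lang: f"[{lang}] {text}",
    synthesize=lambda text, ref_audio: "dubbed.wav",
    lipsync=lambda video, audio: "dubbed_" + video,
)

result = dub_video("talk.mp4", "spanish", stub_stages)
print(result)
```

Keeping each step behind its own callable makes it easy to swap one model for another, or to test the control flow with stubs before spending compute on real jobs.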
Run the pipeline
Make sure you are in the video_dubbing directory before running the pipeline.
Calling video_dubbing.run, where video_dubbing is the local Sieve function, deploys the Sieve function and then runs a job. You should start seeing logs stream in. You can also view the status of this job on the Sieve dashboard. After the job completes, a video file path is printed to the console; the file itself is saved to a temporary directory. Open this video to see the results.
View your dubbed file!
Open your video in your file explorer with the following command:
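If you would rather open the file from Python, one possible approach is to hand the path to the operating system's default opener. This is a sketch under assumptions: `open_command` and `open_video` are hypothetical helper names, and it relies on `open` being available on macOS, `xdg-open` on Linux desktops, and `start` on Windows. Pass in the path that was printed to your console.

```python
import subprocess
import sys
from typing import List


def open_command(path: str, platform: str = sys.platform) -> List[str]:
    """Build the command that opens `path` with the default application."""
    if platform == "darwin":
        return ["open", path]
    if platform.startswith("win"):
        # The empty string is the window title argument expected by `start`.
        return ["cmd", "/c", "start", "", path]
    # Assumption: a Linux desktop with xdg-utils installed.
    return ["xdg-open", path]


def open_video(path: str) -> None:
    """Open the video with the platform's default viewer."""
    subprocess.run(open_command(path), check=True)
```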
Or just view the video on the Sieve dashboard!