Example: Video Dubbing
In this example, we will explore a more complex use case of Sieve. We’ll create an application that automatically dubs a video of a person speaking into another language! We’ll take an input video, transcribe it, translate that text to another language, generate text-to-speech of that translated text, and have the original video lipsync to the new audio.
By following this, you will learn how to:
- Use the Sieve client package to call multiple existing functions in pure Python
- Combine these functions to create a customized app that meets your requirements
Introduction
As mentioned above, to create an app that can dub videos, we need several models to work together (a quick sketch of calling one of them follows this list):
- WhisperX: an audio transcription model
- SeamlessT2T: a text-to-text translation model
- XTTS-V1: a text-to-speech model
- Sieve Video Retalker: an optimized version of video retalker for lipsyncing
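Each of these models is already deployed as a function on Sieve, so the app only needs to fetch and call them. As a minimal sketch of that calling pattern, here is how you might fetch and run just the transcriber on its own (the audio path my_audio.wav is a placeholder for illustration):

import sieve

# Fetch a handle to a function that is already deployed on Sieve
transcriber = sieve.function.get("sieve/speech_transcriber")

# Run it on an audio file. The call executes on Sieve's cloud, not locally.
# "my_audio.wav" is a placeholder path for illustration.
audio = sieve.Audio(path="my_audio.wav")
for segment in transcriber.run(audio):
    print(segment["text"])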
Using the Python client
Ensure that the Sieve Python package is installed on your system by running the following command:
pip show sievedata
If it is installed, you will see output detailing the package's version, summary, dependencies, and so on. If it isn't, follow the installation instructions in the Sieve documentation.
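For reference, pip show prints a short summary along these lines (the values here are placeholders, not a real version or path):

Name: sievedata
Version: x.y.z
Summary: ...
Location: /path/to/site-packages
Requires: ...
Required-by: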
Building the app from scratch
Create a new Python file named video_dubbing.py with the following command:
touch video_dubbing.py
Then, paste the following code into video_dubbing.py. This function uses other models that have already been deployed to Sieve, which you can run almost as if they were running locally. The function calls that use .run(), such as transcriber.run(source_audio), are executed on the Sieve cloud.
The higher level logic of this code is as follows:
- Extract audio from the video
- Transcribe the audio
- Translate the transcript
- Generate new audio from the translated text
- Combine audio and video with our lipsyncer
import sieve

def extract_audio(source_video: sieve.Video):
    import subprocess
    audio_path = 'temp.wav'
    # Extract the audio track with ffmpeg ("-y" overwrites any existing file)
    subprocess.run(["ffmpeg", "-y", "-i", source_video.path, audio_path])
    return sieve.Audio(path=audio_path)

def transcript_to_text(transcript):
    # Join the per-segment text from the transcriber into a single string
    text = " ".join([segment["text"] for segment in transcript])
    return text

@sieve.function(
    name="video-dubber",
    python_packages=[
        "numpy>=1.19.0",
    ],
    system_packages=["ffmpeg"],
)
def video_dubbing(source_video: sieve.Video, language: str):
    """
    :param source_video: The video to dub
    :param language: The language to dub the video in
    :return: The dubbed video
    """
    print("extracting audio from video")
    # Extract audio from video
    source_audio = extract_audio(source_video)
    print("done extracting")

    # Check if the target language is supported
    if language not in ["english", "spanish", "french", "german", "italian", "portuguese", "polish", "turkish", "russian", "dutch", "czech", "arabic", "chinese"]:
        raise Exception("Language not supported")

    # Use remote Sieve models
    transcriber = sieve.function.get("sieve/speech_transcriber")
    translator = sieve.function.get("sieve/seamless_text2text")
    tts = sieve.function.get("sieve/xtts")
    lipsyncer = sieve.function.get("sieve/video_retalking")

    print("transcribing audio")
    # Transcribe audio
    transcript = list(transcriber.run(source_audio))
    text = transcript_to_text(transcript)
    text_language = "english"
    print("transcription:", text)

    print("translating text")
    # Translate text
    translated_text = translator.run(text, text_language, language)
    print("translated text:", translated_text)

    print("generating tts audio")
    # Generate new audio from translated text
    target_audio = tts.run(translated_text, source_audio, stability=0.5, similarity_boost=0.5)
    print("done generating audio")

    print("starting lipsync")
    # Combine audio and video with Retalker
    return lipsyncer.run(source_video, target_audio)

if __name__ == "__main__":
    video = sieve.Video(url="https://storage.googleapis.com/mango-public-models/david.mp4")
    dubbed_video = video_dubbing.run(video, "spanish")
    print('dubbed video path: ', dubbed_video.path)
Now let’s run the code. This will deploy your function to Sieve and call it with the video we provided.
python video_dubbing.py
You should start seeing logs streaming in. You can also view the status of this job on the Sieve dashboard. After it finishes, the path of the dubbed video, which has been saved to a temporary directory, is printed to the console. You can open this video to see the results.
On macOS, you can open the video with the following command:
open <video_path>
On Windows, you can open the video with the following command:
start <video_path>
Or just view the video on the Sieve dashboard!
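To dub your own footage, you can tweak the __main__ block. Here's a small sketch, under the assumption that sieve.Video also accepts a local path the way sieve.Audio does (my_video.mp4 is a placeholder file), using another language from the supported list:

if __name__ == "__main__":
    # Dub a local video into French; "my_video.mp4" is a placeholder path
    video = sieve.Video(path="my_video.mp4")
    dubbed_video = video_dubbing.run(video, "french")
    print('dubbed video path: ', dubbed_video.path)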