In this example, we will explore a more complex use case of Sieve. We’ll create an application that automatically dubs a video of a person speaking into another language! We’ll take an input video, transcribe it, translate that text to another language, generate text-to-speech of that translated text, and have the original video lipsync to the new audio.

By following this, you will learn how to:

  • Use the Sieve client package to call multiple existing functions in Python
  • Combine these functions to create a customized app that meets our requirements

Introduction

As mentioned above, to create an app that can dub videos, we need several models to work together:

  1. WhisperX: an audio transcription model
  2. SeamlessT2T: a text-to-text translation model
  3. XTTS-V1: a text-to-speech model
  4. Sieve Video Retalker: an optimized version of video retalker for lipsyncing

Building the app from scratch

1

Set up folder and Python file

Create a folder and Python file named video_dubbing.py with the following command:

mkdir video_dubbing
touch video_dubbing/pipeline.py
2

Set up pipeline

Paste the following code into pipeline.py. The higher level logic of this code is as follows:

  1. Extract audio from the video
  2. Transcribe the audio
  3. Translate the transcript
  4. Generate new audio from the translated text
  5. Combine audio and video with our lipsyncer
import sieve

# Helper functions
def extract_audio(source_video: sieve.File):
    import subprocess
    audio_path = 'temp.wav'
    subprocess.run(["ffmpeg", "-i", source_video.path, audio_path, "-y", "-loglevel", "error"])
    return audio_path

def transcript_to_text(transcript):
    text = " ".join([segment["text"] for segment in transcript])
    return text

@sieve.function(
    name="video_dubbing",
    python_packages=[
        "numpy>=1.19.0",
    ],
    system_packages=["ffmpeg"],
)
def video_dubbing(source_video: sieve.File, language: str):
    """
    :param source_video: The video to dub
    :param language: The language to dub the video in
    :return: The dubbed video
    """

    print("extracting audio from video")
    source_audio_path = extract_audio(source_video)
    print("done extracting")
    # check if language is supported
    if language not in ["eng", "spa", "fra", "deu", "ita", "por", "pol", "tur", "rus", "nld", "ces", "arb", "cmn"]:
        raise Exception("Language not supported")

    # use remote Sieve functions
    transcriber = sieve.function.get("sieve/speech_transcriber")
    translator = sieve.function.get("sieve/seamless_text2text")
    tts = sieve.function.get("sieve/tts")
    lipsyncer = sieve.function.get("sieve/video_retalking")

    print("transcribing audio")
    source_audio = sieve.File(path=source_audio_path)
    transcript = list(transcriber.run(source_audio))
    text = transcript_to_text(transcript)
    text_language = "eng"
    print("transcription:", text)

    print("translating text")
    translated_text = translator.run(text, text_language, language)
    print("translated text:", translated_text)

    print("generating tts audio")
    source_audio = sieve.File(path=source_audio_path)
    target_audio = tts.run("elevenlabs-voice-cloning", text=translated_text, reference_audio=source_audio, stability=0.5, style=0.5)
    print("done generating audio")

    print("starting lipsync")
    return lipsyncer.run(source_video, target_audio)

if __name__ == "__main__":
    video = sieve.File(url="https://storage.googleapis.com/mango-public-models/david.mp4")
    dubbed_video = video_dubbing(video, "spa")
    print('dubbed video path: ', dubbed_video.path)
3

Run the pipeline

Make sure you’re in the video_dubbing directory before running the pipeline.
Run the pipeline with the following command. Because we are running our function with video_dubbing.run, where video_dubbing is the local Sieve function, this will deploy the Sieve function and then run a job.

python pipeline.py

You should start seeing some logs streaming in. You can also view the status of this job on the Sieve dashboard. After it has completed running, you’ll see a video file path printed to the console, which has been saved to a temporary directory. You can open this video to see the results.

4

View your dubbed file!

Open your video in your file explorer with the following command:

Or just view the video on the Sieve dashboard!