In this example, we will explore a more complex use case of Sieve. We’ll create an application that automatically dubs a video of a person speaking into another language! We’ll take an input video, transcribe it, translate that text to another language, generate text-to-speech of that translated text, and have the original video lipsync to the new audio.

By following this, you will learn how to:

  • Use the Sieve client package to call multiple existing functions in Python
  • Combine these functions to create a customized app that meets our requirements


As mentioned above, to create an app that can dub videos, we need several models to work together:

  1. WhisperX: an audio transcription model
  2. SeamlessT2T: a text-to-text translation model
  3. XTTS-V1: a text-to-speech model
  4. Sieve Video Retalker: an optimized version of video retalker for lipsyncing

Building the app from scratch


Set up folder and Python file

Create a folder and Python file named with the following command:

mkdir video_dubbing
touch video_dubbing/

Set up pipeline

Paste the following code into The higher level logic of this code is as follows:

  1. Extract audio from the video
  2. Transcribe the audio
  3. Translate the transcript
  4. Generate new audio from the translated text
  5. Combine audio and video with our lipsyncer
import sieve

# Helper functions
def extract_audio(source_video: sieve.Video):
    import subprocess
    audio_path = 'temp.wav'["ffmpeg", "-i", source_video.path, audio_path, "-y"])
    return sieve.Audio(path=audio_path)

def transcript_to_text(transcript):
    text = " ".join([segment["text"] for segment in transcript])
    return text

def video_dubbing(source_video: sieve.Video, language: str):
    :param source_video: The video to dub
    :param language: The language to dub the video in
    :return: The dubbed video

    print("extracting audio from video")
    source_audio = extract_audio(source_video)
    print("done extracting")

    # check if language is supported
    if language not in ["english", "spanish", "french", "german", "italian", "portuguese", "polish", "turkish", "russian", "dutch", "czech", "arabic", "chinese"]:
        raise Exception("Language not supported")

    # use remote Sieve functions
    transcriber = sieve.function.get("sieve/speech_transcriber")
    translator = sieve.function.get("sieve/seamless_text2text")
    tts = sieve.function.get("sieve/xtts")
    lipsyncer = sieve.function.get("sieve/video_retalking")

    print("transcribing audio")
    transcript = list(
    text = transcript_to_text(transcript)
    text_language = "english"
    print("transcription:", text)

    print("translating text")
    translated_text =, text_language, language)
    print("translated text:", translated_text)

    print("generating tts audio")
    target_audio =, source_audio, stability=0.5, similarity_boost=0.5)
    print("done generating audio")

    print("starting lipsync")
    return, target_audio)

if __name__ == "__main__":
    video = sieve.Video(url="")
    dubbed_video =, "spanish")
    print('dubbed video path: ', dubbed_video.path)

Run the pipeline

Make sure you’re in the video_dubbing directory before running the pipeline.
Run the pipeline with the following command. Because we are running our function with, where video_dubbing is the local Sieve function, this will deploy the Sieve function and then run a job.


You should start seeing some logs streaming in. You can also view the status of this job on the Sieve dashboard. After it has completed running, you’ll see a video file path printed to the console, which has been saved to a temporary directory. You can open this video to see the results.


View your dubbed file!

Open your video in your file explorer with the following command:

open <video_path>

Or just view the video on the Sieve dashboard!