In this example, we will explore a more complex use case of Sieve. We’ll create an application that automatically dubs a video of a person speaking into another language! We’ll take an input video, transcribe it, translate that text to another language, generate text-to-speech of that translated text, and have the original video lipsync to the new audio.
By following this, you will learn how to:
- Use the Sieve client package to call multiple existing functions in pure Python
- Combine these functions to create a customized app that meets your requirements
As mentioned above, to create an app that can dub videos, we need several models to work together:
- WhisperX: an audio transcription model
- SeamlessT2T: a text-to-text translation model
- XTTS-V1: a text-to-speech model
- Sieve Video Retalker: an optimized version of video retalker for lipsyncing
Using the python client
Ensure that the Sieve python package is installed on your system by executing the following command.
pip show sievedata
If installed successfully, you will see an output detailing the package’s version, summary, dependencies, etc. If it isn’t, please follow installation instructions linked here.
Building the app from scratch
Create a new Python file named
video_dubbing.py with the following command:
Then, paste the following code into
video_dubbing.py. This function uses other models that have already been deployed to Sieve that you can run almost like you’re running them locally. The function calls that use
.run(), such as
transcriber.run(source_audio) are executed on the Sieve cloud.
The higher level logic of this code is as follows:
- Extract audio from the video
- Transcribe the audio
- Translate the transcript
- Generate new audio from the translated text
- Combine audio and video with our lipsyncer
import sieve def extract_audio(source_video: sieve.Video): import subprocess audio_path = 'temp.wav' subprocess.run(["ffmpeg", "-i", source_video.path, audio_path, "-y"]) return sieve.Audio(path=audio_path) def transcript_to_text(transcript): text = " ".join([segment["text"] for segment in transcript]) return text @sieve.function( name="video-dubber", python_packages=[ "numpy>=1.19.0", ], system_packages=["ffmpeg"], ) def video_dubbing(source_video: sieve.Video, language: str): """ :param source_video: The video to dub :param language: The language to dub the video in :return: The dubbed video """ print("extracting audio from video") # Extract audio from video source_audio = extract_audio(source_video) print("done extracting") # check if language is supported if language not in ["english", "spanish", "french", "german", "italian", "portuguese", "polish", "turkish", "russian", "dutch", "czech", "arabic", "chinese"]: raise Exception("Language not supported") # use remote sieve models transcriber = sieve.function.get("sieve/speech_transcriber") translator = sieve.function.get("sieve/seamless_text2text") tts = sieve.function.get("sieve/xtts") lipsyncer = sieve.function.get("sieve/video_retalking") print("transcribing audio") # transcribe audio transcript = list(transcriber.run(source_audio)) text = transcript_to_text(transcript) text_language = "english" print("transcription:", text) print("translating text") # Translate text translated_text = translator.run(text, text_language, language) print("translated text:", translated_text) print("generating tts audio") # Generate new audio from translated text target_audio = tts.run(translated_text, source_audio, stability=0.5, similarity_boost=0.5) print("done generating audio") print("starting lipsync") # Combine audio and video with Retalker return lipsyncer.run(source_video, target_audio) if __name__ == "__main__": video = sieve.Video(url="https://storage.googleapis.com/mango-public-models/david.mp4") dubbed_video = video_dubbing.run(video, "spanish") print('dubbed video path: ', dubbed_video.path)
Now let’s run the code. This will deploy your function to Sieve and call it with the video we provided.
You should start seeing some logs streaming in. You can also view the status of this job on the Sieve dashboard. After it has completed running, you’ll see a video file path printed to the console, which has been saved to a temporary directory. You can open this video to see the results.
On MacOS, you can open the video with the following command:
On Windows, you can open the video with the following command:
Or just view the video on the Sieve dashboard!