Introduction

A common use case for transcription is transcribing RTMP or HLS livestreams. Let’s walk through deploying a custom Sieve function that ingests a livestream and passes it to other functions, using Sieve’s function calling capabilities, for transcription and translation.

Clone repo and run the example

1

Login with Sieve client

To get started, make sure you’re logged in via the Sieve Python client:

sieve login

2

Clone example

Clone the Sieve examples repository and navigate into the audio_transcription directory.

git clone https://github.com/sieve-community/examples
cd examples/audio_transcription

3

Deploy custom function

The code we’ll be deploying can be found at examples/audio_transcription/live_transcriber.py.

This custom Sieve function ingests a livestream and passes it to other functions using Sieve’s function calling capabilities.

At a high level, this module does the following:

  1. Imports necessary libraries and modules.
  2. Loads the sieve/whisperx model for transcription and the sieve/seamless_text2text model for translation.
  3. Reads audio from ffmpeg in chunks of a specified size and splits each chunk at its last silence so that words are not cut in half.
  4. Runs transcription, plus translation if needed, and yields each result with its start and end times.
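
To make the silence-based splitting in step 3 more concrete, here is a minimal sketch of one way to cut a chunk at its last quiet window. It is not the code in live_transcriber.py: the function names, the RMS threshold, and the window length are all assumptions chosen for illustration, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math
from typing import List, Optional, Tuple


def last_silence_index(
    samples: List[float],
    sample_rate: int,
    window_s: float = 0.02,   # assumed analysis window length, in seconds
    threshold: float = 0.01,  # assumed RMS level below which audio counts as silence
) -> Optional[int]:
    """Return the start index of the last silent window, or None if there is none."""
    window = max(1, int(sample_rate * window_s))
    # Walk fixed-size windows backwards from the end of the chunk.
    for start in range(len(samples) - window, -1, -window):
        frame = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            return start
    return None


def split_at_last_silence(
    samples: List[float], sample_rate: int, **kwargs
) -> Tuple[List[float], List[float]]:
    """Split a chunk into (ready, remainder) at the last silent window.

    If no silence is found (or it sits at the very start), the whole chunk is
    returned as ready so that processing always makes progress.
    """
    idx = last_silence_index(samples, sample_rate, **kwargs)
    if idx is None or idx == 0:
        return samples, []
    return samples[:idx], samples[idx:]
```

In a setup like this, the ready part would be sent for transcription and the remainder prepended to the next chunk, so a word that straddles a chunk boundary is transcribed in one piece.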

Let’s deploy our function!

sieve deploy live_transcriber.py

Replicas of the live transcriber automatically scale to zero when there is no traffic, so you may notice a slight delay on the first call while replicas scale up. If you want to keep replicas running and ready to process requests, check out the autoscaling guide to learn how to configure autoscaling for your custom functions.

4

Try it out!

Let’s now use our function to transcribe a live audio stream in Python. Make sure to replace sieve-internal in the function path below with your own organization name, which you can find in the dashboard settings.

import sieve

url = "http://stream.live.vc.bbcmedia.co.uk/bbc_world_service" # Public BBC service
language = "en" # If this language is different from the stream, translation will happen automatically

live_speech_transcriber = sieve.function.get("sieve-internal/live_speech_transcriber")
output = live_speech_transcriber.push(url, target_language=language)

print(f"Started job with id {output.job['id']}")

print("stream ingestor initializing (please set a replica if you'd like for this to be instant)...")
for segment in output.result():
    print(segment)

Each segment prints as soon as it is transcribed. You may also push a job via the REST API and later poll the API for its outputs.

Because this job processes a continuous stream, it will not cancel automatically! To cancel the job once you’re done, use the cancel API or cancel it from the dashboard.

Streaming VTT captions

If you’d like to stream VTT captions directly to your video player, please follow our demo here.
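
As a rough illustration of what turning transcription segments into captions involves, the sketch below formats segments as WebVTT cues. It is not the code from the demo. It assumes each segment is a dict with hypothetical start and end keys (in seconds) and a text key; check the actual output of your deployed function before relying on these names.

```python
from typing import Dict, Iterable


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segment_to_cue(segment: Dict) -> str:
    """Build one WebVTT cue from a segment.

    The "start", "end", and "text" keys are assumptions for this sketch.
    """
    start = format_timestamp(segment["start"])
    end = format_timestamp(segment["end"])
    return f"{start} --> {end}\n{segment['text'].strip()}\n"


def segments_to_vtt(segments: Iterable[Dict]) -> str:
    """Join cues into a WebVTT document: header, blank line, then cues."""
    cues = [segment_to_cue(s) for s in segments]
    return "WEBVTT\n\n" + "\n".join(cues)
```

For a live stream you would typically emit the WEBVTT header once and then append one cue per segment as it arrives, rather than building the whole document at the end.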