Example: Audio Transcription
Transcribe an audio clip with Sieve
In this example, we’ll walk through a simple use case of Sieve and learn how to:
- Call an existing Sieve function in Python using the Sieve Python package
- Make a call via the Sieve API and poll for results
Introduction
Audio transcription helps transcribe audio to spoken word, which can be useful for applications like captioning, accessibility, content analysis, and more. Let’s explore using an app that already exists on Sieve for this use case.
Using the Python client
Install the Sieve package and login
Create transcriber script
Create a new Python file named transcriber.py
with the following code:
import sieve
def transcribe_audio():
target_audio = sieve.File(
url="https://storage.googleapis.com/sieve-public-data/assets/dub.m4a")
transcriber = sieve.function.get("sieve/speech_transcriber")
transcription_result = transcriber.run(target_audio)
for i in transcription_result:
print(i)
if __name__ == '__main__':
transcribe_audio()
This Python script:
- Initializes a Sieve typed audio object and passes a URL to it (you can also pass a local path)
- Gets a remote function by reference for audio transcription
- Synchronously executes that remote function using
.run()
- Prints out the transcription
Execute the code
python transcriber.py
Congrats, you just ran your first Sieve app! You should see something similar to this output in your terminal.
[
{
"start": 0.009,
"end": 7.733,
"text": "There were a lot of other businesses that just kind of fell by the wayside because they just couldn't make the adaptation from desktop to mobile computing.",
"words": [
{
"start": 0.009,
"end": 0.109,
"score": 0.069,
"word": "There"
},
...
]
},
{
"start": 7.733,
"end": 9.774,
"text": "I think AI is going to be like that for SaaS.",
"words": [
{
"start": 7.733,
"end": 7.813,
"score": 0.787,
"word": "I"
},
...
]
}
]
By following these steps, we converted audio content to text in just a few lines of code. Just like this app, there are plenty of production-ready apps you can reference in code here.
Using the HTTP API
You can also submit a job via the API by sending a POST request to the /v2/push
endpoint:
curl 'https://mango.sievedata.com/v2/push' \
-X POST
-H 'X-API-Key: <your api key>' \
-H 'Content-Type: application/json' \
--data-raw '{
"function": "sieve/speech_transcriber",
"inputs": {
"file": {
"url": "https://storage.googleapis.com/sieve-public-data/assets/dub.m4a"
}
}
}'
Upon successful submission, you’ll receive a response containing the job_id
and other details:
{
"id":"898ccb04-fce3-4dec-83bd-c7569bfdc30b",
"run_id":"898ccb04-fce3-4dec-83bd-c7569bfdc30b-653a14e7-7a48-4c8c-872b-6715a28aca1f",
"stream_output": true,
"status":"queued",
"organization_id":"c0ec7825-0fcb-444e-b0b9-9660c40fa0f4",
"model_id":"c110b79c-5dce-46f5-9431-0a95dbcce8c9",
}
/v2/jobs/<your-job-id>
endpoint and check the status
field:
curl 'https://mango.sievedata.com/v2/jobs/<your-job-id>' \
-H 'X-API-Key: <your api key>' \
-H 'Content-Type: application/json' \
Your result should look something like:
{
"id": "898ccb04-fce3-4dec-83bd-c7569bfdc30b",
"function_id": "c110b79c-5dce-46f5-9431-0a95dbcce8c9",
"status": "finished",
"created_at": "2023-10-05T22:45:34.238000",
"started_at": "2023-10-05T22:45:34.245000",
"completed_at": "2023-10-05T22:45:36.970000",
...
"outputs": [
{
"type": "list",
"name": null,
"data": [
{
"start": 0.009,
"end": 7.733,
"text": " There were a lot of other businesses that just kind of fell by the wayside because they just couldn't make the adaptation from desktop to mobile computing.",
"words": [
{
"start": 0.009,
"end": 0.109,
"score": 0.069,
"word": "There"
},
...
]
},
{
"start": 7.733,
"end": 9.774,
"text": "I think AI is going to be like that for SaaS.",
"words": [
{
"start": 7.733,
"end": 7.813,
"score": 0.787,
"word": "I"
},
...
]
}
],
"description": null
}
],
...
}
The outputs
field contains a list of output files generated by the job. Each output is described by a type and a URL where the file can be downloaded.
Was this page helpful?