In this example, we’ll walk through a simple use case of Sieve and learn how to:

  • Call an existing Sieve function in Python using the Sieve Python package
  • Make a call via the Sieve API and poll for results

Introduction

Audio transcription helps transcribe audio to spoken word, which can be useful for applications like captioning, accessibility, content analysis, and more. Let’s explore using an app that already exists on Sieve for this use case.

Using the Python client

1

Install the Sieve package and login

pip install sievedata

Find your API key on the Sieve dashboard and login on the CLI.

sieve login
2

Create transcriber script

Create a new Python file named transcriber.py with the following code:

import sieve

def transcribe_audio():
    target_audio = sieve.File(
        url="https://storage.googleapis.com/sieve-public-data/assets/dub.m4a")

    transcriber = sieve.function.get("sieve/speech_transcriber")
    transcription_result = transcriber.run(target_audio)
    for i in transcription_result:
        print(i)

if __name__ == '__main__':
    transcribe_audio()

This Python script:

  • Initializes a Sieve typed audio object and passes a URL to it (you can also pass a local path)
  • Gets a remote function by reference for audio transcription
  • Synchronously executes that remote function using .run()
  • Prints out the transcription
3

Execute the code

python transcriber.py

Congrats, you just ran your first Sieve app! You should see something similar to this output in your terminal.

[
    {
        "start": 0.009,
        "end": 7.733,
        "text": "There were a lot of other businesses that just kind of fell by the wayside because they just couldn't make the adaptation from desktop to mobile computing.",
        "words": [
            {
                "start": 0.009,
                "end": 0.109,
                "score": 0.069,
                "word": "There"
            },
            ...
        ]
    },
    {
        "start": 7.733,
        "end": 9.774,
        "text": "I think AI is going to be like that for SaaS.",
        "words": [
            {
                "start": 7.733,
                "end": 7.813,
                "score": 0.787,
                "word": "I"
            },
            ...
        ]
    }
]

By following these steps, we converted audio content to text in just a few lines of code. Just like this app, there are plenty of production-ready apps you can reference in code here.

Using the HTTP API

You can also submit a job via the API by sending a POST request to the /v2/push endpoint:

curl 'https://mango.sievedata.com/v2/push' \
  -X POST
  -H 'X-API-Key: <your api key>' \
  -H 'Content-Type: application/json' \
  --data-raw '{
    "function": "sieve/speech_transcriber",
    "inputs": {
      "file": {
        "url": "https://storage.googleapis.com/sieve-public-data/assets/dub.m4a"
      }
    }
  }'

Upon successful submission, you’ll receive a response containing the job_id and other details:

{
  "id":"898ccb04-fce3-4dec-83bd-c7569bfdc30b",
  "run_id":"898ccb04-fce3-4dec-83bd-c7569bfdc30b-653a14e7-7a48-4c8c-872b-6715a28aca1f",
  "stream_output": true,
  "status":"queued",
  "organization_id":"c0ec7825-0fcb-444e-b0b9-9660c40fa0f4",
  "model_id":"c110b79c-5dce-46f5-9431-0a95dbcce8c9",
}

An alternative to polling is using webhooks, see the docs here.
To check the status of a job and retrieve its results, poll the /v2/jobs/<your-job-id> endpoint and check the status field:

curl 'https://mango.sievedata.com/v2/jobs/<your-job-id>' \
  -H 'X-API-Key: <your api key>' \
  -H 'Content-Type: application/json' \

Your result should look something like:

{
  "id": "898ccb04-fce3-4dec-83bd-c7569bfdc30b",
  "function_id": "c110b79c-5dce-46f5-9431-0a95dbcce8c9",
  "status": "finished",
  "created_at": "2023-10-05T22:45:34.238000",
  "started_at": "2023-10-05T22:45:34.245000",
  "completed_at": "2023-10-05T22:45:36.970000",
  ...
  "outputs": [
    {
      "type": "list",
      "name": null,
      "data": [
        {
          "start": 0.009,
          "end": 7.733,
          "text": " There were a lot of other businesses that just kind of fell by the wayside because they just couldn't make the adaptation from desktop to mobile computing.",
          "words": [
            {
              "start": 0.009,
              "end": 0.109,
              "score": 0.069,
              "word": "There"
            },
            ...
          ]
        },
        {
          "start": 7.733,
          "end": 9.774,
          "text": "I think AI is going to be like that for SaaS.",
          "words": [
            {
              "start": 7.733,
              "end": 7.813,
              "score": 0.787,
              "word": "I"
            },
            ...
          ]
        }
      ],
      "description": null
    }
  ],
  ...
}

The outputs field contains a list of output files generated by the job. Each output is described by a type and a URL where the file can be downloaded.