Run GPU Workloads

Deploying GPU functions

Sieve lets you run code on GPUs. To do so, all you need to do is add a gpu parameter to your function decorator with the GPU you want. Today, Sieve offers a couple machine configurations listed below.

Name	GPU	Memory (GB)	vCPUs	Parameter
T4 (default)	T4	16	4	`sieve.gpu.T4()`
A100-40GB	A100-40GB	85	12	`sieve.gpu.A100()`
A100-20GB	A100-20GB	42.5	6	`sieve.gpu.A10020GB()`
V100	V100	16	4	`sieve.gpu.V100()`
L4	L4	32	8	`sieve.gpu.L4()`

You can specify one of these in code as follows.

@sieve.function(
	name="gpu_function",
	gpu=sieve.gpu.T4()
)
def runs_on_gpu(video: sieve.File) -> sieve.File:
	...

The CUDA version can be specified with the cuda_version parameter in the function decorator. The full list of possible CUDA versions can be found here.

@sieve.function(
    name="gpu_function_cuda",
    gpu=sieve.gpu.T4(),
    cuda_version="12.1"
)
def runs_on_gpu(video: sieve.File) -> sieve.File:
	...

By default, Sieve will allocate an entire GPU for your function worker to use. Each worker runs one prediction at a time. This way, your worker is guaranteed to have the entire GPU to use for the duration of a function call. However, some workloads may not require the use of an entire GPU. You can use the optional split argument in the GPU constructor to tell Sieve to let multiple workers share the same GPU. For example, the following would tell sieve to allocate 3 workers per GPU:

@sieve.function(
	name="gpu_function_sharing",
	gpu=sieve.gpu.T4(split=3)
)
def runs_on_gpu(video: sieve.File) -> sieve.File:
	...

split can be any integer between 1 and 8. Sieve will only share a GPU with other workers of the same function. Since split number of workers share the same GPU, Sieve will spin up that many workers at a time. Each shared worker will be billed at 1/split the rate of a regular worker. So, Sieve will charge you the same amount per GPU hour regardless of how many workers are running on it. Read more about the way the gpu field works in SDK reference.

Example: YOLO Object Detection

In this guide, we’ll deploy YOLOv8, a standard object detection model, to Sieve, using a cloud GPU (T4) and with GPU sharing enabled.

Let’s first create a new directory and set up our project.

mkdir sieve_yolov8 && cd sieve_yolov8
touch sieve_yolov8/main.py

Now, we can setup our YOLO model and write our inference code.

import sieve

@sieve.function(
    name="yolo-v8",
    gpu=sieve.gpu.T4(split=4),
    python_packages=["ultralytics", "torch==1.13.1", "torchvision==0.14.1"],
    cuda_version="11.7.1",
    system_packages=["libgl1-mesa-glx", "libglib2.0-0", "ffmpeg"],
    python_version="3.10",
)
class Yolo:
    def __setup__(self):
        from ultralytics import YOLO
        self.function = YOLO('yolov8l.pt')

    def __predict__(self, img: sieve.File) -> list:
        preds = self.function(img.path)
        all_boxes = []
        for pred in preds:
            all_boxes.append(pred.boxes.xyxy.cpu().numpy().tolist())
        return all_boxes

Finally, let's deploy our function to Sieve.

After authenticating with your API key and making sure you’re in the sieve_yolov8 directory, we can simply run:

sieve deploy

Run jobs!

You’ll now see your deployed function on the Sieve dashboard, along with an auto-generated UI to play around with it!Your job traffic will autoscale GPUs, with each GPU having capacity for running 4 jobs at a time, as we specified with sieve.gpu.T4(split=4).

API Reference

CLI Reference

Custom Functions

SDK Reference

Run GPU Workloads

Deploying GPU functions

Example: YOLO Object Detection

API Reference

CLI Reference

Custom Functions

SDK Reference

​Deploying GPU functions

​GPU Sharing

​Example: YOLO Object Detection

Deploying GPU functions

GPU Sharing

Example: YOLO Object Detection