Deploying GPU functions

Sieve lets you run code on GPUs. To do so, just add a gpu parameter to your function decorator specifying the GPU you want. Sieve currently offers the machine configurations listed below.

| Name | GPU | Memory (GB) | vCPUs | Parameter |
| --- | --- | --- | --- | --- |
| T4 (default) | T4 | 16 | 4 | sieve.gpu.T4() |
| A100-40GB | A100-40GB | 85 | 12 | sieve.gpu.A100() |
| A100-20GB | A100-20GB | 42.5 | 6 | sieve.gpu.A10020GB() |
| V100 | V100 | 16 | 4 | sieve.gpu.V100() |
| L4 | L4 | 32 | 8 | sieve.gpu.L4() |

You can specify one of these in code as follows.

@sieve.function(
	name="gpu_function",
	gpu=sieve.gpu.T4()
)
def runs_on_gpu(video: sieve.File) -> sieve.File:
	...
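
To use a different machine type, pass the corresponding constructor from the Parameter column of the table above. For example, to run the same function on an A100-40GB (the function name below is an illustrative placeholder):

@sieve.function(
	name="gpu_function_a100",
	gpu=sieve.gpu.A100()
)
def runs_on_a100(video: sieve.File) -> sieve.File:
	...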

The CUDA version can be specified with the cuda_version parameter in the function decorator. The full list of possible CUDA versions can be found here.

@sieve.function(
    name="gpu_function_cuda",
    gpu=sieve.gpu.T4(),
    cuda_version="12.1"
)
def runs_on_gpu(video: sieve.File) -> sieve.File:
	...

GPU Sharing

By default, Sieve allocates an entire GPU to each function worker, and each worker runs one prediction at a time. This way, your worker is guaranteed the entire GPU for the duration of a function call. However, some workloads don't need an entire GPU. You can use the optional split argument in the GPU constructor to tell Sieve to let multiple workers share the same GPU. For example, the following would tell Sieve to allocate 3 workers per GPU:

@sieve.function(
	name="gpu_function_sharing",
	gpu=sieve.gpu.T4(split=3)
)
def runs_on_gpu(video: sieve.File) -> sieve.File:
	...

split can be any integer between 1 and 8. Sieve only shares a GPU among workers of the same function. Because split workers share each GPU, Sieve scales workers up in increments of split.

Each shared worker is billed at 1/split the rate of a dedicated worker, so you pay the same amount per GPU hour regardless of how many workers run on it.
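
To make the billing rule concrete, here is a quick sketch of the math (the hourly rate is a made-up placeholder, not Sieve's actual pricing):

full_gpu_rate = 1.00                     # $ per GPU hour for a dedicated worker (placeholder)
split = 3                                # workers sharing one GPU
per_worker_rate = full_gpu_rate / split  # each shared worker bills at 1/split
total_per_gpu_hour = per_worker_rate * split  # equals full_gpu_rate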

Read more about how the gpu field works in the SDK reference.

Example: YOLO Object Detection

In this guide, we’ll deploy YOLOv8, a standard object detection model, to Sieve on a cloud GPU (T4) with GPU sharing enabled.

1. Let’s first create a new directory and set up our project.

mkdir sieve_yolov8 && cd sieve_yolov8
touch main.py
2. Now, we can set up our YOLO model and write our inference code.

import sieve

@sieve.Model(
    name="yolo-v8",
    gpu=sieve.gpu.T4(split=4),
    python_packages=["ultralytics", "torch==1.13.1", "torchvision==0.14.1"],
    cuda_version="11.7.1",
    system_packages=["libgl1-mesa-glx", "libglib2.0-0", "ffmpeg"],
    python_version="3.10",
)
class Yolo:
    def __setup__(self):
        # Runs once when a worker starts; load model weights here so they
        # are reused across predictions.
        from ultralytics import YOLO
        self.model = YOLO('yolov8l.pt')

    def __predict__(self, img: sieve.File) -> list:
        # Runs on every call; returns a list of [x1, y1, x2, y2] boxes
        # per prediction.
        preds = self.model(img.path)
        all_boxes = []
        for pred in preds:
            all_boxes.append(pred.boxes.xyxy.cpu().numpy().tolist())
        return all_boxes

3. Finally, let's deploy our model to Sieve.

After authenticating with your API key and making sure you’re in the sieve_yolov8 directory, we can simply run:

sieve deploy
4. Run jobs!

You’ll now see your deployed model on the Sieve dashboard, along with an auto-generated UI to play around with it!

As your job traffic grows, Sieve autoscales GPUs, with each GPU running up to 4 workers at a time, as we specified with sieve.gpu.T4(split=4).
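
Beyond the dashboard UI, you can submit jobs from Python. A minimal sketch, assuming your organization handle is your-org and a local image dog.jpg (both placeholders):

import sieve

# Look up the deployed model as "<org>/<name>"; "your-org" is a placeholder.
yolo = sieve.function.get("your-org/yolo-v8")

# Synchronous call: blocks until the job finishes and returns the box lists.
boxes = yolo.run(sieve.File(path="dog.jpg"))
print(boxes)

# Asynchronous alternative: push returns a future you can resolve later.
future = yolo.push(sieve.File(path="dog.jpg"))
print(future.result())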