# Run GPU Workloads

Accelerate your Sieve functions with GPUs.

## Deploying GPU functions
Sieve lets you run code on GPUs. To do so, all you need to do is add a `gpu` parameter to your function decorator with the GPU you want. Today, Sieve offers the machine configurations listed below.
| Name | GPU | Memory (GB) | vCPUs | Parameter |
| --- | --- | --- | --- | --- |
| T4 (default) | T4 | 16 | 4 | `sieve.gpu.T4()` |
| A100-40GB | A100-40GB | 85 | 12 | `sieve.gpu.A100()` |
| A100-20GB | A100-20GB | 42.5 | 6 | `sieve.gpu.A10020GB()` |
| V100 | V100 | 16 | 4 | `sieve.gpu.V100()` |
| L4 | L4 | 32 | 8 | `sieve.gpu.L4()` |
You can specify one of these in code as follows.
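Here’s a minimal sketch (the function name and body are placeholders):

```python
import sieve

# Request an A100-40GB for this function; any constructor from the
# table above works here (sieve.gpu.T4() is the default).
@sieve.function(name="my-gpu-function", gpu=sieve.gpu.A100())
def my_gpu_function(prompt: str) -> str:
    # ... GPU-accelerated work goes here ...
    return prompt
```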
The CUDA version can be specified with the `cuda_version` parameter in the function decorator. The full list of possible CUDA versions can be found here.
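For example, a sketch assuming a version string like "11.8":

```python
import sieve

# Pin the worker's CUDA version alongside the GPU choice. The exact
# version string ("11.8" is an assumption) should come from the supported list.
@sieve.function(name="my-gpu-function", gpu=sieve.gpu.T4(), cuda_version="11.8")
def my_gpu_function(prompt: str) -> str:
    return prompt
```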
## GPU Sharing
By default, Sieve will allocate an entire GPU for your function worker to use. Each worker runs one prediction at a time, so your worker is guaranteed the entire GPU for the duration of a function call. However, some workloads may not require an entire GPU. You can use the optional `split` argument in the GPU constructor to tell Sieve to let multiple workers share the same GPU. For example, the following would tell Sieve to allocate 3 workers per GPU:
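A minimal sketch (function name and body are placeholders):

```python
import sieve

# Three workers share each T4; each worker is billed at 1/3 the regular rate.
@sieve.function(name="shared-gpu-function", gpu=sieve.gpu.T4(split=3))
def shared_gpu_function(prompt: str) -> str:
    return prompt
```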
`split` can be any integer between 1 and 8. Sieve will only share a GPU with other workers of the same function. Since `split` workers share the same GPU, Sieve will spin up that many workers at a time.
Each shared worker is billed at 1/`split` the rate of a regular worker, so Sieve charges you the same amount per GPU hour regardless of how many workers are running on it. For example, with `split=4`, each of the 4 workers is billed at a quarter of the regular hourly rate.
Read more about the way the `gpu` field works in the SDK reference.
## Example: YOLO Object Detection
In this guide, we’ll deploy YOLOv8, a standard object detection model, to Sieve on a cloud GPU (T4) with GPU sharing enabled.
Let’s first create a new directory and set up our project.
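A sketch of the setup, assuming Sieve’s Python SDK is installed from the sievedata package:

```bash
mkdir sieve_yolov8
cd sieve_yolov8
pip install sievedata  # Sieve's Python SDK and CLI
```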
Now, we can set up our YOLO model and write our inference code.
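Save something like the following as main.py. This is a sketch: the function name, the python_packages pin, and the detection output format are illustrative assumptions.

```python
import sieve

@sieve.function(
    name="yolov8",
    gpu=sieve.gpu.T4(split=4),  # 4 workers share each T4, as in this guide
    python_packages=["ultralytics"],
)
def predict(image: sieve.File) -> list:
    """Run YOLOv8 object detection on an input image."""
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # weights are downloaded on first call
    results = model(image.path)

    # Flatten detections into plain dicts: box coordinates, class name, confidence.
    detections = []
    for result in results:
        for box in result.boxes:
            detections.append({
                "box": box.xyxy[0].tolist(),
                "class": result.names[int(box.cls)],
                "confidence": float(box.conf),
            })
    return detections
```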
Finally, let's deploy our model to Sieve.
After authenticating with your API key and making sure you’re in the `sieve_yolov8` directory, we can deploy with a single command.
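A sketch, assuming the inference code above lives in main.py (check the CLI help for exact usage):

```bash
sieve deploy main.py
```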
## Run jobs!
You’ll now see your deployed model on the Sieve dashboard, along with an auto-generated UI to play around with it!
Your job traffic will autoscale GPUs, with each GPU having capacity to run 4 jobs at a time, as we specified with `sieve.gpu.T4(split=4)`.
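You can also submit jobs from code. A hypothetical client-side call (replace the function path with the one shown on your dashboard):

```python
import sieve

# Look up the deployed function by its dashboard path and run a job synchronously.
yolo = sieve.function.get("your-org/yolov8")
detections = yolo.run(sieve.File(path="image.jpg"))
print(detections)
```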