Building video effects like Green Screen using AI

Recent research in computer vision (CV) has driven advances across many use cases, and creative media and video editing are no exception. Companies like Adobe are releasing features such as motion tracking and content-aware fill in Adobe After Effects, while companies like RunwayML are releasing green screen and inpainting. While it’s evident that these features add immense value for users and could enhance a given product offering, implementing them has so far been limited to a few companies with large machine learning teams and the resources to pull it off. The reality is that video infrastructure is hard, and when you pair it with AI / ML infrastructure, it becomes an extremely daunting task. This post goes over using Sieve to build a complex video editing feature like video object removal / video green screen.

Simplify the Problem

Building a modular green screen feature can mean many things. Let’s define a set of variables which can help us evaluate different approaches.

  • Cost. Does it break the bank, especially when GPUs come into play?
  • Quality. Are the masks high quality, and up to par with what the user expects?
  • Smoothness / Feel. How smooth is the user experience?

Approach 1: MiVOS

One promising approach is Modular Interactive Video Object Segmentation (MiVOS), which comes from a CVPR 2021 paper.

At first glance, this looks like everything we want. Users can click on any object, and some of the out-of-the-box results are high quality. The issues come up when we evaluate cost and smoothness. MiVOS is a large model that is expensive to run and can’t be parallelized, and every interaction requires a network call. A mask has to be recalculated after each click, which takes a couple of seconds and requires extremely expensive cloud hardware. This fails the cost and smoothness tests.

Approach 2: PointRend

Another promising approach is to use a simpler model called PointRend. It was released in 2019 and boasts crisper, more pixel-accurate mask boundaries than more traditional segmentation models such as DeepLabV3 and DeepLabV3+.

The interesting thing about this model is that it’s an image-based model. That means it can be set up in parallel, where thousands of machines process individual frames at the same time rather than one after another. After the object masks are generated in parallel, we can run a tracking algorithm such as SORT or DeepSORT to associate the per-frame detections with a given object across frames. The way to think about this setup is almost like a MapReduce problem: we “map” PointRend over every frame (by running it on thousands of machines in parallel) and then run an object tracking step at the end to “reduce” the results into distinct “objects” in a video.
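The map-then-reduce shape described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Sieve's implementation: `segment_frame` is a hypothetical stub standing in for a per-frame model such as PointRend, a local thread pool stands in for the fleet of machines, and a greedy IoU matcher stands in for a real tracker like SORT or DeepSORT (it has no motion model and never retires stale tracks).

```python
from concurrent.futures import ThreadPoolExecutor


def segment_frame(frame):
    """Hypothetical stand-in for a per-frame model such as PointRend.

    In this sketch each frame already carries its detections as
    (x1, y1, x2, y2) boxes, so the stub just returns them.
    """
    return list(frame["boxes"])


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tracks(per_frame_boxes, threshold=0.3):
    """Reduce step: greedily link boxes across frames by IoU overlap."""
    tracks = {}      # track id -> list of (frame index, box)
    last_box = {}    # track id -> most recent box for that track
    next_id = 0
    for frame_index, boxes in enumerate(per_frame_boxes):
        available = dict(last_box)  # tracks not yet claimed in this frame
        for box in boxes:
            best_id, best_score = None, threshold
            for track_id, previous in available.items():
                score = iou(box, previous)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                # No sufficiently overlapping track: start a new one.
                best_id = next_id
                next_id += 1
                tracks[best_id] = []
            else:
                del available[best_id]
            tracks[best_id].append((frame_index, box))
            last_box[best_id] = box
    return tracks


def process_video(frames, workers=4):
    """Map the per-frame model over all frames, then reduce into tracks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_frame = list(pool.map(segment_frame, frames))  # order is preserved
    return link_tracks(per_frame)
```

The key property is that the map step has no dependency between frames, so it scales with the number of workers, while the sequential reduce step only touches lightweight box data.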

Shipping Green Screen with Sieve

Now we’ve gone over the approach to building a green screen effect using state-of-the-art computer vision, but how does one set this up in practice? Typically, this is the hardest part of the problem because of all the cloud infrastructure, GPUs, and parallel processing involved (scaling to thousands of parallel machines). Sieve’s platform makes this much easier.

Get an API Key

Visit the Sieve dashboard to sign up and click “Copy API Key”.

Create a pre-configured green screen Project

Terminal

curl 'https://api.sievedata.com/v1/init_project' \
  -X POST \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{
    "project_name": "object_removal",
    "config_url": "https://storage.googleapis.com/sieve-preconfigured-projects/configs/segmentation.json"
  }'

Push a Sample Video

Terminal

curl 'https://api.sievedata.com/v1/push_video' \
  -X POST \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_name": "object_removal",
    "video_url": "https://storage.googleapis.com/sieve-preconfigured-projects/videos/home-interior/home_sf.mp4",
    "video_name": "home_video"
  }'
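For readers who would rather script these calls than paste curl commands, the same POST can be built with Python's standard library. This is a sketch that only mirrors the URL, headers, and JSON body shown in the curl call; the helper names `build_post` and `send` are made up for illustration, and since the response format is not shown in this post, the body is returned as raw text.

```python
import json
import urllib.request

BASE_URL = "https://api.sievedata.com/v1"


def build_post(endpoint, payload, api_key):
    """Build (but do not send) a JSON POST request like the curl call."""
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send(request):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


push_request = build_post(
    "push_video",
    {
        "project_name": "object_removal",
        "video_url": "https://storage.googleapis.com/sieve-preconfigured-projects/videos/home-interior/home_sf.mp4",
        "video_name": "home_video",
    },
    api_key="YOUR_API_KEY",
)
# print(send(push_request))  # uncomment once a real API key is in place
```

Separating request construction from sending makes it easy to inspect the request before any network traffic happens.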

Poll Job Results

Terminal

curl 'https://api.sievedata.com/v1/get_all_jobs?project_name=object_removal' \
  -X GET \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json"
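Since processing is asynchronous, the request above typically needs to be repeated until the job finishes. A rough sketch of such a polling loop follows. The shape of the `get_all_jobs` response is not shown in this post, so the completion check is passed in as an `is_done` function rather than hard-coded, and `fetch_jobs` assumes (without confirmation) that the endpoint returns JSON. Both helper names are hypothetical.

```python
import json
import time
import urllib.request

JOBS_URL = "https://api.sievedata.com/v1/get_all_jobs?project_name=object_removal"


def fetch_jobs(api_key, url=JOBS_URL):
    """GET the job list, mirroring the curl call (assumes a JSON response)."""
    request = urllib.request.Request(
        url,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_until(fetch, is_done, interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch() until is_done(result) is true, or give up after max_attempts."""
    for attempt in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next attempt
    raise TimeoutError(f"jobs not finished after {max_attempts} attempts")
```

A call might look like `poll_until(lambda: fetch_jobs("YOUR_API_KEY"), is_done=my_check)`, where `my_check` inspects whatever status field the real response contains.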

Visualize Results

Make queries to Sieve using the metadata endpoint to get object and mask information, and visualize all results.
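Once a mask is available for a frame, the green screen effect itself is a per-pixel composite. The sketch below assumes the mask has already been decoded into a 2D grid of 0/1 values matching the frame's dimensions; how Sieve actually encodes masks in its metadata responses is not covered here, so that decoding step is left out. Frames are plain nested lists of RGB tuples to keep the example dependency-free.

```python
GREEN = (0, 255, 0)


def green_screen(frame, mask, background=GREEN):
    """Keep pixels where the mask is truthy; paint the rest with `background`.

    `frame` is a list of rows of (r, g, b) tuples and `mask` is a list of
    rows of 0/1 values with the same dimensions (an assumption of this sketch).
    """
    return [
        [pixel if keep else background for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]


# Tiny 2x2 example: only the top-left pixel belongs to the tracked object.
frame = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
mask = [[1, 0], [0, 0]]
composited = green_screen(frame, mask)
```

In a real pipeline the same operation would normally be vectorized with an array library, but the logic is unchanged.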

Large vision AI models like PointRend are increasingly being used for a variety of tasks, and there are many other use cases in video editing that can be powered by AI. At Sieve, we believe that a simple experience for building video AI features can help companies enhance their products with minimal work. To start using Sieve for any video AI use case, get in touch with our team.