Building Custom Video Object Counting

Years have passed since the public release of models like YOLO, and we’re now at YOLOv7. Since then, thousands of companies have used these models for object detection, tracking, and counting to solve a variety of business problems. Applications span security, factories, farms, and more, covering tasks such as automated inventory counting, theft prevention, and cattle monitoring.

In the wake of this, giants like Facebook have released newer models like DETR that are even more robust and perform object detection using a transformer architecture. Other companies, such as CountThings, have taken models such as YOLO and applied them to a wide variety of object counting tasks.

You’re presumably reading this post, however, because you still can’t easily apply any of these models to your own object counting task. Why is that? The answer comes down to form factor and customizability.

Approach 1: Open-Source Repositories

First, let’s talk about using open-source code repositories for object detection models to label, train, and deploy the models yourself.

This means setting up your own cloud or edge infrastructure to train, manage, and deploy these models, which is typically a lot of effort. Add video into the mix and it becomes an almost insurmountable task for most teams. The form factor is simply too complicated to set up, and customization is possible but difficult.

Approach 2: Managed Object Counting Services

The other option is using services such as CountThings that specialize in object counting for different types of objects.

Such a service is great if you’re simply looking for an app that someone on your team can point and click with to count objects in a given setting. But there are no options if you want to set up your own stationary cameras, or if you have pre-recorded footage from something like a drone. Another downside is that only a fixed set of objects is supported, so if the object you want to count isn’t on the list, you’re out of luck.

Counting Objects with Sieve

At Sieve, we believe that a simple experience for building functionality like object counting helps companies enhance their products and workflows with minimal work. Sieve makes it easy to run YOLO or DETR on images or video through a simple API. Below, we outline the steps for using Sieve on an object counting task.

Push some sample videos

curl 'https://api.sievedata.com/v1/push_video' \
  -X POST \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_name": "furniture_counting",
    "video_url": "https://storage.googleapis.com/sieve-preconfigured-projects/videos/home-interior/home_sf.mp4",
    "video_name": "home_video"
  }'
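
The same request can also be issued from a script. The following is a minimal sketch using only Python's standard library. The endpoint, header names, and JSON fields are copied from the curl command above; the function names `build_push_request` and `push_video` are illustrative, and because the response format is not described in this post, the body is returned as raw text rather than parsed.

```python
import json
import urllib.request

# Endpoint taken from the curl example; everything else here is illustrative.
PUSH_VIDEO_URL = "https://api.sievedata.com/v1/push_video"


def build_push_request(api_key, project_name, video_url, video_name):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "project_name": project_name,
        "video_url": video_url,
        "video_name": video_name,
    }
    return urllib.request.Request(
        PUSH_VIDEO_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def push_video(api_key, project_name, video_url, video_name):
    """Send the request and return the raw response body as text.

    The response schema is an unknown here, so nothing is parsed.
    """
    request = build_push_request(api_key, project_name, video_url, video_name)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from sending makes it possible to inspect the exact body and headers before anything goes over the network.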

Submit feedback

Sieve’s system takes in human feedback through API calls and uses it to automatically retrain YOLO or DETR to count a specific object. Feedback acts not only as a way to bring Sieve up to speed initially, but also as a way to keep performance up to par over time, even when the environment changes.

curl 'https://api.sievedata.com/v1/feedback/add_object' \
  -X POST \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_name": "furniture_counting",
    "video_name": "home_video",
    "additions": [
      {
        "class": "chair",
        "temporal": [
          {
            "bbox": {
              "position": {
                "x1": 5,
                "x2": 50,
                "y1": 5,
                "y2": 10
              }
            },
            "frame_number": 5
          }
        ]
      }
    ]
  }'
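
Hand-written JSON bodies of this size are easy to get wrong, so it can help to generate them. The sketch below builds the same feedback payload in Python and serializes it with `json.dumps`. The field names mirror the curl example; the helper names `make_box` and `build_add_object_payload` are hypothetical, and the checks that `x1 < x2`, `y1 < y2`, and `frame_number >= 0` are assumptions about the coordinate convention rather than documented API rules.

```python
import json


def make_box(x1, y1, x2, y2, frame_number):
    """Return one entry of the "temporal" list.

    Assumes (x1, y1) and (x2, y2) are opposite corners with x1 < x2 and
    y1 < y2; this convention is a guess based on the example values.
    """
    if x1 >= x2 or y1 >= y2:
        raise ValueError("expected x1 < x2 and y1 < y2")
    if frame_number < 0:
        raise ValueError("frame_number must be non-negative")
    return {
        "bbox": {"position": {"x1": x1, "x2": x2, "y1": y1, "y2": y2}},
        "frame_number": frame_number,
    }


def build_add_object_payload(project_name, video_name, class_name, boxes):
    """Assemble the JSON body for the feedback/add_object call."""
    return {
        "project_name": project_name,
        "video_name": video_name,
        "additions": [{"class": class_name, "temporal": list(boxes)}],
    }


# Reproduce the body from the curl example.
payload = build_add_object_payload(
    "furniture_counting", "home_video", "chair", [make_box(5, 5, 50, 10, 5)]
)
body = json.dumps(payload, indent=2)  # serialized JSON string, ready to POST
```

Several boxes for the same object across frames can be passed in `boxes`, which matches the list shape of the `temporal` field in the example.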

See results

Use Sieve’s query endpoints (metadata, intervals) to find the moments in a video that match a certain object count, or to pull all historical information for use in alerts.
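
The response format of the query endpoints is not shown in this post, so the sketch below does not call them. It only illustrates, on a plain list of per-frame counts, what finding the moments that match a certain object count amounts to: grouping consecutive matching frames into time intervals. The function name, the input shape, and the `fps` parameter are all assumptions made for illustration.

```python
def matching_intervals(counts_per_frame, target_count, fps=30.0):
    """Return (start_seconds, end_seconds) spans where the count equals target_count.

    counts_per_frame[i] is the object count in frame i (a hypothetical input
    shape, not Sieve's response format). Each span covers a maximal run of
    consecutive matching frames and ends where the last matching frame ends.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    intervals = []
    run_start = None  # index of the first frame in the current matching run
    for frame, count in enumerate(counts_per_frame):
        if count == target_count:
            if run_start is None:
                run_start = frame
        elif run_start is not None:
            intervals.append((run_start / fps, frame / fps))
            run_start = None
    if run_start is not None:  # a run that lasts until the final frame
        intervals.append((run_start / fps, len(counts_per_frame) / fps))
    return intervals
```

For example, `matching_intervals([2, 3, 3, 2, 3], target_count=3, fps=1.0)` returns `[(1.0, 3.0), (4.0, 5.0)]`: three objects are visible from second 1 to second 3, and again from second 4 to the end.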

As you might’ve noticed, most use cases require fine-tuning and deploying AI models such as YOLO and DETR on video. We provide a simple experience for pushing video and fine-tuning AI models, which allows for easy experimentation and quick feedback loops. To get access to the Sieve platform, get in touch with our team.