Restricted Zone Violation Detection with Foot-Based Spatial Intelligence

QIMSDK · Qualcomm

Safety & Security

Real-time pipeline that detects when a person steps into a predefined restricted zone using IM SDK foot detection — with color-coded visual alerts, bounding box status indicators, and RTSP/WebRTC output.

QIMSDK Team · Jun 14, 2026

Introduction

Monitoring access to restricted areas is a common requirement in industrial sites, warehouses, and other controlled environments. While these zones are often clearly marked, enforcing compliance in real time remains difficult. Manual supervision and post-event video review do not scale well and cannot provide the immediate response required in safety-critical scenarios.

The QIM SDK enables a shift from reactive monitoring to real-time, automated situational awareness. By using hardware-accelerated GStreamer plugins, the SDK offloads compute-intensive tasks (video decoding, frame preparation, multi-stage AI inference, and encoding) to dedicated hardware blocks. Frame preparation includes resizing, color conversion (YUV to RGB), and pixel normalization for neural network input.

At the core of this use case is a streamlined pipeline that combines machine learning with spatial reasoning. A foot detection model identifies the location of a person’s feet in each frame, providing a reliable indicator of physical position within the scene. These detections are then evaluated against a predefined restricted zone to determine, in real time, whether a boundary has been crossed.

The SDK also provides hardware-accelerated visualization through overlay rendering. Detection results and zone boundaries are composited directly onto video frames, with dynamic visual cues: a green bounding box indicates the foot is outside the restricted zone, and a red one indicates a violation. Beyond visualization, the SDK supports AI metadata streaming, synchronizing detection and zone-evaluation results with the video stream and transmitting them in parallel for alerting, logging, or dashboard integration.

The complete application source code is available here.

Use Case Overview

Video Input

A camera monitors the restricted area and its surroundings. The pipeline accepts input from RTSP, ISP, USB, or file-based sources.

Foot Detection

Each frame is submitted to a foot detection model that identifies the location of each person’s feet within the scene.

Zone Evaluation

qtirestrictedzonedbg evaluates each detected foot position against the predefined restricted zone polygon to determine whether a boundary has been crossed.

Metadata Generation

Detection results are attached to the video stream as structured per-frame metadata for downstream consumption.

Visualization

qtivoverlay renders bounding boxes and the restricted zone polygon onto video frames. Bounding boxes are green (outside zone) or red (violation).

Metadata Synchronization

qtimetamux synchronizes all inference and zone-evaluation results with the original video frames, maintaining per-frame consistency.

Output

The annotated stream is delivered via RTSP or WebRTC. Structured detection data is transmitted in parallel as JSON metadata for alerting, logging, and external integrations.

Pipeline diagram

Elements used in pipeline

Element	Description
`source`	Accepts video input from an RTSP camera, ISP camera, USB camera, or local file source.
`tee`	Splits the stream into parallel branches for simultaneous display and AI inference.
`qtimlvconverter`	Prepares frames for inference — performs resizing, YUV-to-RGB conversion, and normalization to match model input requirements.
`qtimltflite`	Executes the TFLite foot detection model on each frame using the Qualcomm HTP via the QNN external delegate.
`qtimlpostprocess`	Converts raw model tensors into structured bounding boxes and labels via a dynamically loaded module.
`qtimetamux`	Synchronizes inference results with the original video stream as per-frame structured metadata.
`qtiobjtracker`	Tracks detected feet across consecutive frames and assigns a consistent ID to each object.
`qtirestrictedzonedbg`	Defines the restricted zone polygon and evaluates detections against it. Updates bounding box color to red upon violation.
`qtivoverlay`	Renders bounding boxes, labels, and the restricted zone polygon directly onto video frames.
`qtimlmetaparser`	Serializes per-frame metadata into JSON format for integration with external systems.
`v4l2h264enc` / `h264parse`	Encodes the processed video stream into H.264 format.
`waylandsink`	Displays the video locally on the device.
`sink`	Streams the encoded video and metadata over RTSP or WebRTC via `rtspbin` or `webrtcbin`.

How it works

Stage 1 — Foot Detection

The full video frame is preprocessed by qtimlvconverter to align with model input requirements. Preprocessed tensors are passed to qtimltflite, which runs the foot detection model and produces raw output tensors.
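The per-pixel arithmetic behind this preprocessing step can be sketched in plain Python. This is only an illustration under stated assumptions: it uses full-range BT.601 coefficients and a simple mean/std normalization, neither of which the article specifies for qtimlvconverter, and the function names `yuv_to_rgb` and `normalize` are hypothetical. On the device this work is offloaded to hardware rather than run per pixel in software.

```python
def yuv_to_rgb(y, u, v):
    """Convert one YUV pixel to 8-bit RGB (assumes full-range BT.601)."""
    def clamp(value):
        # Keep the result inside the valid 8-bit range.
        return max(0, min(255, int(round(value))))

    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp(r), clamp(g), clamp(b)


def normalize(rgb, mean=0.0, std=255.0):
    """Map 8-bit channels to floats using (value - mean) / std."""
    return tuple((channel - mean) / std for channel in rgb)


# Neutral chroma (U = V = 128) leaves the luma value unchanged.
print(yuv_to_rgb(128, 128, 128))   # (128, 128, 128)
print(normalize((255, 0, 255)))    # (1.0, 0.0, 1.0)
```

The mean and std values a given model expects depend on how it was trained and quantized, so treat the defaults here as placeholders.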

Post-Processing

qtimlpostprocess converts the tensors into bounding box detections. qtimetamux attaches these detections to the original video frame as metadata.

Stage 2 — Zone Evaluation

qtirestrictedzonedbg evaluates each detected foot position against the configured restricted zone polygon. If a foot lies inside the polygon, the pipeline flags it as a violation in real time.
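One common way to test whether a point lies inside a polygon is the even-odd ray-casting rule. The sketch below illustrates that technique in Python; the article does not state which algorithm qtirestrictedzonedbg uses internally, so this is an assumption for illustration, and `point_in_polygon` is a hypothetical name. The vertices are the Zone1 coordinates from the zone-config example in this article.

```python
def point_in_polygon(x, y, polygon):
    """Return True if (x, y) lies inside the polygon [(x0, y0), (x1, y1), ...].

    Even-odd rule: cast a horizontal ray from the point to the right and
    count the edges it crosses. An odd count means the point is inside.
    """
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


ZONE1 = [(640, 420), (687, 408), (843, 476), (848, 528), (937, 583), (641, 729)]

print(point_in_polygon(700, 500, ZONE1))  # True
print(point_in_polygon(100, 100, ZONE1))  # False
```

The result does not depend on whether the vertices are listed clockwise or counterclockwise, which fits the ordering freedom described for the zone configuration.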

Tracking

qtiobjtracker maintains object identity across frames, assigning a consistent tracking ID to each detected foot for reliable zone evaluation over time.
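A stable tracking ID lets a downstream consumer raise one alert when an object enters the zone, rather than one per frame. The following Python sketch shows that idea on the host side. The `ViolationTracker` class and its input format (a mapping from tracking ID to an in-zone flag) are hypothetical and not part of the SDK.

```python
class ViolationTracker:
    """Report zone entries and exits once per event instead of once per frame."""

    def __init__(self):
        self._inside = set()  # tracking IDs that were in the zone last frame

    def update(self, frame_states):
        """frame_states maps tracking_id -> True if that object is in the zone.

        Returns (entered, left): the IDs that moved into or out of the zone
        since the previous call. An ID missing from the frame counts as left.
        """
        now_inside = {tid for tid, in_zone in frame_states.items() if in_zone}
        entered = now_inside - self._inside
        left = self._inside - now_inside
        self._inside = now_inside
        return entered, left


tracker = ViolationTracker()
print(tracker.update({7: False}))  # (set(), set())
print(tracker.update({7: True}))   # ({7}, set())
print(tracker.update({7: True}))   # (set(), set())
print(tracker.update({7: False}))  # (set(), {7})
```

A production consumer would probably also tolerate a few missed detections before declaring that an object has left, which this sketch does not attempt.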

Visualization and Output

qtivoverlay renders bounding boxes and the restricted zone on the video stream. Visual cues update dynamically based on zone evaluation. The processed stream can be displayed locally, saved to file, or streamed via RTSP or WebRTC. qtimlmetaparser serializes metadata into JSON for external system integration.

Run application on device

Setup Requirements

Hardware

Component	Description
Edge Device	RB3 Gen 2, IQ8, or IQ9 — Primary processing unit for AI inference and video composition.
Camera Source	IP/RTSP camera, ISP (on-device) camera, or USB camera. A local file source may be substituted if no physical camera is available.
HDMI Display Monitor	Connected to the edge device for rendering and visualizing pipeline output.
PoE Switch	Powers IP/RTSP cameras and provides network connectivity over a single Ethernet cable per camera. (Required for IP/RTSP camera setups only.)
Local Network	Ensures the edge device, RTSP camera, and host machine are reachable on the same network. (Required when using RTSP camera input or streaming results via RTSP or WebRTC.)

Software

Flash your Qualcomm Edge device by following the device setup and flashing instructions here. Once your device is ready, follow the instructions below to set up the Restricted Zone pipeline.

AI Model and config files

File	Download	Save as
Foot detection model	Qualcomm AI Hub — Foot Track Net	`foot_track_net_quantized.tflite`
Detection labels	foot_track_net.json	`foot_track_net.json`
Detection settings	foot_track_net_settings.json	`foot_track_net_settings.json`
Sample video	Input video	`rz_sample.mp4`

Copy files to device

# Replace $HOME with the appropriate device path before running the commands.
# For QLI:    /root
# For Ubuntu: /home/ubuntu
# Note: $HOME inside the double-quoted ssh command is expanded by the host shell, not on the device,
# so substitute the device path explicitly and make sure files land in the correct location.

ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels,media,media/output}"
scp foot_track_net_quantized.tflite   <user>@<device-ip>:$HOME/models/
scp foot_track_net.json               <user>@<device-ip>:$HOME/labels/
scp foot_track_net_settings.json      <user>@<device-ip>:$HOME/labels/
scp rz_sample.mp4                     <user>@<device-ip>:$HOME/media/

Connect to device

ssh <user>@<device-ip>

Run the Restricted Zone Application

Note: A display must be connected to the device. If no display is available, use the --no-display flag.

RTSP output

gst-restricted-zone \
  --input-type=file \
  --input-config=$HOME/media/rz_sample.mp4 \
  --output-type=rtsp \
  --output-config=8900

WebRTC output

gst-restricted-zone \
  --input-type=file \
  --input-config=$HOME/media/rz_sample.mp4 \
  --output-type=webrtc \
  --output-config=wss://webrtc.nirbheek.in:8443 \
  --webrtc-id=1010

Display only

gst-restricted-zone \
  --input-type=file \
  --input-config=$HOME/media/rz_sample.mp4

Note: This example uses an offline video file as input. To use an IP/RTSP camera or USB camera instead, update the --input-type argument accordingly — refer to the Command-Line Options section below.

The application produces two outputs: an AI-annotated video stream and a JSON metadata stream. To visualize them, refer to the Host-Side Visualization section below.

Visualize the Results - Host-Side Visualization (Windows + WSL)

This section describes how to run the visualization client on a Windows host machine using WSL (Windows Subsystem for Linux). The client renders the live video stream alongside a real-time AI metadata panel.

The visualization client script can be downloaded here: rtsp_webrtc_client.zip

It displays:

Left panel — Live video stream with AI overlays (bounding boxes and restricted zone polygon).
Right panel — Real-time AI metadata (JSON): object detections, tracking IDs, bounding boxes, and confidence scores.

Step 1 — Install WSL and Ubuntu

If WSL is not already installed, run the following from a Windows terminal:

wsl --install Ubuntu-24.04

Once installed, update the system:

sudo apt update && sudo apt upgrade -y

Step 2 — Install System Dependencies

sudo apt install -y \
  python3 python3-pip python3-gi python3-gi-cairo \
  gir1.2-gstreamer-1.0 \
  gir1.2-gst-plugins-base-1.0 \
  gir1.2-gst-plugins-bad-1.0 \
  gstreamer1.0-tools \
  gstreamer1.0-plugins-base \
  gstreamer1.0-plugins-good \
  gstreamer1.0-plugins-bad \
  gstreamer1.0-plugins-ugly \
  gstreamer1.0-libav \
  python3-websocket \
  libnice10 \
  libnice-dev \
  gstreamer1.0-nice

Step 3 — Run the Visualization Client Script

RTSP

python3 rtsp_webrtc_client.py rtsp://<DEVICE_IP>:8900/live

WebRTC

python3 rtsp_webrtc_client.py --source webrtc --signalling-server wss://webrtc.nirbheek.in:8443 --peer-id 1010

Step 4 — Expected Output

Panel	Content
Left	Real-time decoded video stream with foot detection bounding boxes and restricted zone polygon
Right	Live AI metadata — tracking IDs, bounding boxes, foot coordinates, and confidence scores

After following the steps, the video and metadata streams should be up and running. When a person steps into the restricted zone, the corresponding bounding box changes from green to red. The pipeline generates structured JSON metadata in the following format:

{
  "object_detection": [
    {
      "tracking_id": 7,
      "label": "person",
      "confidence": 71.63,
      "color": 16711935,
      "rectangle": {
        "x": 0.16, "y": 0.05,
        "width": 0.16, "height": 0.16
      },
      "landmarks": {
        "left_ankle":  { "x": 0.21, "y": 0.10 },
        "right_ankle": { "x": 0.20, "y": 0.07 }
      }
    }
  ],
  "parameters": { "timestamp": "49058322222" }
}
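A host-side consumer might decode this metadata as sketched below. Several points are assumptions rather than documented behavior: that `color` is a packed 32-bit 0xRRGGBBAA value (under that reading the sample value 16711935 is opaque green, which matches the "outside zone" color), that a violation is encoded as opaque red, and that rectangle coordinates are normalized to the frame size. The helper names and the 1920x1080 default are hypothetical, and the sample message omits the landmarks for brevity.

```python
import json

SAMPLE = '''{
  "object_detection": [
    {"tracking_id": 7, "label": "person", "confidence": 71.63,
     "color": 16711935,
     "rectangle": {"x": 0.16, "y": 0.05, "width": 0.16, "height": 0.16}}
  ],
  "parameters": {"timestamp": "49058322222"}
}'''

RED_RGBA = 0xFF0000FF  # assumed encoding of a red (violation) bounding box


def unpack_rgba(color):
    """Split a packed 32-bit value into (r, g, b, a), assuming 0xRRGGBBAA."""
    return ((color >> 24) & 0xFF, (color >> 16) & 0xFF,
            (color >> 8) & 0xFF, color & 0xFF)


def to_pixels(rect, frame_width, frame_height):
    """Scale a normalized rectangle to integer pixel values (x, y, w, h)."""
    return (round(rect["x"] * frame_width), round(rect["y"] * frame_height),
            round(rect["width"] * frame_width),
            round(rect["height"] * frame_height))


def summarize(message, frame_width=1920, frame_height=1080):
    """Yield (tracking_id, is_violation, pixel_rect) for each detection."""
    for det in json.loads(message)["object_detection"]:
        yield (det["tracking_id"], det["color"] == RED_RGBA,
               to_pixels(det["rectangle"], frame_width, frame_height))


print(unpack_rgba(16711935))  # (0, 255, 0, 255)
for tracking_id, is_violation, rect in summarize(SAMPLE):
    print(tracking_id, is_violation, rect)
```

If the pipeline exposes violation state through a dedicated field rather than through color, that field would be the better signal to key on.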

Command-Line Options

--input-type

Selects the video input source for the pipeline.

Value	Description
`usb`	USB camera. Requires `--input-config=/dev/videoX` (for example, `/dev/video0`).
`isp`	Built-in ISP (on-device) camera. Optionally specify a camera ID via `--input-config=0`.
`rtsp`	External IP/RTSP camera or stream. Requires `--input-config=rtsp://...`.
`file`	Local H.264-encoded video file. Requires `--input-config=/path/to/video.mp4`.

--input-config

Specifies the input source configuration corresponding to the selected --input-type.

Input Type	Value
USB	`/dev/videoX`
ISP	`<camera ID>`
RTSP	`rtsp://<ip-or-url>`
File	`/path/to/rz_sample.mp4`

--output-type

Defines how the processed output video stream is delivered.

Value	Description
`none`	No video output (headless mode).
`file`	Save encoded output to a file. Requires `--output-config`.
`rtsp`	Stream over RTSP. Requires `--output-config=<port>`. Access at `rtsp://<device-ip>:<port>/live`.
`webrtc`	Stream over WebRTC. Requires `--output-config=ws://...` or `--output-config=wss://...` (the signalling server URL).

--output-config

Specifies the output destination configuration.

Output Type	Value
File	`/path/to/output.mp4`
RTSP	`<port>`
WebRTC	`ws://<signalling-server>:<port>` or `wss://<signalling-server>:<port>`

--model-base-path

Root directory for AI model, label, and configuration files.

Asset Type	Resolved Path
Model files (`*.tflite`)	`<base-path>/models/<model_file>`
Label/settings files (`*.json`)	`<base-path>/labels/<labels_file>`

--model-base-path=/root        # QLI
--model-base-path=/home/ubuntu # Ubuntu
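The path resolution described in the table can be written out as a small Python sketch. The function name `resolve_assets` is hypothetical; the directory layout and file names are the ones listed in this article.

```python
from pathlib import PurePosixPath


def resolve_assets(base_path, model_file, labels_file, settings_file):
    """Return the model, labels, and settings paths under a base directory."""
    base = PurePosixPath(base_path)
    return {
        "model": str(base / "models" / model_file),
        "labels": str(base / "labels" / labels_file),
        "settings": str(base / "labels" / settings_file),
    }


paths = resolve_assets("/root", "foot_track_net_quantized.tflite",
                       "foot_track_net.json", "foot_track_net_settings.json")
print(paths["model"])     # /root/models/foot_track_net_quantized.tflite
print(paths["settings"])  # /root/labels/foot_track_net_settings.json
```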

--no-display

Disables local on-screen rendering. Recommended for headless deployments, remote streaming (RTSP/WebRTC), or performance optimization.

--width / --height / --framerate

Sets the raw input video resolution and frame rate. Applicable only to ISP and USB inputs.

--width=1920 --height=1080 --framerate=30

--webrtc-id

Specifies the local WebRTC signaling client ID.

--webrtc-id=1010
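For scripted deployments, the options above can be assembled programmatically. The Python sketch below uses only flags documented in this section. The helper names `build_command` and `rtsp_url` are hypothetical, and the sketch does not validate which option combinations the application accepts.

```python
import shlex


def build_command(input_type, input_config, output_type=None,
                  output_config=None, webrtc_id=None, no_display=False):
    """Assemble a gst-restricted-zone argument list from the documented flags."""
    args = ["gst-restricted-zone",
            f"--input-type={input_type}",
            f"--input-config={input_config}"]
    if output_type is not None:
        args.append(f"--output-type={output_type}")
    if output_config is not None:
        args.append(f"--output-config={output_config}")
    if webrtc_id is not None:
        args.append(f"--webrtc-id={webrtc_id}")
    if no_display:
        args.append("--no-display")
    return args


def rtsp_url(device_ip, port):
    """URL at which the RTSP output is served, per the --output-type table."""
    return f"rtsp://{device_ip}:{port}/live"


cmd = build_command("file", "/root/media/rz_sample.mp4",
                    output_type="rtsp", output_config=8900, no_display=True)
print(shlex.join(cmd))
print(rtsp_url("<device-ip>", 8900))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting.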

Update Restricted Zone Area

The restricted zone can be customized by setting the zone-config property of qtirestrictedzonedbg. The configuration supports one or more zones defined as ordered polygon vertex lists.

g_object_set(G_OBJECT(qtirestrictedzonedbg),
  "zone-config",
  "Zones,Zone1=<<640,420>,<687,408>,<843,476>,<848,528>,<937,583>,<641,729>>;",
  NULL);

Multiple zones can be defined in a single string:

"Zones,Zone1=<<...>>;Zone2=<<...>>;"

Each zone is an ordered list of (x, y) pixel coordinates. Points can be specified in clockwise or counterclockwise order. A polygon can contain any number of vertices. If a detected foot falls inside the polygon, a violation is triggered. After updating the restricted zone coordinates, rebuild the application — see Build the Application below.
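When zones are kept as vertex lists elsewhere (for example, exported from an annotation tool), the zone-config string can be generated instead of typed by hand. The Python sketch below reproduces the string format shown above. The function name `format_zone_config` is hypothetical, and the three-vertex minimum is a geometric assumption rather than a documented plugin rule.

```python
def format_zone_config(zones):
    """Build a zone-config string from {zone_name: [(x, y), ...]}."""
    parts = []
    for name, vertices in zones.items():
        if len(vertices) < 3:
            raise ValueError(f"{name}: a polygon needs at least 3 vertices")
        points = ",".join(f"<{x},{y}>" for x, y in vertices)
        parts.append(f"{name}=<{points}>;")
    return "Zones," + "".join(parts)


zone1 = [(640, 420), (687, 408), (843, 476), (848, 528), (937, 583), (641, 729)]
print(format_zone_config({"Zone1": zone1}))
# Zones,Zone1=<<640,420>,<687,408>,<843,476>,<848,528>,<937,583>,<641,729>>;
```

The generated string still has to be pasted into the g_object_set call and the application rebuilt.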

Implementation Deep-Dive

1. Application Configuration and Runtime Context

The application separates user-configurable settings from runtime state.

typedef struct GstAppConfig {
  gchar *input_type, *input_config;
  gchar *output_type, *output_config;
  gchar *model_base_path;
  gboolean no_display;
  gint width, height, framerate, webrtc_id;
} GstAppConfig;

typedef struct GstAppContext {
  GstAppConfig config;
  GstElement  *pipeline;
  GMainLoop   *mloop;
  GstElement  *webrtc;
  gboolean     is_shutting_down;
} GstAppContext;

2. Reusable Pipeline Skeleton

The pipeline is assembled from three logical sections: an input branch, an output branch, and an application-specific user branch.

static gboolean gst_app_create_pipe (GstAppContext *appctx) {
  GstElement *input_tail = NULL, *output_head = NULL, *meta_head = NULL;

  appctx->pipeline = gst_pipeline_new ("gst-ip-camera");

  if (!gst_app_create_input_pipe  (appctx, &input_tail))  return FALSE;
  if (!gst_app_create_output_pipe (appctx, &output_head, &meta_head)) return FALSE;
  if (!gst_app_create_user_pipe   (appctx, input_tail, output_head, meta_head)) return FALSE;
  return TRUE;
}

3. Inference, Zone, and Overlay Configuration

The pipeline uses the QNN external delegate, QPD post-processing, JSON metadata serialization, and overlay masks.

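/* Select the QPD post-processing module. gst_element_set_enum_property is not a
 * core GStreamer function; it is presumably a helper from the application sources. */
gst_element_set_enum_property (qtimlpostprocess, "module", "qpd");

/* Run the TFLite model on the HTP through the QNN external delegate. */
delegate_options = gst_structure_from_string (
    "QNNExternalDelegate,backend_type=htp;", NULL);
g_object_set (G_OBJECT (qtimltflite),
    "external-delegate-path", "libQnnTFLiteDelegate.so",
    "external-delegate-options", delegate_options,
    "model", model_path, NULL);
gst_element_set_enum_property (qtimltflite, "delegate", "external");
/* g_object_set copies the boxed structure, so release the local copy. */
gst_structure_free (delegate_options);

/* Restricted zone polygon, given as ordered pixel-coordinate vertices. */
g_object_set (G_OBJECT (qtirestrictedzonedbg),
    "zone-config",
    "Zones,Zone1=<<640,420>,<687,408>,<843,476>,<848,528>,<937,583>,<641,729>>;",
    NULL);

/* Overlay masks (overlay_rz is defined outside this excerpt). */
g_object_set (G_OBJECT (qtivoverlay), "masks", overlay_rz, NULL);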

4. Pipeline Linking

The metadata-enriched stream is routed through tracking, zone evaluation, and overlay before reaching the output branch.

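/* Video branch: original frames go straight to the metadata muxer. */
gst_element_link_many (input_tail, tee[0], queue[0], qtimetamux, NULL);

/* Inference branch: preprocess, run the model, post-process, then feed the muxer. */
gst_element_link_many (tee[0], queue[1], qtimlvconverter, queue[2],
    qtimltflite, queue[3], qtimlpostprocess, postprocess_caps,
    queue[4], qtimetamux, NULL);

/* Muxed stream: tracking, zone evaluation, overlay, then fan out. */
gst_element_link_many (qtimetamux, queue[5], qtiobjtracker,
    qtirestrictedzonedbg, qtivoverlay, tee[1], NULL);

/* Optional outputs: encoded video and serialized JSON metadata. */
if (output_head != NULL)
    gst_element_link_many (tee[1], queue[6], output_head, NULL);
if (meta_head != NULL)
    gst_element_link_many (tee[1], queue[7], qtimlmetaparser, meta_head, NULL);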

5. WebRTC Signaling

WebRTC signaling uses explicit SDP offer/answer and ICE candidate exchange via WebSocket with libsoup.

g_signal_emit_by_name (webrtcbin, "create-data-channel", name, NULL, &ch);

GstPromise *promise = gst_promise_new_with_change_func (on_offer_created, appctx, NULL);
g_signal_emit_by_name (webrtcbin, "create-offer", NULL, promise);

g_signal_connect (appctx->webrtc, "on-ice-candidate",
    G_CALLBACK (on_webrtc_ice_candidate), appctx);

Callback	Responsibility
`on_offer_created`	Constructs and sends the SDP offer
`on_webrtc_ice_candidate`	Transmits ICE candidates to the signaling server
`on_ws_message`	Handles incoming WebSocket signaling messages
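The kind of dispatch a WebSocket message handler performs can be sketched in Python. The message shapes below follow the signalling protocol used by the public GStreamer WebRTC examples (plain-text `HELLO`, `SESSION_OK`, and `ERROR ...` replies, plus JSON objects carrying `sdp` or `ice`). That this application uses exactly those shapes is an assumption, since the article does not show the handler body, and `classify_signalling_message` is a hypothetical name.

```python
import json


def classify_signalling_message(text):
    """Return (kind, payload) for one text message from the signalling server."""
    if text == "HELLO":
        return "registered", None
    if text == "SESSION_OK":
        return "session_ready", None
    if text.startswith("ERROR"):
        return "error", text
    try:
        msg = json.loads(text)
    except json.JSONDecodeError:
        return "unknown", text
    if not isinstance(msg, dict):
        return "unknown", text
    if "sdp" in msg:
        # {"sdp": {"type": "offer" or "answer", "sdp": "<SDP text>"}}
        return "sdp_" + msg["sdp"]["type"], msg["sdp"]["sdp"]
    if "ice" in msg:
        # {"ice": {"candidate": "...", "sdpMLineIndex": 0}}
        return "ice", (msg["ice"]["sdpMLineIndex"], msg["ice"]["candidate"])
    return "unknown", text


answer = json.dumps({"sdp": {"type": "answer", "sdp": "v=0"}})
print(classify_signalling_message("HELLO"))  # ('registered', None)
print(classify_signalling_message(answer))   # ('sdp_answer', 'v=0')
```

In the C application, an answer would then be applied as the remote description on webrtcbin and ICE candidates added to it; the sketch covers only the classification step.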

Build the Application

Source code: gst-restricted-zone
Build instructions: Steps to build custom application

Conclusion

The Restricted Zone application demonstrates how the IM SDK converts live video into actionable spatial intelligence. By combining AI-based foot detection with in-pipeline zone evaluation, it detects boundary violations in real time with low integration overhead. The pipeline is designed for interoperability: results can be rendered on-screen, streamed over the network, or exported as structured metadata for downstream systems. This modular architecture is easy to extend for visualization, alerting, and external processing, which makes it a practical foundation for location-aware video analytics.
