Real-time pipeline that detects when a person steps into a predefined restricted zone using IM SDK foot detection — with color-coded visual alerts, bounding box status indicators, and RTSP/WebRTC output.
Introduction
Monitoring access to restricted areas is a common requirement in industrial sites, warehouses, and other controlled environments. While these zones are often clearly marked, enforcing compliance in real time remains difficult. Manual supervision and post-event video review do not scale well and cannot provide the immediate response required in safety-critical scenarios.
The QIM SDK enables a shift from reactive monitoring to real-time, automated situational awareness. By using hardware-accelerated GStreamer plugins, the SDK offloads compute-intensive tasks (video decoding, frame preparation, multi-stage AI inference, and encoding) to dedicated hardware blocks. Frame preparation includes resizing, color conversion (YUV to RGB), and pixel normalization for neural network input.
At the core of this use case is a streamlined pipeline that combines machine learning with spatial reasoning. A foot detection model identifies the location of a person’s feet in each frame, providing a reliable indicator of physical position within the scene. These detections are then evaluated against a predefined restricted zone to determine, in real time, whether a boundary has been crossed.
The SDK also provides hardware-accelerated visualization through overlay rendering. Detection results and zone boundaries are composited directly onto video frames, with dynamic visual cues: green bounding boxes indicate that the foot is outside the restricted zone, and red bounding boxes indicate a violation. Beyond visualization, the SDK supports AI metadata streaming, synchronizing detection and zone-evaluation results with the video stream and transmitting them in parallel for alerting, logging, or dashboard integration.
The complete application source code is available here.
Use Case Overview
Video Input
Foot Detection
Zone Evaluation
qtirestrictedzonedbg evaluates each detected foot position against the predefined restricted zone polygon to determine whether a boundary has been crossed.
Metadata Generation
Visualization
qtivoverlay renders bounding boxes and the restricted zone polygon onto video frames. Bounding boxes are green (outside zone) or red (violation).
Metadata Synchronization
qtimetamux synchronizes all inference and zone-evaluation results with the original video frames, maintaining per-frame consistency.
Pipeline diagram

Elements used in pipeline
| Element | Description |
|---|---|
| source | Accepts video input from an RTSP camera, ISP camera, USB camera, or local file source. |
| tee | Splits the stream into parallel branches for simultaneous display and AI inference. |
| qtimlvconverter | Prepares frames for inference: performs resizing, YUV-to-RGB conversion, and normalization to match model input requirements. |
| qtimltflite | Executes the TFLite foot detection model on each frame using the Qualcomm HTP via the QNN external delegate. |
| qtimlpostprocess | Converts raw model tensors into structured bounding boxes and labels via a dynamically loaded module. |
| qtimetamux | Synchronizes inference results with the original video stream as per-frame structured metadata. |
| qtiobjtracker | Tracks detected feet across consecutive frames and assigns a consistent ID to each object. |
| qtirestrictedzonedbg | Defines the restricted zone polygon and evaluates detections against it. Updates bounding box color to red upon violation. |
| qtivoverlay | Renders bounding boxes, labels, and the restricted zone polygon directly onto video frames. |
| qtimlmetaparser | Serializes per-frame metadata into JSON format for integration with external systems. |
| v4l2h264enc / h264parse | Encodes the processed video stream into H.264 format. |
| waylandsink | Displays the video locally on the device. |
| sink | Streams the encoded video and metadata over RTSP or WebRTC via rtspbin or webrtcbin. |
How it works
Stage 1: Foot Detection
Incoming frames are resized, color-converted, and normalized by qtimlvconverter to align with model input requirements. Preprocessed tensors are passed to qtimltflite, which runs the foot detection model and produces raw output tensors.
Post-Processing
qtimlpostprocess converts the tensors into bounding box detections. qtimetamux attaches these detections to the original video frame as metadata.
Stage 2: Zone Evaluation
qtirestrictedzonedbg evaluates each detected foot position against the configured restricted zone polygon. If a foot lies inside the polygon, the pipeline flags it as a violation in real time.
Tracking
qtiobjtracker maintains object identity across frames, assigning a consistent tracking ID to each detected foot for reliable zone evaluation over time.
Visualization and Output
qtivoverlay renders bounding boxes and the restricted zone on the video stream. Visual cues update dynamically based on zone evaluation. The processed stream can be displayed locally, saved to file, or streamed via RTSP or WebRTC. qtimlmetaparser serializes metadata into JSON for external system integration.
Run application on device
Setup Requirements
Hardware

| Component | Description |
|---|---|
| Edge Device | RB3 Gen 2, IQ8, or IQ9 — Primary processing unit for AI inference and video composition. |
| Camera Source | IP/RTSP camera, ISP (on-device) camera, or USB camera. A local file source may be substituted if no physical camera is available. |
| HDMI Display Monitor | Connected to the edge device for rendering and visualizing pipeline output. |
| PoE Switch | Powers IP/RTSP cameras and provides network connectivity over a single Ethernet cable per camera. (Required for IP/RTSP camera setups only.) |
| Local Network | Ensures the edge device, RTSP camera, and host machine are reachable on the same network. (Required when using RTSP camera input or streaming results via RTSP or WebRTC.) |
Software
Flash your Qualcomm Edge device by following the device setup and flashing instructions here. Once your device is ready, follow the instructions below to set up the Restricted Zone pipeline.
AI Model and config files
| File | Download | Save as |
|---|---|---|
| Foot detection model | Qualcomm AI Hub — Foot Track Net | foot_track_net_quantized.tflite |
| Detection labels | foot_track_net.json | foot_track_net.json |
| Detection settings | foot_track_net_settings.json | foot_track_net_settings.json |
| Sample video | Input video | rz_sample.mp4 |
Note: A display must be connected to the device. If no display is available, use the --no-display flag.
RTSP output
WebRTC output
Display only
Note: This example uses an offline video file as input. To use an IP/RTSP camera or USB camera instead, update the --input-type argument accordingly — refer to the Command-Line Options section below.
The application produces two outputs: an AI-annotated video stream and a JSON metadata stream. To visualize these results, refer to the Host-Side Visualization section below.
Visualize the Results - Host-Side Visualization (Windows + WSL)
This section describes how to run the visualization client on a Windows host machine using WSL (Windows Subsystem for Linux). The client renders the live video stream alongside a real-time AI metadata panel.
📥 The visualization client script can be downloaded here: rtsp_webrtc_client.zip
It displays:
- Left panel: live video stream with AI overlays (bounding boxes and restricted zone polygon).
- Right panel: real-time AI metadata (JSON), including object detections, tracking IDs, bounding boxes, and confidence scores.
RTSP
WebRTC
| Panel | Content |
|---|---|
| Left | Real-time decoded video stream with foot detection bounding boxes and restricted zone polygon |
| Right | Live AI metadata — tracking IDs, bounding boxes, foot coordinates, and confidence scores |
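As a rough illustration of what a consumer of the JSON metadata stream might do, the sketch below turns one per-frame message into readable lines for a metadata panel. The message layout (`objects`, `tracking_id`, `bbox`, `confidence`) is a hypothetical schema chosen for this example. The actual field names emitted by qtimlmetaparser are not shown in this document and may differ.

```python
import json

# Hypothetical sample message; the real field names may differ.
SAMPLE = (
    '{"objects": [{"tracking_id": 3, "bbox": [120, 340, 60, 40], '
    '"confidence": 0.914}]}'
)


def summarize_frame(message: str) -> list:
    """Return one human-readable line per detected object in a frame message."""
    frame = json.loads(message)
    lines = []
    # A frame without detections is assumed to omit "objects" or leave it empty.
    for obj in frame.get("objects", []):
        x, y, w, h = obj["bbox"]
        lines.append(
            f"id={obj['tracking_id']} box=({x}, {y}, {w}, {h}) "
            f"conf={obj['confidence']:.2f}"
        )
    return lines


for line in summarize_frame(SAMPLE):
    print(line)
```

Keeping the parsing in one small function makes it easy to adjust the key names once the real schema has been checked against the stream.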
Command-Line Options
--input-type
| Value | Description |
|---|---|
| usb | USB camera. Requires --input-config=/dev/video0. |
| isp | Built-in ISP (on-device) camera. Optionally specify a camera ID via --input-config=0. |
| rtsp | External IP/RTSP camera or stream. Requires --input-config=rtsp://.... |
| file | Local H.264-encoded video file. Requires --input-config=/path/to/video.mp4. |
--input-config
The expected value depends on --input-type.
| Input Type | Value |
|---|---|
| USB | /dev/videoX |
| ISP | <camera ID> |
| RTSP | rtsp://<ip-or-url> |
| File | /path/to/rz_sample.mp4 |
--output-type
| Value | Description |
|---|---|
| none | No video output (headless mode). |
| file | Save encoded output to a file. Requires --output-config. |
| rtsp | Stream over RTSP. Requires --output-config=<port>. Access at rtsp://<device-ip>:<port>/live. |
| webrtc | Stream over WebRTC. Requires --output-config=ws://.... |
--output-config
| Output Type | Value |
|---|---|
| File | /path/to/output.mp4 |
| RTSP | <port> |
| WebRTC | ws://<signalling-server>:<port> |
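The pairing rules between --output-type and --output-config can be summarized in a small helper. This is only a sketch of the rules in the tables above, not the application's own argument handling. The function name `describe_output`, the port `8554`, and the IP address in the usage line are hypothetical.

```python
from typing import Optional


def describe_output(output_type: str,
                    output_config: Optional[str] = None,
                    device_ip: str = "<device-ip>") -> str:
    """Check an output-type/output-config pair and describe where output goes."""
    if output_type == "none":
        return "no video output (headless mode)"
    if not output_config:
        raise ValueError(f"--output-type={output_type} requires --output-config")
    if output_type == "file":
        return f"file: {output_config}"
    if output_type == "rtsp":
        port = int(output_config)  # raises ValueError if not a number
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid RTSP port: {port}")
        # URL pattern taken from the --output-type table.
        return f"rtsp://{device_ip}:{port}/live"
    if output_type == "webrtc":
        # The table only documents ws:// URLs, so only those are accepted here.
        if not output_config.startswith("ws://"):
            raise ValueError("WebRTC output expects a ws:// signalling URL")
        return f"signalling server: {output_config}"
    raise ValueError(f"unknown output type: {output_type}")


# Hypothetical port and address, for illustration only.
print(describe_output("rtsp", "8554", device_ip="192.168.1.20"))
```

The RTSP branch is the most useful one on the host side, since it yields the URL a client would open to view the stream.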
--model-base-path
| Asset Type | Resolved Path |
|---|---|
| Model files (*.tflite) | <base-path>/models/<model_file> |
| Label/settings files (*.json) | <base-path>/labels/<labels_file> |
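The path layout in the table can be expressed as a short helper, which may be handy for checking that files are in the expected place before launching the pipeline. This is a minimal sketch of the mapping only. The function name `resolve_asset` and the base path `/opt/rz` are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_asset(base_path: str, filename: str) -> str:
    """Map an asset file name to its expected location under the base path."""
    base = PurePosixPath(base_path)
    if filename.endswith(".tflite"):
        return str(base / "models" / filename)   # model files
    if filename.endswith(".json"):
        return str(base / "labels" / filename)   # label and settings files
    raise ValueError(f"unsupported asset type: {filename}")


# /opt/rz is a hypothetical base path used only for illustration.
print(resolve_asset("/opt/rz", "foot_track_net_quantized.tflite"))
print(resolve_asset("/opt/rz", "foot_track_net_settings.json"))
```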
--no-display
--width / --height / --framerate
--webrtc-id
Update Restricted Zone Area
The restricted zone can be customized by setting the zone-config property of qtirestrictedzonedbg. The configuration supports one or more zones defined as ordered polygon vertex lists.
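To make the idea of testing a detection against ordered polygon vertices concrete, the sketch below uses the standard ray-casting point-in-polygon test and maps the result to the green/red cue described earlier. It is an illustration of the geometry, not the plugin's implementation. It assumes pixel coordinates, a box given as (x, y, width, height), and the box center as the evaluated foot position. The exact zone-config syntax is not reproduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def box_color(box: Tuple[float, float, float, float],
              zones: List[Sequence[Point]]) -> str:
    """Return "red" if the box center lies in any zone, else "green"."""
    x, y, w, h = box
    foot = (x + w / 2, y + h / 2)  # assumption: evaluate the box center
    violated = any(point_in_polygon(foot, zone) for zone in zones)
    return "red" if violated else "green"


# Hypothetical rectangular zone, vertices listed in order.
zone = [(100, 100), (400, 100), (400, 300), (100, 300)]
print(box_color((200, 150, 40, 20), [zone]))  # center (220, 160) is inside
print(box_color((500, 150, 40, 20), [zone]))  # center (520, 160) is outside
```

Vertex order matters: listing the same points out of sequence produces a self-intersecting polygon, and the test then gives unintuitive results.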
Implementation Deep-Dive
1. Application Configuration and Runtime Context
2. Reusable Pipeline Skeleton
3. Inference, Zone, and Overlay Configuration
4. Pipeline Linking
5. WebRTC Signaling
WebSocket signaling is handled through libsoup.
| Callback | Responsibility |
|---|---|
| on_offer_created | Constructs and sends the SDP offer |
| on_ice_candidate | Transmits ICE candidates to the signaling server |
| on_ws_message | Handles incoming WebSocket signaling messages |
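The three callbacks map onto three kinds of signaling traffic: an outgoing SDP offer, outgoing ICE candidates, and incoming messages that must be dispatched. The sketch below models those messages in Python purely for illustration; the application itself is not written in Python. The JSON layout (`sdp` and `ice` envelopes) follows a convention common in GStreamer WebRTC demos and is an assumption here, as are all function names. The application's actual wire format is not shown in this document.

```python
import json


def build_offer_message(sdp_text: str) -> str:
    """Message an on_offer_created-style callback might send (assumed layout)."""
    return json.dumps({"sdp": {"type": "offer", "sdp": sdp_text}})


def build_ice_message(candidate: str, mline_index: int) -> str:
    """Message an on_ice_candidate-style callback might send (assumed layout)."""
    return json.dumps({"ice": {"candidate": candidate,
                               "sdpMLineIndex": mline_index}})


def classify_ws_message(text: str) -> tuple:
    """Dispatch step an on_ws_message-style callback might perform."""
    msg = json.loads(text)
    if "sdp" in msg:
        return ("sdp", msg["sdp"]["type"])          # e.g. the peer's answer
    if "ice" in msg:
        return ("ice", msg["ice"]["sdpMLineIndex"])  # remote ICE candidate
    return ("unknown", None)


# Hypothetical incoming answer from the remote peer.
incoming = json.dumps({"sdp": {"type": "answer", "sdp": "v=0"}})
print(classify_ws_message(incoming))
```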
Build the Application
- Source code: gst-restricted-zone
- Build instructions: Steps to build custom application
