QIMSDK · Qualcomm
Computer Vision

A real-time object detection pipeline built with the QIM SDK using a YOLOv8 TensorFlow Lite model. It supports USB cameras, ISP cameras, RTSP streams, and video files as input, with RTSP/WebRTC output.

QIMSDK Team · Apr 10, 2026

Introduction

Managing video feeds from multiple IP cameras simultaneously is a core requirement in modern surveillance, retail analytics, smart city, and industrial monitoring deployments. Processing each stream independently on a CPU is costly and difficult to scale. With the QIM SDK, multiple concurrent RTSP streams can be decoded, processed through AI inference, and rendered or streamed out in real time, all using Qualcomm’s dedicated hardware accelerators.

The QIM SDK’s GStreamer-based plugin architecture offloads compute-intensive tasks from the CPU to purpose-built hardware blocks. These tasks include multi-stream hardware H.264/H.265 decoding, frame preparation, AI inference on the Neural Processing Unit (NPU), and re-encoding. This enables low-latency, power-efficient processing of multiple simultaneous IP camera feeds directly on Qualcomm edge devices.

At the core of this use case is a parallel multi-stream pipeline, where each incoming RTSP stream is independently decoded and processed through its own AI inference chain. Results from all streams are then composited into a unified output for display, recording, or network streaming. The pipeline supports flexible input configurations, from a single RTSP source up to multiple concurrent streams, making it suitable for scalable, real-world deployments.

The complete application source code is available here.

Use Case Overview

1. Video Input: Accepts multiple concurrent RTSP streams from IP cameras, each decoded independently using hardware-accelerated H.264/H.265 decoding.
2. AI Inference: Each decoded stream is passed through a dedicated AI inference chain for per-stream object detection: preprocessing (qtimlvconverter), model execution (qtimltflite), and post-processing (qtimlpostprocess).
3. Metadata Attachment: Inference results are attached to each stream as structured per-frame metadata via qtimetamux, maintaining stream–detection correspondence.
4. Visualization: Bounding boxes and labels are rendered onto each stream’s frames via qtivoverlay for real-time visual feedback.
5. Composition: Annotated frames from all streams are composited into a unified output layout using qtivcomposer.
6. Output: The composited stream is delivered to a local Wayland display, encoded and saved to file, or streamed over RTSP/WebRTC.

Pipeline diagram

[Figure: Pipeline Diagram]

Elements used in pipeline

Element | Description
rtspsrc | Receives an RTSP stream from an IP camera over the network.
rtph264depay / h264parse | Depayloads and parses the incoming RTP/H.264 bitstream.
v4l2h264dec | Hardware-accelerated H.264 video decoder.
tee | Splits each decoded stream into parallel branches for simultaneous AI inference and output.
qtimlvconverter | Prepares video frames for inference: handles resizing, YUV-to-RGB color space conversion, and pixel normalization.
qtimltflite | Executes the TFLite inference model on each frame using the Qualcomm NPU.
qtimlpostprocess | Decodes raw output tensors into structured bounding boxes and class labels via a dynamically loaded module.
qtimetamux | Synchronizes inference results with the original video stream and attaches them as per-frame structured metadata.
qtivoverlay | Renders bounding boxes and labels directly onto video frames for real-time visualization.
qtivcomposer | Composes multiple annotated video streams into a single unified output frame.
waylandsink | Renders the composited output to a local Wayland display.
v4l2h264enc / h264parse / mp4mux / filesink | Encodes and saves the output to a local file.
qtirtspbin | Streams the output over RTSP for remote viewing.
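To make the relationship between these elements easier to see, here is a minimal Python sketch that assembles a gst-launch-style topology string with one branch per camera. It is an illustration rather than the application's actual pipeline: the element names come from the table above, but the helper functions, the `name=` labels, the queue placement, and the pad links are all assumptions. Required element properties (model path, labels, post-processing module, composer layout) are deliberately left out, so the string is not expected to run as-is.

```python
# Topology sketch only. Element names come from the table above; every
# name= label, queue, and pad link below is an assumption for illustration.

def stream_branch(index: int, rtsp_url: str) -> str:
    """Describe one camera's decode, inference, and overlay chain."""
    tee = f"split{index}"      # hypothetical name for this stream's tee
    mux = f"metamux{index}"    # hypothetical name for this stream's qtimetamux
    return " ".join([
        # Ingest: receive, depayload, parse, and hardware-decode the stream.
        f"rtspsrc location={rtsp_url} ! rtph264depay ! h264parse ! v4l2h264dec",
        f"! tee name={tee}",
        # Reference branch: frames go to the metadata muxer, then the overlay.
        f"{tee}. ! queue ! qtimetamux name={mux} ! qtivoverlay ! queue ! composer.",
        # Inference branch: preprocess, run the model, decode tensors, feed the muxer.
        f"{tee}. ! queue ! qtimlvconverter ! qtimltflite ! qtimlpostprocess ! {mux}.",
    ])


def pipeline_description(rtsp_urls: list[str]) -> str:
    """Join one branch per camera and feed them all into a single composer."""
    head = "qtivcomposer name=composer ! waylandsink"
    branches = [stream_branch(i, url) for i, url in enumerate(rtsp_urls)]
    return " ".join([head, *branches])
```

Keeping one tee and one qtimetamux per stream mirrors the per-stream inference chain described in the overview, while the single qtivcomposer is the only element shared by all cameras.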

How it works

1. Stream Ingestion: Each rtspsrc element receives an RTSP stream from an IP camera. The RTP/H.264 payload is depayloaded and parsed, then passed to a hardware-accelerated decoder (v4l2h264dec) to produce raw NV12 frames.
2. Parallel AI Processing: A tee splits each decoded stream into two branches. The first branch feeds qtimlvconverter → qtimltflite → qtimlpostprocess for AI inference. The second branch feeds qtimetamux as the reference video.
3. Metadata Synchronization: qtimetamux attaches inference results from the AI branch to the corresponding reference video frames, maintaining per-stream temporal alignment.
4. Visualization: qtivoverlay draws bounding boxes and labels onto each annotated stream.
5. Composition: qtivcomposer tiles the annotated streams side-by-side (or in a configurable grid layout) into a single output frame.
6. Output Delivery: The composited frame is delivered to a Wayland display, saved to file, or streamed over RTSP.
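The composition step comes down to assigning each stream a rectangle in the output frame. The following sketch shows one way to compute a near-square grid of tiles. It covers the geometry only: the function name and the 1920x1080 default are assumptions, and how the rectangles are handed to qtivcomposer is not shown in this post.

```python
import math


def tile_layout(stream_count: int, out_width: int = 1920, out_height: int = 1080):
    """Return one (x, y, width, height) rectangle per stream in a near-square grid."""
    if stream_count < 1:
        raise ValueError("at least one stream is required")
    cols = math.ceil(math.sqrt(stream_count))   # columns grow with the square root
    rows = math.ceil(stream_count / cols)       # enough rows to hold every stream
    tile_w, tile_h = out_width // cols, out_height // rows
    return [
        ((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i in range(stream_count)
    ]


two_up = tile_layout(2)    # two tiles side by side
four_up = tile_layout(4)   # 2 x 2 grid
```

With two streams this yields two 960x1080 tiles side by side, and with four streams a 2 x 2 grid of 960x540 tiles.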

Run application on device

Setup Requirements

Hardware

HW Setup
Component | Description
Edge Device | RB3 Gen 2, IQ8, or IQ9. Primary processing unit for AI inference and video composition.
IP/RTSP Cameras | One or more IP cameras accessible over the local network via RTSP.
HDMI Display Monitor | Connected to the edge device for rendering the composited output.
PoE Switch | Powers IP cameras and provides network connectivity over Ethernet. (Required for IP/RTSP camera setups.)
Local Network | Ensures the edge device, IP cameras, and host machine are reachable on the same network.

Software

Flash your Qualcomm edge device by following the device setup and flashing instructions here: <provide QLI device setup and flash instruction link>

Once your device is ready, follow the instructions below to set up the Multi-Stream IP Camera pipeline.
AI Model and config files
File | Download | Save as
YOLOv8 W8A8 model | Qualcomm AI Hub: YOLOv8 Detection | yolov8_det_quantized.tflite
Detection labels | yolov8.json | yolov8.json
Sample video | Input video | video.mp4
Copy files to device
# Run these commands from your host machine; replace <user> and <device-ip>.
# Replace $HOME with the home directory on the device before running them:
#   QLI:    /root
#   Ubuntu: /home/ubuntu
# As written, $HOME is expanded by the host shell, not on the device,
# so check that the files end up in the correct location on the device.

ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels,media,media/output}"
scp yolov8_det_quantized.tflite   <user>@<device-ip>:$HOME/models/
scp yolov8.json          <user>@<device-ip>:$HOME/labels/
scp video.mp4            <user>@<device-ip>:$HOME/media/
Connect to device
ssh <user>@<device-ip>
Run the Multi-Stream IP Camera Application
A display must be connected to the device. If no display is available, use the --no-display flag to run in headless mode.
Use the base path that matches your OS for model and label files (/root on QLI, /home/ubuntu on Ubuntu). The commands below show typical input and output combinations.
Video file input:
gst-multi-stream-ip-camera \
  --input-type=file \
  --input-config=$HOME/media/video.mp4
Video file input, WebRTC output:
gst-ip-camera \
  --input-type=file \
  --input-config=$HOME/media/video.mp4 \
  --output-type=webrtc \
  --output-config=wss://webrtc.nirbheek.in:8443 \
  --webrtc-id=1010
Two RTSP cameras as input:
gst-multi-stream-ip-camera \
  --input-type=rtsp \
  --input-config=rtsp://<camera1-ip>:<port>/stream,rtsp://<camera2-ip>:<port>/stream
RTSP camera input, RTSP output on port 8900:
gst-multi-stream-ip-camera \
  --input-type=rtsp \
  --input-config=rtsp://<camera-ip>:<port>/stream \
  --output-type=rtsp \
  --output-config=8900
Note: This example uses an offline video file as input. To use an IP/RTSP camera or USB camera instead, update the --input-type argument accordingly — refer to the Command-Line Options section below for details.
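When the application is launched from a script rather than typed by hand, the command line can be assembled programmatically. The sketch below builds an argument list using only flags that appear in the examples above. The `build_command` helper is hypothetical, and the camera address is a placeholder from the documentation address range.

```python
import shlex


def build_command(input_type, input_config, output_type=None, output_config=None,
                  no_display=False, binary="gst-multi-stream-ip-camera"):
    """Assemble an argv list using only flags shown in the examples above."""
    argv = [binary, f"--input-type={input_type}", f"--input-config={input_config}"]
    if output_type is not None:
        argv.append(f"--output-type={output_type}")
    if output_config is not None:
        argv.append(f"--output-config={output_config}")
    if no_display:
        argv.append("--no-display")
    return argv


# Placeholder camera address (192.0.2.0/24 is reserved for documentation).
cmd = build_command("rtsp", "rtsp://192.0.2.10:554/stream", "rtsp", 8900)
command_line = shlex.join(cmd)  # shell-quoted string, e.g. for logging
```

Passing the list form to a process launcher such as `subprocess.run(cmd)` avoids shell quoting issues with URLs that contain special characters.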

Visualize the Results - Host-Side Visualization (Windows + WSL)

This section describes how to run the visualization client on a Windows host machine using WSL (Windows Subsystem for Linux). The client renders the live composited video stream alongside a real-time AI metadata panel.

The visualization client script can be downloaded here: rtsp_webrtc_client.zip

It displays:
  • Left panel — Live composited video stream with AI overlays from all camera inputs.
  • Right panel — Real-time AI metadata (JSON): object detections, bounding boxes, and confidence scores per stream.
Step 1 — Install WSL and Ubuntu
If WSL is not already installed, run the following from a Windows terminal:
wsl --install Ubuntu-24.04
Once installed, open the Ubuntu terminal and update the system:
sudo apt update && sudo apt upgrade -y
Step 2 — Install System Dependencies
sudo apt install -y \
  python3 python3-pip python3-gi python3-gi-cairo \
  gir1.2-gstreamer-1.0 \
  gir1.2-gst-plugins-base-1.0 \
  gir1.2-gst-plugins-bad-1.0 \
  gstreamer1.0-tools \
  gstreamer1.0-plugins-base \
  gstreamer1.0-plugins-good \
  gstreamer1.0-plugins-bad \
  gstreamer1.0-plugins-ugly \
  gstreamer1.0-libav \
  python3-websocket \
  libnice10 \
  libnice-dev \
  gstreamer1.0-nice
Step 3 — Run the Visualization Client Script
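# RTSP: connect to the stream the device publishes on port 8900
python3 rtsp_webrtc_client.py rtsp://<DEVICE_IP>:8900/live
# WebRTC: connect through the signalling server, using the peer ID passed as --webrtc-id on the device
python3 rtsp_webrtc_client.py --source webrtc --signalling-server wss://webrtc.nirbheek.in:8443 --peer-id 1010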
Step 4 — Expected Output
Panel | Content description
Left | Real-time composited video: all streams tiled into a single view with bounding boxes and labels
Right | Live AI metadata panel: per-stream object detections, bounding boxes, and confidence scores
[Figure: Expected Output]

Command-Line Options

--input-type
Selects the video input source for the pipeline.
Value | Description
rtsp | IP/RTSP camera stream. Requires --input-config=rtsp://....
file | Local H.264-encoded video file. Requires --input-config=/path/to/video.mp4.
usb | USB camera. Requires --input-config=/dev/video0.
isp | Built-in ISP (on-device) camera. Optionally specify a camera ID via --input-config=0.
--input-config
Specifies the input source configuration corresponding to the selected --input-type.
Input Type | Value
RTSP | rtsp://<ip-or-url> (comma-separated for multiple streams)
File | /path/to/video.mp4
USB | /dev/videoX
ISP | <camera ID>
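The table above implies a small amount of validation per input type. As a rough sketch of what that could look like (not the application's actual parsing code; `parse_input_config` is a hypothetical helper), the function below splits a comma-separated RTSP list and applies a basic sanity check to each of the four input types.

```python
from urllib.parse import urlparse


def parse_input_config(input_type: str, input_config: str) -> list[str]:
    """Split and sanity-check an --input-config value for a given --input-type."""
    if input_type == "rtsp":
        # Multiple cameras are given as one comma-separated value.
        urls = [u.strip() for u in input_config.split(",") if u.strip()]
        if not urls:
            raise ValueError("at least one RTSP URL is required")
        for url in urls:
            parsed = urlparse(url)
            if parsed.scheme != "rtsp" or not parsed.netloc:
                raise ValueError(f"not an RTSP URL: {url!r}")
        return urls
    if input_type == "usb":
        if not input_config.startswith("/dev/video"):
            raise ValueError(f"expected a /dev/videoX node, got {input_config!r}")
        return [input_config]
    if input_type == "isp":
        # The camera ID is optional for ISP input.
        if input_config and not input_config.isdigit():
            raise ValueError(f"expected a numeric camera ID, got {input_config!r}")
        return [input_config] if input_config else []
    if input_type == "file":
        return [input_config]
    raise ValueError(f"unknown input type: {input_type!r}")
```

For example, `parse_input_config("rtsp", "rtsp://10.0.0.5:554/stream,rtsp://10.0.0.6:554/stream")` returns a two-element list, one URL per camera, which maps naturally onto one decode and inference branch per entry.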
--output-type
Defines how the processed output video stream is delivered.
Value | Description
display | Renders composited output on a local Wayland display.
file | Saves the encoded composited output to a file. Requires --output-config.
rtsp | Streams composited output over RTSP. Requires --output-config=<port>.
webrtc | Streams composited output over WebRTC. Requires --output-config=ws://....
none | No video output (headless mode).
--output-config
Specifies the output destination configuration corresponding to the selected --output-type.
Output Type | Value
File | /path/to/output.mp4
RTSP | <port>
WebRTC | ws://<signalling-server>:<port>
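The pairing between --output-type and --output-config can be checked before launching the application. The sketch below is a hypothetical pre-flight check, not part of the application. It assumes that both ws:// and wss:// signalling URLs are acceptable, since the table shows ws:// while the run examples use wss://.

```python
from urllib.parse import urlparse


def check_output_config(output_type, output_config=None):
    """Raise ValueError if --output-config does not fit the chosen --output-type."""
    if output_type in ("display", "none"):
        return  # no --output-config is documented for these types
    if not output_config:
        raise ValueError(f"--output-type={output_type} requires --output-config")
    if output_type == "rtsp":
        port = int(output_config)  # int() raises ValueError for non-numeric text
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    elif output_type == "webrtc":
        # Assumption: accept both plain and TLS WebSocket signalling URLs.
        if urlparse(output_config).scheme not in ("ws", "wss"):
            raise ValueError(f"expected a ws:// or wss:// URL, got {output_config!r}")
    elif output_type != "file":
        raise ValueError(f"unknown output type: {output_type!r}")
```

A call such as `check_output_config("rtsp", "8900")` returns silently, while `check_output_config("rtsp", "camera")` raises `ValueError`.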
--no-display
Disables local on-screen rendering. Recommended for headless deployments and remote streaming setups.
Base path
Specifies the root directory for AI model, label, and configuration files.
Asset Type | Resolved Path
Model files (*.tflite) | <base-path>/models/<model_file>
Label/settings files (*.json) | <base-path>/labels/<labels_file>
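The resolution rule in the table can be expressed in a few lines. This is only an illustration of the documented layout: `resolve_assets` is a hypothetical helper, and the file names in the usage lines are the ones downloaded earlier in this post.

```python
from pathlib import PurePosixPath


def resolve_assets(base_path: str, model_file: str, labels_file: str):
    """Return (model_path, labels_path) following the documented directory layout."""
    base = PurePosixPath(base_path)
    return base / "models" / model_file, base / "labels" / labels_file


# Base path for QLI as listed in the copy step; use /home/ubuntu on Ubuntu.
model_path, labels_path = resolve_assets("/root", "yolov8_det_quantized.tflite", "yolov8.json")
```

Here `model_path` resolves to /root/models/yolov8_det_quantized.tflite and `labels_path` to /root/labels/yolov8.json, matching the directories created in the copy step.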
--width, --height, --framerate
Sets the raw input video resolution and frame rate. Applicable only to ISP and USB inputs.
--width=1920 --height=1080 --framerate=30

Build the Application

Conclusion

The QIM SDK enables developers to build scalable multi-stream IP camera applications without sacrificing performance or flexibility. By leveraging Qualcomm’s dedicated hardware accelerators for decoding, inference, composition, and encoding — and by attaching structured AI metadata directly to each video frame — the SDK delivers a complete, production-ready foundation for multi-camera AI analytics at the edge. Whether the target use case is surveillance, retail analytics, or industrial monitoring, the pipeline can be configured to scale from a single stream to many concurrent feeds with minimal code changes.