Multi Stream IP Camera application - Qualcomm Intelligent Multimedia SDK

Computer Vision

A real-time object detection pipeline built with the QIM SDK using a YOLOv8 TensorFlow Lite model, supporting USB cameras, ISP cameras, RTSP streams, and video files, with RTSP/WebRTC output.

QIMSDK Team · Apr 10, 2026

Introduction

Managing video feeds from multiple IP cameras simultaneously is a core requirement in modern surveillance, retail analytics, smart city, and industrial monitoring deployments. Processing each stream independently on a CPU is costly and difficult to scale. With the QIM SDK, multiple concurrent RTSP streams can be decoded, processed through AI inference, and rendered or streamed out in real time, all using Qualcomm’s dedicated hardware accelerators.

The QIM SDK’s GStreamer-based plugin architecture offloads compute-intensive tasks from the CPU to purpose-built hardware blocks, including multi-stream hardware H.264/H.265 decoding, frame preparation, AI inference via the Neural Processing Unit (NPU), and re-encoding. This enables low-latency, power-efficient processing of multiple simultaneous IP camera feeds directly on Qualcomm edge devices.

At the core of this use case is a parallel multi-stream pipeline, where each incoming RTSP stream is independently decoded and processed through its own AI inference chain. Results from all streams are then composited into a unified output for display, recording, or network streaming. The pipeline supports flexible input configurations, from a single RTSP source up to multiple concurrent streams, making it suitable for scalable, real-world deployments. The complete application source code is linked in the Build the Application section.

Use Case Overview

Video Input

Accepts multiple concurrent RTSP streams from IP cameras, each decoded independently using hardware-accelerated H.264/H.265 decoding.

AI Inference

Each decoded stream is passed through a dedicated AI inference chain — including preprocessing (qtimlvconverter), model execution (qtimltflite), and post-processing (qtimlpostprocess) — for per-stream object detection.

Metadata Attachment

Inference results are attached to each stream as structured per-frame metadata via qtimetamux, maintaining stream–detection correspondence.

Visualization

Bounding boxes and labels are rendered onto each stream’s frames via qtivoverlay for real-time visual feedback.

Composition

Annotated frames from all streams are composited into a unified output layout using qtivcomposer.

Output

The composited stream is delivered to a local Wayland display, encoded and saved to file, or streamed over RTSP/WebRTC.

Pipeline diagram

Elements used in pipeline

Element	Description
`rtspsrc`	Receives an RTSP stream from an IP camera over the network.
`rtph264depay / h264parse`	Depayloads and parses the incoming RTP/H.264 bitstream.
`v4l2h264dec`	Hardware-accelerated H.264 video decoder.
`tee`	Splits each decoded stream into parallel branches for simultaneous AI inference and output.
`qtimlvconverter`	Prepares video frames for inference — handles resizing, YUV-to-RGB color space conversion, and pixel normalization.
`qtimltflite`	Executes the TFLite inference model on each frame using the Qualcomm NPU.
`qtimlpostprocess`	Decodes raw output tensors into structured bounding boxes and class labels via a dynamically loaded module.
`qtimetamux`	Synchronizes inference results with the original video stream and attaches them as per-frame structured metadata.
`qtivoverlay`	Renders bounding boxes and labels directly onto video frames for real-time visualization.
`qtivcomposer`	Composes multiple annotated video streams into a single unified output frame.
`waylandsink`	Renders the composited output to a local Wayland display.
`v4l2h264enc / h264parse / mp4mux / filesink`	Encodes and saves the output to a local file.
`qtirtspbin`	Streams the output over RTSP for remote viewing.

How it works

Stream Ingestion

Each rtspsrc element receives an RTSP stream from an IP camera. The RTP/H.264 payload is depayloaded and parsed, then passed to a hardware-accelerated decoder (v4l2h264dec) to produce raw NV12 frames.

Parallel AI Processing

A tee splits each decoded stream into two branches. The first branch feeds qtimlvconverter → qtimltflite → qtimlpostprocess for AI inference. The second branch feeds qtimetamux as the reference video.

Metadata Synchronization

qtimetamux attaches inference results from the AI branch to the corresponding reference video frames, maintaining per-stream temporal alignment.
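The alignment step can be pictured as a timestamp join. The sketch below is a conceptual illustration only and not the qtimetamux implementation: the `Frame` and `Detection` types, the `attach_metadata` helper, and the tolerance parameter are all hypothetical, and it assumes frames and results carry presentation timestamps in nanoseconds.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    pts_ns: int          # timestamp of the frame the result was computed from
    label: str
    confidence: float


@dataclass
class Frame:
    pts_ns: int
    detections: list = field(default_factory=list)


def attach_metadata(frames, detections, tolerance_ns=0):
    """Attach each detection to the frame whose timestamp matches it.

    Hypothetical helper: detections that are not within tolerance_ns
    of any frame timestamp are dropped.
    """
    by_pts = {frame.pts_ns: frame for frame in frames}
    for det in detections:
        frame = by_pts.get(det.pts_ns)
        if frame is None and tolerance_ns > 0:
            # Fall back to the nearest frame inside the tolerance window.
            nearest = min(
                frames, key=lambda f: abs(f.pts_ns - det.pts_ns), default=None
            )
            if nearest is not None and abs(nearest.pts_ns - det.pts_ns) <= tolerance_ns:
                frame = nearest
        if frame is not None:
            frame.detections.append(det)
    return frames
```

Because each stream has its own muxer in the pipeline, a join like this would run once per stream, which is what keeps detections from one camera off another camera's frames.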

Visualization

qtivoverlay draws bounding boxes and labels onto each annotated stream.

Composition

qtivcomposer tiles the annotated streams side-by-side (or in a configurable grid layout) into a single output frame.
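To make the tiling concrete, here is a small Python sketch that computes one rectangle per stream for a near-square grid. It is an assumption about how such a layout could be derived, not the composer's actual placement logic; `tile_layout` and its parameters are hypothetical names.

```python
import math


def tile_layout(num_streams, out_width, out_height):
    """Return one (x, y, width, height) rectangle per stream, row by row."""
    if num_streams < 1:
        raise ValueError("at least one stream is required")
    cols = math.ceil(math.sqrt(num_streams))   # near-square grid
    rows = math.ceil(num_streams / cols)
    tile_w = out_width // cols
    tile_h = out_height // rows
    return [
        ((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i in range(num_streams)
    ]


# Two streams on a 1920x1080 output end up side by side.
print(tile_layout(2, 1920, 1080))  # [(0, 0, 960, 1080), (960, 0, 960, 1080)]
```

With three streams the same function yields a 2x2 grid with one empty cell, which is one simple way to handle stream counts that are not perfect squares.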

Output Delivery

The composited frame is delivered to a Wayland display, saved to file, or streamed over RTSP.
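The steps above can be summarized as a pipeline topology. The following Python sketch assembles a gst-launch-style description string for the display case, using only the element names listed in this post. It shows topology only: element properties (model path, labels, post-processing module, composer positions) are not given in this post and are omitted, so the resulting string is not expected to run as-is. The helper names and the `split_N`, `mux_N`, and `composer` element names are hypothetical.

```python
def stream_branch(index, rtsp_url):
    """Describe one camera branch in gst-launch syntax (topology only)."""
    return (
        # Ingestion: receive, depayload, parse, and decode the stream.
        f"rtspsrc location={rtsp_url} ! rtph264depay ! h264parse ! v4l2h264dec ! "
        f"tee name=split_{index} "
        # Reference-video branch: muxer, overlay, then the shared composer.
        f"split_{index}. ! queue ! qtimetamux name=mux_{index} ! "
        f"qtivoverlay ! composer. "
        # AI branch: preprocessing, inference, post-processing, then the muxer.
        f"split_{index}. ! queue ! qtimlvconverter ! qtimltflite ! "
        f"qtimlpostprocess ! mux_{index}."
    )


def build_pipeline(rtsp_urls):
    """Join one branch per camera onto a single composer and display sink."""
    head = "qtivcomposer name=composer ! waylandsink"
    branches = [stream_branch(i, url) for i, url in enumerate(rtsp_urls)]
    return " ".join([head, *branches])


# Placeholder camera addresses from the documentation range 192.0.2.0/24.
print(build_pipeline(["rtsp://192.0.2.10:554/stream", "rtsp://192.0.2.11:554/stream"]))
```

Adding a camera adds one more branch with its own tee and muxer, while the composer and sink stay shared.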

Run application on device

Setup Requirements

Hardware

Component	Description
Edge Device	RB3 Gen 2, IQ8, or IQ9 — Primary processing unit for AI inference and video composition.
IP/RTSP Cameras	One or more IP cameras accessible over the local network via RTSP.
HDMI Display Monitor	Connected to the edge device for rendering the composited output.
PoE Switch	Powers IP cameras and provides network connectivity over Ethernet. (Required for IP/RTSP camera setups.)
Local Network	Ensures the edge device, IP cameras, and host machine are reachable on the same network.

Software

Flash your Qualcomm edge device by following the device setup and flashing instructions here: <provide QLI device setup and flash instruction link>

Once your device is ready, follow the instructions below to set up the Multi-Stream IP Camera pipeline.

AI Model and config files

File	Download	Save as
YOLOv8 W8A8 model	Qualcomm AI Hub — YOLOv8 Detection	`yolov8_det_quantized.tflite`
Detection labels	yolov8.json	`yolov8.json`
Sample video	Input video	`video.mp4`

Copy files to device

# Run these commands from your host machine. Replace <user> and <device-ip>.
# $HOME is a placeholder for the home directory on the device. Replace it with the
# literal device path before running; otherwise your host shell expands it to the
# host's home directory.
#   QLI:    /root
#   Ubuntu: /home/ubuntu

ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels,media,media/output}"
scp yolov8_det_quantized.tflite   <user>@<device-ip>:$HOME/models/
scp yolov8.json          <user>@<device-ip>:$HOME/labels/
scp video.mp4            <user>@<device-ip>:$HOME/media/

Connect to device

ssh <user>@<device-ip>

Run the Multi-Stream IP Camera Application

A display must be connected to the device. If no display is available, use the --no-display flag to run in headless mode.

Use the base path for model and label files that matches your OS: /root on QLI, /home/ubuntu on Ubuntu.

File input (offline test)

gst-multi-stream-ip-camera \
  --input-type=file \
  --input-config=$HOME/media/video.mp4

WebRTC

gst-multi-stream-ip-camera \
  --input-type=file \
  --input-config=$HOME/media/video.mp4 \
  --output-type=webrtc \
  --output-config=wss://webrtc.nirbheek.in:8443 \
  --webrtc-id=1010

Display (multi-stream)

gst-multi-stream-ip-camera \
  --input-type=rtsp \
  --input-config=rtsp://<camera1-ip>:<port>/stream,rtsp://<camera2-ip>:<port>/stream

RTSP output

gst-multi-stream-ip-camera \
  --input-type=rtsp \
  --input-config=rtsp://<camera-ip>:<port>/stream \
  --output-type=rtsp \
  --output-config=8900

Note: The File input and WebRTC examples use an offline video file as input. To use an IP/RTSP camera, USB camera, or ISP camera instead, update the --input-type and --input-config arguments accordingly. Refer to the Command-Line Options section below for details.
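When the application is launched from a script rather than typed by hand, the same commands can be assembled programmatically. The sketch below builds the argument list from the flags shown in the examples above; `build_command` is a hypothetical helper, and the camera address is a placeholder from the documentation range.

```python
import shlex


def build_command(input_type, input_config, output_type=None, output_config=None):
    """Assemble the argument list for gst-multi-stream-ip-camera."""
    argv = [
        "gst-multi-stream-ip-camera",
        f"--input-type={input_type}",
        f"--input-config={input_config}",
    ]
    # Output flags are optional; the display examples above omit them.
    if output_type is not None:
        argv.append(f"--output-type={output_type}")
    if output_config is not None:
        argv.append(f"--output-config={output_config}")
    return argv


cmd = build_command("rtsp", "rtsp://192.0.2.10:554/stream", "rtsp", 8900)
print(shlex.join(cmd))  # shell-safe, copy-pasteable form of the command
```

On the device, the list form can be passed directly to `subprocess.run(cmd)`, which avoids shell quoting issues in stream URLs.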

Visualize the Results - Host-Side Visualization (Windows + WSL)

This section describes how to run the visualization client on a Windows host machine using WSL (Windows Subsystem for Linux). The client renders the live composited video stream alongside a real-time AI metadata panel.

The visualization client script can be downloaded here: rtsp_webrtc_client.zip

It displays:

Left panel — Live composited video stream with AI overlays from all camera inputs.
Right panel — Real-time AI metadata (JSON): object detections, bounding boxes, and confidence scores per stream.

Step 1 — Install WSL and Ubuntu

If WSL is not already installed, run the following from a Windows terminal:

wsl --install Ubuntu-24.04

Once installed, open the Ubuntu terminal and update the system:

sudo apt update && sudo apt upgrade -y

Step 2 — Install System Dependencies

sudo apt install -y \
  python3 python3-pip python3-gi python3-gi-cairo \
  gir1.2-gstreamer-1.0 \
  gir1.2-gst-plugins-base-1.0 \
  gir1.2-gst-plugins-bad-1.0 \
  gstreamer1.0-tools \
  gstreamer1.0-plugins-base \
  gstreamer1.0-plugins-good \
  gstreamer1.0-plugins-bad \
  gstreamer1.0-plugins-ugly \
  gstreamer1.0-libav \
  python3-websocket \
  libnice10 \
  libnice-dev \
  gstreamer1.0-nice

Step 3 — Run the Visualization Client Script

RTSP

python3 rtsp_webrtc_client.py rtsp://<DEVICE_IP>:8900/live

WebRTC

python3 rtsp_webrtc_client.py --source webrtc --signalling-server wss://webrtc.nirbheek.in:8443 --peer-id 1010

Step 4 — Expected Output

Panel	Description
Left	Real-time composited video — all streams tiled into a single view with bounding boxes and labels
Right	Live AI metadata panel — per-stream object detections, bounding boxes, and confidence scores

Command-Line Options

--input-type

Selects the video input source for the pipeline.

Value	Description
`rtsp`	IP/RTSP camera stream. Requires `--input-config=rtsp://...`.
`file`	Local H.264-encoded video file. Requires `--input-config=/path/to/video.mp4`.
`usb`	USB camera. Requires `--input-config=/dev/video0`.
`isp`	Built-in ISP (on-device) camera. Optionally specify a camera ID via `--input-config=0`.

--input-config

Specifies the input source configuration corresponding to the selected --input-type.

Input Type	Value
RTSP	`rtsp://<ip-or-url>` (comma-separated for multiple streams)
File	`/path/to/video.mp4`
USB	`/dev/videoX`
ISP	`<camera ID>`
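As an illustration of the comma-separated RTSP form, the following Python sketch splits an `--input-config` value into individual stream URLs and rejects entries that are not RTSP URLs. It is an assumption about reasonable validation, not the application's own parser; `parse_rtsp_inputs` is a hypothetical name, and the approach would not handle URLs that themselves contain commas.

```python
from urllib.parse import urlparse


def parse_rtsp_inputs(input_config):
    """Split a comma-separated --input-config value into RTSP URLs."""
    urls = [part.strip() for part in input_config.split(",") if part.strip()]
    if not urls:
        raise ValueError("no RTSP URL given")
    for url in urls:
        parsed = urlparse(url)
        # Each entry needs the rtsp scheme and a host to connect to.
        if parsed.scheme != "rtsp" or not parsed.hostname:
            raise ValueError(f"not an RTSP URL: {url!r}")
    return urls


# Placeholder addresses from the documentation range 192.0.2.0/24.
streams = parse_rtsp_inputs("rtsp://192.0.2.10:554/stream,rtsp://192.0.2.11:554/stream")
print(len(streams))  # 2
```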

--output-type

Defines how the processed output video stream is delivered.

Value	Description
`display`	Renders composited output on a local Wayland display.
`file`	Saves the encoded composited output to a file. Requires `--output-config`.
`rtsp`	Streams composited output over RTSP. Requires `--output-config=<port>`.
`webrtc`	Streams composited output over WebRTC. Requires `--output-config=ws://...` or `--output-config=wss://...`.
`none`	No video output (headless mode).

--output-config

Specifies the output destination configuration corresponding to the selected --output-type.

Output Type	Value
File	`/path/to/output.mp4`
RTSP	`<port>`
WebRTC	`ws://<signalling-server>:<port>` or `wss://<signalling-server>:<port>`
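The pairing rules between `--output-type` and `--output-config` can be checked before launching. The Python sketch below mirrors the two tables above; it is not the application's own validation, `check_output` is a hypothetical helper, and accepting `wss://` alongside `ws://` is an assumption based on the WebRTC example command earlier in this post.

```python
REQUIRES_CONFIG = {"file", "rtsp", "webrtc"}
ALL_OUTPUT_TYPES = REQUIRES_CONFIG | {"display", "none"}


def check_output(output_type, output_config=None):
    """Raise ValueError if the output type and config do not fit together."""
    if output_type not in ALL_OUTPUT_TYPES:
        raise ValueError(f"unknown output type: {output_type!r}")
    if output_type in REQUIRES_CONFIG and not output_config:
        raise ValueError(f"--output-type={output_type} requires --output-config")
    if output_type == "rtsp":
        port = int(output_config)  # raises ValueError for non-numeric input
        if not 1 <= port <= 65535:
            raise ValueError(f"RTSP port out of range: {port}")
    if output_type == "webrtc":
        if not str(output_config).startswith(("ws://", "wss://")):
            raise ValueError("WebRTC output expects a ws:// or wss:// URL")
    return True


check_output("rtsp", "8900")
check_output("display")
```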

--no-display

Disables local on-screen rendering. Recommended for headless deployments and remote streaming setups.

--model-base-path

Specifies the root directory for AI model, label, and configuration files.

Asset Type	Resolved Path
Model files (`*.tflite`)	`<base-path>/models/<model_file>`
Label/settings files (`*.json`)	`<base-path>/labels/<labels_file>`
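The resolution rule in the table amounts to joining the base path with a fixed subdirectory per asset type. A minimal Python sketch, assuming the file names from the download table above and a hypothetical `resolve_assets` helper:

```python
from pathlib import PurePosixPath


def resolve_assets(base_path, model_file, labels_file):
    """Return (model_path, labels_path) under the given base path."""
    base = PurePosixPath(base_path)
    return base / "models" / model_file, base / "labels" / labels_file


model, labels = resolve_assets("/root", "yolov8_det_quantized.tflite", "yolov8.json")
print(model)   # /root/models/yolov8_det_quantized.tflite
print(labels)  # /root/labels/yolov8.json
```

This matches the directory layout created in the Copy files to device step, where the model is placed under `models/` and the labels file under `labels/`.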

--width / --height / --framerate

Sets the raw input video resolution and frame rate. Applicable only to ISP and USB inputs.

--width=1920 --height=1080 --framerate=30

Build the Application

Source code: gst-multi-stream-ip-camera
Build instructions: Steps to build custom application

Conclusion

The QIM SDK enables developers to build scalable multi-stream IP camera applications without sacrificing performance or flexibility. By leveraging Qualcomm’s dedicated hardware accelerators for decoding, inference, composition, and encoding — and by attaching structured AI metadata directly to each video frame — the SDK delivers a complete, production-ready foundation for multi-camera AI analytics at the edge. Whether the target use case is surveillance, retail analytics, or industrial monitoring, the pipeline can be configured to scale from a single stream to many concurrent feeds with minimal code changes.
