A real-time object detection pipeline built with the QIM SDK using a YOLOv8 TensorFlow Lite model, supporting USB cameras, ISP cameras, RTSP streams, and video files, with RTSP/WebRTC output.
Introduction
Managing video feeds from multiple IP cameras simultaneously is a core requirement in modern surveillance, retail analytics, smart city, and industrial monitoring deployments. Processing each stream independently on a CPU is costly and difficult to scale. With the QIM SDK, multiple concurrent RTSP streams can be decoded, processed through AI inference, and rendered or streamed out in real time, all using Qualcomm's dedicated hardware accelerators.
The QIM SDK's GStreamer-based plugin architecture offloads compute-intensive tasks, including multi-stream hardware H.264/H.265 decoding, frame preparation, AI inference via the Neural Processing Unit (NPU), and re-encoding, entirely from the CPU to purpose-built hardware blocks. This enables low-latency, power-efficient processing of multiple simultaneous IP camera feeds directly on Qualcomm edge devices.
At the core of this use case is a parallel multi-stream pipeline, where each incoming RTSP stream is independently decoded and processed through its own AI inference chain. Results from all streams are then composited into a unified output for display, recording, or network streaming. The pipeline supports flexible input configurations, from a single RTSP source up to multiple concurrent streams, making it suitable for scalable, real-world deployments.
The complete application source code is available here.
Use Case Overview
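- Video Input: Each RTSP stream is received from an IP camera and hardware-decoded into raw video frames.
- AI Inference: Each stream runs through pre-processing (qtimlvconverter), model execution (qtimltflite), and post-processing (qtimlpostprocess) for per-stream object detection.
- Metadata Attachment: Inference results are attached to the matching video frames by qtimetamux, maintaining stream-to-detection correspondence.
- Visualization: Bounding boxes and labels are drawn onto the frames by qtivoverlay for real-time visual feedback.
- Composition: The annotated streams are combined into a single output frame by qtivcomposer.
Pipeline diagram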

Elements used in pipeline
| Element | Description |
|---|---|
| rtspsrc | Receives an RTSP stream from an IP camera over the network. |
| rtph264depay / h264parse | Depayloads and parses the incoming RTP/H.264 bitstream. |
| v4l2h264dec | Hardware-accelerated H.264 video decoder. |
| tee | Splits each decoded stream into parallel branches for simultaneous AI inference and output. |
| qtimlvconverter | Prepares video frames for inference: handles resizing, YUV-to-RGB color space conversion, and pixel normalization. |
| qtimltflite | Executes the TFLite inference model on each frame using the Qualcomm NPU. |
| qtimlpostprocess | Decodes raw output tensors into structured bounding boxes and class labels via a dynamically loaded module. |
| qtimetamux | Synchronizes inference results with the original video stream and attaches them as per-frame structured metadata. |
| qtivoverlay | Renders bounding boxes and labels directly onto video frames for real-time visualization. |
| qtivcomposer | Composes multiple annotated video streams into a single unified output frame. |
| waylandsink | Renders the composited output to a local Wayland display. |
| v4l2h264enc / h264parse / mp4mux / filesink | Encodes and saves the output to a local file. |
| qtirtspbin | Streams the output over RTSP for remote viewing. |
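The elements in the table connect, per stream, in a tee-and-rejoin shape that is easier to see written out. The sketch below generates a gst-launch-style description of that topology in Python. It is an illustration under assumptions, not the application's actual pipeline: the pad and element names (`split_N`, `metamux_N`, `composer`) are made up for the example, `queue` is a standard GStreamer element added because tee branches normally need one, and every element property except `rtspsrc`'s `location` is omitted, so the output would still need model, label, and caps settings before it could run.

```python
def stream_branch(index: int, rtsp_url: str) -> str:
    """Return a gst-launch-style description for one RTSP stream.

    Only the topology is shown; model, label, and caps properties are omitted.
    """
    tee = f"split_{index}"      # hypothetical element name
    mux = f"metamux_{index}"    # hypothetical element name
    return " ".join([
        # Ingestion: receive, depayload, parse, and hardware-decode to raw frames.
        f"rtspsrc location={rtsp_url} ! rtph264depay ! h264parse ! v4l2h264dec !",
        f"tee name={tee}",
        # Reference branch: video goes to the metadata muxer, then overlay and composer.
        f"{tee}. ! queue ! qtimetamux name={mux} ! qtivoverlay ! composer.",
        # Inference branch: pre-process, run the model, post-process, rejoin at the muxer.
        f"{tee}. ! queue ! qtimlvconverter ! qtimltflite ! qtimlpostprocess ! {mux}.",
    ])


def pipeline_description(rtsp_urls: list[str]) -> str:
    """Combine per-stream branches around one shared composer and display sink."""
    head = "qtivcomposer name=composer ! waylandsink"
    branches = [stream_branch(i, url) for i, url in enumerate(rtsp_urls)]
    return " ".join([head, *branches])


print(pipeline_description(["rtsp://<camera-1>", "rtsp://<camera-2>"]))
```

Each stream gets its own tee and metadata muxer, while all streams share a single composer, which mirrors the "independent inference chain, unified output" structure described in the introduction.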
How it works
- Stream Ingestion: Each rtspsrc element receives an RTSP stream from an IP camera. The RTP/H.264 payload is depayloaded and parsed, then passed to a hardware-accelerated decoder (v4l2h264dec) to produce raw NV12 frames.
- Parallel AI Processing: A tee splits each decoded stream into two branches. The first branch feeds qtimlvconverter → qtimltflite → qtimlpostprocess for AI inference. The second branch feeds qtimetamux as the reference video.
- Metadata Synchronization: qtimetamux attaches inference results from the AI branch to the corresponding reference video frames, maintaining per-stream temporal alignment.
- Composition: qtivcomposer tiles the annotated streams side-by-side (or in a configurable grid layout) into a single output frame.
Run application on device
Setup Requirements
Hardware

| Component | Description |
|---|---|
| Edge Device | RB3 Gen 2, IQ8, or IQ9 — Primary processing unit for AI inference and video composition. |
| IP/RTSP Cameras | One or more IP cameras accessible over the local network via RTSP. |
| HDMI Display Monitor | Connected to the edge device for rendering the composited output. |
| PoE Switch | Powers IP cameras and provides network connectivity over Ethernet. (Required for IP/RTSP camera setups.) |
| Local Network | Ensures the edge device, IP cameras, and host machine are reachable on the same network. |
Software
Flash your Qualcomm Edge device by following the device setup and flashing instructions here:<provide QLI device setup and flash instruction link>
Once your device is ready, follow the instructions below to set up the Multi-Stream IP Camera pipeline.
AI Model and config files
| File | Download | Save as |
|---|---|---|
| YOLOv8 W8A8 model | Qualcomm AI Hub — YOLOv8 Detection | yolov8_det_quantized.tflite |
| Detection labels | yolov8.json | yolov8.json |
| Sample video | Input video | video.mp4 |
Use the --no-display flag to run in headless mode.
File input (offline test)
WebRTC
Display (multi-stream)
RTSP output
Note: This example uses an offline video file as input. To use an IP/RTSP camera or USB camera instead, update the --input-type argument accordingly — refer to the Command-Line Options section below for details.
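As a rough illustration of how the documented options combine, the Python sketch below assembles command lines for two of the configurations listed above. Several things here are assumptions rather than facts from this page: the binary name `gst-multi-stream-ip-camera` is taken from the source directory name, the `--option=value` form is assumed for every option (the page only shows it for `--input-config` and `--output-config`), and the angle-bracket values are placeholders to replace with real addresses and ports.

```python
import shlex

# Assumed binary name, borrowed from the source directory name.
APP = "gst-multi-stream-ip-camera"


def build_command(input_type, input_config, output_type,
                  output_config=None, no_display=False):
    """Assemble an argv list from the options documented on this page."""
    argv = [
        APP,
        f"--input-type={input_type}",
        f"--input-config={input_config}",
        f"--output-type={output_type}",
    ]
    if output_config is not None:
        argv.append(f"--output-config={output_config}")
    if no_display:
        argv.append("--no-display")
    return argv


# File input rendered on the local Wayland display.
print(shlex.join(build_command("file", "/path/to/video.mp4", "display")))

# Two RTSP cameras re-streamed over RTSP without a local display.
cameras = "rtsp://<camera-1>,rtsp://<camera-2>"
print(shlex.join(build_command("rtsp", cameras, "rtsp", "<port>", no_display=True)))
```

`shlex.join` quotes any argument containing shell-special characters, so the printed lines can be pasted into a shell once the placeholders are filled in.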
Visualize the Results - Host-Side Visualization (Windows + WSL)
This section describes how to run the visualization client on a Windows host machine using WSL (Windows Subsystem for Linux). The client renders the live composited video stream alongside a real-time AI metadata panel.
The visualization client script can be downloaded here: rtsp_webrtc_client.zip
It displays:
- Left panel: Live composited video stream with AI overlays from all camera inputs.
- Right panel: Real-time AI metadata (JSON): object detections, bounding boxes, and confidence scores per stream.
RTSP
WebRTC
| Panel | Content |
|---|---|
| Left | Real-time composited video: all streams tiled into a single view with bounding boxes and labels |
| Right | Live AI metadata panel: per-stream object detections, bounding boxes, and confidence scores |

Command-Line Options
--input-type
| Value | Description |
|---|---|
| rtsp | IP/RTSP camera stream. Requires --input-config=rtsp://.... |
| file | Local H.264-encoded video file. Requires --input-config=/path/to/video.mp4. |
| usb | USB camera. Requires --input-config=/dev/video0. |
| isp | Built-in ISP (on-device) camera. Optionally specify a camera ID via --input-config=0. |
--input-config
The expected value depends on --input-type.
| Input Type | Value |
|---|---|
| RTSP | rtsp://<ip-or-url> (comma-separated for multiple streams) |
| File | /path/to/video.mp4 |
| USB | /dev/videoX |
| ISP | <camera ID> |
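The table above implies a small amount of per-type parsing, most notably splitting a comma-separated RTSP list into one URL per stream. A minimal Python sketch of that logic follows. It is not the application's own parser: the function name is hypothetical, and the checks that a USB value starts with `/dev/video` and that an ISP camera ID is numeric are assumptions drawn from the example values on this page.

```python
def parse_input_config(input_type: str, value: str) -> list[str]:
    """Split and sanity-check an --input-config value for a given --input-type."""
    if input_type == "rtsp":
        # Multiple streams are given as a comma-separated list of URLs.
        urls = [part.strip() for part in value.split(",") if part.strip()]
        if not urls or any(not u.startswith("rtsp://") for u in urls):
            raise ValueError(f"expected comma-separated rtsp:// URLs, got {value!r}")
        return urls
    if input_type == "file":
        return [value]
    if input_type == "usb":
        # Assumption: USB cameras appear as /dev/videoX device nodes.
        if not value.startswith("/dev/video"):
            raise ValueError(f"expected a /dev/videoX device node, got {value!r}")
        return [value]
    if input_type == "isp":
        # Assumption: the camera ID is a non-negative integer such as 0.
        if not value.isdecimal():
            raise ValueError(f"expected a numeric camera ID, got {value!r}")
        return [value]
    raise ValueError(f"unknown input type: {input_type!r}")


streams = parse_input_config("rtsp", "rtsp://<camera-1>, rtsp://<camera-2>")
print(len(streams), "stream(s):", streams)
```

Returning a list for every input type keeps the caller uniform: the number of entries is the number of per-stream inference chains to build.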
--output-type
| Value | Description |
|---|---|
| display | Renders composited output on a local Wayland display. |
| file | Saves the encoded composited output to a file. Requires --output-config. |
| rtsp | Streams composited output over RTSP. Requires --output-config=<port>. |
| webrtc | Streams composited output over WebRTC. Requires --output-config=ws://.... |
| none | No video output (headless mode). |
--output-config
The expected value depends on --output-type.
| Output Type | Value |
|---|---|
| File | /path/to/output.mp4 |
| RTSP | <port> |
| WebRTC | ws://<signalling-server>:<port> |
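The pairing rules between --output-type and --output-config can be checked before launching the application. The Python sketch below encodes the two tables above as a small validator. The function name is hypothetical, and the stricter checks (a port in the range 1 to 65535, a `ws` scheme with a host) are assumptions layered on top of what this page states.

```python
from typing import Optional
from urllib.parse import urlparse

NEEDS_CONFIG = {"file", "rtsp", "webrtc"}
NO_CONFIG = {"display", "none"}


def check_output(output_type: str, output_config: Optional[str] = None) -> None:
    """Raise ValueError if the --output-type / --output-config pair is inconsistent."""
    if output_type in NO_CONFIG:
        return
    if output_type not in NEEDS_CONFIG:
        raise ValueError(f"unknown output type: {output_type!r}")
    if not output_config:
        raise ValueError(f"--output-type={output_type} requires --output-config")
    if output_type == "rtsp":
        # The RTSP value is a port number; the range check is an assumption.
        if not output_config.isdecimal() or not 1 <= int(output_config) <= 65535:
            raise ValueError(f"invalid RTSP port: {output_config!r}")
    elif output_type == "webrtc":
        # The WebRTC value is a WebSocket signalling URL of the form ws://host:port.
        parsed = urlparse(output_config)
        if parsed.scheme != "ws" or not parsed.hostname:
            raise ValueError(f"invalid signalling URL: {output_config!r}")


check_output("display")                      # no config needed
check_output("file", "/path/to/output.mp4")  # any non-empty path is accepted
```

File paths are deliberately not checked for existence here, since the output file is created by the pipeline rather than read by it.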
--no-display
--model-base-path
| Asset Type | Resolved Path |
|---|---|
| Model files (*.tflite) | <base-path>/models/<model_file> |
| Label/settings files (*.json) | <base-path>/labels/<labels_file> |
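The resolution rule in the table can be expressed in a few lines. The Python sketch below maps a file name to its expected location by extension. The function name and the `/opt/assets` base path are hypothetical and used only for illustration; the file names are the ones from the "AI Model and config files" table.

```python
from pathlib import PurePosixPath


def resolve_asset(base_path: str, file_name: str) -> PurePosixPath:
    """Map an asset file name to its location under the model base path."""
    suffix = PurePosixPath(file_name).suffix.lower()
    if suffix == ".tflite":
        subdir = "models"   # model files live under <base-path>/models
    elif suffix == ".json":
        subdir = "labels"   # label/settings files live under <base-path>/labels
    else:
        raise ValueError(f"unsupported asset type: {file_name!r}")
    return PurePosixPath(base_path) / subdir / file_name


# "/opt/assets" is a placeholder base path, not a documented default.
print(resolve_asset("/opt/assets", "yolov8_det_quantized.tflite"))
# -> /opt/assets/models/yolov8_det_quantized.tflite
print(resolve_asset("/opt/assets", "yolov8.json"))
# -> /opt/assets/labels/yolov8.json
```

Running a check like this on the host before copying files to the device helps catch a model or label file saved in the wrong subdirectory.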
--width / --height / --framerate
Build the Application
- Source code: gst-multi-stream-ip-camera
- Build instructions: Steps to build custom application
