# Building a Scalable Multi-Stream AI Video Wall with IM SDK

> Process up to 31 concurrent IP camera streams in parallel with YOLOv8 object detection, compositing all streams into a single output and streaming over RTSP/WebRTC.

QIMSDK · Qualcomm

Multi-Stream AI


QIMSDK Team · Jun 14, 2026

## Introduction

Modern security systems rarely depend on a single camera. In real deployments, operators need to monitor dozens of streams simultaneously across warehouses, campuses, retail spaces, and other large environments. As camera counts grow, manual monitoring becomes impractical, and traditional CPU-only processing struggles to keep up, which increases latency, power usage, and system load.

The IM SDK addresses this directly. It enables scalable, real-time multi-stream video analytics at the edge using hardware-accelerated GStreamer plugins that shift the demanding work (decoding, frame preparation, AI inference, and encoding) entirely onto dedicated hardware blocks. Frame preparation includes resizing frames to match the model input, converting YUV to RGB, and normalizing pixel values for the neural network.

The pipeline processes each stream independently, running object detection on every input source while preserving a consistent visual experience. To improve efficiency, inference runs at a reduced frame rate without affecting the continuity of the displayed output. Detection results are rendered as RGBA overlay masks and composited with the corresponding video feeds, so objects are visualized without modifying the source frames.

Integration with Qualcomm AI Hub gives developers access to optimized, ready-to-use models, such as YOLO-based detectors, for multi-stream analytics. The complete application source code is available [here](https://github.com/qualcomm/gst-plugins-imsdk/tree/main/gst-sample-apps/gst-video-wall).

## Use Case Overview

Each RTSP camera provides an H.264/H.265-encoded stream that is decoded into raw YUV frames for inference and visualization. `videorate` lowers the rate at which frames enter the inference branch, reducing compute load while the displayed output keeps its full frame rate. Each frame that reaches the inference branch is analyzed independently by an object detection model running on the Qualcomm NPU, and post-processing turns the model output into an RGBA overlay mask containing bounding boxes and class labels over a transparent background.

[`qtivcomposer`](../plugin-reference/qtivcomposer) composites each stream's mask onto its corresponding video frame and tiles the results into a configurable M×N grid (up to 8×4). [`qtimetamux`](../plugin-reference/qtimetamux) attaches the detection results as structured per-frame metadata synchronized with the video stream. The final composited stream is displayed locally on an HDMI monitor or streamed over RTSP/WebRTC, with the structured metadata transmitted in parallel.

## Pipeline diagram

*Figure: Security Video Wall Pipeline*
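The rate reduction described in the overview (inference on only a subset of frames, with the most recent mask reused for the frames in between) can be illustrated with a simple frame counter. This is a minimal sketch under stated assumptions: `StreamState`, `should_run_inference`, and `on_display_frame` are hypothetical helpers, not part of the IM SDK or gst-video-wall. The real `videorate` element works from buffer timestamps and the negotiated frame rate rather than a modulo counter, and the sketch also assumes a new mask is available immediately, ignoring inference latency.

```c
#include <stdbool.h>

/* Hypothetical per-stream bookkeeping; not an IM SDK or gst-video-wall type. */
typedef struct {
  unsigned int frame_count;  /* frames seen so far on the display branch */
  unsigned int divisor;      /* 2 = send every other frame to inference */
  int last_mask_frame;       /* frame whose mask is currently overlaid, -1 if none yet */
} StreamState;

/* True when the current frame should also go to the inference branch. */
static bool should_run_inference (const StreamState *s)
{
  return s->divisor != 0 && (s->frame_count % s->divisor) == 0;
}

/* Called once per displayed frame. Returns the index of the frame whose
 * detection mask is composited over it: the current frame when inference
 * ran on it, otherwise the most recent earlier result (or -1 if none). */
static int on_display_frame (StreamState *s)
{
  if (should_run_inference (s))
    s->last_mask_frame = (int) s->frame_count;  /* assumes zero inference latency */
  s->frame_count++;
  return s->last_mask_frame;
}
```

With `divisor = 2`, frames 0 and 1 are both drawn with the mask from frame 0, frames 2 and 3 with the mask from frame 2, and so on. The display never stalls, even though only half of the frames reach the NPU.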

## Elements used in pipeline

| Element | Description |
| --- | --- |
| `source` | Accepts input from an RTSP camera, USB camera, or local file source. |
| `tee` | Splits each decoded stream into parallel branches for simultaneous display and AI inference. |
| `videorate` | Adjusts the frame rate of the inference branch, halving it to lower compute load while the display keeps its full rate. |
| [`qtimlvconverter`](../plugin-reference/qtimlvconverter) | Prepares frames for inference: resizes, converts YUV to RGB, and normalizes input to match model requirements. |
| [`qtimltflite`](../plugin-reference/qtimltflite) | Runs the TFLite object detection model on each frame on the Qualcomm NPU via the QNN external delegate. |
| [`qtimlpostprocess`](../plugin-reference/qtimlpostprocess) | Converts raw output tensors into structured bounding boxes and labels via a dynamically loaded module. |
| [`qtimetamux`](../plugin-reference/qtimetamux) | Synchronizes inference results with the original video stream as per-frame structured metadata. |
| [`qtivcomposer`](../plugin-reference/qtivcomposer) | Composites video from all streams into a single 8×4 grid output and overlays the RGBA masks onto the corresponding YUV frames. |
| `v4l2h264enc` / `h264parse` | `v4l2h264enc` encodes the composited stream into H.264; `h264parse` parses the encoded stream for transmission. |
| [`waylandsink`](../plugin-reference/waylandsink) | Displays the composited video locally on the device. |
| `sink` | Streams the encoded video and metadata over RTSP or WebRTC via `rtspbin` or `webrtcbin`. |

## How it works

Each `rtspsrc` element receives an RTSP stream. The RTP/H.264 payload is depayloaded, parsed, and decoded into raw NV12 frames by the hardware decoder. A `tee` splits each decoded stream into two branches: one forwards frames directly to the compositor, and the other runs AI inference at a reduced frame rate.

The inference branch runs through [`qtimlvconverter`](../plugin-reference/qtimlvconverter) → [`qtimltflite`](../plugin-reference/qtimltflite) → [`qtimlpostprocess`](../plugin-reference/qtimlpostprocess), producing an RGBA overlay mask. [`qtivcomposer`](../plugin-reference/qtivcomposer) tiles all annotated streams into an 8×4 grid. When detection runs at a lower rate than the input, the most recent mask is reused for the frames in between, which keeps the overlay stable. The composited frame is then delivered to a Wayland display, saved to a file, or streamed over RTSP/WebRTC.

## Run application on device

### Setup Requirements

#### Hardware

*Figure: Hardware Setup*

| Component | Description |
| --- | --- |
| **Edge Device** | IQ9: primary processing unit for AI inference and video composition. |
| **Camera Source** | IP/RTSP cameras. A local file source may be substituted if no physical camera is available. |
| **HDMI Display Monitor** | Connected to the edge device for rendering and visualizing pipeline output. |
| **PoE Switch** | Powers IP/RTSP cameras and provides network connectivity over a single Ethernet cable per camera. (Required for IP/RTSP setups only.) |
| **Local Network** | Ensures the edge device, cameras, and host machine are reachable on the same network. (Required for RTSP input or RTSP/WebRTC output.) |

#### Software

**Flash your Qualcomm Edge device** by following the device setup and flashing instructions [here](../installation).

**Once your device is ready**, follow the instructions below to set up the Security Video Wall pipeline.

##### AI Model and config files

| File | Download | Save as |
| --- | --- | --- |
| YOLOv8 W8A8 model | [Qualcomm AI Hub: YOLOv8 Detection](https://aihub.qualcomm.com/iot/models/yolov8_det) | `yolov8_det_quantized.tflite` |
| Detection labels | yolov8.json | `yolov8.json` |
| Sample video | Input video | `video.mp4` |

**Copy files to device**

```bash
# Replace $HOME with the appropriate device path before running the commands.
#   For QLI:    /root
#   For Ubuntu: /home/ubuntu
# Modify this based on your platform and ensure files are copied to the correct location on the device.
ssh @ "mkdir -p $HOME/{models,labels,media,media/output}"
scp yolov8_det_quantized.tflite @:$HOME/models/
scp yolov8.json @:$HOME/labels/
scp video.mp4 @:$HOME/media/
```

**Connect to device**

```bash
ssh @
```

**Run the Security Video Wall**

> **Note:** A display must be connected to the device. If no display is available, use the `--no-display` flag.

Stream over RTSP (31 inputs):

```bash
ulimit -n 16192 && \
gst-video-wall \
  --input-count=31 \
  $(for i in $(seq 1 31); do echo "--input-type=file --input-config=$HOME/media/video.mp4"; done) \
  --output-type=rtsp \
  --output-config=8900
```

Stream over WebRTC (31 inputs):

```bash
ulimit -n 16192 && \
gst-video-wall \
  --input-count=31 \
  $(for i in $(seq 1 31); do echo "--input-type=file --input-config=$HOME/media/video.mp4"; done) \
  --output-type=webrtc \
  --output-config=wss://webrtc.nirbheek.in:8443 \
  --webrtc-id=1010
```

Local display (4 inputs):

```bash
ulimit -n 16192 && \
gst-video-wall \
  --input-count=4 \
  $(for i in $(seq 1 4); do echo "--input-type=file --input-config=$HOME/media/video.mp4"; done)
```

> **Note:** These examples use an offline video file as input. To use IP/RTSP cameras, set `--input-type=rtsp` and `--input-config=rtsp://...` for each stream instead.

Each command produces an AI-annotated video stream. To visualize the results, refer to the **Host-Side Visualization** section below.

## Visualize the Results: Host-Side Visualization (Windows + WSL)

This section describes how to run the visualization client on a Windows host machine using **WSL (Windows Subsystem for Linux)**. The client renders the live composited video stream alongside a real-time AI metadata panel.

📥 The visualization client script can be downloaded here: rtsp_webrtc_client.zip

It displays:

* **Left panel**: Live composited video stream with AI overlays from all camera inputs.
* **Right panel**: Real-time AI metadata (JSON): object detections, bounding boxes, and confidence scores per stream.
**Step 1: Install WSL and Ubuntu**

If WSL is not already installed, run the following from a Windows terminal:

```bash
wsl --install Ubuntu-24.04
```

Once installed, update the system from the Ubuntu shell:

```bash
sudo apt update && sudo apt upgrade -y
```

**Step 2: Install System Dependencies**

```bash
sudo apt install -y \
  python3 python3-pip python3-gi python3-gi-cairo \
  gir1.2-gstreamer-1.0 \
  gir1.2-gst-plugins-base-1.0 \
  gir1.2-gst-plugins-bad-1.0 \
  gstreamer1.0-tools \
  gstreamer1.0-plugins-base \
  gstreamer1.0-plugins-good \
  gstreamer1.0-plugins-bad \
  gstreamer1.0-plugins-ugly \
  gstreamer1.0-libav \
  python3-websocket \
  libnice10 \
  libnice-dev \
  gstreamer1.0-nice
```

**Step 3: Run the Visualization Client Script**

For RTSP output:

```bash
python3 rtsp_webrtc_client.py rtsp://:8900/live
```

For WebRTC output:

```bash
python3 rtsp_webrtc_client.py --source webrtc --signalling-server wss://webrtc.nirbheek.in:8443 --peer-id 1010
```

**Step 4: Expected Output**

| Panel | Content |
| --- | --- |
| Left | Real-time composited video: all streams tiled in a grid with bounding boxes and labels |
| Right | Live AI metadata: per-stream object detections, bounding boxes, and confidence scores |

*Figure: Expected Output*

Beyond the default setup, the application's input and output can be tailored with the command-line options described below.

## Command-Line Options

**`--input-count`**: Specifies the total number of input video streams. Must match the number of `--input-type` and `--input-config` entries. Valid range: 1 to 31.

```bash
--input-count=4
```

**`--input-type`**: Selects the video input source for each stream.

| Value | Description |
| --- | --- |
| `rtsp` | External IP/RTSP camera. Requires `--input-config=rtsp://...`. |
| `file` | Local H.264-encoded video file. Requires `--input-config=/path/to/video.mp4`. |

**`--input-config`**: Specifies the input source configuration for the selected `--input-type`.

| Input Type | Value |
| --- | --- |
| RTSP | `rtsp://` |
| File | `/path/to/video.mp4` |

**`--output-type`**: Defines how the processed output is delivered.

| Value | Description |
| --- | --- |
| `none` | No video output (headless mode). |
| `file` | Save encoded output to a file. Requires `--output-config`. |
| `rtsp` | Stream over RTSP. Requires `--output-config=`. Access at `rtsp://:/live`. |
| `webrtc` | Stream over WebRTC. Requires a signaling server URL in `--output-config` (`ws://...` or `wss://...`). |

**`--output-config`**: Specifies the output destination configuration.

| Output Type | Value |
| --- | --- |
| File | `/path/to/output.mp4` |
| RTSP | `` |
| WebRTC | `ws://:` |

**`--model-base-path`**: Root directory for AI model, label, and configuration files.

| Asset Type | Resolved Path |
| --- | --- |
| Model files (`*.tflite`) | `/models/` |
| Label/settings files (`*.json`) | `/labels/` |

```bash
--model-base-path=/root          # QLI
--model-base-path=/home/ubuntu   # Ubuntu
```

**`--no-display`**: Disables local on-screen rendering. Recommended for headless deployments, remote streaming (RTSP/WebRTC), or performance optimization.

**Frame rate**: Specifies the video frame rate for the input stream.

**`--num-npus`**

```bash
--num-npus=N
```

**`--webrtc-id`**: Specifies the local WebRTC signaling client ID.

```bash
--webrtc-id=1010
```

## Implementation Deep-Dive

The application separates user configuration from runtime state.

```c
/* User-supplied settings, filled in from the command line. */
typedef struct GstAppConfig {
  gint      input_count;
  gchar   **input_types;      /* one --input-type entry per stream */
  gchar   **input_configs;    /* one --input-config entry per stream */
  gchar    *output_type;
  gchar    *output_config;
  gchar    *model_base_path;
  gboolean  no_display;
  gint      width, height, framerate, webrtc_id;
} GstAppConfig;

/* Runtime state owned by the application. */
typedef struct GstAppContext {
  GstAppConfig      config;
  GstElement       *pipeline;
  GMainLoop        *mloop;
  GstAppPadLinkData qtdemux_links[GST_APP_MAX_INPUTS];  /* pad-link data for each qtdemux (file inputs) */
  GstAppPadLinkData rtspsrc_links[GST_APP_MAX_INPUTS];  /* pad-link data for each rtspsrc (RTSP inputs) */
  GstElement       *webrtc;
  gboolean          is_shutting_down;
} GstAppContext;
```

The pipeline is assembled from three logical sections: the input branch, the output branch, and the application-specific user branch.

```c
static gboolean
gst_app_create_pipe (GstAppContext *appctx)
{
  GstElement *input_tails[GST_APP_MAX_INPUTS] = { NULL };
  GstElement *output_head = NULL;

  appctx->pipeline = gst_pipeline_new ("gst-video-wall");

  /* Input branch: one source/decode chain per stream, tails collected in input_tails. */
  if (!gst_app_create_input_pipe (appctx, input_tails))
    return FALSE;

  /* Output branch: delivery chain whose first element is returned in output_head. */
  if (!gst_app_create_output_pipe (appctx, &output_head))
    return FALSE;

  /* User branch: application-specific processing connecting the input tails to the output head. */
  if (!gst_app_create_user_pipe (appctx, input_tails, output_head))
    return FALSE;

  return TRUE;
}
```

Constants define the maximum input count and the compositor layout.

```c
#define GST_APP_MAX_INPUTS            31
#define GST_APP_COMPOSER_COLUMNS      8
#define GST_APP_COMPOSER_ROWS         4
#define GST_APP_COMPOSER_CELL_WIDTH   240
#define GST_APP_COMPOSER_CELL_HEIGHT  135
#define MODEL_PATH                    "yolov8_det_quantized.tflite"
#define LABELS_PATH                   "yolov8.json"
```

Each input stream contributes one direct video layer and one RGBA overlay layer to the same grid position in the compositor.
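The layout constants above describe an 8×4 grid of 240×135 cells, which gives 32 cells for at most 31 inputs. The sketch below shows one way to map a stream index to its tile and to check the documented `--input-count` rule. It assumes row-major placement with cells packed edge to edge and no padding, which the application may not do; `CellRect`, `cell_rect_for_stream`, and `input_count_is_valid` are hypothetical helpers, and the sketch does not model `qtivcomposer`'s actual pad properties.

```c
#include <stdbool.h>

/* Values copied from the layout constants shown above. */
#define GST_APP_MAX_INPUTS           31
#define GST_APP_COMPOSER_COLUMNS     8
#define GST_APP_COMPOSER_ROWS        4
#define GST_APP_COMPOSER_CELL_WIDTH  240
#define GST_APP_COMPOSER_CELL_HEIGHT 135

/* Hypothetical tile rectangle in output-frame pixels. */
typedef struct {
  int x, y, width, height;
} CellRect;

/* Mirrors the documented --input-count rule: 1..31 streams, with exactly
 * one --input-type and one --input-config entry per stream. */
static bool input_count_is_valid (int input_count, int n_types, int n_configs)
{
  return input_count >= 1 && input_count <= GST_APP_MAX_INPUTS &&
      n_types == input_count && n_configs == input_count;
}

/* Row-major placement (assumption): stream 0 is top-left, stream 8 starts
 * the second row. The video layer and the RGBA mask layer of a stream would
 * both use this rectangle, so the overlay stays aligned with its feed. */
static bool cell_rect_for_stream (int index, CellRect *out)
{
  if (index < 0 || index >= GST_APP_COMPOSER_COLUMNS * GST_APP_COMPOSER_ROWS)
    return false;
  out->x = (index % GST_APP_COMPOSER_COLUMNS) * GST_APP_COMPOSER_CELL_WIDTH;
  out->y = (index / GST_APP_COMPOSER_COLUMNS) * GST_APP_COMPOSER_CELL_HEIGHT;
  out->width = GST_APP_COMPOSER_CELL_WIDTH;
  out->height = GST_APP_COMPOSER_CELL_HEIGHT;
  return true;
}
```

Under these assumptions, stream 9 lands in the second row at x = 240, y = 135, and the last possible input (index 30) lands at x = 1440, y = 405, leaving the final cell of the 32-cell grid empty.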
WebRTC signaling uses an explicit SDP offer/answer and ICE candidate exchange over a WebSocket connection implemented with `libsoup`.

```c
/* Create a data channel on webrtcbin before negotiating. */
g_signal_emit_by_name (webrtcbin, "create-data-channel", name, NULL, &ch);

/* Ask webrtcbin for an SDP offer; on_offer_created runs once the promise is replied. */
GstPromise *promise =
    gst_promise_new_with_change_func (on_offer_created, appctx, NULL);
g_signal_emit_by_name (webrtcbin, "create-offer", NULL, promise);

/* Forward each locally gathered ICE candidate to the signaling server. */
g_signal_connect (appctx->webrtc, "on-ice-candidate",
    G_CALLBACK (on_webrtc_ice_candidate), appctx);
```

| Callback | Responsibility |
| --- | --- |
| `on_offer_created` | Constructs and sends the SDP offer |
| `on_webrtc_ice_candidate` | Transmits ICE candidates to the signaling server |
| `on_ws_message` | Handles incoming WebSocket signaling messages |

## Build the Application

* **Source code:** [gst-video-wall](https://github.com/qualcomm/gst-plugins-imsdk/tree/main/gst-sample-apps/gst-video-wall)
* **Build instructions:** [Steps to build custom application](../advanced/ubuntu-build#steps-to-build-custom-application)

## Conclusion

The IM SDK's modular architecture gives developers flexibility and control when building real-time multi-stream video analytics pipelines. Because inference and post-processing are separate stages, post-processing can be customized without changing the model execution path. This also keeps AI results decoupled from the video frames while enabling efficient visualization: post-processing generates an RGBA overlay mask that is composited onto the original frame without duplicating video data, which lowers latency, reduces memory overhead, and improves scalability for real-time AI video applications.