qtimlqnn is only available in qcom-multimedia-proprietary-image
For more information on QLI images refer to Qualcomm Linux release

Overview

qtimlqnn is a GStreamer inference element that executes neural network models using the Qualcomm AI Engine Direct (QNN) runtime. The element operates entirely in tensor mode: it accepts input tensors on its sink pad and produces output tensors on its source pad according to the model’s declared input and output specifications. qtimlqnn is designed to run models prepared for the QNN runtime, typically in the form of a QNN context binary. To use this element, the model must first be exported to a QNN-compatible format using the Qualcomm AI Runtime (QAIRT) SDK. For additional details, refer to the QAIRT documentation. The element is limited to model execution. It does not perform preprocessing, tensor reshaping, batching, layout conversion, or model-specific post-processing. These functions are expected to be handled by adjacent elements in the pipeline. qtimlqnn supports multiple QNN execution backends, including CPU, GPU, and NPU targets. This allows the same pipeline structure to be deployed across different hardware configurations and tuned for different performance, latency, and power requirements. The element is intended for real-time and embedded AI pipelines, where inference is one stage in a larger modular processing flow.

Key Responsibilities

qtimlqnn is responsible for:

Loading and executing a QNN model artifact, such as a model library (.so) or cached binary (.bin)
Accepting preformatted input tensors from upstream elements
Producing output tensors that match the model output signature
Negotiating tensor data types and dimensions with adjacent pipeline elements
Propagating tensor metadata required by downstream elements
Managing DMA-backed buffers through GstMLBufferPool to reduce unnecessary memory copies

In practice, qtimlqnn serves as the inference stage in the pipeline, while tensor preparation and result interpretation are handled externally.

Example Pipeline

Download Required Files

File	Download	Save as
Yolov8 Detection W8A8 model	Export from Qualcomm AI Hub	`yolov8_det_w8a8.bin`
Detection labels	yolov8.json	`yolov8.json`
Sample video	Input video	`ai_demo_sample.mp4`

Copy files to device

# Replace $HOME to the appropriate device path before running the commands.
# For QLI:    /root
# For Ubuntu: /home/ubuntu
# Modify this based on your platform and ensure files are copied to the correct location on the device.
# Run from your host machine — replace <user> and <device-ip>

ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels,media,media/output}"
scp yolov8_det_w8a8.bin          <user>@<device-ip>:$HOME/models/
scp yolov8.json                  <user>@<device-ip>:$HOME/labels/
scp ai_demo_sample.mp4    <user>@<device-ip>:$HOME/media/

Connect to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip>

Set environment variables

Run below command on your device

mkdir -p $HOME/{models,labels,media,media/output}
export MODEL_NAME=yolov8_det_w8a8.bin
export LABELS_NAME=yolov8.json
export SRC_VIDEO_NAME=ai_demo_sample.mp4

Run the pipeline

gst-launch-1.0 -e --gst-debug=2 \
filesrc location=$HOME/media/$SRC_VIDEO_NAME ! qtdemux ! h264parse ! \
v4l2h264dec capture-io-mode=4 output-io-mode=4 ! video/x-raw,format=NV12 ! queue ! \
tee name=t ! queue ! qtimetamux name=obj_mux ! qtivoverlay ! waylandsink fullscreen=true sync=false \
t. ! queue ! qtimlvconverter ! queue ! \
qtimlqnn model=$HOME/models/$MODEL_NAME backend=/usr/lib/libQnnHtp.so tensors="<boxes,scores,class_idx>" ! queue ! \
qtimlpostprocess module=yolov8 labels=$HOME/labels/$LABELS_NAME settings="{\"confidence\": 51.0}" ! text/x-raw ! queue ! obj_mux.

Hierarchy

GObject
   GstObject
      GstElement
         GstBaseTransform
            qtimlqnn

Pad Templates

sink

Capabilities
`neural-network/tensors`	`format: { INT8, UINT8, INT16, UINT16, INT32, UINT32, FLOAT16, FLOAT32 }`
Availability: Always
Direction: sink

src

Capabilities
`neural-network/tensors`	`format: { INT8, UINT8, INT16, UINT16, INT32, UINT32, FLOAT16, FLOAT32 }`
Availability: Always
Direction: source

Element Properties

Property	Description
`backend`	Path to the QNN backend library. Selects the execution backend used for inference. Supported backends include CPU, HTP or NPU, GPU, and DSP implementations depending on the library used. `Type: String` `Default: "/usr/lib/libQnnCpu.so"` `Flags: readable/writable`
`backend-device-id`	Backend device selector. Platform dependent and used for some DSP or HTP variants to select a specific hardware instance. `Type: Unsigned Integer` `Default: 0` `Flags: readable/writable`
`model`	Path to the QNN model file. This property is required and must reference a valid `.so` model or cached `.bin` file. `Type: String` `Default: NULL` `Flags: readable/writable`
`system`	Path to the QNN system library required for QNN runtime initialization. `Type: String` `Default: "/usr/lib/libQnnSystem.so"` `Flags: readable/writable`
`tensors`	Output tensor filter. When set, only the specified output tensor names are emitted on the source pad. When empty, all model outputs are emitted. `Type: GstValueArray of type gchararray` `Default: "< >"` `Flags: readable/writable`

Input and Output Behavior

Input Tensors

qtimlqnn exposes a single sink pad, but it supports both single-input and multi-input models. For multi-input models, all required tensors are delivered through the same sink pad as a tensor set. Input tensors must be fully prepared before they reach qtimlqnn. Expected tensor layout, shape, data type, and batch size are determined by:

The QNN model input signature
Caps negotiation with upstream elements

Typical upstream elements include:

qtimlvconverter — for scaling, color conversion, normalization, and quantization (if required)

qtimlqnn does not modify, reshape, batch, or reinterpret incoming tensors. It passes them to the QNN runtime as received.

Output Tensors

qtimlqnn exposes a single source pad and produces output tensors that follow the model’s declared output signature. Models with multiple output tensors are fully supported, and all outputs are emitted together on the source pad. Supported output behavior includes:

Single-tensor and multi-tensor outputs
Arbitrary tensor shapes and ranks, including batch and depth dimensions
Both quantized and floating-point tensor types
Selective emission of output tensors using the tensors property

The generated output tensors are intended for downstream post-processing stages, which are responsible for decoding model-specific results such as classification outputs, detection results, segmentation masks, landmark data, and other structured inference outputs.

Supported Data Types

qtimlqnn supports the tensor data types provided by the QNN runtime and the selected execution backend, subject to caps negotiation with adjacent elements. Supported data types include:

INT8
UINT8
INT16
UINT16
INT32
UINT32
FLOAT16
FLOAT32

The element does not impose additional data-type restrictions beyond those required by the runtime, the selected backend, and negotiated pipeline caps.

Backends

A QNN backend defines the hardware target used to run a model. Backends allow qtimlqnn to offload inference from the default CPU interpreter to an optimized hardware accelerator. The backend is selected through the backend property and controls how the QNN runtime dispatches model operations during inference.

NPU — libQnnHtp.so

Runs the model on the AI accelerator (NPU).

Backend: Qualcomm AI Accelerator / NPU
Use case: Preferred backend where available. Best performance and power efficiency for quantized models.
Additional configuration: set backend=libQnnHtp.so; optionally set backend-device-id for multi-device platforms

GPU — libQnnGpu.so

Runs supported operations through the QNN GPU backend.

Backend: GPU
Use case: Floating-point models and workloads that benefit from GPU parallelism.
Additional configuration: set backend=/usr/lib/libQnnGpu.so

CPU — libQnnCpu.so

Runs the model on the default QNN CPU backend.

Backend: CPU
Use case: Reference execution, debugging, bring-up, or systems without hardware acceleration.
Additional configuration: none required

Runtime Memory Behavior and GAP Handling

QNN Memory Model

qtimlqnn operates within the memory model of the QNN runtime. The element uses DMA buffers via GstMLBufferPool to minimize memory copies and maintain zero-copy transport where possible. QNN uses runtime-managed memory to allocate:

Input tensors
Intermediate activation tensors
Output tensors

The element discovers input/output tensor metadata (count, shape, type) at model load time and configures buffer pools accordingly.

GAP Buffer Handling

qtimlqnn is GAP-aware and correctly handles input buffers marked with GST_BUFFER_FLAG_GAP. When a GAP buffer is received, the element skips inference and forwards the buffer downstream. This preserves timing and synchronization while explicitly indicating that no valid inference input is available for that timestamp. GAP buffers commonly appear in conditional AI pipelines, such as cascaded workflows where later inference stages run only when earlier stages produce valid regions of interest.

Usage

Single-Stage AI Inference on Live Camera Stream (HTP)

This example demonstrates real-time inference on a live camera stream using a single instance of qtimlqnn with the HTP backend. Inference results are attached to each GstBuffer as MLMeta, allowing downstream elements to access synchronized metadata directly from the frame. An overlay stage then uses this metadata to render annotations such as bounding boxes, labels, or key-points before display or further processing.

Download Required Files

File	Download	Save as
Yolov8 Detection W8A8 model	Export from Qualcomm AI Hub	`yolov8_det_w8a8.bin`
Detection labels	yolov8.json	`yolov8.json`

Copy files to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels}"
scp yolov8_det_w8a8.bin   <user>@<device-ip>:$HOME/models/
scp yolov8.json           <user>@<device-ip>:$HOME/labels/

Connect to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip>

Set environment variables

Run below command on your device

mkdir -p $HOME/{models,labels}
export MODEL_NAME=yolov8_det_w8a8.bin
export LABELS_NAME=yolov8.json

Run the pipeline

gst-launch-1.0 -e --gst-debug=2 \
qticamsrc name=camsrc ! video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! queue ! \
tee name=t ! queue ! qtimetamux name=obj_mux ! qtivoverlay ! waylandsink fullscreen=true sync=false \
t. ! queue ! qtimlvconverter ! queue ! \
qtimlqnn model=$HOME/models/$MODEL_NAME backend=/usr/lib/libQnnHtp.so tensors="<boxes,scores,class_idx>" ! queue ! \
qtimlpostprocess module=yolov8 labels=$HOME/labels/$LABELS_NAME settings="{\"confidence\": 51.0}" ! text/x-raw ! queue ! obj_mux.

Single-Stage AI Inference on Live Camera Stream (GPU)

This example demonstrates the same single-stage inference workflow using the GPU backend instead of HTP. This is suitable for floating-point models or workloads that benefit from GPU parallelism.

Download Required Files

File	Download	Save as
Yolov8 Detection Float model	Export from Qualcomm AI Hub	`yolov8_det_float.bin`
Detection labels	yolov8.json	`yolov8.json`

Copy files to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels}"
scp yolov8_det_float.bin   <user>@<device-ip>:$HOME/models/
scp yolov8.json            <user>@<device-ip>:$HOME/labels/

Connect to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip>

Set environment variables

Run below command on your device

export MODEL_NAME=yolov8_det_float.bin
export LABELS_NAME=yolov8.json

Run the pipeline

gst-launch-1.0 -e --gst-debug=2 \
qticamsrc name=camsrc ! video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! queue ! \
tee name=t ! queue ! qtimetamux name=obj_mux ! qtivoverlay ! waylandsink fullscreen=true sync=false \
t. ! queue ! qtimlvconverter ! queue ! \
qtimlqnn model=$HOME/models/$MODEL_NAME backend=/usr/lib/libQnnGpu.so tensors="<boxes,scores,class_idx>" ! queue ! \
qtimlpostprocess module=yolov8 labels=$HOME/labels/$LABELS_NAME settings="{\"confidence\": 51.0}" ! text/x-raw ! queue ! obj_mux.

​Overview

​Key Responsibilities

​Example Pipeline

​Hierarchy

​Pad Templates

​sink

​src

​Element Properties

​Input and Output Behavior

​Input Tensors

​Output Tensors

​Supported Data Types

​Backends

​NPU — libQnnHtp.so

​GPU — libQnnGpu.so

​CPU — libQnnCpu.so

​Runtime Memory Behavior and GAP Handling

​QNN Memory Model

​GAP Buffer Handling

​Usage

​Single-Stage AI Inference on Live Camera Stream (HTP)

​Single-Stage AI Inference on Live Camera Stream (GPU)

Overview

Key Responsibilities

Example Pipeline

Hierarchy

Pad Templates

sink

src

Element Properties

Input and Output Behavior

Input Tensors

Output Tensors

Supported Data Types

Backends

NPU — libQnnHtp.so

GPU — libQnnGpu.so

CPU — libQnnCpu.so

Runtime Memory Behavior and GAP Handling

QNN Memory Model

GAP Buffer Handling

Usage

Single-Stage AI Inference on Live Camera Stream (HTP)

Single-Stage AI Inference on Live Camera Stream (GPU)