qtimlqnn is available only in qcom-multimedia-proprietary-image.
For more information on QLI images, refer to the Qualcomm Linux release.

Overview

qtimlqnn is a GStreamer inference element that executes neural network models using the Qualcomm AI Engine Direct (QNN) runtime. The element operates entirely in tensor mode: it accepts input tensors on its sink pad and produces output tensors on its source pad according to the model’s declared input and output specifications.

qtimlqnn is designed to run models prepared for the QNN runtime, typically in the form of a QNN context binary. To use this element, the model must first be exported to a QNN-compatible format using the Qualcomm AI Runtime (QAIRT) SDK. For additional details, refer to the QAIRT documentation.

The element is limited to model execution. It does not perform preprocessing, tensor reshaping, batching, layout conversion, or model-specific post-processing; these functions are expected to be handled by adjacent elements in the pipeline.

qtimlqnn supports multiple QNN execution backends, including CPU, GPU, and NPU targets. This allows the same pipeline structure to be deployed across different hardware configurations and tuned for different performance, latency, and power requirements. The element is intended for real-time and embedded AI pipelines, where inference is one stage in a larger modular processing flow.

Key Responsibilities

qtimlqnn is responsible for:
  • Loading and executing a QNN model artifact, such as a model library (.so) or cached binary (.bin)
  • Accepting preformatted input tensors from upstream elements
  • Producing output tensors that match the model output signature
  • Negotiating tensor data types and dimensions with adjacent pipeline elements
  • Propagating tensor metadata required by downstream elements
  • Managing DMA-backed buffers through GstMLBufferPool to reduce unnecessary memory copies
In practice, qtimlqnn serves as the inference stage in the pipeline, while tensor preparation and result interpretation are handled externally.
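
The two artifact types listed above can be told apart by file extension before a path is handed to the model property. The following is a minimal pre-flight sketch; qnn_model_kind is a hypothetical helper, not part of qtimlqnn, and it only checks that the file exists and what its extension is. It does not validate that the file is a well-formed QNN model.

```shell
# Hypothetical pre-flight helper (not part of qtimlqnn): classify a model path
# by extension, mirroring the two artifact types the "model" property accepts.
# It only checks that the file exists; it does not validate its contents.
qnn_model_kind() {
    model_path="$1"
    if [ ! -f "$model_path" ]; then
        echo "missing"
        return 1
    fi
    case "$model_path" in
        *.bin) echo "context-binary" ;;  # cached QNN context binary
        *.so)  echo "model-library" ;;   # compiled QNN model library
        *)     echo "unsupported"; return 1 ;;
    esac
}

# Usage on the device, for example:
#   qnn_model_kind "$HOME/models/yolov8_det_w8a8.bin"   # prints: context-binary
```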

Example Pipeline

1. Download Required Files

File | Download | Save as
YOLOv8 Detection W8A8 model | Export from Qualcomm AI Hub | yolov8_det_w8a8.bin
Detection labels | yolov8.json | yolov8.json
Sample video | Input video | ai_demo_sample.mp4

2. Copy files to device

# Run from your host machine; replace <user> and <device-ip>.
# The files are copied into the home directory of <user> on the device
# (for example, /root on QLI or /home/ubuntu on Ubuntu), which is $HOME in later steps.
# The remote command is single-quoted and the scp targets are relative paths,
# so they resolve on the device rather than on the host.

ssh <user>@<device-ip> 'mkdir -p ~/models ~/labels ~/media/output'
scp yolov8_det_w8a8.bin   <user>@<device-ip>:models/
scp yolov8.json           <user>@<device-ip>:labels/
scp ai_demo_sample.mp4    <user>@<device-ip>:media/

3. Connect to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip>

4. Set environment variables

Run the following commands on your device:
mkdir -p $HOME/models $HOME/labels $HOME/media/output
export MODEL_NAME=yolov8_det_w8a8.bin
export LABELS_NAME=yolov8.json
export SRC_VIDEO_NAME=ai_demo_sample.mp4
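
Before launching, it can help to confirm that the three files are where the pipeline expects them. A minimal sketch, assuming the directory layout and the MODEL_NAME, LABELS_NAME, and SRC_VIDEO_NAME variables set above; check_inputs is a hypothetical helper, and its optional argument (defaulting to $HOME) exists only so the base directory can be changed.

```shell
# Hypothetical helper: report any input file that the example pipeline
# expects but that is not present. Returns non-zero if something is missing.
check_inputs() {
    base="${1:-$HOME}"
    rc=0
    for f in "$base/models/$MODEL_NAME" \
             "$base/labels/$LABELS_NAME" \
             "$base/media/$SRC_VIDEO_NAME"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            rc=1
        fi
    done
    return $rc
}

# Usage on the device, after the exports above:
#   check_inputs && echo "all inputs present"
```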

5. Run the pipeline

gst-launch-1.0 -e --gst-debug=2 \
filesrc location=$HOME/media/$SRC_VIDEO_NAME ! qtdemux ! h264parse ! \
v4l2h264dec capture-io-mode=4 output-io-mode=4 ! video/x-raw,format=NV12 ! queue ! \
tee name=t ! queue ! qtimetamux name=obj_mux ! qtivoverlay ! waylandsink fullscreen=true sync=false \
t. ! queue ! qtimlvconverter ! queue ! \
qtimlqnn model=$HOME/models/$MODEL_NAME backend=/usr/lib/libQnnHtp.so tensors="<boxes,scores,class_idx>" ! queue ! \
qtimlpostprocess module=yolov8 labels=$HOME/labels/$LABELS_NAME settings="{\"confidence\": 51.0}" ! text/x-raw ! queue ! obj_mux.
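
If the pipeline fails to start with an error about a missing element, the elements it uses can be checked one by one. A minimal sketch that relies on the --exists option of gst-inspect-1.0, which returns success only when the named element or plugin is registered; missing_gst_elements and the GST_INSPECT override are hypothetical additions for illustration and testing.

```shell
# Hypothetical helper: list the GStreamer elements that gst-inspect-1.0
# cannot find. "--exists" makes gst-inspect-1.0 return success only if
# the named element or plugin is registered.
# GST_INSPECT can be overridden (for example, with a stub) for testing.
missing_gst_elements() {
    inspect="${GST_INSPECT:-gst-inspect-1.0}"
    missing=""
    for element in "$@"; do
        if ! "$inspect" --exists "$element" >/dev/null 2>&1; then
            missing="$missing $element"
        fi
    done
    # Print the names without the leading space; empty output means all found.
    echo "${missing# }"
}

# Usage on the device, with the elements from the pipeline above:
#   missing_gst_elements qtdemux h264parse v4l2h264dec tee qtimetamux \
#       qtivoverlay waylandsink qtimlvconverter qtimlqnn qtimlpostprocess
```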

Hierarchy

GObject
   GstObject
      GstElement
         GstBaseTransform
            qtimlqnn

Pad Templates

sink

Capabilities
neural-network/tensors
format: { INT8, UINT8, INT16, UINT16, INT32, UINT32, FLOAT16, FLOAT32 }
Availability: Always
Direction: sink

src

Capabilities
neural-network/tensors
format: { INT8, UINT8, INT16, UINT16, INT32, UINT32, FLOAT16, FLOAT32 }
Availability: Always
Direction: source

Element Properties

backend
Path to the QNN backend library. Selects the execution backend used for inference. Depending on the library specified, supported backends include CPU, GPU, HTP (NPU), and DSP implementations.

Type: String
Default: "/usr/lib/libQnnCpu.so"
Flags: readable/writable

backend-device-id
Backend device selector. Platform-dependent; used by some DSP or HTP backends to select a specific hardware instance.

Type: Unsigned Integer
Default: 0
Flags: readable/writable

model
Path to the QNN model file. This property is required and must reference a valid .so model or cached .bin file.

Type: String
Default: NULL
Flags: readable/writable

system
Path to the QNN system library required for QNN runtime initialization.

Type: String
Default: "/usr/lib/libQnnSystem.so"
Flags: readable/writable

tensors
Output tensor filter. When set, only the specified output tensor names are emitted on the source pad. When empty, all model outputs are emitted.

Type: GstValueArray of type gchararray
Default: "< >"
Flags: readable/writable
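
When the list of output names comes from a script rather than being typed by hand, the tensors value can be assembled in the same "<name1,name2,...>" form used in the example pipelines. A minimal sketch; tensors_arg is a hypothetical helper, and the names passed to it must match the output names declared by your model. Call it with at least one name; to emit all outputs, leave the property unset.

```shell
# Hypothetical helper: join output-tensor names into the "<a,b,c>" value
# form used for the "tensors" property in the example pipelines.
tensors_arg() {
    joined=""
    for name in "$@"; do
        # Add a comma separator only after the first name.
        joined="${joined:+$joined,}$name"
    done
    printf '<%s>\n' "$joined"
}

# Usage, with the names from the example pipeline:
#   qtimlqnn model=... tensors="$(tensors_arg boxes scores class_idx)"
```

Quoting the command substitution keeps the shell from treating "<" as a redirection, which is the same reason the example pipelines quote the tensors value.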

Input and Output Behavior

Input Tensors

qtimlqnn exposes a single sink pad, but it supports both single-input and multi-input models. For multi-input models, all required tensors are delivered through the same sink pad as a tensor set. Input tensors must be fully prepared before they reach qtimlqnn. Expected tensor layout, shape, data type, and batch size are determined by:
  • The QNN model input signature
  • Caps negotiation with upstream elements
Typical upstream elements include:
  • qtimlvconverter — for scaling, color conversion, normalization, and quantization (if required)
qtimlqnn does not modify, reshape, batch, or reinterpret incoming tensors. It passes them to the QNN runtime as received.

Output Tensors

qtimlqnn exposes a single source pad and produces output tensors that follow the model’s declared output signature. Models with multiple output tensors are fully supported, and all outputs are emitted together on the source pad. Supported output behavior includes:
  • Single-tensor and multi-tensor outputs
  • Arbitrary tensor shapes and ranks, including batch and depth dimensions
  • Both quantized and floating-point tensor types
  • Selective emission of output tensors using the tensors property
The generated output tensors are intended for downstream post-processing stages, which are responsible for decoding model-specific results such as classification outputs, detection results, segmentation masks, landmark data, and other structured inference outputs.

Supported Data Types

qtimlqnn supports the tensor data types provided by the QNN runtime and the selected execution backend, subject to caps negotiation with adjacent elements. Supported data types include:
  • INT8
  • UINT8
  • INT16
  • UINT16
  • INT32
  • UINT32
  • FLOAT16
  • FLOAT32
The element does not impose additional data-type restrictions beyond those required by the runtime, the selected backend, and negotiated pipeline caps.
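
The data type fixes the size of each element, so the byte size of a dense tensor is the product of its dimensions multiplied by that size. This can be useful when checking whether upstream output matches a model input. A minimal sketch under that assumption (no padding or alignment); tensor_bytes is a hypothetical helper, and the 1x640x640x3 shape in the usage comment is illustrative, not taken from a specific model.

```shell
# Hypothetical helper: byte size of a dense tensor, given its data type and
# dimensions. Assumes no row padding or alignment; buffers actually
# allocated by the pipeline may be larger.
tensor_bytes() {
    dtype="$1"; shift
    case "$dtype" in
        INT8|UINT8)            elem=1 ;;
        INT16|UINT16|FLOAT16)  elem=2 ;;
        INT32|UINT32|FLOAT32)  elem=4 ;;
        *) echo "unknown type: $dtype" >&2; return 1 ;;
    esac
    total=$elem
    for dim in "$@"; do
        total=$((total * dim))
    done
    echo "$total"
}

# Example: a 1x640x640x3 UINT8 tensor
#   tensor_bytes UINT8 1 640 640 3    # prints 1228800
```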

Backends

A QNN backend defines the hardware target used to run a model. Backends allow qtimlqnn to offload inference from the default CPU backend to an optimized hardware accelerator. The backend is selected through the backend property and determines how the QNN runtime dispatches model operations during inference.

NPU — libQnnHtp.so

Runs the model on the AI accelerator (NPU).
  • Backend: Qualcomm AI Accelerator / NPU
  • Use case: Preferred backend where available. Best performance and power efficiency for quantized models.
  • Additional configuration: set backend=/usr/lib/libQnnHtp.so; optionally set backend-device-id on platforms with multiple devices

GPU — libQnnGpu.so

Runs supported operations through the QNN GPU backend.
  • Backend: GPU
  • Use case: Floating-point models and workloads that benefit from GPU parallelism.
  • Additional configuration: set backend=/usr/lib/libQnnGpu.so

CPU — libQnnCpu.so

Runs the model on the default QNN CPU backend.
  • Backend: CPU
  • Use case: Reference execution, debugging, bring-up, or systems without hardware acceleration.
  • Additional configuration: none required, since backend defaults to /usr/lib/libQnnCpu.so
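
Where one script has to run on devices with different accelerators, the backend library can be chosen in the preference order given above (NPU, then GPU, then CPU) based on which libraries are installed. A minimal sketch, assuming the libraries live in /usr/lib as in the examples; pick_qnn_backend is a hypothetical helper. A library being present does not guarantee the matching hardware is available, and the backend alone is not enough: the model must also be exported for that backend, which is why the Usage examples pair the W8A8 model with HTP and the float model with GPU.

```shell
# Hypothetical helper: print the path of the preferred QNN backend library
# that is present, in the order HTP (NPU), GPU, CPU.
# The library directory defaults to /usr/lib and can be overridden.
pick_qnn_backend() {
    libdir="${1:-/usr/lib}"
    for lib in libQnnHtp.so libQnnGpu.so libQnnCpu.so; do
        if [ -e "$libdir/$lib" ]; then
            echo "$libdir/$lib"
            return 0
        fi
    done
    echo "no QNN backend library found in $libdir" >&2
    return 1
}

# Usage (the model must be exported for the selected backend):
#   BACKEND=$(pick_qnn_backend) && echo "using $BACKEND"
#   ... qtimlqnn model=$HOME/models/$MODEL_NAME backend=$BACKEND ...
```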

Runtime Memory Behavior and GAP Handling

QNN Memory Model

qtimlqnn operates within the memory model of the QNN runtime. The element uses DMA buffers via GstMLBufferPool to minimize memory copies and maintain zero-copy transport where possible. QNN uses runtime-managed memory to allocate:
  • Input tensors
  • Intermediate activation tensors
  • Output tensors
The element discovers input/output tensor metadata (count, shape, type) at model load time and configures buffer pools accordingly.

GAP Buffer Handling

qtimlqnn is GAP-aware and correctly handles input buffers marked with GST_BUFFER_FLAG_GAP. When a GAP buffer is received, the element skips inference and forwards the buffer downstream. This preserves timing and synchronization while explicitly indicating that no valid inference input is available for that timestamp. GAP buffers commonly appear in conditional AI pipelines, such as cascaded workflows where later inference stages run only when earlier stages produce valid regions of interest.

Usage

Single-Stage AI Inference on Live Camera Stream (HTP)

This example demonstrates real-time inference on a live camera stream using a single instance of qtimlqnn with the HTP backend. Inference results are attached to each GstBuffer as MLMeta, allowing downstream elements to access synchronized metadata directly from the frame. An overlay stage then uses this metadata to render annotations such as bounding boxes, labels, or key-points before display or further processing.

1. Download Required Files

File | Download | Save as
YOLOv8 Detection W8A8 model | Export from Qualcomm AI Hub | yolov8_det_w8a8.bin
Detection labels | yolov8.json | yolov8.json

2. Copy files to device

# Run from your host machine; replace <user> and <device-ip>.
# Remote paths resolve relative to the home directory of <user> on the device.
ssh <user>@<device-ip> 'mkdir -p ~/models ~/labels'
scp yolov8_det_w8a8.bin   <user>@<device-ip>:models/
scp yolov8.json           <user>@<device-ip>:labels/

3. Connect to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip>

4. Set environment variables

Run the following commands on your device:
mkdir -p $HOME/models $HOME/labels
export MODEL_NAME=yolov8_det_w8a8.bin
export LABELS_NAME=yolov8.json

5. Run the pipeline

gst-launch-1.0 -e --gst-debug=2 \
qticamsrc name=camsrc ! video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! queue ! \
tee name=t ! queue ! qtimetamux name=obj_mux ! qtivoverlay ! waylandsink fullscreen=true sync=false \
t. ! queue ! qtimlvconverter ! queue ! \
qtimlqnn model=$HOME/models/$MODEL_NAME backend=/usr/lib/libQnnHtp.so tensors="<boxes,scores,class_idx>" ! queue ! \
qtimlpostprocess module=yolov8 labels=$HOME/labels/$LABELS_NAME settings="{\"confidence\": 51.0}" ! text/x-raw ! queue ! obj_mux.

Single-Stage AI Inference on Live Camera Stream (GPU)

This example demonstrates the same single-stage inference workflow using the GPU backend instead of HTP. This is suitable for floating-point models or workloads that benefit from GPU parallelism.

1. Download Required Files

File | Download | Save as
YOLOv8 Detection Float model | Export from Qualcomm AI Hub | yolov8_det_float.bin
Detection labels | yolov8.json | yolov8.json

2. Copy files to device

# Run from your host machine; replace <user> and <device-ip>.
# Remote paths resolve relative to the home directory of <user> on the device.
ssh <user>@<device-ip> 'mkdir -p ~/models ~/labels'
scp yolov8_det_float.bin   <user>@<device-ip>:models/
scp yolov8.json            <user>@<device-ip>:labels/

3. Connect to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip>

4. Set environment variables

Run the following commands on your device:
export MODEL_NAME=yolov8_det_float.bin
export LABELS_NAME=yolov8.json

5. Run the pipeline

gst-launch-1.0 -e --gst-debug=2 \
qticamsrc name=camsrc ! video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! queue ! \
tee name=t ! queue ! qtimetamux name=obj_mux ! qtivoverlay ! waylandsink fullscreen=true sync=false \
t. ! queue ! qtimlvconverter ! queue ! \
qtimlqnn model=$HOME/models/$MODEL_NAME backend=/usr/lib/libQnnGpu.so tensors="<boxes,scores,class_idx>" ! queue ! \
qtimlpostprocess module=yolov8 labels=$HOME/labels/$LABELS_NAME settings="{\"confidence\": 51.0}" ! text/x-raw ! queue ! obj_mux.