Inferencing - Qualcomm Intelligent Multimedia SDK

The inference plugin is responsible for executing the AI model on the prepared input tensor. The QIM SDK supports multiple inference runtimes, each encapsulated within a dedicated GStreamer plugin. This architecture allows for straightforward replacement and integration of inference engines, depending on the target platform or model format. Supported runtimes include:

- qtimlsnpe: SNPE (Qualcomm Neural Processing SDK). Executes models in DLC format on Qualcomm Snapdragon platforms.
- qtimlqnn: QNN (Qualcomm AI Engine Direct). Executes models optimized for QNN.
- qtimltflite: TFLite / LiteRT. Executes TensorFlow Lite models.
- qtimlonnx: ONNX. Executes ONNX models.

All plugins leverage hardware acceleration provided by Qualcomm NPUs and GPUs for optimal performance.
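As a rough illustration of how the plugin choice follows from the model format, the sketch below maps a model file extension to a plugin name. The function name `select_inference_plugin` is hypothetical, and the `.dlc`, `.tflite`, and `.onnx` extensions are the conventional ones for those formats rather than something this page specifies. QNN models are left out because their file format is not stated here.

```shell
# Hypothetical helper (not part of the QIM SDK): choose an inference plugin
# from the model file extension. Extensions are assumed, conventional values.
select_inference_plugin() {
  case "$1" in
    *.dlc)    echo "qtimlsnpe" ;;
    *.tflite) echo "qtimltflite" ;;
    *.onnx)   echo "qtimlonnx" ;;
    *)        echo "unknown"; return 1 ;;   # e.g. QNN models: format not covered here
  esac
}

select_inference_plugin "resnext101-w8a8.tflite"   # prints qtimltflite
```

Because each runtime sits behind its own plugin, switching runtimes in a pipeline mostly means swapping this one element name and its model path.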

Run example on device

The example below uses the ResNeXt101 model with the qtimltflite plugin to classify objects in a video stream.

Download Required Files

File	Download	Save as
ResNeXt101 W8A8 model	Qualcomm AI Hub — ResNeXt101	`resnext101-w8a8.tflite`
Sample video	Input video	`ai_demo_sample.mp4`

If any downloaded file is a .zip archive, extract it on your host machine before copying, for example with `unzip filename.zip`.

Copy files to device

Create the required directories and transfer the downloaded files to your device.

# Run these commands from your host machine. Replace <user> and <device-ip>.
# Replace $HOME with the home directory on the device:
#   QLI:    /root
#   Ubuntu: /home/ubuntu
# Otherwise the host shell expands $HOME to the host's own home directory,
# and the files may end up in the wrong location on the device.
ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels,media,media/output}"
scp resnext101-w8a8.tflite  <user>@<device-ip>:$HOME/models/
scp ai_demo_sample.mp4                <user>@<device-ip>:$HOME/media/
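One way to avoid editing `$HOME` by hand is to keep the device-side home directory in its own variable. The sketch below does that and defaults to a dry run that only prints the commands. `DEVICE_USER`, `DEVICE_IP`, `DEVICE_HOME`, `DRY_RUN`, and the `run` helper are illustrative names, not SDK settings, and the IP address is a placeholder.

```shell
# Sketch with assumed names: DEVICE_* and DRY_RUN are not defined by the QIM SDK.
DEVICE_USER="${DEVICE_USER:-root}"
DEVICE_IP="${DEVICE_IP:-192.168.1.10}"   # placeholder address
DEVICE_HOME="${DEVICE_HOME:-/root}"      # /root for QLI, /home/ubuntu for Ubuntu
DRY_RUN="${DRY_RUN:-1}"                  # set DRY_RUN=0 to actually run the commands

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# mkdir -p creates media/ as a side effect of creating media/output.
run ssh "$DEVICE_USER@$DEVICE_IP" \
  "mkdir -p $DEVICE_HOME/models $DEVICE_HOME/labels $DEVICE_HOME/media/output"
run scp resnext101-w8a8.tflite "$DEVICE_USER@$DEVICE_IP:$DEVICE_HOME/models/"
run scp ai_demo_sample.mp4 "$DEVICE_USER@$DEVICE_IP:$DEVICE_HOME/media/"
```

Review the printed commands first, then rerun with `DRY_RUN=0` once the user, address, and home directory match your device.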

Connect to device

ssh <user>@<device-ip>

Set environment variables

export MODEL_NAME=resnext101-w8a8.tflite
export SRC_VIDEO_NAME=ai_demo_sample.mp4
export VIDEO_SOURCE="filesrc location=$HOME/media/$SRC_VIDEO_NAME ! qtdemux ! h264parse ! v4l2h264dec capture-io-mode=4 output-io-mode=4 ! video/x-raw,format=NV12"
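Before launching the pipeline, it can help to confirm that these variables point at files that actually exist on the device. The following is a minimal sketch, and `check_file` and `check_inputs` are hypothetical helper names, not SDK commands.

```shell
# Hypothetical pre-flight helpers: report whether a required file is present.
check_file() {
  if [ -f "$1" ]; then
    echo "ok: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check the model and the sample video referenced by the variables set above.
check_inputs() {
  check_file "$HOME/models/$MODEL_NAME" &&
    check_file "$HOME/media/$SRC_VIDEO_NAME"
}

# Usage on the device, after exporting MODEL_NAME and SRC_VIDEO_NAME:
# check_inputs || echo "copy the missing files before running the pipeline"
```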

Run example on device

The example can be run in three ways: from the GStreamer command line, as a GStreamer Python application, or as a GStreamer C/C++ application.

gst-launch-1.0 -e \
  filesrc location=$HOME/media/$SRC_VIDEO_NAME ! qtdemux ! h264parse ! \
  v4l2h264dec capture-io-mode=4 output-io-mode=4 ! video/x-raw,format=NV12 ! queue ! \
  qtimlvconverter name=preprocess ! queue ! \
  qtimltflite name=inference \
    delegate=external \
    external-delegate-path=libQnnTFLiteDelegate.so \
    external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
    model=$HOME/models/$MODEL_NAME ! queue ! \
  multifilesink location=$HOME/media/tensor.bin sync=false

Python source code: gst-ai-video-inference.py

Run:

python3 gst-ai-video-inference.py -s "$VIDEO_SOURCE"

Application source code: gst-ai-video-inference
Build your application:
- Yocto
- Ubuntu
Steps to build custom application

Run:

gst-ai-video-inference -s "$VIDEO_SOURCE"

Expected output

The pipeline classifies objects in the video stream in real time. The qtimltflite plugin automatically reads the tensor specifications from the model and propagates them to adjacent plugins, so no manual tensor configuration is required. By default, all QIM SDK inference plugins also dequantize their output tensors automatically.

With video input, hardware-accelerated preprocessing, and inference now running on each frame, the next step is to post-process the results into meaningful data.
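As a small preview of that post-processing step, the sketch below finds the highest-scoring entry in the raw tensor file written by the command-line pipeline. It rests on several assumptions that this page does not confirm: that the dequantized output is stored as 32-bit floats, that the file holds a single output tensor with one score per class, and that the file is read on a machine with the same byte order as the device. `top1_class` is a hypothetical helper name.

```shell
# Sketch under stated assumptions: the file is a flat array of 32-bit floats
# in the host's byte order, one score per class.
top1_class() {
  od -A n -v -t f4 "$1" | awk '
    BEGIN { n = 0 }
    {
      for (i = 1; i <= NF; i++) {
        v = $i + 0                       # force numeric comparison
        if (n == 0 || v > max) { max = v; idx = n }
        n++
      }
    }
    END { if (n > 0) print idx, max }'   # zero-based index, then its score
}

# Usage on the device, after the pipeline has finished:
# top1_class "$HOME/media/tensor.bin"
```

The printed index is zero-based. Turning it into a class name requires the label list that matches the model.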
