The inference plugin is responsible for executing the AI model on the prepared input tensor. The QIM SDK supports multiple inference runtimes, each encapsulated within a dedicated GStreamer plugin. This architecture allows for straightforward replacement and integration of inference engines, depending on the target platform or model format. Supported runtimes include:
  • qtimlsnpe: SNPE (Qualcomm Neural Processing SDK). Executes models in DLC format on Qualcomm Snapdragon platforms.
  • qtimlqnn: QNN (Qualcomm AI Engine Direct). Executes models optimized for QNN.
  • qtimltflite: TFLite / LiteRT. Executes TensorFlow Lite models.
  • qtimlonnx: ONNX. Executes ONNX models.
All plugins leverage hardware acceleration provided by Qualcomm NPUs and GPUs for optimal performance.
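As a rough illustration of swapping runtimes by model format, the helper below maps a model file extension to a plugin name. The function name `plugin_for_model` is hypothetical and not part of the QIM SDK. The `.dlc` mapping follows the list above, while the `.tflite` and `.onnx` extensions are assumed from common convention. QNN is left out because this section does not state which file extension QNN models use.

```shell
#!/bin/sh
# Hypothetical helper (not part of the QIM SDK): pick an inference plugin
# from the model file extension, following the runtime list above.
plugin_for_model() {
  case "$1" in
    *.dlc)    echo "qtimlsnpe" ;;
    *.tflite) echo "qtimltflite" ;;
    *.onnx)   echo "qtimlonnx" ;;
    *)        echo "unknown model format: $1" >&2; return 1 ;;
  esac
}

# Example: the model used later in this guide.
plugin_for_model resnext101-w8a8.tflite
```

The returned name could then be substituted for the inference element in a gst-launch-1.0 pipeline, although each plugin takes its own runtime-specific properties.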

Run example on device

The example below uses the ResNeXt101 model with the qtimltflite plugin to classify objects in a video stream.
[Figure: Inference pipeline diagram]
1. Download required files

File                    Download                       Save as
ResNeXt101 W8A8 model   Qualcomm AI Hub: ResNeXt101    resnext101-w8a8.tflite
Sample video            Input video                    ai_demo_sample.mp4
If any downloaded file is a .zip archive, extract it on your host machine before copying: unzip filename.zip
2. Copy files to device

Create the required directories and transfer the downloaded files to your device.
# Run these commands from your host machine.
# Replace <user> and <device-ip> with your device login details.
# Replace $HOME with the home directory on the device:
#   QLI:    /root
#   Ubuntu: /home/ubuntu
# $HOME is expanded by the host shell here, so it must be replaced with the device path.
ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels,media,media/output}"
scp resnext101-w8a8.tflite <user>@<device-ip>:$HOME/models/
scp ai_demo_sample.mp4 <user>@<device-ip>:$HOME/media/
3. Connect to device

ssh <user>@<device-ip>
4. Set environment variables

export MODEL_NAME=resnext101-w8a8.tflite
export SRC_VIDEO_NAME=ai_demo_sample.mp4
export VIDEO_SOURCE="filesrc location=$HOME/media/$SRC_VIDEO_NAME ! qtdemux ! h264parse ! v4l2h264dec capture-io-mode=4 output-io-mode=4 ! video/x-raw,format=NV12"
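
Before launching the pipeline, it can help to confirm that the variables above point at files that actually exist on the device. The following is a minimal sketch; `require_files` is a hypothetical helper name, and the paths assume the directory layout created in the copy step.

```shell
#!/bin/sh
# Hypothetical pre-flight helper: succeed only if every path given is a
# non-empty regular file; report each one that is not.
require_files() {
  status=0
  for path in "$@"; do
    if [ ! -f "$path" ] || [ ! -s "$path" ]; then
      echo "missing or empty: $path" >&2
      status=1
    fi
  done
  return "$status"
}

# Paths follow the directory layout and variables used in this guide.
require_files "$HOME/models/$MODEL_NAME" "$HOME/media/$SRC_VIDEO_NAME" \
  && echo "inputs OK"
```

If either file is reported as missing, re-check the copy step and the device home directory before running the pipeline.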
5. Run example on device

gst-launch-1.0 -e \
  filesrc location=$HOME/media/$SRC_VIDEO_NAME ! qtdemux ! h264parse ! \
  v4l2h264dec capture-io-mode=4 output-io-mode=4 ! video/x-raw,format=NV12 ! queue ! \
  qtimlvconverter name=preprocess ! queue ! \
  qtimltflite name=inference \
    delegate=external \
    external-delegate-path=libQnnTFLiteDelegate.so \
    external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
    model=$HOME/models/$MODEL_NAME ! queue ! \
  multifilesink location=$HOME/media/tensor.bin sync=false
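
One way to sanity-check the result is to look at the raw tensor dump written by multifilesink. The sketch below assumes the dequantized output is stored as 4-byte float32 values, which this guide does not state explicitly; `tensor_elems` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: number of values in a raw tensor dump,
# ASSUMING 4 bytes per element (dequantized float32 output).
tensor_elems() {
  bytes=$(wc -c < "$1")
  echo $((bytes / 4))
}

TENSOR_FILE="$HOME/media/tensor.bin"
if [ -f "$TENSOR_FILE" ]; then
  echo "float32 elements: $(tensor_elems "$TENSOR_FILE")"
  # Dump the first 32 bytes, interpreted as eight 4-byte floats.
  od -A d -t f4 -N 32 "$TENSOR_FILE"
fi
```

If the element count does not match the number of classes the model is expected to produce, the float32 assumption or the single-tensor-per-file assumption may not hold for your setup.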

Expected output

The pipeline runs the classification model on each frame of the video stream in real time and writes the output tensors to $HOME/media/tensor.bin. The qtimltflite plugin automatically reads tensor specifications from the model and propagates them to adjacent plugins, so no manual tensor configuration is required. By default, all QIM SDK inference plugins dequantize their output tensors automatically.

With hardware-accelerated preprocessing and inference now running on each frame of the video input, the next step is post-processing the results into meaningful data.