Benefits - Qualcomm Intelligent Multimedia SDK

Why developers should use the Qualcomm Intelligent Multimedia SDK

Single, Unified Multimedia + AI Framework

Combines multimedia pipelines and ML inference into a single, coherent developer experience.

Hardware-Accelerated by Default

QIM SDK natively targets Qualcomm accelerators (CPU, GPU, NPU, ISP, VPU, and DSP) through optimized plugins.

GStreamer-Based, Industry-Standard Pipelines

Built on GStreamer, providing a mature, composable pipeline model with 40+ Qualcomm-optimized plugins.

Flexible AI Model Support

The SDK supports running models from common frameworks such as TFLite/LiteRT, ONNX, and QNN.

Qualcomm AI Hub Integration

Browse, download, and deploy hundreds of pre-trained, quantized models directly from Qualcomm AI Hub. Models arrive ready to run — no manual conversion, calibration, or profiling required.

Plugin Performance: QIM SDK vs Upstream GStreamer

AI Preprocessing: Upstream vs QIM SDK (`qtimlvconverter`)

Example: AI preprocessing with upstream GStreamer vs `qtimlvconverter`

This example prepares a 1080p frame for neural network inference. The upstream approach chains videoconvert, aspectratiocrop, and videoscale on the CPU. The QIM SDK approach uses qtivtransform to convert to NV12 on the GPU, then qtimlvconverter to pack it into a tensor buffer ready for qtimltflite.

Upstream (CPU), upstream_preprocess.py:

```python
from gst_helper import gst_grouped_frames, atomic_save_image, timing_marks_to_str
import argparse

parser = argparse.ArgumentParser(description='Upstream CPU preprocessing')
parser.add_argument('--video-source', type=str, required=True,
    help='GStreamer video source (e.g. "v4l2src device=/dev/video2")')
args, unknown = parser.parse_known_args()

PIPELINE = (
    f"{args.video_source} ! "
    "video/x-raw,width=1920,height=1080 ! "
    "identity name=frame_ready silent=false ! "
    "videoconvert ! aspectratiocrop aspect-ratio=1/1 ! "
    "videoscale ! video/x-raw,format=RGB,width=224,height=224 ! "
    "identity name=preprocess_done silent=false ! "
    "queue max-size-buffers=2 leaky=downstream ! "
    "appsink name=frame drop=true sync=false max-buffers=1 emit-signals=true"
)

for frames_by_sink, marks in gst_grouped_frames(PIPELINE):
    frame = frames_by_sink['frame']
    print(f'Frame shape: {frame.shape}')
    print('Timings:', timing_marks_to_str(marks))
    atomic_save_image(frame=frame, path='out/frame_upstream.png')
```

```bash
python3 upstream_preprocess.py --video-source "$IMSDK_VIDEO_SOURCE"

# Frame shape: (224, 224, 3)
# Timings: frame_ready->preprocess_done: 22.16ms, preprocess_done->pipeline_finished: 2.14ms (total 24.31ms)
```

QIM SDK (qtivtransform + qtimlvconverter) — imsdk_preprocess.py:

```python
from gst_helper import gst_grouped_frames, timing_marks_to_str
import argparse

parser = argparse.ArgumentParser(description='QIM SDK GPU preprocessing')
parser.add_argument('--video-source', type=str, required=True,
    help='GStreamer video source (e.g. "v4l2src device=/dev/video2")')
args, unknown = parser.parse_known_args()

PIPELINE = (
    f"{args.video_source} ! "
    "video/x-raw,width=1920,height=1080 ! "
    "identity name=frame_ready silent=false ! "
    "qtivtransform ! "
    "video/x-raw,format=NV12 ! "
    "identity name=transform_done silent=false ! "
    "qtimlvconverter ! neural-network/tensors,type=UINT8,dimensions=<<1,224,224,3>> ! "
    "identity name=preprocess_done silent=false ! "
    "queue max-size-buffers=2 leaky=downstream ! "
    "appsink name=tensor drop=true sync=false max-buffers=1 emit-signals=true"
)

for frames_by_sink, marks in gst_grouped_frames(PIPELINE):
    tensor = frames_by_sink['tensor']
    print(f'Tensor shape: {tensor.shape} dtype: {tensor.dtype}')
    print('Timings:', timing_marks_to_str(marks))
```

```bash
python3 imsdk_preprocess.py --video-source "$IMSDK_VIDEO_SOURCE"

# Tensor shape: (150528,) dtype: uint8
# Timings: frame_ready->transform_done: 6.55ms, transform_done->preprocess_done: 0.83ms (total 7.38ms)
```

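qtivtransform handles color conversion on the GPU, and qtimlvconverter scales the NV12 frame and packs it into a flat UINT8 tensor in a single pass, ready to hand directly to qtimltflite without a CPU copy. In these runs, preprocessing time (frame_ready to preprocess_done) drops from ~22.2 ms to ~7.4 ms.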
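On the Python side, imsdk_preprocess.py receives that tensor as a flat array of 150,528 values, which is 1 × 224 × 224 × 3. A minimal sketch of restoring the NHWC layout described by the caps, assuming the buffer is tightly packed in the order given by `dimensions=<<1,224,224,3>>`; `to_nhwc` is a hypothetical helper and the zero-filled array stands in for a real appsink buffer:

```python
import numpy as np

# Dimensions taken from the caps: neural-network/tensors,type=UINT8,dimensions=<<1,224,224,3>>
N, H, W, C = 1, 224, 224, 3

def to_nhwc(flat: np.ndarray) -> np.ndarray:
    """Reshape a flat UINT8 tensor buffer into (N, H, W, C)."""
    expected = N * H * W * C  # 150528 elements
    if flat.size != expected:
        raise ValueError(f"expected {expected} elements, got {flat.size}")
    return flat.reshape((N, H, W, C))

# Stand-in for frames_by_sink['tensor'] from imsdk_preprocess.py
flat = np.zeros(N * H * W * C, dtype=np.uint8)
nhwc = to_nhwc(flat)
print(nhwc.shape, nhwc.dtype)  # (1, 224, 224, 3) uint8
```

The size check fails loudly instead of silently reshaping a buffer whose layout does not match the negotiated caps.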

Video Pipeline: Upstream vs QIM SDK (VTransform, VSplit, VComposer, Overlay)

Example: Video transform and multi-stream split with upstream GStreamer vs QIM SDK

This example builds the same two-branch pipeline, first with upstream elements and then with QIM SDK plugins: one branch receives the original 1080p frame, the other a cropped and resized 224×224 frame.

Upstream (CPU), upstream_split.py:

```python
from gst_helper import gst_grouped_frames, atomic_save_image, timing_marks_to_str
import argparse

parser = argparse.ArgumentParser(description='Upstream CPU tee and transform')
parser.add_argument('--video-source', type=str, required=True,
    help='GStreamer video source (e.g. "v4l2src device=/dev/video2")')
args, unknown = parser.parse_known_args()

PIPELINE = (
    f"{args.video_source} ! "
    "video/x-raw,width=1920,height=1080 ! "
    "identity name=frame_ready silent=false ! "

    "tee name=t "

    # Branch A: full-resolution RGB
    "t. ! queue max-size-buffers=1 leaky=downstream ! "
    "videoconvert ! video/x-raw,format=RGB ! "
    "appsink name=original drop=true sync=false max-buffers=1 emit-signals=true "

    # Branch B: square crop then scale to 224x224
    "t. ! queue max-size-buffers=1 leaky=downstream ! "
    "videoconvert ! aspectratiocrop aspect-ratio=1/1 ! "
    "videoscale ! video/x-raw,format=RGB,width=224,height=224 ! "
    "identity name=transform_done silent=false ! "
    "queue max-size-buffers=2 leaky=downstream ! "
    "appsink name=frame drop=true sync=false max-buffers=1 emit-signals=true "
)

for frames_by_sink, marks in gst_grouped_frames(PIPELINE):
    frame = frames_by_sink['frame']
    original = frames_by_sink['original']
    print(f'frame: {frame.shape}  original: {original.shape}')
    print('Timings:', timing_marks_to_str(marks))
    atomic_save_image(frame=frame, path='out/frame_upstream.png')
    atomic_save_image(frame=original, path='out/original_upstream.png')
```

```bash
python3 upstream_split.py --video-source "$IMSDK_VIDEO_SOURCE"

# frame: (224, 224, 3)  original: (1080, 1920, 3)
# Timings: frame_ready->transform_done: 22.21ms, transform_done->pipeline_finished: 1.25ms (total 23.46ms)
```

QIM SDK (qtivtransform) — imsdk_split.py:

```python
from gst_helper import gst_grouped_frames, atomic_save_image, timing_marks_to_str
import argparse

parser = argparse.ArgumentParser(description='QIM SDK GPU tee and transform')
parser.add_argument('--video-source', type=str, required=True,
    help='GStreamer video source (e.g. "v4l2src device=/dev/video2")')
args, unknown = parser.parse_known_args()

PIPELINE = (
    f"{args.video_source} ! "
    "video/x-raw,width=1920,height=1080 ! "
    "identity name=frame_ready silent=false ! "

    "tee name=t "

    # Branch A: full-resolution RGB via GPU color convert
    "t. ! queue max-size-buffers=1 leaky=downstream ! "
    "qtivtransform ! video/x-raw,format=RGB ! "
    "appsink name=original drop=true sync=false max-buffers=1 emit-signals=true "

    # Branch B: GPU center-crop to 1080x1080 then resize to 224x224
    "t. ! queue max-size-buffers=1 leaky=downstream ! "
    'qtivtransform crop="<420, 0, 1080, 1080>" ! '
    "video/x-raw,format=RGB,width=224,height=224 ! "
    "identity name=transform_done silent=false ! "
    "queue max-size-buffers=2 leaky=downstream ! "
    "appsink name=frame drop=true sync=false max-buffers=1 emit-signals=true "
)

for frames_by_sink, marks in gst_grouped_frames(PIPELINE):
    frame = frames_by_sink['frame']
    original = frames_by_sink['original']
    print(f'frame: {frame.shape}  original: {original.shape}')
    print('Timings:', timing_marks_to_str(marks))
    atomic_save_image(frame=frame, path='out/frame_imsdk.png')
    atomic_save_image(frame=original, path='out/original_imsdk.png')
```

```bash
python3 imsdk_split.py --video-source "$IMSDK_VIDEO_SOURCE"

# frame: (224, 224, 3)  original: (1080, 1920, 3)
# Timings: frame_ready->transform_done: 5.51ms, transform_done->pipeline_finished: 4.41ms (total 9.92ms)
```

qtivtransform combines crop, resize, and color conversion into one GPU-accelerated step. The crop/resize time drops from ~22 ms to ~5.5 ms, and both branches share the same DMA buffer without CPU copies.
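The crop rectangle `<420, 0, 1080, 1080>` in imsdk_split.py is the largest centered square in a 1920×1080 frame: (1920 − 1080) / 2 = 420 pixels from the left edge. A small sketch for computing it at other resolutions, assuming the four values are x, y, width, and height as the example suggests; both helper names are hypothetical:

```python
def center_square_crop(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (x, y, w, h) of the largest centered square inside a width x height frame."""
    side = min(width, height)
    x = (width - side) // 2
    y = (height - side) // 2
    return x, y, side, side

def crop_property(rect: tuple[int, int, int, int]) -> str:
    """Format a rectangle the way the crop value is written in imsdk_split.py."""
    x, y, w, h = rect
    return f'crop="<{x}, {y}, {w}, {h}>"'

print(crop_property(center_square_crop(1920, 1080)))  # crop="<420, 0, 1080, 1080>"
```

For a 1280×720 source the same helper gives `<280, 0, 720, 720>`.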

Neural Network Inference: Upstream TFLite vs `qtimltflite`

Running inference from Python with LiteRT (which uses the CPU unless a delegate is loaded) means every frame must leave the pipeline: the appsink hands a NumPy array to the interpreter, and dequantization, softmax, and any drawing then run on the CPU. qtimltflite moves inference into the GStreamer pipeline and routes it to the NPU via the QNN HTP delegate, keeping the frame in hardware memory the entire time.

| | Upstream (LiteRT interpreter in Python) | QIM SDK (`qtimltflite` on NPU) |
|---|---|---|
| Inference location | Invoked from Python, outside the pipeline | NPU (HTP), inside the pipeline |
| Pre-process | `videoconvert` + `videoscale` on CPU | `qtivtransform` + `qtimlvconverter` on GPU |
| Post-process | Python + PIL on CPU | `qtimlpostprocess` inside the pipeline |
| Frame crosses to CPU memory | Yes: NumPy array passed to the interpreter | No: tensor stays in a DMA buffer |

Example: Image classification with upstream LiteRT vs `qtimltflite`

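Both scripts run SqueezeNet on every webcam frame and print the top-5 class predictions. As written, the upstream script already crops and resizes with qtivtransform and loads the QNN HTP delegate through the LiteRT interpreter, so the model runs on the NPU in both cases; what differs is whether inference and post-processing run inside the pipeline or in Python.

Upstream (LiteRT interpreter in Python), upstream_inference.py: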

```python
from gst_helper import gst_grouped_frames, atomic_save_pillow_image, timing_marks_to_str, download_file_if_needed, softmax
import time, argparse, numpy as np
from ai_edge_litert.interpreter import Interpreter, load_delegate
from PIL import ImageDraw, Image

parser = argparse.ArgumentParser(description='Upstream LiteRT inference from Python')
parser.add_argument('--video-source', type=str, required=True,
    help='GStreamer video source (e.g. "v4l2src device=/dev/video2")')
args, unknown = parser.parse_known_args()

MODEL_PATH = download_file_if_needed('models/squeezenet1_1-squeezenet-1.1-w8a8.tflite', 'https://cdn.edgeimpulse.com/qc-ai-docs/models/squeezenet1_1-squeezenet-1.1-w8a8.tflite')
LABELS_PATH = download_file_if_needed('models/SqueezeNet-1.1_labels.txt', 'https://cdn.edgeimpulse.com/qc-ai-docs/models/SqueezeNet-1.1_labels.txt')

with open(LABELS_PATH, 'r') as f:
    labels = [line for line in f.read().splitlines() if line.strip()]

# The QNN TFLite delegate with backend_type=htp offloads the model to the NPU (HTP);
# without experimental_delegates, LiteRT would run the model on the CPU.
interpreter = Interpreter(
    model_path=MODEL_PATH,
    experimental_delegates=[load_delegate("libQnnTFLiteDelegate.so", options={"backend_type": "htp"})]
)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

PIPELINE = (
    f"{args.video_source} ! "
    "video/x-raw,width=1920,height=1080 ! "
    "identity name=frame_ready_webcam silent=false ! "
    'qtivtransform crop="<420, 0, 1080, 1080>" ! '
    "video/x-raw,format=RGB,width=224,height=224 ! "
    "identity name=transform_done silent=false ! "
    "queue max-size-buffers=2 leaky=downstream ! "
    "appsink name=frame drop=true sync=false max-buffers=1 emit-signals=true"
)

for frames_by_sink, marks in gst_grouped_frames(PIPELINE):
    # Inference runs in Python after the frame has left the pipeline through appsink
    inference_start = time.perf_counter()
    interpreter.set_tensor(input_details[0]['index'], frames_by_sink['frame'].reshape((1, 224, 224, 3)))
    interpreter.invoke()
    q_output = interpreter.get_tensor(output_details[0]['index'])
    # Dequantize the UINT8 output, then convert logits to probabilities
    scale, zero_point = output_details[0]['quantization']
    f_output = (q_output.astype(np.float32) - zero_point) * scale
    scores = softmax(f_output[0])
    # Add a mark: the last recorded mark plus the Python-side elapsed time
    marks['inference_done'] = list(marks.items())[-1][1] + (time.perf_counter() - inference_start)

    top_k = scores.argsort()[-5:][::-1]
    print('Top-5 predictions:')
    for i in top_k:
        print(f'  {labels[i]}: {scores[i]:.4f}')

    # Draw the top-1 label with PIL and save a PNG (CPU work)
    image_composition_start = time.perf_counter()
    frame = frames_by_sink['frame']
    img = Image.fromarray(frame)
    img_draw = ImageDraw.Draw(img)
    img_draw.text((10, 10), f"{labels[top_k[0]]} ({scores[top_k[0]]:.2f})", fill="black")
    atomic_save_pillow_image(img=img, path='out/upstream_prediction.png')
    marks['image_composition_end'] = list(marks.items())[-1][1] + (time.perf_counter() - image_composition_start)

    print('Timings:', timing_marks_to_str(marks))
```

```bash
# We use '| grep -v "<W>"' to filter out some warnings - you can omit it if you want.
python3 upstream_inference.py --video-source "$IMSDK_VIDEO_SOURCE" | grep -v "<W>"

# Top-5 predictions:
#   laptop: 0.4952
#   notebook: 0.3394
#   computer keyboard: 0.0750
#   space bar: 0.0621
#   typewriter keyboard: 0.0078
# Timings: frame_ready_webcam->transform_done: 6.81ms, transform_done->pipeline_finished: 0.77ms, pipeline_finished->inference_done: 1.19ms, inference_done->image_composition_end: 37.85ms (total 46.61ms)
```
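The upstream script imports `softmax` from `gst_helper`, whose implementation is not shown here. A minimal NumPy sketch of the same dequantize → softmax → top-k steps, assuming a standard numerically stable softmax; the helper names and the toy scale, zero point, and scores are illustrative and are not taken from gst_helper or from SqueezeNet:

```python
import numpy as np

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map quantized integers back to real values: (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: subtract the max before exponentiating."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def top_k(scores: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k highest scores, best first."""
    return np.argsort(scores)[-k:][::-1]

# Toy quantized output with made-up quantization parameters
q_output = np.array([[10, 200, 50, 130, 90, 0]], dtype=np.uint8)
scores = softmax(dequantize(q_output, scale=0.05, zero_point=0)[0])
print(top_k(scores, k=3))  # [1 3 4]
```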

QIM SDK (qtimltflite on NPU) — imsdk_inference.py:

```python
from gst_helper import gst_grouped_frames, timing_marks_to_str, download_file_if_needed
import argparse

parser = argparse.ArgumentParser(description='QIM SDK NPU inference with qtimltflite')
parser.add_argument('--video-source', type=str, required=True,
    help='GStreamer video source (e.g. "v4l2src device=/dev/video2")')
args, unknown = parser.parse_known_args()

MODEL_PATH = download_file_if_needed('models/squeezenet1_1-squeezenet-1.1-w8a8.tflite', 'https://cdn.edgeimpulse.com/qc-ai-docs/models/squeezenet1_1-squeezenet-1.1-w8a8.tflite')
LABELS_PATH = download_file_if_needed('models/SqueezeNet-1.1_labels.txt', 'https://cdn.edgeimpulse.com/qc-ai-docs/models/SqueezeNet-1.1_labels.txt')

PIPELINE = (
    f"{args.video_source} ! "
    "video/x-raw,width=1920,height=1080 ! "
    "identity name=frame_ready_webcam silent=false ! "
    "qtivtransform ! "
    "video/x-raw,format=NV12 ! "
    "identity name=transform_done silent=false ! "
    "qtimlvconverter ! neural-network/tensors,type=UINT8,dimensions=<<1,224,224,3>> ! "
    "identity name=conversion_done silent=false ! "
    f'qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so '
    f'external-delegate-options="QNNExternalDelegate,backend_type=htp;" model="{MODEL_PATH}" ! '
    "identity name=inference_done silent=false ! "
    f'qtimlpostprocess name=postproc module=mobilenet-softmax '
    f'labels="{LABELS_PATH}" results=5 settings="{{\\"confidence\\": 10.0}}" ! '
    "text/x-raw,format=utf8 ! "
    "queue max-size-buffers=2 leaky=downstream ! "
    "appsink name=predictions drop=true sync=false max-buffers=1 emit-signals=true"
)

for frames_by_sink, marks in gst_grouped_frames(PIPELINE):
    cls_text = frames_by_sink['predictions'].tobytes().decode('utf-8')
    print('Predictions:', cls_text[:120], '...')
    print('Timings:', timing_marks_to_str(marks))
```

```bash
# We use '| grep -v "<W>"' to filter out some warnings - you can omit it if you want.
python3 imsdk_inference.py --video-source "$IMSDK_VIDEO_SOURCE" | grep -v "<W>"

# Predictions: { (structure)"ImageClassification\, labels\=...laptop...notebook... }
# Timings: frame_ready_webcam->transform_done: 7.29ms, transform_done->conversion_done: 0.83ms, conversion_done->inference_done: 1.25ms, inference_done->postproc_done: 0.45ms (total 9.82ms)
```

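Moving inference into qtimltflite cuts the total time per frame from ~47 ms to ~10 ms in these runs. The model invocation itself costs about the same in both scripts (1.19 ms vs 1.25 ms), since both use the QNN HTP delegate; most of the upstream total (~38 ms) is spent drawing the label with PIL and saving a PNG on the CPU, a step the QIM SDK script does not perform. In the QIM SDK pipeline, qtivtransform → qtimlvconverter → qtimltflite → qtimlpostprocess pass buffers from stage to stage without a round trip through Python.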

qtimltflite expects a tightly packed tensor buffer. Place video/x-raw,format=NV12 after qtivtransform, before qtimlvconverter: RGB output from qtivtransform uses row-stride padding, which causes silent misalignment.
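To see why padding misaligns data: a 224-pixel RGB row holds 672 bytes, but a producer may pad each row to a larger stride for alignment, and code that treats the buffer as tightly packed then reads part of each row's padding as pixels. A minimal NumPy sketch of stripping the padding when such a buffer is mapped on the CPU; `unpad_rows` is a hypothetical helper, the 768-byte stride is a made-up value, and real strides should be read from the buffer's video metadata:

```python
import numpy as np

def unpad_rows(buf: np.ndarray, width: int, height: int, channels: int, stride: int) -> np.ndarray:
    """Drop per-row padding from an image buffer laid out with a row stride.

    buf holds `height` rows of `stride` bytes each; only the first
    width * channels bytes of each row are pixel data.
    """
    row_bytes = width * channels
    if stride < row_bytes:
        raise ValueError("stride must be at least width * channels")
    rows = buf[: height * stride].reshape(height, stride)
    return rows[:, :row_bytes].reshape(height, width, channels)

# 224-pixel RGB rows are 672 bytes; assume (hypothetically) they are padded to 768.
W, H, C, STRIDE = 224, 224, 3, 768
padded = np.zeros(H * STRIDE, dtype=np.uint8)
image = unpad_rows(padded, W, H, C, STRIDE)
print(image.shape)  # (224, 224, 3)

# The padded buffer is not the size a tightly packed reshape expects:
print(padded.size == W * H * C)  # False
```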

How QIM SDK Delivers These Benefits

Single, Unified Multimedia + AI Framework

What developers get

One pipeline for capture, preprocess, inference, overlay, encode, and streaming.
One metadata flow for pixels and AI results.
Less custom glue code between multimedia and ML stages.

How QIM SDK delivers it

Builds on GStreamer, so media and AI elements live in the same pipeline graph.
Carries detections, labels, keypoints, and tensor info as frame metadata.
Supports sequential flows like detection → classification and parallel flows using tee + qtimetamux.
Supports cross-pipeline designs with qtisocketsink and qtisocketsrc when applications need modular stages.

Developer takeaway: You describe the full application as a pipeline instead of stitching together separate camera, inference, and rendering subsystems.

Hardware-Accelerated by Default

What developers get

Lower latency.
Lower CPU load.
Better throughput for multi-stream and real-time AI.

How QIM SDK delivers it

ISP: camera ingest and imaging path use Qualcomm imaging hardware.
GPU: plugins such as qtimlvconverter, qtivtransform, qtivoverlay, and qtivcomposer offload resize, color conversion, crop, rotate, overlay, and composition work.
NPU: inference workloads run through supported runtimes on dedicated AI hardware.
VPU: v4l2h264dec, v4l2h265dec, v4l2h264enc, and v4l2h265enc handle video decode/encode efficiently.
Zero-copy + buffer pools: frames move through the pipeline as DMA-buf handles with fewer copies and lower allocation overhead.

Developer takeaway: Most performance-critical stages are expressed by choosing the right plugins, not by writing accelerator-specific application code.

Qualcomm AI Hub Integration

What developers get

Hundreds of pre-trained, profiled, quantized models ready to deploy.
No manual conversion, calibration, or accuracy validation.
Models arrive as .tflite, .dlc, or .bin files compatible with QIM SDK runtimes.

How QIM SDK delivers it

Models from AI Hub are exported targeting specific Qualcomm SoC backends (HTP, GPU, CPU).
qtimltflite, qtimlsnpe, and qtimlqnn load these models directly — no format adaptation layer.
Confidence thresholds, post-processing modules, and label files are supplied alongside the model, matching the QIM SDK qtimlpostprocess interface.

Developer takeaway: Browse AI Hub, download a model, and drop it into the pipeline, with no custom integration work required.
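Because each of the three file types listed above is loaded by a different plugin, an application that accepts arbitrary AI Hub downloads has to pick the element from the file. A hypothetical helper, assuming .dlc files are SNPE containers (qtimlsnpe) and .bin files are QNN context binaries (qtimlqnn); element properties such as model paths and delegate options are left out because they differ per plugin:

```python
from pathlib import Path

# Assumed mapping from model file type to the inference plugins named above.
RUNTIME_ELEMENTS = {
    ".tflite": "qtimltflite",
    ".dlc": "qtimlsnpe",
    ".bin": "qtimlqnn",
}

def runtime_element_for(model_path: str) -> str:
    """Pick the QIM SDK inference element name for a downloaded model file."""
    suffix = Path(model_path).suffix.lower()
    try:
        return RUNTIME_ELEMENTS[suffix]
    except KeyError:
        raise ValueError(f"no runtime mapping for {suffix!r} ({model_path})") from None

print(runtime_element_for("models/squeezenet1_1-squeezenet-1.1-w8a8.tflite"))  # qtimltflite
```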

DMA Zero-Copy Buffer Sharing

Loadable Post-Processing Modules

What developers get

Built-in modules for detection, classification, segmentation, pose, audio, and super-resolution.
Custom module support without SDK recompilation.
Runtime module swap — change post-processing logic by updating a .so file on the device.

How QIM SDK delivers it

qtimlpostprocess loads libml-postprocess-<name>.so at runtime via dlopen.
Each module implements Caps(), Configure(), and Process() from the IModule interface.
Modules are deployed to /usr/lib/gstreamer-1.0/ml/modules/ — no pipeline changes required after deployment.

Developer takeaway: Swap or extend post-processing without touching the pipeline definition or recompiling the application.
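Combining the naming pattern and the deployment directory described above, the library behind `module=mobilenet-softmax` (used in imsdk_inference.py) would be `libml-postprocess-mobilenet-softmax.so` in that directory. A small sketch for checking that a module is present before starting a pipeline; the helper names are hypothetical, and the path is derived from the two statements above rather than from a documented API, so verify it on your image:

```python
from pathlib import Path

# Deployment directory and naming pattern as described above (assumptions to verify).
MODULE_DIR = Path("/usr/lib/gstreamer-1.0/ml/modules")

def module_library(name: str, module_dir: Path = MODULE_DIR) -> Path:
    """Shared library that qtimlpostprocess would dlopen for module=<name>."""
    return module_dir / f"libml-postprocess-{name}.so"

def is_deployed(name: str, module_dir: Path = MODULE_DIR) -> bool:
    """True if the module's .so file exists in module_dir."""
    return module_library(name, module_dir).is_file()

lib = module_library("mobilenet-softmax")
print(lib.as_posix())  # /usr/lib/gstreamer-1.0/ml/modules/libml-postprocess-mobilenet-softmax.so
print("deployed" if is_deployed("mobilenet-softmax") else "not found")
```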

ISP Camera Integration

What developers get

Hardware-demosaiced, ISP-processed frames delivered directly to downstream AI and multimedia stages.
Multi-camera support (primary + secondary sensor).
Zero-copy from ISP to GPU/NPU via GBM/DMA-buf.

How QIM SDK delivers it

qticamsrc exposes a GStreamer source interface over the Qualcomm camera service.
Outputs video/x-raw,format=NV12 — directly consumable by qtimlvconverter, qtivtransform, and encode plugins.
Camera properties (resolution, framerate, camera index) are set via standard GStreamer element properties.

Developer takeaway: A single pipeline element replaces the entire camera driver integration layer, so there is no V4L2 or camera HAL code to manage.
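Because qticamsrc already produces NV12, a camera-to-tensor pipeline can feed qtimlvconverter directly. A sketch that reuses the caps from imsdk_preprocess.py, assuming qticamsrc runs with default properties (camera index and other element properties are omitted rather than guessed) and that the sensor offers 1920×1080 NV12; it only builds the pipeline string and has not been benchmarked:

```python
def camera_tensor_pipeline(width: int = 1920, height: int = 1080) -> str:
    """Build a qticamsrc -> qtimlvconverter -> appsink pipeline description."""
    return (
        "qticamsrc ! "
        f"video/x-raw,format=NV12,width={width},height={height} ! "
        # NV12 from the ISP goes straight to qtimlvconverter; no qtivtransform step
        "qtimlvconverter ! neural-network/tensors,type=UINT8,dimensions=<<1,224,224,3>> ! "
        "queue max-size-buffers=2 leaky=downstream ! "
        "appsink name=tensor drop=true sync=false max-buffers=1 emit-signals=true"
    )

print(camera_tensor_pipeline())
```

The appsink keeps the name `tensor`, so the string can stand in for PIPELINE in imsdk_preprocess.py when iterating with gst_grouped_frames().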

GStreamer-Based, Industry-Standard Pipelines

What developers get

Familiar pipeline construction.
Easier integration with upstream plugins.
Better portability of application logic.

How QIM SDK delivers it

Follows standard GStreamer plugin architecture and negotiation rules.
Uses caps negotiation for format compatibility.
Uses allocation queries for memory negotiation and zero-copy paths.
Reuses standard image formats such as NV12, RGBA, and I420.
Reuses standard metadata where possible and adds AI-specific metadata only when needed.

Developer takeaway: If your team already knows GStreamer, QIM SDK fits into the same workflow with Qualcomm-optimized building blocks.

Flexible AI Model Support

What developers get

Freedom to choose runtime per use case.
Support for common edge AI workloads.
A path from prototype to optimized deployment.

How QIM SDK delivers it

Supports TFLite, SNPE, and QNN runtime paths.
Supports use cases such as detection, classification, segmentation, pose, super resolution, and audio AI.
Supports chained and parallel model execution in the same media flow.
Preserves model outputs as metadata so downstream stages can reuse them.

Developer takeaway: You can keep one pipeline architecture while changing the model, runtime, or accelerator strategy based on product needs.

Developer Notes

Why zero-copy matters

Zero-copy pipelines keep the same frame buffer moving across decode, preprocess, inference, overlay, and encode stages. This reduces memory traffic, CPU overhead, and end-to-end latency — especially important for high-resolution and multi-stream workloads.
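To put rough numbers on this, a tightly packed 1080p NV12 frame is 1920 × 1080 × 1.5 = 3,110,400 bytes, and the same frame as RGB is 6,220,800 bytes. A back-of-the-envelope sketch, assuming each CPU copy reads and writes the whole frame once and ignoring caches and row padding; the stream and copy counts are illustrative, not measured:

```python
def frame_bytes(width: int, height: int, fmt: str) -> int:
    """Bytes in one tightly packed frame (no row padding)."""
    bytes_per_pixel = {"NV12": 1.5, "RGB": 3, "RGBA": 4}[fmt]
    return int(width * height * bytes_per_pixel)

def copy_traffic_mb_s(width: int, height: int, fmt: str,
                      fps: int, copies: int, streams: int = 1) -> float:
    """Extra memory traffic from CPU copies, in MB/s (1 MB = 1e6 bytes).

    Each copy reads and writes the frame once, so it moves 2x the frame size.
    """
    return frame_bytes(width, height, fmt) * 2 * copies * fps * streams / 1e6

print(frame_bytes(1920, 1080, "NV12"))  # 3110400
print(frame_bytes(1920, 1080, "RGB"))   # 6220800
# Two CPU copies of a 1080p RGB frame at 30 fps, for four streams:
print(round(copy_traffic_mb_s(1920, 1080, "RGB", fps=30, copies=2, streams=4)))  # 2986
```

Roughly 3 GB/s of avoidable traffic in this scenario is the kind of overhead a zero-copy path removes.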

Why the GPU preprocessing path matters

A typical AI pipeline must resize, color-convert, and normalize frames before inference. QIM SDK offloads these steps through GPU-accelerated plugins instead of doing them in generic CPU-only stages, which helps preserve frame rate and lowers CPU utilization.
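For a sense of what a CPU-only path does per frame, a minimal NumPy sketch of crop, resize, and normalize, assuming the frame is already RGB (color conversion is left out) and using nearest-neighbor sampling for brevity rather than the filtering a production resize would use. Every step touches pixel data on the CPU:

```python
import numpy as np

def cpu_preprocess(rgb: np.ndarray, size: int = 224) -> np.ndarray:
    """Center-crop to a square, nearest-neighbor resize, and scale to [0, 1] on the CPU.

    Illustrative only: real pipelines typically use bilinear filtering and
    model-specific normalization.
    """
    h, w, _ = rgb.shape
    side = min(h, w)
    y0, x0 = (h - side) // 2, (w - side) // 2
    square = rgb[y0:y0 + side, x0:x0 + side]
    # Nearest-neighbor sampling: pick a source row/column for each output pixel
    idx = (np.arange(size) * side) // size
    resized = square[idx][:, idx]
    return resized.astype(np.float32) / 255.0

frame = np.random.randint(0, 256, size=(1080, 1920, 3), dtype=np.uint8)
out = cpu_preprocess(frame)
print(out.shape, out.dtype)  # (224, 224, 3) float32
```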

How to read the examples

Each example shows the same operation built with upstream GStreamer CPU elements and then with QIM SDK hardware-accelerated plugins. Timing numbers were measured on an IQ9 device with a USB camera at 1080p input.
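The Timings lines report elapsed time between consecutive named marks; the mark names match the `identity name=...` elements in each pipeline, plus marks the scripts add themselves (such as `inference_done` in upstream_inference.py). A small parser for those printed lines, a hypothetical helper rather than part of gst_helper, makes it easy to find the dominant stage in a run:

```python
import re

# Matches entries such as "frame_ready->preprocess_done: 22.16ms"
STAGE_RE = re.compile(r"(\w+)->(\w+): ([\d.]+)ms")

def parse_timings(line: str) -> dict[str, float]:
    """Map 'start->end' stage names to milliseconds from a printed Timings line."""
    return {f"{a}->{b}": float(ms) for a, b, ms in STAGE_RE.findall(line)}

upstream = ("Timings: frame_ready_webcam->transform_done: 6.81ms, "
            "transform_done->pipeline_finished: 0.77ms, "
            "pipeline_finished->inference_done: 1.19ms, "
            "inference_done->image_composition_end: 37.85ms (total 46.61ms)")
stages = parse_timings(upstream)
slowest = max(stages, key=stages.get)
print(slowest, stages[slowest])  # inference_done->image_composition_end 37.85
print(round(sum(stages.values()), 2))  # 46.62
```

The sum of the rounded stage times can differ from the printed total by a hundredth of a millisecond, as it does here.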
