
Why developers should use the Qualcomm Intelligent Multimedia SDK

Single, Unified Multimedia + AI Framework

Combines multimedia pipelines and ML inference into a single, coherent developer experience.

Hardware-Accelerated by Default

QIM SDK uses Qualcomm compute and multimedia hardware, including the CPU, GPU, NPU, ISP, VPU, and DSP, through optimized plugins.

GStreamer-Based, Industry-Standard Pipelines

Built on GStreamer, providing a mature, composable pipeline model with 40+ Qualcomm-optimized plugins.

Flexible AI Model Support

SDK supports running models from common frameworks such as TFLite/LiteRT, ONNX, and QNN.

Qualcomm AI Hub Integration

Browse, download, and deploy hundreds of pre-trained, quantized models directly from Qualcomm AI Hub. Models arrive ready to run — no manual conversion, calibration, or profiling required.

Plugin Performance: QIM SDK vs Upstream GStreamer

AI Preprocessing: Upstream vs QIM SDK (qtimlvconverter)

Example: AI preprocessing with upstream GStreamer vs qtimlvconverter

This example shows a 1080p frame being prepared for neural network inference. The upstream approach chains videoconvert and videoscale on the CPU. The QIM SDK approach uses qtivtransform to convert to NV12 on the GPU, then qtimlvconverter to pack it into a tensor buffer ready for qtimltflite.
Upstream (CPU), upstream_preprocess.py:
from gst_helper import gst_grouped_frames, atomic_save_image, timing_marks_to_str
import argparse

parser = argparse.ArgumentParser(description='Upstream CPU preprocessing')
parser.add_argument('--video-source', type=str, required=True,
    help='GStreamer video source (e.g. "v4l2src device=/dev/video2")')
args, unknown = parser.parse_known_args()

PIPELINE = (
    f"{args.video_source} ! "
    "video/x-raw,width=1920,height=1080 ! "
    "identity name=frame_ready silent=false ! "
    "videoconvert ! aspectratiocrop aspect-ratio=1/1 ! "
    "videoscale ! video/x-raw,format=RGB,width=224,height=224 ! "
    "identity name=preprocess_done silent=false ! "
    "queue max-size-buffers=2 leaky=downstream ! "
    "appsink name=frame drop=true sync=false max-buffers=1 emit-signals=true"
)

for frames_by_sink, marks in gst_grouped_frames(PIPELINE):
    frame = frames_by_sink['frame']
    print(f'Frame shape: {frame.shape}')
    print('Timings:', timing_marks_to_str(marks))
    atomic_save_image(frame=frame, path='out/frame_upstream.png')
python3 upstream_preprocess.py --video-source "$IMSDK_VIDEO_SOURCE"

# Frame shape: (224, 224, 3)
# Timings: frame_ready->preprocess_done: 22.16ms, preprocess_done->pipeline_finished: 2.14ms (total 24.31ms)
QIM SDK (qtivtransform + qtimlvconverter) — imsdk_preprocess.py:
from gst_helper import gst_grouped_frames, timing_marks_to_str
import argparse

parser = argparse.ArgumentParser(description='QIM SDK GPU preprocessing')
parser.add_argument('--video-source', type=str, required=True,
    help='GStreamer video source (e.g. "v4l2src device=/dev/video2")')
args, unknown = parser.parse_known_args()

PIPELINE = (
    f"{args.video_source} ! "
    "video/x-raw,width=1920,height=1080 ! "
    "identity name=frame_ready silent=false ! "
    "qtivtransform ! "
    "video/x-raw,format=NV12 ! "
    "identity name=transform_done silent=false ! "
    "qtimlvconverter ! neural-network/tensors,type=UINT8,dimensions=<<1,224,224,3>> ! "
    "identity name=preprocess_done silent=false ! "
    "queue max-size-buffers=2 leaky=downstream ! "
    "appsink name=tensor drop=true sync=false max-buffers=1 emit-signals=true"
)

for frames_by_sink, marks in gst_grouped_frames(PIPELINE):
    tensor = frames_by_sink['tensor']
    print(f'Tensor shape: {tensor.shape} dtype: {tensor.dtype}')
    print('Timings:', timing_marks_to_str(marks))
python3 imsdk_preprocess.py --video-source "$IMSDK_VIDEO_SOURCE"

# Tensor shape: (150528,) dtype: uint8
# Timings: frame_ready->transform_done: 6.55ms, transform_done->preprocess_done: 0.83ms (total 7.38ms)
qtivtransform handles the GPU color conversion and qtimlvconverter packs the NV12 buffer into a flat UINT8 tensor in a single pass, ready to hand directly to qtimltflite without any CPU copy.
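The flat shape reported above, (150528,), is 1 × 224 × 224 × 3 bytes. To inspect the tensor from Python, it can be viewed in batch, height, width, channel order with a reshape. This is a minimal sketch, assuming the appsink hands back a one-dimensional uint8 NumPy array as in the output above. `as_nhwc` is a hypothetical helper, not part of the SDK, and the zero-filled array stands in for a real frame.

```python
import numpy as np

def as_nhwc(flat, dims=(1, 224, 224, 3)):
    """View a flat UINT8 tensor buffer as a batch x height x width x channel array."""
    expected = int(np.prod(dims))
    if flat.size != expected:
        raise ValueError(f"expected {expected} elements, got {flat.size}")
    # reshape returns a view on contiguous input, so no pixel data is copied
    return flat.reshape(dims)

# Stand-in for frames_by_sink['tensor'] from the script above (assumption)
flat = np.zeros(150528, dtype=np.uint8)
tensor = as_nhwc(flat)
print(tensor.shape, tensor.dtype)
```

The size check makes a mismatch between the caps dimensions and the buffer visible immediately instead of producing a silently wrong image.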

Video Pipeline: Upstream vs QIM SDK (VTransform, VSplit, VComposer, Overlay)

Example: Video transform and multi-stream split with upstream GStreamer vs QIM SDK

This example shows the same two-branch pipeline built first with upstream elements and then with QIM SDK plugins. One branch receives the original 1080p frame; the other receives a cropped and resized 224×224 frame.
Upstream (CPU), upstream_split.py:
from gst_helper import gst_grouped_frames, atomic_save_image, timing_marks_to_str
import argparse

parser = argparse.ArgumentParser(description='Upstream CPU tee and transform')
parser.add_argument('--video-source', type=str, required=True,
    help='GStreamer video source (e.g. "v4l2src device=/dev/video2")')
args, unknown = parser.parse_known_args()

PIPELINE = (
    f"{args.video_source} ! "
    "video/x-raw,width=1920,height=1080 ! "
    "identity name=frame_ready silent=false ! "

    "tee name=t "

    # Branch A: full-resolution RGB
    "t. ! queue max-size-buffers=1 leaky=downstream ! "
    "videoconvert ! video/x-raw,format=RGB ! "
    "appsink name=original drop=true sync=false max-buffers=1 emit-signals=true "

    # Branch B: square crop then scale to 224x224
    "t. ! queue max-size-buffers=1 leaky=downstream ! "
    "videoconvert ! aspectratiocrop aspect-ratio=1/1 ! "
    "videoscale ! video/x-raw,format=RGB,width=224,height=224 ! "
    "identity name=transform_done silent=false ! "
    "queue max-size-buffers=2 leaky=downstream ! "
    "appsink name=frame drop=true sync=false max-buffers=1 emit-signals=true "
)

for frames_by_sink, marks in gst_grouped_frames(PIPELINE):
    frame = frames_by_sink['frame']
    original = frames_by_sink['original']
    print(f'frame: {frame.shape}  original: {original.shape}')
    print('Timings:', timing_marks_to_str(marks))
    atomic_save_image(frame=frame, path='out/frame_upstream.png')
    atomic_save_image(frame=original, path='out/original_upstream.png')
python3 upstream_split.py --video-source "$IMSDK_VIDEO_SOURCE"

# frame: (224, 224, 3)  original: (1080, 1920, 3)
# Timings: frame_ready->transform_done: 22.21ms, transform_done->pipeline_finished: 1.25ms (total 23.46ms)
QIM SDK (qtivtransform) — imsdk_split.py:
from gst_helper import gst_grouped_frames, atomic_save_image, timing_marks_to_str
import argparse

parser = argparse.ArgumentParser(description='QIM SDK GPU tee and transform')
parser.add_argument('--video-source', type=str, required=True,
    help='GStreamer video source (e.g. "v4l2src device=/dev/video2")')
args, unknown = parser.parse_known_args()

PIPELINE = (
    f"{args.video_source} ! "
    "video/x-raw,width=1920,height=1080 ! "
    "identity name=frame_ready silent=false ! "

    "tee name=t "

    # Branch A: full-resolution RGB via GPU color convert
    "t. ! queue max-size-buffers=1 leaky=downstream ! "
    "qtivtransform ! video/x-raw,format=RGB ! "
    "appsink name=original drop=true sync=false max-buffers=1 emit-signals=true "

    # Branch B: GPU center-crop to 1080x1080 then resize to 224x224
    "t. ! queue max-size-buffers=1 leaky=downstream ! "
    'qtivtransform crop="<420, 0, 1080, 1080>" ! '
    "video/x-raw,format=RGB,width=224,height=224 ! "
    "identity name=transform_done silent=false ! "
    "queue max-size-buffers=2 leaky=downstream ! "
    "appsink name=frame drop=true sync=false max-buffers=1 emit-signals=true "
)

for frames_by_sink, marks in gst_grouped_frames(PIPELINE):
    frame = frames_by_sink['frame']
    original = frames_by_sink['original']
    print(f'frame: {frame.shape}  original: {original.shape}')
    print('Timings:', timing_marks_to_str(marks))
    atomic_save_image(frame=frame, path='out/frame_imsdk.png')
    atomic_save_image(frame=original, path='out/original_imsdk.png')
python3 imsdk_split.py --video-source "$IMSDK_VIDEO_SOURCE"

# frame: (224, 224, 3)  original: (1080, 1920, 3)
# Timings: frame_ready->transform_done: 5.51ms, transform_done->pipeline_finished: 4.41ms (total 9.92ms)
qtivtransform combines crop, resize, and color conversion into one GPU-accelerated step. The crop/resize time drops from ~22 ms to ~5.5 ms, and both branches share the same DMA buffer without CPU copies.
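The crop rectangle `<420, 0, 1080, 1080>` used above is the largest centered square inside a 1920×1080 frame. For other input resolutions the same rectangle can be derived rather than hard-coded. A small sketch follows; both helper names are hypothetical, and the x, y, width, height ordering is inferred from the value in the script above rather than from the plugin reference.

```python
def center_square_crop(width, height):
    """Return (x, y, w, h) of the largest centered square inside a frame."""
    side = min(width, height)
    x = (width - side) // 2
    y = (height - side) // 2
    return x, y, side, side

def crop_property(rect):
    """Format a rectangle the way the crop property is written in the pipeline above."""
    x, y, w, h = rect
    return f'crop="<{x}, {y}, {w}, {h}>"'

rect = center_square_crop(1920, 1080)
print(crop_property(rect))  # crop="<420, 0, 1080, 1080>"
```

The resulting string can be placed after `qtivtransform` when the pipeline description is assembled.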

Neural Network Inference: Upstream TFLite vs qtimltflite

Running inference in Python with the TFLite CPU delegate means every frame must travel through the CPU: preprocess on CPU, invoke the interpreter on CPU, then post-process on CPU. qtimltflite moves inference into the GStreamer pipeline and routes it to the NPU via the QNN HTP delegate, keeping the frame in hardware memory the entire time.
| | Upstream (LiteRT CPU delegate in Python) | QIM SDK (qtimltflite on NPU) |
| --- | --- | --- |
| Inference location | CPU, invoked from Python | NPU (HTP), inside the pipeline |
| Pre-process | videoconvert + videoscale on CPU | qtivtransform + qtimlvconverter on GPU |
| Post-process | Python + PIL on CPU | qtimlpostprocess inside pipeline |
| Frame crossing CPU | Yes: NumPy array passed to interpreter | No: tensor stays in DMA buffer |

Example: Image classification with upstream LiteRT vs qtimltflite

Both scripts run SqueezeNet on every webcam frame and print the top-5 class predictions.
Upstream (LiteRT CPU delegate in Python), upstream_inference.py:
from gst_helper import gst_grouped_frames, atomic_save_pillow_image, timing_marks_to_str, download_file_if_needed, softmax
import time, argparse, numpy as np
from ai_edge_litert.interpreter import Interpreter, load_delegate
from PIL import ImageDraw, Image

parser = argparse.ArgumentParser(description='Upstream LiteRT CPU inference')
parser.add_argument('--video-source', type=str, required=True,
    help='GStreamer video source (e.g. "v4l2src device=/dev/video2")')
args, unknown = parser.parse_known_args()

MODEL_PATH = download_file_if_needed('models/squeezenet1_1-squeezenet-1.1-w8a8.tflite', 'https://cdn.edgeimpulse.com/qc-ai-docs/models/squeezenet1_1-squeezenet-1.1-w8a8.tflite')
LABELS_PATH = download_file_if_needed('models/SqueezeNet-1.1_labels.txt', 'https://cdn.edgeimpulse.com/qc-ai-docs/models/SqueezeNet-1.1_labels.txt')

with open(LABELS_PATH, 'r') as f:
    labels = [line for line in f.read().splitlines() if line.strip()]

interpreter = Interpreter(
    model_path=MODEL_PATH,
    experimental_delegates=[load_delegate("libQnnTFLiteDelegate.so", options={"backend_type": "htp"})]
)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

PIPELINE = (
    f"{args.video_source} ! "
    "video/x-raw,width=1920,height=1080 ! "
    "identity name=frame_ready_webcam silent=false ! "
    'qtivtransform crop="<420, 0, 1080, 1080>" ! '
    "video/x-raw,format=RGB,width=224,height=224 ! "
    "identity name=transform_done silent=false ! "
    "queue max-size-buffers=2 leaky=downstream ! "
    "appsink name=frame drop=true sync=false max-buffers=1 emit-signals=true"
)

for frames_by_sink, marks in gst_grouped_frames(PIPELINE):
    inference_start = time.perf_counter()
    interpreter.set_tensor(input_details[0]['index'], frames_by_sink['frame'].reshape((1, 224, 224, 3)))
    interpreter.invoke()
    q_output = interpreter.get_tensor(output_details[0]['index'])
    scale, zero_point = output_details[0]['quantization']
    f_output = (q_output.astype(np.float32) - zero_point) * scale
    scores = softmax(f_output[0])
    marks['inference_done'] = list(marks.items())[-1][1] + (time.perf_counter() - inference_start)

    top_k = scores.argsort()[-5:][::-1]
    print('Top-5 predictions:')
    for i in top_k:
        print(f'  {labels[i]}: {scores[i]:.4f}')

    image_composition_start = time.perf_counter()
    frame = frames_by_sink['frame']
    img = Image.fromarray(frame)
    img_draw = ImageDraw.Draw(img)
    img_draw.text((10, 10), f"{labels[top_k[0]]} ({scores[top_k[0]]:.2f})", fill="black")
    atomic_save_pillow_image(img=img, path='out/upstream_prediction.png')
    marks['image_composition_end'] = list(marks.items())[-1][1] + (time.perf_counter() - image_composition_start)

    print('Timings:', timing_marks_to_str(marks))
# We use '| grep -v "<W>"' to filter out some warnings - you can omit it if you want.
python3 upstream_inference.py --video-source "$IMSDK_VIDEO_SOURCE" | grep -v "<W>"

# Top-5 predictions:
#   laptop: 0.4952
#   notebook: 0.3394
#   computer keyboard: 0.0750
#   space bar: 0.0621
#   typewriter keyboard: 0.0078
# Timings: frame_ready_webcam->transform_done: 6.81ms, transform_done->pipeline_finished: 0.77ms, pipeline_finished->inference_done: 1.19ms, inference_done->image_composition_end: 37.85ms (total 46.61ms)
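The script above dequantizes the model output with `(q - zero_point) * scale` and then calls `softmax` from gst_helper, whose implementation is not shown on this page. The sketch below is one common, numerically stable way to write those two steps; it is an assumption about what the helper does, not its actual source, and the quantized values and quantization parameters are made up for illustration.

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Map quantized integers back to floats: real = (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale

def softmax(logits):
    """Turn raw scores into probabilities; subtracting the max avoids overflow in exp."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Made-up quantized output for a four-class model (illustration only)
q_output = np.array([120, 130, 140, 200], dtype=np.uint8)
scores = softmax(dequantize(q_output, scale=0.1, zero_point=128))

# Same top-k idiom as the script above: sort ascending, take the tail, reverse
top_2 = scores.argsort()[-2:][::-1]
print(top_2, scores[top_2])
```

Casting to float32 before subtracting the zero point matters: subtracting directly on uint8 values would wrap around for entries below the zero point.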
QIM SDK (qtimltflite on NPU) — imsdk_inference.py:
from gst_helper import gst_grouped_frames, timing_marks_to_str, download_file_if_needed
import argparse

parser = argparse.ArgumentParser(description='QIM SDK NPU inference with qtimltflite')
parser.add_argument('--video-source', type=str, required=True,
    help='GStreamer video source (e.g. "v4l2src device=/dev/video2")')
args, unknown = parser.parse_known_args()

MODEL_PATH = download_file_if_needed('models/squeezenet1_1-squeezenet-1.1-w8a8.tflite', 'https://cdn.edgeimpulse.com/qc-ai-docs/models/squeezenet1_1-squeezenet-1.1-w8a8.tflite')
LABELS_PATH = download_file_if_needed('models/SqueezeNet-1.1_labels.txt', 'https://cdn.edgeimpulse.com/qc-ai-docs/models/SqueezeNet-1.1_labels.txt')

PIPELINE = (
    f"{args.video_source} ! "
    "video/x-raw,width=1920,height=1080 ! "
    "identity name=frame_ready_webcam silent=false ! "
    "qtivtransform ! "
    "video/x-raw,format=NV12 ! "
    "identity name=transform_done silent=false ! "
    "qtimlvconverter ! neural-network/tensors,type=UINT8,dimensions=<<1,224,224,3>> ! "
    "identity name=conversion_done silent=false ! "
    f'qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so '
    f'external-delegate-options="QNNExternalDelegate,backend_type=htp;" model="{MODEL_PATH}" ! '
    "identity name=inference_done silent=false ! "
    f'qtimlpostprocess name=postproc module=mobilenet-softmax '
    f'labels="{LABELS_PATH}" results=5 settings="{{\\"confidence\\": 10.0}}" ! '
    "text/x-raw,format=utf8 ! "
    "queue max-size-buffers=2 leaky=downstream ! "
    "appsink name=predictions drop=true sync=false max-buffers=1 emit-signals=true"
)

for frames_by_sink, marks in gst_grouped_frames(PIPELINE):
    cls_text = frames_by_sink['predictions'].tobytes().decode('utf-8')
    print('Predictions:', cls_text[:120], '...')
    print('Timings:', timing_marks_to_str(marks))
# We use '| grep -v "<W>"' to filter out some warnings - you can omit it if you want.
python3 imsdk_inference.py --video-source "$IMSDK_VIDEO_SOURCE" | grep -v "<W>"

# Predictions: { (structure)"ImageClassification\, labels\=...laptop...notebook... }
# Timings: frame_ready_webcam->transform_done: 7.29ms, transform_done->conversion_done: 0.83ms, conversion_done->inference_done: 1.25ms, inference_done->postproc_done: 0.45ms (total 9.82ms)
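Moving inference and post-processing into the pipeline cuts the total per-frame time from ~47 ms to ~10 ms. In the timings above, the inference step itself takes about the same time in both scripts (1.19 ms vs 1.25 ms), because upstream_inference.py as listed also loads the QNN delegate with backend_type=htp. Most of the saving comes from replacing the Python and PIL composition step (37.85 ms) with qtimlpostprocess (0.45 ms). In the QIM SDK pipeline the frame never leaves hardware memory: qtivtransform, qtimlvconverter, qtimltflite, and qtimlpostprocess all operate on the same DMA buffer.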
qtimltflite requires a tightly packed buffer. Use video/x-raw,format=NV12 after qtivtransform before passing to qtimlvconverter, as RGB output uses row-stride padding that causes silent misalignment.
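To illustrate the row-stride issue described above: when each row of an RGB buffer is padded to an alignment boundary, reshaping the raw bytes as if they were tightly packed shifts every row after the first. The sketch below reproduces the effect in NumPy. The 768-byte stride is a hypothetical value chosen for illustration (672 bytes of pixels rounded up to a multiple of 128); the actual stride used by qtivtransform is not stated here, and `strip_row_padding` is a hypothetical helper.

```python
import numpy as np

def strip_row_padding(buf, width, height, channels, stride):
    """Drop per-row padding bytes from a packed-pixel image buffer."""
    row_bytes = width * channels
    if stride < row_bytes:
        raise ValueError("stride is smaller than one row of pixels")
    rows = buf[: stride * height].reshape(height, stride)
    return rows[:, :row_bytes].reshape(height, width, channels)

width, height, channels = 224, 224, 3
stride = 768  # hypothetical padded row size in bytes

# Synthetic padded buffer: the first byte of every row is 255, the rest 0
padded = np.zeros(stride * height, dtype=np.uint8)
padded.reshape(height, stride)[:, 0] = 255

# Treating the padded bytes as tightly packed misaligns every row after the first
naive = padded[: width * height * channels].reshape(height, width, channels)
fixed = strip_row_padding(padded, width, height, channels, stride)

print(naive[1, 0, 0], fixed[1, 0, 0])
```

In the naive view the marker byte of the second row lands in the wrong place; after the padding is stripped it is back in column 0.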

How QIM SDK Delivers These Benefits

What developers get
  • One pipeline for capture, preprocess, inference, overlay, encode, and streaming.
  • One metadata flow for pixels and AI results.
  • Less custom glue code between multimedia and ML stages.
How QIM SDK delivers it
  • Builds on GStreamer, so media and AI elements live in the same pipeline graph.
  • Carries detections, labels, keypoints, and tensor info as frame metadata.
  • Supports sequential flows like detection → classification and parallel flows using tee + qtimetamux.
  • Supports cross-pipeline designs with qtisocketsink and qtisocketsrc when applications need modular stages.
Developer takeaway: You describe the full application as a pipeline instead of stitching together separate camera, inference, and rendering subsystems.
What developers get
  • Lower latency.
  • Lower CPU load.
  • Better throughput for multi-stream and real-time AI.
How QIM SDK delivers it
  • ISP: camera ingest and imaging path use Qualcomm imaging hardware.
  • GPU: plugins such as qtimlvconverter, qtivtransform, qtivoverlay, and qtivcomposer offload resize, color convert, crop, rotate, overlay, and composition work.
  • NPU: inference workloads run through supported runtimes on dedicated AI hardware.
  • VPU: v4l2h264dec, v4l2h265dec, v4l2h264enc, and v4l2h265enc handle video decode/encode efficiently.
  • Zero-copy + buffer pools: frames move through the pipeline as DMA-buf handles with fewer copies and lower allocation overhead.
Developer takeaway: Most performance-critical stages are expressed by choosing the right plugins, not by writing accelerator-specific application code.
What developers get
  • Hundreds of pre-trained, profiled, quantized models ready to deploy.
  • No manual conversion, calibration, or accuracy validation.
  • Models arrive as .tflite, .dlc, or .bin files compatible with QIM SDK runtimes.
How QIM SDK delivers it
  • Models from AI Hub are exported targeting specific Qualcomm SoC backends (HTP, GPU, CPU).
  • qtimltflite, qtimlsnpe, and qtimlqnn load these models directly — no format adaptation layer.
  • Confidence thresholds, post-processing modules, and label files are supplied alongside the model, matching the QIM SDK qtimlpostprocess interface.
Developer takeaway: Browse AI Hub, download a model, and drop it into the pipeline, with no custom integration work required.
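The list above names three model file types and three inference elements. A pipeline builder can choose the element from the file extension, as in the sketch below. The pairing used here (.tflite with qtimltflite, .dlc with qtimlsnpe, .bin with qtimlqnn) is inferred from the order in the text and from the usual format of each runtime, so treat it as an assumption; `inference_element` is a hypothetical helper.

```python
from pathlib import Path

# Assumed pairing of model file type to QIM SDK inference element
ELEMENT_BY_SUFFIX = {
    ".tflite": "qtimltflite",
    ".dlc": "qtimlsnpe",
    ".bin": "qtimlqnn",
}

def inference_element(model_path):
    """Pick the inference element name from the model file extension."""
    suffix = Path(model_path).suffix.lower()
    try:
        return ELEMENT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no inference element known for '{suffix}' models") from None

print(inference_element("models/squeezenet1_1-squeezenet-1.1-w8a8.tflite"))
```

Keeping the choice in one table means the rest of the pipeline description does not change when a model is swapped for one in a different format.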
What developers get
  • Frames never copied through CPU memory between accelerator stages.
  • Lower peak memory bandwidth utilization.
  • Reduced end-to-end pipeline latency, especially at 1080p and above.
How QIM SDK delivers it
  • Plugins negotiate DMA-buf allocation through GStreamer’s allocation query mechanism.
  • ISP (qticamsrc) → GPU (qtimlvconverter, qtivtransform) → NPU (qtimltflite) → display (waylandsink) all share the same buffer handle.
  • The capture-io-mode=dmabuf and output-io-mode=dmabuf-import properties on decode/encode plugins enable hardware-to-hardware buffer passing.
Developer takeaway: Zero-copy is opt-in via plugin properties, not a framework limitation; the pipeline negotiates it automatically when all stages support it.
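As an illustration of the I/O-mode properties named above, the sketch below assembles an H.264 encode pipeline description in the same style as the example scripts. It only builds the string. The caps, the muxer, and the assumption that the chosen source can export DMA-buf buffers are illustrative and have not been validated on a device; `encode_pipeline` is a hypothetical helper.

```python
def encode_pipeline(video_source, output_path):
    """Build a pipeline description that encodes NV12 frames with the hardware encoder."""
    return (
        f"{video_source} ! "
        "video/x-raw,format=NV12,width=1920,height=1080 ! "
        # I/O modes as named in the text above; they request DMA-buf passing
        "v4l2h264enc capture-io-mode=dmabuf output-io-mode=dmabuf-import ! "
        "h264parse ! mp4mux ! "
        f'filesink location="{output_path}"'
    )

description = encode_pipeline("qticamsrc", "out/capture.mp4")
print(description)
```

If the upstream element cannot provide DMA-buf memory, negotiation may fail or fall back to a copy, so it is worth checking the pipeline on the target device.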
What developers get
  • Built-in modules for detection, classification, segmentation, pose, audio, and super-resolution.
  • Custom module support without SDK recompilation.
  • Runtime module swap — change post-processing logic by updating a .so file on the device.
How QIM SDK delivers it
  • qtimlpostprocess loads libml-postprocess-<name>.so at runtime via dlopen.
  • Each module implements Caps(), Configure(), and Process() from the IModule interface.
  • Modules are deployed to /usr/lib/gstreamer-1.0/ml/modules/ — no pipeline changes required after deployment.
Developer takeaway: Swap or extend post-processing without touching the pipeline definition or recompiling the application.
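Based on the naming pattern and install directory given above, a deployment script can check that a post-processing module is present before the pipeline starts. This is a sketch: it assumes the value passed to the `module` property (for example mobilenet-softmax in the inference example) maps directly onto the `<name>` part of the library file name, which this page does not state explicitly. Both helpers are hypothetical.

```python
from pathlib import Path

MODULE_DIR = Path("/usr/lib/gstreamer-1.0/ml/modules")
PREFIX, SUFFIX = "libml-postprocess-", ".so"

def module_library(name, module_dir=MODULE_DIR):
    """Expected shared-library path for a post-processing module name."""
    return module_dir / f"{PREFIX}{name}{SUFFIX}"

def installed_modules(module_dir=MODULE_DIR):
    """List module names found on this device (empty if the directory is absent)."""
    if not module_dir.is_dir():
        return []
    return sorted(
        path.name[len(PREFIX):-len(SUFFIX)]
        for path in module_dir.glob(f"{PREFIX}*{SUFFIX}")
    )

library = module_library("mobilenet-softmax")
print(library, "present" if library.exists() else "missing")
print(installed_modules())
```

On a development host without the SDK installed, the first line reports the module as missing and the list is empty.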
What developers get
  • Hardware-demosaiced, ISP-processed frames delivered directly to downstream AI and multimedia stages.
  • Multi-camera support (primary + secondary sensor).
  • Zero-copy from ISP to GPU/NPU via GBM/DMA-buf.
How QIM SDK delivers it
  • qticamsrc exposes a GStreamer source interface over the Qualcomm camera service.
  • Outputs video/x-raw,format=NV12 — directly consumable by qtimlvconverter, qtivtransform, and encode plugins.
  • Camera properties (resolution, framerate, camera index) are set via standard GStreamer element properties.
Developer takeaway: A single pipeline element replaces the entire camera driver integration layer, with no V4L2 or camera HAL code to manage.
What developers get
  • Familiar pipeline construction.
  • Easier integration with upstream plugins.
  • Better portability of application logic.
How QIM SDK delivers it
  • Follows standard GStreamer plugin architecture and negotiation rules.
  • Uses caps negotiation for format compatibility.
  • Uses allocation queries for memory negotiation and zero-copy paths.
  • Reuses standard image formats such as NV12, RGBA, and I420.
  • Reuses standard metadata where possible and adds AI-specific metadata only when needed.
Developer takeaway: If your team already knows GStreamer, QIM SDK fits into the same workflow with Qualcomm-optimized building blocks.
What developers get
  • Freedom to choose runtime per use case.
  • Support for common edge AI workloads.
  • A path from prototype to optimized deployment.
How QIM SDK delivers it
  • Supports TFLite, SNPE, and QNN runtime paths.
  • Supports use cases such as detection, classification, segmentation, pose, super resolution, and audio AI.
  • Supports chained and parallel model execution in the same media flow.
  • Preserves model outputs as metadata so downstream stages can reuse them.
Developer takeaway: You can keep one pipeline architecture while changing the model, runtime, or accelerator strategy based on product needs.

Developer Notes

Zero-copy pipelines keep the same frame buffer moving across decode, preprocess, inference, overlay, and encode stages. This reduces memory traffic, CPU overhead, and end-to-end latency — especially important for high-resolution and multi-stream workloads.
A typical AI pipeline must resize, color-convert, and normalize frames before inference. QIM SDK offloads these steps through GPU-accelerated plugins instead of doing them in generic CPU-only stages, which helps preserve frame rate and lowers CPU utilization.
Each example shows the same operation built with upstream GStreamer CPU elements and then with QIM SDK hardware-accelerated plugins. Timing numbers are measured on IQ9 with a USB camera at 1080p input.
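For a quick comparison, the end-to-end totals printed by the six example runs can be turned into speedup ratios. The sketch below uses only the totals reported in the example outputs on this page; the dictionary labels and the `speedup` helper are hypothetical names.

```python
# (upstream total ms, QIM SDK total ms), copied from the example outputs on this page
TOTALS_MS = {
    "AI preprocessing": (24.31, 7.38),
    "Transform and split": (23.46, 9.92),
    "Image classification": (46.61, 9.82),
}

def speedup(upstream_ms, qim_ms):
    """Ratio of upstream time to QIM SDK time; above 1.0 means QIM SDK is faster."""
    return upstream_ms / qim_ms

for name, (upstream_ms, qim_ms) in TOTALS_MS.items():
    ratio = speedup(upstream_ms, qim_ms)
    print(f"{name}: {upstream_ms:.2f} ms -> {qim_ms:.2f} ms ({ratio:.1f}x)")
```

These ratios describe single runs on the stated setup and will vary with camera, resolution, and system load.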