Why developers should use the Qualcomm Intelligent Multimedia SDK
Single, Unified Multimedia + AI Framework
Combines multimedia pipelines and ML inference into a single, coherent developer experience.
Hardware-Accelerated by Default
QIM SDK natively exploits Qualcomm hardware blocks (CPU, GPU, NPU, ISP, VPU, and DSP) through optimized plugins.
GStreamer-Based, Industry-Standard Pipelines
Built on GStreamer, providing a mature, composable pipeline model with 40+ Qualcomm-optimized plugins.
Flexible AI Model Support
The SDK runs models from common frameworks such as TFLite/LiteRT, ONNX, and QNN.
Qualcomm AI Hub Integration
Browse, download, and deploy hundreds of pre-trained, quantized models directly from Qualcomm AI Hub. Models arrive ready to run — no manual conversion, calibration, or profiling required.
Plugin Performance: QIM SDK vs Upstream GStreamer
AI Preprocessing: Upstream vs QIM SDK (qtimlvconverter)
Example: AI preprocessing with upstream GStreamer vs qtimlvconverter
This example shows a 1080p frame being prepared for neural network inference. The upstream approach chains videoconvert and videoscale on the CPU. The QIM SDK approach uses qtivtransform to convert to NV12 on the GPU, then qtimlvconverter to pack it into a tensor buffer ready for qtimltflite.
Upstream (CPU), upstream_preprocess.py:
QIM SDK (qtivtransform + qtimlvconverter), imsdk_preprocess.py:
qtivtransform handles the GPU color conversion and qtimlvconverter packs the NV12 buffer into a flat UINT8 tensor in a single pass, ready to hand directly to qtimltflite without any CPU copy.
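The two preprocessing paths can be sketched as GStreamer pipeline descriptions. This is a minimal illustration, not the original upstream_preprocess.py or imsdk_preprocess.py: the videotestsrc and fakesink endpoints and the 224x224 model input size are assumptions, and the QIM SDK elements are named without properties because none are given in the text.

```python
# Sketch only: builds pipeline descriptions as strings; nothing is launched.
# videotestsrc, fakesink and the 224x224 model input size are assumptions.

SRC_CAPS = "video/x-raw,width=1920,height=1080"


def upstream_preprocess(width=224, height=224):
    """CPU path: videoconvert and videoscale both run in software."""
    return " ! ".join([
        "videotestsrc num-buffers=1",
        SRC_CAPS,
        "videoconvert",
        "videoscale",
        f"video/x-raw,format=RGB,width={width},height={height}",
        "fakesink",
    ])


def imsdk_preprocess():
    """QIM SDK path: NV12 conversion first, then tensor packing."""
    return " ! ".join([
        "videotestsrc num-buffers=1",
        SRC_CAPS,
        "qtivtransform",
        "video/x-raw,format=NV12",
        "qtimlvconverter",
        "fakesink",
    ])


print(upstream_preprocess())
print(imsdk_preprocess())
```

On a device with the plugins installed, a string like these could be passed to gst-launch-1.0 or Gst.parse_launch. A real pipeline would continue from qtimlvconverter into qtimltflite rather than end at fakesink.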
Video Pipeline: Upstream vs QIM SDK (VTransform, VSplit, VComposer, Overlay)
Example: Video transform and multi-stream split with upstream GStreamer vs QIM SDK
This example shows the same two-branch pipeline (one branch receives the original 1080p frame, the other a cropped and resized 224×224 frame) built first with upstream elements and then with QIM SDK plugins.
Upstream (CPU), upstream_split.py:
QIM SDK (qtivtransform), imsdk_split.py:
qtivtransform combines crop, resize, and color conversion into one GPU-accelerated step. The crop/resize time drops from ~22 ms to ~5.5 ms, and both branches share the same DMA buffer without CPU copies.
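A rough sketch of the two-branch layout described here, written as pipeline descriptions. It is not the original upstream_split.py or imsdk_split.py: the test source, the fakesink endpoints, and the centre-crop values are assumptions, and qtivtransform is shown without a crop property because its property names are not given in the text.

```python
# Sketch only: two-branch tee pipelines as strings; nothing is launched.
# Source, sinks and the crop values are assumptions for illustration.

SOURCE = "videotestsrc ! video/x-raw,width=1920,height=1080 ! tee name=t"
FULL_BRANCH = "t. ! queue ! fakesink"


def upstream_split():
    """CPU path: crop, scale and convert are three separate elements."""
    small_branch = (
        "t. ! queue ! videocrop left=420 right=420 ! videoscale ! videoconvert "
        "! video/x-raw,width=224,height=224 ! fakesink"
    )
    return " ".join([SOURCE, FULL_BRANCH, small_branch])


def imsdk_split():
    """QIM SDK path: one qtivtransform replaces the three CPU elements."""
    small_branch = (
        "t. ! queue ! qtivtransform "
        "! video/x-raw,format=NV12,width=224,height=224 ! fakesink"
    )
    return " ".join([SOURCE, FULL_BRANCH, small_branch])


print(upstream_split())
print(imsdk_split())
```

In the upstream version, videocrop trims 420 pixels from each side to give a square 1080x1080 region before scaling. The QIM SDK version expresses the target size through caps only, so the crop rectangle would still need to be configured on the element.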
Neural Network Inference: Upstream TFLite vs qtimltflite
Running inference in Python with the TFLite CPU delegate means every frame must travel through the CPU: preprocess on CPU, invoke the interpreter on CPU, then post-process on CPU. qtimltflite moves inference into the GStreamer pipeline and routes it to the NPU via the QNN HTP delegate, keeping the frame in hardware memory the entire time.
| | Upstream (LiteRT CPU delegate in Python) | QIM SDK (qtimltflite on NPU) |
|---|---|---|
| Inference location | CPU, invoked from Python | NPU (HTP), inside the pipeline |
| Pre-process | videoconvert + videoscale on CPU | qtivtransform + qtimlvconverter on GPU |
| Post-process | Python + PIL on CPU | qtimlpostprocess inside pipeline |
| Frame crossing CPU | Yes — NumPy array passed to interpreter | No — tensor stays in DMA buffer |
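The QIM SDK column of the table can be read as a single pipeline in which each row is one stage. The sketch below lays that out. The v4l2src and fakesink endpoints and the model path are hypothetical, and delegate and post-processing properties are left out rather than guessed.

```python
# Sketch only: the QIM SDK column expressed as one pipeline description.
# Endpoints and the model path are hypothetical; element properties for the
# NPU delegate and the post-processing module are deliberately omitted.

STAGES = [
    ("capture", "v4l2src"),
    ("pre-process", "qtivtransform ! video/x-raw,format=NV12 ! qtimlvconverter"),
    ("inference", "qtimltflite model=/path/to/squeezenet.tflite"),
    ("post-process", "qtimlpostprocess"),
    ("sink", "fakesink"),
]


def build_pipeline(stages):
    """Join the per-stage fragments into one gst-launch style description."""
    return " ! ".join(element for _, element in stages)


for role, element in STAGES:
    print(f"{role:>12}: {element}")
print(build_pipeline(STAGES))
```

Keeping the stages in a list makes it easy to swap one stage (for example, a different model file) without touching the rest of the description.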
Example: Image classification with upstream LiteRT vs qtimltflite
Both scripts run SqueezeNet on every webcam frame and print the top-5 class predictions.
Upstream (LiteRT CPU delegate in Python), upstream_inference.py:
QIM SDK (qtimltflite on NPU), imsdk_inference.py:
qtimltflite cuts the total pipeline time from ~47 ms to ~10 ms. The frame never leaves hardware memory — qtivtransform → qtimlvconverter → qtimltflite → qtimlpostprocess all operate on the same DMA buffer, and the NPU runs the model ~37x faster than the CPU delegate for this workload.
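The two figures quoted here measure different things: ~37x refers to model execution alone, while the ~47 ms to ~10 ms change covers the whole pipeline. A short calculation using only the numbers from the text shows the end-to-end ratio and the frame-rate ceiling each path implies, assuming frames are processed one at a time.

```python
# Figures quoted in the text (approximate, milliseconds per frame).
UPSTREAM_MS = 47.0
IMSDK_MS = 10.0


def speedup(before_ms, after_ms):
    """Ratio of the old per-frame time to the new one."""
    return before_ms / after_ms


def max_fps(frame_ms):
    """Upper bound on frame rate if frames are processed strictly in sequence."""
    return 1000.0 / frame_ms


print(f"end-to-end speedup: {speedup(UPSTREAM_MS, IMSDK_MS):.1f}x")  # 4.7x
print(f"upstream ceiling:   {max_fps(UPSTREAM_MS):.1f} fps")         # 21.3 fps
print(f"QIM SDK ceiling:    {max_fps(IMSDK_MS):.1f} fps")            # 100.0 fps
```

At ~47 ms per frame the upstream path cannot sustain a 30 fps camera, whereas ~10 ms leaves headroom for additional stages.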
qtimltflite requires a tightly packed buffer. Use video/x-raw,format=NV12 after qtivtransform before passing to qtimlvconverter, as RGB output uses row-stride padding that causes silent misalignment.
How QIM SDK Delivers These Benefits
Single, Unified Multimedia + AI Framework
What developers get
- One pipeline for capture, preprocess, inference, overlay, encode, and streaming.
- One metadata flow for pixels and AI results.
- Less custom glue code between multimedia and ML stages.
- Builds on GStreamer, so media and AI elements live in the same pipeline graph.
- Carries detections, labels, keypoints, and tensor info as frame metadata.
- Supports sequential flows like detection → classification and parallel flows using tee + qtimetamux.
- Supports cross-pipeline designs with qtisocketsink and qtisocketsrc when applications need modular stages.
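The parallel flow mentioned in the list above (tee feeding a video branch and an inference branch that rejoin at qtimetamux) might be laid out roughly as follows. This is a sketch under assumptions: the way the branches link into qtimetamux, the fakesink endpoint, and the model path are illustrative and not verified SDK usage.

```python
# Sketch only: one tee branch carries video, the other carries inference
# results, and both rejoin at qtimetamux. Pad linking, the sink and the
# model path are assumptions; nothing is launched.

def parallel_flow(model="/path/to/model.tflite"):
    video_branch = "t. ! queue ! mux."
    ml_branch = (
        "t. ! queue ! qtimlvconverter "
        f"! qtimltflite model={model} ! qtimlpostprocess ! queue ! mux."
    )
    return " ".join([
        "qticamsrc ! video/x-raw,format=NV12 ! tee name=t",
        "qtimetamux name=mux ! fakesink",
        video_branch,
        ml_branch,
    ])


print(parallel_flow())
```

The point of the layout is that the video branch is never blocked by inference: results are attached to frames as metadata when both branches meet again.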
Hardware-Accelerated by Default
What developers get
- Lower latency.
- Lower CPU load.
- Better throughput for multi-stream and real-time AI.
- ISP: camera ingest and imaging path use Qualcomm imaging hardware.
- GPU: plugins such as qtimlvconverter, qtivtransform, qtivoverlay, and qtivcomposer offload resize, color convert, crop, rotate, overlay, and composition work.
- NPU: inference workloads run through supported runtimes on dedicated AI hardware.
- VPU: v4l2h264dec, v4l2h265dec, v4l2h264enc, and v4l2h265enc handle video decode/encode efficiently.
- Zero-copy + buffer pools: frames move through the pipeline as DMA-buf handles with fewer copies and lower allocation overhead.
Qualcomm AI Hub Integration
What developers get
- Hundreds of pre-trained, profiled, quantized models ready to deploy.
- No manual conversion, calibration, or accuracy validation.
- Models arrive as .tflite, .dlc, or .bin files compatible with QIM SDK runtimes.
- Models from AI Hub are exported targeting specific Qualcomm SoC backends (HTP, GPU, CPU).
- qtimltflite, qtimlsnpe, and qtimlqnn load these models directly, with no format adaptation layer.
- Confidence thresholds, post-processing modules, and label files are supplied alongside the model, matching the QIM SDK qtimlpostprocess interface.
DMA Zero-Copy Buffer Sharing
What developers get
- Frames never copied through CPU memory between accelerator stages.
- Lower peak memory bandwidth utilization.
- Reduced end-to-end pipeline latency, especially at 1080p and above.
- Plugins negotiate DMA-buf allocation through GStreamer’s allocation query mechanism.
- ISP (qticamsrc) → GPU (qtimlvconverter, qtivtransform) → NPU (qtimltflite) → display (waylandsink) all share the same buffer handle.
- The capture-io-mode=dmabuf and output-io-mode=dmabuf-import properties on decode/encode plugins enable hardware-to-hardware buffer passing.
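As an illustration of the two io-mode properties in the last bullet, the sketch below builds a decode pipeline description with and without them. The file path and the filesrc, qtdemux, and h264parse elements around the decoder are assumptions chosen for a typical MP4/H.264 file.

```python
# Sketch only: a decode pipeline using the two io-mode properties named in
# the text. The file path and the elements around the decoder are assumptions.

def decode_pipeline(path="/path/to/video.mp4", zero_copy=True):
    decoder = "v4l2h264dec"
    if zero_copy:
        # Property names and values as given in the list above.
        decoder += " capture-io-mode=dmabuf output-io-mode=dmabuf-import"
    return " ! ".join([
        f"filesrc location={path}",
        "qtdemux",
        "h264parse",
        decoder,
        "waylandsink",
    ])


print(decode_pipeline())
print(decode_pipeline(zero_copy=False))
```

Comparing the two variants on a target device is one way to check whether the DMA-buf path is actually being negotiated.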
Loadable Post-Processing Modules
What developers get
- Built-in modules for detection, classification, segmentation, pose, audio, and super-resolution.
- Custom module support without SDK recompilation.
- Runtime module swap: change post-processing logic by updating a .so file on the device.
- qtimlpostprocess loads libml-postprocess-<name>.so at runtime via dlopen.
- Each module implements Caps(), Configure(), and Process() from the IModule interface.
- Modules are deployed to /usr/lib/gstreamer-1.0/ml/modules/, so no pipeline changes are required after deployment.
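The naming and deployment convention in the list above is easy to check from a script before starting a pipeline. The helper below is a small sketch that uses only the directory and file-name pattern stated in the text. The module name "mymodule" is hypothetical.

```python
from pathlib import Path

# Directory and file-name pattern are taken from the text above.
MODULE_DIR = Path("/usr/lib/gstreamer-1.0/ml/modules")


def module_path(name, module_dir=MODULE_DIR):
    """Return the expected shared-object path for a post-processing module."""
    return module_dir / f"libml-postprocess-{name}.so"


def is_deployed(name, module_dir=MODULE_DIR):
    """True if the module file is present on this device."""
    return module_path(name, module_dir).is_file()


# "mymodule" is a hypothetical module name used for illustration.
print(module_path("mymodule"))
print(is_deployed("mymodule"))
```

A check like this catches a missing or misnamed .so file early, instead of at pipeline start-up.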
ISP Camera Integration
What developers get
- Hardware-demosaiced, ISP-processed frames delivered directly to downstream AI and multimedia stages.
- Multi-camera support (primary + secondary sensor).
- Zero-copy from ISP to GPU/NPU via GBM/DMA-buf.
- qticamsrc exposes a GStreamer source interface over the Qualcomm camera service.
- Outputs video/x-raw,format=NV12, directly consumable by qtimlvconverter, qtivtransform, and encode plugins.
- Camera properties (resolution, framerate, camera index) are set via standard GStreamer element properties.
GStreamer-Based, Industry-Standard Pipelines
What developers get
- Familiar pipeline construction.
- Easier integration with upstream plugins.
- Better portability of application logic.
- Follows standard GStreamer plugin architecture and negotiation rules.
- Uses caps negotiation for format compatibility.
- Uses allocation queries for memory negotiation and zero-copy paths.
- Reuses standard image formats such as NV12, RGBA, and I420.
- Reuses standard metadata where possible and adds AI-specific metadata only when needed.
Flexible AI Model Support
What developers get
- Freedom to choose runtime per use case.
- Support for common edge AI workloads.
- A path from prototype to optimized deployment.
- Supports TFLite, SNPE, and QNN runtime paths.
- Supports use cases such as detection, classification, segmentation, pose, super resolution, and audio AI.
- Supports chained and parallel model execution in the same media flow.
- Preserves model outputs as metadata so downstream stages can reuse them.
Developer Notes
Why zero-copy matters
Zero-copy pipelines keep the same frame buffer moving across decode, preprocess, inference, overlay, and encode stages. This reduces memory traffic, CPU overhead, and end-to-end latency — especially important for high-resolution and multi-stream workloads.
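To put a number on the memory traffic involved, the sketch below computes the size of one 1080p NV12 frame and the data written per second by each extra frame copy. The 30 fps rate is an assumption, and stride padding is ignored.

```python
# NV12 stores a full-resolution 8-bit luma plane plus a half-resolution
# interleaved chroma plane: 1.5 bytes per pixel, ignoring stride padding.

def nv12_frame_bytes(width, height):
    return width * height * 3 // 2


def copy_traffic_mb_per_s(width, height, fps, copies):
    """Data written per second by `copies` extra frame copies (MB = 10**6 bytes)."""
    return nv12_frame_bytes(width, height) * fps * copies / 1e6


print(nv12_frame_bytes(1920, 1080))               # 3110400
print(copy_traffic_mb_per_s(1920, 1080, 30, 1))   # 93.312
```

Each copy also reads the same amount it writes, so one avoided copy per frame removes roughly twice that figure from the memory bus, and the saving scales with resolution and stream count.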
Why the GPU preprocessing path matters
A typical AI pipeline must resize, color-convert, and normalize frames before inference. QIM SDK offloads these steps through GPU-accelerated plugins instead of doing them in generic CPU-only stages, which helps preserve frame rate and lowers CPU utilization.
How to read the examples
Each example shows the same operation built with upstream GStreamer CPU elements and then with QIM SDK hardware-accelerated plugins. Timing numbers are measured on IQ9 with a USB camera at 1080p input.
