GStreamer plugin architecture

The QIM SDK encapsulates hardware complexity within a modular plugin architecture, so developers do not have to manage low-level platform libraries or hardware-specific details that vary across Qualcomm chipsets and generations. Each QIM SDK plugin maps directly to a dedicated hardware accelerator (video encode/decode, camera ISP, GPU, display, or AI/ML accelerator), giving developers a unified API surface for building multimedia and AI pipelines with minimal integration overhead.

Figure: Qualcomm IM SDK GStreamer plugin architecture

Camera Architecture

The QIM SDK abstracts the underlying camera driver and hardware through a client-server architecture, shielding developers from the complexities of low-level camera management and enabling rapid construction of single-stream, multi-stream, multi-client, and multi-camera applications. At its core, the qticamsrc GStreamer element operates as a client to the le-camera service, a system-level service responsible for managing the camera HAL, stream lifecycle, buffer allocation, and capture control. This separation of responsibilities allows qticamsrc to surface the full camera capability set to the GStreamer pipeline through a developer-friendly interface of source pads, camera properties, and control signals.

At runtime, qticamsrc performs the following on behalf of the pipeline:
  • Opens and configures the target camera device
  • Creates and manages video and image output streams
  • Receives captured frames from the le-camera service
  • Wraps camera buffers as GstBuffer objects and pushes them downstream through the corresponding source pads
  • Returns buffers to the camera service upon downstream consumption
This architecture enables zero-copy camera output by retaining buffer ownership and stream resources within the service layer, which maximizes pipeline throughput while minimizing memory overhead.

Figure: Camera pipeline

See also: Camera use cases
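The buffer lifecycle described above (the service lends a buffer, the element wraps it, downstream consumes it, and the buffer goes back to the service) can be illustrated with a toy model. This is a minimal sketch, not the SDK's implementation: `CameraService`, `WrappedBuffer`, and their methods are hypothetical names invented for illustration, and a `bytearray` stands in for camera memory.

```python
from dataclasses import dataclass, field


@dataclass
class CameraService:
    """Toy stand-in for the camera service: it owns a fixed pool of frame buffers."""
    pool_size: int
    free: list = field(default_factory=list)

    def __post_init__(self):
        # Buffers are allocated once, up front, and are only ever lent out.
        self.free = [bytearray(16) for _ in range(self.pool_size)]

    def capture(self):
        """Lend one buffer for a captured frame, or None if all are in flight."""
        return self.free.pop() if self.free else None

    def give_back(self, memory):
        """Take a lent buffer back once downstream has consumed it."""
        self.free.append(memory)


class WrappedBuffer:
    """Hypothetical wrapper playing the role of a GstBuffer around camera memory."""

    def __init__(self, service, memory):
        self.service = service
        self.memory = memory  # the very same object as in the pool: no copy is made

    def release(self):
        """Called when downstream is done; ownership returns to the service."""
        self.service.give_back(self.memory)
        self.memory = None


service = CameraService(pool_size=2)
first = WrappedBuffer(service, service.capture())
second = WrappedBuffer(service, service.capture())
first_memory = first.memory

starved = service.capture()   # pool exhausted: both buffers are still downstream
first.release()               # downstream finished with the first frame
recycled = service.capture()  # the same memory is lent out again
```

The sketch also shows why downstream elements should release buffers promptly: while every buffer is held downstream, the service has nothing left to capture into.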

AI Inference Architecture

The QIM SDK provides end-to-end support for AI/ML video and audio analytics pipelines, covering the full processing chain from raw media ingestion through model inference to real-time result visualization.

AI pipelines in the QIM SDK are built around a tensor-based, three-stage processing model: preprocessing, inference, and postprocessing. Each stage is handled by a dedicated plugin, and the stages run in parallel on consecutive frames: a frame captured during interval N-2 is preprocessed during N-1, passes through inference during N, and is postprocessed during N+1, while newer frames occupy the earlier stages. This overlap keeps every hardware block busy and maximizes pipeline throughput.

The pipeline source can be live camera frames from qticamsrc (YUV) or offline media from filesrc (with format conversion); the sink can be an on-screen display through waylandsink or local storage through filesink.

The qtimlvconverter preprocessing plugin converts raw video frames into normalized tensors by performing color space conversion, resizing, and mean subtraction. It queries tensor dimensions and format requirements from the downstream inference plugin at runtime, which makes it a generic, model-agnostic preprocessing stage compatible with all inference backends.

All inference plugins operate in tensor-in / tensor-out mode and share the same qtimlvconverter for preprocessing. The QIM SDK supports the following inference engines, each with hardware delegate support for acceleration across Qualcomm's neural processing subsystems:
| Plugin | Engine |
|---|---|
| qtimltflite | LiteRT / TensorFlow Lite |
| qtimlsnpe | Snapdragon Neural Processing Engine (SNPE) |
| qtimlqnn | Qualcomm Neural Network (QNN) |
| qtimlonnx | ONNX Runtime |
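The stage overlap in the processing model can be made concrete with a small scheduling simulation. This is a minimal sketch under a simplifying assumption that every stage takes exactly one frame interval; the stage names and the `schedule` function are illustrative and not part of the SDK.

```python
STAGES = ["capture", "preprocess", "inference", "postprocess"]


def schedule(num_frames):
    """Return, per time step, which frame index each stage is working on.

    A frame enters capture at the step equal to its index and advances one
    stage per step, so stage number s handles frame (t - s) at step t.
    """
    steps = []
    for t in range(num_frames + len(STAGES) - 1):
        busy = {}
        for s, stage in enumerate(STAGES):
            frame = t - s
            if 0 <= frame < num_frames:
                busy[stage] = frame
        steps.append(busy)
    return steps


for t, busy in enumerate(schedule(5)):
    print(t, busy)
```

Under this assumption, five frames finish in 8 steps rather than the 20 that strictly sequential processing would need, because once the pipeline is full all four stages work at the same time on different frames.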
Inference output tensors are passed directly to task-specific postprocessing plugins that decode multi-dimensional tensor data (bounding boxes, class labels, segmentation masks, and keypoints) into structured ML metadata. The postprocessing layer follows a modular sub-module architecture, allowing developers to author custom postprocessing modules to support proprietary or non-standard model architectures.

Finally, the qtivoverlay plugin consumes the structured ML metadata and renders inference results directly onto the video buffer in real time. The annotated stream can then be rendered to a display, encoded and stored locally, or streamed over a network.

Figure: AI Inference Architecture

See also: Run AI/ML use cases
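To illustrate what "decoding tensors into structured metadata" means for a detection model, here is a minimal sketch in plain Python. It does not use the SDK's postprocessing module interface, which this page does not describe. The tensor layout (normalized `ymin, xmin, ymax, xmax` boxes with parallel class and score arrays), the `Detection` type, and the `decode_detections` function are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float
    box: tuple  # (x, y, width, height) in pixels


def decode_detections(boxes, class_ids, scores, labels, frame_w, frame_h, threshold=0.5):
    """Turn raw output tensors into structured detections.

    Assumed (hypothetical) layout: boxes[i] = (ymin, xmin, ymax, xmax),
    normalized to [0, 1]; class_ids[i] and scores[i] describe the same candidate.
    """
    results = []
    for (ymin, xmin, ymax, xmax), class_id, score in zip(boxes, class_ids, scores):
        if score < threshold:
            continue  # drop low-confidence candidates
        x = round(xmin * frame_w)
        y = round(ymin * frame_h)
        w = round((xmax - xmin) * frame_w)
        h = round((ymax - ymin) * frame_h)
        results.append(Detection(labels[class_id], score, (x, y, w, h)))
    return results


detections = decode_detections(
    boxes=[(0.25, 0.5, 0.75, 1.0), (0.0, 0.0, 0.1, 0.1)],
    class_ids=[1, 0],
    scores=[0.9, 0.2],
    labels=["person", "dog"],
    frame_w=640,
    frame_h=480,
)
print(detections)
```

An overlay stage only needs the resulting list of labels and pixel rectangles, which is why models with different output layouts can share the same rendering step once a matching decoder exists.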

Video Architecture

The QIM SDK provides hardware-accelerated video encode and decode through V4L2-based plugins that interface directly with the Qualcomm multimedia engine. Video encoding and decoding run entirely on the dedicated VPU (Video Processing Unit), which offloads this work from the CPU so that camera capture, AI inference, and video processing can run concurrently without contending for the CPU. The architecture supports AVC (H.264) and HEVC (H.265) formats for both encode and decode paths. Encoded output can be written to file via a multiplexer or streamed over the network. Decoded output is delivered as DMA-buf-backed NV12 frames, compatible with all downstream QIM SDK plugins including AI preprocessing and display.

Encode

Video frames captured from the camera via qticamsrc are passed directly to the hardware encoder. The encoded bitstream is parsed and multiplexed into an MP4 or MPEG-TS container before being written to file or transmitted over the network.

Figure: Video encode pipeline
| Component | Description |
|---|---|
| qticamsrc | Captures video streams from the ISP camera. See Camera Architecture. |
| v4l2h264enc | Encodes the video stream to AVC (H.264) format using the V4L2 driver. |
| v4l2h265enc | Encodes the video stream to HEVC (H.265) format using the V4L2 driver. |
| h264parse / h265parse | Parses the encoded bitstream and inserts headers required for downstream muxing or streaming. |
| mp4mux / mpegtsmux | Multiplexes the encoded stream into an MP4 or MPEG-TS container. |
| filesink | Writes the muxed container to the file system. |
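The components in the table chain together in the order listed. The sketch below assembles a gst-launch style pipeline description from them. The element names come from the table; the NV12 caps string, the default resolution and frame rate, and the `encode_pipeline` helper itself are assumptions chosen for illustration, and a real target may need additional element properties that are not covered here.

```python
def encode_pipeline(codec, output_path, width=1920, height=1080, fps=30):
    """Build a description: camera -> hardware encoder -> parser -> MP4 file."""
    elements = {
        "h264": ("v4l2h264enc", "h264parse"),
        "h265": ("v4l2h265enc", "h265parse"),
    }
    if codec not in elements:
        raise ValueError(f"unsupported codec: {codec}")
    encoder, parser = elements[codec]
    # Assumed caps: the resolution and frame rate are placeholders, not SDK defaults.
    caps = f"video/x-raw,format=NV12,width={width},height={height},framerate={fps}/1"
    return " ! ".join([
        "qticamsrc",
        caps,
        encoder,
        parser,
        "mp4mux",
        f"filesink location={output_path}",
    ])


print(encode_pipeline("h264", "/tmp/camera.mp4"))
```

The printed string can be passed to `gst-launch-1.0 -e` or to `Gst.parse_launch()`. The `-e` flag matters for MP4 output: it sends end-of-stream on shutdown so that mp4mux can finalize the file.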

Decode

A video file is demultiplexed, the elementary stream is parsed, and the hardware decoder produces raw NV12 frames for display or downstream AI processing. Decode parameters are exposed as GStreamer element properties for direct application control.

Figure: Video decode pipeline
| Component | Description |
|---|---|
| filesrc | Reads the video container from the file system. |
| qtdemux | Demultiplexes the container into separate elementary streams. |
| h264parse / h265parse | Parses the encoded elementary stream before decoding. |
| v4l2h264dec | Decodes the AVC (H.264) video stream to raw NV12 frames using the V4L2 driver. |
| v4l2h265dec | Decodes the HEVC (H.265) video stream to raw NV12 frames using the V4L2 driver. |
| waylandsink | Receives decoded NV12 frames as GBM buffers and submits them to the Weston compositor for display. |
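The decode chain in the table can be launched from a script by building a `gst-launch-1.0` argument list. This is a minimal sketch: the element order follows the table, while the `decode_command` helper and the example file path are hypothetical, and any decoder properties a particular platform requires are left out.

```python
def decode_command(path, codec):
    """Build an argv list: file -> demuxer -> parser -> hardware decoder -> display."""
    elements = {
        "h264": ("h264parse", "v4l2h264dec"),
        "h265": ("h265parse", "v4l2h265dec"),
    }
    if codec not in elements:
        raise ValueError(f"unsupported codec: {codec}")
    parser, decoder = elements[codec]
    # gst-launch-1.0 accepts the pipeline description split across arguments.
    return [
        "gst-launch-1.0",
        "filesrc", f"location={path}",
        "!", "qtdemux",
        "!", parser,
        "!", decoder,
        "!", "waylandsink",
    ]


print(" ".join(decode_command("/tmp/clip.mp4", "h264")))
```

On a device with the SDK installed, the list could be handed to `subprocess.run()`. Passing the arguments as a list rather than one shell string avoids having to escape the `!` separators.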
See also: Video encode and decode use cases, Video playback use cases

Audio Architecture

Audio capture and playback in QIM SDK are handled through the pulsesrc and pulsesink GStreamer plugins, which interface with the system-level PulseAudio server. The PulseAudio server in turn communicates with the ALSA driver to interact with the underlying audio hardware. Audio encode and decode use open-source GStreamer plugins (such as flacenc, flacparse, flacdec, lamemp3enc, mpegaudioparse, mpg123audiodec) combined with the PulseAudio integration, enabling a complete audio pipeline without requiring custom hardware-specific code.

Capture

The pulsesrc plugin captures raw PCM audio from the microphone through the PulseAudio server, which handles hardware interaction via the ALSA driver. The PCM stream can then be encoded or written directly to file.

Figure: Audio capture pipeline
| Component | Description |
|---|---|
| pulsesrc | Captures PCM audio samples from the microphone via the PulseAudio server. |
| PulseAudio server | Interfaces with the ALSA driver to acquire audio data from the hardware. |
| Encode / filesink | Optionally encodes the captured audio and writes it to a file. |

Playback

The pulsesink plugin receives PCM audio data and routes it through the PulseAudio server for hardware playback. It supports both live audio sources and pre-encoded audio files after decoding.

Figure: Audio playback pipeline
| Component | Description |
|---|---|
| filesrc | Reads audio data from a file. |
| Decode | Decodes encoded audio (e.g., MP3, FLAC, WAV) to raw PCM using open-source decoder plugins. |
| pulsesink | Sends PCM audio data to the PulseAudio server for playback on the audio hardware. |
| PulseAudio server | Interfaces with the ALSA driver to route audio to the hardware output. |
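The generic "Decode" row stands for a parser and decoder pair that depends on the file format. The sketch below picks the pair from the file extension, using only the open-source elements named in the Audio Architecture overview. The `playback_pipeline` helper and the extension mapping are assumptions for illustration; some streams may also need a conversion element before pulsesink, which is not shown.

```python
from pathlib import Path

# Parser/decoder pairs for raw (non-containerized) MP3 and FLAC files.
DECODERS = {
    ".mp3": ("mpegaudioparse", "mpg123audiodec"),
    ".flac": ("flacparse", "flacdec"),
}


def playback_pipeline(path):
    """Build a description: file -> parser -> decoder -> PulseAudio sink."""
    suffix = Path(path).suffix.lower()
    if suffix not in DECODERS:
        raise ValueError(f"no decoder mapping for {suffix or 'files without an extension'}")
    parser, decoder = DECODERS[suffix]
    return f"filesrc location={path} ! {parser} ! {decoder} ! pulsesink"


print(playback_pipeline("/data/song.mp3"))
print(playback_pipeline("/data/voice.flac"))
```

Each returned string is a pipeline description that can be run with `gst-launch-1.0` on a target where the corresponding plugins are installed.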

Encode

Captured PCM audio is encoded using an open-source encoder, parsed, and multiplexed into a container before being written to file.

Figure: Audio encoding pipeline
| Component | Description |
|---|---|
| pulsesrc | Captures PCM audio from the microphone. |
| PulseAudio server | Interacts with the ALSA driver to supply raw audio data. |
| Encode | Encodes PCM to a compressed format (MP3, FLAC, AAC) using open-source plugins. |
| Parse | Parses the encoded bitstream before muxing. |
| mp4mux / mpegtsmux | Multiplexes the encoded audio into an MP4 or MPEG-TS container. |
| filesink | Writes the container to the file system. |
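As one concrete instance of the chain in the table, the sketch below records microphone audio as MP3 inside either container. It is a minimal sketch: MP3 is chosen here only because lamemp3enc and mpegaudioparse are named in the Audio Architecture overview, and the `audio_record_pipeline` helper and its `container` keys are hypothetical.

```python
MUXERS = {"mp4": "mp4mux", "ts": "mpegtsmux"}


def audio_record_pipeline(output_path, container="mp4"):
    """Build a description: microphone -> MP3 encoder -> parser -> muxer -> file."""
    if container not in MUXERS:
        raise ValueError(f"unsupported container: {container}")
    return " ! ".join([
        "pulsesrc",
        "lamemp3enc",
        "mpegaudioparse",
        MUXERS[container],
        f"filesink location={output_path}",
    ])


print(audio_record_pipeline("/tmp/memo.mp4"))
print(audio_record_pipeline("/tmp/memo.ts", container="ts"))
```

Because pulsesrc is a live source that never ends on its own, run the description with `gst-launch-1.0 -e` so that stopping the recording sends end-of-stream and the muxer can finish writing the file.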

Decode

An encoded audio file is read, demultiplexed, decoded back to PCM, and played back through pulsesink.

Figure: Audio decode pipeline
| Component | Description |
|---|---|
| filesrc | Reads the audio container from the file system. |
| qtdemux | Demultiplexes the container to extract the audio elementary stream. |
| Decode | Decodes the compressed audio to raw PCM using open-source decoder plugins. |
| pulsesink | Sends the decoded PCM to the PulseAudio server for playback. |
| PulseAudio server | Interfaces with the ALSA driver to route audio to the hardware output. |
See also: Audio use cases

Graphics and Display Architecture

The QIM SDK uses the Wayland display protocol with the Weston compositor to manage display composition and output. Weston runs as an independent process and handles all display composition using OpenGL ES, communicating with the display hardware through the DRM/KMS subsystem.

QIM SDK plugins deliver video buffers as GBM-backed DMA-buf handles. Weston receives these buffers via the Wayland protocol and composites them directly onto the display without intermediate CPU copies, maintaining the zero-copy path from camera capture or video decode through to the screen.

Figure: Weston/Wayland architecture
| Component | Description |
|---|---|
| Wayland/GLES client | GStreamer plugins such as waylandsink implement the Wayland client protocol to submit video buffers to Weston. |
| Weston server | Implements the Wayland compositor. Uses KMS to configure the display and OpenGL ES with DRM for hardware-accelerated compositing. |
| SDM back-end | Display hardware abstraction layer (HAL) that provides the DRM/KMS platform implementation for Weston. |
| libGBM | Buffer allocation and sharing library. Provides a DMA-backed allocator enabling zero-copy buffer sharing between the GPU, display, and other hardware blocks. |
| EGL sub-driver | Interface between GBM/EGL and the Wayland protocol, allowing GStreamer elements to share hardware-allocated buffers with the Weston compositor. |
See also: Display and composition use cases