Breaking Through On-Device AI Computing Power Barriers: RK182X Series Computing Cards Simplify Large Model Deployment
Edge AI is now in a phase where large language models are closely integrated with multimodal perception. Applications such as energy storage, industrial gateways, intelligent robotics, and video analytics increasingly demand local real-time inference, low-latency responses, and data-security compliance. Deploying large models with over 3 billion parameters at the edge often runs into the hardware limits of mainstream industrial controllers such as the RK3588, RK3576, and RK3568, primarily because of their limited native NPU computing power and inadequate memory bandwidth.
To address the challenge in the industry of balancing strong business needs with limited on-device computing power, Rockchip has introduced the high-performance RK182X series computing cards designed specifically for AI applications. With the release of the RKNN3 SDK V1.0.4, these cards offer a comprehensive software support system for deploying AI models on-device. They feature significant enhancements in edge inference performance, model compatibility, functional interfaces, and inference accuracy, demonstrating high performance, adaptability, and energy efficiency. Simply plug them in to bridge the gap in computing power, ensuring stable and seamless deployment of LLM/VLM on edge devices.
01. 20 TOPS Dedicated AI Power, Supporting Up to 8B-Parameter Models for Local Inference
The RK182X series integrates a multi-core RISC-V CPU and 3D-stacked high-bandwidth DRAM, and features a multi-core high-performance NPU with a peak computing power of up to 20 TOPS. It supports multiple computational precisions from INT4 to FP16. Connected to the main control device through a high-speed PCIe or USB interface, it supports the inference and local deployment of large language and multimodal models ranging from 0.5B to 8B parameters, as well as traditional CNN models. Because it is dedicated to on-device AI inference, it operates independently and does not occupy main control resources.
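To make the parameter range and precision range above more concrete, the following minimal sketch estimates how much storage the model weights alone would need at different precisions. It is plain arithmetic (parameters times bits per weight), not an RK182X or RKNN3 SDK calculation. The function name and the INT8 entry are assumptions added for illustration, and the estimate ignores the KV cache, activations, and any quantization metadata.

```python
# Bits per weight for precisions in the INT4-to-FP16 range.
# INT8 is included as an assumed intermediate precision for illustration.
BITS_PER_WEIGHT = {"INT4": 4, "INT8": 8, "FP16": 16}


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    bits = BITS_PER_WEIGHT[precision]
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    # Parameter counts taken from the 0.5B to 8B range mentioned in the text.
    for params in (0.5e9, 3e9, 8e9):
        row = ", ".join(
            f"{name}: {weight_footprint_gb(params, name):.2f} GB"
            for name in BITS_PER_WEIGHT
        )
        print(f"{params / 1e9:.1f}B parameters -> {row}")
```

The estimate illustrates why lower precisions matter at the edge: each halving of the bits per weight halves the weight storage and the data that must be moved per generated token.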
02. Full Coverage of Mainstream Models, Breaking Algorithm Ecosystem Barriers
The RK182X computing card achieves full adaptation of mainstream AI algorithms, natively supporting three core model types: LLM (large language models), VLM (vision-language multimodal models), and CNN (convolutional neural networks). It covers full-scenario AI applications including natural language interaction, cross-modal image-text analysis, image classification/detection, and audio signal processing. With stable computing power scheduling and excellent inference latency, paired with a complete model compilation toolchain, it easily enables model quantization, adaptation optimization, and rapid deployment on embedded devices.
RK182X Supported Model List
03. Compatible with All Main Controls + Dual Systems, Enabling Low-Cost Smooth Computing Power Upgrades for Existing Industrial Equipment
The RK182X series computing cards are fully compatible with Rockchip's mainstream main controls such as the RK3588, RK3576, and RK3568, and support both Linux and Android. They can be used via PCIe plug-and-play without requiring additional driver adaptations. Leveraging this architecture design, the product achieves cross-main-control and cross-system universality. Existing equipment in use can be upgraded with AI large model computing power without any modifications: there is no need to replace motherboards, alter device structures, or redo product certifications. Older edge gateways, industrial control hosts, and AI edge boxes can be iteratively upgraded into high-performance AI inference terminals at low cost, avoiding the high transformation costs and cycle losses associated with hardware generation replacement.
OK3588-C development board paired with the RK1828 computing card
The following shows a comparison of large model inference performance before and after pairing each main control platform with the RK182X computing card:
Test Parameter Description:
Input_Tokens and New_tokens represent the number of input tokens and output tokens, respectively.
TPS (Tokens Per Second): The number of tokens the model can generate per second.
As a widely deployed platform, the RK3568 features a 1 TOPS integrated NPU, which is insufficient for on-device large model deployment. Its reserved PCIe interface allows the addition of 20 TOPS dedicated NPU computing power via RK1820/RK1828 accelerator cards. Existing hardware requires no modifications, enabling low-cost performance upgrades and reliable deployment of large language and multimodal models.
On the software level, Forlinx Embedded has completed in-depth driver debugging and full operator implementation verification for the entire RK182X series on both Linux and Android systems. Plug-and-play functionality is supported across multiple scenarios, including industrial vision and service robots on the Linux side, and smart interactive all-in-ones and commercial smart displays on the Android side. A single computing card can be reused across different hardware platforms and operating systems, effectively reducing customers' inventory and maintenance costs. The result is an edge computing power upgrade solution characterized by "one card fits all, revitalizing old devices." Measured on-device inference performance data for LLM/VLM models of different parameter sizes, with the RK182X computing card paired with various RK main control platforms, is provided at the end of the document; the test settings for context size and output length are based on real business scenarios.
04. Energy Storage Industry: Private Knowledge Base Implementation
To address the AI-driven Q&A needs for energy storage BMS scenarios, Forlinx Embedded has developed a dedicated private knowledge base using RK3588 paired with the RK1828 accelerator card. The solution integrates ASR (speech recognition) and TTS (speech synthesis) modules, enabling fully voice-based interactions. It supports multi-level BMS equipment data queries, real-time operational status monitoring, and intelligent fault diagnosis. By accurately interpreting maintenance personnel's questions, the system facilitates continuous interactions—such as troubleshooting, data lookup, and analytical recommendations—all deployed offline at the edge without requiring internet connectivity, ensuring data locality, compliance, and security.
Core Capabilities
Local Deployment: Data remains within the facility, meeting security and compliance requirements for energy storage applications.
Rapid Response: Edge-based large language model inference delivers a stable output speed of 60+ tokens/s for real-time fault diagnosis and data queries.
Plug-and-Play: Enables quick knowledge base import, voice interaction, customizable MCPs, and standardized interfaces.
05. Why Choose RK182X Compute Cards?
1. Plug-and-Play
Supports PCIe/USB dual interfaces and dual systems, reducing deployment time by over 50%.
2. Full Platform Coverage
Fully compatible with RK3588/3576/3568, offering seamless performance upgrades for existing hardware.
3. Scenario-Optimized Solutions
Tailored for verticals including energy storage, industrial automation, and robotics, with full technical support.
4. Stable & Reliable
Industrial-grade quality backed by mass delivery assurance and end-to-end technical support.
The RK182X compute card series effectively addresses edge-side computational shortages, empowering cost-effective, stable, and high-speed local deployment of LLMs and VLMs.
The following are the actual performance data for on-device inference of LLM/VLM models using the RK182X computing card in conjunction with various RK controller platforms:
Ubuntu on RK3568 + RK1828 Compute Card
LLM Edge Inference Key Performance Data:
VLM Edge Inference Key Performance Data:
Ubuntu on RK3576 + RK1828 Compute Card
LLM Edge Inference Key Performance Data:
VLM Edge Inference Key Performance Data:
Android on RK3588 + RK1828 Compute Card
LLM Edge Inference Key Performance Data:
VLM Edge Inference Key Performance Data:
Test Parameter Description:
1. The test is based on a main control SoC and an RK1820/RK1828, connected via PCIe.
2. TTFT (Time To First Token): The time taken by the model to generate the first token.
3. TPOT (Time Per Output Token): The average time required to generate each output token.
4. TPS (Tokens Per Second): The number of tokens the model can generate per second.
5. For VLMs, the time taken by the vision stage and by the LLM stage was measured in separate tests.
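The relationship between the three metrics defined above can be sketched with a small calculation. This is a minimal illustration, not the test harness used for the measurements: the class and function names are hypothetical, the timing values are made up, and it assumes TPOT is averaged over the tokens generated after the first one, with TPS taken as its reciprocal. The source does not state whether the first token is included in the average.

```python
from dataclasses import dataclass


@dataclass
class GenerationTiming:
    """Timestamps (in seconds) recorded around one generation request."""
    request_start: float     # when the prompt was submitted
    first_token_time: float  # when the first output token arrived
    last_token_time: float   # when the final output token arrived
    new_tokens: int          # number of output tokens generated


def ttft(t: GenerationTiming) -> float:
    """Time to first token: prompt submission until the first output token."""
    return t.first_token_time - t.request_start


def tpot(t: GenerationTiming) -> float:
    """Average time per output token, assumed to exclude the first token."""
    if t.new_tokens < 2:
        raise ValueError("need at least two output tokens to average TPOT")
    return (t.last_token_time - t.first_token_time) / (t.new_tokens - 1)


def tps(t: GenerationTiming) -> float:
    """Tokens per second during generation, taken as the reciprocal of TPOT."""
    return 1.0 / tpot(t)


if __name__ == "__main__":
    # Made-up timings for illustration only, not measured RK182X data.
    sample = GenerationTiming(
        request_start=0.0,
        first_token_time=0.5,
        last_token_time=2.5,
        new_tokens=101,
    )
    print(f"TTFT: {ttft(sample):.3f} s")
    print(f"TPOT: {tpot(sample) * 1000:.1f} ms")
    print(f"TPS:  {tps(sample):.1f} tokens/s")
```

Under these assumptions, TTFT mainly reflects how long the input tokens take to process, while TPOT and TPS describe the steady generation speed that a user perceives once output begins.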
The RK182X series computing cards will be available soon – stay tuned for updates!



