RK182X Compute Cards: Accelerating Edge LLM & VLM Deployment - Blog

Breaking Through On-Device AI Computing Power Barriers: RK182X Series Computing Cards Simplify Large Model Deployment

Edge AI has entered a phase in which large language models are tightly integrated with multimodal perception. Applications such as energy storage, industrial gateways, intelligent robotics, and video analytics increasingly demand local real-time inference, low-latency responses, and data-security compliance. Deploying models with more than 3 billion parameters at the edge, however, often runs into the hardware limits of mainstream industrial controllers such as the RK3588, RK3576, and RK3568, primarily because their native NPU computing power is limited and their memory bandwidth is inadequate.

To address the industry-wide challenge of meeting strong business demand with limited on-device computing power, Rockchip has introduced the high-performance RK182X series computing cards, designed specifically for AI applications. With the release of RKNN3 SDK V1.0.4, the cards come with a comprehensive software stack for deploying AI models on-device. The release brings significant improvements in edge inference performance, model compatibility, functional interfaces, and inference accuracy, and the cards combine high performance, adaptability, and energy efficiency. Plugging one in closes the computing-power gap and enables stable deployment of LLMs and VLMs on edge devices.

Rockchip RK182X series high-performance hardware computing card designed for hardware-accelerated local edge AI deployment

01. 20 TOPS Dedicated AI Power, Supporting Up to 8B-Parameter Models for Local Inference

The RK182X series integrates a multi-core RISC-V CPU, 3D-stacked high-bandwidth DRAM, and a multi-core high-performance NPU with a peak computing power of up to 20 TOPS. It supports multiple computational precisions from INT4 to FP16. Connected to the main controller over a high-speed PCIe or USB interface, it supports inference and local deployment of large language and multimodal models ranging from 0.5B to 8B parameters, as well as traditional CNN models. Because the card is dedicated to on-device AI inference, it runs independently and does not consume the main controller's resources.

Hardware block diagram showing 20 TOPS dedicated NPU architecture, multi-core RISC-V CPU, and 3D stacked high-bandwidth DRAM integration
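To give a feel for why precision matters at these model sizes, here is a rough back-of-the-envelope sketch of weight storage for the parameter counts and precisions quoted above. It is an estimate under stated assumptions, not a Rockchip sizing tool: the function name is hypothetical, INT8 is included only as an intermediate point between INT4 and FP16, and the figures cover weights only (no KV cache, activations, or runtime overhead).

```python
# Weight-only memory estimate. Assumption: every weight is stored at the
# nominal bit width, with no quantization metadata or runtime overhead.
BITS_PER_WEIGHT = {"INT4": 4, "INT8": 8, "FP16": 16}


def weight_footprint_gib(num_params: float, precision: str) -> float:
    """Return the approximate size of the model weights in GiB."""
    bits = BITS_PER_WEIGHT[precision]
    return num_params * bits / 8 / 2**30


if __name__ == "__main__":
    # Model sizes taken from the 0.5B to 8B range mentioned in the article.
    for params in (0.5e9, 3e9, 8e9):
        row = ", ".join(
            f"{p}: {weight_footprint_gib(params, p):.2f} GiB"
            for p in BITS_PER_WEIGHT
        )
        print(f"{params / 1e9:.1f}B params -> {row}")
```

The same model needs four times as much weight storage at FP16 as at INT4, which is why low-bit quantization is usually the first lever when fitting larger models onto an accelerator card.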

02. Full Coverage of Mainstream Models, Breaking Algorithm Ecosystem Barriers

The RK182X computing card is adapted to mainstream AI algorithms and natively supports three core model types: LLM (large language models), VLM (vision-language multimodal models), and CNN (convolutional neural networks). This covers AI applications including natural language interaction, cross-modal image-text analysis, image classification and detection, and audio signal processing. With stable scheduling of computing power, low inference latency, and a complete model compilation toolchain, it simplifies model quantization, adaptation and optimization, and rapid deployment on embedded devices.

RK182X Supported Model List

Comprehensive list of supported LLM, VLM, and CNN AI models compatible with the RK182X hardware compilation toolchain

03. Compatible with All Main Controllers and Both Operating Systems, Enabling Low-Cost, Smooth Computing Power Upgrades for Existing Industrial Equipment

The RK182X series computing cards are fully compatible with Rockchip's mainstream main controllers, including the RK3588, RK3576, and RK3568, and support both Linux and Android. Over PCIe they work plug-and-play, with no additional driver adaptation required. This design makes the card portable across main controllers and operating systems. Equipment already in service can gain large-model AI computing power without modification: there is no need to replace motherboards, alter the device structure, or redo product certification. Older edge gateways, industrial control hosts, and AI edge boxes can be upgraded into high-performance AI inference terminals at low cost, avoiding the retrofit expense and schedule delays of a hardware generation change.

OK3588-C development board paired with the RK1828 computing card via the PCIe interface

The following shows a comparison of large model inference performance before and after pairing each main control platform with the RK182X computing card:

Performance benchmark comparison chart displaying large model inference metrics before and after upgrading main control SOCs with the RK182X card

Test Parameter Description:

Input_Tokens and New_tokens represent the number of input/output tokens, respectively.
TPS (Tokens Per Second): The number of tokens the model can generate per second.

As a widely deployed platform, the RK3568 features a 1 TOPS integrated NPU, which is insufficient for on-device large model deployment. Its reserved PCIe interface allows the addition of 20 TOPS dedicated NPU computing power via RK1820/RK1828 accelerator cards. Existing hardware requires no modifications, enabling low-cost performance upgrades and reliable deployment of large language and multimodal models.

On the software side, Forlinx Embedded has completed in-depth driver debugging and full operator verification for the entire RK182X series on both Linux and Android. Plug-and-play operation is supported across multiple scenarios, including industrial vision and service robots on Linux, and smart interactive all-in-ones and commercial smart displays on Android. A single computing card can be reused across different hardware platforms and operating systems, which reduces customers' inventory and maintenance costs. The result is an edge computing power upgrade solution summed up as "one card fits all, revitalizing old devices." Measured on-device inference performance for LLM and VLM models of various parameter sizes, with the RK182X computing card paired with different Rockchip main controller platforms, is given at the end of this article; the tests use context sizes and output lengths drawn from real business scenarios.

04. Energy Storage Industry: Private Knowledge Base Implementation

To address AI-driven Q&A needs in energy storage BMS scenarios, Forlinx Embedded has built a dedicated private knowledge base on an RK3588 paired with the RK1828 accelerator card. The solution integrates ASR (speech recognition) and TTS (speech synthesis) modules, enabling fully voice-based interaction. It supports multi-level BMS equipment data queries, real-time operational status monitoring, and intelligent fault diagnosis. By accurately interpreting maintenance personnel's questions, the system supports continuous interactions such as troubleshooting, data lookup, and analytical recommendations. Everything is deployed offline at the edge, with no internet connectivity required, which keeps data local, compliant, and secure.

Core Capabilities

Local Deployment: Data remains within the facility, meeting security and compliance requirements for energy storage applications.
Rapid Response: Edge-based large language model inference delivers a stable output speed of 60+ tokens/s for real-time fault diagnosis and data queries.
Plug-and-Play: Enables quick knowledge base import, voice interaction, customizable MCPs, and standardized interfaces.
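As a rough illustration of what a 60 tokens/s decode rate means for a voice Q&A exchange, the sketch below estimates how long a reply of a given length takes to generate. Only the 60 tokens/s figure comes from the text above; the function name, the reply lengths, and the 0.5 s time-to-first-token are hypothetical values chosen for illustration, and ASR and TTS processing time is not modeled.

```python
def response_seconds(new_tokens: int, tokens_per_second: float,
                     ttft_seconds: float = 0.0) -> float:
    """Estimate wall-clock time until the last token of a reply is produced.

    The first token arrives after ttft_seconds; the remaining tokens are
    assumed to arrive at a steady tokens_per_second decode rate.
    """
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_seconds + (new_tokens - 1) / tokens_per_second


if __name__ == "__main__":
    # 60 tokens/s is the figure quoted above; the reply lengths and the
    # 0.5 s time-to-first-token are hypothetical values for illustration.
    for n in (61, 121, 301):
        print(f"{n} tokens: {response_seconds(n, 60.0, ttft_seconds=0.5):.2f} s")
```

Under these assumptions a reply of about 120 tokens completes in a couple of seconds, and because tokens stream out as they are generated, speech synthesis can typically begin before the full reply is finished.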

05. Why Choose RK182X Compute Cards?

1. Plug-and-Play

Supports PCIe/USB dual interfaces and dual systems, reducing deployment time by over 50%.

2. Full Platform Coverage

Fully compatible with RK3588/3576/3568, offering seamless performance upgrades for existing hardware.

3. Scenario-Optimized Solutions

Tailored for verticals including energy storage, industrial automation, and robotics, with full technical support.

4. Stable & Reliable

Industrial-grade quality backed by mass delivery assurance and end-to-end technical support.

The RK182X compute card series effectively addresses edge-side computational shortages, empowering cost-effective, stable, and high-speed local deployment of LLMs and VLMs.

The following are measured on-device inference results for LLM and VLM models running on the RK182X computing card paired with various Rockchip main controller platforms:

Ubuntu on RK3568 + RK1828 Compute Card

LLM Edge Inference Key Performance Data:

Performance statistics chart showing LLM edge inference benchmark results on an Ubuntu-based RK3568 paired with an RK1828 compute card

VLM Edge Inference Key Performance Data:

Performance statistics chart showing VLM edge inference benchmark results on an Ubuntu-based RK3568 paired with an RK1828 compute card

Ubuntu on RK3576 + RK1828 Compute Card

LLM Edge Inference Key Performance Data:

Performance statistics chart showing LLM edge inference benchmark results on an Ubuntu-based RK3576 paired with an RK1828 compute card

VLM Edge Inference Key Performance Data:

Performance statistics chart showing VLM edge inference benchmark results on an Ubuntu-based RK3576 paired with an RK1828 compute card

Android on RK3588 + RK1828 Compute Card

LLM Edge Inference Key Performance Data:

Performance statistics chart showing LLM edge inference benchmark results on an Android-based RK3588 paired with an RK1828 computing card

VLM Edge Inference Key Performance Data:

Performance statistics chart showing VLM edge inference benchmark results on an Android-based RK3588 paired with an RK1828 computing card

Test Parameter Description:

1. Each test uses a main controller SoC connected to an RK1820/RK1828 over PCIe.
2. TTFT (time to first token): the time the model takes to generate the first token.
3. TPOT (time per output token): the average time required to generate each output token.
4. TPS (tokens per second): the number of tokens the model can generate per second.
5. For VLMs, the time taken by the vision stage and by the LLM stage was measured in separate tests.
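The three metrics above are related, and a minimal sketch of how they could be derived from per-token timestamps may help when reproducing the measurements. This is not the benchmark tool used for the charts: the `DecodeMetrics` and `summarize` names are hypothetical, and the sketch assumes TPOT is averaged over the tokens after the first one and that TPS is the reciprocal of TPOT (conventions vary between tools).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DecodeMetrics:
    ttft_s: float  # time to first token, in seconds
    tpot_s: float  # average time per output token, in seconds
    tps: float     # decode throughput, in tokens per second


def summarize(request_start: float, token_times: Sequence[float]) -> DecodeMetrics:
    """Derive TTFT, TPOT, and TPS from the arrival time of each output token.

    Assumption: TPOT excludes the first token, whose latency is dominated
    by prompt processing and is already reported as TTFT.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two output tokens")
    ttft = token_times[0] - request_start
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return DecodeMetrics(ttft_s=ttft, tpot_s=tpot, tps=1.0 / tpot)


if __name__ == "__main__":
    # Hypothetical timestamps in seconds, for illustration only.
    metrics = summarize(0.0, [0.5, 0.75, 1.0, 1.25])
    print(metrics)
```

In practice the timestamps would come from a streaming callback of whatever inference runtime is in use, recorded with a monotonic clock such as `time.perf_counter()`.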

The RK182X series computing cards will be available soon – stay tuned for updates!

Related Products:

OK3568-C Single Board Computer

OK3588-C Single Board Computer

OK3576-C Single Board Computer
