i.MX 8M Plus SBC Ported NPU TensorFlow Routine

i.mx 8m plus SoM with NPU


Forlinx embedded OKMX8MP-C single-board computer is based on NXP i.MX 8M Plus processor development and design, this series of processors focus on machine learning and vision, advanced multimedia and industrial automation with high reliability. It aims to meet the needs of applications such as smart cities, industrial Internet, smart medical care, and smart transportation. Powerful quad-core or dual-core Arm® Cortex®-A53 processor clocked at up to 1.6GHz with a Neural Processing Unit (NPU) running at a maximum speed of 2.3TOPS.

The hardware board used in this article is the Forlinx embedded OKMX8MP-C SBC, the system version is Linux5.4.70+Qt5.15.0, and it mainly introduces the transplantation of the official NPU TensorFlow routine.

1. the image recognition routine of NPU

In the product manual provided by the OKMX8MP-C single-board computer, there is a chapter on the image recognition routine for the NPU on the board, which is located in /usr/bin/tensoRFlow-lite-2.3.1/examples of the EMMC partition, I will partition the EMMC mount finds the corresponding routine location for the /media partition.

the image recognition routine of NPU 1

the image recognition routine of NPU 2


Switch to EMMC startup, enter the /usr/bin/tensorflow-lite-2.3.1/examples/ directory, and run the test example:

run the test example


Then switch back to the TF card system to run, prompting an error, the nnapi of the label_image program needs dynamic link library support:

switch back to the TF card system


libm-2.30.so
libneuralnetworks.so.1.1.9
libnnrt.so.1.1.9
libArchModelSw.so
libGAL.so
libNNArchPerf.so
libOpenVX.so.1.3.0
libovxlib.so.1.1.0
libVSC.so

Among them, libm-2.30.so is linked as ld-linux-aarch64.so.1, located in the /usr/lib/aarch64-linux-gnu/ directory, if it is in /usr/lib/aarch64-linux of the transplanted target system If the library file is not under -gnu/, an error will be prompted at runtime. Copy all the above dynamic link libraries to the correct location (/usr/lib and /usr/lib/aarch64-linux-gnu/), run again:

npu test


The result is that there is no error, the runtime environment is successfully transplanted, and the tensorflow routine can be performed next.


2. TensorFlow routine verification

First use Forlinx embedded official DEMO to do the verification, and the verification results are as follows.

NPU pic 1

0.780392: 653 military unIForm
0.105882: 907 Windsor tie
0.0156863: 458 bow tie
0.0117647: 466 bulletproof vest
0.00784314: 835 suit

78% of the results match the army uniform, 10% of the results match the Windsor tie, 1% of the results match the bow tie, 1% of the results match the bulletproof vest, and less than 1% of the results match the suit. Overall, the results are quite satisfactory, NPU's The computing power is indeed OK. Running the program multiple times yields the same results, indicating that the NPU uses a fixed/static image recognition library for calculation.

Let's try another few pictures. In order to test whether the artificial intelligence computing power of this NPU can work, we collected ten pictures. Since the source code is not open, it is impossible to port the source code to your own routine:

NPU test pic 2

0.160784: 639 maillot
0.137255: 436 bathtub
0.117647: 886 velvet
0.0705882: 586 hair spray
0.0509804: 440 bearskin

NPU test pic 3

0.972549: 644 mask
0.00392157: 918 comic book
0.00392157: 904 wig
0.00392157: 797 ski mask
0.00392157: 732 plunger

NPU test pic 4

0.380392: 583 grocery store
0.321569: 957 custard apple
0.0862745: 955 banana
0.0352941: 956 jackfruit
0.027451: 954 pineapple

NPU test pic 5

0.254902: 918 comic book
0.0470588: 771 running shoe
0.0470588: 474 can opener
0.0470588: 412 apron
0.0392157: 794 shower cap

NPU test pic 6

0.52549: 922 book jacket
0.0705882: 788 shield
0.0705882: 452 bolo tie
0.0588235: 627 lighter
0.0352941: 701 paper towel

NPU test pic 7

0.121569: 656 miniskirt
0.054902: 835 suit
0.0470588: 852 television
0.0470588: 440 bearskin
0.0392157: 679 neck brace

NPU test pic 8

0.65098: 918 comic book
0.172549: 747 puck
0.0196078: 922 book jacket
0.0196078: 723 ping-pong ball
0.0117647: 806 soccer ball

NPU test pic 9

0.678431: 918 comic book
0.0784314: 418 balloon
0.0470588: 880 umbrella
0.0470588: 722 pillow
0.0156863: 644 mask

NPU test pic 10

0.184314: 585 hair slide
0.156863: 794 shower cap
0.0941176: 797 ski mask
0.0431373: 644 mask
0.0352941: 571 gasmask

The recognition results of the ten pictures are all presented in encoding. From the probability results of recognition, the computing power of the NPU of this iMX8MP development board from Forlinx is still very strong.


According to the official introduction, the i.MX 8M Plus processor has a built-in NPU, which can achieve 2.3 TOPS (Tera Operations Per Second, 1TOPS means that the processor can perform one trillion operations per second) arithmetic processing, and realize advanced AI algorithm processing. And NXP provides some specific use cases for the NPU of the i.MX 8M Plus processor, such as being able to process more than 40,000 English words, and the MobileNet v1 model can handle image classification over 500 images per second.