
The last in a series of articles on machine learning and edge computing comparing Google, Intel, and NVIDIA accelerator hardware along with the Raspberry Pi 3 and 4

When the Raspberry Pi 4 was launched, I sat down to update the benchmarks I’d been putting together for the new generation of accelerator hardware intended for machine learning at the edge. Unfortunately, at the time, the Intel OpenVINO framework did not yet work under Raspbian Buster, which meant I was unable to carry out benchmarking with the Intel hardware.

The original Movidius Stick (top) and newer Intel Neural Compute Stick 2 (bottom).

This changed a couple of weeks ago with the release of OpenVINO 2019.R2, so it was time to take another look at machine learning on the Raspberry Pi 4.

Headline Results From Benchmarking

Connecting the Intel Neural Compute Stick 2 to the USB 3 bus of the new Raspberry Pi 4, we do not see the dramatic ×3 increase in inferencing speed between our original results and the new ones that we saw with the Coral USB Accelerator from Google.

Instead, for both the Intel Neural Compute Stick 2 and the older Movidius Neural Compute Stick, we see only a moderate increase in inferencing speed when the accelerator hardware is connected via the USB 3 bus of the Raspberry Pi rather than USB 2: an increase of just 20 to 30 percent.

However, unlike the Coral USB Accelerator, where inferencing times actually increased by a factor of ×2 when it was connected via the USB 2 rather than the USB 3 bus, we saw no statistically significant difference between the inferencing times recorded when the Neural Compute Stick was connected to the USB 2 bus of the Raspberry Pi 4.

These results seem to suggest that, unlike the Coral USB Accelerator, the Intel Movidius-based hardware was not significantly throttled when used on the older Raspberry Pi hardware and restricted to USB 2.

The overall speed increase when using the hardware with the Raspberry Pi 4’s USB 3 bus was therefore disappointingly small, especially when compared with the Coral USB Accelerator from Google.

A More Detailed Analysis of the Results

Our original benchmarks were done using both TensorFlow and TensorFlow Lite on a Raspberry Pi 3, Model B+, and these were rerun using the new Raspberry Pi 4, Model B, with 4GB of RAM. Inferencing was carried out with the MobileNet v2 SSD and MobileNet v1 0.75 depth SSD models, both models trained on the Common Objects in Context (COCO) dataset. The Xnor.ai AI2GO platform was benchmarked using their ‘medium’ Kitchen Object Detector model. This model is a binary weight network, and while the nature of the training dataset is not known, some technical papers around the model are available.

A single 3888×2916 pixel test image was used containing two recognisable objects in the frame, a banana🍌 and an apple🍎. The image was resized down to 300×300 pixels before presenting it to each model, and the model was run 10,000 times before an average inferencing time was taken.
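The averaging procedure described above can be sketched as a small timing harness. This is a minimal illustration, not the actual benchmark code; the warm-up pass and the stand-in workload are my additions:

```python
import time

def benchmark(infer, runs=10_000, warmup=10):
    """Time a single-inference callable and return the mean in milliseconds."""
    for _ in range(warmup):           # let drivers and caches settle first
        infer()
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    elapsed = time.perf_counter() - start
    return (elapsed / runs) * 1000.0  # mean milliseconds per inference

# Example with a trivial stand-in workload instead of a real model:
mean_ms = benchmark(lambda: sum(range(1000)), runs=1000)
print(f"{mean_ms:.4f} ms per inference")
```

In the real benchmarks the callable would wrap a single forward pass of the model on the resized 300×300 image.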

Benchmarking results in milliseconds for the MobileNet v1 SSD 0.75 depth model and the MobileNet v2 SSD model, both trained using the Common Objects in Context (COCO) dataset with an input size of 300×300.

Overall, comparing our new timings with our previous results, little has changed. The Coral Edge TPU-based hardware keeps its place as ‘best in class’ while, without any evidence of a large speed up from the Intel hardware, the Raspberry Pi 4 running TensorFlow Lite remains competitive with both the NVIDIA Jetson Nano and the Intel Movidius hardware we tested here.

However, probably the biggest takeaway for those wishing to use the new Raspberry Pi 4 for inferencing is the performance gain seen with the Coral USB Accelerator. The addition of USB 3.0 to the Raspberry Pi 4 means we see an approximate ×3 increase in inferencing speed over our original results.

Benchmarking results in milliseconds for the Coral USB Accelerator using the MobileNet v1 SSD 0.75 depth model and the MobileNet v2 SSD model, both trained using the Common Objects in Context (COCO) dataset, for the Raspberry Pi 3, Model B+ (left), and the Raspberry Pi 4, Model B over USB 3.0 (middle) and USB 2 (right).

We see no corresponding speed increase when using the Intel hardware.

Summary

Overall, a very disappointing result for the Intel Movidius-based hardware. Expecting speed ups similar to those seen with the Coral USB Accelerator, we instead saw only a 20 to 30 percent increase in inferencing speed when the hardware was attached to the Raspberry Pi 4’s USB 3 bus.

Preparing the Intel Neural Compute Stick 2 and Raspberry Pi

We last looked at the Intel Neural Compute Stick 2 back in June, just after the launch of the new Raspberry Pi 4, Model B. At the time the OpenVINO framework did not yet work under Raspbian Buster and Python 3.7. However, that changed recently with the release of OpenVINO 2019.R2.

Getting Started with the Intel Neural Compute Stick 2 on Raspbian. (📹: Intel Movidius)

Installation of the OpenVINO framework has not changed significantly from our original hands on with the hardware back in April, although the official installation instructions for Raspbian have been updated and can now be followed without modification.

Go ahead and grab the new release and install it,

$ wget https://download.01.org/opencv/2019/openvinotoolkit/R2/l_openvino_toolkit_runtime_raspbian_p_2019.2.242.tgz
$ tar -zxvf l_openvino_toolkit_runtime_raspbian_p_2019.2.242.tgz
$ mv l_openvino_toolkit_runtime_raspbian_p_2019.2.242 openvino
$ source /home/pi/openvino/bin/setupvars.sh
[setupvars.sh] OpenVINO environment initialized

before appending the setup script to the end of your .bashrc file.

$ echo "source /home/pi/openvino/bin/setupvars.sh" >> ~/.bashrc

Then run the rules script to install new udev rules so that your Raspberry Pi can recognise the Neural Compute Stick when you plug it in.

$ sudo usermod -a -G users "$(whoami)"
$ sh openvino/install_dependencies/install_NCS_udev_rules.sh
Updating udev rules...
Udev rules have been successfully installed.
$

You should go ahead and log out of the Raspberry Pi, and back in again, so that all these changes can take effect. Then plug in the Neural Compute Stick.

Checking dmesg, you should see something a lot like this at the bottom,

[ 1491.382860] usb 1-1.2: new high-speed USB device number 5 using dwc_otg
[ 1491.513491] usb 1-1.2: New USB device found, idVendor=03e7, idProduct=2485
[ 1491.513504] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1491.513513] usb 1-1.2: Product: Movidius MyriadX
[ 1491.513522] usb 1-1.2: Manufacturer: Movidius Ltd.
[ 1491.513530] usb 1-1.2: SerialNumber: 03e72485

If you don’t see similar messages then the stick hasn’t been recognised. Try rebooting your Raspberry Pi and check again,

$ dmesg | grep Movidius
[ 2.062235] usb 1-1.2: Product: Movidius MyriadX
[ 2.062244] usb 1-1.2: Manufacturer: Movidius Ltd.
$

and you should see that the stick has been detected.

The Benchmarking Code

The code from our previous benchmarks was reused unchanged.

Further code can be found in the official Intel Movidius Github repo.

In Closing

Comparing these platforms on an even footing continues to be difficult. But despite the disappointing performance of the Intel hardware, it is clear that the new Raspberry Pi 4 is a solid platform for machine learning inferencing at the edge. The Coral USB Accelerator retains its ‘best in class’ result.

Links to Getting Started Guides

If you’re interested in getting started with any of the accelerator hardware I used during my benchmarks, I’ve put together getting started guides for the Google, Intel, and NVIDIA hardware I looked at during the analysis.

Links to Previous Benchmarks

This benchmarking article was the last in a series looking at accelerator hardware and TensorFlow on the Raspberry Pi 3 and 4. If you’re interested in the previous benchmarks, details are below.


