Low Power AI

A special issue of Journal of Low Power Electronics and Applications (ISSN 2079-9268).

Deadline for manuscript submissions: closed (1 June 2022) | Viewed by 26261

Special Issue Editor


Dr. Weidong Kuang
Guest Editor
Department of Electrical and Computer Engineering, The University of Texas Rio Grande Valley, Edinburg, TX 78539-2999, USA
Interests: machine learning; low-power IC design; asynchronous digital circuits

Special Issue Information

Dear Colleagues,

As artificial intelligence (AI) continues to proliferate into new domains such as the IoT, mobile devices, autonomous vehicles, and drones, one of the main obstacles to deploying AI algorithms is power consumption. The primary focus of this Special Issue is the exploration of power management and energy-efficient techniques and architectures for cognitive computing and machine learning.

Authors are invited to submit regular papers following the JLPEA submission guidelines, within the remit of this Special Issue call. Topics include, but are not limited to, the following:

  • Low power design at device/circuit level for machine learning
  • Architectures for the edge: IoT, automotive, and mobile
  • Approximation, quantization, and reduced-precision computing
  • Neural network pruning, tuning, and automatic architecture search
  • Hardware/software co-design techniques for low power
  • Neural network architectures for resource-constrained devices
  • Novel memory architectures for machine learning
  • Communication/computation scheduling for better performance and energy
  • Load balancing and efficient task distribution techniques
  • Exploring the balance between precision, performance, power, and energy
  • Exploration of new and efficient applications for machine learning
  • Energy-efficient on-device learning techniques

Dr. Weidong Kuang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Low Power Electronics and Applications is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (8 papers)


Research

20 pages, 1066 KiB  
Article
The Potential of SoC FPAAs for Emerging Ultra-Low-Power Machine Learning
by Jennifer Hasler
J. Low Power Electron. Appl. 2022, 12(2), 33; https://doi.org/10.3390/jlpea12020033 - 06 Jun 2022
Cited by 3 | Viewed by 3606
Abstract
Large-scale field-programmable analog arrays (FPAAs) have the potential to handle machine inference and learning applications with significantly lower energy requirements, potentially alleviating the high cost of these processes today, even in cloud-based systems. FPAA devices enable embedded machine learning, one form of physical mixed-signal computing, allowing machine learning and inference on low-power embedded platforms, particularly at the edge. This discussion reviews the current capabilities of large-scale FPAAs and considers the future potential of these SoC FPAA devices, including the questions that must be answered before FPAA devices become as ubiquitous as FPGAs. Today's FPAA devices integrate analog and digital fabric, as well as specialized processors and infrastructure, making them a platform for mixed-signal development and analog-enabled computing. We show that next-generation FPAAs can handle the load of 10,000–10,000,000,000 PMAC required by present and future large fielded applications at energy levels orders of magnitude lower than those expected of current technology, motivating the development of these new generations of FPAA devices.
(This article belongs to the Special Issue Low Power AI)
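
As a rough, back-of-envelope illustration of the energy stakes raised in this abstract, the Python sketch below estimates the power that the quoted 10,000–10,000,000,000 PMAC workload range would demand under assumed per-MAC energies. Both energy figures (5 fJ/MAC analog, 0.5 pJ/MAC digital) are placeholder assumptions for illustration, not values from the article.

```python
# Back-of-envelope power estimate for the workload range quoted in the
# abstract (read here as peta-MAC/s). The per-MAC energies are assumed,
# illustrative values, not numbers taken from the paper.

PMAC_S = 1e15  # one peta-MAC per second, in MAC/s

E_MAC = {  # assumed energy per multiply-accumulate, in joules
    "analog (assumed 5 fJ/MAC)": 5e-15,
    "digital (assumed 0.5 pJ/MAC)": 0.5e-12,
}

for workload in (1e4, 1e7, 1e10):      # workload in PMAC/s
    rate = workload * PMAC_S           # total MAC/s
    for name, energy in E_MAC.items():
        power = rate * energy          # watts = (MAC/s) * (J/MAC)
        print(f"{workload:.0e} PMAC/s, {name}: {power:,.0f} W")
```

Even under the optimistic analog assumption, the upper end of the range runs to gigawatts, which is why orders-of-magnitude energy reductions, rather than incremental ones, are the target here.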

24 pages, 1645 KiB  
Article
Big–Little Adaptive Neural Networks on Low-Power Near-Subthreshold Processors
by Zichao Shen, Neil Howard and Jose Nunez-Yanez
J. Low Power Electron. Appl. 2022, 12(2), 28; https://doi.org/10.3390/jlpea12020028 - 18 May 2022
Cited by 3 | Viewed by 2883
Abstract
This paper investigates the energy savings that near-subthreshold processors can obtain in edge AI applications and proposes strategies to improve them while maintaining application accuracy. The selected processors deploy adaptive voltage scaling techniques, in which the frequency and voltage levels of the processor core are determined at run-time. In these systems, embedded RAM and flash memory are typically limited to less than 1 megabyte to save power. This limited memory restricts the complexity of the neural network models that can be mapped to these devices and forces trade-offs between accuracy and battery life. To address these issues, we propose and evaluate alternative ‘big–little’ neural network strategies to improve battery life while maintaining prediction accuracy. The strategies are applied to a human activity recognition application selected as a demonstrator; compared to the original network, the best configurations obtain an energy reduction of 80% while maintaining the original level of inference accuracy.
(This article belongs to the Special Issue Low Power AI)
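
The following minimal sketch shows one common form of the ‘big–little’ policy the abstract describes: the little network answers every input, and the big network is woken only when the little one is unsure. The confidence threshold and the two model stubs are assumptions for illustration, not the authors' networks or configuration.

```python
import numpy as np

CONF_THRESHOLD = 0.8  # assumed cutoff, not the paper's tuned value

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def big_little_predict(x, little_net, big_net):
    """little_net/big_net map an input to raw class logits."""
    probs = softmax(little_net(x))            # cheap pass on every sample
    if probs.max() >= CONF_THRESHOLD:
        return int(probs.argmax()), "little"  # big network stayed idle
    probs = softmax(big_net(x))               # rare, expensive fallback
    return int(probs.argmax()), "big"
```

The energy win comes from how rarely the fallback fires: if the little network is confident on most samples, the big network, and the memory traffic it implies, stays powered down most of the time.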

17 pages, 425 KiB  
Article
A Generalistic Approach to Machine-Learning-Supported Task Migration on Real-Time Systems
by Octavio Delgadillo, Bernhard Blieninger, Juri Kuhn and Uwe Baumgarten
J. Low Power Electron. Appl. 2022, 12(2), 26; https://doi.org/10.3390/jlpea12020026 - 03 May 2022
Viewed by 2270
Abstract
Consolidating tasks onto a smaller number of electronic control units (ECUs) is an important strategy for optimizing costs and resources in the automotive industry. In our research, we aim to enable ECU consolidation by migrating tasks at runtime between different ECUs, which adds redundancy and fail-safety capabilities to the system. In this paper, we present a setup with a generalistic and modular architecture that allows for integrating and testing different ECU architectures and machine learning (ML) models. As part of a holistic testbed, we introduce a collection of reproducible tasks, as well as a toolchain that controls the dynamic migration of tasks depending on ECU status and load. The migration is aided by machine learning predictions of the schedulability of possible future task distributions. To demonstrate the capabilities of the setup, we show its integration with FreeRTOS-based ECUs and two ML models—a long short-term memory (LSTM) network and a spiking neural network—along with a collection of tasks to distribute among the ECUs. Our approach shows promising potential for machine-learning-based schedulability analysis and enables a comparison between different ML models.
(This article belongs to the Special Issue Low Power AI)
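
As a hedged sketch of how an ML schedulability predictor can drive this kind of migration decision, the snippet below scores candidate task placements and returns the first one the model predicts to be schedulable. The Task/Ecu records, the feature layout, and the predictor stub are hypothetical stand-ins, not the paper's toolchain; exhaustive enumeration is used only to keep the illustration short.

```python
from collections import namedtuple
from itertools import product

Task = namedtuple("Task", "name wcet period")  # hypothetical task model
Ecu = namedtuple("Ecu", "name load")           # hypothetical ECU model

def choose_placement(tasks, ecus, predictor):
    """Return the first {task: ecu} mapping that `predictor` (standing in
    for the trained LSTM or spiking network) classifies as schedulable."""
    for combo in product(ecus, repeat=len(tasks)):
        placement = dict(zip(tasks, combo))
        features = [(t.wcet, t.period, e.load) for t, e in placement.items()]
        if predictor(features) > 0.5:          # predicted schedulable
            return placement
    return None                                # no feasible placement found
```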

20 pages, 2577 KiB  
Article
Low-Power Deep Learning Model for Plant Disease Detection for Smart-Hydroponics Using Knowledge Distillation Techniques
by Aminu Musa, Mohammed Hassan, Mohamed Hamada and Farouq Aliyu
J. Low Power Electron. Appl. 2022, 12(2), 24; https://doi.org/10.3390/jlpea12020024 - 26 Apr 2022
Cited by 7 | Viewed by 4078
Abstract
Recent advances in computing allow researchers to propose the automation of hydroponic systems to boost efficiency and reduce manpower demands, hence increasing agricultural produce and profit. A completely automated hydroponic system should be equipped with tools capable of detecting plant diseases in real time. Despite the availability of deep-learning-based plant disease detection models, the existing models are not designed for an embedded system environment and cannot realistically be deployed on resource-constrained IoT devices such as a Raspberry Pi or a smartphone. Among the drawbacks of the existing models are high computational resource requirements, high power consumption, rapid energy dissipation, and large storage footprints due to their large, complex structures. Therefore, in this paper, we propose a low-power deep learning model for plant disease detection using knowledge distillation techniques. The proposed low-power model has the simple network structure of a shallow neural network, and its parameters were reduced by more than 90%, which lowers both its computational requirements and its power consumption. The proposed model has a maximum power consumption of 6.22 W, significantly lower than that of existing models, and achieves a detection accuracy of 99.4%.
(This article belongs to the Special Issue Low Power AI)
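
Knowledge distillation itself follows a standard recipe, sketched below in PyTorch: the student is trained against the teacher's temperature-softened outputs blended with the ordinary hard-label loss. The temperature T and weight alpha are assumed hyperparameters for illustration, not the authors' settings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard distillation objective: soft teacher targets + hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                              # rescale gradients by T^2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

The distilled student keeps most of the teacher's accuracy in a far smaller network, which is what permits the parameter reduction and power figures reported above.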

16 pages, 569 KiB  
Article
DSCU: Accelerating CNN Inference in FPGAs with Dual Sizes of Compute Unit
by Zhenshan Bao, Junnan Guo, Wenbo Zhang and Hongbo Dang
J. Low Power Electron. Appl. 2022, 12(1), 11; https://doi.org/10.3390/jlpea12010011 - 13 Feb 2022
Cited by 2 | Viewed by 2707
Abstract
FPGA-based accelerators have shown great potential in improving the performance of CNN inference. However, the existing FPGA-based approaches suffer from low compute unit (CU) efficiency due to their large number of redundant computations, leading to high levels of performance degradation. In this paper, we show that no single CU can perform best across all the convolutional layers (CONV-layers). To this end, we propose the use of dual sizes of compute unit (DSCU), an approach that aims to accelerate CNN inference in FPGAs. The key idea of DSCU is to select the best combination of CUs via dynamic programming scheduling for each CONV-layer and then assemble each CONV-layer combination into a computing solution for the given CNN to deploy in FPGAs. The experimental results show that DSCU can achieve a performance density of 3.36 × 10⁻³ GOPs/slice on a Xilinx Zynq ZU3EG, which is 4.29 times higher than that achieved by other approaches.
(This article belongs to the Special Issue Low Power AI)
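
The dynamic-programming selection named in the abstract can be pictured as a small knapsack-style search: choose one CU option per CONV-layer so that total latency is minimized under a shared resource budget. The sketch below is a hedged reconstruction of that general idea with hypothetical cost tables; the actual DSCU scheduler is more detailed.

```python
import math

def dp_schedule(latency, usage, budget):
    """latency[i][j] / usage[i][j]: latency and resource use of CONV-layer i
    on CU option j. Returns (per-layer CU choices, total latency)."""
    dp = {0: (0.0, [])}                    # resources used -> (latency, plan)
    for lat_i, use_i in zip(latency, usage):
        nxt = {}
        for used, (lat, plan) in dp.items():
            for j, (l, u) in enumerate(zip(lat_i, use_i)):
                nu = used + u
                if nu <= budget and (nu not in nxt or lat + l < nxt[nu][0]):
                    nxt[nu] = (lat + l, plan + [j])
        dp = nxt
    if not dp:
        return None, math.inf              # budget too small for any plan
    best_lat, best_plan = min(dp.values())
    return best_plan, best_lat
```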

14 pages, 1356 KiB  
Article
Mapping Transformation Enabled High-Performance and Low-Energy Memristor-Based DNNs
by Md. Oli-Uz-Zaman, Saleh Ahmad Khan, Geng Yuan, Zhiheng Liao, Jingyan Fu, Caiwen Ding, Yanzhi Wang and Jinhui Wang
J. Low Power Electron. Appl. 2022, 12(1), 10; https://doi.org/10.3390/jlpea12010010 - 10 Feb 2022
Cited by 6 | Viewed by 3063
Abstract
When deep neural networks (DNNs) are extensively utilized for edge AI (artificial intelligence), for example, in the Internet of Things (IoT) and autonomous vehicles, conventional CMOS (complementary metal oxide semiconductor)-based computers suffer from overly large computing loads. Memristor-based devices are emerging as an option for conducting in-memory computing for DNNs, making them faster, much more energy efficient, and accurate. Despite their excellent properties, memristor-based DNNs are not yet commercially available because of Stuck-At-Fault (SAF) defects. A Mapping Transformation (MT) method is proposed in this paper to mitigate SAF defects in memristor-based DNNs. First, the weight distribution for the VGG8 model with the CIFAR10 dataset is presented and analyzed. Then, the MT method is used to recover inference accuracies at 0.1% to 50% SAFs for two typical cases, SA1 (Stuck-At-One):SA0 (Stuck-At-Zero) = 5:1 and 1:5, respectively. The experimental results show that the MT method can recover DNNs to their original inference accuracies (90%) when the ratio of SAFs is smaller than 2.5%. Moreover, even in the extreme condition of 50% SAFs, it still recovers the inference accuracy to 80% and 21% for the two cases, respectively. In addition, the MT method acts as a regulator to avoid the energy and latency overhead generated by SAFs. Finally, the immunity of the MT method against non-linearity is investigated, and we conclude that the MT method can benefit accuracy, energy, and latency even with high non-linearity (LTP = 4 and LTD = −4).
(This article belongs to the Special Issue Low Power AI)
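
The paper's Mapping Transformation algorithm is not reproduced here, but the generic fault-aware mapping below conveys the flavor of defect mitigation on a crossbar: steer the most sensitive weights away from the rows with the most stuck cells. This is a simplified, hypothetical stand-in, not the authors' MT method.

```python
import numpy as np

def fault_aware_mapping(weights, stuck_mask):
    """weights: (R, C) array; stuck_mask: (R, C) bool, True where a memristor
    is stuck. Returns {logical row: physical row} pairing the largest-magnitude
    weight rows with the healthiest hardware rows."""
    sensitivity = np.abs(weights).sum(axis=1)  # importance of each weight row
    health = stuck_mask.sum(axis=1)            # stuck cells per hardware row
    logical = np.argsort(-sensitivity)         # most important rows first
    physical = np.argsort(health)              # healthiest rows first
    return dict(zip(logical.tolist(), physical.tolist()))
```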

12 pages, 3153 KiB  
Article
Hardware/Software Solution for Low Power Evaluation of Tsunami Danger
by Mikhail Lavrentiev, Konstantin Lysakov, Andrey Marchuk, Konstantin Oblaukhov and Mikhail Shadrin
J. Low Power Electron. Appl. 2022, 12(1), 6; https://doi.org/10.3390/jlpea12010006 - 21 Jan 2022
Cited by 2 | Viewed by 2530
Abstract
Carbon footprint reduction has been drawing more and more attention these days, and reducing energy consumption is among the basic directions along this line. This paper concerns a low-energy approach to tsunami danger evaluation. After several disastrous tsunamis of the twenty-first century, the question arises of whether it is possible to evaluate, within a couple of minutes, the tsunami wave parameters expected at a particular location; the wave takes only around 20 min to approach the nearest coast after a seismic event offshore of Japan. Currently, the main tool for studying tsunamis is computer modeling. In particular, when a major underwater earthquake is detected, the expected tsunami height near the coastline can be estimated by a series of numerical experiments covering various scenarios of wave generation and propagation. Reducing the calculation time of such scenarios, and the energy this calculation consumes, is the scope of this study. Moreover, a major earthquake may cause an electric power shutdown (as in the accident at the Fukushima nuclear power station in Japan on 11 March 2011), so the solution should be low-energy, preferably based on regular personal computers (PCs) or laptops. The way to achieve the requested performance of numerical modeling on the PC platform is a combination of efficient algorithms and their hardware acceleration. Following this strategy, a solution for the fast numerical simulation of tsunami wave propagation has been proposed. Most tsunami researchers use the shallow-water approximation to simulate tsunami wave propagation over deep water areas. For the software implementation, the MacCormack finite-difference scheme was chosen, as it is suitable for pipelining. For hardware code acceleration, a special processor (calculator) was designed on a field-programmable gate array (FPGA) platform. This combination was tested for precision against a reference code and against exact solutions (known for some special cases of the bottom profile). The achieved performance makes it possible to calculate wave propagation over a 1000 × 500 km water area in 1 min at a mesh size of about 250 m. This is nearly 300 times faster than a regular PC and 10 times faster than a central processing unit (CPU)-based implementation. Implemented in tsunami warning systems, this result will make it possible to reduce human casualties and economic losses from so-called near-field tsunamis. The present paper discusses a new aspect of such an implementation, namely low energy consumption. The corresponding measurements were performed for three platforms (a PC and two types of FPGA), and a comparison of the measured energy consumption is given. As the numerical simulation of numerous tsunami propagation scenarios from different sources is needed for coastal tsunami zoning, the integrated energy savings are expected to be substantial. So far, tsunami researchers have not used FPGA-based acceleration of computer code; perhaps the energy-saving aspect will promote the use of FPGAs in tsunami research.
The approach of designing special FPGA-based processors for the fast solution of various engineering problems on a PC could be extended to other areas, such as bioinformatics (motif search in DNA sequences and other algorithms of genome analysis and molecular dynamics) and seismic data processing (three-dimensional (3D) wave package decomposition, data compression, noise suppression, etc.).
(This article belongs to the Special Issue Low Power AI)
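
For readers unfamiliar with the numerical kernel, the sketch below is a minimal one-dimensional MacCormack step for the shallow-water equations, the predictor-corrector scheme the authors chose for its pipelining-friendly structure. It assumes a flat bottom, no friction, and untouched boundary cells; the real solver is two-dimensional with bathymetry.

```python
import numpy as np

g = 9.81  # gravitational acceleration, m/s^2

def flux(h, hu):
    """Shallow-water flux for state (h, hu): mass and momentum."""
    return np.array([hu, hu**2 / h + 0.5 * g * h**2])

def maccormack_step(h, hu, dx, dt):
    """One MacCormack update; boundary cells are left unchanged."""
    U, F = np.array([h, hu]), flux(h, hu)
    Up = U.copy()                              # predictor: forward differences
    Up[:, :-1] = U[:, :-1] - dt / dx * (F[:, 1:] - F[:, :-1])
    Fp = flux(Up[0], Up[1])
    Un = U.copy()                              # corrector: backward differences
    Un[:, 1:] = 0.5 * (U[:, 1:] + Up[:, 1:] - dt / dx * (Fp[:, 1:] - Fp[:, :-1]))
    return Un[0], Un[1]
```

The scheme's appeal for FPGAs is visible in the structure: each pass touches only neighboring cells with fixed arithmetic, so the predictor and corrector stages can stream through a hardware pipeline.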

12 pages, 561 KiB  
Article
CORDIC Hardware Acceleration Using DMA-Based ISA Extension
by Erez Manor, Avrech Ben-David and Shlomo Greenberg
J. Low Power Electron. Appl. 2022, 12(1), 4; https://doi.org/10.3390/jlpea12010004 - 15 Jan 2022
Cited by 6 | Viewed by 3971
Abstract
The use of RISC-based embedded processors aimed at low cost and low power is becoming an increasingly popular ecosystem for both hardware and software development. High-performance yet low-power embedded processors may be attained via the use of hardware acceleration and Instruction Set Architecture (ISA) extension. Recent AI publications have demonstrated the use of the Coordinate Rotation Digital Computer (CORDIC) as a dedicated low-power solution for solving the nonlinear equations that arise in neural networks (NNs). This paper proposes an ISA extension to support floating-point CORDIC, providing efficient hardware acceleration for mathematical functions. A new DMA-based ISA extension approach integrated with a pipelined CORDIC accelerator is proposed. The CORDIC ISA extension is directly interfaced with a standard processor data path, allowing efficient implementation of new trigonometric ALU-based custom instructions. The proposed DMA-based CORDIC accelerator can also be used to perform repeated array calculations, offering a significant speedup over software implementations. The proposed accelerator is evaluated on an Intel Cyclone-IV FPGA as an extension to the Nios processor. Experimental results show a speedup of over three orders of magnitude compared with software implementations when applied to trigonometric arrays, outperforming an existing commercial CORDIC hardware accelerator.
(This article belongs to the Special Issue Low Power AI)
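
The textbook rotation-mode CORDIC below shows the kernel such an accelerator pipelines: sine and cosine computed with only shifts, adds, and a small arctangent table. This is a plain floating-point Python sketch; the hardware version would operate in fixed point, one micro-rotation per pipeline stage.

```python
import math

N = 24                                            # iterations ~ bits of accuracy
ANGLES = [math.atan(2.0**-i) for i in range(N)]   # precomputed atan(2^-i) table
K = 1.0
for i in range(N):
    K /= math.sqrt(1.0 + 2.0**(-2 * i))           # pre-applied gain correction

def cordic_sincos(theta):
    """theta in [-pi/2, pi/2]; returns (sin, cos) via N micro-rotations."""
    x, y, z = K, 0.0, theta
    for i in range(N):
        d = 1.0 if z >= 0.0 else -1.0             # rotate toward z = 0
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * ANGLES[i]
    return y, x

# e.g. cordic_sincos(math.pi / 6) -> approximately (0.5, 0.8660)
```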