Hardware-Friendly Machine Learning and Its Applications

A special issue of Micromachines (ISSN 2072-666X). This special issue belongs to the section "E:Engineering and Technology".

Deadline for manuscript submissions: closed (30 November 2022)

Special Issue Editor


Dr. Arman Roohi
Guest Editor
School of Computing, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
Interests: computer architecture; emerging technologies; machine learning; hardware security; neuromorphic computing; bioinformatics

Special Issue Information

Dear Colleagues,

Machine learning algorithms, such as those for image object detection, object recognition, multicategory classification, and scene analysis, have shown impressive performance and success in recent decades across various applications, achieving close to human-level perception rates. However, their computational complexity still challenges state-of-the-art computing platforms, especially when the application of interest is tightly constrained by requirements such as low power, high throughput, and real-time response. In recent years, there have been enormous advances in implementing machine learning algorithms with application-specific hardware. There is a timely need to map the latest learning algorithms to physical hardware to achieve substantial improvements in performance, energy efficiency, and compactness. Recent progress in computational neuroscience and nanoelectronic technology will further help to shed light on future hardware–software platforms for efficient machine learning. This Special Issue aims to explore the potential of efficient machine learning, reveal emerging algorithms and design needs, and promote novel applications. It will also collect contributions that advance methodologies and technologies for the design, evaluation, and optimization of software, hardware, and emerging applications, representing current solutions for the diverse computing scenarios in which machine learning is exploited.

Topics of interest include, but are not limited to, the following:

  • New microarchitecture designs of hardware accelerators for ML;
  • Sparse learning, feature extraction, and personalization;
  • Deep learning with high speed and high power efficiency;
  • Computing models and hardware architecture co-design for machine learning;
  • New microarchitecture designs of hardware accelerators using emerging devices;
  • Tools for the modeling, simulation, and synthesis of hardware accelerators;
  • ML acceleration for edge computing and IoT.

Dr. Arman Roohi
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Micromachines is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • design methodology
  • co-design
  • framework
  • computing methodologies
  • hardware accelerators
  • DNN compression
  • DNN quantization
  • edge AI


Published Papers (10 papers)


Research

20 pages, 5408 KiB  
Article
Hybrid Compression Optimization Based Rapid Detection Method for Non-Coal Conveying Foreign Objects
by Mengchao Zhang, Yanbo Yue, Kai Jiang, Meixuan Li, Yuan Zhang and Manshan Zhou
Micromachines 2022, 13(12), 2085; https://doi.org/10.3390/mi13122085 - 26 Nov 2022
Cited by 1
Abstract
Foreign objects on a conveyor pose a serious threat to the service life of conveyor belts and can cause abnormal damage or even tearing, so fast and effective detection of foreign objects is essential for the safe and efficient operation of belt conveyors. Because the detection algorithm must run on edge computing devices, this paper proposes a hybrid compression method that integrates network sparsification, structured pruning, and knowledge distillation to reduce network parameters and computation. Applied to a YOLOv5 network, three structured pruning strategies are proposed, all of which achieve a good compression effect. The experimental results show that, at a pruning rate of 0.9, the three pruning strategies compress the network parameters by more than 95%, the computation by more than 90%, and the model size by more than 90%; the optimized network accelerates inference on both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) platforms, with a maximum speedup of 70.3% on the GPU and 157.5% on the CPU, providing excellent real-time performance but also causing a large accuracy loss. In contrast, at pruning rates of 0.6–0.9 the proposed method strikes a better balance between real-time performance and detection accuracy (>88.2%). Further, to counter the influence of motion blur, a method that introduces prior knowledge is proposed to improve the robustness of the network and thus ensure the detection effect. The proposed technical solutions help promote the intelligent development of coal mine equipment, ensure the safe and efficient operation of belt conveyors, and support sustainable development.
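
As a rough illustration of the kind of structured (channel) pruning the abstract describes, the sketch below removes whole convolution channels whose BatchNorm scale factors are small, in the spirit of network slimming. It is not the authors' code: the criterion, the single conv/BN pair, and the 0.9 pruning rate are assumptions for demonstration only, and the paper's three strategies may differ.

```python
# Minimal sketch of magnitude-based structured (channel) pruning, assuming a
# BatchNorm-scale criterion; the next layer's input channels must also be shrunk.
import torch
import torch.nn as nn

def select_channels_to_keep(bn: nn.BatchNorm2d, prune_rate: float) -> torch.Tensor:
    """Return indices of channels whose |gamma| survives the pruning rate."""
    gammas = bn.weight.detach().abs()
    n_keep = max(1, int(gammas.numel() * (1.0 - prune_rate)))
    return torch.topk(gammas, n_keep).indices.sort().values

def prune_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d, prune_rate: float):
    """Build smaller Conv2d/BatchNorm2d modules keeping only selected output channels."""
    keep = select_channels_to_keep(bn, prune_rate)
    new_conv = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                         conv.stride, conv.padding, bias=conv.bias is not None)
    new_conv.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        new_conv.bias.data = conv.bias.data[keep].clone()
    new_bn = nn.BatchNorm2d(len(keep))
    new_bn.weight.data = bn.weight.data[keep].clone()
    new_bn.bias.data = bn.bias.data[keep].clone()
    new_bn.running_mean = bn.running_mean[keep].clone()
    new_bn.running_var = bn.running_var[keep].clone()
    return new_conv, new_bn, keep   # 'keep' tells the next layer which inputs remain

# Example: prune one conv/BN pair at a hypothetical 0.9 pruning rate.
conv, bn = nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64)
small_conv, small_bn, kept = prune_conv_bn(conv, bn, prune_rate=0.9)
print(small_conv.weight.shape)      # torch.Size([6, 3, 3, 3])
```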

28 pages, 5544 KiB  
Article
Reinforcement Learning Made Affordable for Hardware Verification Engineers
by Alexandru Dinu and Petre Lucian Ogrutan
Micromachines 2022, 13(11), 1887; https://doi.org/10.3390/mi13111887 - 01 Nov 2022
Cited by 1
Abstract
Constrained random stimulus generation is no longer sufficient to fully simulate the functionality of a digital design. The increasing complexity of today’s hardware devices must be supported by powerful development and simulation environments, powerful computational mechanisms, and appropriate software to exploit them. Reinforcement learning, a powerful technique from the field of artificial intelligence, provides the means to exploit computational resources efficiently and to find even the least obvious correlations between configuration parameters, stimuli applied to digital design inputs, and their functional states. This paper presents a novel software system that simplifies the analysis of simulation outputs and the generation of input stimuli through reinforcement learning methods, and it provides important details on setting up the proposed method to automate the verification process. By understanding how to configure a reinforcement learning algorithm to fit the specifics of a digital design, verification engineers can more quickly adopt this automated and efficient stimulus generation method (compared with classical verification) to bring the digital design to a desired functional state. The results obtained are most promising, with up to 52 times fewer steps needed to reach a target state using reinforcement learning than with constrained random stimulus generation.
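
To make the idea concrete, the sketch below shows a bare-bones tabular Q-learning loop that chooses input stimuli for an abstract device-under-test model. The DUT interface (reset/apply/state), the action set, and the reward shaping are hypothetical stand-ins, not the authors' framework.

```python
# Minimal tabular Q-learning sketch for stimulus generation, assuming a small,
# enumerable DUT state space and a hypothetical simulator interface.
import random
from collections import defaultdict

ACTIONS = list(range(16))          # assumed: 16 possible input stimuli
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
TARGET_STATE = "cfg_done"          # hypothetical coverage goal

q = defaultdict(lambda: [0.0] * len(ACTIONS))

def run_episode(dut, max_steps=200):
    """Drive the DUT toward TARGET_STATE, updating Q-values along the way."""
    state = dut.reset()
    for _ in range(max_steps):
        # epsilon-greedy choice between exploration and exploitation
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        next_state = dut.apply(action)                 # apply stimulus, read back state
        reward = 1.0 if next_state == TARGET_STATE else -0.01
        best_next = max(q[next_state])
        q[state][action] += ALPHA * (reward + GAMMA * best_next - q[state][action])
        state = next_state
        if state == TARGET_STATE:
            break
```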

14 pages, 979 KiB  
Article
Autonomous Binarized Focal Loss Enhanced Model Compression Design Using Tensor Train Decomposition
by Mingshuo Liu, Shiyi Luo, Kevin Han, Ronald F. DeMara and Yu Bai
Micromachines 2022, 13(10), 1738; https://doi.org/10.3390/mi13101738 - 14 Oct 2022
Abstract
Deep learning methods have exhibited a great capacity for object detection tasks, offering a practical and viable approach in many applications. As researchers advance deep learning models to improve their performance, the models derived from these algorithmic improvements often require correspondingly higher computational and power budgets. Recently, model compression and pruning techniques have received more attention as a way to promote the wide deployment of DNN models. Although these techniques have achieved remarkable performance, the class imbalance issue during the model compression process does not vanish. This paper proposes the Autonomous Binarized Focal Loss Enhanced Model Compression (ABFLMC) model to address this issue. Additionally, the proposed ABFLMC automatically derives a dynamic difficulty term during training to improve performance and reduce complexity. A novel hardware architecture is proposed to accelerate inference. Our experimental results show that ABFLMC achieves higher accuracy, faster speed, and a smaller model size.
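
For readers unfamiliar with focal loss, the sketch below shows the standard binary focal-loss formulation, which down-weights easy examples to counter class imbalance; the modulating exponent gamma plays the role of a difficulty term. The autonomous, binarized variant in the paper is more elaborate, so this is only a generic illustration with assumed alpha and gamma values.

```python
# Generic binary focal loss sketch; the ABFLMC variant in the paper adapts the
# difficulty term automatically, which is not reproduced here.
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """logits, targets: tensors of shape (N,); targets in {0, 1}."""
    probs = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = probs * targets + (1 - probs) * (1 - targets)      # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()        # down-weight easy examples

# Example: imbalanced mini-batch dominated by negatives.
logits = torch.tensor([2.0, -1.5, -3.0, 0.2])
targets = torch.tensor([1.0, 0.0, 0.0, 0.0])
print(focal_loss(logits, targets))
```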

12 pages, 6298 KiB  
Article
Enabling Intelligent IoTs for Histopathology Image Analysis Using Convolutional Neural Networks
by Mohammed H. Alali, Arman Roohi, Shaahin Angizi and Jitender S. Deogun
Micromachines 2022, 13(8), 1364; https://doi.org/10.3390/mi13081364 - 22 Aug 2022
Cited by 2
Abstract
Medical imaging is an essential data source that has been leveraged worldwide in healthcare systems. In pathology, histopathology images are used for cancer diagnosis, but these images are very complex, and their analysis by pathologists requires large amounts of time and effort. On the other hand, although convolutional neural networks (CNNs) have produced near-human results in image processing tasks, their processing times are growing and they need ever more computational power. In this paper, we implement a quantized ResNet model on two histopathology image datasets to optimize the inference power consumption. We analyze classification accuracy, energy estimation, and hardware utilization metrics to evaluate our method. First, the original RGB-colored images are used for the training phase, and then compression methods such as channel reduction and sparsity are applied. Our results show an accuracy increase of 6% from RGB at 32-bit (baseline) to the optimized sparse RGB representation with a lower bit-width, i.e., <8:8>. For the energy estimation of the CNN model, we found that the 32-bit RGB color mode consumes considerably more energy than the lower-bit-width and compressed color modes. Moreover, we show that lower-bit-width implementations yield higher resource utilization and a lower memory bottleneck ratio. This work is suitable for inference on energy-limited devices, which are increasingly used in Internet of Things (IoT) systems that support healthcare.
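
As a toy illustration of reducing bit-width for inference, the sketch below applies symmetric uniform quantization to a tensor of weights. The paper's <8:8> fixed-point representation, channel reduction, and sparsity steps are not reproduced; the bit-width and weight tensor here are assumed examples only.

```python
# Simple symmetric uniform quantization sketch: map float32 weights to a lower
# bit-width and back, to observe the error introduced by quantization.
import torch

def quantize(w: torch.Tensor, bits: int = 8):
    """Symmetric uniform quantization; returns integer levels and the scale."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for signed 8-bit
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q * scale

w = torch.randn(64, 3, 3, 3)                        # hypothetical conv weight tensor
q, scale = quantize(w, bits=8)
w_hat = dequantize(q, scale)
print((w - w_hat).abs().max())                      # worst-case quantization error
```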

13 pages, 3289 KiB  
Article
RLC Circuit Forecast in Analog IC Packaging and Testing by Machine Learning Techniques
by Jung-Pin Lai, Ying-Lei Lin, Ho-Chuan Lin, Chih-Yuan Shih, Yu-Po Wang and Ping-Feng Pai
Micromachines 2022, 13(8), 1305; https://doi.org/10.3390/mi13081305 - 12 Aug 2022
Cited by 2
Abstract
For electronic products, printed circuit boards are employed to fix integrated circuits (ICs) and connect all ICs and electronic components, allowing electronic signals to be transmitted smoothly among them. Machine learning (ML) techniques are popular and employed in various fields. To capture the nonlinear data patterns and input–output electrical relationships of analog circuits, this study employs ML techniques to improve operations from modeling to testing in the analog IC packaging and testing industry. Simulating the resistance, inductance, and capacitance for the pin count corresponding to a target electrical specification is a complex process, whose tasks include converting a two-dimensional circuit into a three-dimensional one in simulation and modeling buried-structure operations. In this study, circuit datasets are used to train an ML model to predict resistance (R), inductance (L), and capacitance (C). Least squares support vector regression with genetic algorithms (LSSVR-GA) serves as the ML model for forecasting RLC values, with the genetic algorithm selecting the parameters of the LSSVR model. To demonstrate the performance of the LSSVR model in forecasting RLC values, three other ML models with genetic algorithms, namely backpropagation neural networks (BPNN-GA), random forest (RF-GA), and eXtreme gradient boosting (XGBoost-GA), were applied to the same data. Numerical results illustrate that LSSVR-GA outperformed the three other forecasting models by around 14.84% on average in terms of mean absolute percentage error (MAPE), weighted absolute percent error measure (WAPE), and normalized mean absolute error (NMAE). This study collected data from an IC packaging and testing firm in Taiwan. The innovation and advantage of the proposed method lie in using a machine learning approach, rather than simulation, to forecast RLC values accurately. The numerical results reveal that the developed ML model is effective and efficient in RLC circuit forecasting for the analog IC packaging and testing industry.
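
The sketch below pairs scikit-learn's SVR (a stand-in for the paper's LSSVR) with a very small genetic algorithm that evolves the C and gamma hyperparameters against validation error. The synthetic data, population size, and fitness definition are illustrative assumptions, not the firm's dataset or the authors' GA settings.

```python
# Tiny genetic search over SVR hyperparameters, as a hedged stand-in for the
# LSSVR-GA combination described in the paper; data and settings are illustrative.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 5))                      # hypothetical package/pin features
y = X @ np.array([0.5, 1.2, -0.7, 0.3, 2.0]) + 0.05 * rng.standard_normal(200)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

def fitness(ind):
    C, gamma = ind
    model = SVR(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr)
    return -mean_squared_error(y_va, model.predict(X_va))   # higher is better

# Initial population of (C, gamma) pairs sampled on a log scale.
pop = [(10 ** rng.uniform(-1, 3), 10 ** rng.uniform(-3, 1)) for _ in range(12)]
for _ in range(10):                                  # a few generations
    parents = sorted(pop, key=fitness, reverse=True)[:4]    # selection
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = rng.choice(len(parents), 2, replace=False)
        child_C = np.sqrt(parents[a][0] * parents[b][0]) * 10 ** rng.normal(0, 0.1)
        child_g = np.sqrt(parents[a][1] * parents[b][1]) * 10 ** rng.normal(0, 0.1)
        children.append((child_C, child_g))                  # crossover + mutation
    pop = parents + children

print("best (C, gamma):", max(pop, key=fitness))
```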

11 pages, 1488 KiB  
Article
Using Algorithmic Transformations and Sensitivity Analysis to Unleash Approximations in CNNs at the Edge
by Flavio Ponzina, Giovanni Ansaloni, Miguel Peón-Quirós and David Atienza
Micromachines 2022, 13(7), 1143; https://doi.org/10.3390/mi13071143 - 19 Jul 2022
Cited by 1
Abstract
Previous studies have demonstrated that, up to a certain degree, Convolutional Neural Networks (CNNs) can tolerate arithmetic approximations. Nonetheless, perturbations must be applied judiciously to constrain their impact on accuracy. This is a challenging task, since the implementation of inexact operators is often decided at design time, when the application and its robustness profile are unknown, posing the risk of over-constraining or over-provisioning the hardware. Bridging this gap, we propose a two-phase strategy. Our framework first optimizes the target CNN model, reducing the bitwidth of weights and activations and enhancing error resiliency, so that inexact operations can be performed as frequently as possible. Then, it selectively assigns CNN layers to exact or inexact hardware based on a sensitivity metric. Our results show that, within a 5% accuracy degradation, our methodology, including a highly inexact multiplier design, can reduce the cost of MAC operations in CNN inference by up to 83.6% compared to state-of-the-art optimized exact implementations.
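
A minimal version of the per-layer sensitivity idea is sketched below: each layer's weights are perturbed with noise emulating an inexact multiplier, the accuracy drop is measured, and the least sensitive layers are the natural candidates for approximate hardware. The multiplicative-noise model and the helper names are assumptions, not the paper's inexact-operator design.

```python
# Per-layer sensitivity sketch: perturb one layer at a time and record the
# accuracy drop, as a crude proxy for tolerance to inexact arithmetic.
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def accuracy(model, loader):
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

@torch.no_grad()
def layer_sensitivity(model, loader, rel_noise=0.05):
    """Return {layer_name: accuracy drop} under multiplicative weight noise."""
    baseline = accuracy(model, loader)
    drops = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            perturbed = copy.deepcopy(model)
            w = dict(perturbed.named_modules())[name].weight
            w.mul_(1.0 + rel_noise * torch.randn_like(w))   # stand-in for inexact ops
            drops[name] = baseline - accuracy(perturbed, loader)
    return drops

# Layers with the smallest drop would be mapped to inexact hardware first.
```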

12 pages, 2941 KiB  
Article
Support Vector Machine–Based Model for 2.5–5.2 GHz CMOS Power Amplifier
by Shaohua Zhou, Cheng Yang and Jian Wang
Micromachines 2022, 13(7), 1012; https://doi.org/10.3390/mi13071012 - 27 Jun 2022
Cited by 5
Abstract
A power amplifier (PA) is the core module of a wireless communication system; changes in its specifications directly affect the system’s performance and may even lead to system failure. Furthermore, changes in PA specifications are closely related to changes in temperature. To study the influence of PA specification changes on the system, we used a support vector machine (SVM) to model the temperature characteristics of the PA. A recurring question in SVM modeling is how much experimental data must be used to meet the modeling requirements. To address this issue, we investigated the effect of different amounts of training data on the SVM model. The results show that only 75% of the experimental data needs to be used in the modeling process to satisfy the requirements of the SVM model, so the number of measurement points required in the PA specification degradation experiment can be reduced by 25%. The results of this paper serve as a guide for planning the number of experimental measurement points and reducing measurement cost and time.
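
The data-budget question the paper studies can be probed with a simple learning-curve experiment like the one sketched below: train an SVM regressor on growing fractions of the measurements and watch where the validation error levels off. The synthetic temperature/gain data and the hyperparameters are placeholders, not the measured PA characteristics.

```python
# Learning-curve sketch: how much training data does an SVM model need?
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
temp = rng.uniform(-40, 125, size=(400, 1))                       # hypothetical temperatures (degC)
gain = 20 - 0.02 * temp[:, 0] + 0.1 * rng.standard_normal(400)    # hypothetical PA gain (dB)

X_tr, X_va, y_tr, y_va = train_test_split(temp, gain, test_size=0.3, random_state=1)
for frac in (0.25, 0.5, 0.75, 1.0):
    n = int(frac * len(X_tr))
    model = SVR(kernel="rbf", C=10.0, gamma="scale").fit(X_tr[:n], y_tr[:n])
    err = mean_squared_error(y_va, model.predict(X_va))
    print(f"{frac:.0%} of training data -> validation MSE {err:.4f}")
```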

10 pages, 1822 KiB  
Article
Motor Imagery EEG Classification Based on Transfer Learning and Multi-Scale Convolution Network
by Zhanyuan Chang, Congcong Zhang and Chuanjiang Li
Micromachines 2022, 13(6), 927; https://doi.org/10.3390/mi13060927 - 10 Jun 2022
Cited by 8
Abstract
For the successful application of brain–computer interface (BCI) systems, accurate recognition of electroencephalography (EEG) signals is one of the core issues. To address individual differences in EEG signals and the scarcity of EEG data for classification and recognition, an attention-based multi-scale convolution network was designed, and a transfer learning data alignment algorithm was introduced to explore the application of transfer learning to motor imagery EEG signals. Dataset 2a of BCI Competition IV was used to verify the designed dual-channel attention-module migration alignment with convolution neural network (MS-AFM). Experimental results show that the classification recognition rate improves with the addition of the alignment algorithm and adaptive adjustment in transfer learning; the average classification recognition rate across nine subjects was 86.03%.
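
A common data-alignment step in EEG transfer learning is Euclidean alignment, which whitens each subject's trials by the inverse square root of their mean spatial covariance so that trials from different subjects become more comparable. The sketch below shows that generic procedure; it may differ from the alignment algorithm used in the paper, and the randomly generated trials merely stand in for real EEG recordings.

```python
# Euclidean-alignment sketch for EEG trials, a generic pre-step before transfer.
import numpy as np
from scipy.linalg import fractional_matrix_power

def euclidean_align(trials: np.ndarray) -> np.ndarray:
    """trials: (n_trials, n_channels, n_samples) array from one subject."""
    covs = np.stack([t @ t.T / t.shape[1] for t in trials])     # per-trial spatial covariance
    mean_cov = covs.mean(axis=0)
    whitener = np.real(fractional_matrix_power(mean_cov, -0.5)) # R^{-1/2}, drop numeric residue
    return np.stack([whitener @ t for t in trials])

rng = np.random.default_rng(2)
eeg = rng.standard_normal((72, 22, 1000))   # 72 trials, 22 channels (as in BCI IV dataset 2a)
aligned = euclidean_align(eeg)
print(aligned.shape)
```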

13 pages, 2355 KiB  
Article
Modeling of Key Specifications for RF Amplifiers Using the Extreme Learning Machine
by Shaohua Zhou, Cheng Yang and Jian Wang
Micromachines 2022, 13(5), 693; https://doi.org/10.3390/mi13050693 - 28 Apr 2022
Cited by 5
Abstract
The amplifier is a key component of the radio frequency (RF) front-end, and its specifications directly determine the performance of the system in which it is used. Unfortunately, amplifier specifications degrade with temperature and can even lead to system failure. To study how system failure is affected by amplifier specification degradation, the degradation must be coupled into the system’s optimization design, which in turn requires modeling how the amplifier specifications change with temperature. In this paper, the temperature characteristics of two amplifiers are modeled using an extreme learning machine (ELM); the results show that the model agrees well with the measurements and can effectively reduce measurement time and cost.
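
An extreme learning machine is a single-hidden-layer network whose input weights are random and whose output weights are solved in closed form. The sketch below shows that core recipe on synthetic temperature-versus-gain data, which is a placeholder rather than the amplifiers measured in the paper.

```python
# Minimal extreme learning machine (ELM) sketch: random hidden layer, output
# weights solved by least squares (pseudo-inverse).
import numpy as np

class ELM:
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)            # random hidden-layer features
        self.beta = np.linalg.pinv(H) @ y           # closed-form output weights
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Hypothetical data: amplifier gain drifting with temperature.
rng = np.random.default_rng(3)
temp = rng.uniform(-40, 125, size=(300, 1))
gain = 15 - 0.03 * temp[:, 0] + 0.05 * rng.standard_normal(300)
model = ELM(n_hidden=40).fit(temp / 125.0, gain)    # simple input scaling
print(np.abs(model.predict(temp / 125.0) - gain).mean())
```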

27 pages, 16043 KiB  
Article
A Heterogeneous Architecture for the Vision Processing Unit with a Hybrid Deep Neural Network Accelerator
by Peng Liu, Zikai Yang, Lin Kang and Jian Wang
Micromachines 2022, 13(2), 268; https://doi.org/10.3390/mi13020268 - 07 Feb 2022
Cited by 2
Abstract
The vision chip is widely used to acquire and process images. It connects the image sensor directly with the vision processing unit (VPU) to execute vision tasks. Modern vision tasks mainly consist of image signal processing (ISP) algorithms and deep neural networks (DNNs). However, traditional VPUs are unsuitable for DNNs, and DNN processing units (DNPUs) cannot process ISP algorithms. Meanwhile, vision tasks typically use only CNNs and CNN-RNN frameworks, and few DNPUs are designed specifically for them. In this paper, we propose a heterogeneous architecture for the VPU with a hybrid accelerator for DNNs. It can process ISP, CNN, and hybrid DNN subtasks on one unit. Furthermore, we present a sharing scheme that multiplexes the hardware resources for different subtasks. We also adopt a pipelined workflow for the vision tasks to make full use of the different processing modules and achieve a high processing speed. We implement the proposed VPU on a field-programmable gate array (FPGA) and test several vision tasks on it. The experimental results show that our design processes vision tasks efficiently, with an average performance of 22.6 giga operations per second per watt (GOPS/W).
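
The pipelining idea is easy to mimic in software: run the ISP stage and the DNN stage as concurrent workers connected by a queue so that, once the pipeline fills, both modules stay busy on different frames. The sketch below is a thread-and-queue analogue of that schedule with placeholder processing functions; it is not the FPGA design itself.

```python
# Software analogue of a two-stage ISP -> DNN pipeline using worker threads,
# so consecutive frames overlap across the two processing modules.
import queue
import threading

def isp_stage(frame):
    return f"isp({frame})"            # placeholder for demosaic/denoise/etc.

def dnn_stage(feature):
    return f"dnn({feature})"          # placeholder for neural-network inference

frames_in, isp_to_dnn, results = queue.Queue(), queue.Queue(maxsize=4), queue.Queue()
STOP = object()

def isp_worker():
    while (frame := frames_in.get()) is not STOP:
        isp_to_dnn.put(isp_stage(frame))
    isp_to_dnn.put(STOP)

def dnn_worker():
    while (feat := isp_to_dnn.get()) is not STOP:
        results.put(dnn_stage(feat))

threads = [threading.Thread(target=isp_worker), threading.Thread(target=dnn_worker)]
for t in threads:
    t.start()
for i in range(8):                    # feed eight frames through the pipeline
    frames_in.put(f"frame{i}")
frames_in.put(STOP)
for t in threads:
    t.join()
while not results.empty():
    print(results.get())
```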
