Research

9 pages, 5927 KiB

Open Access Article

Embedded GPU Implementation for High-Performance Ultrasound Imaging

by Stefano Rossi and Enrico Boni

Electronics 2021, 10(8), 884; https://0-doi-org.brum.beds.ac.uk/10.3390/electronics10080884 - 08 Apr 2021

Cited by 1 | Viewed by 2634

Methods of increasing complexity are currently being proposed for ultrasound (US) echographic signal processing. Graphics Processing Unit (GPU) resources, which allow massive exploitation of parallel computing, are ideal candidates for these tasks. Many high-performance US instruments, including open scanners like ULA-OP 256, have an architecture based only on Field-Programmable Gate Arrays (FPGAs) and/or Digital Signal Processors (DSPs). This paper proposes the implementation of the embedded NVIDIA Jetson Xavier AGX module on board ULA-OP 256. The system architecture was revised to allow the introduction of a new Peripheral Component Interconnect Express (PCIe) communication channel, while maintaining backward compatibility with all other embedded computing resources already on board. Moreover, the Input/Output (I/O) peripherals of the module make the ultrasound system independent, freeing the user from the need to use an external controlling PC.

(This article belongs to the Special Issue Advanced Embedded HW/SW Development)

21 pages, 4612 KiB

Open Access Feature Paper Article

Customizable Vector Acceleration in Extreme-Edge Computing: A RISC-V Software/Hardware Architecture Study on VGG-16 Implementation

by Stefano Sordillo, Abdallah Cheikh, Antonio Mastrandrea, Francesco Menichelli and Mauro Olivieri

Electronics 2021, 10(4), 518; https://0-doi-org.brum.beds.ac.uk/10.3390/electronics10040518 - 23 Feb 2021

Cited by 6 | Viewed by 3060

Abstract

Computing in the cloud-edge continuum, as opposed to cloud computing, relies on high-performance processing on the extreme edge of the Internet of Things (IoT) hierarchy. Hardware acceleration is mandatory to meet the performance requirements, yet it can be tightly tied to particular computation kernels, even within the same application. Vector-oriented hardware acceleration has gained renewed interest to support artificial intelligence (AI) applications like convolutional networks or classification algorithms. We present a comprehensive investigation of the performance and power efficiency achievable by configurable vector acceleration subsystems, obtaining evidence of both the high potential of the proposed microarchitecture and the advantage of hardware customization in total transparency to the software program.

(This article belongs to the Special Issue Advanced Embedded HW/SW Development)

12 pages, 598 KiB

Open Access Article

Algorithmic-Level Approximate Tensorial SVM Using High-Level Synthesis on FPGA

by Hamoud Younes, Ali Ibrahim, Mostafa Rizk and Maurizio Valle

Electronics 2021, 10(2), 205; https://0-doi-org.brum.beds.ac.uk/10.3390/electronics10020205 - 17 Jan 2021

Cited by 13 | Viewed by 3120

Abstract

Approximate Computing Techniques (ACTs) are promising solutions for reducing the energy, latency and hardware size of embedded implementations of machine learning algorithms. In this paper, we present the first FPGA implementation of an approximate tensorial Support Vector Machine (SVM) classifier with algorithmic-level ACTs using High-Level Synthesis (HLS). A touch modality classification framework was adopted to validate the effectiveness of the proposed implementation. Compared with the exact implementation presented in the state of the art, the proposed implementation reduces power consumption by up to 49% with a speedup of 3.2×. Moreover, the hardware resources are reduced by 40% while consuming 82% less energy in classifying an input touch, with an accuracy loss of less than 5%.

(This article belongs to the Special Issue Advanced Embedded HW/SW Development)
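
The power, speedup and energy figures quoted in this abstract can be related with a short back-of-the-envelope calculation. The sketch below assumes that energy per classification is simply average power multiplied by execution time, and that the "up to 49%" power reduction and the 3.2× speedup apply to the same configuration; neither assumption is stated in the abstract, and the function name `energy_reduction` is purely illustrative.

```python
def energy_reduction(power_reduction: float, speedup: float) -> float:
    """Fractional energy saving, assuming energy = average power x execution time.

    power_reduction: fractional drop in power (0.49 means 49% less power).
    speedup: execution-time speedup factor (3.2 means 3.2x faster).
    """
    relative_power = 1.0 - power_reduction   # new power / old power
    relative_time = 1.0 / speedup            # new time / old time
    return 1.0 - relative_power * relative_time


# Figures quoted in the abstract (assumed here to hold simultaneously).
saving = energy_reduction(power_reduction=0.49, speedup=3.2)
print(f"Estimated energy saving: {saving:.1%}")
```

Under these assumptions the estimate lands close to the 82% energy saving reported in the abstract; a small gap is to be expected because the power figure is an upper bound.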

29 pages, 474 KiB

Open Access Article

Singular Value Decomposition in Embedded Systems Based on ARM Cortex-M Architecture

by Michele Alessandrini, Giorgio Biagetti, Paolo Crippa, Laura Falaschetti, Lorenzo Manoni and Claudio Turchetti

Electronics 2021, 10(1), 34; https://0-doi-org.brum.beds.ac.uk/10.3390/electronics10010034 - 28 Dec 2020

Cited by 8 | Viewed by 4092

Abstract

Singular value decomposition (SVD) is a central mathematical tool for several emerging applications in embedded systems, such as multiple-input multiple-output (MIMO) systems, data analytics, and sparse representation of signals. Since SVD algorithms reduce to solving an eigenvalue problem, which is computationally expensive, both specific hardware solutions and parallel implementations have been proposed to overcome this bottleneck. However, as those solutions require additional hardware resources that are not generally available in embedded systems, optimized algorithms are needed in this context. The aim of this paper is to present an efficient implementation of the SVD algorithm on ARM Cortex-M. To this end, we (i) present a comprehensive treatment of the most common algorithms for SVD, providing a fairly complete and deep overview of these algorithms with a common notation; (ii) implement them on an ARM Cortex-M4F microcontroller, in order to develop a library suitable for embedded systems without an operating system; (iii) find, through a comparative study of the proposed SVD algorithms, the best implementation for a low-resource bare-metal embedded system; and (iv) show a practical application to Kalman filtering of an inertial measurement unit (IMU), as an example of how SVD can improve the accuracy of existing algorithms and of its usefulness on such a low-resource system. All these contributions can be used as guidelines for embedded system designers. Regarding the second point, the chosen algorithms have been implemented on ARM Cortex-M4F microcontrollers, which have very limited hardware resources compared with more advanced CPUs. Several experiments have been conducted to select which algorithms guarantee the best performance in terms of speed, accuracy and energy consumption.

(This article belongs to the Special Issue Advanced Embedded HW/SW Development)
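
To give a feel for the kind of dependency-free routine such a library might contain, here is a minimal sketch of the one-sided Jacobi (Hestenes) method for computing singular values. This is a well-known SVD algorithm, but the abstract does not say which algorithms the article evaluates, so it should not be read as the authors' implementation. The function name `one_sided_jacobi_singular_values` and the parameters `sweeps` and `tol` are illustrative assumptions, and Python is used only for readability; a bare-metal version would be written in C.

```python
import math


def one_sided_jacobi_singular_values(a, sweeps=30, tol=1e-12):
    """Singular values of a dense matrix `a` (list of rows), largest first.

    Columns are repeatedly rotated in pairs until they are mutually
    orthogonal; the singular values are then the column norms.
    """
    m, n = len(a), len(a[0])
    u = [row[:] for row in a]  # work on a copy
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(u[i][p] * u[i][p] for i in range(m))
                beta = sum(u[i][q] * u[i][q] for i in range(m))
                gamma = sum(u[i][p] * u[i][q] for i in range(m))
                # Skip pairs that are already (numerically) orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that zeroes the inner product of columns p and q.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    up, uq = u[i][p], u[i][q]
                    u[i][p] = c * up - s * uq
                    u[i][q] = s * up + c * uq
        if not rotated:
            break  # converged: no pair needed a rotation in this sweep
    norms = [math.sqrt(sum(u[i][j] ** 2 for i in range(m))) for j in range(n)]
    return sorted(norms, reverse=True)


sigma = one_sided_jacobi_singular_values([[1.0, 1.0], [0.0, 1.0]])
```

The method needs only multiplications, additions and square roots on the matrix itself, which is one reason Jacobi-type methods are often considered for small matrices on resource-constrained targets.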

25 pages, 2751 KiB

Open Access Article

Automatic Method for Distinguishing Hardware and Software Faults Based on Software Execution Data and Hardware Performance Counters

by Jihyun Park and Byoungju Choi

Electronics 2020, 9(11), 1815; https://0-doi-org.brum.beds.ac.uk/10.3390/electronics9111815 - 02 Nov 2020

Cited by 1 | Viewed by 2330

Abstract

Debugging an embedded system in which hardware and software are tightly coupled and resources are restricted is far from trivial. When hardware defects appear as if they were software defects, determining the real source becomes challenging. In this study, we propose an automated method for distinguishing whether a defect originates from the hardware or the software at the stage of integration testing of hardware and software. Our method overcomes the limitations of the embedded environment, minimizes the effects on runtime, and identifies defects by obtaining and analyzing software execution data and hardware performance counters. We analyze the effects of the proposed method through an empirical study. The experimental results reveal that our method can effectively distinguish defects.

(This article belongs to the Special Issue Advanced Embedded HW/SW Development)

21 pages, 878 KiB

Open Access Article

Sw/Hw Partitioning and Scheduling on Region-Based Dynamic Partial Reconfigurable System-on-Chip

by Qi Tang, Biao Guo and Zhe Wang

Electronics 2020, 9(9), 1362; https://0-doi-org.brum.beds.ac.uk/10.3390/electronics9091362 - 21 Aug 2020

Cited by 2 | Viewed by 2213

Abstract

A heterogeneous system-on-chip (SoC) integrates multiple types of processors on the same chip. It has great advantages in many aspects, such as processing capacity, size, weight, cost, power, and energy consumption, which has led to its wide adoption in many fields. The SoC based on region-based dynamic partial reconfigurable (DPR) FPGA plays an important role in the SoC field. However, delivering its powerful capacity to the consumer depends on efficient Sw/Hw partitioning and scheduling technology that determines the resource volume of the DPR region, the mapping of the application to the DPR region and other processors, and the schedule of the task and its reconfiguration. This paper first proposes an exact approach based on mixed integer linear programming (MILP) for the Sw/Hw partitioning and scheduling problem. The proposed MILP is able to solve the problem optimally; however, its scalability is poor, even though we carefully designed its formulation and tried to make it as concise as possible. Therefore, a multi-step hybrid method that combines graph partitioning and MILP is proposed, which is able to reduce the time complexity significantly with only marginal degradation of solution quality. A set of experiments is carried out using a set of real-life applications, and the results demonstrate the effectiveness of the proposed methods.

(This article belongs to the Special Issue Advanced Embedded HW/SW Development)

17 pages, 2780 KiB

Open Access Article

The L3Pilot Data Management Toolchain for a Level 3 Vehicle Automation Pilot

by Johannes Hiller, Sami Koskinen, Riccardo Berta, Nisrine Osman, Ben Nagy, Francesco Bellotti, Ashfaqur Rahman, Erik Svanberg, Hendrik Weber, Eduardo H. Arnold, Mehrdad Dianati and Alessandro De Gloria

Electronics 2020, 9(5), 809; https://0-doi-org.brum.beds.ac.uk/10.3390/electronics9050809 - 15 May 2020

Cited by 9 | Viewed by 3963

Abstract

As industrial research in automated driving is rapidly advancing, it is of paramount importance to analyze field data from extensive road tests. This paper investigates the design and development of a toolchain to process and manage experimental data to answer a set of research questions about the evaluation of automated driving functions at various levels, from technical system functioning to overall impact assessment. We have faced this challenge in L3Pilot, the first comprehensive test of automated driving functions (ADFs) on public roads in Europe. L3Pilot is testing ADFs in vehicles made by 13 companies. The tested functions are mainly of Society of Automotive Engineers (SAE) automation level 3, some of them of level 4. In this context, the presented toolchain supports various confidentiality levels and allows seamless data management across vehicle owners, with the efficient storage of data and their iterative processing with a variety of analysis and evaluation tools. Most of the toolchain modules have been developed to a prototype version in a desktop/cloud environment, exploiting state-of-the-art technology. This has allowed us to efficiently set up what could become a comprehensive edge-to-cloud reference architecture for managing data in automated vehicle tests. The project has released as open source the data format into which all vehicular signals, recorded in proprietary formats, were converted, in order to support efficient processing through multiple tools, scalability and data quality checking. We expect this format to enhance research on automated driving testing, as it provides a shared framework for dealing with data from collection to analysis. We are confident that this format, and the information provided in this article, can represent a reference for the design of future architectures to implement in vehicles.

(This article belongs to the Special Issue Advanced Embedded HW/SW Development)

21 pages, 6127 KiB

Open Access Feature Paper Article

Sequence-To-Sequence Neural Networks Inference on Embedded Processors Using Dynamic Beam Search

by Daniele Jahier Pagliari, Francesco Daghero and Massimo Poncino

Electronics 2020, 9(2), 337; https://0-doi-org.brum.beds.ac.uk/10.3390/electronics9020337 - 15 Feb 2020

Cited by 4 | Viewed by 3340

Abstract

Sequence-to-sequence deep neural networks have become the state of the art for a variety of machine learning applications, ranging from neural machine translation (NMT) to speech recognition. Many mobile and Internet of Things (IoT) applications would benefit from the ability to perform sequence-to-sequence inference directly in embedded devices, thereby reducing the amount of raw data transmitted to the cloud and obtaining benefits in terms of response latency, energy consumption and security. However, due to the high computational complexity of these models, specific optimization techniques are needed to achieve acceptable performance and energy consumption on single-core embedded processors. In this paper, we present a new optimization technique called dynamic beam search, in which the inference complexity is tuned to the difficulty of the processed input sequence at runtime. Results based on measurements on a real embedded device, and on three state-of-the-art deep learning models, show that our method is able to reduce the inference time and energy by up to 25% without loss of accuracy.

(This article belongs to the Special Issue Advanced Embedded HW/SW Development)
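
The general idea of tuning beam-search effort to input difficulty can be illustrated with a small sketch. The abstract does not describe how the article measures difficulty or adapts the beam, so the rule used here (widen the beam only when the most likely next token has a probability below a threshold) is a hypothetical stand-in, not the authors' method. The names `choose_beam_width`, `dynamic_beam_search`, `toy_step`, and the `confident` threshold are all illustrative assumptions, and `toy_step` replaces a real decoder with a hand-written probability table.

```python
import math


def choose_beam_width(probs, min_width, max_width, confident=0.9):
    """Hypothetical difficulty rule: a peaked distribution means an easy step."""
    return min_width if max(probs.values()) >= confident else max_width


def dynamic_beam_search(step_fn, max_len, min_width=1, max_width=4, eos="</s>"):
    """Beam search whose width is re-chosen at every decoding step.

    step_fn(prefix) returns a dict mapping each next token to its probability.
    Returns the best (log_probability, tokens) hypothesis found.
    """
    beams = [(0.0, [])]
    finished = []
    for _ in range(max_len):
        candidates = []
        width = min_width
        for logp, tokens in beams:
            probs = step_fn(tokens)
            # Use the widest beam requested by any live hypothesis.
            width = max(width, choose_beam_width(probs, min_width, max_width))
            for token, p in probs.items():
                if p > 0.0:
                    candidates.append((logp + math.log(p), tokens + [token]))
        candidates.sort(key=lambda hyp: hyp[0], reverse=True)
        beams = []
        for hyp in candidates[:width]:
            (finished if hyp[1][-1] == eos else beams).append(hyp)
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    if not finished:
        return (0.0, [])
    return max(finished, key=lambda hyp: hyp[0])


def toy_step(prefix):
    """Stand-in for a decoder: easy first step, ambiguous second step."""
    if not prefix:
        return {"a": 0.95, "b": 0.05}
    if prefix == ["a"]:
        return {"x": 0.6, "</s>": 0.4}
    return {"</s>": 1.0}


best_logp, best_tokens = dynamic_beam_search(toy_step, max_len=5)
```

In this toy run the first and last steps are decoded with the minimum width, and only the ambiguous middle step pays for the wider beam, which is the kind of saving an adaptive scheme aims for.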

20 pages, 20221 KiB

Open Access Feature Paper Article

Open Vision System for Low-Cost Robotics Education

by Julio Vega and José M. Cañas

Electronics 2019, 8(11), 1295; https://0-doi-org.brum.beds.ac.uk/10.3390/electronics8111295 - 06 Nov 2019

Cited by 13 | Viewed by 4602

Abstract

Vision devices are currently one of the most widely used sensory elements in robots: commercial autonomous cars and vacuum cleaners, for example, have cameras. These vision devices can provide a great amount of information about robot surroundings. However, platforms for robotics education usually lack such devices, mainly because of the computing limitations of low-cost processors. New educational platforms using Raspberry Pi are able to overcome this limitation while keeping costs low, but extracting information from the raw images is complex for children. This paper presents an open source vision system that simplifies the use of cameras in robotics education. It includes functions for the visual detection of complex objects and a visual memory that computes obstacle distances beyond the small field of view of regular cameras. The system was experimentally validated using the PiCam camera mounted on a pan unit on a Raspberry Pi-based robot. The performance and accuracy of the proposed vision system were studied, and the system was then used to solve two visual educational exercises: safe visual navigation with obstacle avoidance and person-following behavior.

(This article belongs to the Special Issue Advanced Embedded HW/SW Development)

Advanced Embedded HW/SW Development

Published Papers (9 papers)
