1. Introduction
Internet of Things (IoT) devices are ubiquitous devices with limited functionality and computational resources that are equipped with networking features and Internet connectivity. These devices have long life cycles, during which changes are deployed through software or firmware updates. Software updates can be provided manually, through physical access with cables and programmer interfaces. Physical access to a device for a firmware upgrade is not always possible, so devices in the field must be configured through secure channels. The connected nature of IoT devices makes them accessible remotely, and firmware updates can be delivered Over the Air (OTA) [1, 2].
Firmware updates change the functionality of the software; however, their scope is limited by the capabilities and architecture of the hardware. Depending on the application and the expected life of a product, reconfigurable hardware architectures can improve device updates and secure boot processes. Reconfigurability allows the manufacturer to update the hardware design while the machine is in the field. OTA hardware updates change the device’s hardware configuration without the need to physically replace it.
OTA updates are critical in consumer embedded systems, such as those in the cellular phone and automotive industries. The requirements for availability and quality of service are high in safety-critical applications. For the vehicular domain, the requirement for availability is directly tied to safety. Furthermore, it is not feasible for a car owner to take the car to a service station whenever a software update is available. Instead, the manufacturer can deliver firmware updates remotely as OTA updates over the cellular network [3].
Root of Trust is an anchor for implementing trust in a device [4]. Maintaining Root of Trust (RoT) is crucial once a device has been deployed in an untrusted field. Tainted firmware updates can break the trust. The software domain employs various techniques for maintaining Root of Trust. Some popular examples include Unified Extensible Firmware Interface (UEFI) secure boot extensions [5] and Microsoft Windows’ Secure Boot [6]. The former maintains Root of Trust at the boot level and the latter extends it to the operating system (OS) level. We discuss a novel scheme for implementing Root of Trust using cryptographic processors, such as Trusted Platform Modules (TPM).
Reconfigurable hardware has become pervasive in the Internet of Things domain, which makes it necessary to extend the Root of Trust to the hardware. Current commercial FPGA vendors provide limited security for the programmable logic fabric, and those security mechanisms are limited in application, as shown in Section 2. Additionally, the provided methods are closed: users can employ them in their systems but cannot inspect them. There is a need for an open and reliable security structure for the programmable fabric that can be integrated into the device’s Root of Trust.
In this work, we present a framework to establish the Root of Trust for secure boot and OTA updates for reconfigurable hardware. Mutual trust is established between an FPGA device and the server or the content provider. A symmetric encryption key update mechanism is proposed to enable key provisioning for updates. The key update mechanism is tied to the integrity of the bitstream. If there is an unintended change in the integrity of the bitstream, the device will no longer be able to successfully decrypt subsequent updates from the server. The proposed scheme extends the integration of the TPM with the FPGA boot process to assist in secure boot, key provisioning and secure communication. Additionally, we present schemes for runtime mitigation of malicious logic insertion. To the best of the authors’ knowledge, this is the first work that practically provides the integration between TPMs and FPGAs at the First Stage Boot Loader stage. The organization of the paper is as follows. Section 2 discusses the relevant background studies. Section 3 covers the proposed architecture and analysis of the methodologies. Section 4 describes the implementation details, and the security analysis is discussed in Section 5.
3. Threat Model for Secure Boot of FPGA Bitstreams
The FPGA market is dominated by proprietary tools, Intellectual Properties (IPs) and closed hardware implementations. There are a handful of vendors that provide varying architectures and interfaces to those architectures. The reconfigurable fabric on FPGAs is programmed using bitstreams. The bitstream configures the Look Up Tables (LUTs) in the logic fabric. These LUTs act as combinatorial logic and sequential data paths for the hardware design. Bitstreams also configure other fabric elements, e.g., on-chip memory, Digital Signal Processing (DSP), clocking blocks and wire connections. An attack on the bitstream can affect the entire system operation of a device in the field. This work focuses on providing bitstream security on the device and on securing the communication between a content provider and a device.
3.1. Bitstream Spoofing
Bitstream spoofing delivers to the victim device an update that seems to come from an authorized source. Bitstream spoofing may compromise the security of the victim device using relay and replay attacks [26, 27]. In a relay attack, an adversary acts as a man in the middle between a bitstream content provider and a device. Once an authenticated session is set up between the two nodes, the adversary replaces the original bitstream with a malicious one. If the victim device uses a single key for bitstream encryption, an adversary can also mount replay attacks by forwarding an older copy of the bitstream that has limited functionality compared to the current version.
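The replay case can be illustrated with a small model. The sketch below is not the scheme proposed in this work: it uses HMAC-SHA256 from the Python standard library as a stand-in for the device's bitstream protection, and the key, the bitstream contents, and the rotation rule are all hypothetical. It only shows why a package recorded under a long-lived key keeps verifying, and why the same package fails once the key has changed.

```python
import hashlib
import hmac

STATIC_KEY = b"hypothetical-device-key"  # assumption: one long-lived key


def tag(key: bytes, bitstream: bytes) -> bytes:
    """Authenticate a bitstream with HMAC-SHA256 under the given key."""
    return hmac.new(key, bitstream, hashlib.sha256).digest()


def device_accepts(key: bytes, bitstream: bytes, received_tag: bytes) -> bool:
    """Device-side check: verifies the tag only, with no freshness test."""
    return hmac.compare_digest(tag(key, bitstream), received_tag)


# The provider ships v1 and later v2; the adversary records the v1 package.
v1 = b"bitstream v1 (limited functionality)"
v2 = b"bitstream v2 (current)"
recorded_v1 = (v1, tag(STATIC_KEY, v1))

# Replaying the recorded package still verifies under the unchanged key.
replay_accepted = device_accepts(STATIC_KEY, *recorded_v1)

# If the key had been rotated after v1, the same replay would be rejected.
rotated_key = hashlib.sha256(STATIC_KEY + v1).digest()  # illustrative rotation only
replay_after_rotation = device_accepts(rotated_key, *recorded_v1)
```

In this model `replay_accepted` is true and `replay_after_rotation` is false, which is the motivation for changing the key between updates.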
3.2. Runtime Malicious Modification
Once a bitstream has been programmed onto an FPGA programmable logic fabric, an FPGA device may provide interfaces to the outside world for readback and modification of the running bitstream [28]. Using these interfaces, faults or trojans can be introduced in the design [29]. Additionally, the same interfaces can be used to make unauthorized modifications to the original design. Our work focuses on mitigating malicious logic insertion in the bitstream during runtime.
3.3. Nonsecure Communication with Content Provider
For an FPGA device placed in the field, bitstream updates can be provided manually by an engineer through a physical update mechanism, or remotely over a network. If a content provider on the network is not secured, an adversary may spoof its identity and pose as the content provider. The adversary may then be able to push malicious updates to the client. Conversely, an adversary can impersonate a device in the field to download bitstream updates from a content provider that are not meant for it.
5. Results and Analysis
The proposed framework was implemented on a Xilinx Zedboard FPGA board equipped with a Zynq-7000 XC7Z020-CLG484 [33]. The FPGA has an embedded ARM Cortex A9 hard processor. The FPGA is integrated with the Infineon TPM SLB9670, a secure coprocessor for the key management and secure boot processes [34]. The processor is equipped with ARM’s TrustZone for Trusted Execution Environment (TEE). Additionally, the on-board QSPI memory is used for holding the SFSBL implementation and the bitstream package extracted from the bitstream update process. The experimental setup of the proposed framework is shown in Figure 8.
The FPGA board communicates with the TPM via the Serial Peripheral Interface (SPI) over the dedicated MIO port. The proposed SFSBL implementation is an extension of the Xilinx-provided First Stage Boot Loader (FSBL). The secure extensions added to the existing FSBL use the custom device driver library written as part of this research.
The MIO ports are accessible through the secure world configured using ARM processor TrustZone [35]. The total RAM on the Xilinx Zedboard is 512 MB. We configured the upper 64 MB of the RAM space to be used by the secure world. This setting is controlled using the TZ_DDR_RAM register. Additionally, to limit access to the QSPI memory, the QSPI_S_APB bit in the SECURITY6_APB_SLAVES register is set to 0. Typically, for the configuration of peripherals in TrustZone, the value “0” signifies that a peripheral is set to be secure, i.e., only accessible from the secure world.
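The memory split can be worked through numerically. The sketch below assumes that DDR starts at address 0 and that the TZ_DDR_RAM register holds one bit per 64 MB segment, with 0 meaning secure, matching the polarity described above. The segment granularity is an assumption to be checked against the Zynq-7000 reference manual, and the helper name `tz_ddr_mask` is hypothetical.

```python
MB = 1 << 20
TOTAL_RAM = 512 * MB
SEGMENT = 64 * MB        # assumption: one TZ_DDR_RAM bit per 64 MB segment
SECURE_SIZE = 64 * MB    # upper region reserved for the secure world


def tz_ddr_mask(total: int, secure_size: int, segment: int = SEGMENT) -> int:
    """Return a bitmask (1 = non-secure, 0 = secure) with the top region secure."""
    n_segments = total // segment
    n_secure = secure_size // segment
    mask = (1 << n_segments) - 1               # start with every segment non-secure
    for i in range(n_segments - n_secure, n_segments):
        mask &= ~(1 << i)                      # clear the bits of the top segments
    return mask


secure_base = TOTAL_RAM - SECURE_SIZE          # first byte of the secure region
mask = tz_ddr_mask(TOTAL_RAM, SECURE_SIZE)
print(hex(secure_base), bin(mask))
```

Under these assumptions, the 512 MB space splits into eight segments, the secure region starts at offset 448 MB, and only the bit for the top segment is cleared.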
To incorporate the TPM with the FPGA board at the FSBL level, we implemented a device driver library. This library provides all the functions necessary to set up the TPM and to invoke security functions on the TPM. To the best of the authors’ knowledge, this is the first library of its kind. The SPI interface, accessible through the secure world, implements the TPM transfer function driver to communicate with the TPM device. This communication is required at the FSBL level to perform secure boot, which is not supported until the second stage boot loader in the traditional design flow.
The device driver library is open source and is made available online for public use [36]. The services provided by the library can be divided into two categories: device power-up services and cryptographic functions. The TPM 2.0 architecture has five layers, termed localities, which signify the boot stage of a target platform. Each locality offers specific functionality and restricts which functions are allowed; for example, the PCR registers are resources limited to specific localities. Our library implementation provides access to different secure boot specific functions at all localities. In the proposed framework implementation, the TPM is only accessible from the trusted secure world: the library exists in the scope of the secure world and is not accessible from the non-secure world.
Timing overhead for the proposed solution is dependent on the data rate of the SPI interface and the wait time for each operation. The data rate of the SPI interface is dependent on the host and the TPM device. The TPM2 specifications only specify a maximum timeout for a message transfer and timeout for primitive operations such as requesting a locality, checking the ownership of a locality, etc.
For each request sent to the TPM, the TPM can notify the host that it is busy using a wait state. The TPM can send a maximum of 100 wait states before a timeout can occur. In the case of a timeout, the host must send its request again.
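The wait-state handling can be sketched as a bounded polling loop with a resend on timeout. This is a host-side model only, not the driver from this work: the callables `send` and `poll_ready`, the `FakeTpm` class, and the retry limit of three attempts are hypothetical. Only the limit of 100 wait states is taken from the text.

```python
MAX_WAIT_STATES = 100  # limit stated in the text


class TpmTimeout(Exception):
    """Raised when the TPM stays busy for the whole wait-state budget."""


def transfer_once(send, poll_ready) -> bytes:
    """Send one request, then poll up to MAX_WAIT_STATES times for a reply."""
    send()
    for _ in range(MAX_WAIT_STATES):
        response = poll_ready()        # None models a wait state from the TPM
        if response is not None:
            return response
    raise TpmTimeout


def transfer(send, poll_ready, max_retries: int = 3) -> bytes:
    """Resend the request after a timeout, up to max_retries attempts."""
    for _ in range(max_retries):
        try:
            return transfer_once(send, poll_ready)
        except TpmTimeout:
            continue
    raise TpmTimeout(f"no response after {max_retries} attempts")


class FakeTpm:
    """Test double that answers with wait states for the first busy_polls polls."""

    def __init__(self, busy_polls: int):
        self.busy_polls = busy_polls
        self.sent = 0

    def send(self):
        self.sent += 1

    def poll(self):
        if self.busy_polls > 0:
            self.busy_polls -= 1
            return None
        return b"OK"


tpm = FakeTpm(busy_polls=150)
reply = transfer(tpm.send, tpm.poll)
```

With 150 busy polls, the first attempt exhausts its 100 wait states and times out, and the second attempt succeeds, so the request is sent twice.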
Pseudocode of the TPM extend function driver implementation is given in Algorithm 1. The TPM extend computes the hash of the input stream in a sequence of fixed-size blocks of 32 bytes. It enables block-by-block hash computation, which is the performance bottleneck for the bitstream processing. Algorithm 2 shows the optimized hash computation process, which supports data streaming with a block size of 64 bytes. For the secure boot operation, the TPM structure TPM2_PCR_EXTEND reads data in chunks of 32 bytes to extend a PCR. The bitstream for the Xilinx Zedboard is 3.85 MB in size. Since each TPM2_PCR_EXTEND operation takes an input of 32 bytes, hashing the whole bitstream takes a total of about 126k hashing operations.
Algorithm 1: tpm_pcr_extend function
Inputs: Locality (L), PCR Index (I), Data Input (D)
Output: PCR Based Hash (H)
    Buffer = []
    If (!Current active locality is L) then
        Request locality L from TPM
    If (!Current active locality is L) then
        Raise Exception
    // Make TPM2_PCR_EXTEND Request
    Buffer += TPM2_PCR_EXTEND Header with Locality (L), PCR Index (I)
    Buffer += DATA_LENGTH (32)
    Buffer += D
    Send Buffer to TPM
    H = TPM2_PCR_READ(PCR=I)
    Return H
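The cumulative effect of repeated extends can be modeled in software. The sketch below is a host-side model, not the TPM driver: it uses the standard extend rule, in which the new PCR value is the SHA-256 hash of the old PCR value concatenated with the 32-byte input. The all-zero starting value and the zero padding of a short final block are assumptions. The operation count takes 1 MB as 2^20 bytes, which reproduces the figure of about 126k.

```python
import hashlib
from typing import Tuple

BLOCK = 32  # bytes consumed per extend, as stated in the text


def pcr_extend(pcr: bytes, block: bytes) -> bytes:
    """One extend step: new PCR = SHA-256(old PCR || 32-byte input)."""
    return hashlib.sha256(pcr + block).digest()


def cumulative_hash(data: bytes) -> Tuple[bytes, int]:
    """Fold data into a PCR block by block; return (PCR value, extend count)."""
    pcr = bytes(32)                      # assumption: PCR starts at all zeros
    count = 0
    for off in range(0, len(data), BLOCK):
        block = data[off:off + BLOCK].ljust(BLOCK, b"\x00")  # zero pad: assumption
        pcr = pcr_extend(pcr, block)
        count += 1
    return pcr, count


bitstream_size = int(3.85 * (1 << 20))   # 3.85 MB, taking 1 MB = 2**20 bytes
n_extends = -(-bitstream_size // BLOCK)  # ceiling division
```

Because every step feeds the previous PCR value back into the hash, changing any block of the input changes the final value, and each block costs one round trip to the TPM.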
Algorithm 2: ComputeHashLoc4 function
Inputs: Data Block (D)
Output: Hash (H)
    If (!Current active locality is 4) then
        Request locality 4 from TPM
    If (!Current active locality is 4) then
        Raise Exception
    Send TPM_HASH_START Request to TPM
    For (I: each block of length 64 in D)
        If length(I) < 64 then
            I += Padding
        Send TPM_HASH_DATA + I to TPM
    Send TPM_HASH_END Request to TPM
    H = TPM_READ_PCR(PCR=17)
    Return H
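The streaming variant can be modeled with an incremental hash context. The sketch below is a software analogue rather than the TPM interface: `hashlib.sha256()` stands in for the start request, `update` for each data transfer, and `digest` for the end request. Zero bytes are assumed for the padding, which Algorithm 2 does not specify, and the final step that places the result in PCR 17 is not modeled.

```python
import hashlib
from typing import Tuple

BLOCK = 64  # bytes per TPM_HASH_DATA transfer, as in Algorithm 2


def stream_hash(data: bytes) -> Tuple[bytes, int]:
    """Hash data in 64-byte blocks; return (digest, number of transfers)."""
    ctx = hashlib.sha256()                   # stands in for TPM_HASH_START
    transfers = 0
    for off in range(0, len(data), BLOCK):
        block = data[off:off + BLOCK].ljust(BLOCK, b"\x00")  # zero pad: assumption
        ctx.update(block)                    # stands in for TPM_HASH_DATA
        transfers += 1
    return ctx.digest(), transfers           # stands in for TPM_HASH_END


digest, transfers = stream_hash(b"a" * 100)
```

Doubling the block size from 32 to 64 bytes halves the number of transfers for the same bitstream, to roughly 63k for the 3.85 MB image.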
To reduce the timing overhead for computing the cumulative hash in real-world applications, TPM 2.0 provides a separate locality, locality 4. It allows calling the three structures TPM_HASH_START, TPM_HASH_DATA and TPM_HASH_END. To compute the cumulative hash of the bitstream, the TPM_HASH_START structure is issued first. It instructs the TPM to prepare to receive streaming data. Using the TPM_HASH_DATA structure, the SFSBL streams the bitstream to the TPM iteratively. Once the transfer is complete, the TPM_HASH_END structure is sent to denote the end of the data input. The computation time was observed to be 40 s for the target bitstream file of size 3.85 MB. Algorithm 2 shows the implementation of the function. In the SFSBL implementation, this function is made part of the image_mover.c file, since this file is responsible for copying bitstream images between mediums.
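The reported figures imply an effective transfer rate that can be estimated directly. The calculation below assumes that the 40 s covers the whole streamed transfer in 64-byte blocks and that 1 MB is 2^20 bytes. The derived values are estimates from the reported numbers, not additional measurements.

```python
size_bytes = 3.85 * (1 << 20)   # assumption: 1 MB = 2**20 bytes
elapsed_s = 40.0                # observed time reported in the text
block = 64                      # bytes per streamed block

throughput = size_bytes / elapsed_s          # effective bytes per second
blocks = -(-int(size_bytes) // block)        # ceiling division
per_block_us = elapsed_s / blocks * 1e6      # microseconds per 64-byte block

print(f"{throughput / 1000:.1f} kB/s, {blocks} blocks, {per_block_us:.0f} us/block")
```

Under these assumptions, the effective rate is on the order of 100 kB/s, which corresponds to several hundred microseconds per block, including the SPI transfer and any wait states.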
The secure boot process loads the bitstream, as shown in Figure 9. The bitstream is sent to the TPM for integrity checking, where the cumulative hash is computed using the PCR registers with the scheme discussed in Section 3. The TPM driver loads the PCR register with the cumulative hash of the 3.85 MB bitstream, which is processed in 256-bit segments, each combined with the feedback from the PCR cumulative hash. The novel bare-metal TPM driver library uses the locality 4 interface to access the PCR registers and implements the TPM_PCR_READ structure with the function tpm_pcr_read. The hash value returned by the tpm_pcr_read function call is compared with the golden reference value stored in the tamper-resistant storage. In Figure 9, the reference hash is not equal to the computed hash; therefore, the boot process halts and jumps to the fallback process. In the fallback process, the device is only capable of bare-metal functionality with limited networking capabilities, to mitigate the impact of the compromised device on the network.
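The final comparison step can be sketched as follows. This is an illustrative model, not the SFSBL code: the function name `boot_decision` and the sample inputs are hypothetical. The constant-time comparison via `hmac.compare_digest` is a common precaution assumed here, not a detail stated in this work.

```python
import hashlib
import hmac


def boot_decision(computed: bytes, golden: bytes) -> str:
    """Return 'boot' if the hashes match, otherwise 'fallback'."""
    # compare_digest avoids leaking the mismatch position through timing
    return "boot" if hmac.compare_digest(computed, golden) else "fallback"


golden = hashlib.sha256(b"trusted bitstream").digest()            # hypothetical reference
tampered = hashlib.sha256(b"trusted bitstream + trojan").digest()  # hypothetical mismatch

normal_path = boot_decision(golden, golden)
compromised_path = boot_decision(tampered, golden)
```

Here `normal_path` evaluates to `"boot"` and `compromised_path` to `"fallback"`, mirroring the mismatch case shown in Figure 9.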