DICE harder - A hardware implementation of the Device Identifier Composition Engine

Lukas Jäger
lukas.jaeger@sit.fraunhofer.de
Fraunhofer Institute for Secure Information Technology
Darmstadt, Germany

Richard Petri
richard.petri@sit.fraunhofer.de
Fraunhofer Institute for Secure Information Technology
Darmstadt, Germany

ABSTRACT

The specification of the Device Identifier Composition Engine (DICE) has been established as a minimal solution for Trusted Computing on microcontrollers. It allows for a wide range of possible implementations. Currently, most implementations use hardware that was not specifically designed for this purpose. These implementations rely on black-box MPUs, and the implementation process has certain pitfalls because the hardware was not originally designed for use in DICE.

We propose a DICE architecture that is based on a microcontroller equipped with hardware tailored to DICE’s requirements. Since DICE is intended to be a minimal solution for Trusted Computing, the architecture is designed to add as little overhead to a microcontroller as possible. It consists of minor modifications to the CPU’s processor pipeline, dedicated blocks of memory, and modified interrupt and debug modules, which makes it easy to implement. A prototype is built on the VexRiscV platform, an open implementation of the RISC-V instruction set architecture. It is synthesized for an FPGA, and the increase in chip size and the impact on runtime due to the DICE extensions are evaluated. The goal is to demonstrate that, with minimal changes to a microcontroller’s design, a DICE can be implemented and used as a secure Root of Trust in environments such as IoT, industrial and automotive.

KEYWORDS

DICE, Trusted Computing, RISC-V, Root of Trust

ACM Reference Format:
Lukas Jäger and Richard Petri. 2020. DICE harder - A hardware implementation of the Device Identifier Composition Engine. In The 15th International Conference on Availability, Reliability and Security (ARES 2020), August 25–28, 2020, Virtual Event, Ireland. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3407023.3407028

1 INTRODUCTION

Trusted Computing, an assembly of various hardware and software techniques with the purpose of ensuring a certain state and behavior of the target device, has been established as a powerful passive countermeasure against attacks that compromise the integrity of a device. The Trusted Computing Group (TCG) has specified the Trusted Platform Module (TPM) [11] as the most notable example of these technologies. Nowadays it is available in nearly every modern PC and laptop, where it is used as a basic building block for Secure Boot, the key management for hard drive encryption, the measurement of a software’s integrity, and many other purposes. Recently, TPMs have even been deployed to embedded devices in automotive and industrial contexts.

For smaller devices based on microcontrollers, a fully-fledged TPM might not be feasible due to resource constraints.
Also, there are use cases where a TPM may not be necessary because only a small subset of its functionality is needed. For these contexts, the TCG specified the Device Identifier Composition Engine (DICE). This specification provides a basis for Trusted Computing with minimal system requirements. It merely requires the device to:

• Store a read-only Unique Device Secret (UDS)
• Feature a read-only boot code that is executed prior to any application
• Provide a lock mechanism that allows access to the UDS only from the read-only boot code

These requirements are met by many devices, especially by those that are based on microcontrollers. However, ensuring that these three preconditions are met is a difficult process. Most commercial off-the-shelf devices provide no dedicated hardware for DICE. Therefore, enabling DICE on such devices requires the use of hardware that was originally designed for a different purpose. Most mappings use memory sections originally intended for the storage of bootloaders, and memory protection that is meant to prevent accidental writes to crucial memory areas. The use of this hardware comes with certain pitfalls that have to be avoided for a secure implementation of DICE.

One way to avoid these pitfalls is to design hardware specifically for DICE. While this solution is not applicable to off-the-shelf microcontrollers, it may add a considerable security advantage to microcontrollers that are yet to be designed. Since there is currently no microcontroller with dedicated DICE hardware available on the market, the impact of this extension on the size of the chip design, the runtime and the overall security is unknown and needs proper evaluation. Estimating these parameters with an FPGA-based prototype should be sufficient to determine whether or not building DICE on dedicated hardware is feasible and advantageous.

Building FPGA-based prototypes of microcontrollers with modified hardware based on reverse-engineered AVR or MSP430 cores has been possible for quite some time. Recently, the free RISC-V instruction set architecture has gained a lot of momentum. It is backed both by academia and by major hardware and software vendors, and multiple free implementations for both FPGAs and ASICs are available, including a toolchain. The instruction set is available in both 32- and 64-bit variants and is easily extendable. Therefore, a free RISC-V implementation is a good platform for a prototype implementation of DICE-specific hardware.

In this paper we propose an implementation architecture of DICE based on hardware specifically tailored to DICE’s needs. This architecture is designed to have as little impact on the original hardware design as possible. It is implemented using the free RISC-V implementation VexRiscV, synthesized for an FPGA, and evaluated with already existing implementations and comparable Trusted Computing solutions in mind. The aim of this DICE hardware is to provide a minimal Root of Trust for resource-constrained microcontrollers.

The remainder of the paper is structured as follows: In section 2, DICE and its position within the Robust Internet of Things (RIoT) architecture are explained briefly. Section 3 lists existing Trusted Computing solutions and DICE implementations for resource-constrained devices and evaluates their merits and drawbacks. Section 4 introduces existing RISC-V implementations that were examined in the course of this work, as well as Dairo, our extended instance of the VexRiscV core. Section 5 explains the implementation architecture we propose based on Dairo. It introduces the attacker model, describes the underlying design principles and elaborates on the implementation details of the hardware and the firmware.
An evaluation of this implementation is given in section 6. The final section 7 provides a summary of the results and gives an outlook for future research based on this paper.

2 DICE IN A NUTSHELL

The following section describes DICE and its place in the RIoT architecture and sketches Trusted Computing technologies that are based on it.

2.1 DICE and the RIoT architecture

The Device Identifier Composition Engine (DICE) is a specification published by the Trusted Computing Group [12]. Its purpose is to anchor a Root of Trust in devices with minimal hardware overhead. Originally, it was a part of Microsoft’s RIoT concept [9]. RIoT divides the software running on a device into multiple layers. Each layer $i$ is provisioned with a key $K_i$ when it is executed. It then computes a measurement $M_{i+1}$ of the next layer. This measurement is used as a proof of a software layer’s identity; therefore, a hash is usually a good candidate. Layer $i$ computes the next layer’s key $K_{i+1}$ by applying a one-way function to the measurement $M_{i+1}$ and its own key $K_i$. Layer $i+1$ can now be started with its own key $K_{i+1}$. This key

• Provides evidence of the firmware integrity of layer $i+1$, since the measurement $M_{i+1}$ is included in the key generation process.
• Provides evidence of every piece of data that was introduced into the creation of the key $K_i$.

If all the keys are derived like this, $K_{i+1}$ is also evidence for the integrity of all firmware layers below $i+1$. This allows the layers to form a chain of trust. If one of the layers is compromised, $K_{i+1}$ will not be derived correctly. If $K_{i+1}$ is used to derive cryptographic keys, these can no longer be derived correctly when the specific layer is executed.

Due to its position within this chain, the initial layer of software is not measured by any entity. Therefore, its layer key $K_0$ is not evidence of the layer’s integrity.
This special property makes it difficult to detect manipulation or unspecified behavior. Since this layer bootstraps the whole chain of trust, it is the foundation of its security. Therefore, special measures must be taken to ensure the integrity of this layer and, with that, the overall security of the system. The specification of this initial layer with all necessary security measures in place is called the Device Identifier Composition Engine (DICE).

If manipulations of this layer cannot be detected, it is reasonable to prevent them altogether. Therefore, DICE’s first requirement is to make the memory containing it read-only. Recent versions of the TCG’s specification weaken this requirement by allowing an updatable DICE with a secure update mechanism, but for the sake of clarity a non-modifiable DICE is assumed here. Since updates are difficult or impossible, it is important to keep DICE’s functionality to its bare minimum: computing the key $K_1$ for the next layer. Keeping DICE as small as possible reduces the probability of a bug being included in it and makes it easier to verify.

As described above, in RIoT (and therefore in DICE) $K_1$ is computed by applying the one-way function to $K_0$ and a measurement of the next layer. Since $K_0$ is not provided by a previous layer, it must be provisioned to DICE in non-volatile memory. It must be unique and random for each device it is provisioned on. Furthermore, it must be kept secret, or it would be trivial to derive keys for the device and use them for impersonation or forged attestations. In DICE terminology, $K_0$ is therefore called the Unique Device Secret (UDS). A non-volatile read-only memory that holds this UDS is DICE’s second requirement.

If any piece of firmware could access the UDS memory, its secrecy could not be guaranteed. In order to ensure the secrecy, the DICE specification imposes a third requirement: The UDS must only be readable by code within the DICE code memory.
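The layered derivation $K_{i+1} = \mathrm{OWF}(K_i, M_{i+1})$ described above can be illustrated with a short Python sketch. It is not taken from the paper: the function names, the use of SHA-256 as the measurement and the use of HMAC-SHA-256 as the one-way function are assumptions chosen for illustration only.

```python
import hashlib
import hmac
from typing import List


def measure(image: bytes) -> bytes:
    """Measurement M_{i+1} of a layer image (assumed here: a SHA-256 digest)."""
    return hashlib.sha256(image).digest()


def one_way(key: bytes, measurement: bytes) -> bytes:
    """One-way function combining K_i and M_{i+1} (assumed here: HMAC-SHA-256)."""
    return hmac.new(key, measurement, hashlib.sha256).digest()


def derive_chain(uds: bytes, layers: List[bytes]) -> List[bytes]:
    """Return [K_1, K_2, ...] for the given layer images, starting from K_0 = UDS."""
    keys = []
    key = uds
    for image in layers:
        key = one_way(key, measure(image))
        keys.append(key)
    return keys


if __name__ == "__main__":
    uds = bytes(32)  # placeholder only; a real UDS is random, unique and secret
    original = [b"dice-core", b"bootloader", b"application"]
    tampered = [b"dice-core", b"bootloader-modified", b"application"]
    k_orig = derive_chain(uds, original)
    k_tamp = derive_chain(uds, tampered)
    # Keys up to the modified layer match; every key from there on differs.
    print([a == b for a, b in zip(k_orig, k_tamp)])
```

Modifying one layer changes the key of that layer and of every layer above it, which is the chain-of-trust property the text relies on.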
These requirements are not very specific about implementation details. This is deliberate and ensures a wide range of possible implementations.

Figure 1 shows an overview of the RIoT architecture, DICE’s position within it, and the chain of trust that is built from it. The one-way functions (OWF) derive the key for the subsequent levels, using the previous key (the initial level uses the UDS accordingly) and a measurement of the next level as an input. The derived key is passed to the next level, and a key derivation function (KDF) may be used to derive cryptographic keys for the application. The KDF may be asymmetric, depending on the nature of the keys derived from it. If they are symmetric, the KDF and the OWF can be the same. In principle, this key derivation process can be stacked indefinitely, while a typical application might use four levels: the immutable DICE, DICE Core (a name given by the TCG to the first level of code after DICE, which derives privacy-sensitive cryptographic keys and facilitates updates), a bootloader and the application code itself. In DICE’s terminology, $K_1$ is called the Compound Device Identifier (CDI) to stress its purpose within a RIoT key chain.

Figure 1: An Example Architecture for DICE and RIoT

2.2 DICE-based Trusted Computing Technologies

One of the most common Trusted Computing concepts is Remote Attestation, the proof of certain properties of a device or software to a remote verifier. Two classes of Remote Attestation exist: explicit attestation, where the property is signed with a device’s key, and implicit attestation, where the key and its correct derivation are proof of that property.
Explicit attestation is in widespread use with the TCG’s TPMs, which feature means to measure properties for explicit attestation securely. In the context of DICE and RIoT, implicit attestation is used. Since all layer keys are proof of the device’s identity and the firmware’s integrity, keys derived from the layer keys can also be used for such proof. The simplest implicit attestation protocol combines a challenge-response mechanism with a derived attestation key. If the response is computed correctly, the integrity and identity of the device are verified. The TCG specified both asymmetric [13] and symmetric [14] implicit remote attestation protocols for use with DICE.

Another important Trusted Computing concept is Sealing, the encryption of data in a way that allows decryption only if the device’s firmware is untampered. With DICE and RIoT this is achieved by deriving an encryption key $K_{enc}$ from a level key and encrypting the data with it. If the firmware is modified, the level key will change and with it $K_{enc}$. The encrypted data is inaccessible in that case.

3 RELATED WORK

In order to demonstrate the relevance of a hardware-based DICE implementation, DICE is compared to a number of Trusted Computing technologies that are available on the market or are a topic of academic research. Furthermore, available DICE implementations based on software and commercial off-the-shelf hardware are evaluated and compared to an approach where dedicated DICE hardware is used.

3.1 Available Trusted Computing Technologies

3.1.1 Commercial. The most widely known commercial solution for Trusted Computing is the Trusted Platform Module (TPM). A TPM is an additional chip that is included in devices and connected to the CPU of the device using a peripheral bus like SPI. This chip features a multitude of cryptographic algorithms and special techniques for integrity measurement and reporting.
Like DICE, it is specified by the Trusted Computing Group and implemented by multiple hardware vendors. Most modern PCs and laptops are equipped with a TPM, and Microsoft uses it for Secure Boot and the hard drive encryption BitLocker. While a TPM offers strong security, it is not suitable for resource-constrained devices. The additional chip needs to be placed on the PCB, which may violate design constraints for power or size. Furthermore, a TPM is not desirable in cost-sensitive applications if only a fraction of its large feature set is used. The TCG recognized this and specified DICE as a technology for low-end and resource-constrained alternatives to a TPM.

3.1.2 Academia. El Defrawy et al. proposed in [7] the Secure and Minimal Architecture for establishing a dynamic Root of Trust (SMART). While SMART and a hardware DICE technically share many design principles and work similarly, SMART has a different scope and a different implementation approach. SMART is supposed to dynamically attest software at any point in the device’s runtime. DICE is supposed to compute key material for Trusted Computing statically at boot time of the device. Its scope is not limited to attestation but also includes sealing, secure boot and many more.

Another academic approach to Trusted Computing and remote attestation is Sancus. It was proposed by Noorman et al. in [17] and establishes a microcontroller-based platform that runs multiple applications with memory isolation and integrated capabilities for remote attestation. While it achieves a high security level for microcontrollers, it requires significant changes to both the hardware and the software that runs on it. The implementation extends an MSP430 processor with a hardware accelerator for hash operations, additional machine instructions and a Memory Access Logic (MAL) that enforces memory protection. The software must be compiled by a special compiler in order to use the provided hardware.

The fact that DICE is a TCG standard to which both companies and research institutes contribute guarantees a wide audience and a broad range of applications for DICE-based technologies. The inclusion into this ecosystem is an important argument for the use of DICE instead of a non-standardized solution.

3.2 Available Software Implementations of DICE

The most common implementations of DICE use microcontrollers with a Memory Protection Unit (MPU) to lock down partitions of memory that contain the UDS and DICE ROM after DICE was executed. This mapping is described by Hristozov et al. [15]. A less common approach is to include the UDS in the DICE ROM and allow no external access. An example was implemented by Jäger et al. [16].

The advantages of both these software-based approaches are the simplicity of implementation and the applicability to COTS hardware. One of the major drawbacks is the use of device features that were not originally intended for DICE. This may lead to these features not being available for their original purpose. Jäger et al. use an ATMega328P’s bootloader ROM as DICE memory [16]. If a bootloader is to be used on this device, it must either be included in DICE, weakening the security guarantees of a minimal DICE implementation, or be placed in a different memory section. Locking down the memory with an MPU, as done by Hristozov et al. in [15], is a rather coarse-grained process. If the UDS is, for example, 32 bytes long but the MPU allows lockdown only for memory partitions of one kilobyte, memory is wasted because the technology is not adapted to the use in DICE.

The use of features that are not adapted to DICE may also lead to pitfalls that are intricate to avoid. For example, Jäger et al. use the MCU’s memory protection fuses to block any access to DICE [16]. However, the MCU does not forbid program jumps to DICE.
Since the UDS is placed in DICE memory, it becomes trivial to read it from outside of DICE, which effectively destroys DICE’s security model. Such pitfalls do not occur with hardware that is designed for use in DICE.

Another drawback is the fact that the implementation of most MPUs remains the hardware vendor’s secret. Proposing specific DICE hardware and making the design principles publicly available helps to build DICE implementations without the need to trust a hardware vendor.

4 EXTENDABLE RISC-V IMPLEMENTATIONS

The initial efforts on the RISC-V [2] instruction set architecture (ISA) started in 2010. The goal was a free and open ISA suitable for real use (i.e. not simulation or binary translation), which targets neither a specific microarchitecture style nor a specific implementation technology. The result is a modular standard with a small base integer instruction set with several standardized extensions, as well as designated space for future or custom extensions. Well-known implementations include the Rocket chip [3], the small PicoRV32 [1], Western Digital’s SweRV platform [8] and the VexRiscV core used in this work.

The Rocket chip generator is one of the earliest implementations of the RISC-V ISA. The generator is written in Chisel, a hardware construction language. Unlike other high-level hardware description approaches, this language is not a high-level synthesis (HLS) tool, but rather a domain-specific language for constructing hardware. As this language is built on top of the Scala programming language, this approach enables highly parameterizable metaprogramming of hardware. As such, the Rocket chip generator is customizable; however, it only offers a simple co-processor interface, the “Rocket Custom Coprocessor” (RoCC), for constructing customized extensions.
Similarly, the much smaller PicoRV32 implementation, which is written in Verilog, offers the “Pico Co-Processor Interface” (PCPI) to extend the processor. For this work, the VexRiscV implementation was chosen, as it is not limited to a constrained extension interface but is extendable by design.

At its core, VexRiscV [5] is an easily extendable pipeline, rather than an implementation of a CPU. The construction leverages the capabilities of the SpinalHDL hardware description language [4], a language very similar to Chisel. The basic building blocks are pipeline stages and stageables, i.e. input and output registers of pipeline stages. This framework is extended by “plugins”, which are able to define stageables and add processing logic to any pipeline stage to process stage inputs to outputs. Using this generic system, a RISC-V CPU is implemented with five pipeline stages: fetch, decode, execute, memory and writeback. Plugins manage the resources of the CPU, e.g. the registers, or provide services to other plugins, e.g. an instruction decoder or jump service. Some plugins also define external interfaces such as interrupts or memory interfaces.

The VexRiscV core implementation comes with two example instances: the Briey SoC, a larger instance featuring most of the functionality of a microcontroller, and the Murax SoC, a very basic SoC with limited functionality that can be synthesized on small FPGAs. While the VexRiscV core offers a data and an instruction bus, the Murax SoC employs a simple arbiter favoring the data bus to form a simple single-master memory architecture. Furthermore, the Murax SoC only features a single block of RAM, for program code and data alike. Consequently, it is difficult to introduce new or dedicated memory blocks, which DICE would require, into this architecture. Therefore, a custom instance called Dairo was created, based on the Murax SoC.
The memory architecture was extended to a full multi-master interconnect to allow a more flexible way to add components such as memory blocks.

5 EXTENDING DAIRO WITH A HARDWARE BASED DICE

This section explains what kinds of attackers DICE in general and this implementation in particular have to deal with. The design principles are explained and the implementation itself is described.

5.1 Attacker Model

DICE is supposed to be applied to inexpensive networking devices. Attacks against such devices tend to be inexpensive as well, because these devices usually do not contain information valuable enough to justify more expensive attacks. Attacks against the chip’s package are therefore ruled out. The attacker will most likely resort to more common and less intricate attacks such as monitoring network traffic and flashing a new firmware image. Due to the static nature of DICE’s protection, runtime attacks like return-oriented programming (ROP) are out of scope. DICE’s main purpose is the detection of firmware manipulations. Therefore, we consider an attacker who can modify the existing firmware or flash a new firmware on the target device.

5.2 Design Principles

Dairo is designed to be a very minimal microcontroller implementation, and DICE is supposed to be a low-overhead solution for Trusted Computing. Implementing a fully-fledged Memory Protection Unit (MPU), as featured by more powerful microcontrollers, is therefore out of the question.

Furthermore, DICE-specific changes to the original Dairo design should not be made to the memory but rather to the CPU. FPGAs provide hardware block RAMs to map a design’s memory on. If Dairo’s memory implements special DICE functionality, it may not be mappable on block RAM anymore, which would be an inefficient design. Also, some microcontrollers like SiFive’s Freedom E310 [20] use external flash memory that is connected via SPI or other interfaces.
A DICE implementation with special requirements for the memory would not be usable in such a setup. An advantage of a DICE implementation with no special requirements for the memory is the great flexibility when it comes to the addressing and the technology to be used for DICE and the UDS.

The original Dairo implementation uses one block RAM for both code and data. This is suboptimal when it comes to measuring the firmware for DICE. The reason is that DICE measures a partition of memory with a fixed length and position. This can be achieved with linker file configurations or with dedicated blocks of RAM for each purpose (code, data, ...). With regard to extendability and a possible use with external memory, using dedicated memory blocks is the better solution. Therefore, the single block RAM of the original Dairo is to be divided into multiple memory blocks.

The design of the DICE extension for Dairo must ensure adherence to the requirements imposed by the DICE specification. Therefore, each of the requirements must translate into security measures that are implemented in DairoDICE. Each of the three main requirements described in section 2 must be considered. Furthermore, recent research has shown pitfalls that should be taken into account. Jäger et al. demonstrated that a combination of a jump to DICE memory and interrupts compromises the secrecy of the UDS on their implementation [16]. The original DICE specification explicitly prohibits interrupts in DICE code, and Jäger et al.’s work shows the importance of this requirement. This is why we give it the same importance in the design process as the three main requirements. Furthermore, we avoid the pitfall of DICE code being executable from code outside the memory region of DICE by prohibiting these executions.
In summary, the DICE implementation must fulfill the following requirements:

(1) DairoDICE must store a read-only Unique Device Secret
(2) DairoDICE must provide a ROM for code that is guaranteed to be executed first on every device reset
(3) The code in the DICE code ROM must be the only code that may access the UDS
(4) The code in the DICE code ROM must not be interruptible
(5) The code in the DICE code ROM must not be executed after initial execution

It also should adhere to the following design principles:

(1) DairoDICE should use minimal resources
(2) DairoDICE should not modify the memory blocks
(3) DairoDICE should feature separate memories for data and instructions

5.3 Implementation

DairoDICE consists of several modifications of Dairo’s hardware with the purpose of protecting DICE memory from non-DICE code, and a DICE firmware that computes the CDI for the next level of software.

5.3.1 Hardware. Read-only memory blocks for the UDS and DICE are the first step towards a hardware implementation of DICE. Furthermore, the original Dairo’s single RAM is suboptimal for use in DICE. Therefore, Dairo’s RAM is divided into several memory blocks. One RAM serves as the code memory. The next RAM contains the runtime data of the program that is executed. Both are 4 kB large, which effectively divides the original Dairo’s RAM of 8 kB into two blocks. Following these, the DICE ROM is included. It is also 4 kB large. This size was taken from Jäger et al., where the DICE-bootloader combination fits into 4 kB of bootloader memory [16]. The UDS ROM with a size of 32 bytes is included next. Finally, a separate piece of RAM is included. Its purpose is to store the CDI. It can also be used for higher-level keys. The ROM blocks for DICE and the UDS fulfill the requirements of dedicated read-only memory for DICE and the UDS.
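The memory blocks listed above can be written down as a small memory map. The following Python sketch is illustrative only: the block sizes for the code RAM, data RAM, DICE ROM and UDS ROM are taken from the text, whereas the base address, the contiguous placement, the region names and the 32-byte size of the CDI RAM are assumptions that the paper does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    name: str
    base: int
    size: int
    writable: bool

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def build_map(base: int = 0x8000_0000, cdi_size: int = 32) -> List[Region]:
    """Lay out the five blocks back to back (contiguity and base are assumptions)."""
    layout = [
        ("code_ram", 4096, True),      # 4 kB code memory
        ("data_ram", 4096, True),      # 4 kB runtime data
        ("dice_rom", 4096, False),     # 4 kB read-only DICE code
        ("uds_rom", 32, False),        # 32-byte read-only UDS
        ("cdi_ram", cdi_size, True),   # CDI storage; size assumed, not given in the text
    ]
    regions = []
    addr = base
    for name, size, writable in layout:
        regions.append(Region(name, addr, size, writable))
        addr += size
    return regions


def region_of(regions: List[Region], addr: int) -> Optional[Region]:
    """Return the region containing addr, or None if the address is unmapped."""
    for region in regions:
        if region.contains(addr):
            return region
    return None


if __name__ == "__main__":
    memory_map = build_map()
    for r in memory_map:
        kind = "RAM" if r.writable else "ROM"
        print(f"{r.name:9s} {kind} 0x{r.base:08x} {r.size:5d} bytes")
    print("total:", sum(r.size for r in memory_map), "bytes")
```

Keeping the map as plain data reflects the design principle that the memory blocks themselves stay unmodified; any DICE-specific behaviour is attached to address ranges rather than to the memories.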

Detecting whether or not the CPU is executing DICE code is of vital importance for the UDS locking mechanism, the interrupt disabling and the detection of forbidden jumps to DICE. Consequently, the processor’s pipeline stage that fetches instructions is expanded by two comparators. One of them determines whether or not the current instruction’s address lies within the DICE ROM’s memory range. This results in the IS_IN_DICE flag that is propagated to the subsequent pipeline stages. The other comparator determines whether or not the next instruction will be within the DICE ROM’s memory range. If an instruction fetch causes a jump to DICE from non-DICE code, a jump to a trap vector is executed instead. All other instruction fetches are executed. This fulfills the requirement that code outside DICE must not be able to execute code in the DICE ROM.

The protection of the UDS memory is enforced in the memory stage of the processor’s pipeline. It reads the propagated IS_IN_DICE flag and determines whether or not a memory access tries to read the UDS. If the memory stage is not in DICE and detects an access to the UDS, the access is blocked and a memory trap is triggered. This fulfills the requirement of the UDS being available only for DICE.

Finally, the requirement of DICE not being interruptible is to be fulfilled. This is done with simple enable lines for the respective plugins that are connected to the inverted IS_IN_DICE flag of the CPU’s decode stage. Since the DICE firmware uses no interrupts and is not subject to debugging, the respective plugins can simply be disabled when DICE code is executed.

Figure 2 gives an overview of Dairo’s architecture with the DICE additions. Boxes in gray depict components that were added or changed for the use in DICE.

5.3.2 DICE Firmware. In order to keep the DICE hardware minimal, the cryptographic operations of DICE are implemented in firmware instead of hardware accelerators.
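The three hardware checks of Section 5.3.1 (trapping jumps into DICE, blocking UDS reads from outside DICE, and disabling interrupts and debugging inside DICE) can be summarised as a behavioural model. The Python sketch below is not the SpinalHDL implementation; the address ranges, the trap vector and all function names are hypothetical and serve only to make the decision logic explicit.

```python
# Hypothetical address ranges and trap vector, chosen only for this sketch.
DICE_ROM = range(0x2000, 0x3000)
UDS_ROM = range(0x3000, 0x3020)
TRAP_VECTOR = 0x0020


def is_in_dice(pc: int) -> bool:
    """First fetch-stage comparator: does the current instruction lie in the DICE ROM?"""
    return pc in DICE_ROM


def next_fetch(pc: int, target: int) -> int:
    """Second comparator: redirect a jump from non-DICE code into the DICE ROM to the trap vector."""
    if not is_in_dice(pc) and is_in_dice(target):
        return TRAP_VECTOR
    return target


def uds_access_allowed(pc: int, addr: int) -> bool:
    """Memory-stage check: only code running inside the DICE ROM may touch the UDS."""
    return is_in_dice(pc) or addr not in UDS_ROM


def interrupts_and_debug_enabled(pc: int) -> bool:
    """Enable line of the interrupt and debug plugins: the inverted IS_IN_DICE flag."""
    return not is_in_dice(pc)


if __name__ == "__main__":
    print(hex(next_fetch(0x0100, 0x2000)))          # jump into DICE from outside is trapped
    print(hex(next_fetch(0x2FFC, 0x0000)))          # DICE handing over control is allowed
    print(uds_access_allowed(0x0100, 0x3000))       # UDS read from outside DICE is blocked
    print(interrupts_and_debug_enabled(0x2004))     # disabled while DICE code runs
```

In this model the reset case needs no special handling: execution starts inside the DICE ROM, so the first fetch is not a jump from non-DICE code.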
A suitable one-way function is to be selected for the derivation of the CDI in the DICE firmware. The suitability depends on a number of factors. Since simplicity is a primary design principle of DairoDICE, it is not equipped with a possibility to update the DICE firmware. Consequently, the one-way function must be selected with long-term security in mind. Furthermore, the CDI must have a sufficient bit length. The DICE specification states that the CDI should have a length of at least 256 bits [12].

Figure 2: The Architecture of Dairo with DICE Extensions

Jäger et al. explore two possible choices for a one-way function: CBC-MAC based on AES and HMAC based on SHA-256 [16]. The main advantage they attribute to the HMAC is its long-term security. It is proven by Bellare et al. in [6] that an HMAC is secure against forgeries as long as the underlying hash function is still a pseudo-random function. This is true even for deprecated, only weakly collision-resistant hash functions. The European Union Agency for Network and Information Security (ENISA) stated in their 2014 report [10] that HMACs based on MD5 and SHA-1 can still be considered secure against forgeries. This gives an idea of how long a hash function that is considered secure today can still be considered a secure building block for HMACs. An HMAC based on SHA-256 also fulfills the TCG’s requirement of the CDI being at least 256 bits long. Since every major crypto library is able to handle SHA-256 and HMACs based on it, HMAC-SHA-256 is used as the one-way function. Consequently, the UDS is 256 bits long as well.
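The choice of HMAC-SHA-256 with a 256-bit UDS can be illustrated with Python’s standard library. The sketch below is not the DICE firmware (which is a stripped-down C implementation); the function name, the chunk size and the generic `measured_input` parameter are assumptions made for illustration.

```python
import hashlib
import hmac

UDS_LEN = 32          # 256-bit UDS, as stated in the text
CDI_MIN_BITS = 256    # minimum CDI length required by the DICE specification


def compute_cdi(uds: bytes, measured_input: bytes, chunk_size: int = 64) -> bytes:
    """Derive a CDI with HMAC-SHA-256, keyed with the UDS.

    The input is fed in chunks, as a firmware walking through memory might do;
    the result is identical to a one-shot HMAC over the whole input.
    """
    if len(uds) != UDS_LEN:
        raise ValueError("UDS must be 32 bytes (256 bits)")
    mac = hmac.new(uds, digestmod=hashlib.sha256)
    for offset in range(0, len(measured_input), chunk_size):
        mac.update(measured_input[offset:offset + chunk_size])
    return mac.digest()


if __name__ == "__main__":
    uds = bytes(range(32))              # placeholder; a real UDS is random and secret
    cdi = compute_cdi(uds, b"next-level firmware image")
    print(len(cdi) * 8 >= CDI_MIN_BITS)  # HMAC-SHA-256 yields exactly 256 bits
```

Because the output length of HMAC-SHA-256 is fixed at 256 bits, the CDI meets the minimum length without any truncation or expansion step.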
The DICE firmware uses a heavily customized mbedTLS implementation of SHA-256 and HMAC. The original implementation contains many wrapper functions and structures for crypto-agility and object-oriented programming. These features are not required in the DICE firmware and were therefore stripped away. Usually, the CDI is computed from the UDS and a measurement of the next-level firmware's integrity. The DICE firmware instead computes an HMAC over the next-level firmware directly. This simplifies the DICE firmware because it saves one measuring operation. It is valid because the firmware image itself contains all information about the firmware's integrity. The CDI is computed from the UDS and the next-level firmware and stored in the dedicated CDI RAM. After that, the memory objects used are zeroized and control is handed over to the next-level firmware. The DICE firmware is compiled with a regular RISC-V variant of the GNU Compiler Collection (GCC).

6 EVALUATION

6.1 Hardware Synthesis

The IcoBoard by Trenz Elektronik was selected as the target platform for the DairoDICE synthesis. This board comes with an iCE40 FPGA by Lattice, whose bitstream format has been almost completely reverse-engineered, which enabled the creation of an open-source toolchain. It is also one of the smallest FPGAs available and therefore an ideal target to demonstrate the feasibility of DairoDICE as a solution for resource-constrained devices. To evaluate the additional resource consumption of Dairo with DICE, two configurations are compared. The first synthesizes Dairo without DICE for the IcoBoard, the second Dairo with DICE added. The results are compared in Table 1. We compare the number of LUTs (SB_LUT4) and the number of RAM blocks (SB_RAM40_4K), the most important values for estimating the size of an FPGA design.
The number of block RAMs has grown as expected, since the original Dairo featured 8 kB of memory and Dairo with DICE uses a little more than 12 kB. Apparently, the IceStorm toolchain maps ROM to block RAM. The number of LUTs has grown by 27.1%, which is a large increase. A closer examination shows that the majority of this increase results from the larger address space. Removing the DICE and UDS ROMs and the CDI RAM from the design reduces the LUT count to 2568. Compared to the original Dairo design (2428 LUTs), this is an increase of only 5.8%. This can be regarded as a good estimate of the resources consumed by the actual DICE logic (the comparators that determine whether an instruction is within the DICE control flow, the interrupt and debug disable lines and the modified memory controllers). The larger address space seems to lead to more intricate memory bus logic, which consumes more LUTs. A more efficient implementation of the memory controller would likely lead to better results.

| Components | Dairo without DICE | Dairo with DICE | Total Increase | Relative Increase |
|---|---|---|---|---|
| Wires | 2818 | 3717 | 899 | 31.9% |
| Wire bits | 11775 | 15343 | 3568 | 30.3% |
| Public wires | 951 | 1215 | 264 | 27.8% |
| Public wire bits | 9327 | 11830 | 2503 | 26.8% |
| Memories | 0 | 0 | 0 | 0% |
| Memory bits | 0 | 0 | 0 | 0% |
| Processes | 0 | 0 | 0 | 0% |
| Cells total | 4148 | 5408 | 1260 | 30.4% |
| Cells SB_CARRY | 250 | 517 | 267 | 106.8% |
| Cells SB_DFF | 164 | 132 | -32 | -19.5% |
| Cells SB_DFFE | 871 | 1182 | 311 | 35.7% |
| Cells SB_DFFER | 118 | 122 | 4 | 3.4% |
| Cells SB_DFFES | 4 | 8 | 4 | 100% |
| Cells SB_DFFESR | 131 | 133 | 2 | 1.5% |
| Cells SB_DFFESS | 17 | 17 | 0 | 0% |
| Cells SB_DFFR | 116 | 120 | 4 | 3.4% |
| Cells SB_DFFS | 7 | 7 | 0 | 0% |
| Cells SB_DFFSR | 18 | 50 | 32 | 177.8% |
| Cells SB_DFFSS | 1 | 1 | 0 | 0% |
| Cells SB_LUT4 | 2428 | 3086 | 658 | 27.1% |
| Cells SB_PLL40_PAD | 1 | 1 | 0 | 0% |
| Cells SB_RAM40_4K | 22 | 32 | 10 | 45.5% |

Table 1: Comparison of the hardware synthesis of Dairo without and with DICE

The place and route process gives us an estimate of the frequency at which Dairo with DICE can run.
The original Dairo runs at a frequency of 29.69 MHz. Adding the DICE logic lengthens the critical path, which lowers the achievable frequency: Dairo with DICE runs at 17.21 MHz. This is again a large decrease. Given the increase in LUT use, it is not surprising, since the larger number of LUTs results in a longer critical path. If we again synthesize the design without the additional ROMs and the CDI RAM, we get a maximum frequency of 22.64 MHz. This is a good estimate of the frequency decrease due to the actual DICE logic. In this case it accounts for the majority of the overall decrease (7.05 MHz of 12.48 MHz), while the larger memory logic causes a smaller decrease of 5.43 MHz.

It is difficult to compare the synthesis results with those of any of the mentioned Trusted Computing technologies. SMART is conceptually the closest technology to DairoDICE. The synthesis results presented by El Defrawy et al. were measured for an ASIC synthesis [7]. This is a more fine-grained process than synthesis for an FPGA and will therefore naturally result in a smaller increase. ASIC synthesis also allows a simple count of logic gates, which gives a more precise estimate of the design's resource consumption. SMART was synthesized for open-source implementations of AVR and MSP430 MCUs, which are 8-bit and 16-bit architectures, respectively, while VexRiscV is a 32-bit architecture, which requires more logic. Finally, hardware synthesis results always depend on a number of parameters such as the maximum clock frequency. These parameters were set differently for SMART and DairoDICE. For example, SMART was synthesized for a frequency of 8 MHz, while DairoDICE can reach a frequency of 17.21 MHz. For all these reasons, a direct comparison is not very meaningful. Nevertheless, SMART increases the count of logic gates by 10%, while our DICE implementation adds about 5.8% to the LUT count if the bus logic is subtracted.
This shows that the increase is of the same order of magnitude for both solutions.

6.2 DICE Firmware

The DICE firmware is 3284 bytes large and thus occupies about 80% of the 4 kB DICE ROM. The firmware that Jäger et al. implemented for the Atmel ATmega328P uses 3526 bytes of memory, includes bootloader functionality and is only slightly larger than the Dairo DICE firmware [16]. They use a custom SHA-256-HMAC implementation and inline assembly code for the zeroization of used memory. The use of a customized mbedTLS implementation in Dairo's DICE implementation may result in an increased code size and memory consumption.

The impact on the runtime is measured by reading Dairo's cycle count register at the beginning of the demo application's main function. The measurement result is 441992 cycles. At 17 MHz (and assuming that one clock cycle completes one instruction), this corresponds to a runtime of 0.026 seconds. This is consistent with the observations from the prototype, where pressing the release button results almost immediately in the debug prints of the application, making DICE's impact barely noticeable.

| Implementation | Number of Cycles | Size of L-1 Firmware | Clock Frequency | Runtime |
|---|---|---|---|---|
| This Implementation | 441943 | 4K | 17 MHz | 26.0 ms |
| Hristozov et al. | 647420 | 8K | 80 MHz | 8.1 ms |
| Jäger et al. | - | 30K | 16 MHz | 6 s |

Table 2: A comparison of the runtime of various DICE implementations

Table 2 compares the runtime of the implementations of Hristozov et al. [15], Jäger et al. [16] and that proposed in this paper. Hristozov et al. use