





# Fault-Tolerant RISC-V Softcore: SRAM-Based FPGA Implementation and Reliability Testing



Eduardo Marañon Aguilar - edmaguilar@inf.ufrgs.br Fabio Benevenuti - fbenevenuti@inf.ufrgs.br <u>Fernanda Lima Kastensmidt</u> – fglima@inf.ufrgs.br

Institute of Informatics - UFRGS, Microelectronics Graduate Program (PGMICRO)

The New Space sector increasingly adopts open-source hardware and software because designers want:

- reuse
- adaptive solutions: area, performance and power

This is enabled by:

- SRAM-based FPGAs -> configurability
- Embedded softcore processors (RISC-V) -> open source, adaptivity, reuse







- In this work, we propose a mitigation method that adds redundancy and voting to a softcore RISC-V processor.
- The mitigation is applied automatically to the RTL code of the processor.
- We synthesized this fault-tolerant RISC-V into an SRAM-based FPGA from AMD/Xilinx.
- We evaluate different TMR implementations under emulated fault injection.
- Results show the trade-off between area, number of voters and reliability of the fault-tolerant RISC-V processor.



- Case-study: RISC-V Steel SoC
- Radiation effects in SRAM-based FPGA
- Fault Tolerant RISC-V Steel SoC
- Fault Injection Methodology
- Results
- Conclusions

### Case-study: RISC-V Steel SoC

- Free and customizable, for use in FPGAs and embedded systems.
- Modules: Steel core, Timer, Interconnection Bus, UART, SPI, GPIO and RAM.
- Implements the RV32I ISA, the Zicsr extension and the Machine-mode privileged architecture of RISC-V.
- 32-bit processor with a 3-stage pipeline.
- Support for real-time operating systems such as FreeRTOS.



Source: De Oliveira, R. Design of Steel: a RISC-V Core. Lume-UFRGS. https://lume.ufrgs.br/handle/10183/219134








## Radiation effects in SRAM-based FPGA

- SRAM-based FPGAs are very attractive for aerospace applications due to their reconfigurability.
- Softcore processors are commonly embedded into FPGAs to accelerate and control many types of applications.
- However, radiation can affect SRAM-based FPGAs by inducing faults that lead to misconfiguration and malfunction.
- The most common faults are bit-flips in the configuration memory (CRAM), which cause a persistent error in the bitstream, so faults must be masked until the FPGA is reconfigured.



Accumulated bit-flips in CRAM (persistent effect) -> many errors, corrected by reconfiguration.
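The accumulate-then-reconfigure behaviour can be illustrated with a minimal software sketch. It models the CRAM as a `bytearray` and the original bitstream as a `bytes` object; the function name `scrub` and the 64-bit "bitstream" are assumptions made for illustration, not part of the presented work or of any FPGA vendor API.

```python
def scrub(cram: bytearray, golden: bytes) -> int:
    """Rewrite every CRAM byte that differs from the golden copy.

    Returns the number of bit-flips that were corrected.
    """
    corrected = 0
    for i, (current, reference) in enumerate(zip(cram, golden)):
        diff = current ^ reference          # bits that differ from the golden copy
        if diff:
            corrected += bin(diff).count("1")
            cram[i] = reference             # restore the original configuration
    return corrected


golden = bytes([0b10101010] * 8)            # hypothetical 64-bit "bitstream"
cram = bytearray(golden)
cram[2] ^= 0b00000100                       # first accumulated upset
cram[5] ^= 0b10000000                       # second accumulated upset

fixed = scrub(cram, golden)
print(fixed, bytes(cram) == golden)
```

Until `scrub` runs, both upsets stay in the modelled memory, which is the persistent effect noted above.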




- **Triple Modular Redundancy (TMR)** with voters is one of the best solutions for mitigating faults in SRAM-based FPGAs, because it masks the fault until the device is reconfigured.
- TMR can be implemented with different levels of granularity:
  - Coarse-Grained TMR (CGTMR)
  - Fine-Grained Distributed TMR (FGDTMR)
- TMR granularity impacts area, performance and reliability.
- **Scrubbing**: continuously reloading the original bitstream into the SRAM-based FPGA to correct bit-flips in the CRAM.
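The masking performed by a TMR voter can be sketched as a bitwise majority function. This is a behavioural illustration in Python, not the RTL generated in this work; the function name `majority_vote` and the sample values are assumptions.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three redundant copies."""
    return (a & b) | (a & c) | (b & c)


good = 0xDEADBEEF
faulty = good ^ 0x00000100                  # one copy with a single upset bit

# A fault in any single copy is masked by the other two.
voted = majority_vote(good, faulty, good)
print(hex(voted))
```

Because the vote is taken per bit, upsets in two different copies are still masked as long as they do not hit the same bit position.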



#### Fault Tolerant Mitigation Techniques











#### How many voters are enough?



Voter insertion points are selected by analyzing the fan-out (f.o.) of the flip-flops (FFs).



So f.o. ≥ 1 means that every FF has a voter at its output.

| Element   | FGDTMR f.o. ≥ 4 | FGDTMR f.o. ≥ 3 | FGDTMR f.o. ≥ 2 | FGDTMR f.o. ≥ 1 |
|-----------|-----------------|-----------------|-----------------|-----------------|
| Registers | 4943            | 4943            | 4943            | 4943            |
| LUTs      | 8134            | 8362            | 11719           | 12550           |
| Voters    | 264             | 492             | 3849            | 4680            |
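The fan-out-based selection can be sketched as a simple filter over a netlist summary. The flip-flop names and fan-out values below are hypothetical, and `select_voted_ffs` is an illustrative helper rather than the tool used in this work.

```python
def select_voted_ffs(fanout: dict[str, int], threshold: int) -> list[str]:
    """Return the flip-flops whose fan-out meets the threshold.

    Only these flip-flops would receive voters at their outputs.
    """
    return sorted(name for name, fo in fanout.items() if fo >= threshold)


# Hypothetical fan-out per flip-flop, e.g. extracted from a synthesized netlist.
fanout = {"pc_reg": 5, "instr_reg": 3, "csr_flag": 1, "uart_tx": 2}

for threshold in (4, 3, 2, 1):
    print(threshold, select_voted_ffs(fanout, threshold))
```

Lowering the threshold can only add flip-flops to the selection, which matches the growing voter count across the columns of the table above.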

#### Fault Tolerant RISC-V Steel SoC








# Fault Injection Methodology



- Random bit-flips are injected into the FPGA's CRAM to emulate radiation-induced SEUs during benchmark execution, with at most one fault injected per processing cycle.
- The fault injector uses the Internal Configuration Access Port (ICAP) to manipulate CRAM contents (ICAP Fault Injector).
- Fault injection (FI) affects the configuration of:
  - CLBs (without DFFs) and DSPs
  - BRAM
  - routing paths
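A single injection step can be sketched in software as follows. The sketch flips one random bit in a `bytearray` standing in for the CRAM; it does not drive a real ICAP, and `inject_bitflip`, the 16-byte memory and the seed are assumptions made for illustration.

```python
import random


def inject_bitflip(cram: bytearray, rng: random.Random) -> tuple[int, int]:
    """Flip one randomly chosen bit and return its (byte, bit) location."""
    byte_index = rng.randrange(len(cram))
    bit_index = rng.randrange(8)
    cram[byte_index] ^= 1 << bit_index      # emulate a single-event upset
    return byte_index, bit_index


rng = random.Random(42)                     # fixed seed for a repeatable campaign
cram = bytearray(16)                        # hypothetical all-zero configuration
byte_index, bit_index = inject_bitflip(cram, rng)
print(byte_index, bit_index)
```

Returning the location of each flip lets a campaign log which configuration bits led to an error.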








Errors are classified as:

- Silent data corruption (SDC)
- Timeout
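One way to apply this classification to a single benchmark run is sketched below. The slides name only the two error classes; the `"no_error"` label, the use of `None` for a missing response and the function name `classify_run` are assumptions added for illustration.

```python
from typing import Optional


def classify_run(output: Optional[bytes], golden: bytes) -> str:
    """Classify one run against the fault-free (golden) output.

    `output` is None when the benchmark did not respond before the deadline.
    """
    if output is None:
        return "timeout"
    if output != golden:
        return "SDC"                        # finished, but with wrong data
    return "no_error"                       # assumed label for a masked fault


golden = b"\x01\x02\x03"
print(classify_run(b"\x01\x02\x03", golden))
print(classify_run(b"\x01\xff\x03", golden))
print(classify_run(None, golden))
```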






## Case-study Applications:



Two applications run concurrently under FreeRTOS.

3rd: Tx of data through the HW interface





#### **Results: Area**





Is the finest-grained TMR (FGDTMR f.o. ≥ 1) the most reliable one?

FGDTMR f.o. ≥ 1 has the most voters, so more errors can be masked in the design.

| Element               | Unhardened | CGTMR        | FGDTMR f.o. ≥ 4 | FGDTMR f.o. ≥ 3 | FGDTMR f.o. ≥ 2 | FGDTMR f.o. ≥ 1 |
|-----------------------|------------|--------------|-----------------|-----------------|-----------------|-----------------|
| Block memory          | 32         | 32           | 32              | 32              | 32              | 32              |
| Registers             | 1,830      | 4,943 (2.7×) | 4,943 (2.7×)    | 4,943 (2.7×)    | 4,943 (2.7×)    | 4,943 (2.7×)    |
| Carry                 | 183        | 397 (2.2×)   | 397 (2.2×)      | 397 (2.2×)      | 397 (2.2×)      | 397 (2.2×)      |
| Look-up tables (LUTs) | 3,073      | 7,879 (2.6×) | 8,134 (2.6×)    | 8,362 (2.7×)    | 11,719 (3.8×)   | 12,550 (4.1×)   |
| Voters                | -          | 9            | 264             | 492             | 3,849           | 4,680           |
| Essential bits (Mbit) | 0.8        | 2.0 (2.3×)   | 2.1 (2.3×)      | 2.1 (2.4×)      | 2.7 (3.0×)      | 2.8 (3.1×)      |
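The LUT overhead factors in the table can be reproduced with a few lines of arithmetic. The sketch below uses only the LUT and voter counts from the table; the dictionary keys are shorthand labels chosen here. It also checks an observation that follows from these numbers: between any two TMR configurations, the difference in LUTs equals the difference in voters.

```python
baseline_luts = 3073                        # unhardened design

# (LUTs, voters) per TMR configuration, copied from the table.
configs = {
    "CGTMR": (7879, 9),
    "FGDTMR f.o.>=4": (8134, 264),
    "FGDTMR f.o.>=3": (8362, 492),
    "FGDTMR f.o.>=2": (11719, 3849),
    "FGDTMR f.o.>=1": (12550, 4680),
}

# Overhead relative to the unhardened design, rounded as in the table.
overhead = {name: round(luts / baseline_luts, 1) for name, (luts, _) in configs.items()}

# LUTs left after subtracting one LUT per voter; constant if each voter costs one LUT.
non_voter_luts = {name: luts - voters for name, (luts, voters) in configs.items()}

for name in configs:
    print(f"{name}: {overhead[name]}x, LUTs minus voters = {non_voter_luts[name]}")
```

With these counts the remainder is the same for every configuration, which suggests that the area growth across the variants comes from the voters alone.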

### **Results: Reliability**



#### Is scrubbing enough? No!






Is **TMR enough**? FGDTMR shows different reliability depending on the voter selection!






If we implement **TMR and scrubbing** together, all TMR variants show similar results. **It is best to choose the TMR variant with the smallest area!**



### Pelletron accelerator heavy-ion experiment





Assembly of the decapsulated Zedboard (xc7z020clg484-1)



General assembly of the experiment





#### Fault Injection - SBU

#### Heavy-ion radiation experiment (Pelletron Accelerator) - SBU/MBU






### Conclusions



- An adapted TMR method with flip-flop selection offered insights into scaling FGDTMR while balancing area and reliability.
- The proposed method, which selectively chooses where to add voters, combined with scrubbing achieved high reliability with reduced area.
- The TMR and voters are added automatically.
- The FI method can mimic SBU and MBU radiation data in the FPGA configuration memory.
- This method enables fast design exploration of fault tolerance in RISC-V softcore processors.







# Thank you very much!

# Questions?


