

#### RISC-V IN SPACE WORKSHOP, 2-3 APRIL 2025, GOTHENBURG, SWEDEN



Università di Pisa





### Bringing together experts, engineers, and researchers

## **ENGAGE-V: A RERI-Compliant RISC-V Module for RAS in Space Applications**

Nicasio Canino, Daniele Rossi\*, Sergio Saponara

Department of Information Engineering, University of Pisa, Italy

\*Contact author - daniele.rossi1@unipi.it



# Outline

- Background and Motivation
- Overview of ENGAGE-V IP Architecture
- Design Space Exploration
- Analysis of Synthesis Results
- Conclusions and Future Work








### Reliability

Probability that the system produces correct outputs

### Availability

Probability that the system is operational at any given time

### Serviceability

Ability of the system to provide information about a failure that has occurred






### **Topic Discussion & Motivation**

Electronic systems for space applications are exposed to extreme conditions, which lead to a variety of HW errors, such as SEU (Single-Event Upset), SET (Single-Event Transient), and SEL (Single-Event Latch-up).

**Resilient & Fault-tolerant** computing systems can be obtained via:

- Redundancy in HW:
  - Spatial Redundancy (DMR, TMR, ...)
  - Information Redundancy (ECC, ...)
- Error Logging & Reporting can further improve RAS:
  - Storing relevant error-related information
  - Triggering system SW only when needed



### **RAS Error Record Register Interface (RERI)**

**RERI augments RAS** capabilities via standard error logging and reporting mechanisms for all RISC-V-based CPU and SoC designs:

- Error Taxonomy: classes and severity of detected HW errors
- Standard Logging Interface: ad-hoc memory-mapped register format
- Standard Reporting Interface: configurable error signaling

Logging and reporting features are highly customizable and extensible.

No requirements are imposed on the HW implementation.








### **RERI: Error Register Specification**

**RERI specification** defines the **error logging** facilities to store all the information related to the detected HW error, comprising:

- Error Record: set of 64-bit registers accessible through memory-mapped accesses
  - 64-byte address space for each error record
  - Highly configurable error logging and reporting features
- Error Bank:
  - 64-byte header containing all the relevant info on the bank
  - An array of N ≤ 63 error records, with info on the error (error\_msg, ...), the address of the erroneous location, the number of corrected errors (CEs), etc.

| Offset (bytes) | Name            | Size (bytes) | Description                                          |
|----------------|-----------------|--------------|------------------------------------------------------|
| 0              | vendor_n_imp_id | 8            | Vendor and implementation ID.                        |
| 8              | bank_info       | 8            | Error bank information.                              |
| 16             | valid_summary   | 8            | Summary of valid error records.                      |
| 24             | Reserved        | 32           | Reserved for future standard use.                    |
| 56             | Custom          | 8            | Designated for custom use.                           |
| 64 + 64*i      | control_i       | 8            | Control register of error record i.                  |
| 72 + 64*i      | status_i        | 8            | Status register of error record i.                   |
| 80 + 64*i      | addr_i          | 8            | Address register of error record i.                  |
| 88 + 64*i      | info_i          | 8            | Information register of error record i.              |
| 96 + 64*i      | suppl_info_i    | 8            | Supplemental information register of error record i. |
| 104 + 64*i     | timestamp_i     | 8            | Timestamp register of error record i.                |
| 112 + 64*i     | Reserved        | 16           | Reserved for future standard use.                    |
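The offset arithmetic in the table can be sketched in a few lines of Python. This is only an illustration of the layout (64-byte header followed by 64-byte records); the function and constant names are hypothetical and not part of the RERI specification or of ENGAGE-V.

```python
# Byte layout taken from the register table; all names here are illustrative.
HEADER_SIZE = 64   # bytes occupied by the bank header
RECORD_SIZE = 64   # bytes occupied by each error record
MAX_RECORDS = 63   # upper bound on error records per bank

# Offset of each register relative to the start of its error record.
RECORD_FIELDS = {
    "control": 0,
    "status": 8,
    "addr": 16,
    "info": 24,
    "suppl_info": 32,
    "timestamp": 40,
}


def record_base(i: int) -> int:
    """Byte offset of error record i from the start of the bank."""
    if not 0 <= i < MAX_RECORDS:
        raise ValueError(f"record index {i} out of range 0..{MAX_RECORDS - 1}")
    return HEADER_SIZE + RECORD_SIZE * i


def register_offset(i: int, field: str) -> int:
    """Byte offset of one register of error record i."""
    return record_base(i) + RECORD_FIELDS[field]


def bank_size(n_records: int) -> int:
    """Address space in bytes spanned by a bank with n_records records."""
    if not 0 <= n_records <= MAX_RECORDS:
        raise ValueError("a bank holds at most 63 error records")
    return HEADER_SIZE + RECORD_SIZE * n_records
```

For instance, `register_offset(1, "status")` evaluates to 136, and a full bank of 63 records spans `bank_size(63)` = 4096 bytes.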





### **ENGAGE-V: a RERI-compliant IP**







#### **Pre-Process Stage**

- Error Mux receives the error control signals generated by the ECC circuitry of the monitored HW-units
- Error Synch. Interface synchronizes the error message with the address of the erroneous location coming from the Address Buffer
- Internal circular FIFO buffer is the interface to the next stage
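The circular FIFO that decouples the two stages can be modelled behaviourally as below. This is a software sketch, not the ENGAGE-V RTL: the `ErrorEvent` fields, the class name, and the choice to reject new events when the buffer is full are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ErrorEvent:
    """Synchronized error info (field names are hypothetical)."""
    unit_id: int    # index of the monitored HW-unit
    error_msg: int  # encoded error message
    address: int    # address of the erroneous location


class CircularFifo:
    """Fixed-depth circular buffer between the two stages."""

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("FIFO depth must be at least 1")
        self._slots: List[Optional[ErrorEvent]] = [None] * depth
        self._head = 0   # index of the oldest stored event
        self._count = 0  # number of stored events

    def push(self, event: ErrorEvent) -> bool:
        """Store an event; return False if the buffer is full (assumed policy)."""
        if self._count == len(self._slots):
            return False
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = event
        self._count += 1
        return True

    def pop(self) -> Optional[ErrorEvent]:
        """Remove and return the oldest event, or None if the buffer is empty."""
        if self._count == 0:
            return None
        event = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return event
```

The read and write positions wrap around with modulo arithmetic, so the storage cost is fixed by the depth chosen at construction time.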









#### **Error Log & Report Stage**

**Logging Controller** retrieves the synchronized error info and decides how to handle it:

- Log the new error info in an error record in the Error Record Bank
- Overwrite a stored error or discard the new error

#### **IRQ Generator**

Implements the Error Reporting feature via interrupt signals
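The log / overwrite / discard decision can be illustrated with a small sketch. The policy used here (fill a free record first, otherwise overwrite the least severe stored error only if the new one is strictly more severe) and the three-level `Severity` ordering are simplifying assumptions; the actual rules are those of the RERI specification and of the ENGAGE-V configuration, which are not reproduced here.

```python
from enum import IntEnum
from typing import List, Optional, Tuple


class Severity(IntEnum):
    """Hypothetical severity ordering, lowest to highest."""
    CORRECTED = 0
    DEFERRED = 1
    UNCORRECTED = 2


def log_error(bank: List[Optional[Severity]],
              new: Severity) -> Tuple[str, Optional[int]]:
    """Decide what to do with a new error; None marks a free record."""
    # 1) Prefer a free error record.
    for i, stored in enumerate(bank):
        if stored is None:
            bank[i] = new
            return ("log", i)
    if not bank:
        return ("discard", None)
    # 2) Bank full: find the least severe stored error.
    victim = min(range(len(bank)), key=lambda i: bank[i])
    if bank[victim] < new:
        bank[victim] = new
        return ("overwrite", victim)
    # 3) Nothing less severe is stored: drop the new error.
    return ("discard", None)
```

The returned action could also drive a reporting decision, for example raising an interrupt only on `"log"` or `"overwrite"`; that coupling is likewise an assumption.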



### Design Space Exploration: # of ENGAGE-V Instances

For the targeted system, for a given number of HW-units to be monitored:

- The number of instances of ENGAGE-V IP (N<sub>IP</sub>), and how many HW-units to monitor (N<sub>HW-UNIT</sub>) with each IP, can be selected.
- Each IP instance has three parameters:
  - N<sub>ER</sub> to determine the number of error records within that bank,
  - Configuration of error records of that bank, and
  - N<sub>FIFO</sub> for the depth of the FIFO buffer in the pre-processing stage
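One point in this design space can be captured as a small data structure. The sketch below is an assumption-laden illustration: the class and function names are hypothetical, the per-record configuration is omitted, and the even split of HW-units across instances is just one possible choice, not one prescribed by the talk.

```python
from dataclasses import dataclass
from typing import List

MAX_ERROR_RECORDS = 63  # RERI limit on error records per bank


@dataclass(frozen=True)
class IpInstance:
    """Parameters of one ENGAGE-V instance (record configuration omitted)."""
    n_hw_unit: int  # HW-units monitored by this instance
    n_er: int       # error records in its bank
    n_fifo: int     # depth of the pre-process FIFO

    def __post_init__(self) -> None:
        if self.n_hw_unit < 1:
            raise ValueError("an instance must monitor at least one HW-unit")
        if not 1 <= self.n_er <= MAX_ERROR_RECORDS:
            raise ValueError("N_ER must be between 1 and 63")
        if self.n_fifo < 1:
            raise ValueError("N_FIFO must be at least 1")


def split_evenly(total_hw_units: int, n_ip: int,
                 n_er: int, n_fifo: int) -> List[IpInstance]:
    """Distribute the HW-units over n_ip instances as evenly as possible."""
    if not 1 <= n_ip <= total_hw_units:
        raise ValueError("N_IP must be between 1 and the number of HW-units")
    base, extra = divmod(total_hw_units, n_ip)
    return [IpInstance(base + (1 if k < extra else 0), n_er, n_fifo)
            for k in range(n_ip)]
```

Sweeping `n_ip` from 1 up to the number of HW-units, together with `n_er` and `n_fifo`, enumerates candidate configurations that can then be compared on area.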




Application range: from Embedded to HPC/Cloud



According to RERI specification:

- All error records in a bank have the same log-report configuration
- Not every register needs to be physically implemented as a full 64-bit register



Number of 1-bit registers (**REG<sub>TOT</sub>**) for a bank depends on the log-report configuration.

- Each error record requires 18 to 325 1-bit registers
- Different implementations may be characterized by
  - very different error logging and reporting features
  - very different impact on area overhead
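Because all error records in a bank share one configuration, the register count of the records scales linearly with N<sub>ER</sub>. The sketch below works out that arithmetic from the 18 to 325 range quoted above; it covers the error records only (bank header and control logic are excluded, which is an assumption), and the function names are hypothetical.

```python
MIN_REGS_PER_RECORD = 18   # leanest log-report configuration
MAX_REGS_PER_RECORD = 325  # richest log-report configuration
MAX_ERROR_RECORDS = 63     # RERI limit on error records per bank


def record_reg_total(n_er: int, regs_per_record: int) -> int:
    """1-bit registers used by n_er identically configured error records."""
    if not 1 <= n_er <= MAX_ERROR_RECORDS:
        raise ValueError("N_ER must be between 1 and 63")
    if not MIN_REGS_PER_RECORD <= regs_per_record <= MAX_REGS_PER_RECORD:
        raise ValueError("registers per record must be between 18 and 325")
    return n_er * regs_per_record


def record_reg_bounds(n_er: int) -> tuple:
    """(minimum, maximum) 1-bit registers for a bank with n_er records."""
    return (record_reg_total(n_er, MIN_REGS_PER_RECORD),
            record_reg_total(n_er, MAX_REGS_PER_RECORD))
```

For a full bank, `record_reg_bounds(63)` gives (1134, 20475), a spread of roughly 18x between the leanest and richest configurations.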



### **Area Impact of Main Design Parameters**








### **Area Impact of N<sub>HW-UNIT</sub>**








### Area Impact of N<sub>ER</sub>






### CONCLUSIONS

- ENGAGE-V: a RERI-compliant Error Logging & Reporting IP for RISC-V-based systems
- Design parameters of the RAS IP should be tailored to the targeted space application
- The error record bank may impact the area consumption of the ENGAGE-V module
  - RERI flexibility helps designers tailor the ENGAGE-V module to their requirements

### **FUTURE WORK**

- Further optimizations of the performance of the ENGAGE-V module
- Exploration of additional fault-tolerance techniques
- Exploration of error recovery solutions (e.g., rollback, checkpointing, etc.) to handle the reported error events










### Daniele Rossi

(daniele.rossi1@unipi.it)
