

# Evaluating the Reliability of Different Voting Schemes for Fault Tolerant Approximate Systems

Tiago R. Balen<sup>1</sup> · Carlos J. González<sup>1</sup> · Ingrid F. V. Oliveira<sup>2</sup> · Leomar S. da Rosa Jr<sup>2</sup> · Rafael I. Soares<sup>2</sup> · Rafael B. Schvittz<sup>3</sup> · Nemitala Added<sup>4</sup> · Eduardo L. A. Macchione<sup>4</sup> · Vitor A. P. Aguiar<sup>4</sup> · Marcilei A. Guazzelli<sup>5</sup> · Nilberto H. Medina<sup>4</sup> · Paulo F. Butzen<sup>1</sup>

Received: 30 August 2022 / Accepted: 2 June 2023 / Published online: 20 June 2023 © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023

## Abstract

This work presents a study on the reliability of voters for approximate fault-tolerant systems in the context of single event effects and electromagnetic interference. A first case study analyzes different topologies of single-bit majority voters for logic circuits using simulation-based fault injection. In these simulations, the critical diffusion areas of the physical implementation are first identified for each voter input vector. As a second case study, heavy ion experiments on different architectures of software-based approximate voters for mixed-signal applications are presented, and the cross section of each voter is evaluated. The system comprising the voters was irradiated in two distinct experiments with an $^{16}O$ ion beam, producing an effective LET of 5.5 MeV/mg/cm<sup>2</sup> at the active region. As a complementary study, conducted electromagnetic interference injection was also performed, considering two distinct voting schemes. The results of the case studies identify the most tolerant voter architectures (among those studied) for approximate computing applications under single event effects and electromagnetic interference.

**Keywords** Approximate computing  $\cdot$  Radiation  $\cdot$  Electromagnetic interference  $\cdot$  Analog-to-digital converters  $\cdot$  Voters  $\cdot$  Redundancy  $\cdot$  Fault tolerance

Responsible editor: L. M. B. Poehls

Corresponding author: Tiago R. Balen, tiago.balen@ufrgs.br

- <sup>1</sup> Universidade Federal do Rio Grande do Sul, Programa de Pós-Graduação em Microeletrônica, PGMICRO, Porto Alegre, RS, Brazil
- <sup>2</sup> Universidade Federal de Pelotas, Programa de Pós-Graduação em Computação, Pelotas, RS, Brazil
- <sup>3</sup> Universidade Federal do Rio Grande, Centro de Ciências Computacionais, Rio Grande, RS, Brazil
- <sup>4</sup> Universidade de São Paulo, Instituto de Física, São Paulo, SP, Brazil
- <sup>5</sup> Centro Universitário FEI, Departamento de Física, São Bernardo do Campo, SP, Brazil

## 1 Introduction

Approximate computing (AC) is an emerging paradigm that has been attracting attention due to its trade-off between the exactness of computations and performance gains. AC aims to achieve benefits in terms of area, power, and delay at the expense of output quality. This approach is being explored in several error-resilient applications, such as multimedia applications and other applications based on inherently imprecise algorithms.

A significant concern for modern systems, especially in space applications and man-made radiation environments, is the system-level effect of soft errors caused by ionizing particle strikes on sensitive areas of integrated circuits [19, 24]. Triple Modular Redundancy (TMR) is a single-fault masking technique widely used to mitigate soft errors. However, TMR imposes an area overhead of at least 200%, since two extra copies of the protected module are added besides the voter. Therefore, AC is also being applied in TMR solutions to explore the trade-off between circuit costs (area and power) and fault coverage [23].

Approximate TMR (ATMR) may also rely on different algorithms or hardware architectures with distinct computational precision, leading to different (though approximate) results in each module [23]. This brings the benefits of the design diversity paradigm to ATMR systems, increasing their reliability by making it possible to mitigate common-mode faults, which are faults that occur in more than one copy of a redundant system nearly simultaneously and are generally associated with a single cause [18, 22]. Even though 100% single-fault coverage is no longer guaranteed in ATMR systems, the majority voters remain the most sensitive part of the architecture. Therefore, investigating their robustness in a scenario where any input combination can occur is essential to achieve the best trade-off in ATMR systems.

The possible granularity levels of redundancy techniques may be divided into two categories, coarse grain and fine grain, as mentioned in [9]. In coarse-grain TMR, the replicated module is the entire design or a given functional module of the system, and the outputs of these modules are then voted. This approach may be suitable, for instance, when the system integrator cannot modify the hardware of the modules. On the other hand, fine-grain TMR replicates sub-modules, such as the registers of a given design, with voting performed at the outputs of these fine-grained sub-modules.

In coarse-grain approximate redundant systems (with software diversity, for instance), the majority voters need to take into account small differences among the results of each computation. Such differences may be due to distinct precision or resolution of the algorithms and hardware copies. Indeed, "flexible voting" is an issue that has been addressed for many years by designers of redundant analog or mixed-signal systems (and in on-line analog test), because analog signals and analog blocks are never exactly equal (due to variability, noise, and mismatch, for instance), even when designed to be so [10].

In previous work, the need to perform flexible (or approximate) voting in a fault-tolerant Analog-to-Digital (A/D) interface of a microprocessor-based system led us to propose two architectures of software-based word voters. The voting is performed on the outputs of three A/D converter modules with diversity redundancy (hardware and temporal diversity) [10]. In a further investigation, the system was tested under heavy ions [13]. Besides showing that the proposed diversity-based redundant system is suitable for reducing the overall soft error rate of the A/D interface, the results showed that the distinct approximate voters also differ in reliability, which motivated one of the case studies presented in this paper. Therefore, a third voting architecture was implemented and tested in a second heavy ion experiment.

In addition to the heavy ion experiments presented in [7], the architectures employed as the temporal voter in the case-study design were also tested under conducted electromagnetic interference (EMI). This kind of disturbance may cause multiple signal deviations spread over time, which may lead to multiple errors at the voter inputs in the same voting cycle. Therefore, both architectures are compared to determine which is more appropriate for coping with multiple errors in an EMI context. Details of the heavy ion and EMI experiments, as well as the tested architectures and results, are presented in Section 3.

Besides the possible need for such special flexible voters (also known as inexact voters [15]), well-known single-bit hardware voters have to be revisited when used in ATMR circuits. In TMR circuits, single-bit voters have to deal with only two input vectors, 000 and 111, under fault-free conditions. All other input vectors characterize a faulty condition, and faults in one of the modules are masked by the majority voter. Unlike in TMR circuits, a single-fault analysis of the voter in ATMR circuits must consider all possible input vector combinations, since approximate modules may legitimately disagree. These issues create the need to investigate single-bit voter reliability in the context of the approximate computing paradigm. An initial investigation has been performed with three single-bit voter topologies [7]. This paper extends the analysis to the voters presented in [21].

The main contribution of this work is the reliability evaluation of voters used in fault-tolerant approximate systems, considering radiation-induced soft errors and electromagnetic interference. For this purpose, two case studies were considered, at different abstraction levels and in different domains. First, a thorough analysis is performed on single-bit voter architectures for approximate logic circuits, aiming to identify the most susceptible and the most robust architectures against soft errors. Then, approximate word voters for software-based applications are studied. A mixed-signal case study is considered, in which a redundant system with different approximate voting schemes is irradiated with heavy ions, and the dynamic cross section (soft error susceptibility) of each voter is evaluated. Additional EMI experiments were performed to investigate the voters' ability to deal with multiple erroneous inputs. The experimental setup, results, and discussion of the two case studies are presented in Sections 2 and 3, respectively, while general conclusions are given in Section 4.

## 2 Single-Bit Voters for Approximate Logic Circuits

Single-bit voters implement the majority logic function. This function is widely used in various applications, ranging from logic synthesis based on majority-inverter graphs to voters in redundant circuits that increase fault tolerance [3, 4]. In fault-tolerant applications, evaluating the voter behavior under radiation becomes particularly important.
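As a point of reference for the topologies discussed below, the majority function itself can be written as the sum of products MAJ(A, B, C) = AB + BC + AC. The C sketch below (function names are hypothetical) shows this logic-level behavior and its bitwise extension to word-level TMR voting; it says nothing about the transistor-level topologies of Fig. 1, whose fault behavior is the subject of this section.

```c
#include <stdint.h>

/* Single-bit majority: returns 1 when at least two of the three inputs are 1.
   Inputs are expected to be 0 or 1. */
static inline int maj3(int a, int b, int c)
{
    return (a & b) | (b & c) | (a & c);
}

/* Bitwise TMR voting of three words: each bit position is voted independently
   with the same sum-of-products expression. */
static inline uint32_t maj3_word(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (b & c) | (a & c);
}
```

For example, `maj3_word(0xF0u, 0xFFu, 0x0Fu)` yields `0xFF`, since every bit is set in at least two of the three words.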

Several works investigate the radiation robustness of single-bit majority voters. Some of them perform this analysis at the logic level [6, 8], ignoring faults inside the logic blocks. This internal aspect is explored in works that deal with the transistor-level voter implementation [20]. Oliveira et al. investigate the robustness of single-bit voters at the physical level for traditional TMR circuits [21]. Tooba et al. explore permanent faults in ATMR architectures [5]. In this work, we investigate single-bit voter robustness in the presence of radiation-induced faults when the voters are used in ATMR solutions. First, we present the aspects evaluated in our analysis. We then evaluate these aspects for the voters presented in [21], focusing the analysis and discussion on the three single-bit voters that presented the best fault tolerance results when applied in an ATMR architecture.

### 2.1 Evaluated Aspects

The proposed analysis investigates two important aspects of fault tolerance against radiation effects. The first evaluated aspect is the identification of the reverse-biased PN junctions, which are the ones able to collect charge during a single event. This aspect reflects the circuit's radiation susceptibility. Given the digital nature of majority voters, the reverse-biased PN junctions can be computed through a logic analysis for each possible input vector. The identification of reverse-biased PN junctions must also take into account diffusion sharing in the physical voter implementation. To illustrate this, Fig. 1c shows the schematic and stick diagram (a layout simplification) of the CMOS voter. Node n1 is a single node in the schematic, but it has two diffusions (3 and 6) in its physical implementation. For an accurate analysis, the proposed evaluation has therefore been performed at the physical level.
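The logic-level part of this analysis can be sketched with a first-order rule, assuming a bulk CMOS process with a grounded p-substrate and n-wells tied to VDD: an n+ (nMOS) diffusion is reverse-biased when its node is at logic 1, and a p+ (pMOS) diffusion is reverse-biased when its node is at logic 0. The C fragment below is a hypothetical illustration of that rule only; it does not model intermediate series-stack nodes or the physical diffusion sharing discussed above.

```c
#include <stdbool.h>

/* Diffusion type (assumed process): n+ diffusion of an nMOS in the grounded
   p-substrate, or p+ diffusion of a pMOS in an n-well tied to VDD. */
typedef enum { DIFF_N, DIFF_P } diff_type;

/* First-order rule: the junction is reverse-biased (and can collect charge)
   when the node voltage differs from the body potential, i.e. an n+ diffusion
   at logic 1 or a p+ diffusion at logic 0. */
static bool junction_reverse_biased(diff_type type, int node_value)
{
    return (type == DIFF_N) ? (node_value == 1) : (node_value == 0);
}
```

Applying this rule to every diffusion for each input vector, once the node values are known from a logic simulation, gives a first approximation of the per-vector sensitive junctions.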

Another important aspect of the radiation robustness analysis is the energy deposition density of a particle. The Linear Energy Transfer (*LET*) is the average energy deposited per unit path length along the track of the ionizing particle. This second evaluated aspect reflects the circuit robustness in terms of the amount of energy required to cause a disturbance. In this analysis, the *LET* threshold (*LET*<sub>th</sub>) is the minimum *LET* required to cause a transient bit-flip in the logic value of a node. The *LET*<sub>th</sub> is the second parameter evaluated in our analysis and is obtained through electrical simulations.

### 2.2 Single-Bit Voter Analysis

In [20], Oliveira et al. presented an investigation of different transistor topologies of single-bit majority voters. That work considers single-fault analysis in a TMR solution. As the investigation focuses on single-bit voters, the single fault occurs in the voter, and the TMR modules are fault-free. Under this assumption, only the 000 and 111 input vectors are considered in the voter robustness analysis. Our work explores the use of single-bit voters in ATMR architectures. In this situation, all possible input vectors have to be considered [5].

We illustrate the adopted methodology and summarize the discussion considering the fourteen single-bit voters presented in [21]. The transistor schematics and possible layouts (depicted as stick diagrams) of these fourteen single-bit voter topologies are illustrated in Fig. 1. The first two voters, illustrated in Fig. 1a and b, are the traditional ones built with basic NAND and NOR gates. The next two are designed following static CMOS logic family rules, and they differ in the placement of the inverters: in the former, depicted in Fig. 1c, the inverter is at the output node; in the latter, shown in Fig. 1d, the inverters are at the inputs. This modification has been explored in [21] and presented good results in a traditional TMR analysis. The next four topologies are derived from [6]. The first two of them, illustrated in Fig. 1e and f, use basic logic gates, and the other two, depicted in Fig. 1g and h, are designed following static CMOS logic family rules. The final six, shown in Fig. 1i–n, are derived from [8]. That work proposes a voter built from logic blocks, and six transistor arrangement variations are derived from the logic specification.

Our analysis starts by computing the sensitive diffusion areas (*SDAs*). We consider as *SDAs* all diffusion areas that are not connected to the power rails. These *SDAs* are the PN junctions that can be reverse-biased and are thus able to collect charge during a single event. This computation is performed at the layout level to improve accuracy. As an example, the CMOS voter, depicted in Fig. 1c, has diffusion areas 2, 7, 9, and 11 connected to the power rails. Hence, its *SDAs* are 1, 3, 4, 5, 6, 8, 10, 12, 13, and 14. The *SDAs* of the CMOS voter are listed at the top of Table 1.
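The SDA step is a simple set difference between all diffusions of the layout and those tied to the rails. The C sketch below (hypothetical helper names) assumes the diffusions are numbered 1 to n as in Fig. 1c; for the CMOS voter, n = 14 and the rail-connected diffusions are 2, 7, 9, and 11.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if diffusion d is tied to VDD or GND. */
static bool tied_to_rail(int d, const int *rails, size_t n_rails)
{
    for (size_t i = 0; i < n_rails; i++)
        if (rails[i] == d)
            return true;
    return false;
}

/* Write the SDAs (diffusions 1..n_diff not tied to a rail) into out,
   in increasing order, and return how many were found. */
static size_t compute_sdas(int n_diff, const int *rails, size_t n_rails, int *out)
{
    size_t k = 0;
    for (int d = 1; d <= n_diff; d++)
        if (!tied_to_rail(d, rails, n_rails))
            out[k++] = d;
    return k;
}
```

With `rails = {2, 7, 9, 11}` and `n_diff = 14`, the function returns the ten SDAs listed above.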

The second step evaluates the reverse-biased *SDAs* for each possible input vector. These nodes are called critical diffusion areas (*CDAs*), since they are the ones that effectively change their logic value when struck by a particle with sufficient energy. Considering the cross section as the sensitive area of the irradiated circuit element, the *CDAs* provide a technology-independent parameter that is easily converted into the voter cross section for a specific technology. With an exhaustive analysis of the *CDAs* for all possible input vectors, the designer has the information needed to design the modules so as to avoid input combinations that make the single-bit voter more susceptible to radiation effects.
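Under the simplifying assumption that the cross section for an input vector $v$ is the summed area $A_d$ of its critical diffusions (ignoring differences in charge collection efficiency between junctions), the conversion mentioned above can be written as

$$\sigma(v) \approx \sum_{d \in CDA(v)} A_d \approx N_{CDA}(v)\,A \quad \text{(if all } A_d = A\text{)}$$

For the CMOS voter (Table 1), this would give $\sigma(010) \approx 4A$ and $\sigma(011) \approx 2A$.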

The *CDAs* are computed through a logic analysis for each possible input vector. To illustrate the procedure, consider the input vector ABC = 010, one of the input vectors with the highest number of *CDAs* in the CMOS voter. The *CDAs* for this vector are 1, 10, 13, and 14. On the other hand, the vector ABC = 011 has only two *CDAs*: 5 and 8. Table 1 presents the *CDAs* of the CMOS voter for each possible input vector. These detailed data can help designers choose the best voter for their ATMR design according to the probabilities of the module outputs.
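If the designer can estimate how often each input vector appears at the voter, the per-vector CDA counts can be condensed into an expected number of exposed diffusions. The C sketch below assumes such a probability vector is available (the function name and the probability values are hypothetical) and uses the per-vector CDA counts of the CMOS voter from Table 1 (2, 2, 4, 2, 3, 4, 3, 2 for ABC = 000 to 111).

```c
#include <stddef.h>

/* Number of CDAs per input vector ABC = 000..111 for the CMOS voter (Table 1). */
static const int cmos_cda_count[8] = {2, 2, 4, 2, 3, 4, 3, 2};

/* Expected number of CDAs, weighting each input vector v by its probability
   p[v]. The probabilities are assumed to sum to 1. */
static double expected_cdas(const int counts[8], const double p[8])
{
    double e = 0.0;
    for (size_t v = 0; v < 8; v++)
        e += p[v] * counts[v];
    return e;
}
```

With equally likely vectors, the expected count is 22/8 = 2.75; if only 000 and 111 occur, as in fault-free TMR, it drops to 2.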





Fig. 1 Voters schematics and respective stick diagrams [20]: a NANDs, b NORs, c CMOS, d CMOS2, e Bala, f Bala2, g Bala CMOS, h Bala CMOS2, i Ban, j Ban2, k Ban3, l Ban4, m Ban5, n Ban6

The $LET_{th}$ analysis is performed through electrical simulation. The voter netlists are described using a 32 nm bulk CMOS technology with a 0.9 V supply voltage. The transistor sizing follows the logical effort approach, and the pass transistors have minimum size (L = 32 nm and W = 64 nm). Four minimum-sized inverters are used as load. The current pulse generated by the particle strike is modeled in Eq. 1 as a double exponential current source, where $Q_{coll}$ is the collected charge (in fC), $\tau_{\alpha}$ is the charge collection time constant, and $\tau_{\beta}$ is the time constant for establishing the ion track. This model was first proposed in [16] and has since been widely used by researchers:

$$I(t) = \frac{Q_{coll}}{\tau_{\alpha} - \tau_{\beta}} \left( e^{-t/\tau_{\alpha}} - e^{-t/\tau_{\beta}} \right) \tag{1}$$

**Table 1** Critical Diffusion Areas (CDAs) of the CMOS single-bit voter for each input vector. Columns are the sensitive diffusion areas (SDAs); X marks a CDA.

| Input vector (ABC) | 1 | 3 | 4 | 5 | 6 | 8 | 10 | 12 | 13 | 14 |
|--------------------|---|---|---|---|---|---|----|----|----|----|
| 000                | X |   |   |   |   |   |    |    | X  |    |
| 001                | X |   |   |   |   |   |    |    | X  |    |
| 010                | X |   |   |   |   |   | X  |    | X  | X  |
| 011                |   |   |   | X |   | X |    |    |    |    |
| 100                | X |   |   |   |   |   |    | X  | X  |    |
| 101                |   | X |   | X | X | X |    |    |    |    |
| 110                |   |   | X | X |   | X |    |    |    |    |
| 111                |   |   |   | X |   | X |    |    |    |    |

Equation 2 defines the collected charge, where the Linear Energy Transfer (LET) is the energy deposited per unit path length, *L* is the depth of the charge collection (in $\mu$m), and the constant 10.8 fC is the charge deposited per 1 $\mu$m of track by a particle with LET = 1 MeV/mg/cm<sup>2</sup> [11].

$$Q_{coll} = 10.8 \times L \times LET \tag{2}$$
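As a worked example of Eq. 2, using the 2 $\mu$m collection depth adopted in this analysis, an $LET_{th}$ of 0.68 MeV/mg/cm<sup>2</sup> (obtained for several voters in Table 2) corresponds to a collected charge of

$$Q_{coll} = 10.8 \times 2 \times 0.68 \approx 14.7\ \text{fC},$$

while 0.37 MeV/mg/cm<sup>2</sup> (Ban4) corresponds to about 8.0 fC. Conversely, $LET_{th} = Q_{crit} / (10.8 \times L)$.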

The minimum current pulse that flips the logic value of the evaluated node is used to compute the $LET_{th}$ according to [16]. In this comparative analysis, it is assumed that the injected charge is entirely collected by the junction electric field. The $LET_{th}$ of each voter is presented in Table 2.
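The search for the minimum flipping charge can be organized as a bisection over $Q_{coll}$, followed by the inversion of Eq. 2. In the C sketch below, `node_flips` is a hypothetical stand-in: in the actual flow, each evaluation would be an electrical simulation with the Eq. 1 source injected at the node, whereas here the node is simply assumed to flip above an illustrative critical charge.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one electrical simulation: the node is assumed
   to flip whenever the injected charge reaches q_crit_fc (illustrative). */
static bool node_flips(double q_fc, double q_crit_fc)
{
    return q_fc >= q_crit_fc;
}

/* Bisection for the minimum collected charge (fC) that flips the node.
   Invariant: the node does not flip at lo and flips at hi. */
static double find_min_flip_charge(double lo, double hi, double q_crit_fc, double tol)
{
    while (hi - lo > tol) {
        double mid = 0.5 * (lo + hi);
        if (node_flips(mid, q_crit_fc))
            hi = mid;
        else
            lo = mid;
    }
    return hi;
}

/* Eq. 2 inverted: LET (MeV cm^2/mg) from charge (fC) and collection depth (um). */
static double let_from_charge(double q_fc, double depth_um)
{
    return q_fc / (10.8 * depth_um);
}
```

With an assumed critical charge of 14.688 fC and L = 2 $\mu$m, the search returns about 14.688 fC, which Eq. 2 maps back to an LET of 0.68 MeV/mg/cm<sup>2</sup>.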

Table 2 summarizes the obtained results. The second column gives the number of transistors in each design. The CMOS, Ban4, and Bala CMOS voters have the fewest transistors among the evaluated circuits. The third column gives the total number of *SDAs*. The Ban4 voter has the same number of transistors as the CMOS voter, but due to the pass transistors in its design, it has four more *SDAs*. The fourth column gives the sum of *CDAs* over all eight possible input vectors. Again, the pass transistors in the Ban4 voter penalize it on this criterion. Finally, the last column gives the *LET*<sub>th</sub>. Because the *LET*<sub>th</sub> depends strongly on transistor sizing, the values in the table are more meaningful when compared with one another than as absolute values. For this analysis, the particle track length was taken as 2 $\mu$m. In all voters, the *LET*<sub>th</sub> is related to the output node. The CMOS and Bala voters present the same *LET*<sub>th</sub> because both follow traditional static CMOS design. The Ban voter uses pass transistors, and this characteristic compromises the obtained *LET*<sub>th</sub>.

**Table 2** Sensitive/critical diffusion areas and LET<sub>th</sub> results

| Voter       | Transistors | SDAs | Sum of CDAs over all vectors | LET<sub>th</sub> (MeV/mg/cm<sup>2</sup>) |
|-------------|-------------|------|------------------------------|------------------------------------------|
| NANDs       | 18          | 14   | 33                           | 0.74                                     |
| NORs        | 18          | 14   | 33                           | 0.69                                     |
| CMOS        | 12          | 10   | 22                           | 0.68                                     |
| CMOS 2      | 16          | 14   | 26                           | 0.68                                     |
| Bala        | 18          | 15   | 28                           | 0.68                                     |
| Bala 2      | 18          | 18   | 31                           | 0.69                                     |
| Bala CMOS   | 14          | 12   | 24                           | 0.68                                     |
| Bala CMOS 2 | 18          | 16   | 28                           | 0.69                                     |
| Ban         | 18          | 18   | 32                           | 0.39                                     |
| Ban2        | 24          | 20   | 34                           | 0.68                                     |
| Ban3        | 24          | 20   | 32                           | 0.68                                     |
| Ban4        | 12          | 14   | 28                           | 0.37                                     |
| Ban5        | 18          | 17   | 32                           | 0.39                                     |
| Ban6        | 18          | 17   | 26                           | 0.39                                     |

Considering the presented data, we conclude that the traditional voter, designed in static CMOS style, presents the same $LET_{th}$ value as the single complex gate version of [6]. However, CMOS presents a slightly smaller sum of *CDAs* when all input vectors are considered. Additionally, the voter presented in [8] was proposed at the logic block level, and the original analysis proved its robustness against a single fault in the internal node when used in TMR architectures. In our investigation, however, the variants of this voter present the worst results among the investigated topologies. This finding shows the importance of revisiting single-bit voter designs when they are used in ATMR architectures.

## 3 Software-Based Approximate Voters Under Heavy Ion Irradiation and EMI Injection

The second case study was carried out to analyze the reliability of software-based coarse-grain approximate voters in system-level applications under real environmental threats (EMI and heavy ions). Unlike the first case study, this experiment considers the situation in which the designer cannot choose to implement a specific hardware voter, such as the ones studied in the first part of this work. Therefore, this study considers a higher abstraction level, with coarse-grain diversity TMR, and provides additional insights to system-level integrators of critical systems.

### 3.1 Case Study System

The simplified block diagram of the case-study circuit is depicted in Fig. 2. It is a redundant Data Acquisition System (DAS) comprising three 8-bit Analog-to-Digital Converters (ADCs) operating in parallel: two Successive Approximation Register (SAR) converters and a $\Sigma\Delta$ converter. Hardware diversity is achieved by using two distinct converter architectures, and temporal diversity is implemented by using different sampling rates for the SAR ADCs: 740 *ksps* and 74 *ksps* (*kilo samples per second*). The system is implemented in a programmable SoC (PSoC 5, from Cypress Semiconductor), manufactured in a 130 nm CMOS technology, which also comprises a 32-bit ARM Cortex-M3 CPU, besides several analog and digital programmable resources.

The system also comprises two software-based approximate voters: a main spatial voter and a temporal voter (SAR ADC voter). The temporal voter also performs a coarse synchronization, needed because of the different sampling frequencies of the ADCs. A fine synchronizer is also used to cope with the different latencies of the diverse ADC modules, as shown in Fig. 3, which also shows additional implementation details. To store the converted and voted data in the PSoC SRAM, 5 DMA channels are used as circular buffers, allowing constant monitoring of all signals during the experiments. The buffer contents are sent to an external computer (through a UART-RS232 interface) whenever a fault is detected by the voters [10]. The system is then reset and reprogrammed, so that the error is corrected. The size of each buffer was defined to store two complete cycles of the test signal (one before and one after the error detection). With this scheme, by post-processing the experimental data, we are able to check all the signals and identify whether an error originated at the ADCs or was due to erroneous voting.
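The DMA channels act as hardware ring buffers. The C sketch below is a software analogue (type and function names are hypothetical) showing how a circular buffer keeps only the most recent samples and how they can be dumped oldest-first when a fault is flagged. The buffer length is illustrative: holding two cycles of the 120 Hz test signal at a sampling rate $f_s$ requires about $2 f_s / 120$ samples, e.g. roughly 1234 samples if a buffer were filled at 74 ksps.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8 /* illustrative; the real buffers hold two test-signal cycles */

typedef struct {
    uint8_t data[RING_SIZE];
    size_t head;  /* index of the next write */
    size_t count; /* number of valid samples (saturates at RING_SIZE) */
} ring_t;

/* Store a sample, overwriting the oldest one once the buffer is full. */
static void ring_push(ring_t *r, uint8_t sample)
{
    r->data[r->head] = sample;
    r->head = (r->head + 1) % RING_SIZE;
    if (r->count < RING_SIZE)
        r->count++;
}

/* Copy the stored samples oldest-first into out (e.g., before sending them
   over the serial link) and return how many were copied. */
static size_t ring_dump(const ring_t *r, uint8_t *out)
{
    size_t start = (r->head + RING_SIZE - r->count) % RING_SIZE;
    for (size_t i = 0; i < r->count; i++)
        out[i] = r->data[(start + i) % RING_SIZE];
    return r->count;
}
```

After ten pushes of the values 0 to 9 into this 8-entry buffer, a dump returns 2 through 9, the eight most recent samples in arrival order.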

A signal generator block (composed of an 8-bit DAC) was programmed into the PSoC, generating a 120 Hz triangular wave that swings between the full-scale limits of the converters (0 to 2 V), to serve as the input test signal of all ADCs.
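A symmetric triangle with an 8-bit DAC can be produced by stepping the code up from 0 to 255 and back down. The C sketch below is a hypothetical illustration, not the PSoC component's implementation; with 510 steps per period, a 120 Hz wave would require about 510 × 120 = 61,200 DAC updates per second, and code $k$ maps to roughly $2k/255$ V for the 0 to 2 V range.

```c
#include <stdint.h>

/* DAC code at update n of a symmetric triangle with 510 steps per period:
   ramps 0 -> 255 over the first half and 255 -> 1 over the second half. */
static uint8_t triangle_code(unsigned n)
{
    unsigned phase = n % 510u;
    return (uint8_t)(phase <= 255u ? phase : 510u - phase);
}
```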



Fig. 2 Diversity TMR data acquisition system architecture



Fig. 3 Details of the full implementation of the DAS in the PSoC device with internal test signal

Fig. 3 shows the overall block diagram of the system implemented in the Device Under Test (DUT).

To cope with hangs in the overall system, an Auxiliary Equipment (AE) programmed on a secondary board is used as a watchdog, monitoring an "alive" signal sent by the DUT. If this signal remains inactive for more than 30 seconds, the AE resets the DUT.
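The watchdog behavior of the AE can be summarized as a timeout on the last observed "alive" pulse. The C sketch below is a hypothetical rendering of that logic with a millisecond timestamp; how the actual AE firmware measures time and drives the reset line is not described here.

```c
#include <stdbool.h>
#include <stdint.h>

#define ALIVE_TIMEOUT_MS 30000u /* 30 s without an "alive" pulse triggers a reset */

typedef struct {
    uint32_t last_alive_ms; /* timestamp of the last "alive" pulse from the DUT */
} watchdog_t;

/* Record an "alive" pulse received at time now_ms. */
static void wd_alive(watchdog_t *w, uint32_t now_ms)
{
    w->last_alive_ms = now_ms;
}

/* Return true when the DUT must be reset; the timer restarts after a reset.
   Unsigned subtraction keeps the comparison valid across timer wraparound. */
static bool wd_check(watchdog_t *w, uint32_t now_ms)
{
    if (now_ms - w->last_alive_ms > ALIVE_TIMEOUT_MS) {
        w->last_alive_ms = now_ms;
        return true;
    }
    return false;
}
```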

Majority voting is challenging when applied to mixed-signal or approximate TMR systems due to the intrinsic differences among the output results of the modules. This is the case for the system under study, since A/D conversion presents intrinsic linearity errors. In the digital domain, majority voting can be implemented by bit-by-bit voting or word voting, each with its own advantages and drawbacks [15, 17]. The first version of the SAR temporal voter was implemented using the bit-by-bit voting technique, which is suitable for voters with many inputs [15]. This specific voter has 9 inputs: within a main voting cycle, the faster SAR ADC generates 10 samples while one is produced by the others. One sample is discarded to obtain an odd number of inputs, avoiding a possible tie.

As illustrated in Fig. 4a, the bit-by-bit voter consists of a counter and a decision element for each bit position, as well as a single word assembler. For each bit position, the number of ones among all words is counted, and this count is passed to the decision element, which votes on that bit position. The word assembler then builds the output word from the voting results. The part of the software implementing this voter is depicted in Fig. 5a.
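The counter, decision element, and word assembler described above can be sketched in C as follows. This is a minimal illustration under the stated parameters (9 inputs, 8-bit words), not the authors' code from Fig. 5a; the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define N_INPUTS 9 /* nine SAR samples per main voting cycle */
#define N_BITS   8 /* 8-bit converter words */

/* Bit-by-bit majority voter: for each bit position, count the ones among all
   words (counter), keep the bit if more than half of the words have it set
   (decision element), and build the output word (word assembler). */
static uint8_t bitwise_vote(const uint8_t words[N_INPUTS])
{
    uint8_t out = 0;
    for (int b = 0; b < N_BITS; b++) {
        int ones = 0;
        for (size_t i = 0; i < N_INPUTS; i++)
            ones += (words[i] >> b) & 1u;
        if (ones > N_INPUTS / 2) /* 5 or more of 9 */
            out |= (uint8_t)(1u << b);
    }
    return out;
}
```

One property worth noting: for the nine samples {1, 1, 1, 2, 2, 2, 3, 3, 3}, this sketch outputs 3, whereas the numerical median is 2, because each bit position is voted independently.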

The main voter of this case-study system is based on the word voting principle [17]. It performs mutual subtractions between the signals, generating three error signals that allow the correct signal to be selected as the current system output. A tolerance window is applied to the error signals to cope with the approximate results of each module. In this experiment, the tolerance window is 5 (decimal), which represents nearly 2% of the full scale of an 8-bit converter (5/255 ≈ 1.96%) and is able to accommodate the non-idealities and divergences among the module outcomes. The architecture of the main voter is depicted in Fig. 4b, and the part of the code that implements this voter is shown in Fig. 5b.



Fig. 4 Architecture of the studied voters: a main word voter; b bit-by-bit temporal voter (first version) and c second version of the temporal voter with a cascade of word voters identical to (a)

The second version of the SAR temporal voter was built with a cascade of word voters (V1 to V4), identical to the main voter, as depicted in Fig. 4c.
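The text does not detail how the nine samples are distributed among V1 to V4. One natural arrangement, assumed here, is for V1, V2, and V3 to vote three groups of three samples each and for V4 to vote their outputs. The C sketch below follows that hypothetical arrangement and reuses the selection logic of the main voter (tolerance of 5), omitting the error flag.

```c
#include <stdlib.h>

#define TOL 5 /* tolerance window, as in the main voter */

/* Three-input word voter with a tolerance window, following the selection
   order of the main voter: a if |a-b| fits, else b if |b-c| fits,
   else c if |c-a| fits, else a by default. */
static int word_vote3(int a, int b, int c)
{
    if (abs(a - b) <= TOL)
        return a;
    if (abs(b - c) <= TOL)
        return b;
    if (abs(c - a) <= TOL)
        return c;
    return a;
}

/* Hypothetical cascade: V1-V3 vote consecutive groups of three samples,
   and V4 votes their three outputs. */
static int cascade_vote9(const int s[9])
{
    int v1 = word_vote3(s[0], s[1], s[2]);
    int v2 = word_vote3(s[3], s[4], s[5]);
    int v3 = word_vote3(s[6], s[7], s[8]);
    return word_vote3(v1, v2, v3);
}
```

In this arrangement, a single corrupted sample (such as 250 among values near 100) is absorbed by its group voter before reaching V4.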

### 3.2 Heavy Ion Irradiation Setup

Two irradiation experiments were performed on the redundant DAS. The Device Under Test (DUT) was programmed with the system described above, with a different temporal voter version in each experiment. The irradiation was performed at the Laboratório Aberto de Física Nuclear of the Universidade de São Paulo (LAFN-USP), Brazil [1], with ion beams produced and accelerated by the São Paulo 8UD Pelletron Accelerator, generating a $^{16}O$ ion beam with a 22 $\mu$m penetration depth in silicon. The heavy-ion beam was accelerated up to 36 MeV, and its intensity was reduced to hundreds of particles/s/cm<sup>2</sup> (as recommended by the European Space Agency (ESA) for SEU tests [12]) using magnetic defocusing techniques and two thin gold foils (about 1 mg/cm<sup>2</sup>) in the SAFIIRA system [2]. This new system was built to study heavy-ion beam effects in electronic components. SAFIIRA is a Portuguese acronym that stands for Ion-beam Application and Irradiation System.

The DUT (with the top of the package removed) was irradiated at a $0^{\circ}$ angle, producing an effective LET of 5.5 MeV/mg/cm<sup>2</sup> at the active region. The effective LET was estimated with the SRIM software [25], considering the beam energy and the device passivation layer. The average fluxes of the two experiments were 350 and 430 particles/s/cm<sup>2</sup>, for durations of 246 and 288 min, respectively. The total fluence to which each voter was submitted is shown in Table 3, which also summarizes the results.
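As a rough cross-check, with a constant average flux the fluence is the product of flux and exposure time. The C sketch below (hypothetical function name) applies this to the average values quoted above, giving about 350 × 246 × 60 ≈ 5.17 × 10<sup>6</sup> and 430 × 288 × 60 ≈ 7.43 × 10<sup>6</sup> particles/cm<sup>2</sup>. These are estimates from averages; the fluence values actually used for each voter are those reported in Table 3.

```c
/* Fluence (particles/cm^2) from a constant average flux (particles/s/cm^2)
   over an exposure time given in minutes: fluence = flux x time. */
static double fluence_from_avg_flux(double flux_per_s_cm2, double minutes)
{
    return flux_per_s_cm2 * minutes * 60.0;
}
```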

The overall experimental setup is described in Fig. 6, along with a picture of the DUT inside the vacuum chamber. The auxiliary equipment acts as a watchdog, monitoring the DUT activity and resetting it if a hang is identified. The control and remote computers are used for control and data storage. A triangular signal ranging between the full-scale limits of the converter was generated by an internal DAC (Digital-to-Analog Converter) of the PSoC and applied to the redundant DAS input.

### 3.3 Results of Heavy Ion Experiments

In this analysis, the errors originating in the voters are counted and compared across the different voter architectures. While the main word voter showed no errors in either experiment (despite experiencing the higher total fluence), 4 errors were recorded in the temporal voter built with 4 word voters (Fig. 4c), and 18 errors were observed in the bit-by-bit version (Fig. 4a), as shown in Table 3. The uncertainties were calculated assuming a Poisson distribution of the observed errors. Table 3 also shows the calculated dynamic cross section of each voter (number of errors divided by fluence), the fluence (integral of flux over time), the experiment time, and the time each voter takes to perform the voting task.

Comparing both versions of the same (temporal) voter, the bit-by-bit version is less reliable (higher cross section). This is attributed to its longer voting time and higher complexity, since it uses several loops and iterations as well as more memory resources, as can be seen in Fig. 5. Fig. 7 shows an example of an error observed in the SAR ADC temporal voter (bit-by-bit version) even though no error occurred in the 9 voted samples originating from the SAR @740 ksps module.

Besides the voting architecture, coding efficiency may also impact voter reliability. Taking the bit-by-bit voter of this investigation as an example, if it were implemented with code that reduces the number of CPU and memory operations (while keeping the same functional behavior), the voting time, and consequently the voter dynamic cross section, would also be expected to decrease.
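As one illustration of such a recoding, the C sketch below computes the same nine-input bit-by-bit majority with a bit-sliced ("vertical") counter: four 8-bit variables hold the per-bit counts for all bit positions at once, so the loop over the 8 bit positions disappears. This is a swapped-in technique offered as a hypothetical example; it was not evaluated in the experiments, and any cross-section benefit would have to be confirmed by testing.

```c
#include <stdint.h>

#define N_INPUTS 9

/* Bit-by-bit majority of nine 8-bit words using a bit-sliced counter:
   bit b of c0..c3 forms the 4-bit count of ones seen so far at position b. */
static uint8_t bitsliced_vote9(const uint8_t w[N_INPUTS])
{
    uint8_t c0 = 0, c1 = 0, c2 = 0, c3 = 0;
    for (int i = 0; i < N_INPUTS; i++) {
        uint8_t carry = w[i];
        uint8_t t;
        /* Ripple-add the 1-bit value in every lane to the 4-bit counters */
        t = c0 & carry; c0 ^= carry; carry = t;
        t = c1 & carry; c1 ^= carry; carry = t;
        t = c2 & carry; c2 ^= carry; carry = t;
        c3 ^= carry; /* counts never exceed 9, so 4 bits suffice */
    }
    /* For counts 0..9, count >= 5 is equivalent to c3 | (c2 & (c1 | c0)) */
    return (uint8_t)(c3 | (c2 & (c1 | c0)));
}
```

This replaces the 9 × 8 shift-and-mask steps of a straightforward per-bit count with 9 iterations of a few whole-word operations, while producing the same output for every input.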

Finally, it is important to mention that no simultaneous errors were observed in multiple modules of the DTMR, nor in both voters at the same time. Therefore, as no error occurred at the main system voter, 100% of the observed errors, including those occurring in the temporal voter, were tolerated by the proposed system.

**(b)**

```c
#include <stdlib.h> /* abs() */

/* Assumed declarations (not shown in the figure); types are illustrative. */
int SAR_ADC_voter, Module_2[2], Module_3[2];
int system_output, main_voter_error_det;

void main_voter(void)
{
    /* Pairwise distances between the three redundant words */
    int error1 = abs(SAR_ADC_voter - Module_2[1]);
    int error2 = abs(Module_2[1] - Module_3[1]);
    int error3 = abs(Module_3[1] - SAR_ADC_voter);

    /* Two words within 5 codes of each other are treated as agreeing */
    if (error1 <= 5)
        system_output = SAR_ADC_voter;
    else if (error2 <= 5)
        system_output = Module_2[1];
    else if (error3 <= 5)
        system_output = Module_3[1];
    else
        system_output = SAR_ADC_voter; /* no pair agrees: keep first input */

    if ((error1 > 5) || (error2 > 5) || (error3 > 5))
        main_voter_error_det = 1;  /* flag any disagreement */
}
```

Fig. 5 Software code of the bit-by-bit temporal voter for the oversampled SAR ADC (a) and code of the main word voter (b)

## 3.4 Testing Temporal Voters Under Conducted EMI Injection

The redundant scheme was also tested under electromagnetic interference (EMI), considering the bit-by-bit voter and the cascade of word voters as the temporal voting element. For this purpose, the SAR converter operating @740 ksps was the target of the EMI injection campaign, in order to evaluate the voters' ability to cope with multiple erroneous signal samples coming from this converter. The test was based on the IEC 62132-4 standard [14], which describes a method to measure the immunity of integrated circuits to conducted RF disturbances (which may originate from radiated RF disturbances).

Fig. 6 Experimental setup block diagram (a) and picture of the DUT inside the vacuum chamber of the Pelletron accelerator (b)

The Direct Power Injection (DPI) experiment was performed using an Agilent N9310A RF signal generator. The test signal consists of an AM-modulated signal (80% modulation index) with a 1 kHz envelope signal and a carrier initially swept from 1 MHz to 1 GHz. In this first phase, a higher vulnerability of the system was identified in the 1 MHz to 10 MHz band; for this reason, the actual experiment, in which the voters were compared, was restricted to this frequency band. The preliminary tests also indicated that the system port most vulnerable to RF disturbances is the voltage reference ($V_{REF}$) pin of the ADCs, so the injection was also focused on this pin. No bypass capacitor was connected to this pin during the experiment. The signal power was set to 17 dBm, as recommended by the IEC 62132-4 standard for microprocessors and memory ICs [14]. Figure 8 shows the DPI test setup.

Table 3 Experiment details and results for each tested voter

|                                              | Bit-by-bit voter      | Cascade of word voters | Main voter             |
|----------------------------------------------|-----------------------|------------------------|------------------------|
| Errors                                       | $18 \pm 4$            | $4 \pm 2$              | 0                      |
| Voting time                                  | 95 µs                 | 11.75 µs               | 3.75 µs                |
| Cross section ($\times 10^{-6}$ cm$^2$)      | $5.3 \pm 1.2$         | $0.70 \pm 0.31$        |                        |
| Fluence (particles/cm$^2$)                   | $5.08 \times 10^{6}$  | $7.32 \times 10^{6}$   | $12.40 \times 10^{6}$  |
| Irradiation time                             | 246 min               | 288 min                | 534 min                |



Fig. 7 Error at the temporal voter causing a semi-permanent malfunction



Fig. 8 DPI test setup



Fig. 9 Code histogram of SAR @740 ksps converter

Table 4 Standard deviation of code values (decimal) at the output of the SAR ADC and voters

| Block               | Code std. dev. |
|---------------------|----------------|
| SAR @740 ksps       | 31.4           |
| Bit-by-bit voter    | 37.1           |
| Cascaded word voter | 18.2           |

The total experiment time was 60 minutes, with the carrier frequency changed every 6 minutes in steps of 1 MHz. A fixed DC signal of 1.024 V (mid-scale), which should produce the code 128 in decimal representation, was applied to the analog input of the system. The output of the SAR @740 ksps converter, as well as the voter output, was analyzed for both architectures of the temporal voter.

## 3.5 Results of EMI Injection

The interference injected at the $V_{REF}$ pin of the ADC causes the converted words to deviate from the expected value. Figure 9 shows the histogram of code occurrences at the SAR converter output, considering the 9 samples generated at each main voting cycle of the system. Ideally, with no EMI injected, all samples would be expected to fall within the mid-scale bin (128). Figure 9 shows a significant spread of codes around the expected one when the EMI signal is injected.
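The histograms in Figs. 9 to 11 pool the codes of every voting cycle. A minimal sketch of that bookkeeping, assuming 8-bit codes (0 to 255) and nine samples per main voting cycle; the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define N_CODES 256  /* assumed 8-bit converter: codes 0..255 */

/* Accumulate the nine codes of one main voting cycle into the histogram. */
void histogram_add_cycle(unsigned long hist[N_CODES], const unsigned char samples[9])
{
    assert(hist != NULL && samples != NULL);
    for (size_t i = 0; i < 9; ++i)
        hist[samples[i]]++;
}

/* Fraction of all recorded samples that fell in a given bin (e.g. 128). */
double fraction_in_bin(const unsigned long hist[N_CODES], unsigned bin)
{
    assert(hist != NULL && bin < N_CODES);
    unsigned long total = 0;
    for (size_t i = 0; i < N_CODES; ++i)
        total += hist[i];
    return total ? (double)hist[bin] / (double)total : 0.0;
}
```

In the ideal, interference-free case `fraction_in_bin(hist, 128)` would be 1; the spreading seen in Fig. 9 corresponds to this fraction dropping as counts move to neighboring bins.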

Fig. 10 Code histogram after voting with the bit-by-bit architecture as temporal voter

Fig. 11 Code histogram after voting with the cascade of word voters architecture

Figure 10 shows that the bit-by-bit voter worsens this code spreading. The reason is the ability of EMI to modify multiple samples among the nine voted by this temporal voter, which can cause the majority of the bits in a given position to be voted to an erroneous value. Although such events are more likely to affect the least significant bits (due to the intensity of the disturbing signal), the overall deviation may be larger when multiple bits are affected. A code spreading is also observed when the cascade of word voters is used as temporal voter, as can be seen in Fig. 11. However, the deviation from the expected code is smaller than that observed for the bit-by-bit voter. The reason is probably the temporal correlation of the samples: since they are voted in blocks of three time-adjacent samples, the probability of double errors in each individual 3-to-1 word voter is reduced.
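The cascade of word voters can be sketched as three first-stage 3-to-1 voters over time-adjacent blocks followed by a final 3-to-1 voter. This is a hypothetical reconstruction, not the authors' code: it assumes that each stage uses the same approximate decision rule as the main voter of Fig. 5b, with the 5-code tolerance of that voter (here named `TOL`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOL 5  /* assumed tolerance in ADC codes, borrowed from the main voter */

/* Approximate 3-to-1 word voter with the main voter's decision order:
 * a if it agrees with b, else b if it agrees with c, else c if it agrees
 * with a, else fall back to a. */
static int word_vote3(int a, int b, int c)
{
    if (abs(a - b) <= TOL) return a;
    if (abs(b - c) <= TOL) return b;
    if (abs(c - a) <= TOL) return c;
    return a;
}

/* Cascade of four word voters: three first-stage voters over blocks of
 * three time-adjacent samples, then a final voter over their outputs. */
int cascade_vote9(const int s[9])
{
    assert(s != NULL);
    int v0 = word_vote3(s[0], s[1], s[2]);
    int v1 = word_vote3(s[3], s[4], s[5]);
    int v2 = word_vote3(s[6], s[7], s[8]);
    return word_vote3(v0, v1, v2);
}
```

With one corrupted sample in each block, for example {128, 200, 128, 128, 128, 50, 255, 128, 128}, every first-stage voter still outputs 128, because a sample deviating by more than the tolerance is masked as long as the other two samples of its block agree.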

Table 4 shows the standard deviation of the code values for each considered block, confirming the trend observed in Figs. 9 to 11. While the bit-by-bit voter increases the standard deviation of the voted signal (from 31.4 to 37.1), the cascaded word voters reduce it from 31.4 (12% of the full scale) to 18.5 (7% of the full scale).

## 4 Conclusion

This work presents relevant aspects of the analysis of voters applied to approximate computing. The single-bit voter analysis highlights the importance of evaluating all possible input vectors, since the obtained behavior does not follow the traditional TMR approach. Fourteen different topologies were evaluated in terms of sensitive area and threshold LET in a 32 nm technology. This analysis helps designers implement ATMR systems at the logic level with single-bit hardware voters. Compared to previous works exploring traditional TMR circuits, the choice of voter differs when the peculiarities of the ATMR architecture are considered. The significance of this revision can be seen in the 55% difference in sensitive area between the best and worst choices and in the twofold difference in threshold LET.

Additionally, in a second case study, the reliability of the voters of a fault-tolerant data acquisition system programmed into a mixed-signal SoC was investigated. Heavy-ion irradiation campaigns have shown that different approximate software-based voting schemes present distinct reliability, with voting time and algorithm complexity being responsible for this behavior. The voting time depends on the number of inputs and the algorithm complexity, and it impacts voter reliability: the longer this time, the higher the probability that the voting task is affected by an ion impact. The approximate software-based word voter tested in the heavy-ion campaign proved more reliable than its bit-by-bit counterpart, with a significant difference in the obtained cross sections.

When considering the application of the investigated coarse-grained voters in an environment susceptible to electromagnetic interference, the word voter is also more reliable. An EMI experiment was performed, injecting interference into the $V_{REF}$ pin of the ADC and affecting multiple converted words. Although the EMI does not directly affect the voter functioning, it causes multiple inputs to present wrong values, which may lead the voter to a wrong decision. Our experiments have shown that bit-by-bit voting may gather bit errors from different input words into the same word assembled by the voter, worsening the error in the final voted value. Therefore, the bit-by-bit architecture must be avoided, or considered carefully, in EMI-prone applications. On the other hand, the 9-input temporal word voter is able to reduce the error magnitude of the converted signal.

Acknowledgements This research is financed in part by Fundação de Amparo a Pesquisa do Rio Grande do Sul (FAPERGS, Brazil), by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, Brazil), and by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES, Brazil), Finance Code 001.

**Data Availability** The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

#### Declarations

**Competing Interests** The authors have no competing interests to declare that are relevant to the content of this article.

# References

1. Aguiar V, Added N, Medina N, Macchione E, Tabacniks M, Aguirre F, Silveira M, Santos R, Seixas L (2014) Experimental setup for single event effects at the São Paulo 8UD Pelletron accelerator. Nucl Instrum Methods Phys Res, Sect B 332:397–400
2. Aguiar VAP, Medina NH, Added N, Macchione ELA, Alberton SG, Leite AR, Aguirre FR, Ribas RV, Perego CC, Fagundes LM, Terassi JC, Brage JAP, Simões RF, Morais OB, Almeida EA, Joaquim PM, Souza MS, Cecotte AFM, Martins R, Duarte JG, Scarduelli VB, Allegro PRP, Escudeiro R, Leistenschneider E, Oliveira RAN, Servelo WA, Silva MT, Sarmento VE, Carreira CA, Abreu JC, Silva SC, Santos HC, Rodrigues CL, Assis RF, Silva TF, Tabacniks MH, Joaquim AS, Minas JHP, Kashinsky D, Guazzelli MA, Seixas LE, Finco S, Benevenutti F (2020) SAFIIRA: A heavy-ion multi-purpose irradiation facility in Brazil. Rev Sci Instrum 91:053301
3. Aguiar Y, Wrobel F, Autran J-L, Leroux P, Saigné F, Pouget V, Touboul A (2020) Design exploration of majority voter architectures based on the signal probability for TMR strategy optimization in space applications. Microelectron Reliab 114:113877
4. Amaru L, Gaillardon P-E, Micheli GD (2016) Majority-inverter graph: A new paradigm for logic optimization. IEEE Trans Comput Aided Des Integr Circuits Syst 35:806–819
5. Arifeen T, Hassan A, Lee J-A (2019) A fault tolerant voter for approximate triple modular redundancy. Electronics 8:332
6. Balasubramanian P, Prasad K (2016) A fault tolerance improved majority voter for TMR system architectures
7. Balen TR, Gonzalez CJ, Oliveira IFV, Schvittz RB, Added N, Macchione ELA, Aguiar VAP, Guazzelli MA, Medina NH, Butzen PF (2021) Reliability evaluation of voters for fault tolerant approximate systems. In Proc 2021 IEEE 22nd Latin American Test Symposium (LATS). IEEE
8. Ban T, de Barros Naviner LA (2010) A simple fault-tolerant digital voter circuit in TMR nanoarchitectures. In Proc 8th IEEE International NEWCAS Conference 2010. IEEE
9. Benites LAC, Kastensmidt FL (2018) Automated design flow for applying triple modular redundancy (TMR) in complex digital circuits. In Proc 2018 IEEE 19th Latin-American Test Symposium (LATS). IEEE
10. Chenet CP, Tambara LA, de Borges GM, Kastensmidt F, Lubaszewski MS, Balen TR (2015) Exploring design diversity redundancy to improve resilience in mixed-signal systems. Microelectron Reliab 55:2833–2844
11. de Aguiar Y, Zimpeck A, Meinhardt C, Reis R (2016) Permanent and single event transient faults reliability evaluation EDA tool. Microelectron Reliab 64:63–67. Proceedings of the 27th European Symposium on Reliability of Electron Devices, Failure Physics and Analysis
12. European Space Agency (2014) Single Event Effects Test Method and Guidelines, Document No. 25100, ESA/ESCC
13. Gonzalez CJ, Chenet CP, Budelon M, Vaz RG, Goncalez O, Balen TR (2017) Evaluation of a mixed-signal design diversity system under radiation effects. In Proc 2017 18th IEEE Latin American Test Symposium (LATS). IEEE
14. International Electrotechnical Commission (2006) IEC 62132-4:2006, Integrated circuits - Measurement of electromagnetic immunity, 150 kHz to 1 GHz - Part 4: Direct RF power injection method
15. Lorczak P, Caglayan A, Eckhardt D (1989) A theoretical investigation of generalized voters for redundant systems. In Proc The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers. IEEE Comput Soc Press
16. Messenger GC (1982) Collection of charge on junction nodes from ion tracks. IEEE Trans Nucl Sci 29:2024–2031
17. Mitra S, McCluskey E (2000) Word-voter: a new voter design for triple modular redundant systems. In Proc 18th IEEE VLSI Test Symposium. IEEE Comput Soc
18. Mitra S, Saxena N, McCluskey E (1999) A design diversity metric and reliability analysis for redundant systems. In Proc International Test Conference (IEEE Cat No.99CH37034). Int Test Conference
19. Nicolaidis M (ed) (2011) Soft Errors in Modern Electronic Systems. Springer, US
20. Oliveira I, Schvittz R, Butzen P (2019) Single event transient sensitivity analysis of different 32 nm CMOS majority voters designs. Microelectron Reliab 100–101:113369
21. Oliveira IFV, Pontes MF, Schvittz RB, Rosa LS, Butzen PF, Soares RI (2022) Fault tolerance evaluation of different majority voter designs. In Proc 2022 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE
22. Rahman MH (2017) A fault tolerant voter circuit for triple modular redundant system. J Electron Electr Eng 5(5):156
23. Rodrigues G, Fonseca J, Kastensmidt F, Pouget V, Bosio A, Hamdioui S (2019) Approximate TMR based on successive approximation and loop perforation in microprocessors. Microelectron Reliab 100–101:113385
24. Wang F, Agrawal VD (2008) Single event upset: An embedded tutorial. In Proc 21st International Conference on VLSI Design (VLSID 2008). IEEE
25. Ziegler JF, Ziegler M, Biersack J (2010) SRIM - The stopping and range of ions in matter (2010). Nucl Instrum Methods Phys Res, Sect B 268(11):1818–1823. 19th International Conference on Ion Beam Analysis

**Publisher's Note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

**Tiago R. Balen** received his Electrical Engineering degree, MSc and PhD degrees from Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil, in 2004, 2006, and 2010, respectively. His research interests include analog and mixed-signal test, built-in-self test, programmable analog devices, fault tolerant circuits and radiation effects on electronic systems. He has published more than 70 papers on these topics in important conferences and journals, receiving three "Best Paper Awards" for his contributions. Currently, he is associate professor at the Electrical Engineering department and head of the graduate program on Microelectronics (PGMICRO) of the Federal University of Rio Grande do Sul (UFRGS).

**Carlos J. González** is a Ph.D. student at the graduate program on Microelectronics (PGMICRO) of the Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil. He received his Electronic Engineering degree from Corporación Unificada Nacional de Educación Superior (CUN), Bogotá, Colombia, in 2011, and the MSc degree in Microelectronics from the Federal University of Rio Grande do Sul (UFRGS) in 2018. His research interests include analog and mixed-signal test, programmable analog devices, fault tolerant mitigation and radiation effects on electronic devices.

**Ingrid F.V. Oliveira** is a Ph.D. candidate in the Computer Science Graduate Program at the Federal University of Pelotas (UFPEL). She received her bachelor's and master's degrees in Computer Engineering from the Federal University of Rio Grande (FURG) in 2017 and 2020, respectively. She currently works in the fault tolerance area, focusing on evaluating the robustness of combinational circuits in the presence of SET-type radiation faults.

Leomar S. da Rosa Jr holds a Ph.D. in Microelectronics from the Federal University of Rio Grande do Sul (2008), with a sandwich period at the University of Minnesota, USA (2005-2006), an M.Sc. in Computer Science from the Federal University of Rio Grande do Sul (2004), and a BS in Computer Science from the Federal University of Pelotas (2001). He is currently an Associate Professor at the Federal University of Pelotas and a supervisor in the Graduate Computer Science Program at the same university. He leads the Research Group on Architectures and Integrated Circuits (GACI) and is a member of the Brazilian Society of Microelectronics (SBMicro), the Brazilian Computer Society (SBC), the Association for Computing Machinery (ACM), and the IEEE Computer Society. He has experience in computing and microelectronics, working mainly on logic synthesis and CAD tools for integrated circuits.

**Rafael I. Soares** is a Professor at the Federal University of Pelotas (UFPel). He completed his Ph.D. in Computer Science at the Pontifical Catholic University of Rio Grande do Sul (PUCRS) in 2010, with a sandwich period at the Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), France, from 2007 to 2008. He holds a master's degree in Computer Science from PUCRS and a degree in Computer Engineering from Universidade Federal do Rio Grande (FURG) in 2004. He has experience in Computer Science, with emphasis on hardware, working mainly on digital systems, FPGAs, rapid prototyping of digital systems, dynamic reconfiguration, reconfigurable architectures, non-synchronous circuit design, cryptography, Side-Channel Attacks (SCAs), and countermeasures to SCAs.

**Rafael B. Schvittz** is a Professor at the Center for Computational Sciences at the Federal University of Rio Grande (FURG). He holds a BS and an MS in Computer Engineering from the Federal University of Rio Grande (2014 and 2017, respectively) and a Ph.D. in Computer Science from the Federal University of Pelotas (2020). He works mainly on circuit evaluation at the logic gate level, with experience in power consumption evaluation (static and dynamic), aging effects (BTI), and failure types in logic gates (permanent and transient). He currently works on circuit reliability, focusing on evaluating the susceptibility of logic gates to SET-type radiation faults.

**Nemitala Added** holds a Bachelor's degree (1981), a Master's degree (1987), and a PhD (1991) in Physics from the University of São Paulo. He is currently a professor at the University of São Paulo. He has experience in Physics, with emphasis on Applied Nuclear Physics, working mainly on instrumentation, radiation effects, fusion, and elastic scattering.

Eduardo L. A. Macchione holds a degree (1984), a Master's degree (1990), and a PhD (1998) in Physics from the University of São Paulo. He is currently a physicist at the University of São Paulo and has experience in Experimental Physics, with emphasis on Nuclear Instrumentation and time-of-flight mass spectrometry techniques.

Vitor A. P. Aguiar holds a PhD degree in Physics from the University of São Paulo, Brazil. He has experience with high-resolution gamma spectrometry applied to studies of natural radiation, with emphasis on dosimetry and geochronology. He also works with radiation effects on materials using heavy ion beams, focusing on the development of systems and instrumentation for these studies, phenomenology and semiconductor materials. He is currently a postdoctoral fellow at the University of São Paulo, where he works on the phenomenology of charge collection in semiconductors under heavy ion incidence.

Marcilei A. Guazzelli received the B.S., M.S. and Ph.D. degrees in Physics from São Paulo University, São Paulo, Brazil, in 1994, 1999, and 2004, respectively. Since 2017 she has been a Full Professor in the Department of Physics at the Centro Universitário da FEI, São Bernardo do Campo, Brazil. She has coordinated projects related to Radiation Physics and conducts research in Basic Nuclear Physics, Solid State Physics, Applied Nuclear Physics and Materials Characterization.

Nilberto H. Medina received the B.S., M.S. and Ph.D. degrees in Physics from São Paulo University, São Paulo, Brazil, in 1984, 1988, and 1992, respectively. He held a postdoctoral position at the "Istituto Nazionale di Fisica Nucleare" (INFN), Sezione di Padova, Italy, from 1993 to 1995. Since 2011 he has been an Associate Professor at the Physics Institute of São Paulo University, Brazil. His research interests include gamma-ray spectroscopy, nuclear structure, high-spin states, nuclear reactions, natural radioactivity, stopping power, radiation effects on electronic devices, and nuclear instrumentation.

**Paulo F. Butzen** is a Professor in the Department of Electrical Engineering at the Federal University of Rio Grande do Sul (UFRGS). He holds a bachelor's degree in Computer Engineering (2004), a master's degree in Computer Science (2007), and a Ph.D. in Microelectronics (2012) from UFRGS. He was a postdoctoral fellow at École Nationale Supérieure des Télécommunications (Télécom ParisTech), Paris, France (2014-2015), and a visiting researcher at the University of Minnesota from April to December 2006. He worked at Nangate do Brasil from March 2007 to February 2008, leading the cell library project group, and was a Professor at the Center for Computational Sciences of the Federal University of Rio Grande (FURG) from October 2010 to July 2019, where he was the Digital and Embedded Systems Group Coordinator from 2012 to 2019. His areas of interest are the design of digital integrated circuits and the development of CAD tools for microelectronics.