Coffee Break - Scientific Posters 1
Description
Affiliation: Delft University of Technology (NL)
Affiliations: 1 STMicroelectronics (FR), 2 University Grenoble Alpes, CNRS (FR)
As digital systems continue to grow more complex, new methods are required to ensure their resilience. Research and industry are working together to develop automated formal methods, and great progress has recently been made in addressing this challenge. This work describes a general procedure to quantitatively determine, by Model Checking, the resilience level of a digital block whose flip-flops are perturbed by bit-flips. The flow relies on formal proof and provides a rich variety of results with much improved performance and accuracy (boost of ∼ 300x and ∼ 30x in the two test cases). The resilience metric is the number of distinct counterexamples provided by the formal engine for each fault target. Failure traces are differentiated in two ways, and the test cases show a substantial improvement over simulation.
Analyzing the Structural and Operational Impact of Faults in Floating-Point and Posit Arithmetic Cores for CNN Operations
Affiliation: Politecnico di Torino (IT)
This work reports a first attempt to evaluate the fine-grain impact of permanent faults in the structures of arithmetic hardware cores implementing two number formats (Posit and FP). We assess and analyze errors in the cores for two operations (Add and Multiply), which are the most used in several modern applications, including machine learning. The results show that Posit cores are structurally more vulnerable to fault propagation and induce more output corruptions than FP cores (from 3.3% up to 6.2%). Moreover, we found that the average absolute error in faulty FP cores is up to 2 orders of magnitude higher than in Posit ones.
Affiliations: 1 Jerusalem College of Technology (IL), 2 Bar-Ilan University (IL)
Bus encoding is a technique for decreasing the power consumption of a chip by reducing the number of bit transitions during data transmission over a bus or during memory write operations. This paper introduces a structured technique for hardening bus-encoders to enable single error correction (SEC) while maintaining power awareness. The method is based on expurgating the Hamming code in a specific manner.
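As a rough illustration of what reducing bit transitions means, the following minimal sketch implements classic bus-invert coding, a well-known bus encoding scheme. It is chosen here only as an example and is not the hardened SEC encoder described in the abstract. The function names, the 8-bit bus width, and the all-zero initial bus state are assumptions made for this sketch.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=8):
    """Encode words with bus-invert coding.

    Returns a list of (sent_word, invert_bit) pairs. A word is sent
    inverted whenever sending it as-is would toggle more than half of
    the data lines relative to the previous bus value.
    """
    mask = (1 << width) - 1
    prev = 0  # assumed initial bus state: all lines low
    encoded = []
    for w in words:
        w &= mask
        if hamming_distance(prev, w) > width // 2:
            sent, inv = (~w) & mask, 1
        else:
            sent, inv = w, 0
        encoded.append((sent, inv))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (sent_word, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [((~sent) & mask) if inv else sent for sent, inv in encoded]


def count_transitions(values, start=0):
    """Total number of line toggles when values are driven in sequence."""
    total, prev = 0, start
    for v in values:
        total += hamming_distance(prev, v)
        prev = v
    return total


words = [0x00, 0xFF, 0x00, 0xFF]
encoded = bus_invert_encode(words)
# Treat the invert line as one extra bus line (bit 8) when counting toggles.
lines_with_invert = [sent | (inv << 8) for sent, inv in encoded]
raw_toggles = count_transitions(words)
encoded_toggles = count_transitions(lines_with_invert)
```

For this worst-case alternating pattern, the unencoded bus toggles 24 times while the encoded bus, including the extra invert line, toggles 3 times.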
Affiliations: 1 Siemens Digital Industries Software (US), 2 Karlsruhe Institute of Technology (DE), 3 Siemens Digital Industries Software (DE)
Magnetoresistive random access memory (MRAM) is an attractive option to replace eFlash. The recent demonstration of nanosecond write speed and 10e14 endurance makes it compelling even as an embedded MRAM for cache replacement. Both eFlash and cache applications often use large array sizes, which require tight defect control. Defects unique to MRAM that are not easily detectable with traditional memory test algorithms can cause test escapes. Test escapes not only delay the manufacturing process but also cause reliability issues, which can be fatal in safety-critical applications such as automotive. This paper presents effective ways of screening hard-to-find defects related to oxide surface quality. Devices with minor oxide degradation fall in a grey zone: some of their properties are out of specification even though they pass the functional test. We introduce a new test method that screens these out-of-spec cells using read reference trimming.
Affiliations: 1 National Central University (TW), 2 National Cheng Kung University (TW)
As semiconductor processes advance, circuit aging becomes prominent. One of the most severe aging effects is Negative Bias Temperature Instability (NBTI), which increases the threshold voltage and the propagation delay of PMOS transistors. To mitigate NBTI, aging mitigation methods such as Internal Node Control (INC) and Input Vector Control (IVC) have been proposed. INC applies designed logic gates, while IVC applies appropriate input patterns while the circuit is idle. However, INC incurs extra area overhead and power consumption, and the circuit structure limits the controllability of IVC. Although various aging tolerance methods based on INC or IVC have been proposed, only a few of them consider co-optimization. In this paper, we introduce a GNN-based INC and IVC co-optimization framework to minimize aging-induced delay. The key concept of our framework is to use a GNN to identify severely aged gates in a circuit, and then to use INC and IVC to mitigate the aging effect under an area overhead constraint. The experimental results indicate that our method reduces aging-induced delay by 2.16 times and area by 29.5% compared to previous work.
Affiliations: 1 Bosch Corporate Research, Robert Bosch GmbH (DE), 2 Forschungszentrum Jülich GmbH (DE), 3 Newcastle University (UK), 4 RPTU Kaiserslautern-Landau (DE)
In-Memory Computing (IMC) introduces a new paradigm of computation that offers high efficiency in terms of latency and power consumption for AI accelerators. However, the non-idealities and defects of emerging technologies used in advanced IMC can severely degrade the accuracy of inferred Neural Networks (NN) and lead to malfunctions in safety-critical applications. In this paper, we investigate an architectural-level mitigation technique based on the coordinated action of multiple checksum codes, to detect and correct errors at run-time. This implementation demonstrates higher efficiency in recovering accuracy across different AI algorithms and technologies compared to more traditional methods such as Triple Modular Redundancy (TMR). The results show that several configurations of our implementation recover more than 91% of the original accuracy with less than half of the area required by TMR and less than 40% of latency overhead.
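To make the idea of checksum-based detection and correction concrete, here is a minimal sketch of the classic weighted-checksum scheme for a matrix-vector product, the core operation of an IMC array. It is not the multi-checksum architecture of the abstract. It assumes integer weights, a single corrupted output element, and fault-free checksum rows, and all function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length integer vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(W, x):
    """Reference matrix-vector product y = W x."""
    return [dot(row, x) for row in W]


def make_checksums(W):
    """Build a plain and a row-weighted column checksum of W.

    plain[j]    = sum_i W[i][j]
    weighted[j] = sum_i (i + 1) * W[i][j]
    """
    rows, cols = len(W), len(W[0])
    plain = [sum(W[i][j] for i in range(rows)) for j in range(cols)]
    weighted = [sum((i + 1) * W[i][j] for i in range(rows)) for j in range(cols)]
    return plain, weighted


def check_and_correct(y, x, plain, weighted):
    """Detect and fix a single wrong element of y.

    Returns (corrected_y, faulty_index), with faulty_index None when no
    error is detected. Raises ValueError if the syndromes do not match a
    single-element error (assumed uncorrectable in this sketch).
    """
    s1 = dot(plain, x) - sum(y)
    s2 = dot(weighted, x) - sum((i + 1) * yi for i, yi in enumerate(y))
    if s1 == 0 and s2 == 0:
        return list(y), None
    if s1 == 0 or s2 % s1 != 0:
        raise ValueError("error pattern is not a single-element fault")
    k = s2 // s1 - 1  # the ratio of the syndromes locates the faulty row
    if not 0 <= k < len(y):
        raise ValueError("error pattern is not a single-element fault")
    fixed = list(y)
    fixed[k] += s1  # s1 equals minus the injected error
    return fixed, k


W = [[1, 2], [3, 4], [5, 6]]
x = [1, 1]
plain, weighted = make_checksums(W)
golden = matvec(W, x)
faulty = list(golden)
faulty[1] += 5  # emulate one corrupted output of the array
recovered, location = check_and_correct(faulty, x, plain, weighted)
```

The two checksums cost two extra rows of storage and computation, whereas TMR triplicates the whole array; the abstract's area comparison against TMR rests on the same kind of trade-off.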
Affiliation: National Central University (TW)
Spin-transfer-torque magnetic random access memory (STT-MRAM) is a candidate for next-generation memory to cope with the scaling challenges of conventional memories. However, STT-MRAM has a small on/off resistance ratio, which poses challenges in designing the reference resistance. An effective way to cope with this issue is to design a trimmable reference resistance that allows post-production control of the reference resistance. A trimming test should be used in the production test to find an appropriate reference resistance. In this paper, a parallel-check trimming test (PCTT) approach aimed at significantly reducing the trimming test time is proposed. In comparison with the existing binary-judge-based search test approach, the proposed PCTT approach drastically reduces trimming test time with nearly the same read yield.
Affiliations: 1 Siemens EDA (DK), 2 Infineon Technologies (DE), 3 Infineon Technologies (US)
This paper introduces the concept of provably detected defects, which are identified through topology analysis and AMS verification database inquiry. The main purpose is to assign detected status to such defects without simulation, thereby reducing the size of the defect simulation campaign. The proposed approach combines fundamental and empirical rule-based sets targeting relevant steps that are part of regular design sign-off. The feasibility is illustrated on an industrial product that is currently in the final stages of tape-out, with results complying with the current status of the upcoming IEEE P2427 standard, which is in development.
Affiliations: 1 University of Bremen (DE), 2 DFKI (DE), 3 Siemens Electronic Design Automation GmbH (DE)
IEEE Std. 1687 (IJTAG) introduces reconfigurable scan networks that implement effective test access in highly complex designs. Designing an optimized network that provides access to the instruments, meets the non-functional constraints, and minimizes routing effort, area overhead and test access time is a non-trivial optimization problem. This paper tackles the IJTAG network topology design challenge by proposing an evolutionary approach to synthesize reconfigurable scan networks with optimized routing and area overhead while minimizing the overall test time.
Affiliations: 1 Tallinn University of Technology (EE), 2 University of Zanjan (IR), 3 Malardalen University (SE)
Multiplication is the most resource-hungry operation in the neural network’s processing elements. In this paper, we propose the architecture of a novel adaptive fault-tolerant approximate multiplier (AdAM) tailored for ASIC-based DNN accelerators. AdAM employs an adaptive adder relying on an unconventional use of the leading one position value of the inputs for fault detection through the optimization of unutilized adder resources. The proposed architecture uses a lightweight fault mitigation technique that sets the detected faulty bits to zero. The hardware resource utilization and the DNN accelerator’s reliability metrics are used to compare the proposed solution against triple modular redundancy (TMR) in multiplication, unprotected exact multiplication, and unprotected approximate multiplication. It is demonstrated that the proposed architecture enables multiplication with a reliability level close to that of multipliers protected by TMR, while using 63.54% less area and having a 39.06% lower power-delay product compared to the exact multiplier.
Training Large Language Models for System-Level Test Program Generation Targeting Non-functional Properties
Affiliations: 1 University of Stuttgart (DE), 2 Technical University of Munich (DE), 3 Advantest Europe (DE)
System-Level Test (SLT) has been an integral part of integrated circuit test flows for over a decade and continues to be significant. Nevertheless, there is a lack of systematic approaches for generating test programs, specifically ones focusing on the non-functional aspects of the Device under Test (DUT). Currently, test engineers manually create test suites using commercially available software to simulate the end-user environment of the DUT. This process is challenging and laborious and does not ensure adequate control over non-functional properties. This paper proposes to use Large Language Models (LLMs) for SLT program generation. We use a pre-trained LLM and fine-tune it to generate test programs that optimize non-functional properties of the DUT, e.g., instructions per cycle. To this end, we use Gem5, a microarchitectural simulator, in conjunction with Reinforcement Learning-based training. Finally, we write a prompt to generate C code snippets that maximize the instructions per cycle of the given architecture. In addition, we apply hyperparameter optimization to achieve the best possible results in inference.
Affiliations: 1 Institute of Computing Technology CAS (CN), 2 University of Chinese Academy of Sciences (CN), 3 Binary Semiconductor (CN)
Elliptic curve cryptography (ECC) is widely used in public key encryption, but its high-speed deployment faces challenges due to algorithmic and arithmetic complexity. In this paper, we present a high-performance ECC processor for the elliptic curve point multiplication (ECPM) of NIST P-256. Our approach employs a fully pipelined architecture featuring a 7-stage, 256-bit multiplier operating at a high frequency. To manage the data flow of the ECPM operation process, we devise a controller equipped with configurable instructions, which provides ECPM operations with higher flexibility to meet diverse contextual requirements. Additionally, we introduce a compact pipeline schedule to reduce ECPM computation clock cycles. The proposed LUT-based design achieves ECPM computation in 0.039 ms on FPGA (Virtex-7 platform) and 0.037 ms on ASIC (90nm technology), requiring only 10712 clock cycles.
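For readers unfamiliar with ECPM, the operation computes k·G by repeated point doubling and addition. The sketch below shows the textbook double-and-add method on a small toy curve, y² = x³ + 2x + 2 over GF(17) with base point (5, 1), rather than NIST P-256. It does not reflect the pipelined hardware schedule of the abstract; the curve, names, and affine-coordinate formulas are assumptions chosen for readability. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
P_MOD = 17   # toy field prime (P-256 uses a 256-bit prime instead)
A = 2        # curve coefficient a in y^2 = x^3 + a*x + b
B = 2        # curve coefficient b
G = (5, 1)   # base point on the toy curve; None stands for infinity


def on_curve(point):
    """Check that a point satisfies the curve equation."""
    if point is None:
        return True
    x, y = point
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def point_add(p, q):
    """Add two points in affine coordinates."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) is the point at infinity
    if p == q:
        # Tangent slope for point doubling
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Chord slope for adding two distinct points
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with right-to-left double-and-add."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


doubled = scalar_mult(2, G)
```

This simple loop is not constant-time, and each step needs a field inversion; practical designs typically use projective coordinates and regular schedules to avoid both.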