2.1. The 11T DAM SRAM Cell
As shown in Figure 3, the 5T aXNOR structure in the proposed DAM SRAM cell physically separates computation from data storage while keeping the two logically coupled. The computation transistors generate an analog computation current, denoted as the unit current IMAC, which has no direct signal interaction or shared electrical path with the data stored in the conventional 6T SRAM cell. During high-speed computational cycles, the computation transistors are not electrically coupled to the storage transistors, so fluctuations in the computation current cannot propagate through the word lines or bit lines into the storage region. Consequently, transient voltage variations on the word or bit lines cannot flip the state of data in neighboring storage cells, which avoids the spurious write errors that interference during word line read operations could otherwise cause.
Under normal operations, the 11T SRAM cell benefits from its unique decoupled design of storage word lines and computation word lines, enabling dual-mode functionality: the standard SRAM storage mode and computation mode. In the traditional SRAM storage mode, the 11T SRAM cell retains the fundamental storage and retrieval characteristics of a 6T SRAM unit, storing critical weight information W in a binary form for neural networks or other data-intensive applications. This ensures that these weights can be accessed and updated rapidly and reliably in response to read requests from a host processor or associated computational components.
In the computation mode, arrays of 11T SRAM units enable large-scale parallel computing capabilities to be achieved. Via the integrated aXNOR computation transistor structure, they perform dynamic analog Multiply–Accumulate (MAC) operations on external input signals X and internally stored weight data W. When receiving the real-time varying external input signal X, each 11T SRAM cell immediately performs parallel computation against its stored weight W; this fuses the storage and computation functions to generate a unit current output that is directly proportional to the MAC operation on the pair (X, W). Before delving into a detailed analysis of the MAC current generation, let us discuss the re-encoding logic applied to Feature port input signals. The proposed analog multiplication design redefines the binary representation of the complementary signal pair {X, XB}: when {X, XB} takes the value {1, 0}, the Feature signal represents a positive attribute with a Feature value of +1; conversely, when {X, XB} is {0, 1}, the Feature value is defined as −1. Similarly, the Weight storage values in the DAM SRAM cell adopt a comparable binary redefinition strategy, where the complementary signal pair {Q, QB} yields a Weight value of +1 when {Q, QB} is {1, 0}, and a weight value of −1 when {Q, QB} is {0, 1}.
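The re-encoding above can be illustrated with a short behavioral sketch. It is not a circuit model: the function names are hypothetical, a bit value of 1 stands for the pair {1, 0} and a bit value of 0 for the pair {0, 1}, and the example vectors are arbitrary. It shows that, for N cells, the signed MAC result equals 2 × (number of matching cells) − N, so counting the cells whose input and weight agree is enough to recover the signed sum.

```python
def encode(bit: int) -> int:
    """Map a bit to its signed value: 1 ({1, 0}) -> +1, 0 ({0, 1}) -> -1."""
    return 1 if bit == 1 else -1


def column_mac(x_bits, w_bits):
    """Return (matches, signed_mac) for one column of cells.

    matches    : number of cells whose input bit equals the stored bit
    signed_mac : sum of the +1/-1 products over the column
    """
    if len(x_bits) != len(w_bits):
        raise ValueError("input and weight vectors must have equal length")
    matches = sum(1 for x, w in zip(x_bits, w_bits) if x == w)
    signed_mac = sum(encode(x) * encode(w) for x, w in zip(x_bits, w_bits))
    return matches, signed_mac


# Arbitrary example vectors (not taken from the paper).
x = [1, 0, 1, 1, 0, 0, 1, 0, 1]
w = [1, 1, 1, 0, 0, 1, 1, 0, 0]
matches, mac = column_mac(x, w)
print(matches, mac)
```

Under this encoding, only the matching cells need to contribute current, which is consistent with the four cases described for Figure 4.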
In the SRAM storage mode, the dynamic aXNOR enable signal is passed through the COM_EN port, setting the CEN signal to 0; this turns off the MOSFET controlled by CEN, which prevents any effective current path from being established and ensures that the DAM SRAM unit does not execute any analog MAC computations. Upon switching to the computation mode, the CEN signal is set to 1; this turns on the MOSFET and allows the Feature input signal X to interact with the Weight signal Q stored in the DAM SRAM cell for analog multiplication. As shown in Figure 4, four distinct signal combinations exist, and each corresponds to a unique multiplication result.
(1) When the Feature signal pair {X, XB} has a value of {1, 0} (corresponding to +1) and the Weight signal pair {Q, QB} has a value of {1, 0} (also +1), the analog multiplication produces a result of +1, generating a mirrored current IC that is proportional to the analog MAC current IM on RBL.
(2) If the Feature signal pair {X, XB} has a value of {0, 1} (representing −1) and the Weight signal pair {Q, QB} has a value of {0, 1} (also −1), although both are negative, their product is still +1, resulting in a mirrored current IC on RBL.
(3) When the Feature signal pair {X, XB} has a value of +1 and the Weight signal pair {Q, QB} has a value of −1, the analog multiplication result becomes −1, theoretically leading to no generation of an analog multiplication result current; hence, no corresponding mirrored current is observed on RBL.
(4) Lastly, if the Feature signal pair {X, XB} has a value of −1 and the Weight signal pair {Q, QB} has a value of +1, the analog multiplication again results in −1; as in case (3), no mirrored current is generated on RBL.
The truth table of the 11T SRAM cell exhaustively records all possible combinations of the input signal X and weight data W, as well as the corresponding unit current output values for each combination.
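The four cases can be enumerated programmatically. The following is a minimal behavioral sketch, assuming a normalized unit current of 1.0 (the constant `I_UNIT` and the function names are hypothetical); it also models the CEN gating described above, where CEN = 0 suppresses the current regardless of X and Q.

```python
from itertools import product

I_UNIT = 1.0  # normalized unit current; a placeholder, not a measured value


def cell_current(x: int, q: int, cen: int) -> float:
    """Mirrored current of one cell: flows only if CEN = 1 and X matches Q."""
    if cen == 0:
        return 0.0
    return I_UNIT if x == q else 0.0


def truth_table():
    """Rows of (X, XB, Q, QB, signed product, current) with CEN = 1."""
    rows = []
    for x, q in product((1, 0), repeat=2):
        feature = 1 if x == 1 else -1
        weight = 1 if q == 1 else -1
        rows.append((x, 1 - x, q, 1 - q, feature * weight, cell_current(x, q, 1)))
    return rows


for row in truth_table():
    print(row)
```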
2.2. The Voltage Gradient Quantization Circuit
To address the non-linearity issues [6] that arise from accumulated analog currents, this design further incorporates a voltage gradient quantization circuit. In practical operations, the MAC results manifest in the analog domain as accumulated analog currents, which, if left unprocessed, may introduce non-linearity into the output current and compromise the accuracy of the final computational results due to source voltage drift and other factors. The voltage gradient quantization circuitry converts these analog currents in real time into stable voltage gradients, ensuring a high level of output precision throughout the computational process.
A set of 64 voltage gradient quantization circuit units, designed around the aXNOR-based in-memory computing principle of the DAM SRAM array, is laid out column-wise, with one unit aligned to each column of the DAM SRAM array. This arrangement translates the large-scale accumulated analog currents generated during the parallel distributed computation of the DAM SRAM array into a high-fidelity sequence of voltage gradient signals. As shown in Figure 5, each voltage gradient quantization circuit consists of a quantization resistor (Rg), a set of voltage divider MOSFETs (NM1 and NM2), and a self-stabilizing loop (SSL, ZFET1, and ZFET2). During multi-row MAC operations, each memory cell in a column multiplies its stored state by its input, and the resulting currents are progressively summed on the RBL; this eventually forms voltage changes that are representative of data states and appear as voltage drops across the RBL and its complementary line RBLB. As the number of inputs and corresponding weight coefficients participating in the MAC operations increases, the signal margin of the system exhibits a marked exponential tendency to decay; the accuracy of the analog readout suffers once the margin falls below the inherent offset threshold of the analog readout circuits, sense amplifiers (SAs), or analog-to-digital converters (ADCs).
The voltage gradient quantization circuit proposed in this paper addresses the non-linear attenuation of voltage gradients caused by current accumulation on RBL through a self-stabilizing operating mechanism. While the self-stabilizing loop does not directly eliminate the voltage drift on the RBL current accumulation bus, it compensates for the non-linearity of the MAC currents by elevating the bias voltage of ZFET1 via the following steps:
(1) As shown in Figure 6a, as the number of contributing inputs and their corresponding weight coefficients increases during MAC operations, the non-linear attenuation effect of the accumulated current intensifies; this causes a non-linear drift in the voltage VRBL on the current accumulation bus RBL, as depicted by the dashed line in Figure 6d.
(2) The bias voltage of ZFET2 originates from the voltage VRBL on the current accumulation bus RBL. As shown in Figure 6b, when VRBL drifts downward non-linearly, the current IST in the right half of the self-stabilizing circuit branch decreases correspondingly, which raises the source voltage VT1 of ZFET2, as shown in Figure 6c.
(3) As illustrated in Figure 6a, the bias voltage of ZFET1 originates from the source voltage VT1 of ZFET2. When VT1 increases, the MAC current IMAC on the current accumulation bus RBL receives non-linear compensation and maintains a linearly increasing trend, as depicted in Figure 6d; this solves the problem of the non-linear attenuation of the output voltage gradient Vg.
The gradient voltage signals processed by the voltage gradient quantization circuits are fed into a gradient voltage encoding module, which quantizes these signals and generates multi-bit digital output signals. The voltage gradient quantization circuit thus improves the linearity of the analog current accumulation results in the in-memory computing system, and with it the overall computational accuracy and long-term stability of the system.
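To give an intuition for why an uncompensated accumulation bus yields shrinking voltage steps, the following toy model may help. It is an assumption made purely for illustration and does not reproduce the circuit of Figure 5: each active cell is treated as a fixed conductance pulling RBL toward ground through a resistor connected to the supply, and the parameter `g_cell_times_rg` (the cell conductance multiplied by the resistor value) is a hypothetical number.

```python
def rbl_voltage(n_active: int, vdd: float = 1.0, g_cell_times_rg: float = 0.05) -> float:
    """Bus voltage of a resistive divider with n_active identical cells.

    Toy model only: V = VDD / (1 + n * g * R), where g * R is a
    hypothetical dimensionless product, not a value from the paper.
    """
    return vdd / (1.0 + n_active * g_cell_times_rg)


def step_sizes(max_rows: int = 16):
    """Voltage step added by each additional active cell, from 1 to max_rows."""
    v = [rbl_voltage(n) for n in range(max_rows + 1)]
    return [v[n - 1] - v[n] for n in range(1, max_rows + 1)]


steps = step_sizes()
for n, s in enumerate(steps, start=1):
    print(f"cell {n:2d}: step = {s * 1e3:.2f} mV")
```

In this toy model every additional active cell produces a smaller voltage step than the previous one, which is the kind of margin compression that the self-stabilizing loop is intended to counteract.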
2.3. The Pre-Processing Strategy of Data Storage
Before executing a convolutional layer operation, the DAM SRAM core enters the SRAM mode and loads image-relevant weight data column-wise into the DAM SRAM core array. Both the weights stored in the DAM SRAM cells and the input signals are represented in binary format, consisting only of 1s and 0s, whereas the original convolution kernel elements use signed binary values of +1 and −1. A pre-processing strategy, depicted in Figure 7a, is therefore proposed, in which the convolution kernel is vectorized and decomposed. Using the mathematical transformation Kernel = IN1 − IN2, the kernel is converted into two complementary parts, IN1[8:0] and IN2[8:0], to meet the operational requirements of the DAM SRAM core. These two subsets, IN1 and IN2, are independently fed into the DAM SRAM array in a nine-line parallel manner and stored in nine consecutive rows within the same column, for example, rows 8 to 0; synchronous weighted accumulation operations based on XOR logic are then performed on the input data. After the weight loading phase, the DAM SRAM system transitions from the storage mode to the computation mode. The overall computational process is detailed in Figure 7b.
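The decomposition Kernel = IN1 − IN2 can be sketched in a few lines. Because IN1 and IN2 may only hold 0 or 1, a kernel element of +1 must map to (IN1, IN2) = (1, 0) and an element of −1 to (0, 1). The function name, the list-based representation of the [8:0] vectors, and the example kernel below are illustrative assumptions.

```python
def decompose_kernel(kernel):
    """Split a +1/-1 kernel vector into binary parts with kernel = in1 - in2."""
    for k in kernel:
        if k not in (1, -1):
            raise ValueError("kernel elements must be +1 or -1")
    in1 = [1 if k == 1 else 0 for k in kernel]   # marks the +1 positions
    in2 = [1 if k == -1 else 0 for k in kernel]  # marks the -1 positions
    return in1, in2


# Arbitrary vectorized 3x3 kernel (nine elements), not taken from the paper.
kernel = [1, -1, 1, 1, -1, -1, 1, -1, 1]
in1, in2 = decompose_kernel(kernel)
print(in1)
print(in2)
```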
In the first step, feature data F[8:0] are input from the external I/O interfaces and pre-processed by the CWL driver circuitry. This circuit converts the feature data F[8:0] into a pair of complementary binary sequences, namely X[8:0] and XB[8:0], which are then separately sent to the DAM SRAM array to match the rows storing IN1 and IN2, respectively. In the second step, the CWL driver circuitry, along with the address decoder, assigns addresses to X[8:0] and XB[8:0]; this spatially pairs the complementary binary data sequences {X[8:0], XB[8:0]} with the corresponding two parts of the decomposed convolution kernel {IN1[8:0], IN2[8:0]} stored in the DAM SRAM array. In the third step, the computation enable signals for rows 8 through 0 are activated simultaneously. Setting the CEN signal high via the COM_EN port triggers the execution of MAC operations in the respective rows, performing row-wise multiplication and accumulation between {X[8:0], XB[8:0]} and {IN1[8:0], IN2[8:0]}. Finally, the current-based results of the Multiply–Accumulate operations are converted into digital outputs by the integrated voltage gradient quantization and gradient voltage decoding circuits. Subtracting these digital values yields the final four-bit digital output result O[3:0].
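The arithmetic behind the final subtraction can be checked with an idealized sketch. It models each accumulation as an exact dot product over 0/1 values and deliberately ignores how the aXNOR cells, the {X, XB} pairing, and the quantization circuits realize it at the circuit level; it also does not model the four-bit width of O[3:0]. The function names and example data are hypothetical.

```python
def dot(a, b):
    """Ideal dot product of two equal-length integer vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x * y for x, y in zip(a, b))


def output_by_subtraction(features, in1, in2):
    """Difference of the two non-negative partial accumulations."""
    acc1 = dot(features, in1)  # contribution of the +1 kernel positions
    acc2 = dot(features, in2)  # contribution of the -1 kernel positions
    return acc1, acc2, acc1 - acc2


# Arbitrary example data (not taken from the paper).
kernel = [1, -1, 1, 1, -1, -1, 1, -1, 1]
in1 = [1 if k == 1 else 0 for k in kernel]
in2 = [1 if k == -1 else 0 for k in kernel]
features = [1, 1, 0, 1, 0, 1, 1, 0, 1]

acc1, acc2, out = output_by_subtraction(features, in1, in2)
print(acc1, acc2, out, dot(features, kernel))
```

Because the dot product is linear, the difference of the two partial sums always equals the dot product with the original signed kernel, which is why two unsigned accumulations followed by a digital subtraction suffice.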
2.4. The ADC and Output Stage
As shown in Figure 8, a gradient voltage encoding circuit is employed for both the ADC and the output stage. It comprises two high-speed clocked latch-type comparators, a dual-bit Vin voltage encoding register (R1 and R2), a reference voltage selector, and a “2-bit parallel-to-serial output” encoder. The high-speed clocked latch-type comparators output the comparison result between the selected reference voltage and Vin during each cycle. The Vin voltage encoding registers, R1 and R2, temporarily store the comparator outputs and provide voltage selection enable signals to the reference voltage selector. The reference voltage selector uses the voltage selection signals from the encoding registers to choose the appropriate reference voltage for the high-speed comparators. Lastly, the “2-bit parallel-to-serial output” encoder converts the values stored in the voltage encoding registers into a serial output format.
At the onset of a complete conversion cycle within the gradient voltage encoding circuit, the Vin voltage encoding registers are reset and initialized. During this initialization, the registers are preset, following a binary-split principle, to the code of the intermediate reference voltage level, which corresponds to one-half of the quantized reference voltage range. This preset initial reference voltage serves to swiftly bracket the likely dynamic range of the analog input voltage Vin during the subsequent analog-to-digital conversion. Once the initial reference voltage is set in the Vin voltage encoding registers, the two high-speed clocked latch-type comparators begin operating alternately, comparing the analog input voltage Vin against the selected reference voltage. If the comparison indicates that Vin is higher than the initial reference voltage, the state-holding register R1 maintains a logical high state (“1”), signifying that the current reference voltage estimate is lower than the actual Vin; consequently, the reference voltage Vref needs to be raised. Conversely, if Vin is lower than the chosen reference voltage Vref, R1 is driven to a logical low state (“0”), indicating that the current Vref selection is too high and requires downward adjustment. With each clock pulse, Vin is evaluated by the COMP1 and COMP2 comparator units in an alternating fashion, and the resulting comparison outputs drive alternating updates of R1 and R2. Based on the previous decision stored in R1, the reference voltage selector chooses the reference voltage level for the next comparison cycle.
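The conversion sequence resembles a two-step binary search, which can be sketched behaviorally as follows. Several details here are assumptions rather than statements from the text: the second reference is taken to be one quarter or three quarters of full scale depending on R1, R2 is taken to store the second comparison result, and the serial output is assumed to send R1 first. The function and parameter names are hypothetical.

```python
def encode_vin(vin: float, vref_full: float = 1.0):
    """Two-step binary-search quantization of vin into bits (r1, r2)."""
    # Step 1: compare against the preset mid-scale reference.
    vref = 0.5 * vref_full
    r1 = 1 if vin > vref else 0

    # Step 2: move the reference up or down according to R1 (assumed
    # quarter-scale levels), then compare again.
    vref = 0.75 * vref_full if r1 == 1 else 0.25 * vref_full
    r2 = 1 if vin > vref else 0
    return r1, r2


def serialize(r1: int, r2: int) -> str:
    """Parallel-to-serial output, assuming R1 is sent first."""
    return f"{r1}{r2}"


for vin in (0.1, 0.3, 0.6, 0.9):
    print(vin, serialize(*encode_vin(vin)))
```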
2.5. The Low-Power Design of DAM SRAM
The computation structure of the DAM SRAM unit, particularly the aXNOR portion, incorporates an in-unit current isolation mechanism that significantly reduces the system’s consumption of computational power. On one hand, this programmable in-unit current isolation mechanism enables the DAM SRAM array to perform flexible block dormancy, thereby decreasing the consumption of static power in the DAM SRAM CORE blocks that do not participate in computations. On the other hand, the high-speed dynamic nature of the in-unit current isolation mechanism effectively minimizes the consumption of computational power in DAM SRAM blocks during the computation mode, realizing the low-power operation of the DAM SRAM CORE.
In the block dormancy mode, as shown in Figure 9, when Block 1 in the DAM SRAM array is instructed to enter dormancy, the COM_EN ports of the entire block set the CEN signal to a low level, disabling the internal computational pathways of all DAM SRAM units within the block. The node voltage VK within the aXNOR structure is pulled down, preventing the formation of a current path in the left branch of the voltage gradient quantization circuit; this leaves only a small static current IST in the right branch, enabling the block to enter a low-power sleep mode.
In the computation mode, when the DAM SRAM array performs computations, the COM_EN control ports of the rows with active data set the CEN signal to a high level. This activates the computational pathways within the DAM SRAM units of the engaged block. The node voltage VK within the aXNOR structure of each DAM SRAM unit rises to a preset higher voltage level, activating the in-unit current mirror connected to it. The current mirror regulates the unit current during analog computations. Simultaneously, the left branch of the voltage gradient quantization circuit forms a continuous current path, generating gradient voltages according to the state differences within the unit data. In the actual computation mode, the DAM SRAM array can simultaneously activate up to 16 independent rows for calculations. Assuming that all 16 activated DAM SRAM units in a given column contribute a computational result of +1 during a calculation cycle, the left branch of the voltage gradient quantization circuit accumulates the basic computation currents of 16 units, for a total current of up to 160 µA. To conserve energy, the system dynamically controls the COM_EN port to withdraw the high-level CEN signal within the calculation cycle; this restores CEN to a low level and cuts off the computational current path in the left branch of the voltage gradient quantization circuit. As shown in Figure 10, by limiting the computational period to 1/10 of the total cycle, the dynamic enable signal reduces the array’s computational power consumption to about 10% of its value under continuous enabling.
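The duty-cycle argument can be made concrete with a short calculation. The unit current of 10 µA is inferred here from the stated figures (160 µA divided across 16 units) and is not given explicitly in the text; the function names are hypothetical, and the small static current IST is neglected.

```python
def peak_column_current_ua(active_units: int = 16, unit_current_ua: float = 10.0) -> float:
    """Peak accumulated current when every active unit contributes +1."""
    return active_units * unit_current_ua


def average_current_ua(peak_ua: float, duty_cycle: float = 0.1) -> float:
    """Average computation current when CEN is high for duty_cycle of the cycle."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must lie in [0, 1]")
    return peak_ua * duty_cycle


peak = peak_column_current_ua()
avg = average_current_ua(peak)
print(f"peak = {peak:.0f} uA, average = {avg:.0f} uA, ratio = {avg / peak:.0%}")
```

At a fixed supply voltage, the average computation power scales with the average current, so a 1/10 duty cycle corresponds to roughly one-tenth of the power drawn under continuous enabling.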