# Logarithmic Dissipation Shift Register

Technical Note (Report) ZF006 v1.01

Erik P. DeBenedictis, Zettaflops, LLC, Albuquerque, NM 87112, erikdebenedictis@zettaflops.org

Abstract—This note discusses an energy efficient adiabatic memory based on several new design principles, further discussing the implications of those principles. Adiabatic transistor circuits, such as SCRL, <sup>1</sup> 2LAL, <sup>2</sup> and S2LAL <sup>3</sup> have been touted for low-energy logic circuits. <sup>4</sup> They have also been proposed for cryogenic logic, such as the classical control system of a quantum computer. <sup>5</sup> This note elaborates on a novel design concept used in cryogenic waveform storage. The design principles include using a ladder of multiple clock rates to exploit the variable energy efficiency of adiabatic circuits as a function of clock rate—yet also accounting for the cost of the clock change circuitry. In addition to providing a useful memory, the design principles may lead to a fairer method of comparing CMOS and adiabatic transistor circuits in general.

Keywords—SCRL, 2LAL, S2LAL, CMOS, cryo CMOS, reversible computing, adiabatic computing

## I. INTRODUCTION

Fig. 1 shows a storage subsystem for a digitized waveform,<sup>5</sup> or an instance of a sequential access memory. The example shows the bits stored in a series of large but slow shift registers whose outputs are combined into a smaller number of faster streams.

The ladder of clock rates would not have much value for a CMOS circuit, but adiabatic circuits become more energy efficient as the clock rate slows down, as illustrated in fig. 2. The figure shows that the energy per transistor drops quadratically as the clock period (i.e., the inverse of the clock rate) lengthens. The large-capacity shift registers in fig. 1 are clocked at 4 MHz, where fig. 2 shows they dissipate about  $10^{-7}$  as much heat as CMOS at full speed. The multiplexers in fig. 1 that raise the clock rate dissipate more per device, following the curve in fig. 2. However, there are fewer transistors running at the higher dissipation levels. As will be detailed later, the left of fig. 1 has many bits while the right has high speed, giving the effect of many bits at high speed.

The logarithmic dissipation shift register that is the topic of this note has advantages, but the complete example in ref. 5 also includes a second, Josephson junction-based technology for even higher speed. The hybrid technology is out of the scope of this note and the reader is referred to ref. 5.

This note introduces several new adiabatic design principles:

- To the best of the author's knowledge, there are no adiabatic circuits in the literature that include multiple clocks at different speeds.
- This note further describes how to use a ladder of different speed vs. energy efficiency tradeoffs as a design technique for adiabatic transistor circuits.
- The logarithmic dissipation shift register illustrates how to use adiabatic transistor circuits designed for logic to create a memory that should be superior to CMOS for certain applications.
- The shift register can be compared to a CMOS register of the same length based on chip area, speed, and energy dissipation, which is a more comprehensive comparison than has been possible in the past.
- This document includes an appendix with ngspice simulation code.

## II. LOGARITHMIC SHIFT REGISTER

In lieu of the clock period steps of 250 ns, 25 ns, and 2.5 ns in fig. 1, let us consider a homogeneous n-level structure. For simplicity, let us assume the clock rate change is  $2 \times$  between each pair of levels.
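As a rough illustration of the ladder idea, the sketch below lists the clock period at each level when the period grows by a fixed ratio from one level to the next. The helper names `ladder_periods_ns` and `period_to_mhz` are hypothetical, and the 2.5 ns starting period is borrowed from fig. 1 purely as an assumption.

```python
def ladder_periods_ns(fastest_period_ns, ratio, levels):
    """Clock period (ns) at each level, fastest level first.

    Assumption: every level is slower than the one before it by the
    same factor `ratio` (2 for the homogeneous structure, 10 in fig. 1).
    """
    return [fastest_period_ns * ratio**i for i in range(levels)]


def period_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


if __name__ == "__main__":
    # Steps of fig. 1: 2.5 ns, 25 ns, 250 ns (the slowest is 4 MHz).
    for p in ladder_periods_ns(2.5, 10, 3):
        print(f"{p:7.1f} ns  ->  {period_to_mhz(p):6.1f} MHz")
    # Homogeneous 2x ladder considered in this section (4 levels).
    print(ladder_periods_ns(2.5, 2, 4))
```

With a ratio of 2 the period doubles at each level, which is the homogeneous case analyzed in the rest of this section.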

Fig. 3a illustrates a  $2 \times$  clock rate changer for data in shift registers. Its operation is based on a point in the clocking of all adiabatic transistorized shift registers (that the author is aware of) where the data is entirely contained in a stage. At this point, all the inputs, outputs, and clocks are at a voltage independent of the stored data. It is possible to logically move the shift register stage and the information it holds using pass gates. Since the voltages are independent of stored data, the relocation can be performed without any voltages changing and therefore without dissipation. If the relocation is implemented as the swapping of two shift register stages, including their data, the resulting system will naturally obey higher level properties required for reversible and adiabatic logic design. The diagram illustrates a binary tree with a 2× clock rate change per stage, but it should be clear how to extend the fanout to other values.

Fig. 1. Hybrid subsystem for sequential storage. The adiabatic transistors are physically small but must be slowed down to increase energy efficiency. However, the multiplexers speed up the data rate. The Josephson junctions on the right are physically large and hence a poor choice for storage, but they are fast and energy efficient, making them suitable for the final multiplexing step.

Fig. 2. Power per device vs. frequency, TSMC 0.18, CMOS vs. 2LAL. Comparison of circuit efficiency for standard CMOS (top) and an adiabatic circuit 2LAL (bottom), showing a maximum  $10^{7}\times$  power advantage at about 1 MHz. The relative positions of the curves are due to the circuit, but the absolute positions will vary by the specific transistor parameters. The upward slope of the curves is due to a combination of transistor parameters and instability in Spice. If a transistor were optimized for 2LAL, the downward sloping section could continue.

Fig. 3b uses the clock rate changer to create a hierarchical shift register. Each of the circles is a 3-bit cyclic shift register, with each colored square being one adiabatic shift register stage, holding one bit, and clocked at the rate shown in the legend.

In the absence of any bit exchanges by the circuit in fig. 3a, the entire system in fig. 3b would be a series of independent 3-bit cyclic shift registers clocked at the rates shown in the legend.

Now consider each cyclic shift register exchanging one bit with the register above it between each shift. Furthermore, after every second shift, two adjacent bits are exchanged with the two cyclic shift registers below. Thus, the bits originally in the green squares will no longer complete a cycle in three shifts, but each bit will go down either the left or right subtree until reaching a leaf node and then climb back up. Since both the left and right subtrees have the same structure, each pair of bits shifted from green to red will return at the same time—thus preserving the order of bits.





(d) Energy efficiency, k=2

| Level  | Bits             | Energy                                  |
|--------|------------------|-----------------------------------------|
| 0      | $3$              | $3$                                     |
| 1      | $3\times2$       | $3\times2\times\frac{1}{2}$             |
| 2      | $3\times4$       | $3\times4\times\frac{1}{4}$             |
| ⋮      |                  |                                         |
| $n-1$  | $3\times2^{n-1}$ | $3\times2^{n-1}\times\frac{1}{2^{n-1}}$ |
| Total: | $3(2^n-1)$       | $3n$                                    |

Fig. 3. The utility of clock speed shifting. (a) The clock-rate/parallelism changing circuit. (b) Imagine the green blocks comprise a cyclic shift register. The shift register would become longer if in every other clock cycle two adjacent bits were swapped with a red block in one of two additional cyclic shift registers, which are clocked at ½ the speed. This could repeat for blue blocks that are part of four cyclic shift registers, which are clocked at ½ the speed, etc. (c) A 2D layout of the structure with four levels. (d) The energy efficiency will be better than CMOS in some cases. Each level of the hierarchy has more bits than the previous but higher energy efficiency, so dissipation per level is the same. This would lead to energy per shift being logarithmic in the total number of bits.

From the perspective of the green register at the top of the tree, the entire structure below it simply serves to make the register longer. If the tree has n levels of fanout k, meaning the cyclic registers are of length k+1, then level i holds  $k^i$  registers and the effective length of the register would be  $(k+1)(k^n-1)/(k-1)$ , which reduces to  $3(2^n-1)$  for  $k=2$ , in agreement with fig. 3d.
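The bit count can be checked with a minimal sketch, assuming level i of the tree (0 = top) holds k^i cyclic registers of k+1 bits each, which is the reading suggested by fig. 3d for k=2. The function names are hypothetical. The level-by-level sum and the geometric-series closed form agree, and for k=2 both give 3(2^n - 1).

```python
def effective_length(n, k=2):
    """Total bits in an n-level tree, summed level by level.

    Assumption: level i (0 = top) has k**i cyclic registers,
    each holding k + 1 bits.
    """
    return sum((k + 1) * k**i for i in range(n))


def effective_length_closed_form(n, k=2):
    """Geometric-series closed form of the same sum (requires k >= 2)."""
    return (k + 1) * (k**n - 1) // (k - 1)


if __name__ == "__main__":
    for n in range(1, 6):
        total = effective_length(n)
        # For k = 2 the total matches the 3(2^n - 1) entry in fig. 3d.
        assert total == 3 * (2**n - 1) == effective_length_closed_form(n)
        print(n, total)
```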

Fig. 3c shows a more intelligent 2D layout. It should be noted that the number of bits is exponential in the number of levels, so beyond a point, there will not be enough room in the 2D plane to accommodate the tree without long wires to connect shift register stages.

Fig. 3d computes the number of bits stored in an n-level structure and the energy per shift. The key point is that the number of cyclic registers at each level is the same as the energy efficiency increase due to the slower clock, so the total energy of each level is the same. Thus, for a register of length N, the dissipation per shift is  $O(\log N)$ . By comparison, a standard CMOS register has dissipation O(N) and a 2D array structure such as a DRAM or SRAM would be  $O(\sqrt{N})$ .
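The scaling argument can be sketched numerically under the same idealized model as fig. 3d: a bit at level i costs 1/k^i of a top-level bit per shift, and the cost of the rate-changing circuitry is ignored. The function names and the unit of "relative energy" are assumptions made for illustration only. The comparison columns are just N and sqrt(N) in the same arbitrary units, so only the growth rates are meaningful.

```python
import math


def total_bits(n, k=2):
    """Bits stored in n levels: level i has k**i registers of k+1 bits."""
    return sum((k + 1) * k**i for i in range(n))


def energy_per_shift(n, k=2):
    """Relative energy per shift of the whole n-level structure.

    Assumption (idealized model of fig. 3d): each bit at level i
    dissipates 1/k**i of a top-level bit, so every level costs k+1.
    """
    return sum((k + 1) * k**i / k**i for i in range(n))


if __name__ == "__main__":
    print(f"{'levels':>6} {'bits N':>8} {'ladder':>8} {'O(N)':>8} {'O(sqrt N)':>10}")
    for n in (2, 4, 8, 16):
        bits = total_bits(n)
        print(f"{n:>6} {bits:>8} {energy_per_shift(n):>8.1f} "
              f"{bits:>8} {math.sqrt(bits):>10.1f}")
```

In this model the ladder column grows as 3n = 3 log2(N/3 + 1), which is the logarithmic behavior described above.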

## III. TEST CIRCUIT

The top two levels of the circuit in fig. 3b have been coded in ngspice based on S2LAL, with the code appearing in the appendix. The output traces are illustrated in fig. 4.

The S2LAL circuits are powered by a 2× clock in green and a 1× clock in purple, with the lower traces labeled swap causing an interchange of the bit values. In this circuit, bits in the blue and yellow traces are swapped so they are serialized in the red trace. The circuit is the top two levels of fig. 3b, which have a total of eight stored bits, so the data pattern in the red trace would be expected to repeat with a period of eight (which it does).

Fig. 4. Simulation of clock rate change circuit. Fast clock, slow clock, and swap lines are labeled. The red top data trace at rate  $2\times$  is synthesized from blue and yellow data traces at rate  $1\times$ . The diagram shows two parallel bits of data at rate  $1\times$  swapping into two serial bits of data at rate  $2\times$ , demonstrating the key step in this note.

## IV. CONCLUSIONS

Is a logarithmic dissipation shift register better or worse than a CMOS one? Claims of superiority of reversible logic normally come from circuits such as a multiplier array. A reversible multiplier array becomes more energy efficient as the clock slows down, but this causes the throughput to go down as well. While a CMOS multiplier is a specific circuit that will have a specific speed and energy dissipation, the reversible counterpart is a tradeoff space, making comparisons incomplete. Even after a comparison is performed, a human intent on proving one is better than the other can pick a point in the tradeoff space that makes their claim correct.

The approach in this note may yield more satisfying results. According to fig. 2, the logarithmic dissipation shift register can be as fast as CMOS (both circuits operate at 1 GHz), so it makes sense to make the comparison at CMOS's natural external interface speed. This note introduces the concept of a ladder of adiabatic clock frequencies that enables the adiabatic shift register to have internal components running at the low frequencies where adiabatic circuits yield their benefit. Thus, the approach in this note allows the adiabatic circuit to be as fast as CMOS and as energy efficient as the advocates of adiabatic circuitry claim it could be. However, this note includes the "clock rate converter," so a correct comparison must include the cost of that converter, hence yielding correct conclusions for all points in the trade space.

The author makes no claim to have made such a rigorous comparison. As mentioned, according to a standard adiabatic energy model, an N-bit adiabatic shift register has  $O(\log N)$  dissipation compared to  $O(\sqrt{N})$  for a CMOS memory. This favors the adiabatic circuit. However, both the adiabatic circuit and CMOS would have overheads. More information and analysis would be required to tell which overhead would be greater.

This note is also based on a subset of the ideas in ref. 5. That document replaces the last, fastest, and least energy efficient multiplexer with a circuit based on Josephson junctions. Josephson junctions are not good for everything, but they are quite energy efficient for simple, fast logic.

## REFERENCES

- [1] Saed G. Younis. Asymptotically Zero Energy Computing Using Split-Level Charge Recovery Logic. No. AI-TR-1500. Massachusetts Institute of Technology Artificial Intelligence Laboratory, 1994.

- [2] V. Anantharam, M. He, K. Natarajan, H. Xie, and M. P. Frank. "Driving fully-adiabatic logic circuits using custom high-Q MEMS resonators," in Proc. Int. Conf. Embedded Systems and Applications and Proc. Int. Conf VLSI (ESA/VLSI). Las Vegas, NV, pp. 5-11.
- [3] Frank, Michael P., et al. "Reversible Computing with Fast, Fully Static, Fully Adiabatic CMOS." *arXiv preprint arXiv:2009.00448* (2020).
- [4] Zulehner, Alwin, Michael P. Frank, and Robert Wille. "Design automation for adiabatic circuits." *Proceedings of the 24th Asia and South Pacific Design Automation Conference*. 2019.
- [5] DeBenedictis, Erik P. "Quantum Computer Control using Novel, Hybrid Semiconductor-Superconductor Electronics." arXiv preprint arXiv:1912.11532 (2019).

## A. Appendix: ngspice files

The ngspice simulation code is intended to match the top two levels of fig. 3b and is shown in fig. 5 using the same color scheme.



Fig. 5. Diagram of simulated circuit. It is very nearly the top two levels of fig. 3b. Shown here with corresponding colors.

### s2lal.cir (ratex.cir)

```
Ratex

* S2LAL initial test setup. Demonstrates a 2x rate change.

* S2LAL circuit from:

* Frank, Michael P., et al. "Reversible Computing with Fast, Fully Static, Fully Adiabatic CMOS." arXiv preprint arXiv:2009.00448 (2020).

* Contains Athas's adiabatic amplifier from:

* Athas, W. C., et al. "Low-power digital systems based on adiabatic-switching principles." IEEE Transactions on VLSI Systems 2.4 (1994): 398-407

* Tested with ngspice-30 (creation date Dec 28, 2018, from ngspice-30.64.zip 8,687,648 bytes)

* (NOT TESTED RECENTLY) Also works with WRSPICE, except that the .control block is different for the two and has to be switched back and forth

* For tutorial docs: no tabs; comments start column 61; 169 character maximum line length
.param WRSPICE_PROGRAM=0                                    $ From WRspice manual: This enables users to include WRspice-specific input in SPICE files...
.if (WRSPICE_PROGRAM=1)                                     $ WRspice builtin
.MODEL p1 pmos (LEVEL=49 version=3.3.0)
.MODEL n1 nmos (LEVEL=49 version=3.3.0)
.endif
.if (WRSPICE_PROGRAM=0)                                     $ ngspice builtin
.MODEL p1 pmos (LEVEL=49 version=3.3.0)
.MODEL n1 nmos (LEVEL=49 version=3.3.0)
.endif
.param CLAMP=1                                              $ clamp transistor of Athas's adiabatic amplifier, set to 0 to disable
.param FULLPASS=0                                           $ other transistor to make the clamp a full pass gate
.param ACAP=2e-12                                           $ capacitive load on the data line
.param QQCAP=0e-12                                          $ capacitive load on the internal QQ node
.param MUXCAP=1e-12                                         $ capacitive load on the MUX output
*** SUBCIRCUIT DEFINITIONS
M4 GND AC T nsub n1                                         $ clamp
M5 PWR AT C psub p1
.endif
.if (FULLPASS=1)
M6 GND AT T psub p1
M7 PWR AC C nsub n1
.endif
.ENDS AAMP
* Figure 5 in arXiv:2009.00448
.SUBCKT LATCH AT AC QT QC piT piC pjT pjC GND PWR           $ One phase of the 2LAL shift register. Args: AT/C QT/C clock0T/C clock1T/C
+ nsub psub tap0 tap1 tap2 tap3 ini='gg'                    $ substrate supplies
R1 tap0 QT 1                                                $ circuit taps for debugging
X1 AT AC T C piT piC GND PWR nsub psub AAMP ini='ini'
M1 T pjT QT nsub n1                                         $ Frank's latch
M2 T pjC QT psub p1
M3 C pjT QC nsub n1
M4 C pjC QC psub p1
C1 AT 0 ACAP
C2 AC 0 ACAP
C3 T 0 QQCAP
C4 C 0 QQCAP
.ENDS LATCH
* Figure 6 in arXiv:2009.00448, except this is just the first stage; shift clocks for subsequent stages
.SUBCKT PHASE S0T S0C S1T S1C                               $ One stage of the 2LAL shift register. Args: AT/C QT/C
+ p0T p0C p1T p1C p2T p2C p3T p3C GND PWR nsub psub
+ tap0 tap1 tap2 tap3 tap4 tap5 tap6 tap7 ini='gg'
X0 S0T S0C S1T S1C p1T p1C p0T p0C GND PWR nsub psub tap0 tap1 tap2 tap3 LATCH ini=ini
X10 S1T S1C S0T S0C p2T p2C p3T p3C GND PWR nsub psub tap4 tap5 tap6 tap7 LATCH ini=ini
.ENDS PHASE
* Figure 6 in arXiv:2009.00448, except this is all 8 stages
.SUBCKT SDELAY S0T S0C S8T S8C                              $ Four phases that just delay. Args: 2*{ data<n>T/C }
+ p0T p0C p1T p1C p2T p2C p3T p3C                           $ clocks/power supplies
+ p4T p4C p5T p5C p6T p6C p7T p7C
+ GND PWR nsub psub
+ tap0 tap1 tap2 tap3 tap4 tap5 tap6 tap7 tap8 tap9 tapA tapB tapC tapD tapE tapF ini='gg'
R0 tap0 S0T 1                                               $ circuit taps for debugging
R1 tap1 S0C 1
R2 tap2 S1T 1
R3 tap3 S1C 1
R4 tap4 S2T 1
R5 tap5 S2C 1
R6 tap6 S3T 1
R7 tap7 S3C 1
R8 tap8 S4T 1
R9 tap9 S4C 1
RA tapA S5T 1
RB tapB S5C 1
RC tapC S6T 1
RD tapD S6C 1
RE tapE S7T 1
X0 S0T S0C S1T S1C p0T p0C p1T p1C p2T p2C p3T p3C GND PWR nsub psub t100 t101 t102 t103 t200 t201 t202 t203 PHASE ini=gg
X1 S1T S1C S2T S2C p1T p1C p2T p2C p3T p3C p4T p4C GND PWR nsub psub t110 t111 t112 t113 t210 t211 t212 t213 PHASE ini=ini
X2 S2T S2C S3T S3C p2T p2C p3T p3C p4T p4C p5T p5C GND PWR nsub psub t120 t121 t122 t123 t220 t221 t222 t223 PHASE ini=ini
X3 S3T S3C S4T S4C p3T p3C p4T p4C p5T p5C p6T p6C GND PWR nsub psub t130 t131 t132 t133 t230 t231 t232 t233 PHASE ini=ini
X4 S4T S4C S5T S5C p4T p4C p5T p5C p6T p6C p7T p7C GND PWR nsub psub t140 t141 t142 t143 t240 t241 t242 t243 PHASE ini=ini
X5 S5T S5C S6T S6C p5T p5C p6T p6C p7T p7C p0T p0C GND PWR nsub psub t150 t151 t152 t153 t250 t251 t252 t253 PHASE ini=ini
X6 S6T S6C S7T S7C p6T p6C p7T p7C p0T p0C p1T p1C GND PWR nsub psub t160 t161 t162 t163 t260 t261 t262 t263 PHASE ini=ini
X7 S7T S7C S8T S8C p7T p7C p0T p0C p1T p1C p2T p2C GND PWR nsub psub t170 t171 t172 t173 t270 t271 t272 t273 PHASE ini=gg
.ENDS SDELAY
* 2-input bi-directional MUX built with 2-rail address and pass gates
.SUBCKT STR in0 in1 adrT adrC out0 out1 nsub psub           $ inputs in0 in1 adrT/C out; connect in[adr] to out
.if (0)
R1 in0 out0 1
R2 in1 out1 1
.else
M1 in0 adrT out0 psub p1                                    $ adr = 0 --> in0 connects to out0
M2 in0 adrC out0 nsub n1                                    $ adr = 0 --> in0 connects to out0
M3 in1 adrC out0 psub p1                                    $ adr = 1 --> in1 connects to out0
M4 in1 adrT out0 nsub n1                                    $ adr = 1 --> in1 connects to out0
M5 in1 adrT out1 psub p1                                    $ adr = 0 --> in1 connects to out1
M6 in1 adrC out1 nsub n1                                    $ adr = 0 --> in1 connects to out1
M7 in0 adrC out1 psub p1                                    $ adr = 1 --> in0 connects to out1
M8 in0 adrT out1 nsub n1                                    $ adr = 1 --> in0 connects to out1
.endif
C1 out0 0 MUXCAP
C2 out1 0 MUXCAP
.ENDS STR
* Two stages with clock rate swap. Actually, it's the data that swaps
.SUBCKT RATEX ATi ACi BTi BCi p0Ti p0Ci p1Ti p1Ci p2Ti p2Ci p3Ti p3Ci p4Ti p4Ci p5Ti p5Ci p6Ti p6Ci p7Ti p7Ci
+ CTi CCi DTi DCi q0Ti q0Ci q1Ti q1Ci q2Ti q2Ci q3Ti q3Ci q4Ti q4Ci q5Ti q5Ci q6Ti q6Ci q7Ti q7Ci
+ G1 G2 GND PWR nsub psub iniA=0 iniB=0                     $ DC supply, substrate supplies
X1 ATo ACo BTo BCo p0To p0Co p1To p1Co p2To p2Co p3To p3Co p4To p4Co p5To p5Co p6To p6Co p7To p7Co GND PWR nsub psub t300 t301 t302 t303 t304 t305 t306 t307 t308 t309 t30A
+ t30B t30C t30D t30E t30F SDELAY ini=iniA
X28 p7Ti q7Ti G1 G2 p7To q7To nsub psub STR
X29 p7Ci q7Ci G1 G2 p7Co q7Co nsub psub STR
.ENDS RATEX
*** POWER-CLOCKS
.param gg=0V
.param vv=9.99V
.param ticks=199                                            $ number of ticks in the simulation
.param tick=1000NS                                          $ time of a tick
.param tstep=24NS                                           $ time of a simulation step, so number of steps is tick*ticks/tstep
.param ttn=18000ns                                          $ integration time for energy
*** CLOCKS -- Original 4 clock phases and inverses (total four unique signals), but with Sw and fast phase 1's (total six unique signals)
.param Ramp=0.80*tick
.param PPT=0.10*tick                                        $ one PPT at beginning and end of sequence, two of these PPTs between ramps
* Extra delay to split phi0 into a fast and slow clock; if Fast=0, the clocks become the same
* See Saed G. Younis. Asymptotically Zero Energy Computing Using Split-Level Charge Recovery Logic. No. AI-TR-1500. MIT AI Laboratory, 1994.
.param Fast=PPT+Ramp+PPT
* The clocks comprise a series of transitions (separated by PPTs). Starting at the beginning of the three-phase cycle, the clocks are computed by repeatedly
* incrementing the time by the length of a transition and a PPT.
.param f0uS=PPT
.param f0uF=f0uS+Fast
.param f1up=f0uF+Ramp+2*PPT
.param f2up=f1up+Ramp+2*PPT
.param f3up=f2up+Ramp+2*PPT
.param f0dn=f3up+Ramp+2*PPT
.param f1dn=f0dn+Ramp+2*PPT
.param f2dF=f1dn+Ramp+2*PPT
.param f2dS=f2dF+Fast
.param f3dn=f2dS+Ramp+2*PPT
.param epoc=f3dn+Ramp+PPT
Vphi0P 110 0 PWL('0' 'gg' 'f0uS' 'gg' 'f0uS+Ramp' 'vv' 'f0dn' 'vv' 'f0dn+Ramp' 'gg' 'epoc' 'gg' r='0')
Vphi0F 510 0 PWL('0' 'gg' 'f0uF' 'gg' 'f0uF+Ramp' 'vv' 'f0dn' 'vv' 'f0dn+Ramp' 'gg' 'epoc' 'gg' r='0')
Vphi1P 111 0 PWL('0' 'gg' 'f1up' 'gg' 'f1up+Ramp' 'vv' 'f1dn' 'vv' 'f1dn+Ramp' 'gg' 'epoc' 'gg' r='0')
Vphi2P 112 0 PWL('0' 'gg' 'f2up' 'gg' 'f2up+Ramp' 'vv' 'f2dS' 'vv' 'f2dS+Ramp' 'gg' 'epoc' 'gg' r='0')
Vphi2F 512 0 PWL('0' 'gg' 'f2up' 'gg' 'f2up+Ramp' 'vv' 'f2dF' 'vv' 'f2dF+Ramp' 'gg' 'epoc' 'gg' r='0')
Vphi3P 113 0 PWL('0' 'gg' 'f3up' 'gg' 'f3up+Ramp' 'vv' 'f3dn' 'vv' 'f3dn+Ramp' 'gg' 'epoc' 'gg' r='0')
Vphi4F 514 0 PWL('0' 'vv' 'f0uF' 'vv' 'f0uF+Ramp' 'gg' 'f0dn' 'gg' 'f0dn+Ramp' 'vv' 'epoc' 'vv' r='0')
Vphi4P 114 0 PWL('0' 'vv' 'f0uS' 'vv' 'f0uS+Ramp' 'gg' 'f0dn' 'gg' 'f0dn+Ramp' 'vv' 'epoc' 'vv' r='0')
Vphi5P 115 0 PWL('0' 'vv' 'f1up' 'vv' 'f1up+Ramp' 'gg' 'f1dn' 'gg' 'f1dn+Ramp' 'vv' 'epoc' 'vv' r='0')
Vphi6P 116 0 PWL('0' 'vv' 'f2up' 'vv' 'f2up+Ramp' 'gg' 'f2dS' 'gg' 'f2dS+Ramp' 'vv' 'epoc' 'vv' r='0')
Vphi7P 117 0 PWL('0' 'vv' 'f3up' 'vv' 'f3up+Ramp' 'gg' 'f3dn' 'gg' 'f3dn+Ramp' 'vv' 'epoc' 'vv' r='0')
ViiP   118 0 PWL('0' 'gg' 'f0uS' 'gg' 'f0uS+Ramp' 'vv' 'f2dS' 'vv' 'f2dS+Ramp' 'gg' 'epoc' 'gg' r='0')
ViiN   119 0 PWL('0' 'vv' 'f0uS' 'vv' 'f0uS+Ramp' 'gg' 'f2dS' 'gg' 'f2dS+Ramp' 'vv' 'epoc' 'vv' r='0')
.param g0uS=2*PPT
.param g0uF=g0uS+2*Fast
.param g1up=g0uF+2*Ramp+4*PPT
.param g2up=g1up+2*Ramp+4*PPT
.param g3up=g2up+2*Ramp+4*PPT
.param g0dn=g3up+2*Ramp+4*PPT
.param g1dn=g0dn+2*Ramp+4*PPT
.param g2dF=g1dn+2*Ramp+4*PPT
.param g2dS=g2dF+2*Fast
.param g3dn=g2dS+2*Ramp+4*PPT
.param gpoc=g3dn+2*Ramp+2*PPT
 Vphj0P 810 0 PWL('0' 'gg'
                                                                                                                                                                                                                                                                                     dboc,
dboc,
dboc,
dboc,
dboc,
dboc,
dboc,
                                                                                                                                                             'g0uS+Ramp'
Vphj0P 810 0 PWL('0' 'gg'
Vphj0P 910 0 PWL('0' 'gg'
Vphj1P 811 0 PWL('0' 'gg'
Vphj2P 812 0 PWL('0' 'gg'
Vphj2P 912 0 PWL('0' 'gg'
Vphj3P 813 0 PWL('0' 'gg'
Vphj4P 914 0 PWL('0' 'vv'
Vphj4P 814 0 PWL('0' 'vv'
                                                                                                                                                            'gOuS+Ramp'
'gOuF+Ramp'
'g1up+Ramp'
'g2up+Ramp'
'g2up+Ramp'
'g3up+Ramp'
                                                                                                                                                                                                           'g0dn'
'g0dn'
'g1dn'
'g2ds'
'g2dF'
'g3dn'
'g0dn'
                                                                                                                                                                                                                                      'g0dn+Ramp'
'g0dn+Ramp'
'g1dn+Ramp'
'g2dS+Ramp'
'g2dF+Ramp'
'g3dn+Ramp'
                                                                                                                                                                                                                                                                                                  'gg' r='0'
'gg' r='0'
'gg' r='0'
'gg' r='0'
'gg' r='0'
'yg' r='0'
'vv' r='0'
'vv' r='0'
                                                                                                                                   'g0us'
'g0uF'
'g1up'
'g2up'
'g2up'
'g3up'
'g0uF'
                                                                                                                                                                                                                                                                .da
.da
.da
                                                                                                                                                 'gg'
'gg'
                                                                                                                                                                                                                                                                'gg'
                                                                                                                                                                                       'gg'
'gg'
'gg'
'gg'
                                                                                                                                                             'g0uF+Ramp'
                                                                                                                                                                                                                           'gg'
                                                                                                                                                                                                                                       g0dn+Ramp'
                                                                                                                                                                                                          'g0dn' 'gg'
'g0dn' 'gg'
'g1dn' 'gg'
'g2dF' 'gg'
'g2dS' 'gg'
'g3dn' 'gg'
                                                                                                                                                 'vv'
                                                                                                                                                             'q0uS+Ramp'
                                                                                                                                                                                                                                      'q0dn+Ramp'
 Vphj5F 815 0 PWL('0' 'vv'
Vphj5F 815 0 PWL('0' 'vv'
Vphj6F 816 0 PWL('0' 'vv'
Vphj7P 817 0 PWL('0' 'vv'
                                                                                                                                  'glup' 'vv' 'glup+Ramp'
'g2up' 'vv' 'g2up+Ramp'
'g2up' 'vv' 'g2up+Ramp'
'g3up' 'vv' 'g3up+Ramp'
                                                                                                                                                                                                                                     'gldn+Ramp' 'vv'
'g2dF+Ramp' 'vv'
'g2dS+Ramp' 'vv'
'g3dn+Ramp' 'vv'
                                                                                                                                                                                                                                                                                    dboc, An,
                                                                                                                                                                                                                                                                                                             r='0'
PPC 0 PWL('0' 'vv' 'gpoc-PPT' 'vv' 'gpoc' 'gg' '2*gpoc-PPT' 'gg' '2*gpoc' 'vv' r='0')
PPT 0 PWL('0' 'gg' 'gpoc-PPT' 'gg' 'gpoc' 'vv' '2*gpoc-PPT' 'vv' '2*gpoc' 'gg' r='0')
Va1
'gg'
200 0 DC 'gg'
201 0 DC 'vv'
VGND
*** TOP-LEVEL CIRCUIT
X2 BAT BAC FAT FAC 110 114 111 115 112 116 113 117 114 110 115 111 116 112 117 113
+ SXT SXC SYT SYC 810 814 811 815 812 816 813 817 814 810 815 811 816 812 817 814 910 815 811 816 812 817 813 PPC PPT 200 201 200 201 RATEX iniA=vv iniB=vv

X3 SYT SYC SZT SZC 810 814 811 815 812 816 813 817 814 810 815 811 816 812 817 813 200 201 200 201 V320 V321 V322 V323 V324 V325 V326 V327 V328 V329 V32A V32B V32C V32D V32E V32F SDELAY ini=gg

X4 SZT SZC SXT SXC 810 814 811 815 812 816 813 817 814 810 815 811 816 812 817 813 200 201 200 201 w320 w321 w322 w323 w324 w325 w326 w327 w328 w329 w32A w32B w32C w32D w32E w32F SDELAY ini=gg
* power and energy calculation
.if (WRSPICE_PROGRAM=0) $ ngspice builtin
 ### 16 16 V = 0

### 10 16 V = 0

### I(vphi0P)*v(110)+I(vphi1P)*v(111)+I(vphi2P)*v(112)+I(vphi3P)*v(113)+I(vphi4P)*v(114)+I(vphi5P)*v(115)+I(vphi6P)*v(116)+I(vphi7P)*v(117)+I(Viv)*v(200)+I(VPMR)*v(200)+V(VPMR)*v(201)
A1 16 17 power_tally
.model power_tally int(in_offset=0.0 gain=1.0 out_lower_limit=-1e12 out_upper_limit=1e12 limit_range=1e-9 out_ic=0.0)
.endif
.option noinit acct
$ NGSPICE CONTROL AREA
.TRAN 'tstep' 'ticks*tick'
.control
pre_set strict_errorhandling
unset ngdebug
set color0=white
set xbrushwidth=3
set xgridwidth=1
plot v(16) $ plot instantaneous energy consumption
+ ylimit -25m 25m
plot v(17) $ plot accumulated energy dissipation
+ ylimit 0 350n
$ WRSPICE CONTROL AREA
$ .control
$ tran 'tstep' 'ticks*tick'
......
$ END CONTROL AREA
plot title "S2LAL clock and gated clock" ylimit 0 12 xlimit 0 200u
$ gnuplot ylimit 0 12 xlimit 0 300u
+ v(110)/9.99*0.9+10.55
+ v(PAT)/9.99*0.9+ 8.55
+ v(SAT)/9.99*0.9+ 6.55
+ v(SXT)/9.99*0.9+ 4.55
+ v(810)/9.99*0.9+ 2.55
+ v(PPT)/9.99*0.9+ 0.525
+ v(PPC)/9.99*0.9+ 0.55

$ set fn=file$&loop*x.png
$ gnuplot gp/$fn v(a)-1 v(24)/2 v(25)/2 v(26)/2+.5 v(27)/2+.5 v(b2)+1.5 v(22)/2+2.5 v(23)/2+2.5 v(18)/2+3 v(19)/2+3 v(b1)+4 v(20)/2+5 v(21)/2+5 v(16)/2+5.5 v(17)/2+5.5 v(b)+6.5 v(24)/2+7.5 v(25)/2+7.5 v(14)/2+8 v(15)/2+8 v(a)+9 v(22)/2+10 v(23)/2+10 v(12)/2+10.5 v(13)/2+10.5 v(13)/2+10.5 v(20)/2+12.5 v(21)/2+12.5 v(10)/2+12.5 v(10)/2+13 v(11)/2+13 v(a)+14
$ * * * v(11) -2 v(12) -3 v(11) -4 v(12) -5 10000000*v(42) -6 10000000*v(40) -7 10000000*v(41) -8
$ * title "Curves: $&stick s tick, $&stick ticks, $&sttn s total, $&sloadc F 1d, wid x $&swidx, $&svv V/2" ylimit -9 15
$ *gnuplot gp/$fn v(a)-1 v(24)/2 v(25)/2 v(26)/2+.5 v(27)/2+.5 v(26)/2+1.5 v(22)/2+2.5 v(23)/2+2.5 v(18)/2+3 v(19)/2+3 v(19)/2+5 v(21)/2+5 v(16)/2+5.5 v(17)/2+5.5 v(b)+6.5 v(24)/2+7.5 v(25)/2+7.5 v(14)/2+8 v(15)/2+8 v(22)/2+10 v(12)/2+10.5 v(13)/2+10.5 v(13)/2+10.5 v(21)/2+12.5 v(10)/2+13 v(11)/2+13 v(a)+14 v(11)-2 v(12)-3 v(11)-4 v(12)-5 10000000*v(42)-6 10000000*v(40)-7 10000000*v(41)-8 title "step=$&stick s time=$&stick s time=$&stick s plit v=$&svv V" ylimit -9 15
```
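The power and energy calculation in the listing sums current-voltage products over the supply and clock sources to get an instantaneous power on node 16, then feeds that node through a unity-gain integrator to get the accumulated energy on node 17. The same bookkeeping can be sketched offline on exported waveform samples. This is a minimal illustration, not the simulator's implementation: the function names `instantaneous_power` and `accumulate_energy` are hypothetical, the trapezoidal rule is an assumption about how to integrate sampled data, and the sign of each product follows whatever convention the exported currents use.

```python
from typing import List, Sequence


def instantaneous_power(currents: Sequence[float], voltages: Sequence[float]) -> float:
    """Sum of I*V products at one time point (analogous to node 16 in the listing)."""
    if len(currents) != len(voltages):
        raise ValueError("currents and voltages must have the same length")
    return sum(i * v for i, v in zip(currents, voltages))


def accumulate_energy(times: Sequence[float], power: Sequence[float],
                      initial: float = 0.0) -> List[float]:
    """Running integral of power over time (analogous to node 17 in the listing).

    Uses the trapezoidal rule on the sample points. `initial` plays the role
    of the integrator's initial condition (0.0 in the listing).
    """
    if len(times) != len(power):
        raise ValueError("times and power must have the same length")
    if not times:
        return []
    energy = [initial]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        # Area of the trapezoid between two adjacent samples.
        energy.append(energy[-1] + 0.5 * (power[k] + power[k - 1]) * dt)
    return energy


if __name__ == "__main__":
    # Hypothetical samples: two sources at three time points.
    t = [0.0, 0.5, 1.0]
    p = [instantaneous_power([1.0, 2.0], [3.0, 4.0]) for _ in t]
    print(accumulate_energy(t, p))
```

The last element of the returned list corresponds to the total energy over the simulated interval, which is the quantity read off the accumulated-energy plot.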