






#### **Submittal Details**

Document Info

- Title: Reversible Logic for Supercomputing
- Document Number: 5231909
- SAND Number: 2005-2689 C
- Review Type: Electronic
- Status: Approved
- Sandia Contact: DEBENEDICTIS, ERIK P.
- Submittal Type: Conference Paper
- Requestor: DEBENEDICTIS, ERIK P.
- Submit Date: 04/22/2005
- Peer Reviewed?: N

Author(s)

- DEBENEDICTIS, ERIK P.

Event (Conference/Journal/Book) Info

- Name: Computing Frontiers 2005
- City: Ischia
- State:
- Country: Italy
- Start Date: 05/04/2005
- End Date: 05/06/2005

Partnership Info

- Partnership Involved: No
- Partner Approval:
- Agreement Number:

Patent Info

- Scientific or Technical in Content: Yes
- Technical Advance: No
- TA Form Filed: No
- SD Number:

Classification and Sensitivity Info

- Additional Limited Release Info: None.
- **DUSA**: None.

#### **Routing Details**

| Role                           | Routed To           | Approved By         | Approval Date |
|--------------------------------|---------------------|---------------------|---------------|
| Derivative Classifier Approver | SUMMERS, RANDALL M. | SUMMERS, RANDALL M. | 04/22/2005    |
| Classification Approver        | WILLIAMS, RONALD L. | WILLIAMS, RONALD L. | 04/25/2005    |
| Manager Approver               | PUNDIT, NEIL D.     | PUNDIT, NEIL D.     | 04/29/2005    |
| Administrator Approver         | LUCERO, ARLENE M.   | KRAMER, SAMUEL      | 06/05/2007    |




# How to Save the Earth with Reversible Computing

Erik P. DeBenedictis, Sandia National Laboratories

May 5, 2005





## **Applications and \$100M Supercomputers**



[Jardin 03] S. C. Jardin, "Plasma Science Contribution to the SCaLeS Report," Princeton Plasma Physics Laboratory, PPPL-3879 UC-70, available on the Internet.

[Malone 03] Robert C. Malone, John B. Drake, Philip W. Jones, and Douglas A. Rotman, "High-End Computing in Climate Modeling," contribution to the SCaLeS report.

[NASA 99] R. T. Biedron, P. Mehrotra, M. L. Nelson, F. S. Preston, J. J. Rehder, J. L. Rogers, D. H. Rudy, J. Sobieski, and O. O. Storaasli, "Compute as Fast as the Engineers Can Think!" NASA/TM-1999-209715, available on the Internet.

[SCaLeS 03] Workshop on the Science Case for Large-scale Simulation, June 24-25, 2003, proceedings on the Internet at http://www.pnl.gov/scales/.

[DeBenedictis 04] Erik P. DeBenedictis, "Matching Supercomputing to Progress in Science," presentation at Lawrence Berkeley National Laboratory, July 2004; also published as Sandia National Laboratories SAND report SAND2004-3333P. Sandia technical reports are available by going to http://www.sandia.gov and accessing the technical library.

#### **Objectives and Challenges**

- Could reversible computing have a role in solving important problems?
  - Maybe, because power is a limiting factor for computers and reversible logic cuts power
- However, a complete computer system is more than "low power"
  - Processing, memory, and communication must be in the right balance for the application
  - Speed must match the user's impatience
  - Must use a real device, not just an abstract reversible device



#### **Outline**

- An Exemplary Zettaflops Problem
- The Limits of Current Technology
- Arbitrary Architectures for the Current Problem
  - Searching the Architecture Space
  - Bending the Rules to Find Something
  - Exemplary Solution
- Conclusions



#### Simulation of Global Climate



"Simulations of the response to natural forcings alone ... do not explain the warming in the second half of the century" (Stott et al., *Science*, 2000)

"...model estimates that take into account both greenhouse gases and sulphate aerosols are consistent with observations over this period" (IPCC, 2001)







#### **FLOPS Increases for Global Climate**

| Required Performance | Issue                                         | Increase                            | Scaling                 |
|----------------------|-----------------------------------------------|-------------------------------------|-------------------------|
| 1 Zettaflops         | Ensembles, scenarios                          | 10×                                 | Embarrassingly parallel |
| 100 Exaflops         | Run length                                    | 100×                                | Longer running time     |
| 1 Exaflops           | New parameterizations                         | 100×                                | More complex physics    |
| 10 Petaflops         | Model completeness                            | 100×                                | More complex physics    |
| 100 Teraflops        | Spatial resolution                            | $10^4\times$ ($10^3\times$ to $10^5\times$) | Resolution      |
| 10 Gigaflops         | Clusters now in use (100 nodes, 5% efficient) |                                     |                         |
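Read from the bottom up, the table above is a chain of multiplicative factors. The sketch below simply replays that chain from the 10 Gigaflops baseline, using the central $10^4\times$ value for spatial resolution; the names `STEPS`, `BASELINE_FLOPS`, and `flops_ladder` are illustrative, not from the original.

```python
# Multiplicative steps from today's climate clusters to a zettaflops requirement,
# using the factors in the table (spatial resolution uses its central 10^4x value).
STEPS = [
    ("Spatial resolution", 10**4),
    ("Model completeness", 100),
    ("New parameterizations", 100),
    ("Run length", 100),
    ("Ensembles, scenarios", 10),
]

BASELINE_FLOPS = 10 * 10**9  # 10 Gigaflops: clusters now in use


def flops_ladder(baseline, steps):
    """Return (issue, cumulative flops) after applying each factor in order."""
    ladder = []
    flops = baseline
    for issue, factor in steps:
        flops *= factor
        ladder.append((issue, flops))
    return ladder


if __name__ == "__main__":
    for issue, flops in flops_ladder(BASELINE_FLOPS, STEPS):
        print(f"{issue:<24} {flops:.0e} flops")
```

Integer arithmetic keeps every rung exact, so the last rung lands on exactly $10^{21}$ flops (1 Zettaflops).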






# **Scientific Supercomputer Limits**

| Best-case logic                | Microprocessor architecture | Physical factor                                             | Source of authority                                  |
|--------------------------------|-----------------------------|-------------------------------------------------------------|------------------------------------------------------|
| 2×10<sup>24</sup> logic ops/s  |                             | Reliability limit: 750 kW/(80 k<sub>B</sub>T)               | Esteemed physicists (T = 60 °C junction temperature) |
|                                |                             | Derate 20,000× to convert logic ops to floating point       | Floating point engineering (64-bit precision)        |
| 100 Exaflops                   | 800 Petaflops               |                                                             |                                                      |
|                                |                             | Derate for manufacturing margin (4×)                        | Estimate                                             |
| 25 Exaflops                    | 200 Petaflops               |                                                             |                                                      |
|                                |                             | Uncertainty (6×)                                            | Gap in chart                                         |
| 4 Exaflops                     | 32 Petaflops                |                                                             |                                                      |
|                                |                             | Improved devices (4×)                                       | Estimate                                             |
| 1 Exaflops                     | 8 Petaflops                 |                                                             |                                                      |
|                                |                             | Projected ITRS improvement to 22 nm (100×)                  | ITRS committee of experts                            |
|                                | 80 Teraflops                |                                                             |                                                      |
|                                |                             | Lower supply voltage (2×)                                   | ITRS committee of experts                            |
|                                | 40 Teraflops                | Red Storm                                                   | Contract                                             |

Best-case logic runs 125:1 ahead of the microprocessor architecture at every level.

Baseline: a supercomputer of the size and cost of Red Storm, with a US\$100M budget and 750 kW of power delivered to active components.
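As a cross-check, the arithmetic of the chart can be replayed from both ends. The sketch below assumes only the constants shown on the chart plus the exact SI Boltzmann constant; the function and variable names are illustrative. Working down from the reliability limit and up from Red Storm should land on roughly the same 8 Petaflops microprocessor level.

```python
# Replays the arithmetic of the limits chart. Constants come from the chart;
# K_B is the exact SI value of the Boltzmann constant.
K_B = 1.380649e-23          # J/K
T_JUNCTION = 60 + 273.15    # 60 °C junction temperature, in kelvin
POWER_W = 750e3             # 750 kW delivered to active components
KT_PER_OP = 80              # reliability margin: 80 k_B T dissipated per logic op


def reliability_limit_ops(power=POWER_W, temp=T_JUNCTION, margin=KT_PER_OP):
    """Maximum logic ops/s if every op dissipates margin * k_B * T joules."""
    return power / (margin * K_B * temp)


def derate_chain(logic_ops):
    """Walk down from the physical limit using the chart's derating factors."""
    best_case = logic_ops / 20_000      # logic ops -> 64-bit floating point ops
    levels = [best_case]
    for factor in (4, 6, 4):            # manufacturing margin, uncertainty, improved devices
        levels.append(levels[-1] / factor)
    return levels                       # best-case logic flops, top to bottom


limit = reliability_limit_ops()
best_case_levels = derate_chain(limit)
micro_levels = [x / 125 for x in best_case_levels]  # 125:1 logic-to-microprocessor ratio
from_red_storm = 40e12 * 2 * 100                      # lower voltage (2x), then ITRS 22 nm (100x)

print(f"reliability limit: {limit:.2e} logic ops/s")
print("best-case logic:", [f"{x:.1e}" for x in best_case_levels])
print("microprocessor: ", [f"{x:.1e}" for x in micro_levels])
print(f"bottom-up from Red Storm: {from_red_storm:.1e} flops")
```

The top-down chain gives about 8.5 Petaflops at the bottom, against 8 Petaflops from Red Storm upward; the chart's round numbers absorb that difference.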




#### Supercomputer Expert System



## Sample Analytical Runtime Model

- Simple case: finite difference equation
- Each node holds n×n×n grid points

- Volume-area rule
  - Computation $\propto n^3$ (volume of the node's subgrid)
  - Communication $\propto n^2$ (surface area of the node's subgrid)
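The volume-area rule can be made concrete with a toy per-node cost model. This is a minimal sketch, assuming a 3D nearest-neighbor stencil that exchanges one layer of points on each of 6 faces; the time constants `t_point` and `t_face_point` are hypothetical placeholders, not values from the slide. It shows the communication-to-computation ratio falling as $1/n$.

```python
def step_costs(n, t_point, t_face_point, faces=6):
    """Per-node time for one finite-difference step on an n x n x n subgrid.

    t_point: time to update one grid point (hypothetical constant)
    t_face_point: time to exchange one boundary grid point (hypothetical constant)
    faces: neighbors exchanged with; 6 for a 3D nearest-neighbor stencil
    """
    compute = t_point * n**3                      # volume term
    communicate = t_face_point * faces * n**2     # surface term
    return compute, communicate


for n in (10, 20, 40, 80):
    compute, communicate = step_costs(n, t_point=1.0, t_face_point=1.0)
    print(f"n={n:3d}  compute={compute:9.0f}  communicate={communicate:7.0f}  "
          f"ratio={communicate / compute:.3f}")
```

Doubling $n$ halves the ratio, which is why giving each node more grid points makes communication relatively cheaper.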



#### **Expert System for Future Supercomputers**

- Applications modeling
  - Runtime $T_{run} = f_1(n, \text{design})$
- Technology roadmap
  - Gate speed = $f_2(\text{year})$
  - Chip density = $f_3(\text{year})$
  - Cost = $f_4(n, \text{design})$, ...
- Scaling objective function
  - I have a budget of $C_1$ dollars and can wait $T_{run} = C_2$ seconds. What is the biggest $n$ I can solve in year $Y$?

Use the "expert system" to calculate:

$$\max n \quad \text{over all designs, subject to} \quad \text{cost} < C_1, \; T_{run} < C_2$$

Report: the floating-point operations, $T_{run}(n, \text{design})$, and an illustration of the "design".
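The objective above is a constrained search: discard designs over budget, then find the largest $n$ whose runtime fits. The sketch below is hypothetical throughout: the `Design` fields, the perfectly parallel `runtime` model (no communication), and the example numbers are illustrative assumptions, and the technology-roadmap dependence on year $Y$ is left out. It relies only on runtime growing monotonically with $n$, which lets a binary search find the largest feasible $n$ for each design.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Design:
    """A hypothetical machine design; every field is an illustrative assumption."""
    name: str
    nodes: int
    flops_per_node: float
    cost_per_node: float


def cost(design: Design) -> float:
    return design.nodes * design.cost_per_node


def runtime(n: int, design: Design, flops_per_point: float = 1e3, steps: int = 1000) -> float:
    """Toy T_run(n, design): n^3 grid points, perfectly parallel, no communication."""
    return n**3 * flops_per_point * steps / (design.nodes * design.flops_per_node)


def max_n(designs: Iterable[Design], budget: float, max_runtime: float,
          n_limit: int = 10**6) -> Tuple[int, Optional[Design]]:
    """Largest n with cost < budget and T_run < max_runtime, over all designs."""
    best_n, best_design = 0, None
    for d in designs:
        if cost(d) >= budget:
            continue
        # runtime grows monotonically with n, so binary-search the largest feasible n
        lo, hi = 0, n_limit
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if runtime(mid, d) < max_runtime:
                lo = mid
            else:
                hi = mid - 1
        if lo > best_n:
            best_n, best_design = lo, d
    return best_n, best_design


designs = [
    Design("A", nodes=1000, flops_per_node=1e9, cost_per_node=1000.0),
    Design("B", nodes=4000, flops_per_node=1e9, cost_per_node=1000.0),
    Design("C", nodes=100, flops_per_node=1e9, cost_per_node=1000.0),
]
n, design = max_n(designs, budget=2e6, max_runtime=100.0)
print(n, design.name if design else None)
```

With these made-up numbers, design B is the fastest machine but exceeds the budget, so the cheaper design A wins; a real expert system would replace `runtime` and `cost` with the application and roadmap models $f_1 \ldots f_4$.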






### The Big Issue

Initially, the design did not meet the constraints. Two directions were explored:

- More parallelism: scaled climate model
  - 2D → 3D mesh, one cell per processor
  - Parallelize the cloud-resolving model and ensembles
- More device speed
  - Consider special-purpose logic with fast logic and low-power memory
  - Consider only the highest-performance published nanotech device, QDCA
  - Initial reversible nanotech

Outcome: one barely plausible solution.

#### ITRS Device Review 2016 + QDCA

| Technology | Speed (min–max) | Dimension (min–max) | Energy per gate-op | Comparison              |
|------------|-----------------|---------------------|--------------------|-------------------------|
| CMOS       | 30 ps–1 μs      | 8 nm–5 μm           | 4 aJ               |                         |
| RSFQ       | 1 ps–50 ps      | 300 nm–1 μm         | 2 aJ               | Larger                  |
| Molecular  | 10 ns–1 ms      | 1 nm–5 nm           | 10 zJ              | Slower                  |
| Plastic    | 100 μs–1 ms     | 100 μm–1 mm         | 4 aJ               | Larger, slower          |
| Optical    | 100 as–1 ps     | 200 nm–2 μm         | 1 pJ               | Larger, hotter          |
| NEMS       | 100 ns–1 ms     | 10–100 nm           | 1 zJ               | Slower, larger          |
| Biological | 100 fs–100 μs   | 6–50 μm             | 0.3 yJ             | Slower, larger          |
| Quantum    | 100 as–1 fs     | 10–100 nm           | 1 zJ               | Larger                  |
| QDCA       | 100 fs–10 ps    | 1–10 nm             | 1 yJ               | Smaller, faster, cooler |
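Expressing each energy in units of $k_B T$ shows which devices operate below the thermal noise scale. This sketch assumes the same 60 °C junction temperature used in the limits chart (an assumption for this table, which states no temperature) and takes the energies directly from the table above.

```python
K_B = 1.380649e-23       # Boltzmann constant, J/K (exact SI value)
T = 60 + 273.15          # assumed: same 60 °C junction temperature as the limits chart

# Energy per gate-op from the table, in joules (aJ = 1e-18, zJ = 1e-21, yJ = 1e-24, pJ = 1e-12)
ENERGY_J = {
    "CMOS": 4e-18, "RSFQ": 2e-18, "Molecular": 10e-21, "Plastic": 4e-18,
    "Optical": 1e-12, "NEMS": 1e-21, "Biological": 0.3e-24,
    "Quantum": 1e-21, "QDCA": 1e-24,
}

kt = K_B * T
ratios = {tech: e / kt for tech, e in ENERGY_J.items()}

# Print from lowest to highest energy, in multiples of k_B T
for tech, r in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{tech:<11} {r:10.2e} k_B T")
```

CMOS sits roughly three orders of magnitude above $k_B T$, while QDCA sits about four orders of magnitude below it.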






## **An Exemplary Device: Quantum Dots**

 Pairs of molecules create a memory cell or a logic gate









Ref. "Maxwell's demon and quantum-dot cellular automata," John Timler and Craig S. Lent, Journal of Applied Physics, 15 July 2003.


### **Not Specifically Advocating Quantum Dots**

- A number of post-transistor devices have been proposed
- The shape of the performance curves has been validated by a consensus of reputable physicists
- However, the validity of any individual data point can be questioned
- Cross-checking is appropriate; see the references below




Ref. "Maxwell's demon and quantum-dot cellular automata," John Timler and Craig S. Lent, Journal of Applied Physics, 15 July 2003.

Ref. "Helical logic," Ralph C. Merkle and K. Eric Drexler, Nanotechnology 7 (1996) 325-339.

#### **QCA Microprocessor Status**

- M. Niemier, Ph.D. thesis, University of Notre Dame
- 12-bit μP
- CAD design tool principles
  - 10× the circuit density of CMOS at the same λ
- Applies to various devices
  - Metal dot: 4.2 nm²
  - Molecular: 1.1 nm²





#### Reversible Microprocessor Status


- Subject of a Ph.D. thesis
- Chip laid out (no floating point)
- RISC instruction set
- C-like language
- Compiler
- Demonstrated on a PDE
- However, programming is really weird and not general: you program with +=, -=, etc. rather than =
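Why `+=` and `-=` instead of `=`: an update like `a += b` can always be undone by `a -= b`, whereas `a = b` overwrites the old value of `a` and destroys information. The toy interpreter below is a sketch of that idea only; the instruction format and the `run`/`reverse` helpers are hypothetical and are not the instruction set of the chip described above.

```python
from typing import Dict, List, Tuple

# One instruction: (target register, operator, source register).
Instr = Tuple[str, str, str]
INVERSE = {"+=": "-=", "-=": "+="}


def run(program: List[Instr], regs: Dict[str, int]) -> Dict[str, int]:
    """Execute += / -= instructions on a copy of the register file."""
    regs = dict(regs)
    for target, op, source in program:
        if target == source:
            # a -= a would zero the register and lose its old value
            raise ValueError("target and source must differ to stay reversible")
        if op == "+=":
            regs[target] += regs[source]
        elif op == "-=":
            regs[target] -= regs[source]
        else:
            raise ValueError(f"unsupported operator {op!r}")
    return regs


def reverse(program: List[Instr]) -> List[Instr]:
    """Undo a program: invert each operator and run the instructions backwards."""
    return [(t, INVERSE[op], s) for t, op, s in reversed(program)]


program = [("a", "+=", "b"), ("b", "+=", "a"), ("a", "-=", "b")]
after = run(program, {"a": 2, "b": 3})
print(after)
print(run(reverse(program), after))
```

Running the reversed program restores the original registers exactly, which is the property reversible hardware exploits; the price is that ordinary assignment-style code has to be restructured.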



Chip details (Ph.D. thesis defense):

- 200,000 transistors
- 18 instructions
- 32 power supplies
- 2 person-years for schematics and layout
- 3-phase SCRL, 50 mm<sup>2</sup> in HP14
- 180 pins



### **CPU Design**

- Leading Thoughts
  - Implement CPU logic using reversible logic
    - High efficiency for the component doing the most logic
  - Implement state and memory using conventional logic
    - Low efficiency, but not many operations
  - Permits programming much like today

Diagram: CPU logic is implemented in reversible logic; CPU state and conventional memory are implemented in irreversible logic.
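The design rationale is an energy budget argument: put the high-volume operations on the efficient technology and accept inefficiency where operations are rare. The numbers below are purely hypothetical (a 1000:1 ratio of logic to state operations and a 100× per-operation saving for reversible logic, in normalized energy units); they illustrate the shape of the argument, not measured values.

```python
def energy(logic_ops, state_ops, e_logic, e_state):
    """Total energy: logic_ops at e_logic each plus state_ops at e_state each."""
    return logic_ops * e_logic + state_ops * e_state


# Hypothetical workload and energies, normalized so one irreversible op = 1.0
LOGIC_OPS = 1_000_000    # assumed: CPU logic dominates the operation count
STATE_OPS = 1_000        # assumed: far fewer state/memory updates
E_IRREVERSIBLE = 1.0
E_REVERSIBLE = 0.01      # assumed: reversible logic dissipates 100x less per op

all_irreversible = energy(LOGIC_OPS, STATE_OPS, E_IRREVERSIBLE, E_IRREVERSIBLE)
hybrid = energy(LOGIC_OPS, STATE_OPS, E_REVERSIBLE, E_IRREVERSIBLE)

print(f"all irreversible: {all_irreversible:,.0f}")
print(f"hybrid:           {hybrid:,.0f}")
print(f"saving:           {all_irreversible / hybrid:.1f}x")
```

The overall saving (about 91× here) stays below the 100× per-operation saving because the irreversible state updates still cost full energy; the fewer of them there are, the closer the hybrid gets to the reversible limit.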

## **Atmosphere Simulation at a Zettaflops**



#### **Performance Curve**






#### **Conclusions**

- There are important applications that are believed to exceed the limits of irreversible logic
  - At a US\$100M budget
  - E.g., a solution to global warming
- Reversible logic and nanotech point in the right direction
  - Low power
- Device requirements
  - Push the speed-of-light limit
  - Energy per operation substantially below k<sub>B</sub>T
  - Molecular scales
- Software and algorithms
  - Must be much more parallel than today
- Even with all this, the solution just barely works
- Conclusions appear to apply generally



# **Backup**




#### Metaphor: FM Radio on a Trip in the USA

- You drive to a distant destination while listening to FM radio
- The music is clear for a while, but noise creeps in and then overtakes the music
- Analogy: you live out the next dozen years buying PCs every couple of years
- PCs keep getting faster
  - Clock rate increases
  - Fan gets bigger
  - Won't go on forever
- Why? See the next slide

Details: Erik DeBenedictis, "Taking ASCI Supercomputing to the End Game," SAND2004-0959



#### FM Radio and End of Moore's Law



- Driving away from the FM transmitter → less signal; noise from electrons → no change
- Increasing numbers of gates → less signal power; noise from electrons → no change

