High-Performance Stochastic Memristive Networks for Neurocomputing and Neurooptimization

> Dmitri Strukov, UC Santa Barbara

PRiME October 2020 (virtual)

# **Artificial Neural Network Zoo**




# **Noise in Biological and Artificial Neural Networks**

Molecular-level operations in the brain, e.g. neurotransmitter release in synaptic clefts and voltage gating of ion channels, are <u>stochastic</u>



## Example: fluctuations in K channel



#### Stochastic (binary) neuron



#### Stochastic neural networks:

- (Restricted) Boltzmann machines
- Stochastic Hopfield networks
- Deep belief networks
- Bayesian networks

- ...



# **Focus of This Talk**



# **Radical Improvement with Analog Computing**

Analog VMM:

...using Ohm's and Kirchhoff's laws

## **Vector-by-Matrix-Multiplication (VMM)**:

basic neuromorphic operation...



## Features:

- physical-level (very compact) and in-memory computation  $\rightarrow$  fast and <u>very</u> energy-efficient
- proposed by Widrow in the 1960s, popularized by Mead and his students (Caltech) in the 1980s
- no dense adjustable-conductance crosspoint devices until recently
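The analog VMM described above can be illustrated with a minimal NumPy sketch of an ideal crossbar. Ohm's law gives each device current as voltage times conductance, and Kirchhoff's current law sums those currents on each column wire. The array size and the numeric values are illustrative assumptions, not figures from the talk, and the model ignores wire resistance and device nonlinearity.

```python
import numpy as np

def analog_vmm(voltages, conductances):
    """Ideal crossbar VMM: column current I_j = sum_i V_i * G_ij."""
    voltages = np.asarray(voltages, dtype=float)
    conductances = np.asarray(conductances, dtype=float)
    # Ohm's law per device (V_i * G_ij), Kirchhoff's law per column (sum over i).
    return voltages @ conductances

# Hypothetical 3-input, 2-output crossbar; conductances in siemens.
G = np.array([[10e-6, 100e-6],
              [50e-6,  20e-6],
              [100e-6, 10e-6]])
V = np.array([0.2, 0.1, 0.0])   # input voltages in volts
I = analog_vmm(V, G)            # output currents in amperes
```

The whole multiplication happens in one step in the physical array, which is where the speed and energy advantage listed above comes from.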





# Long-Term Option: (3D) Passive Metal-Oxide Memristors

## • 64 × 64 passive crossbar circuit



H. Kim et al. arXiv 2019

Background work: *M. Prezioso et al., Nature 521, 61, 2015; M. Prezioso et al., IEDM'15, p. 17.4.1, 2015; F. Merrikh Bayat et al., Nature Comm., 2018*

Typical I-V characteristics

![](_page_6_Figure_6.jpeg)

#### **Details:**

- Al<sub>2</sub>O<sub>3</sub>/TiO<sub>2-x</sub> active bilayer by reactive sputtering
- CMOS-compatible CMP/dry etching process and TiN/Al electrodes for higher conductance
- ~250 nm wide lines, passive (0T1R) integration (e.g. >250x/10,000x better memristor / memory cell density compared to 1T1R work at comparable complexity and yield)
- The largest functional analog-grade passive memristor crossbar circuit supported by proper statistics

![](_page_6_Picture_12.jpeg)

# Most Important Metric: Yield and Switching Threshold Variations in 64×64 Xbar

Raw data (voltage ramp) ...

![](_page_7_Figure_2.jpeg)

- Switching threshold is defined as the voltage at which the current changes by > 10% when applying a voltage ramp
- Dark blue dots: ~1% of devices that cannot be switched
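The threshold definition above can be sketched as a small extraction routine. The sketch assumes that the current is read back at a fixed read voltage after each ramp step and compared with the first reading; the slide does not spell out the measurement protocol, so this interpretation, the function name, and the sample data are all hypothetical.

```python
import numpy as np

def switching_threshold(ramp_voltages, read_currents, rel_change=0.10):
    """Return the first ramp voltage at which the read current differs from
    the initial reading by more than rel_change, or None if it never does."""
    ramp_voltages = np.asarray(ramp_voltages, dtype=float)
    read_currents = np.asarray(read_currents, dtype=float)
    baseline = read_currents[0]
    changed = np.abs(read_currents - baseline) > rel_change * abs(baseline)
    if not changed.any():
        return None  # device could not be switched within the ramp
    return float(ramp_voltages[np.argmax(changed)])

# Made-up example data: one switching device and one stuck device.
ramp = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
switching = np.array([2.0e-6, 2.0e-6, 2.05e-6, 2.1e-6, 2.6e-6, 5.0e-6])
stuck = np.full(6, 2.0e-6)

v_th = switching_threshold(ramp, switching)
v_stuck = switching_threshold(ramp, stuck)
```

A `None` result corresponds to the non-switchable devices shown as dark blue dots.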

![](_page_7_Picture_7.jpeg)

# **Conductance Tuning in 64×64 Memristor Crossbar**

## Desired pattern

![](_page_8_Figure_2.jpeg)

Actual pattern

- Color encoding: 256 levels from white (10  $\mu$ S) to black (100  $\mu$ S) @ 0.2V
- < 5% / < 3% absolute / relative tuning error using automated algorithm, with reserves for improvement
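The talk does not describe the automated tuning algorithm, so the following is only a generic write-verify loop on a toy device model: apply a pulse, read the conductance back, and stop once the relative error is within tolerance. The device response (each pulse closing a random 15 to 45% of the remaining gap) is an assumption made purely for illustration; the 3% tolerance and the 10 to 100 µS range follow the numbers above.

```python
import numpy as np

def tune_conductance(target, g_start, rng, tol=0.03, max_pulses=100):
    """Toy write-verify loop: pulse, read back, stop once within tolerance.

    Returns the final conductance and the number of pulses applied.
    """
    g = g_start
    for pulse in range(max_pulses):
        if abs(g - target) / target <= tol:   # verify step (relative error)
            return g, pulse
        # Hypothetical device response: each pulse closes 15-45% of the gap.
        fraction = 0.3 * rng.uniform(0.5, 1.5)
        g += fraction * (target - g)
    return g, max_pulses

rng = np.random.default_rng(1)
g_final, n_pulses = tune_conductance(target=55e-6, g_start=10e-6, rng=rng)
```

Real devices respond nonlinearly and asymmetrically to set and reset pulses, so a practical algorithm would also adapt pulse amplitude and polarity.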

# **Near-Term Option: Floating-Gate Devices**

![](_page_9_Figure_1.jpeg)

#### Summary:

- 28x28 B/W input, 10-class output, >100,000 NOR flash synapses, 64 hidden layer CMOS neurons, 180-nm process with eFlash
- 94.65% experimental fidelity (96.5% theoretical)
- < 1-µs latency, < 20 nJ energy per pattern (reserves for improvement for both with better neuron design)
- Much better speed and energy efficiency than digital circuits at comparable MNIST fidelity (10<sup>6</sup>× better energy-delay product than IBM TrueNorth)
- Reproducible, temperature insensitive, no change in performance after 7 months shelf-time, without any cell retuning
- More recent work using 55-nm ESF3 NOR-flash technology (CICC'17, IEDM'18, IEDM'19), scalable to 28 nm
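As a quick plausibility check of the ">100,000 synapses" figure in the summary above, the cell count can be estimated from the layer sizes. The sketch assumes that each signed weight is stored as a differential pair of flash cells; this is a common scheme but is not stated on the slide, and whether bias inputs are included is likewise an assumption.

```python
inputs, hidden, outputs = 28 * 28, 64, 10

# Weights of a 784-64-10 perceptron, without and with one bias input per layer.
weights = inputs * hidden + hidden * outputs
weights_with_bias = (inputs + 1) * hidden + (hidden + 1) * outputs

# Assumption: two cells (a differential pair) per signed weight.
cells = 2 * weights
cells_with_bias = 2 * weights_with_bias
```

Under these assumptions both estimates land slightly above 100,000 cells, consistent with the slide.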

![](_page_9_Picture_9.jpeg)

# New Result #1: Stochastic Analog Vector-by-Matrix Multiplier

## **Basic Idea**:

add intrinsic/extrinsic noise from memory array to dot-product current and feed it to comparator

![](_page_10_Figure_3.jpeg)

## **Two Implementation Options:**

0T1R memristor cell (works for 1T1R as well)

![](_page_10_Figure_6.jpeg)

Floating gate transistor

![](_page_10_Figure_8.jpeg)

![](_page_10_Picture_9.jpeg)

# New Result #1: Stochastic Analog Vector-by-Matrix Multiplier


![](_page_11_Figure_3.jpeg)

## Experimental Demo:

using 20×20 passive array with externally-injected noise from readout circuitry

![](_page_11_Figure_6.jpeg)

M.R. Mahmoodi et al. Nature Communications, 2019

## Features:

- Sigmoid slope (i.e. SNR or compute temperature T) controlled dynamically by the applied voltage  $V_{ON}$
- Some smearing of output probabilities due to input-dependent noise and device imperfections
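The stochastic dot-product idea can be simulated in a few lines: add noise to the dot-product current and threshold the result with a comparator. The sketch below assumes zero-mean Gaussian noise of fixed standard deviation `sigma` and signed effective conductances (for example from differential pairs); both are modeling assumptions, not details taken from the slides. Under these assumptions the firing probability is the Gaussian CDF of the signal-to-noise ratio, which is the sigmoid-like transfer function whose slope the applied voltage controls.

```python
import numpy as np
from math import erf, sqrt

def firing_probability(i_dot, sigma):
    """P(comparator output = 1) for Gaussian noise: Phi(i_dot / sigma)."""
    return 0.5 * (1.0 + erf(i_dot / (sigma * sqrt(2.0))))

def stochastic_vmm(voltages, g_signed, sigma, rng):
    """One noisy VMM read followed by a comparator at zero threshold."""
    i_dot = np.asarray(voltages, dtype=float) @ np.asarray(g_signed, dtype=float)
    noisy = i_dot + sigma * rng.standard_normal(i_dot.shape)
    return (noisy > 0).astype(int)

rng = np.random.default_rng(0)
V = np.array([0.1, 0.1])                 # volts
G = np.array([[20e-6], [-10e-6]])        # signed conductances, one output column
sigma = 1e-6                             # noise current std in amperes (assumed)

samples = np.array([stochastic_vmm(V, G, sigma, rng)[0] for _ in range(20000)])
p_est = samples.mean()
p_theory = firing_probability(float(V @ G[:, 0]), sigma)
```

Scaling the input voltages scales the signal while the assumed noise stays fixed, so a larger voltage gives a steeper sigmoid, i.e. a lower effective compute temperature.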

# New Result #1: Stochastic Analog Vector-by-Matrix Multiplier


![](_page_12_Figure_3.jpeg)

## Experimental Demo:

using 180nm embedded ESF1 NOR-flash memory technology

![](_page_12_Figure_6.jpeg)

M.R. Mahmoodi et al. Nature Communications, 2019

## Features:

- Sigmoid slope (i.e. SNR or compute temperature *T*) controlled dynamically by the applied gate voltage

![](_page_12_Picture_10.jpeg)

# New Result #2: Restricted Boltzmann Machine Demo

## 10-input 8-hidden neuron RBM network

![](_page_13_Figure_2.jpeg)

![](_page_13_Figure_3.jpeg)

Experiment (solid) vs. simulation (dash-dot)

![](_page_13_Figure_5.jpeg)

#### **Details:**

- Hardware injected noise with software-emulated neuron functionality
- Random weights (from -32  $\mu$ S to + 32  $\mu$ S) mapped to 10×16 portion of memristor xbar
- Neuron input currents sampled at 1 MHz bandwidth after applying
  Random Input→ Visible → Hidden → Visible → Hidden → …
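The alternating Visible → Hidden → Visible sampling chain above can be sketched in software as follows. The network size and weight range follow the slide; the logistic firing probability, the temperature value `T`, and the use of signed weights applied directly to binary states are simplifying assumptions (the hardware uses injected noise and a comparator instead).

```python
import numpy as np

def sample_layer(states, weights, temperature, rng):
    """Sample binary neurons with P(1) = logistic(pre-activation / temperature)."""
    pre = states @ weights
    p = 1.0 / (1.0 + np.exp(-pre / temperature))
    return (rng.random(p.shape) < p).astype(float)

rng = np.random.default_rng(0)
n_visible, n_hidden = 10, 8
# Random signed weights in the range used in the demo (-32 uS to +32 uS).
W = rng.uniform(-32e-6, 32e-6, size=(n_visible, n_hidden))
T = 10e-6   # hypothetical "compute temperature", same units as the weights

v = rng.integers(0, 2, size=n_visible).astype(float)   # random input
for _ in range(5):
    h = sample_layer(v, W, T, rng)      # visible -> hidden
    v = sample_layer(h, W.T, T, rng)    # hidden -> visible
```

The same weight matrix is used in both directions, transposed for the hidden-to-visible step, which is what makes the machine "restricted" and easy to map onto one crossbar.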

![](_page_13_Picture_11.jpeg)

# **Solving Optimization Problems with Hopfield Neural Network**

#### Combinatorial optimization problems

| Application                   | Problem            |
|-------------------------------|--------------------|
| Logistics / package delivery  | Traveling salesman |
| Power grid                    | Maximum flow       |
| Design automation             | Vertex cover       |
| Molecular dynamic simulations | Graph partitioning |

Solving TSP with Hopfield neural network

Traveling Salesman Problem: NP-hard  $\rightarrow$  use heuristics, e.g. a Hopfield network:

- single route = specific neuron outputs
- finding optimal solution = minimizing "energy" function of neuron outputs
- dynamics of the recurrent network with *proper* weights minimizes energy function over time
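The energy-minimizing dynamics listed above can be illustrated with a minimal discrete-time sketch. It assumes bipolar neurons (±1), a symmetric weight matrix with zero diagonal, and asynchronous updates, under which each neuron flip can only lower the energy. The random weights are placeholders; a real problem such as TSP would need weights derived from its cost function.

```python
import numpy as np

def hopfield_energy(s, W, b):
    """E(s) = -1/2 s^T W s - b^T s for bipolar states s in {-1, +1}."""
    return -0.5 * s @ W @ s - b @ s

def hopfield_descent(W, b, s0, max_sweeps=1000):
    """Asynchronous deterministic updates until no neuron changes."""
    s = s0.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            new = 1.0 if W[i] @ s + b[i] >= 0 else -1.0
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))
W = (A + A.T) / 2            # symmetric weights
np.fill_diagonal(W, 0.0)     # no self-coupling
b = rng.standard_normal(n)
s0 = rng.choice([-1.0, 1.0], size=n)
s_final = hopfield_descent(W, b, s0)
```

The final state is a fixed point of the dynamics, which is a minimum of the energy but not necessarily the global one.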

![](_page_14_Picture_8.jpeg)

 Example of continuous time / binary neuron Hopfield network

![](_page_14_Figure_10.jpeg)

 $\Sigma$  = sum amp & comparator

![](_page_14_Picture_12.jpeg)

# Earlier Work: (Deterministic) Hopfield Network Experimental Demonstration with Discrete Memristors

Hopfield network for A-to-D conversion

![](_page_15_Figure_2.jpeg)

input reference

#### Major features:

- 4-bit ADC implemented as a Hopfield network
- The first demo of a memristor-based Hopfield neural network
- CMOS discrete IC neurons
- Discrete packaged memristors
- Fine-tuning to cope with offsets and variations

![](_page_15_Picture_10.jpeg)

Experimental results

![](_page_15_Figure_12.jpeg)

![](_page_15_Picture_13.jpeg)

## **Local Minima in Hopfield Network**

![](_page_16_Figure_1.jpeg)

Local minima present problems!

![](_page_16_Figure_3.jpeg)

Color background:

Baseline Hopfield neural network

 $\Sigma$  = sum amp & comparator

![](_page_16_Picture_7.jpeg)

# Simulated Annealing with Generalized Hopfield Network (Boltzmann Machine)

![](_page_17_Figure_1.jpeg)

Local minima present problems!

<u>Solution</u>: employ probabilistic neurons (stochastic VMMs) to implement simulated annealing
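A software analogue of this idea is sketched below: each neuron fires probabilistically, and the temperature is lowered over time so the network can escape local minima early and settle later. The logistic firing rule, the geometric cooling schedule, and all parameter values are assumptions chosen for illustration; in the hardware the temperature corresponds to the signal-to-noise ratio of the stochastic VMM.

```python
import numpy as np

def energy(s, W, b):
    """E(s) = -1/2 s^T W s - b^T s for bipolar states s in {-1, +1}."""
    return -0.5 * s @ W @ s - b @ s

def anneal(W, b, rng, t_start=2.0, t_end=0.05, sweeps=200):
    """Stochastic Hopfield network with a geometrically decreasing temperature."""
    n = len(b)
    s = rng.choice([-1.0, 1.0], size=n)
    best, best_e = s.copy(), energy(s, W, b)
    for t in np.geomspace(t_start, t_end, sweeps):
        for i in rng.permutation(n):
            field = W[i] @ s + b[i]
            # Equal to logistic(2 * field / t), written with tanh to avoid overflow.
            p_up = 0.5 * (1.0 + np.tanh(field / t))
            s[i] = 1.0 if rng.random() < p_up else -1.0
        e = energy(s, W, b)
        if e < best_e:
            best, best_e = s.copy(), e
    return best, best_e

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
b = rng.standard_normal(n)
best_state, best_energy = anneal(W, b, rng)
```

At high temperature the updates are nearly random; as the temperature approaches zero they converge to the deterministic baseline Hopfield rule.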

![](_page_17_Figure_4.jpeg)

Color background:

Baseline Hopfield neural network

Stochastic annealing

 $\sum$ = sum amp & comparator; × = scaling

![](_page_17_Picture_9.jpeg)

# **Emerging (Custom) Hardware for Combinatorial Optimization**

## Nanomagnets / P-bits

![](_page_18_Picture_2.jpeg)

Experimentally measured ground states for a network of up to 3 coupled magnetic devices with fixed coupling

Debashis et al. IEDM 2016

Integer (up to 945) factorization with 8 p-bits

Borders et al. *Nature* 573, 2019

- Limited (near-neighbor, fixed) coupling and/or high CMOS overhead

### **CMOS**

![](_page_18_Picture_9.jpeg)

Experimental results for solving maximum-cut problem with 2×30K-spin Ising network 40-nm 23.65-mm<sup>2</sup> SRAM-based chips

- Not in-memory (bulky, slower, power hungry)
- Binary weights

Takemoto, et al. ISSCC 2018

## **Josephson Junction**

![](_page_18_Picture_14.jpeg)

Experimentally measured ground state of random spin glass problems based on 108-qubit D-Wave One system (with evidence of quantum annealing)

- Low-temperature operation
- Many issues unsolved

Boixo, et al. Nature phys. 2018

## **Photonics**

![](_page_18_Figure_19.jpeg)

Experimental results for solving max-cut problems with up to 2,000 nodes with Ising network based on degenerate optical parametric oscillators

Slow due to high overhead of the electronic feedback used for updating spatial light modulator

Inagaki, et al. Science 2016

![](_page_18_Picture_23.jpeg)

# **Adjustable Energy Function Annealing**

$\text{Energy} = E_{\text{original}} + \exp(-\text{time})\,E_{\text{addon}}$
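For a Hopfield network with a quadratic energy, a time-dependent energy of this form is equivalent to time-dependent weights, because the energy is linear in the weight matrix. The sketch below illustrates that equivalence; the time constant `tau` and the random matrices are assumptions introduced for the example (the slide writes the decay simply as exp(-time)).

```python
import numpy as np

def quadratic_energy(s, W):
    """E(s) = -1/2 s^T W s (bias terms omitted for brevity)."""
    return -0.5 * s @ W @ s

def annealed_weights(W_original, W_addon, t, tau=1.0):
    """Effective weights whose add-on part decays exponentially with time."""
    return W_original + np.exp(-t / tau) * W_addon

rng = np.random.default_rng(0)
n = 6
W_original = rng.standard_normal((n, n))
W_original = (W_original + W_original.T) / 2
W_addon = rng.standard_normal((n, n))
W_addon = (W_addon + W_addon.T) / 2
s = rng.choice([-1.0, 1.0], size=n)

t = 0.7
e_total = quadratic_energy(s, annealed_weights(W_original, W_addon, t))
e_split = quadratic_energy(s, W_original) + np.exp(-t) * quadratic_energy(s, W_addon)
```

Early in the run the add-on term reshapes the energy landscape; as it decays, the network is left minimizing the original energy.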

![](_page_19_Picture_2.jpeg)

Another solution inspired by quantum annealers: Dynamically adjustable energy function


Color background:

Baseline Hopfield neural network

 $\sum$ = sum amp & comparator; × = scaling

Adjustable energy function / weight annealing

![](_page_19_Picture_9.jpeg)

## Yet Another Approach: Chaotic Annealing

![](_page_20_Figure_1.jpeg)

Color background: Baseline Hopfield neural network

 $\sum$ = sum amp & comparator; × = scaling

![](_page_20_Picture_4.jpeg)

![](_page_20_Picture_6.jpeg)

# New Result #3: Flexible-Annealing Mixed-Signal Generalized Hopfield Networks for Combinatorial Optimization

![](_page_21_Figure_1.jpeg)

Color background:

- Baseline Hopfield neural network
- Stochastic annealing
- Adjustable energy function / weight annealing
- Chaotic annealing

 $\sum$ = sum amp & comparator; × = scaling

![](_page_21_Picture_8.jpeg)

# New Result #3: Combinatorial Optimization Demo with FG

### • Weighted graph partitioning problem...

(finding two mutually exclusive sets of nodes with maximally balanced node weights and minimized edge weights between the two sets)
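One common way to cast this problem as an energy function, shown here as a sketch rather than the formulation used in the demo, assigns each node a spin of ±1 for its set and penalizes both the cut weight and the squared imbalance of node weights. The penalty factor `lam` and the 4-node example graph are assumptions; the graph is small enough to solve by brute force.

```python
import itertools
import numpy as np

def partition_energy(s, A, w, lam=1.0):
    """Cut weight plus lam * (node-weight imbalance)^2 for spins s in {-1, +1}."""
    # A is symmetric with zero diagonal; (1 - s_i s_j) is 2 on cut edges and
    # every edge appears twice in A, hence the factor 1/4.
    cut = 0.25 * np.sum(A * (1 - np.outer(s, s)))
    imbalance = (w @ s) ** 2
    return cut + lam * imbalance

# Hypothetical graph: two strongly connected pairs joined by one weak edge.
A = np.zeros((4, 4))
for i, j, weight in [(0, 1, 5.0), (2, 3, 5.0), (1, 2, 1.0)]:
    A[i, j] = A[j, i] = weight
w = np.ones(4)   # equal node weights

states = [np.array(c) for c in itertools.product([-1, 1], repeat=4)]
best = min(states, key=lambda s: partition_energy(s, A, w))
best_e = partition_energy(best, A, w)
```

The minimum-energy state separates the two pairs by cutting only the weak edge. An energy of this quadratic form maps directly onto Hopfield weights and biases.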

![](_page_22_Figure_3.jpeg)

### • ... and experimental results using a 10×20 180-nm NOR flash memory array

![](_page_22_Figure_5.jpeg)

# New Result #3: Combinatorial Optimization Demo with Passive 64×64 Metal-Oxide Memristive Crossbar Circuits

- 5-node maximum-weighted clique problem
- 12-node maximum-weighted vertex cover problem

Figure panels: average energy (a.u.) vs. epoch number, simulation vs. experiment, with curves labeled Base, CSA, SSA and EA (legends mention 300 runs and 30 runs).

M.R. Mahmoodi et al., Proc. IEDM'19

# New Result #3: Combinatorial Optimization Demo with Passive 64×64 Metal-Oxide Memristive Crossbar Circuits

 10-node maximum-weight independent set problem

![](_page_24_Figure_2.jpeg)

 6-node maximum-weight graph partitioning problem

![](_page_24_Figure_4.jpeg)

![](_page_24_Picture_5.jpeg)

M.R. Mahmoodi et al., Proc. IEDM'19

# Summary

- In-memory analog computing based on emerging analog-grade memory devices enables very energy-efficient, compact, and fast analog VMMs
  - Near term: embedded NOR floating-gate memories (available at foundries now)
  - Long term: metal-oxide memristors (the densest, though least mature)
- Intrinsic noise of memory devices is used to implement a stochastic transfer function or stochastic vector-by-matrix multiplication

Performance estimates and comparison to competition (benchmarked on a noisy mean-field algorithm; adapted from arXiv:1903.11194):

| Metric                  | CPU (conventional) | GPU (conventional) | D-Wave (emerging technology) | Fiber optics (emerging technology) | Memristor (this work) | NOR flash (this work) |
|-------------------------|--------------------|--------------------|------------------------------|------------------------------------|-----------------------|-----------------------|
| Time to solution (µs)   | 220                | 10                 | 10<sup>10</sup>              | 600                                | 3                     | 10                    |
| Energy to solution (µJ) | 4000               | 2500               | 250×10<sup>12</sup>          | ?                                  | 0.2                   | 0.6                   |

- Experimental demonstration of Boltzmann machines based on small-scale stochastic VMM circuits, with applications in deep belief networks and combinatorial optimization
- Major memristor challenges: poor yield, device uniformity, high cell currents
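The performance table in this summary can be condensed into energy-delay-product ratios relative to the memristor implementation. The short calculation below uses only the numbers from the table; fiber optics is omitted because its energy is listed as unknown, and the choice of energy-delay product as the figure of merit is an assumption made for this illustration.

```python
# Time to solution (us) and energy to solution (uJ), copied from the table.
time_us = {"CPU": 220, "GPU": 10, "D-Wave": 1e10, "Memristor": 3, "NOR flash": 10}
energy_uj = {"CPU": 4000, "GPU": 2500, "D-Wave": 250e12, "Memristor": 0.2, "NOR flash": 0.6}

# Energy-delay product per platform, in uJ * us.
edp = {name: time_us[name] * energy_uj[name] for name in time_us}

# How many times worse each platform is than the memristor circuit.
edp_ratio = {name: edp[name] / edp["Memristor"] for name in edp}
speedup_vs_cpu = time_us["CPU"] / time_us["Memristor"]
```

By these estimates the memristor circuit is roughly 73 times faster than the CPU, and its energy-delay product is more than four orders of magnitude better than that of the GPU.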

![](_page_25_Picture_8.jpeg)

# **Relevant References**

## Stochastic neurocomputing and neuro-optimization demos

- M.R. Mahmoodi et al., "Combinatorial optimization by weight annealing in memristive Hopfield networks", to appear in Scientific Reports 20
- M.R. Mahmoodi et al., "An analog neuro-optimizer with adaptable annealing based on 64×64 0T1R crossbar circuit", Proc. IEDM'19
- M.R. Mahmoodi et al., "Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization", *Nature Communications* 10, art. 5113, 2019
- X. Guo et al., "Modeling and experimental demonstration of a Hopfield network analog-to-digital converter with hybrid CMOS/memristor circuits", *Frontiers in Neuroscience* 9, art. 488, Dec. 2015
- L. Gao et al., "Digital-to-analog and analog-to-digital conversion with metal oxide memristors for ultra-low power computing", *Proc. NanoArch'13*, pp. 19-22

## Passive metal-oxide memristors

- H. Kim et al. arXiv 2019
- F. Merrikh Bayat et al., "Implementation of multilayer perceptron network with highly uniform passive memristive crossbar circuits", *Nature Communications* 9, art. 2331, 2018
- G.C. Adam et al., "3-D memristor crossbars for analog and neuromorphic computing applications", *IEEE TED 64* (1), pp. 312-318, 2017
- M. Prezioso et al., "Training and operation of an integrated neuromorphic network based on metal-oxide memristors", *Nature* 521, pp. 61-64, 2015
- M. Prezioso et al., "Modeling and implementation of firing-rate neuromorphic-network classifiers with bilayer Pt/Al2O3/TiO2-x/Pt memristors", *Proc. IEDM'15*, pp. 17.4.1-17.4.4

## NOR flash VMM-level experimental demos

- F. Merrikh Bayat et al., "High-performance mixed-signal neurocomputing with nanoscale floating-gate memory cells", *IEEE TNNLS* 29, pp. 4782-4790, 2018
- X. Guo et al. "Fast, energy-efficient, robust, and reproducible mixed-signal neuromorphic classifier based on embedded NOR flash memory technology", *Proc. IEDM'17*, pp. 6.5.1-6.5.4
- X. Guo et al., "Temperature-insensitive analog vector-by-matrix multiplier based on 55 nm NOR flash memory cells", Proc. CICC'17, pp. 1-4

![](_page_26_Picture_17.jpeg)

# Questions?! strukov@ece.ucsb.edu

#### Paper co-authors:

UC Santa Barbara: Zahra Fahimi, Hyungjin Kim, Hussein Nili, M. Reza Mahmoodi

Linköping U., Norrköping, Sweden: Leo Sedov and Val Polishchuk

Acknowledgments: G. Adam, F. Alibart, M. Bavandpour, B. Chakrabarti, N. Do, J. Edwards, M. Graziano, X. Guo, B. Hoskins, I. Kataeva, M. Klachko, K. Likharev, F. Merrikh Bayat, M. Prezioso, S. Sahay, A. Vincent

#### **Sponsors (past and present):**

![](_page_27_Picture_6.jpeg)

![](_page_27_Picture_7.jpeg)

![](_page_27_Picture_8.jpeg)