Advances in deep learning have inspired a rapid evolution of hardware accelerators for machine learning applications in recent years. These accelerators have enabled state-of-the-art results in a variety of classification and regression tasks. However, many challenges remain in deploying neural networks (NNs) to edge devices. One of the biggest performance bottlenecks of today's NN accelerators is off-chip memory access.
Embedded non-volatile memories (eNVMs) are a promising solution for increasing on-chip storage density. eNVMs are generally denser and more energy efficient than SRAM. Moreover, storage density can be increased further by storing multiple bits in a single memory cell using multi-level cell (MLC) programming. While MLC encoding can potentially eliminate all off-chip weight accesses, it also increases the probability of faults.
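As a rough illustration of this trade-off (not the fault model used in the talk), the sketch below quantizes weights onto MLC levels and injects adjacent-level faults whose probability grows with the number of levels per cell; the fault rates, quantization scheme, and weight distribution are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_to_mlc(weights, bits_per_cell):
    """Map weights to discrete MLC levels (uniform quantization sketch)."""
    levels = 2 ** bits_per_cell
    w_min, w_max = weights.min(), weights.max()
    step = (w_max - w_min) / (levels - 1)
    codes = np.round((weights - w_min) / step).astype(int)
    return codes, w_min, step

def inject_level_faults(codes, levels, fault_prob):
    """Shift a faulty cell to an adjacent level; denser encodings get a higher assumed fault_prob."""
    faulty = rng.random(codes.shape) < fault_prob
    shift = rng.choice([-1, 1], size=codes.shape)
    return np.where(faulty, np.clip(codes + shift, 0, levels - 1), codes)

# Toy example: Gaussian "weights", comparing 1-bit and 3-bit cells
weights = rng.normal(0, 0.05, size=10_000)
for bits, p_fault in [(1, 1e-4), (3, 1e-2)]:   # assumed, illustrative fault rates
    codes, w_min, step = quantize_to_mlc(weights, bits)
    noisy = inject_level_faults(codes, 2 ** bits, p_fault)
    recon = w_min + noisy * step
    mse = np.mean((weights - recon) ** 2)
    print(f"{bits} bit(s)/cell, fault prob {p_fault}: reconstruction MSE = {mse:.2e}")
```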
In this talk, I will discuss the benefits of co-designing NN weights and memories so that their properties complement each other and faults cause no noticeable loss in NN accuracy. In the extreme case, the weights in fully connected layers can be stored using a single transistor. Combined with weight pruning and clustering, the co-design technique reduces memory area by over an order of magnitude compared to an SRAM baseline. For VGG16 (130M weights), we are able to store all the weights in 4.9 mm², well within the area allocated to SRAM in modern NN accelerators.
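A minimal sketch of how pruning plus clustering shrinks per-weight storage follows; it is illustrative only, and the pruning threshold, cluster count, and resulting storage figures are assumptions rather than the numbers reported in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(0, 0.05, size=100_000)      # stand-in for one layer's weights

# Prune: zero out small-magnitude weights (90% sparsity is an assumed target)
threshold = np.quantile(np.abs(weights), 0.9)
surviving = weights[np.abs(weights) > threshold]

# Cluster: quantize surviving weights onto a small codebook via 1-D k-means
def kmeans_1d(values, k, iters=20):
    centers = np.quantile(values, np.linspace(0, 1, k))
    for _ in range(iters):
        assign = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = values[assign == j].mean()
    return centers, assign

k = 16                                           # 16 clusters -> 4-bit indices per weight
centers, assign = kmeans_1d(surviving, k)

# Storage estimate: 32-bit floats vs 4-bit cluster indices for surviving weights
dense_bits = weights.size * 32
clustered_bits = surviving.size * int(np.log2(k))
print(f"dense: {dense_bits / 8 / 1024:.0f} KiB, "
      f"pruned + clustered indices: {clustered_bits / 8 / 1024:.1f} KiB")
```

In this sketch the cluster indices are what would be mapped onto MLC eNVM cells (e.g., two 2-bit cells or one 4-bit cell per surviving weight); the codebook of cluster centers is small enough to be negligible.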