NPU Thermal Management (new; cites John Gustafson)
https://ieeexplore.ieee.org/abstract/document/9211572
Abstract:
Neural Processing Units (NPUs) are becoming an integral part in all modern computing systems due to their substantial role in accelerating Neural Networks (NNs). The significant improvements in cost-energy-performance stem from the massive array of multiply-accumulate (MAC) units that remarkably boosts the throughput of NN inference. In this work, we are the first to investigate the thermal challenges that NPUs bring, revealing how MAC arrays, which form the heart of any NPU, impose serious thermal bottlenecks to on-chip systems due to their excessive power densities. For the first time, we explore 1) the effectiveness of precision scaling and frequency scaling in temperature reductions and 2) how advanced on-chip cooling using superlattice thin-film Thermoelectric (TE) open doors for new trade-offs between temperature, throughput, cooling cost and inference accuracy in NPU chips.
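
Note: to make the frequency-scaling idea from the abstract concrete, here is a rough sketch of a lumped steady-state thermal model. It is not the paper's method. It assumes the textbook dynamic-power estimate P = C_eff * Vdd^2 * f and a single thermal resistance R_th between the MAC array and ambient. Every constant and function name below (`C_EFF`, `VDD`, `R_TH`, `T_AMBIENT`, `temp_at`) is a hypothetical placeholder chosen for illustration, not a value from the paper.

```python
# Toy steady-state thermal model of a MAC array under frequency scaling.
# All constants are hypothetical placeholders, not measurements from the paper.


def dynamic_power(c_eff, vdd, freq_hz):
    """Textbook CMOS dynamic power estimate: P = C_eff * Vdd^2 * f (watts)."""
    return c_eff * vdd ** 2 * freq_hz


def steady_state_temp(power_w, r_th, t_ambient):
    """Lumped thermal model: T = T_ambient + R_th * P (degrees Celsius)."""
    return t_ambient + r_th * power_w


C_EFF = 2e-9      # effective switched capacitance in farads (assumed)
VDD = 1.0         # supply voltage in volts (assumed, held fixed)
R_TH = 20.0       # thermal resistance to ambient in K/W (assumed)
T_AMBIENT = 45.0  # ambient temperature in degrees Celsius (assumed)


def temp_at(freq_hz):
    """Steady-state temperature of the toy MAC array at a given clock frequency."""
    power = dynamic_power(C_EFF, VDD, freq_hz)
    return steady_state_temp(power, R_TH, T_AMBIENT)


if __name__ == "__main__":
    for f in (1.0e9, 0.8e9, 0.5e9):
        print(f"{f / 1e9:.1f} GHz -> {temp_at(f):.1f} C")
```

In this toy model the temperature rise over ambient falls linearly with frequency at fixed voltage, and throughput falls with it. That is the kind of temperature-versus-throughput trade-off the abstract refers to; the actual behavior of a real NPU would depend on leakage, floorplan, and cooling, none of which are modeled here.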