Generating SIMD Instructions for Cerebras CS-1 using Polyhedral Compilation Techniques
The Cerebras CS-1 is a computing system based on a wafer-scale processor having nearly 400,000 compute cores. It is intended for training of and inference on deep neural networks.
February 22, 2020
Sven Verdoolaege, Manjunath Kudlur, Rob Schreiber, Harinath Kamepalli
Online Normalization for Training Neural Networks, NeurIPS 2019
Online Normalization is a new technique for normalizing the hidden activations of a neural network. Like Batch Normalization, it normalizes the sample dimension. While Online Normalization does not use batches, it is as accurate as Batch Normalization.
May 15, 2019
Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Koster, Ryan Reece, Sofia Samaniego de la Fuente, Vishal Subbiah, Michael James
Memory Efficient 3D U-Net with Reversible Mobile Inverted Bottlenecks for Brain Tumor Segmentation
We propose combining memory-saving techniques with traditional U-Net architectures to increase the complexity of the models on the Brain Tumor Segmentation (BraTS) challenge. The BraTS challenge consists of a 3D segmentation of a 240 × 240 × 155 × 4 input image into a set of tumor classes.
March 5, 2021
Mihir Pendse, Vithursan Thangarasa, Vitaliy Chiley, Ryan Holmdahl, Joel Hestness, Dennis DeCoste
The curious case of developmental BERTology: On sparsity, transfer learning, generalization and the brain
In this essay, we explore a point of intersection between deep learning and neuroscience, through the lens of large language models, transfer learning and network compression.
July 7, 2020
Fast Stencil-Code Computation on a Wafer-Scale Processor
The performance of CPU-based and GPU-based systems is often low for PDE codes, where large, sparse, and often structured systems of linear equations must be solved. Iterative solvers are limited by data movement, both between caches and memory and between nodes.
October 7, 2020
Kamil Rocki, Dirk Van Essendelft, Ilya Sharapov, Robert Schreiber, Michael Morrison, Vladimir Kibardin, Andrey Portnoy, Jean Francois Dietiker, Madhava Syamlal, and Michael James
A Templated C++ Interface for ISL
Polyhedral libraries typically support only a very limited collection of types for representing objects, corresponding to broad mathematical classes such as sets, binary relations and functions.
February 20, 2020
Sven Verdoolaege, Ron Estrin, Oleksandr Zinenko, Tianjiao Sun, Manjunath Kudlur, Harinath Kamepalli
Pipelined Backpropagation at Scale: Training Large Models without Batches
New hardware can substantially increase the speed and efficiency of deep neural network training. To guide the development of future hardware architectures, it is pertinent to explore the hardware and machine learning properties of alternative training algorithms.
March 1, 2021
Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Koster