Keynote - Lizy Kurian John
202312190846
Status: #idea
Tags: HiPC 2023
For ML and With ML: The New Normal in System Design - Lizy Kurian John
For ML: Systems for ML
- Trade-off between flexibility (CPU) and efficiency (ASICs)
- Heterogeneous systems that combine all of them (CPU, GPU, FPGA, ASIC)
Importance of Software Ecosystems/Libraries
- Microsoft DeepSpeed - How to train models larger than your VRAM
ML-specific FPGA Architectures
- Add hard matrix multiplier blocks
- Much larger than a configurable logic block (or other things in the FPGA)
- Not adding more area to the chip
- Non-ML benchmarks do not experience a significant slowdown
- Add Processing-In-Memory (PIM) to RAM embedded in FPGAs
- Speedup and energy reduction
Weightless Neural Networks
- Uses lookup tables
- Is a shallow NN
- Data is very sparse, so hashing can be used to pack what is remembered into denser tables
- BNNs - Expressiveness is limited
- Do not have high energy requirements
- LUTs are a readily-available building block in FPGAs
- Accuracies are still trailing Deep NNs
- Right now, very good with table-driven data and 1D time-series data
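
A minimal sketch of the lookup-table idea above, assuming a WiSARD-style layout (one bank of tables per class) and a simple multiplicative hash that folds each sparse address space into a small table. The class and function names, the hash constant, and the toy bit patterns are all illustrative assumptions, not details from the talk.

```python
import random

KNUTH = 2654435761  # multiplicative-hash constant (illustrative choice)


class Discriminator:
    """One class's bank of lookup tables (RAM nodes)."""

    def __init__(self, n_inputs, tuple_size, table_bits, seed=0):
        order = list(range(n_inputs))
        random.Random(seed).shuffle(order)  # fixed pseudo-random input mapping
        self.groups = [order[i:i + tuple_size]
                       for i in range(0, n_inputs, tuple_size)]
        self.table_bits = table_bits
        # Each table has 2**table_bits slots instead of 2**tuple_size entries.
        self.tables = [[0] * (1 << table_bits) for _ in self.groups]

    def _slots(self, bits):
        for table, group in zip(self.tables, self.groups):
            address = 0
            for idx in group:
                address = (address << 1) | bits[idx]
            # Hash the sparse address into the smaller, denser table.
            slot = ((address * KNUTH) & 0xFFFFFFFF) >> (32 - self.table_bits)
            yield table, slot

    def train(self, bits):
        # Training only writes 1s into tables; there are no weights to update.
        for table, slot in self._slots(bits):
            table[slot] = 1

    def score(self, bits):
        # Number of tables that recognise their slice of the input.
        return sum(table[slot] for table, slot in self._slots(bits))


def classify(discriminators, bits):
    """Return the label whose discriminator has the highest score."""
    return max(discriminators, key=lambda label: discriminators[label].score(bits))


# Toy usage: two classes, 32 binary inputs, 8-bit tuples hashed into 16 slots.
N_INPUTS = 32
model = {label: Discriminator(N_INPUTS, tuple_size=8, table_bits=4)
         for label in ("zeros", "ones")}
model["zeros"].train([0] * N_INPUTS)
model["ones"].train([1] * N_INPUTS)

noisy_ones = [1] * N_INPUTS
noisy_ones[5] = 0  # flip one bit; only one table's address changes
predicted = classify(model, noisy_ones)
```

Hash collisions can make a table report a pattern it never saw, which is the price paid for the smaller tables in this sketch.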
With ML: ML Guided System Design
- Using ML for Pre-Silicon Hardware Design
- Full system simulation is prohibitively slow
- Large gap between what can be evaluated pre-silicon and post-silicon
Power-level prediction
- To close the gap between predicted power and real-world measured power, McPAT can be calibrated against measurements so that its predictions become accurate
- A plain regression model is sufficient to calibrate the power simulator (McPAT); heavier ML is not needed
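
As a rough illustration of regression-based calibration, the sketch below fits a straight line from simulated power to measured power with ordinary least squares and then uses it to correct a new estimate. The single-variable form, the function names, and the numbers are assumptions made for the example; the talk did not specify which quantities were regressed, and the values are synthetic placeholders, not McPAT output.

```python
def fit_calibration(simulated, measured):
    """Least-squares fit of measured ~ slope * simulated + intercept."""
    n = len(simulated)
    mean_x = sum(simulated) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in simulated)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(simulated, measured))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(raw_estimate, slope, intercept):
    """Map a raw simulator estimate onto the measured scale."""
    return slope * raw_estimate + intercept


# Synthetic placeholder data (watts): simulator output vs. hardware measurement.
simulated_watts = [10.0, 20.0, 30.0, 40.0]
measured_watts = [17.0, 32.0, 47.0, 62.0]

slope, intercept = fit_calibration(simulated_watts, measured_watts)
corrected = calibrate(25.0, slope, intercept)
```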
ML for Cross-Platform Prediction
- Predict the power of new silicon using measurements from existing silicon
- Have a host machine and new simulator
- Run applications on both the simulator and the host to derive the correlation between the two
- Run the complex application on the current hardware, and use the correlation to predict the performance on new hardware
- Used constrained LASSO regression
- It is a piece-wise linear function
- Arriving at this took 2 years for a grad student with good ML experience
- Error dropped from ~30% to
- Use-cases
- Slow simulation
- Hardware-software co-development
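
The cross-platform flow above can be sketched as follows, assuming that "constrained" means non-negative coefficients and that a single linear piece is fitted (a piece-wise version would fit one such model per region). The solver is a plain coordinate-descent LASSO without an intercept. The feature names and all numbers are hypothetical and synthetic, not results from the talk.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the LASSO proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_lasso(rows, targets, penalty, nonnegative=True, sweeps=200):
    """Minimise (1/2n)*||y - Xw||^2 + penalty*||w||_1 by coordinate descent."""
    n, d = len(rows), len(rows[0])
    weights = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for row, y in zip(rows, targets):
                partial = sum(weights[k] * row[k] for k in range(d) if k != j)
                rho += row[j] * (y - partial)
                norm += row[j] ** 2
            if norm == 0.0:
                continue  # an all-zero feature carries no information
            w = soft_threshold(rho / n, penalty) / (norm / n)
            weights[j] = max(w, 0.0) if nonnegative else w
    return weights


def predict(weights, row):
    return sum(w * x for w, x in zip(weights, row))


# Step 1: small applications run on BOTH the host and the new-design simulator.
# Columns (hypothetical): host runtime in seconds, host cache-miss rate.
host_features = [[1.0, 0.5], [2.0, 0.1], [3.0, 0.9], [4.0, 0.3]]
simulated_runtime = [2.0, 4.0, 6.0, 8.0]  # synthetic simulator results

weights = fit_lasso(host_features, simulated_runtime, penalty=0.01)

# Step 2: the complex application runs only on the host; the learned
# correlation predicts its runtime on the new hardware.
predicted_runtime = predict(weights, [5.0, 0.2])
```

In this toy data the second feature is uninformative, so the L1 penalty drives its weight to zero, which is the main reason to prefer LASSO over plain least squares when many host counters are available.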
ML for chip design
- Using ML for things like converting C code to Verilog, and also for Performance and Energy Prediction (PEP)
- Reduces time to market for new chips