Jack Dongarra on High-Performance Computing and Responsibly Reckless Algorithms

202502061354
Status: #idea
Tags: SCI

Jack’s LINPACK benchmark list evolved into the Top500
- Capture the asymptotic rate of throughput, and put that on the benchmark
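A minimal sketch of how a LINPACK-style rate can be measured: solve a dense random system with Gaussian elimination and partial pivoting, then divide a nominal operation count by the elapsed time. The count 2n^3/3 + 2n^2 is the one commonly quoted for LINPACK (lower-order terms differ between variants); the function names `lu_solve`, `linpack_flops`, and `measure` are illustrative and not from any benchmark code. Pure Python is far too slow to say anything about hardware, so this only shows the bookkeeping.

```python
import random
import time


def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    a = [row[:] for row in a]  # work on copies, leave inputs untouched
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper triangle.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_flops(n):
    """Nominal operation count for an n x n dense solve (assumed convention)."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def measure(n, seed=0):
    """Return (flop/s, max residual) for one random n x n system."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = lu_solve(a, b)
    elapsed = time.perf_counter() - start
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return linpack_flops(n) / elapsed, residual


if __name__ == "__main__":
    # Growing n is how a benchmark approaches its asymptotic rate.
    for n in (40, 80, 120):
        rate, residual = measure(n)
        print(f"n={n:4d}  {rate / 1e6:8.2f} MFLOP/s  residual={residual:.2e}")
```

On real machines the rate climbs with n until it flattens out, and that plateau is the number a ranking list reports.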
Attack of the killer micros
- Microprocessors scaled better than large vector processors
- Super-computers used
Dennard scaling ended ~2007
Cloud vendors
- building their own chips
  - AWS Graviton
  - Google TPU
- building their own interconnects, accelerators
Environment for HPC in scientific computing
- Communication is very expensive compared to floating-point ops
- Floating-point formats now range from 64 bits down to 4 bits
  - Nvidia TF32, Google BF16, etc.
  - Nvidia FP8 (2 versions)
    - Forward prop requires more precision on the fraction
    - Back prop requires more range
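The range-versus-precision trade-off can be made concrete by computing the largest finite value and the spacing just above 1.0 for each format from its exponent and fraction widths. This is a sketch under assumptions: the two FP8 variants are taken to be the E4M3 and E5M2 encodings from the OCP 8-bit floating-point specification, and the 4-bit entry is taken to be E2M1; the note itself does not name them. The `FloatFormat` class and its `specials` field are illustrative, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    name: str
    exp_bits: int
    man_bits: int
    # How the all-ones exponent code is used (assumed per format):
    #   "ieee"     : reserved entirely for inf/NaN
    #   "nan_only" : only all-ones exponent + all-ones fraction is NaN (E4M3)
    #   "none"     : no inf/NaN encodings at all (E2M1)
    specials: str = "ieee"

    @property
    def total_bits(self):
        return 1 + self.exp_bits + self.man_bits  # sign + exponent + fraction

    @property
    def bias(self):
        return 2 ** (self.exp_bits - 1) - 1

    @property
    def epsilon(self):
        """Gap between 1.0 and the next representable value."""
        return 2.0 ** -self.man_bits

    @property
    def max_finite(self):
        top = 2**self.exp_bits - 1
        ulp = 2.0 ** -self.man_bits
        if self.specials == "ieee":
            exponent, fraction = top - 1 - self.bias, 2.0 - ulp
        elif self.specials == "nan_only":
            exponent, fraction = top - self.bias, 2.0 - 2.0 * ulp
        else:
            exponent, fraction = top - self.bias, 2.0 - ulp
        return fraction * 2.0**exponent


FORMATS = [
    FloatFormat("FP64", 11, 52),
    FloatFormat("FP32", 8, 23),
    FloatFormat("TF32", 8, 10),  # 19 significant bits, stored in 32
    FloatFormat("BF16", 8, 7),
    FloatFormat("FP16", 5, 10),
    FloatFormat("FP8 E4M3", 4, 3, "nan_only"),
    FloatFormat("FP8 E5M2", 5, 2),
    FloatFormat("FP4 E2M1", 2, 1, "none"),
]

if __name__ == "__main__":
    for f in FORMATS:
        print(f"{f.name:9s} bits={f.total_bits:2d} "
              f"max={f.max_finite:.4g} eps={f.epsilon:.3g}")
```

Under these assumptions E4M3 has the finer fraction (spacing 0.125 above 1.0, maximum 448), which matches the forward-pass need for precision, while E5M2 trades a fraction bit for range (spacing 0.25, maximum 57344), which matches the backward-pass need for range.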

Performance & Benchmarking Evaluation Tools

LINPACK is not a very relevant benchmark
- Floating-point operations are not the hard part
- Real world applications no longer solve a lot of dense matrix problems
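One way to see why a dense solve says little about sparse workloads is a roofline-style estimate: attainable rate is the smaller of peak compute and arithmetic intensity times memory bandwidth. The sketch below is illustrative only. The machine numbers (1000 GFLOP/s peak, 100 GB/s bandwidth) are made up, the traffic models are idealized (each dense matrix moved once; one value and one column index per sparse nonzero, ignoring vectors and row pointers), and the roofline framing is brought in here for illustration rather than taken from the note.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(peak compute, flops/byte * bytes/s)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def dense_matmul_intensity(n, bytes_per_word=8):
    """Flops per byte for C = A * B with ideal caching (A, B read, C written once)."""
    flops = 2.0 * n**3
    bytes_moved = 3.0 * n**2 * bytes_per_word
    return flops / bytes_moved


def csr_spmv_intensity(bytes_per_value=8, bytes_per_index=4):
    """Flops per byte for sparse matrix-vector product, per stored nonzero."""
    flops = 2.0  # one multiply and one add
    bytes_moved = bytes_per_value + bytes_per_index
    return flops / bytes_moved


if __name__ == "__main__":
    PEAK_GFLOPS = 1000.0    # hypothetical machine
    BANDWIDTH_GBS = 100.0   # hypothetical machine
    for name, ai in [
        ("dense matmul, n=1200", dense_matmul_intensity(1200)),
        ("CSR SpMV", csr_spmv_intensity()),
    ]:
        rate = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
        print(f"{name:22s} intensity={ai:7.3f} flop/byte  "
              f"bound={rate:8.2f} GFLOP/s")
```

With these assumed numbers the dense kernel is limited by peak compute, while the sparse kernel is limited by data movement to a small fraction of peak, which is the sense in which the floating-point work is the easy part.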

AMD MI300A has both CPU cores and GPU cores, which are separate from the EPYC CPUs on the compute node.
- What’s the difference between the CPU cores on the MI300A and the EPYC?
- How does the MI300A compare to the GH200?