Review

202502271028
Status: #idea
Tags:

Critique

Introduction

Core Contribution

1. Modular Performance Simulation (Runtime Estimator)

2. Automated Configuration Optimization

3. Realistic Benchmark Suite

Limitations

Points to improve

  • Would have loved to see the prediction on a real-world workload (even if the workload is internal to Microsoft)
    • Would cement the relevance of the simulation
    • Would also like to know what internal processes already exist for finding optimal configurations, and whether Vidur performs as well as (or better than) the methods used in production
  • The graphs do not show error bars, and the paper doesn’t discuss whether the experiments were run multiple times
    • Since the simulator relies on hardware profiling, reporting the mean over repeated runs and confirming a low deviation is necessary
  • Would be nice to see a deeper analysis of the relationship between specific SLOs and QPS per dollar
    • Especially as they note that QPS changes rapidly when the SLO is changed only slightly
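  • It is unclear whether the profiling is fixed-time (run for a set duration, count the work done) or fixed-work (run a set amount of work, measure the time taken)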
    • Systems benchmarks should always be fixed-work[1]
  • Figures and graphs are inaccessible
    • Poor choice of colours
    • Lack of distinct marker symbols makes them neither printer-friendly nor colour-blind friendly
    • Perhaps prefer the bright + ieee styles from SciencePlots
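
The points on repeated runs and fixed-work profiling can be made concrete with a minimal sketch. This is not how Vidur profiles anything; it is a generic harness under my own assumptions, and `run_fixed_work`, `summarize`, `profile`, and the stand-in workload are all hypothetical names. Each trial does the same fixed amount of work, and the trials are summarised by mean, standard deviation, and coefficient of variation, which is the minimum a graph would need to carry error bars.

```python
import statistics
import time


def run_fixed_work(workload, iterations):
    """Fixed-work measurement: time exactly `iterations` calls to `workload`."""
    start = time.perf_counter()
    for _ in range(iterations):
        workload()
    return time.perf_counter() - start


def summarize(samples):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(samples)
    # The sample standard deviation needs at least two data points.
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    cv = stdev / mean if mean else 0.0
    return mean, stdev, cv


def profile(workload, iterations=1000, trials=5):
    """Repeat the same fixed-work measurement `trials` times and summarize."""
    samples = [run_fixed_work(workload, iterations) for _ in range(trials)]
    return summarize(samples)


if __name__ == "__main__":
    # Hypothetical stand-in for a profiled operation.
    mean, stdev, cv = profile(lambda: sum(range(1000)))
    print(f"mean={mean:.6f}s stdev={stdev:.6f}s cv={cv:.2%}")
```

A high coefficient of variation across trials would be a sign that a single-run number should not be trusted as a profiling input.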

Conclusion


References

  1. VIDUR - A Large-Scale Simulation Framework for LLM Inference
  2. My Notes

  1. https://benchmarking-book.com/