Smallest Loss That Compute Can Buy
With Chris Alexiuk and Atul Dhingra
The most expensive part of model training today is GPU time. Given that, it is worth asking how best to spend a fixed compute budget. More formally, the optimization problem is to minimize test loss subject to a fixed budget of training FLOPs. To achieve the