It is well known that progress in machine learning (ML) is driven by three primary factors: algorithms, data, and compute. This makes intuitive sense: the development of algorithms like backpropagation transformed how machine learning models are trained, yielding far more efficient optimization than earlier techniques. Data has become increasingly available, particularly with the advent of “big data” in recent years. At the same time, computing hardware has advanced rapidly, with ever more powerful and specialized AI accelerators.
What is less obvious is the relative importance of these factors, and what this implies for the future of AI. Kaplan et al. (2020) studied these developments through the lens of scaling laws, identifying three key variables: the number of model parameters, the size of the training dataset, and the amount of compute used in training.
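As a reminder of the shape these laws take: in Kaplan et al. (2020), when performance is bottlenecked by just one of these variables, the loss falls off as a power law in that variable. The subscripted constants and exponents are fitted from data; only the general form matters here:

$$ L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C} $$

where $N$, $D$, and $C$ are the parameter count, dataset size, and training compute, respectively.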
Understanding the relative importance of these variables is challenging because our theoretical understanding of them is insufficient. Instead, we need to gather large quantities of data and analyze the resulting trends.
In a previous investigation, Amodei and Hernandez (2018) found that training compute was growing extremely rapidly, doubling every 3.4 months. With approximately ten times more data than the original study, we find a doubling time closer to six months.
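The doubling time comes from fitting a log-linear trend to training compute over time. Here is a minimal sketch of that calculation, using made-up (year, FLOP) pairs in place of the real dataset, which covers hundreds of milestone ML systems:

```python
import numpy as np

# Hypothetical (year, training FLOP) pairs standing in for the real data.
years = np.array([2012.5, 2014.0, 2015.5, 2017.0, 2018.5, 2020.0])
flops = np.array([1e17, 1e18, 5e18, 5e19, 1e21, 2e22])

# Fit log10(compute) as a linear function of time: log10(C) = a * t + b.
slope, intercept = np.polyfit(years, np.log10(flops), 1)

# Compute grows by a factor of 10**slope per year, so it doubles every
# log10(2) / slope years.
doubling_time_months = 12 * np.log10(2) / slope
print(f"Doubling time: {doubling_time_months:.1f} months")
```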
One of the more speculative contributions of our paper is the argument that there have been three eras of machine learning. This contrasts with prior work, which identifies two trends separated by the start of the Deep Learning revolution. Instead, we split the history of ML compute into three eras: the Pre Deep Learning Era, the Deep Learning Era, and the Large-Scale Era.
A key benefit of this framing is that it helps make sense of developments over the last two decades of ML research. Deep Learning marked a major paradigm shift in ML, with an increased focus on training larger models on larger datasets with more compute.
However, there is a fair amount of ambiguity in this framing. For instance, how do we know exactly which models should be considered large-scale? How can we be sure that this “large-scale” trend isn’t just noise?
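One way to probe the noise question is model comparison: fit a single trend to all systems, fit separate trends for a proposed mainstream/large-scale split, and check whether the extra parameters pay for themselves. The sketch below uses made-up data, an illustrative labeling, and a BIC comparison as a stand-in for a proper analysis; it is not the selection procedure used in the paper itself:

```python
import numpy as np

def bic(y, y_hat, n_params):
    """Bayesian information criterion for a least-squares fit,
    assuming Gaussian residuals."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + n_params * np.log(n)

# Hypothetical log10-compute observations, with an illustrative
# "large-scale" labeling for the last four systems.
years = np.array([2016.0, 2016.5, 2017.0, 2017.5, 2018.0, 2018.5,
                  2016.2, 2017.2, 2018.2, 2019.2])
log_flops = np.array([18.0, 18.4, 18.7, 19.1, 19.5, 19.8,
                      21.0, 21.6, 22.3, 22.9])
large_scale = np.array([False] * 6 + [True] * 4)

# Model 1: a single trend for all systems (2 fitted parameters).
single = np.polyval(np.polyfit(years, log_flops, 1), years)
bic_single = bic(log_flops, single, n_params=2)

# Model 2: separate trends for mainstream and large-scale systems (4 params).
two = np.empty_like(log_flops)
for mask in (large_scale, ~large_scale):
    two[mask] = np.polyval(np.polyfit(years[mask], log_flops[mask], 1),
                           years[mask])
bic_two = bic(log_flops, two, n_params=4)

# A substantially lower BIC for the two-trend model is evidence that the
# large-scale split is not merely noise.
print(f"BIC single trend: {bic_single:.1f}, two trends: {bic_two:.1f}")
```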