Potential for a next AI Winter

In the 1940s, Walter Pitts and Warren McCulloch revolutionised computing by showing that artificial neurons could perform basic logical functions. Frank Rosenblatt advanced this concept in 1957 with his perceptron, which was capable of learning from data. However, in 1969, Marvin Minsky and Seymour Papert’s critique highlighted that single-layer perceptrons couldn’t solve non-linearly separable problems such as XOR, leading to scepticism and the first AI winter. It wasn’t until the development of multi-layer networks (deep neural networks, DNNs) and the backpropagation algorithm that these issues were resolved, enabling modern neural networks to solve intricate problems and demonstrate AI’s growing capabilities.

Since the mid-20th century, artificial neural networks (ANNs) have made significant strides, mimicking the brain’s ability to process information. Despite these advancements, ANNs still face challenges with input tensor size, learning efficiency, and operational costs. Today, debates continue about AI’s future, especially the contrast between hyperscalers and other organisations. Hyperscalers invest heavily in large data centres, advanced hardware, and cutting-edge algorithms, allowing them to process and analyse data at unprecedented speeds. Traditional companies, on the other hand, often lack such infrastructure. Budget constraints limit their ability to acquire and maintain the hardware and software needed for large-scale AI and ML projects, forcing them either to rely on more cost-effective but less powerful solutions or to drive up costly cloud consumption.

A more comprehensive understanding of the total cost of ownership (TCO) could indicate whether the next AI winter is approaching. Let’s start from the beginning, focusing on large language models (LLMs). LLMs follow a naming schema that needs to be understood before selecting a model, for example from popular hubs like Hugging Face, which serves more than 400,000 different models. The schema usually consists of the following parts (see the sketch after the list):

  • Model name (e.g., LLama, Falcon)
  • Field of application (e.g., ChatQA)
  • Amount of parameters (sum of all weights and biases)
  • Context window size (maximum length of tokens the model can consider)
  • Quantization format (e.g., GGUF, GGML)
  • Precision of weight storage
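
As a rough illustration, the snippet below splits a hub-style model identifier into those parts. The identifier and the parsing rules are assumptions for illustration only, not an official Hugging Face naming standard.

```python
import re

def parse_model_id(model_id: str) -> dict:
    """Best-effort split of a hub model identifier into the naming parts
    listed above. The pattern is a heuristic, not an official schema."""
    org, _, name = model_id.partition("/")
    parts = {"organisation": org, "raw_name": name}

    # Amount of parameters, e.g. "7B" or "70b"
    if m := re.search(r"(\d+(?:\.\d+)?)[bB]\b", name):
        parts["parameters_billions"] = float(m.group(1))

    # Quantization format, e.g. GGUF or GGML
    if m := re.search(r"(GGUF|GGML|AWQ|GPTQ)", name, re.IGNORECASE):
        parts["quantization"] = m.group(1).upper()

    # Weight precision, e.g. Q4_K_M, int4, fp16
    if m := re.search(r"(Q\d_\w+|int\d|fp\d+)", name, re.IGNORECASE):
        parts["precision"] = m.group(1)

    return parts

# Example with an illustrative identifier:
print(parse_model_id("TheBloke/Llama-2-7B-Chat-GGUF"))
# {'organisation': 'TheBloke', 'raw_name': 'Llama-2-7B-Chat-GGUF',
#  'parameters_billions': 7.0, 'quantization': 'GGUF'}
```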

Based on this model information, we can make a first estimate. The starting point is the “amount of parameters” our model has. Each forward and backward pass involves calculating with all of these weights and biases; put simply, each unit computes a linear component of the form m * x + b. According to the following paper (https://arxiv.org/pdf/2311.03687), we can assume that a relatively modern GPU setup, such as 8xA100 80GB, with an A100 being fairly equivalent to an A800 GPU (sorry, experts), will have a throughput of 5K tokens/second on average for a 7b-10b parameter model. This is considered a small model.
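
As a minimal sketch of that linear component, the code below computes y = W·x + b for a single toy layer; the dimensions are arbitrary and chosen only for the example.

```python
import numpy as np

# Toy dimensions, chosen only for illustration.
n_inputs, n_outputs = 4, 3

rng = np.random.default_rng(0)
W = rng.normal(size=(n_outputs, n_inputs))  # weights (the "m" in m * x + b)
b = rng.normal(size=n_outputs)              # biases (the "b" in m * x + b)
x = rng.normal(size=n_inputs)               # one input vector

# Forward pass of a single linear layer: every entry of W and b takes part
# in this multiply-accumulate, which is why the parameter count drives the
# cost of each forward and backward pass.
y = W @ x + b
print(y.shape)  # (3,)
```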

The number of tokens scales directly with the size of the processed dataset. Let’s assume a common Wikipedia article has around 10,000 tokens, which would result in 2 seconds per article. A small dataset of 1 million articles then results in 2 million seconds, roughly 556 hours. The next factor is the number of epochs required. Assuming a good run of only 3 epochs, we get 556 x 3 = 1,668 hours.
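
The same back-of-the-envelope arithmetic as a small script, so the assumptions (tokens per article, dataset size, throughput, epochs) are explicit and easy to vary:

```python
# Back-of-the-envelope training-time estimate under the assumptions above.
TOKENS_PER_ARTICLE = 10_000       # assumed average Wikipedia article
NUM_ARTICLES = 1_000_000          # assumed small dataset
THROUGHPUT_TOK_PER_S = 5_000      # 8xA100 80GB, 7b-10b model (arxiv 2311.03687)
EPOCHS = 3

total_tokens = TOKENS_PER_ARTICLE * NUM_ARTICLES              # 1e10 tokens
hours_per_epoch = total_tokens / THROUGHPUT_TOK_PER_S / 3600  # ~556 h
total_hours = hours_per_epoch * EPOCHS                        # ~1,668 h

print(f"{hours_per_epoch:.0f} h per epoch, {total_hours:.0f} h for {EPOCHS} epochs")
# -> "556 h per epoch, 1667 h for 3 epochs" (the text rounds to 556 x 3 = 1,668)
```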

If we take common cloud training platform pricing from Hugging Face (https://huggingface.co/docs/inference-endpoints/en/pricing), an NVIDIA A100 8x setup costs roughly $48/hour. Including discounts, cheaper offerings, and a well-performed int4 quantization, let’s assume €10/hour. In summary, training a small model on a minimal dataset would require approximately €16,680. Reflecting on a common model today, with roughly 70 billion parameters trained on roughly 7 trillion tokens, it would require about 3.8 million hours, costing roughly €38 million to train a state-of-the-art model. This calculation assumes a single instance, fully aware that newer H100 GPUs offer a speedup factor of 3-8 but also a comparable increase in investment. Today’s hyperscalers use 40,000 to 350,000 GPUs per run, in parallel.
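
Extending the script above to the cost side. The €10/hour blended rate and the assumption that throughput drops roughly in proportion to parameter count (5K tokens/s at ~7b, therefore ~500 tokens/s at 70b) are simplifications taken from the text, not measured figures:

```python
# Rough single-instance cost extrapolation under a flat blended rate.
RATE_EUR_PER_HOUR = 10.0  # assumed blended 8xA100 rate after discounts

def training_cost(total_tokens, tokens_per_s, epochs=1):
    """Return (hours, euros) for one training run at the flat rate."""
    hours = total_tokens * epochs / tokens_per_s / 3600
    return hours, hours * RATE_EUR_PER_HOUR

# Small model: 1e10 tokens, 3 epochs, 5K tokens/s on one 8xA100 node.
print(training_cost(1e10, 5_000, epochs=3))   # ~1,667 h, ~€16,700

# 70b-class model on ~7 trillion tokens, assuming throughput falls ~10x
# with the ~10x larger parameter count (5K -> ~500 tokens/s).
print(training_cost(7e12, 500))               # ~3.9 million h, ~€39 million
```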

Hyperscalers lead AI and ML research, frequently publishing groundbreaking studies and developing new technologies. They have dedicated research teams and collaborate with academic institutions, pushing the boundaries of AI and ML. Traditional companies, often lacking extensive R&D resources, focus on practical applications and immediate business needs, resulting in slower adoption of the latest advancements. The differences between hyperscalers and traditional companies in AI and ML are significant. Hyperscalers’ ability to harness vast computational power, manage enormous datasets, and invest heavily in R&D enables them to develop and deploy advanced AI and ML solutions at a scale and speed unmatched by smaller companies. While traditional companies can use cloud-based tools and platforms to narrow this gap and focus on fine-tuning existing foundation models, they still face significant challenges in terms of resources and expertise.

This gap between hyperscalers and traditional companies could lead to the next AI winter if no new approaches are adopted. For instance, moving away from the current methods of matrix calculations, or even abandoning this mathematical approach altogether, could be beneficial. Additionally, focusing more on embedding techniques, such as those described in the Matryoshka embedding paper (Matryoshka Representation Learning), might be necessary to avoid stagnation.
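
Very roughly, the idea behind Matryoshka-style embeddings is that leading prefixes of an embedding vector are trained to be usable representations on their own, so cheaper deployments can truncate dimensions instead of running a larger model. The sketch below only illustrates that truncation step with random stand-in vectors; it is not the training procedure from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in embeddings: with Matryoshka-style training, the first k dimensions
# would form a usable lower-cost representation on their own.
full_dim = 768
doc_embeddings = rng.normal(size=(1000, full_dim))
query = rng.normal(size=full_dim)

def cosine_top1(query_vec, doc_matrix, dims):
    """Rank documents using only the leading `dims` dimensions."""
    q = query_vec[:dims]
    d = doc_matrix[:, :dims]
    scores = (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    return int(scores.argmax())

# The same query answered at different truncation levels: a quality-vs-cost dial.
for dims in (768, 256, 64):
    print(dims, cosine_top1(query, doc_embeddings, dims))
```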

Stay tuned, as the coming years promise to be particularly interesting.
