Summary

Language modelling is the task of predicting the next word in a text given the previous words. Large Language Models (LLMs) are machine learning models that learn statistical associations between billions of words and phrases to perform tasks such as generating summaries, translating, answering questions and classifying text. The term is typically used for models tens of gigabytes in size, with billions of parameters, trained on very large amounts of data, often at the petabyte scale. LLMs entered the public consciousness in 2020 with the release of OpenAI’s GPT-3, which materially pushed forward the state of the art (SOTA) in the generation of human-like text. The past three years have seen unprecedented progress in the field, especially in extending LLMs to other content generation tasks, such as text-to-image generation with DALL-E 2 in 2022.
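
As a concrete illustration of next-word prediction, the minimal sketch below asks a small, openly available model (GPT-2, via the Hugging Face transformers library) for the most likely continuations of a prompt. The prompt and model choice are illustrative assumptions, not how any particular commercial LLM is built or served.

```python
# Minimal sketch of next-word prediction with a small open model (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Large language models learn statistical associations between"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # shape: (1, sequence_length, vocab_size)

next_token_logits = logits[0, -1]        # scores for the token that comes next
top5 = torch.topk(next_token_logits, k=5)
print([tokenizer.decode(idx) for idx in top5.indices])  # five most likely continuations
```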

Technical Maturity

Language models have evolved from basic N-gram models to more complex recurrent neural networks (RNNs) and long short-term memory (LSTM) networks. In 2017, Google introduced a new network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. This technique paved the way for whole-sentence and paragraph processing and for parallelisation. Since 2017, language models have grown progressively larger, from ELMo in 2018 with around 94 million parameters, through BERT, GPT-2, T5, Megatron-LM, Turing-NLG, GPT-3, Jurassic-1 and Megatron-Turing NLG, to PaLM with 540 billion parameters. There are suggestions GPT-4 will have 100 trillion parameters. The rush is now on to commercialise LLMs as a value chain emerges around hardware, software and developer tooling.
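
The attention mechanism at the core of the Transformer can be sketched in a few lines. The toy function below implements scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, with random matrices standing in for real query, key and value projections; it illustrates why every token can attend to every other token in a single, parallelisable step, but it is not a full Transformer layer.

```python
# Toy sketch of scaled dot-product attention (the building block of the Transformer).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # every query scored against every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                                        # weighted sum of the values

# Illustrative inputs: 4 tokens, 8-dimensional queries/keys/values.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)            # (4, 8)
```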

Market Readiness

Language models, mainly n-gram models, have been part of daily life for over a decade through search engines, voice assistants and transcription services. These services were adequate, but from 2017 transformer-based models raised accuracy to, and in some tasks beyond, human level. The performance of GPT-3 and DALL-E 2 in particular broke through to the mainstream, bringing attention and investment. On the demand side, there has always been a large market for language-based applications, from RPA to language translation to voice transcription to voice recognition. LLMs are now creating new markets, such as text-to-image and text-to-video generation, that were not previously possible.

LLMs are highly novel. They compete with alternative, older methods such as n-grams and LSTM networks, but their performance is so much better that those methods are no longer really competitive. LLMs are novel because of their generality: the same model can be used for question answering, document summarisation, text generation and more. In terms of long-term advantage, LLMs could be akin to a general-purpose GPU for AI; for some applications it may be advantageous, on cost or performance grounds, to use a smaller model with good enough performance at much lower cost, much as specialised AI accelerators are preferred to GPUs for some workloads. Part of this calculus will depend on the cost/performance of LLMs over time, because LLM providers will benefit from huge economies of scale that niche providers will not.
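
The generality claim can be made concrete with a small sketch: one text-generation model handles nominally different tasks purely through prompting. The gpt2 checkpoint and the prompts below are illustrative stand-ins for a much larger LLM and real workloads.

```python
# Illustrative sketch of LLM generality: one model, different tasks, steered only by the prompt.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = {
    "question answering": "Q: What is the capital of France?\nA:",
    "summarisation": "Article: The Transformer replaced recurrence with attention.\nSummary:",
    "text generation": "Once upon a time,",
}

for task, prompt in prompts.items():
    result = generator(prompt, max_new_tokens=20, do_sample=False)[0]["generated_text"]
    print(f"{task}: {result}")
```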

There are extremely powerful tailwinds because of the generality of the technology and the sheer number of applicable use cases. In terms of market access, at multiple billions and soon trillions of parameters, training cost and available training data are the main bottlenecks. Very few organisations have the resources to train LLMs, and those that do can limit and slow access as they wish, as seen in the limited roll-out of GPT-3 and DALL-E 2 by OpenAI. However, the AI open-source community is very strong, with EleutherAI, Hugging Face and Stability AI among others offering LLMs in an open way. In terms of performance improvements, cost could become the rate-limiting factor, but demand could push forward advances in energy-efficient computing, first with ASICs and FPGAs and, longer term, with optical computing, neuromorphic computing and other exotic computing designs.

Impact

The most speculative, highest-impact scenario sees LLMs scaling to trillions of parameters and serving as one of two pathways to artificial general intelligence (AGI), assuming a pathway exists at all. This scaling hypothesis is just that, a hypothesis, but even at a low probability of success (<5%), AGI would probably be the single most impactful technology humans could ever create, assuming it can solve hard problems around energy, space exploration, longevity and more. Slightly less impactful, but much more probable, is a scenario in which LLMs are the digital equivalent of the factory system: just as mass production and lean production revolutionised physical production, LLMs may be the first digital production tool that enables digital production to scale. The productivity dividend from the digital revolution and the Internet has so far failed to materialise, quite possibly because digital production has remained largely manual. LLMs could be the catalyst that finally delivers the expected productivity gains. Content creation will go through a disruptive change over the next decade as humans and LLMs combine to automate and augment digital production.

Sources

  1. The Promise and Perils of Large Language Models, https://twosigmaventures.com/blog/article/the-promise-and-perils-of-large-language-models/
  2. Open-source language AI challenges big tech’s models, https://www.nature.com/articles/d41586-022-01705-z
  3. A Review of the Neural History of Natural Language Processing, https://ruder.io/a-review-of-the-recent-history-of-nlp/