Summary
Generative AI produces and manipulates media algorithmically rather than by human hand. The resulting synthetic media is a broad category of content spanning code, images, music, language, and video, including deepfakes. The ambition to make machines create is as old as the automata of ancient Greece, but recent developments in AI, such as generative adversarial networks in 2014, transformers in 2017, and diffusion models coming to the fore in 2022, have driven a steady improvement in output quality. Large language models like GPT-3, and diffusion models like DALL-E 2, appear to represent a phase change in input simplicity and output quality.
Viability (5)
Little engineering risk is left for low-information-density media such as text, images and music; for these domains, it is a case of bringing state-of-the-art (SOTA) models to market. For more complex domains like code, video and 3D video, it is unclear whether large language models (LLMs) simply need to scale to more parameters or whether there is some unknown limitation. That limitation may be cost and energy usage, which would require further engineering to overcome. GPT-3 was released in 2020; DALL-E 2, PaLM, BLOOM, Flamingo, Gato and many others followed in 2022. Progress is exceptionally fast and few of these systems have been fully commercialised. OpenAI has noted that it onboarded 100,000 users to DALL-E 2, with a target of 1 million. In the next 2 to 3 years we will certainly see thousands of applications utilising transformer models to create new media. Even without further scaling, which is unlikely, existing models will support transformative products in the next few years.
Drivers (5)
Developments in AI, in particular generative adversarial networks, transformers, and more recently diffusion models (all neural network architectures), have dramatically improved media generation quality. OpenAI's GPT-3 for language generation, GitHub Copilot (built on OpenAI's Codex) for code generation, and DALL-E 2 for image generation from natural language are reference implementations of the art of the possible in the field.
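To illustrate the input simplicity these reference systems share, the sketch below uses open-source stand-ins (GPT-2 via the Hugging Face transformers library and Stable Diffusion via diffusers), since GPT-3 and DALL-E 2 are available only through proprietary APIs; the model names, prompts and hardware assumptions are illustrative rather than prescriptive.

```python
# Minimal sketch: prompt-to-media generation with open-source stand-ins for the
# proprietary models named above. Assumes `pip install transformers diffusers torch`
# and a CUDA-capable GPU for the image step.
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

# Text generation: a single natural-language prompt is the entire interface.
generator = pipeline("text-generation", model="gpt2")
text = generator(
    "Synthetic media will change the creative industries because",
    max_length=60, num_return_sequences=1,
)[0]["generated_text"]
print(text)

# Image generation: the same prompt-in, media-out pattern with a diffusion model.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe("an astronaut riding a horse, oil painting").images[0]
image.save("astronaut.png")
```

The notable point is the interface: a short natural-language prompt is the entire specification, for both text and image output.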
Novelty (5)
Transformer-based synthetic media is of orders-of-magnitude higher quality and accuracy than anything previously possible. The type of media that can be created in 2022 using transformers is not a linear progression from 2021 but a phase change, and as such entirely novel. Relative to human media creation it is likely to be lower quality but much cheaper, especially compared with the expensive labour and IP costs of music and video production. The output that is generated will be not necessarily worse or better, but different. In the short term it will likely be a productivity tool for creators such as coders, writers, artists and game designers; over time, we can expect some job displacement in the creative industries.
Diffusion (5)
The ability to create media faster and more cheaply will drive adoption, but more importantly the ability to create novel and creative things is the step change. This is a classic pattern of cost reduction followed by new value creation. Data, text and music will see impact first, with language, video and then 3D video likely to take a few years as the technology matures. There will be a regulatory response, especially around video and deepfakes, but this is unlikely to materially slow diffusion beyond creating demand for authenticity and verification tools.
Impact (4)
Generative AI significantly reduces production costs for digital content of every kind found on the Internet (and in future metaverses): text, images, code, data and video. Synthetic media will likely replace low-value content creation and augment high-value creators, making them more productive and creative. Lowering the barrier to creation will usher in a new era of AI/human art, design and narrative, while also driving up demand for, and the price of, human attention. The case for a 5 is compelling, but 5s are reserved for technologies with the potential to fundamentally alter the human condition or economic growth.
Sources