Generative AI (or GenAI) is a type of AI that can create new content such as text, images, videos, audio, music or code in response to user prompts. You may have heard of some of the better-known models and tools, such as OpenAI’s ChatGPT and DALL-E, Microsoft’s Copilot, Google’s Gemini, and Midjourney. GenAI reached widespread public attention in late 2022 with the launch of ChatGPT. The technology is rapidly evolving and becoming more sophisticated, with new tools and improvements launched regularly.
A GenAI tool is given a prompt – this could be a text-based question or any other input the tool can process, such as images, music, video or speech. It then generates a response to the prompt based on patterns learned during training. The response can then be refined and fine-tuned by adding further prompts. See the Prompt engineering page for further guidance.
The algorithms within GenAI tools evolved from subfields of Artificial Intelligence called machine learning, neural networks and natural language processing. Neural networks process data in ways loosely inspired by the human brain and “learn” by finding patterns in data sets. GenAI tools have been trained on large amounts of data so that they can generate new content from the patterns they have learned.
The terms GenAI and Large Language Model (LLM) are often used interchangeably, but LLMs are a subset of GenAI focused on human language. Many of the tools most relevant to academia are based on text generation and summarisation, which are provided by LLMs. The integrated AI tools and assistants we often see in academic databases are usually built on LLM technology.
Large Language Models create text, images and code in response to prompts provided by users, and the output is often plausible enough to seem human-written. LLMs are trained on large amounts of text-based data so they can recognise patterns in writing and in the way words and phrases are put together. This knowledge of patterns in human language enables them to predict, based on probability, how likely words are to appear together in a sequence.
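The idea of predicting the next word by probability can be sketched very simply. In the toy example below, the context sentence, the candidate words and their probabilities are all invented for illustration – a real LLM computes such probabilities with a neural network over a vocabulary of many thousands of tokens.

```python
# Toy illustration of next-word prediction: given a context, the model
# assigns a probability to every candidate next word and picks the most
# likely one. These probabilities are invented for illustration only.
next_word_probs = {
    "the cat sat on the": {"mat": 0.62, "sofa": 0.21, "roof": 0.09, "moon": 0.08},
}

def predict_next(context: str) -> str:
    """Return the most probable next word for a known context."""
    probs = next_word_probs[context]
    return max(probs, key=probs.get)

print(predict_next("the cat sat on the"))  # -> mat
```

A real model would not store a lookup table like this; it computes the probabilities on the fly from the whole preceding text, which is what lets it handle prompts it has never seen before.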
LLMs are trained on trillions of words of text, much of it from the public internet. By analysing these words and their relationships – where they appear in a sequence, which other words appear close by, and sometimes which words do not appear – the model builds a representation of each word as a numerical vector (a list of values). Each word’s vector can have hundreds of values capturing its meaning and linguistic characteristics, and words with similar meanings (synonyms) end up with similar values.
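The claim that synonyms end up with similar vectors can be illustrated with cosine similarity, a standard measure of how closely two vectors point in the same direction. The three-dimensional vectors below are invented for illustration; real word vectors have hundreds of dimensions and are learned from data.

```python
import math

# Invented toy word vectors; real embeddings are learned during training
# and have hundreds of dimensions.
vectors = {
    "happy": [0.90, 0.80, 0.10],
    "glad":  [0.85, 0.75, 0.15],
    "table": [0.10, 0.20, 0.90],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(vectors["happy"], vectors["glad"]))   # close to 1
print(cosine_similarity(vectors["happy"], vectors["table"]))  # much lower
```

The synonyms “happy” and “glad” score close to 1, while the unrelated word “table” scores much lower – which is exactly how similar meanings translate into similar values.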
Until recently, language-processing algorithms used recurrent neural networks (RNNs), which scanned words from left to right and processed them sequentially, limiting their ability to understand the context in which words are used. In 2017, Google researchers made a breakthrough in natural language processing by introducing the transformer architecture with self-attention. This allows a model to look at whole sentences, paragraphs and pages of text at once rather than sequentially, so it can better understand the relationships between words and determine which surrounding words matter most for how and when a word is used.
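The core of self-attention can be sketched in a few lines: every word scores its relevance to every other word in the sentence at once, then builds a new representation as a weighted average. This is a heavily simplified sketch – real transformers use separate learned query, key and value projections across many attention heads, and the word vectors below are invented.

```python
import math

# Each word is represented by an invented 2-dimensional vector.
words = ["the", "river", "bank"]
vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 1.0]]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """For each word, mix in the other words' vectors, weighted by relevance."""
    outputs = []
    for q in vectors:
        # Score this word against every word in the sentence at once.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in vectors]
        weights = softmax(scores)
        # New representation: attention-weighted average of all word vectors.
        out = [sum(w * v[d] for w, v in zip(weights, vectors))
               for d in range(len(q))]
        outputs.append(out)
    return outputs

for word, out in zip(words, self_attention(vecs)):
    print(word, [round(x, 2) for x in out])
```

Because each word's scores cover the whole sentence simultaneously, “bank” can weigh “river” heavily and shift its representation towards the riverside sense rather than the financial one – the contextual disambiguation that sequential RNNs struggled with.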
Because they model how words are used together, LLMs can encode a prompt entered by a user and then generate a response by repeatedly predicting which word is most likely to appear next in the sequence.
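The generation step described above can be sketched as a loop that repeatedly appends the most probable next word. The word-pair probabilities below are invented for illustration, and for simplicity each prediction looks only at the previous word – a real LLM conditions every prediction on the entire preceding text.

```python
# Invented toy probabilities: for each word, how likely each next word is.
# "<end>" marks the point where generation should stop.
bigram_probs = {
    "paris":   {"is": 0.90, "has": 0.10},
    "is":      {"the": 0.80, "a": 0.20},
    "the":     {"capital": 0.70, "city": 0.30},
    "capital": {"of": 0.95, "<end>": 0.05},
    "of":      {"france": 0.90, "europe": 0.10},
    "france":  {"<end>": 1.00},
}

def generate(start: str, max_words: int = 10) -> str:
    """Greedily generate text one word at a time from the toy model."""
    words = [start]
    while len(words) < max_words:
        options = bigram_probs.get(words[-1], {})
        if not options:
            break
        nxt = max(options, key=options.get)  # greedy: pick the most probable word
        if nxt == "<end>":
            break
        words.append(nxt)
    return " ".join(words)

print(generate("paris"))  # -> paris is the capital of france
```

Real systems often sample from the probabilities rather than always taking the top word, which is why the same prompt can produce different responses each time.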
For an accessible explainer on how GenAI works, with helpful visual diagrams, see: Visual Storytelling Team and Murgia, M. (2023, September 12). Generative AI exists because of the transformer. Financial Times. https://ig.ft.com/generative-ai/
More on transformers and RNNs can be found in this Google Research blog post: Uszkoreit, J. (2017, August 31). Transformer: A novel neural network architecture for language understanding. Google Research. https://research.google/blog/transformer-a-novel-neural-network-architecture-for-language-understanding/