Why the future of AI is flexible, reusable foundation models
When learning another language, the easiest way to start is with fill-in-the-blank exercises. “It’s raining cats and…”
By making and correcting mistakes, your brain (which linguists agree is wired for language learning) starts to discover patterns in grammar, vocabulary, and word order. Those patterns can be applied not only to filling in blanks, but also to conveying meaning to other people (or computers, dogs, etc.).
That second ability, applying what you have learned to new situations, is what matters when we talk about so-called “foundation models,” one of the hottest (yet underreported) topics in artificial intelligence right now.
According to a 2021 review article, foundation models are “trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks.”
In plain language: just as with fill-in-the-blank exercises, foundation models learn things in a way that can later be applied to other tasks, which makes them more flexible than traditional, task-specific AI models.
Why are foundation models different?
The way foundation models are trained addresses one of the biggest bottlenecks in AI: labeling data.
When a website asks you to select “all images with a boat” to prove you’re not a robot, you’re essentially labeling data. Those labels can then be used to feed images of boats to an algorithm so that, eventually, it can reliably recognize boats on its own. This is how AI models have traditionally been trained: with data labeled by humans. It is a time-consuming process that takes a lot of people.
Foundation models do not need these kinds of labels. Instead of relying on human annotations, they use the fill-in-the-blank method: part of the input is hidden and the model learns by predicting it, so the data itself provides the feedback. This lets them keep learning and improving their performance without human supervision.
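To make the fill-in-the-blank idea concrete, here is a minimal toy sketch in Python. It is an illustration under simple assumptions, not how any particular foundation model is implemented: it turns a sentence of unlabeled text into training examples by hiding one word at a time, so the hidden word itself serves as the label. The function name `make_fill_in_examples` and the `[MASK]` token are hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical placeholder token that stands in for the hidden word.
MASK = "[MASK]"


def make_fill_in_examples(text: str) -> List[Tuple[str, str]]:
    """Turn raw, unlabeled text into (input, target) training pairs.

    Each pair hides exactly one word; the hidden word becomes the target,
    so no human has to supply a label.
    """
    words = text.split()
    examples = []
    for i, word in enumerate(words):
        # Copy the sentence with position i replaced by the mask token.
        masked = words[:i] + [MASK] + words[i + 1:]
        examples.append((" ".join(masked), word))
    return examples


if __name__ == "__main__":
    for prompt, answer in make_fill_in_examples("It's raining cats and dogs"):
        print(prompt, "->", answer)
```

Real models do this over enormous amounts of text and learn to predict the hidden piece; the point of the sketch is only that the "answers" come from the data itself rather than from human labelers.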
This makes foundation models more accessible to industries that don’t yet have large amounts of data. According to Dakshi Agrawal, IBM Fellow and CTO of IBM AI, depending on the domain you’re training a foundation model in, a few gigabytes of data can be enough.
These complex models may sound far removed from everyday users like you, but you’ve almost certainly seen a foundation model at work online at some point. Some of the best known are the GPT-3 language model, which, when fed works by famous writers, can produce remarkable imitations, and DALL-E, which produces stunning images from users’ text prompts.
Beyond creating new forms of entertainment, the flexibility of foundation models can help accelerate cutting-edge medical research, scientific discovery, engineering, architecture, and even programming.
Emergent properties
Foundation models are characterized by two very interesting properties: emergence and homogenization.
Emergence refers to new, unexpected capabilities that models display and that were absent in previous generations. It usually happens as models get bigger. A language model that can perform basic arithmetic is an example of a somewhat unexpected emergent property.
Homogenization is a fancy term for a single model, trained to understand and use the English language, being used to perform many different tasks. Think of summarizing a piece of text, writing a poem in the style of a famous writer, or interpreting a command from a human (the GPT-3 language model is a good example of this).
But foundation models are not limited to human language. Essentially, we teach a computer to find patterns in processes or phenomena, which it can then reproduce under given conditions.
Let’s unpack that with an example. Take molecules. Physics and chemistry dictate that molecules can only exist in certain configurations. The next step is to define a use for those molecules, such as medicines. A foundation model can then be trained on large amounts of medical data to understand how different molecules (in this case, drugs) interact with the human body in the treatment of disease.
This insight can then be used to “fine-tune” the foundation model so that it can suggest which molecule might work in a given situation. This could speed up medical research considerably, allowing professionals to simply ask the model to come up with molecules that might have certain antibacterial properties, or that could act as a drug against a specific virus.
However, as mentioned earlier, this can sometimes produce unexpected results. Recently, a group of scientists who had been using a foundation model to search for cures for rare diseases discovered that the same model could also be used to discover the most powerful chemical weapons known to man.
Fundamental concerns
One small sign of the big change these models could bring is the emergence of companies offering “prompt generators,” which people use to come up with prompts that reliably get models like Midjourney or DALL-E to produce interesting or accurate images.
Naturally, models like these also cause controversy. Recently, a number of artists have spoken out against the use of their artwork to train image-generation models.
There are also concerns about the energy needed to train a large-scale model. On top of that, the significant computing resources required to create a foundation model mean that only the world’s largest tech companies can afford to train one.
On the other hand, as Agrawal explained, efficiency gains in training and running these models are making them accessible to more people at an ever-increasing rate, while reducing both energy consumption and costs.
Another, more foundational (sorry) problem with these models is that any biases or errors in the original model can carry over to the tools built on top of it. So if racist language ends up in the training data for a language model, it can lead to offensive output and even lawsuits against the company in question.
One way to avoid this is to manually remove unwanted training data; another, more futuristic method is to use so-called synthetic data. Synthetic data is essentially fake data generated by an AI model to mimic the real thing, but in a more controlled way. This can help ensure that a foundation model is not exposed to offensive or privacy-sensitive data during training.
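The two approaches can be sketched side by side in Python. This is a deliberately simplified illustration under stated assumptions, not a production data pipeline: `filter_documents` drops any document containing a word from a hypothetical blocklist, and `synthesize_records` generates fake records that copy only an aggregate property (the age range) of the real data. A simple seeded random generator stands in for the AI model that would normally produce synthetic data; all names, terms, and records here are invented for the example.

```python
import random
from typing import Dict, List, Set

# Hypothetical blocklist; real pipelines use far richer filters.
BLOCKLIST: Set[str] = {"badword", "offensiveterm"}


def filter_documents(docs: List[str], blocklist: Set[str]) -> List[str]:
    """Approach 1: drop any training document containing a blocklisted word."""
    kept = []
    for doc in docs:
        # Normalize tokens: strip simple punctuation and lowercase them.
        tokens = {t.strip(".,!?").lower() for t in doc.split()}
        if tokens.isdisjoint(blocklist):
            kept.append(doc)
    return kept


def synthesize_records(real: List[Dict], n: int, seed: int = 0) -> List[Dict]:
    """Approach 2: generate fake records that mimic the shape of real ones.

    Only the age range is taken from the real data; names come from an
    invented pool, so no real person's details end up in the output.
    A seeded random generator stands in for a generative AI model here.
    """
    rng = random.Random(seed)
    ages = [r["age"] for r in real]
    lo, hi = min(ages), max(ages)
    fake_names = ["Alex", "Sam", "Robin", "Kim", "Jo"]  # invented placeholders
    return [
        {"name": rng.choice(fake_names), "age": rng.randint(lo, hi)}
        for _ in range(n)
    ]


if __name__ == "__main__":
    docs = ["A normal sentence.", "This has a badword in it."]
    print(filter_documents(docs, BLOCKLIST))
    real = [{"name": "Jane Doe", "age": 34}, {"name": "John Roe", "age": 58}]
    print(synthesize_records(real, 3))
```

The design choice worth noting is what the synthetic generator is allowed to see: by copying only coarse statistics rather than individual rows, it keeps the shape of the data while leaving out the privacy-sensitive details.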
Will more advanced AI models take over our jobs?
Well, yes and no.
Most AI researchers see these models as tools. An electric screwdriver meant fewer hours spent putting together a wooden structure, but it still took a person to operate it.
Take IBM’s foundation model Ansible Wisdom. In a quest to find out whether computers can be taught to program computers, researchers fine-tuned a model to generate Ansible code snippets that previously had to be written by hand. It allows developers to use natural language to ask the model to suggest, say, the automation needed to deploy a new web server.
Agrawal thinks this will completely change programmers’ jobs:
“The entire innovation cycle will accelerate thanks to AI. If you look at code, for example, even the first generation of foundation models makes coding much faster. I am sure that productivity will double within a few years.”
IBM is releasing the model as an open source project in partnership with Red Hat, best known for its distribution of the open source Linux operating system.
This is much like the electric screwdriver: a tool automates parts of a mundane task so that it can be done more efficiently, saving developers time they can then spend on more creative work.
“It can take over activities that people do today, and people just move on to other activities. I think 80% of the American population used to work in agriculture. Now it is less than 2% (according to the USDA ERS: Agriculture and Food Sectors and the Economy). People have moved on to other activities, and our quality of life has improved as a result,” said Agrawal.
Foundation models have the potential to change many processes that are currently tedious or repetitive for humans. They also offer the opportunity to come up with radical and unpredictable solutions to some of the most difficult problems we face. In fact, foundation models could represent a complete paradigm shift in the way knowledge is created and applied. The key will be to ensure that these models are made accessible to the general public, with the right safeguards.