ChatGPT started a new kind of AI race – and made text boxes cool again
It’s pretty obvious that no one saw ChatGPT coming. Not even OpenAI. Before it became, by some measures, the fastest-growing consumer app in history, before it popularized the phrase “generative pre-trained transformer,” before every company you can think of raced to adopt the underlying model, ChatGPT launched last November as a “research preview.”
The blog post announcing ChatGPT is now a hilarious case study in underselling. “ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response,” it read. “We are excited to introduce ChatGPT to get users’ feedback and learn about its strengths and weaknesses.” That’s it! That’s the whole pitch! No poetic grandstanding about fundamentally changing how we interact with technology, not even a line about how cool it is. It was just a research preview.
But now, barely four months later, it looks like ChatGPT really is about to change the way we think about technology. Or, perhaps more accurately, change it back. Because as it turns out, the future of technology isn’t some sleek new interface or the metaverse. It’s “typing commands into a text box on your computer.” The command line is back – it’s just a lot smarter now.
Really, generative AI is headed in two directions at once. The first is much more infrastructural: adding new tools and capabilities to the things you already use. Large language models like OpenAI’s GPT-4 and Google’s LaMDA will help you write emails and memos; they’ll automatically spruce up your slide decks and catch mistakes in your spreadsheets; they’ll edit your photos better than you can; they’ll help you write code and, in many cases, just write it for you.
This is pretty much the path AI has been on for years, right? Google has been integrating all kinds of AI into its products for a long time, and even companies like Salesforce have launched serious AI research efforts. These models are expensive to build, expensive to train, and expensive to query, but they can be game-changing for business productivity. AI-powered improvements to the products you already use are a big business – or at least a big investment – and will be for a long time to come.
The other direction, in which interacting with the AI itself becomes the consumer product, was a much less obvious development. It makes sense now, of course: who wouldn’t want to talk to a robot that knows all about movies and recipes and what to do in Tokyo, and that, if you say just the right things, might go completely off the rails and profess its love for you? But before ChatGPT took the world by storm, and before Bing and Bard both got the idea and tried to turn it into products of their own, I certainly wouldn’t have bet that typing into a chat window would become the next big thing in user interfaces.
In a way, this is a return to a very old idea. For many years, most users communicated with computers only by typing on a blank screen – the command line was how you told the machine what to do. (Yes, ChatGPT actually runs on a lot of machines, and none of them are on your desk, but you get the idea.)
But then a funny thing happened: we invented better interfaces! The trouble with the command line was that you had to know exactly what to type, and in what order, to get the computer to behave. Pointing and clicking at big icons was far easier, and pictures and menus made it much simpler to show people what the computer could do. The command line gave way to the graphical user interface, and the GUI still reigns supreme.
But developers never stopped trying to make the chat UI work. WhatsApp is a good example: the company has spent years trying to figure out how to let users chat with businesses. Allo, one of Google’s many failed messaging apps, hoped you’d interact with an AI assistant right inside your chats with friends. The first round of chatbot hype, circa 2016, convinced a lot of very smart people that messaging apps were the future of everything.
There’s just something appealing about the messaging interface, about “conversational AI.” For starters, we all already know how to use it: messaging apps are how we keep in touch with the people we care about most, which means they’re where we spend a lot of our time and energy. You may not know how to navigate the menus of the Uber app or where to find your frequent flyer number in the Southwest app, but “text these words to this number” is a behavior almost everyone understands. In a world where people don’t want to download apps and mobile websites are still mostly bad, messaging can radically simplify those experiences.
And while messaging isn’t the most sophisticated interface, it’s arguably the most extensible. Take Slack: you probably think of it as a chat app, but inside that back-and-forth interface you can embed links, editable documents, interactive polls, helpful bots, and much more. WeChat is famously an entire platform – an entire internet, really – crammed into a messaging app. You can start with messaging and go almost anywhere from there.
But so many of these tools stumble in the same way. Chat is perfect for a quick exchange of information, like checking store hours: ask a question, get an answer. But browsing a catalog as a series of messages? No thanks. Buying a plane ticket over a thousand messages back and forth? Hard pass. It’s the same problem voice assistants have, and God help you if you’ve ever tried to buy even simple things with Alexa. (“For Charmin, say ‘three.’”) For anything complicated, a visual, purpose-built interface beats a message box every time.
And when it comes to ChatGPT, Bard, Bing, and the rest, things get complicated very quickly. These models are smart and capable, but you still need to know exactly what to ask, in what way, and in what order, to get what you want. The idea of a “prompt engineer” – a person paid to know exactly how to coax the perfect image out of Stable Diffusion or get ChatGPT to generate just the right JavaScript – sounds ridiculous, but it’s actually a necessary part of the equation. It’s not unlike the early days of computing, when only a few people knew how to tell the computer what to do. There are already marketplaces where you can buy and sell really good prompts; there are prompt gurus and books about prompting; I assume Stanford is already working on the Prompt Engineering major everyone will soon be taking.
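To make that concrete, here’s a minimal sketch of what prompt engineering amounts to in practice. It assumes OpenAI’s Python library as it looked in early 2023; the `ask` helper and both prompts are my own illustration, not anything from OpenAI. The point is simply that the same model, asked two different ways, can return very different code.

```python
# A minimal sketch: the same model, prompted two ways, behaves very differently.
# Assumes the OpenAI Python library (circa early 2023) and an API key in OPENAI_API_KEY.
import openai

def ask(prompt: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the comparison as repeatable as possible
    )
    return response["choices"][0]["message"]["content"]

# A vague prompt: the model has to guess what "sort my stuff" even means.
print(ask("Write some JavaScript to sort my stuff."))

# An engineered prompt: explicit input shape, output format, and constraints.
print(ask(
    "Write a JavaScript function sortByDate(items) that takes an array of "
    "objects with an ISO-8601 createdAt field and returns a new array sorted "
    "newest-first. Return only the code, with no explanation."
))
```

The second prompt is longer and fussier, and that’s exactly the skill being bought and sold in those prompt marketplaces: knowing which details the model needs spelled out before it will give you what you actually wanted.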
The remarkable thing about generative AI is that it feels like it can do almost anything. That’s also the whole problem. If it can do anything, what do you do? Where do you start? How do you learn what it’s capable of when your only window into it is a blinking cursor? Eventually, these companies may build more visual, more interactive tools that help people really understand what the technology can do and how it all works. (This is one reason to keep an eye on ChatGPT’s new plugin system, which is pretty basic for now but could quickly expand what you can do from the chat window.) For now, the best any of them can do is offer a few suggestions for things you might type.
AI used to be a feature. Now it’s the product. And that means the text box is back. Messaging is the interface once again.