The Story So Far: Where Have All The AI Cowboys Gone?
Risk, power, and the yin and yang of large language models
When we last left Big Tech, it was secretly (and allegedly) trying to wield its influence in Congress to clamp down on AI startups by making the case that AI is “unsafe.”
Now, “unsafe” is an odd objection to hear from the “move fast and break things” crowd, so my previous post may have left you wondering why Google, Meta, and the other large players — players who have their own internal AI efforts that are rumored to be very far ahead of what the public sees — don’t just squash the AI upstarts by unleashing their own models on the public and figuring out the guardrails and revenue models later.
The answer lies in the relationship between size, capability, and risk in large language models. Furthermore, this relationship has its roots in the mechanics of how these models work and in the nature of language itself.
👉 To summarize up-front:
Big companies = big (+ costly to train and run) models = more capable + larger model attack surface + lower risk tolerance
Small startups = smaller (+ cheaper to train and run) models = less capable + smaller model attack surface + higher risk tolerance
I’ll try to untangle this relationship as best I can in this newsletter because it’s a key factor in the debate over who gets to use AI and how. This factor will also play a role in the impact of AI on the business models of Google and other big tech platforms.
Big companies, small risks
Google, in particular, has long been willing to launch major products (or acquire them) and run them at great cost on the assumption that it will figure out how to make money on them eventually. So you’d think the urgent threat from AI startups would warrant at least that level of risk-taking. Indeed, the creator of Gmail, one of those products that originally had no business model, is now saying that Google’s search will be toast within two years thanks to ChatGPT-style products.
So why not bring whatever secret, internal Godlike AI they have to market, demonstrate that Google can still dominate competitors, and work on how to use it to sell ads later?
👷‍♂️ I’m not sure I know the complete answer to this question, but I do know that part of the answer really does have to be “safety” — but not so much the public’s safety as Google’s safety.
Google and Meta are simply bigger targets with more at risk from models gone wild. They have different relationships with the government, large advertisers, and consumers, and those relationships constrain their products in ways that are not true for much smaller players.
😬 So we end up in a world where Google and Facebook can make incredibly large, powerful models, but they just can’t show them off to the public because the risk of GIGO dunks is too great.
🤠 In contrast, smaller players are freer to go full cowboy and wow the public with their smaller models because the inevitable GIGO dunks aren’t as damaging to them.
🥱 Anyway, this point is pretty straightforward and not very interesting, but I felt I had to put it out there for completeness’ sake. It has also been reported very thoroughly in the WaPo article linked in this tweet, so click that if you want to read a lot more backstory on this issue of Big Tech’s unwillingness to risk GIGO dunks by releasing their AI for public use:
With that out of the way, I can get to the really interesting part of this update, which is where I unpack the reasons why really big, powerful models must necessarily have all this bad stuff in them that their corporate overlords need to keep hidden from the public.
Big models, big attack surfaces
🎯 There’s a part in the intro to this piece where I say a bigger model means a “larger model attack surface.” What I mean by this is that bigger models know more bad/toxic/problematic/false things than smaller models, so their internals present fatter targets for haters who want to coax out trophy examples of “toxic AI” for viral tweets and articles.
In this section, I want to explain why there’s no “fix” for the problem of models having naughty stuff inside them that, if accessed by the public, could get BigCos in trouble. This is how it’s always going to be, and there isn’t even really a theoretical solution for it. The problems are quite fundamental.
Here’s what’s going on:
📈 Because of the way scaling laws work in ML, throwing more data and more computing power at training and running a model translates into the following two benefits:
The model memorizes more things in its dataset verbatim. This memorization used to be thought of as a problem, but more recently it’s understood to play a role in the relationship between scale and how good the model is at doing whatever it was trained to do.
The model knows more things about its dataset. It has a better internal representation of the relationships between different concepts at different levels of abstraction.
For an example of #2 above, a small model trained on less data might know that a kitten is a juvenile cat and a mammal, but it may not make the connection with cat deities in ancient Egypt. A larger model, in contrast, may have seen a big enough chunk of the internet to have “ancient Egyptian cat deities” in its training data and enough parameters to represent all the relevant concepts (i.e., kitten, cat, juvenile/mature, ancient/modern, Egypt, deity) and the relationships between them.
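(For the quantitatively inclined: the empirical scaling-law papers, such as DeepMind’s “Chinchilla” work, fit a model’s loss with a formula along the lines of loss(N, D) ≈ E + A/N^α + B/D^β, where N is the parameter count, D is the number of training tokens, and E, A, B, α, β are fitted constants. Take the exact form as a rough sketch rather than gospel; the point is simply that more parameters and more data reliably buy you a lower loss, i.e., a model that has absorbed more of its training distribution.)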
In theory, both of the above benefits of model size should be unqualified goods, but in practice, they both have very serious downsides that can expose the model makers to reputational and legal risks. Let’s take these one at a time.
Memorization’s downsides
Everyone knows that LLMs memorize input data, and we’ve been seeing this come up in the anti-Codex lawsuit where the model is spitting out GPLed code verbatim. But it comes up in images, too:
©️ The obvious problem with a large model memorizing its inputs is copyright violation.
If the model is memorizing copyrighted inputs and reproducing them, which models are definitely doing and will do a lot more of as they get bigger, this will give ammo to the alliance of generative AI haters I described in my previous post. Here are the main options for fixes:
Train models on public domain material
Win the lawsuits in court and establish new precedents
Abolish copyright, or at least significantly curtail it
🫢 The other problem with memorization is that secrets and personal info can show up in the model’s output. This can theoretically be solved by more careful human curation of training data, but it is a hard problem and will probably be with us for a while.
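To give a flavor of why curation is hard, here’s a minimal sketch of the kind of regex-based PII scrubbing that gets applied to training corpora. The patterns and the `scrub` function are my own illustration, not any particular lab’s pipeline, and the closing comment notes how much a filter like this misses:

```python
import re

# Hypothetical, minimal PII scrubber for illustration only.
# Real curation pipelines are far more elaborate, and even they miss plenty.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[\s.-]\d{3}[\s.-]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace matched PII with placeholder tokens before the text enters a training set."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text

print(scrub("Reach me at jane.doe@example.com or 555-123-4567."))
# -> "Reach me at [EMAIL] or [PHONE]."
# Names, street addresses, pasted API keys, and anything oddly formatted sail right
# through, which is why memorized secrets still show up in model output.
```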
Now let’s move on to the downsides associated with the model knowing more things, especially more things about language. This is where it gets interesting.
Knowing too much
🧠 In the course of a training run, an LLM builds up an internal map of the many relationships between the tokens (words, images, computer code, etc.) in its training data. Without going into any detail on exactly how that works, the take-home is this: an LLM like ChatGPT needs to bootstrap a sophisticated understanding of human language starting from zero, so the more examples of language you train it on and the more computing power you throw at it, the better it can do that. Right now, LLM makers are training on really big, wild-and-woolly datasets because they have to.
(🌈 FYI: The relationships an LLM infers among the tokens are represented mathematically by sets of numbers called embeddings, which you can think of as coordinates in multidimensional space. If you want to know more about how embeddings work, this is a good intro.)
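If you want a more concrete feel for “coordinates in multidimensional space,” here’s a toy sketch. The vectors are invented for illustration (real embeddings are learned by the model and have hundreds or thousands of dimensions), but the cosine-similarity math is the standard way of asking how close two concepts sit in that space:

```python
import math

# Toy 4-dimensional embeddings, invented purely for illustration.
embeddings = {
    "kitten":  [0.9, 0.8, 0.1, 0.0],
    "cat":     [1.0, 0.7, 0.2, 0.1],
    "Bastet":  [0.6, 0.3, 0.9, 0.8],  # ancient Egyptian cat deity
    "toaster": [0.0, 0.1, 0.0, 0.2],
}

def cosine_similarity(a, b):
    """Standard cosine similarity: ~1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

for word in ("cat", "Bastet", "toaster"):
    sim = cosine_similarity(embeddings["kitten"], embeddings[word])
    print(f"kitten vs {word}: {sim:.2f}")
# A bigger model trained on more data ends up with a space where "kitten" sits close to
# "cat" AND measurably close to "Bastet"; a smaller model may only capture the first link.
```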
🤷‍♂️ But not every human understands language in the same way or uses it in the same way. Not everyone is working with the same facts, words, or relationships between words and concepts.
More importantly, though: even people who do indeed share the same facts, words, and relationships between words and concepts often make very divergent choices about how to reflect their internal understandings in their own outward-facing speech. They make different choices because their speech acts occur in different contexts of relationships, status, and power — none of which can really be applied to a one-size-fits-all, public-facing language model.
➡️ To sum up: You have this huge pile of language that you’ve trained this AI on for the purpose of helping it understand language well enough to produce it fluently, and inside that pile is good language and bad language, or good concepts and bad concepts, or true relationships and false relationships.
💡 You might naively think the answer is to just train the model on only the good language — teach it only things that are true, beautiful, and unproblematic or non-cancel-worthy.
It doesn’t work this way, though. You have to teach the model the bad along with the good. I’m not the first person to point this out (I can’t find the link where I first read about this issue, but if I do I’ll update. Update: Found it!), and the reasons for it are pretty deep. But I can illustrate the point easily with a pair of examples.
Examples: Limericks and nekkid people
Let’s say I’m training my own version of ChatGPT — call it “JonGPT” — and I want it to produce limericks. JonGPT is supposed to be G-rated so my kids can use it, but of course, most limericks are pretty bawdy.
To pull this off, I’d need a big enough corpus of clean limericks to really give my model a good understanding of how to make them, and good luck finding one of those. In practice, I’ll need to train JonGPT on a database of mostly bawdy limericks.
So there’s a practical problem with the naive plan to train it on clean limericks only, but there’s also a more fundamental problem: if the model has never seen a bawdy limerick, then it won’t know when it has accidentally produced one.
🐍🍎 The model cannot act in ways that are good if it doesn’t even know what bad looks like. It has to eat from the tree of the knowledge of good and bad if it’s going to actively avoid stumbling into the bad.
The solution, then, is something like the following:
Train the model on both dirty limericks and clean limericks
Somehow educate the model about the difference between the two types of limericks
Ensure that the model produces only clean limericks
This seems to be what ChatGPT has more or less done because the limerick results are astonishingly lame:
😅 The model is really stretching to keep it clean — you can see the sweat on its virtual brow like in that classic scene from “The Ladies Man” where Tim Meadows is interviewing the nun… (This is a family-friendly newsletter so I won’t link that scene, but if you’re an adult and you haven’t seen the movie, then Google it because it’s hilarious and the only really funny scene in the whole movie.)
My points:
Training the model on both types of limericks is both easy and necessary
Educating the model about the difference between the two limerick types is harder, but we can now do it with RLHF.
Keeping the model from accidentally surfacing its knowledge of dirty limericks when it’s producing output is the hard part, and is where OpenAI’s moderation API comes in.
If you can’t pull off number 3 — i.e., preventing your model from sharing with the public the parts of its internal knowledge that will expose you to reputational and legal risk — then as a BigCo you can’t let the public use it.
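To make point 3 a bit more concrete, here’s a rough sketch of the filter-the-output pattern. I’m using OpenAI’s hosted moderation endpoint as the example classifier (the endpoint and its “flagged” field are documented, but treat the exact request handling here as my assumption rather than production code), and `generate_limerick` is a hypothetical stand-in for whatever model you’re wrapping:

```python
import os
import requests

MODERATION_URL = "https://api.openai.com/v1/moderations"  # OpenAI's hosted content classifier

def is_flagged(text: str) -> bool:
    """Ask the moderation endpoint whether this text crosses the line.
    Assumes the documented request/response shape: {"input": ...} in,
    results[0]["flagged"] out."""
    resp = requests.post(
        MODERATION_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"input": text},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["results"][0]["flagged"]

def generate_limerick(prompt: str) -> str:
    """Hypothetical stand-in for the actual model call (ChatGPT, JonGPT, whatever)."""
    raise NotImplementedError

def safe_limerick(prompt: str, max_tries: int = 3) -> str:
    """Step 3 in practice: keep sampling until an output passes the filter, or give up."""
    for _ in range(max_tries):
        candidate = generate_limerick(prompt)
        if not is_flagged(candidate):
            return candidate
    return "Sorry, I couldn't come up with a clean one."
```

The catch, of course, is that the filter has to be at least as worldly about “bad” as the model it’s guarding, which is the yin/yang point this whole piece is building toward.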
🙈 My second example is from Stability AI and my recent coverage of the controversial differences between version 1.5 of Stable Diffusion and version 2.0. Version 1.5 was trained on pictures of naked people — there was porn in the training data — and the naughty bits in its dataset have given it two important abilities:
It’s good at rendering human bodies from different angles and in varied states of dress or undress.
It can be used to render porn.
When Stability AI depornified the dataset for version 2.0, the new model was a lot worse at rendering human bodies, and users revolted. So they had to release a tweaked version of the model (2.1) with relaxed rules on nudity to quell some of the outrage. Again, the problematic training data was critical to the model’s capabilities and effectiveness.
🤷‍♀️ In contrast to OpenAI’s heroic efforts at keeping the limericks G-rated, Stability AI didn’t even try steps two and three above with their generative image model. They just put it out there for people to use however they liked, and the results are… impressive, as I mentioned in a previous newsletter.
The lesson here is that even the most righteous committee of woke commissars could not train an unproblematic model, and not even the Vatican could make a model that knows no sin — at least, not a model that’s capable enough to be interesting. Such pure, virtuous training is not even theoretically possible. Thus it is that LLMs by their very nature must contain the good along with the bad, the light and the dark, the yin and the yang.
Note that this will be true even if and when we figure out how to train highly capable language models on much smaller bodies of text. (This is an area of ongoing research, and I’m sure there will be a lot of rapid progress.) A model that has never seen toxic language can’t recognize when it’s accidentally about to produce it, so we’ll always be training language models on both good and bad data.
Problems out the yin-yang
☯️ Given the yin/yang situation with LLMs, the only thing a Google or a Facebook can do to protect its downside is to prevent the bad stuff that’s in their big models from getting out. But the lower a company’s risk tolerance, the higher the bar for sanitizing its models adequately, which is why the following balance of power will hold for the foreseeable future:
BigCos will keep their bigger, more powerful models under lock and key, and only let the public touch them in very limited, lame, and unimpressive ways.
Startups will push their smaller, less powerful models out to a public that will be wowed by them and will use them in ways that cause the BigCos immense heartburn and potentially fatal damage to the bottom line.
This is the way it is right now, and I’m not even sure there’s another way it could possibly be in the future.
😈 Actually, there is one other option, and it’s the one Microsoft has availed itself of: invest in a startup (OpenAI) that has a more aggressive risk profile, give that arms-length startup access to your massive pool of compute resources for training and running its models, and then let that startup absorb the blowback from unleashing this giant model on the public.
I wonder if we won’t see more big tech companies pursue a similar approach where they fund a startup that can then take the risks that they can’t. Alphabet could spin out Google Brain, or Meta could spin out its own AI efforts. I wouldn’t be surprised to see this happen.