9 Comments

That was a really excellent description. I work on the engine underpinnings of this space (ML compilers for custom AI chips), and it fleshed out what I had guessed at but hadn't had time to fully investigate.


The rather fundamental problem with this "it's all just probabilities, and those words don't actually have any meaning to the model" take is that it can't explain this:

https://thegradient.pub/othello/

TL;DR: if you train a GPT model on a board game without explaining the rules (or even that it is a game) - just giving it valid moves and training it to predict the next move - it turns out that this training process builds something inside the neural net that appears to be a representation of the game board; and twiddling bits in that representation actually changes what the model "thinks" the state of the game is. At this point, you could reasonably say that the trained model "understands" the game - what is understanding if not a mental model?

But if so, then why can't ChatGPT also have an internal model of the world based on its training data, above and beyond mere probabilities of words occurring next to each other? It would necessarily be a very simplified model, of course, since the real world is a lot more complex than Othello. But, however simple, it would still mean that "the model has a kind of internal, true/false orientation to the cat and different claims about its circumstances". And it would explain why it can actually perform tasks that require such modelling, which is something that requires a lot of "none of it is real, it just feels that way!" handwaving with a purely probabilistic approach.
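
In the spirit of that result, here is a minimal sketch of the probing technique the Othello-GPT write-up relies on, in Python/PyTorch. I don't have the real model at hand, so the "hidden activations" below are synthetic stand-ins that encode a toy 8x8 board by construction; in the actual experiment the activations come from a GPT trained only on move sequences, and the probe's accuracy is the evidence that a board representation exists inside the net.

```python
# Sketch of a linear probe: can the board state be read off the hidden activations?
# The activations here are synthetic stand-ins (hypothetical data), built so that
# they do encode the board linearly - in the real experiment they would be taken
# from a GPT trained only on Othello move sequences.
import torch
import torch.nn as nn

torch.manual_seed(0)

n_samples, d_model, n_squares, n_states = 2048, 256, 64, 3  # states: empty/black/white

# Hypothetical board states: one of 3 values per square.
boards = torch.randint(0, n_states, (n_samples, n_squares))

# Hypothetical hidden activations that (by construction) encode the board linearly.
encoder = torch.randn(n_squares * n_states, d_model) * 0.1
one_hot = nn.functional.one_hot(boards, n_states).float().reshape(n_samples, -1)
hidden = one_hot @ encoder + 0.05 * torch.randn(n_samples, d_model)

# The probe itself: a single linear map from hidden state to a per-square prediction.
probe = nn.Linear(d_model, n_squares * n_states)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)

for step in range(300):
    logits = probe(hidden).reshape(n_samples, n_squares, n_states)
    loss = nn.functional.cross_entropy(logits.reshape(-1, n_states), boards.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

pred = probe(hidden).reshape(n_samples, n_squares, n_states).argmax(-1)
print(f"probe accuracy: {(pred == boards).float().mean():.2%}")  # well above chance if the board is encoded
```

The "twiddling bits" step in the linked article is the follow-on: once the probe tells you which directions in activation space encode the board, you can edit the activations along those directions and watch the model's predicted legal moves change accordingly.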


Thank you for that very straightforward article. I'm glad I remember both my physics and chemistry. I think we need to take into account our inexorable, universal habit of anthropomorphizing everything that looks like us physically or relationally. That very habit is what makes AI more dangerous. Take, for example, the AI companion: avatar, robot, and so on. The closer it gets to person-to-person interaction, the more we will forget we are NOT talking to a person, and the more we might come to prefer it over real personal interaction. We could crash our population the minute we decide the new Real Doll with AI is preferable to the messy relationship of actual skin-on-skin contact. Our AI needs to fall into the uncanny valley just long enough to remind us that this is a tool, not a replacement for human interaction. Yet we still follow the siren's song right onto the rocks of destruction.


Here’s a quick Heads Up...

...these coders will NOT be programming their AI to follow Asimov’s Three Laws of Robotics


I'd explain it this way: it's a neural network that learned to understand knowledge by reading a large part of the internet. It's emergent behavior inside the neural net - much like what happens in the brain of a baby. In the first months the eyes can see but the brain cannot; yet the data keeps flowing in, and thanks to the learning algorithm it starts to understand the visual data over time. That's emergent behavior. The net builds relationships so it can make a better estimate of the required output and minimize its loss. Predicting the future requires intelligence.


I'm still trying to wrap my head around the difference between "training" the model, "fine-tuning" it, and then providing "RLHF". Are the differences just one of degree or point in the process?


During training you want to expose the model to high-quality, high-diversity data, so it can become a good foundation.

During fine-tuning (which works like training) you feed the now-trained foundation model more specialized data that represents the specific job you want it to excel at. This dataset would be too narrow to train on from scratch.

RLHF is a kind of fine-tuning.
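
To make the relationship concrete, here is a minimal sketch in Python/PyTorch (not any vendor's actual pipeline): pretraining and fine-tuning run the same model with the same next-token loss; what mainly changes is the data and, usually, the learning rate. The dataset names below are hypothetical placeholders.

```python
# Minimal sketch: pretraining and fine-tuning are the same loop on different data.
# `broad_corpus_batches` and `specialized_batches` are hypothetical iterables of
# (input_ids, target_ids) tensor pairs.
import torch
import torch.nn as nn

model = nn.Sequential(              # stand-in for a real transformer language model
    nn.Embedding(50257, 256),
    nn.Linear(256, 50257),
)

def train(model, batches, lr):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for input_ids, target_ids in batches:
        logits = model(input_ids)                                   # (batch, seq, vocab)
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), target_ids.reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()

# 1) Pretraining: huge, diverse data, starting from random weights.
# train(model, broad_corpus_batches, lr=3e-4)

# 2) Fine-tuning: the *same* weights, continued on a narrow dataset
#    (e.g. instruction/response pairs), typically at a lower learning rate.
# train(model, specialized_batches, lr=1e-5)

# 3) RLHF is a further fine-tuning stage: the update signal comes from a reward
#    model trained on human preference rankings instead of plain next-token
#    targets (not shown here).
```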


Maybe it's just my personal impression, but I didn't find your article very readable. You use many examples, but as I read I was confronted again and again with new "frames". Then again, maybe I'm just a fan of strict formal language, which is compact.

Apart from that, only the last part was new to me, and it's the only part you say you're not sure about. But it would explain why ChatGPT needs more time to answer the longer the previous chat is. I could also imagine that the previous tokens locate your prompt in the probability distribution in such a way that, if you have "talked" about Schrödinger's cat in a physical sense, the next answer will also interpret the tokens in a physical sense rather than, for example, an emotional one. It would be like an iterative numerical approximation toward the point of interest, one that uses the previous position in the probability distribution for the next approximation step.
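
That intuition is easy to check on a small open model. Here is a sketch in Python using GPT-2 via the Hugging Face transformers library as a stand-in for ChatGPT (we can't inspect ChatGPT's own distributions): the same final words get a different next-token distribution depending on the earlier context, which is exactly the "previous tokens locate the prompt in the probability distribution" effect. The example prompts are made up.

```python
# Compare next-token distributions for the same ending words under two different
# conversational contexts, using the small public GPT-2 model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def top_next_tokens(context, k=5):
    ids = tok(context, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]        # distribution over the next token
    probs = logits.softmax(-1)
    top = probs.topk(k)
    return [(tok.decode(int(i)), round(p.item(), 3)) for p, i in zip(top.values, top.indices)]

physics_chat = "We were discussing quantum superposition. The cat in the box is"
casual_chat = "My neighbour adopted a pet last week. The cat in the box is"

print(top_next_tokens(physics_chat))
print(top_next_tokens(casual_chat))
```

As for the slowdown: attention has to look back over every token already in the window, so the work per generated token grows with the length of the conversation so far.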


Excellent, very informative piece; it gives a lot of clarity. I was a bit apprehensive about using the electron wave function distribution to explain ChatGPT (one hard concept used to explain another), but it did the job for me, and I guess it was an easy way to explain a multi-variable probability distribution without a painfully long detour. Directly analogising atomic space to latent space was especially easy to follow.

But the real bulk of the novel information was at the end: token windows consist of entire conversations, and they are going to be 32K tokens long in the near future, which means full-length, meaningful first-time conversations. If AI service providers stick to fixed weights, that's going to be a problem, because every time the user comes back a second time (and beyond), they will need to repeat all the information submitted previously.

Alternatively, AI service providers might simply store all previous chat histories in some database tied to a user's identity or account login, show a prompt saying something like "Just a minute, recalling our past conversations", and run the whole of the stored chat history through their static-weights models to give the appearance of human-like memory.

Alternatively, I hypothesize that they could take an RLHF-style approach by making "sparse copies" of only sub-sections of the latent space, with weights modified only in these sparse copies - not full personally tweaked copies, which would be astronomically huge and prohibitive - and so have genuinely personalised models. Of course, the drawback of such sparse copies would be the high cost of integrating newer facts from around the world into each sparse copy wherever the relevant terms and weight adjustments change. For example, if a user has a chat history from October 2024 but logs back in in January 2025, the US President will most likely be different, and that should be reflected in the personalised sparse copy if the President was mentioned - or else it will look quite defective.
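
The "recalling our past conversations" option is simple enough to sketch: keep every user's turns in a store and replay as much of them as fits into the fixed-weight model's context window. Everything below is hypothetical (the token limit, the storage, and `call_model`, which stands in for whatever completion API or local model is actually used); it is only meant to show that this kind of "memory" needs no weight changes at all.

```python
# Minimal sketch of replaying stored chat history through a fixed-weights model.
# All names and limits here are hypothetical.
from collections import defaultdict

MAX_CONTEXT_TOKENS = 32_000          # e.g. a 32K-token window
chat_histories = defaultdict(list)   # user_id -> list of past turns

def count_tokens(text: str) -> int:
    return len(text.split())         # crude stand-in for a real tokenizer

def build_prompt(user_id: str, new_message: str) -> str:
    # Prepend stored turns, newest last, dropping the oldest turns that no longer
    # fit in the window. The model's weights never change; only the context does.
    turns = chat_histories[user_id] + [f"User: {new_message}"]
    kept, used = [], 0
    for turn in reversed(turns):
        used += count_tokens(turn)
        if used > MAX_CONTEXT_TOKENS:
            break
        kept.append(turn)
    return "\n".join(reversed(kept))

def chat(user_id: str, message: str, call_model) -> str:
    prompt = build_prompt(user_id, message)
    reply = call_model(prompt)       # fixed-weight model, personalised only via context
    chat_histories[user_id] += [f"User: {message}", f"Assistant: {reply}"]
    return reply
```

The sparse-copy idea would go further and actually change a small slice of weights per user, which is exactly why it runs into the update problem described above: every fact that changes in the world has to be propagated into every copy whose slice touches it.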
