I don't recall being promised a black box. Are we certain LLMs didn't write this article and just come up with one of those pithy "It's not whatever-random-thing" zingers that they're prone to?
> Ask it "what is the capital of the state containing Dallas" and you can observe, in order:
> the Dallas feature goes active,
> which causes the Texas feature to light up,
> which then causes Austin to light up.
> It seems fairly clear that this is tracing semantic relationships between high-level concepts — and in doing so, performing a kind of pseudo-symbolic inference, similar to what some philosophers would describe as "higher reasoning."
Uhhh, no reasoning is required for Austin to follow Texas after Dallas, let alone "higher reasoning".
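To make the point concrete: the Dallas, then Texas, then Austin chain can be reproduced by two plain dictionary lookups with no inference at all. Below is a minimal Python sketch. The tables and the function name `capital_of_state_containing` are hypothetical, and this is only an analogy for the chaining, not a claim about how a model's features are actually implemented.

```python
# Hypothetical toy tables: an illustration of chained lookup, not a model of
# how an LLM actually stores or retrieves facts.
CITY_TO_STATE = {"Dallas": "Texas", "Houston": "Texas", "Chicago": "Illinois"}
STATE_TO_CAPITAL = {"Texas": "Austin", "Illinois": "Springfield"}


def capital_of_state_containing(city: str) -> str:
    """Answer 'what is the capital of the state containing <city>' with two lookups."""
    state = CITY_TO_STATE[city]      # hop 1: Dallas -> Texas
    return STATE_TO_CAPITAL[state]   # hop 2: Texas -> Austin


print(capital_of_state_containing("Dallas"))  # prints Austin
```

Each hop is a fixed association. Nothing in the sketch "reasons" about geography, yet the output order matches the feature activations described in the quote.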
It is a characteristic of neural nets that they do not have insight into their own functioning.
It is arguably a characteristic of any intelligent system that at least some part of it must be opaque to itself, but the previous sentence is more defensible than that generalized claim.
If you don't understand what that means, tell me, from your own metacognitive insight, what parts of your brain are being used to read this. Not from learned knowledge about what parts of the brain do what, but through your own insight into your own functioning. You can't, because you don't have any.
This isn't just that humans rationalize a lot. This is below that. This is that even if you notice yourself rationalizing, which is something you can train yourself to do, you have no access to the underlying computations/processes of the rationalization itself, or of the process of noticing you are rationalizing.
There is arguably still a sense, in our experience, in which we humans could reasonably say "No, I'm pretty sure I used addition-with-carry to answer you", so that is perhaps not the easiest example to think about. But there will always be some question of "how did you do that" to which you can give no answer, because the answer is in the firing of the neural net itself, and you, who are in one way or another the product of that firing, do not have access to that. How did you quickly catch that ball that someone unexpectedly threw at you? You just did, as far as your neural net is concerned.
(Also, while I've expressed this in terms of your conscious experience, this doesn't have anything to do with "consciousness". Neural nets in general do not get this feedback and do not and cannot have arbitrary metacognition about their own functioning. This is an artifact of my writing text to address conscious beings.)
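For contrast, addition-with-carry is a case where the procedure is explicit enough to state step by step, which is why people feel able to report using it. Below is a minimal Python sketch of that grade-school procedure. The function name `add_with_carry` is hypothetical, and the sketch shows the procedure you can describe, not the neural activity that actually carries it out in a person.

```python
def add_with_carry(a: str, b: str) -> str:
    """Grade-school addition of two non-negative decimal strings, right to left."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        digit_a = int(a[i]) if i >= 0 else 0
        digit_b = int(b[j]) if j >= 0 else 0
        total = digit_a + digit_b + carry
        result.append(str(total % 10))  # write down the ones digit
        carry = total // 10             # carry the tens digit to the next column
        i -= 1
        j -= 1
    return "".join(reversed(result)) or "0"


print(add_with_carry("478", "365"))  # prints 843
```

Every step here is inspectable, which is exactly what is missing for something like catching a ball.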
Yep. An effect cannot in itself reason about its cause. Some effects can suggest causes, though. For example, you tend to remember an algorithm you just executed when its use was at least somewhat conscious or intended, which can let you guess which algorithm you used (and potentially even guess correctly).
As trivial as that example is, it boggles my mind how many things about me I do not fully understand or cannot explain. It feels different from losing track of what I just did, because a memory of it never existed to lose in the first place. For example, as much as I try to reason about the causes of my executive dysfunction, I cannot seem to understand the actual equation that determines whether I am willing to do something or not. It feels like my own brain disagrees with me, and that's so frustrating. I've been trying to rationalize it for years, but in the end I just do not know, and likely cannot know.
A while back, during a particularly rough patch where everything was going wrong, I started thinking, "man, I really hope I'm being stupid and doing it wrong..." (because then I can stop doing that!)
And wouldn't you know it, I keep getting my wish :)
Sure, I am not incapable of metacognition; I am just saying I have observed (and keep observing) cases where I lacked real metacognitive insight into something.
If you had asked me last year (2025), I would still have said LLMs were a silly toy.
As of Jan 2026 I have come to accept that LLMs are at least part of the puzzle of how intelligence works. They are at this point better than the majority of humans at various intellectual tasks. It may not be, or ever be, a 1:1 match, but "good enough" was already running the world before LLMs.
There is not even a formal definition of what intelligence is, so saying LLMs are intelligent can't even be "right" or "wrong". It's just arguing semantics and definitions.
They are better than humans at tasks that require information recall and application to a specific task.
For example, front-end web app layout and basic functionality. Anyone can now make a website with interactive buttons with ease, whereas before you had to go look up examples, try stuff, figure out why it's not working, etc.
But for organization and higher-level tasks, for example making a front end that is clean, robust, easily extensible, and doesn't break, LLMs require almost as much prompting as it takes to actually write the code.
> How can we understand what an LLM is "thinking"? It's clearly very valuable to do so — it could enable steering model behavior, detecting dangerous intent, and more.
Well, that is complete and utter bollocks, dribbled out in para three or so, and obviously written by a next-token guesser.
LLMs are tools, and I'm pretty sure that if I let you loose on some of my tools, you might lose an extremity unless I kept an eye on you.
I have an on-prem Qwen3.6-35B-A3B-UD-Q4_K_XL running on a box in the office, and it's quite handy for a chat.
> the Dallas feature goes active,
> which causes the Texas feature to light up,
> which then causes Austin to light up.
> It seems fairly clear that this is tracing semantic relationships between high-level concepts — and in doing so, performing a kind of pseudo-symbolic inference, similar to what some philosophers would describe as "higher reasoning."
This is really grasping at straws.