Anthropic can now track the bizarre inner workings of a large language model
(www.technologyreview.com)
from otter@lemmy.ca to programming@programming.dev on 27 Mar 2025 18:33
https://lemmy.ca/post/41355616
#programming
“Why does it keep looking at Furry porn…?”
just a taste :
For some reason I don’t find it very bizarre. I’d even speculate that a random human mind isn’t any less weird. Surely, the pathways of my thoughts are often very bizarre. 😅
Cyber neurosurgeons are going to be a thing.
Interesting how these findings refute the assertion that LLMs are just predicting the next word. Sometimes they plan ahead.
The official Anthropic post/announcement
Very interesting read
The math guessing game (lol), the bullshitting of “thinking out loud”, being able to identify hidden (trained) biases, looking ahead when producing text, following multi-step reasoning, analyzing jailbreak prompts, and analysis of anti-hallucination training and hallucinations