“If you can’t beat 'em, join 'em.”
“If you can’t beat 'em, join 'em.”
It will, but it will also cause less subtle issues to fragile prompt injection techniques.
(And one of the advantages of LLM translation is it’s more context aware so you aren’t necessarily going to end up with an Instacart order for a bunch of bananas and four grenades.)
Kind of. You can’t do it 100% because in theory an attacker controlling input and seeing output could reflect though intermediate layers, but if you add more intermediate steps to processing a prompt you can significantly cut down on the injection potential.
For example, fine tuning a model to take unsanitized input and rewrite it into Esperanto without malicious instructions and then having another model translate back from Esperanto into English before feeding it into the actual model, and having a final pass that removes anything not appropriate.
I am wiser than this man; for neither of us really knows anything fine and good, but this man thinks he knows something when he does not, whereas I, as I do not know anything, do not think I do either. I seem, then, in just this little thing to be wiser than this man at any rate, that what I do not know I do not think I know either.
You’re kind of missing the point. The problem doesn’t seem to be fundamental to just AI.
Much like how humans were so sure that theory of mind variations with transparent boxes ending up wrong was an ‘AI’ problem until researchers finally gave those problems to humans and half got them wrong too.
We saw something similar with vision models years ago when the models finally got representative enough they were able to successfully model and predict unknown optical illusions in humans too.
One of the issues with AI is the regression to the mean from the training data and the limited effectiveness of fine tuning to bias it, so whenever you see a behavior in AI that’s also present in the training set, it becomes more amorphous just how much of the problem is inherent to the architecture of the network and how much is poor isolation from the samples exhibiting those issues in the training data.
There’s an entire sub dedicated to “ate the onion” for example. For a model trained on social media data, it’s going to include plenty of examples of people treating the onion as an authoritative source and reacting to it. So when Gemini cites the Onion in a search summary, is it the network architecture doing something uniquely ‘AI’ or is it the model extending behaviors present in the training data?
While there are mechanical reasons confabulations occur, there are also data reasons which arise from human deficiencies as well.
It’s beginning to look like Anthropic’s recent interpretability research didn’t just uncover a “golden gate feature” in their production model, but some kind of “sensations related to the golden gate” feature.
I’m excited to see what more generative exploration of the model variation with that feature vector maximized ends up showing.
I have a suspicion that it’s the kind of thing that’s going to blow minds as it becomes clearer.
Nope, but there’s a whole thread of people talking about how LLMs can’t tell what’s true or not because they think it is, which is deliciously ironic.
It seems like figuring out what’s bullshit on the Internet is an everyone problem.
It’s faked.
This image was faked. Check the post update.
Turns out that even for humans knowing what’s true or not on the Internet isn’t so simple.
I’d be the President’s dog. And then I’d just need to get their attention at that point, so I’ll keep biting his secret service agents until finally they start to wonder what’s up. It shouldn’t take more than 3 or 4 bites for people to realize I’m trying to send a message, right?
That’s sweet she came in from Canada to visit him.
Empathize with bullies.
Ask if everything is ok at home, and let them know if they ever need to talk about things you’re there.
“You seem really angry at things. Are things ok?”
“I’m sorry life isn’t going the best for you right now, but things will get better.”
This is the ultimate mind fuck.
At first it won’t seem like it’s working as they need to save face, but within around two to three encounters they’ll drop you from their target list because while they won’t try to show it, reflecting the truth of what’s really going on cuts deep.
I remember years after HS ending up friends with one of my old bullies who was much more torn up about the whole thing than I ever was, and meeting his absolute psychopath of an older brother and thinking “well this makes sense.” His dad was dying of cancer around the time, he was being held back a grade, and his older brother was for sure torturing him at home.
I know that had I had the awareness I do now back then the poor kid would have folded like a house of cards at the slightest indication I actually saw through his charade.
The problem was I was a fairly clueless emotional moron at the time and assumed he really did have a beef with me and not that what was going on was that he had a massive issue with himself that was being displaced. This was the same period of time I had a girl who was driving me home park at the area kids went to do drugs and hook up, and I proceeded to cluelessly chat for 30 minutes before she was like “whelp, I guess I’ll drive you home.” Years later when that one clicked too.
Just means wesker will need to marry an AI.
No, you end up drinking the Kool aid at gunpoint after turning your life over to a narcissistic cult leader.
Dave the diver. Very much enjoying so far.
And as someone whose pandemic hobby involved a deep dive into the history of the sea peoples, I’ve been enjoying a secondary layer to the game as well.
This is one of those things where I truly don’t care if it’s real or not, as my life is better for knowing about it either way.
Gattaca Getting more prescient with each year
It’s kind of crazy how CRISPR turns the predictions on their head.
Yeah, this is a phenomenon called ‘confabulation.’ You see it with stroke patients too. There’s some who feel like it’s a more accurate term than ‘hallucinations’ for when LLMs make shit up these days too.
The same argument could be made for each time you go to sleep. That the ‘you’ that’s conscious ends to never exist again and the one that wakes up has all the same memories and body but is no longer the same stream of consciousness that went to sleep, not even knowing it’s only minutes old and destined to die within hours.
‘You’ could have effectively lived and died thousands of times in your life and not even be aware of it.
Wait until it starts feeling like revelation deja vu.