• 0 Posts
  • 54 Comments
Joined 1 year ago
Cake day: June 16th, 2023




  • kromem@lemmy.world to Programmer Humor@lemmy.ml: Little bobby 👦
    English · ↑ 2 · 4 months ago

    Kind of. You can’t eliminate it 100%, because in theory an attacker who controls the input and can see the output could reflect through the intermediate layers. But if you add more intermediate steps to processing a prompt, you can significantly cut down on the injection potential.

    For example, you could fine-tune a model to take unsanitized input and rewrite it in Esperanto with any malicious instructions left out, then have another model translate it back from Esperanto into English before feeding it to the actual model, with a final pass that removes anything inappropriate.
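    The layered approach above can be sketched as a chain of text-to-text stages, where only the output of the last stage ever reaches the main model. This is a minimal sketch under stated assumptions: each stage (the Esperanto rewrite, the back-translation, the final filter) is assumed to be a separate model call, and the names `sanitize`, `answer`, and `Stage` are hypothetical. No specific model API is implied. As the comment notes, this reduces injection risk but does not eliminate it.

```python
from typing import Callable, Sequence

# Hypothetical type: each stage maps text to text. In the scheme described
# above these would be separate fine-tuned model calls (rewrite to Esperanto,
# translate back to English, final filtering pass).
Stage = Callable[[str], str]


def sanitize(untrusted: str, stages: Sequence[Stage]) -> str:
    """Run untrusted input through every intermediate stage, in order.

    Injected instructions have to survive every rewrite to reach the
    main model, which makes injection harder but not impossible.
    """
    text = untrusted
    for stage in stages:
        text = stage(text)
    return text


def answer(untrusted: str, stages: Sequence[Stage], main_model: Stage) -> str:
    """Sanitize first, then hand only the cleaned text to the main model."""
    return main_model(sanitize(untrusted, stages))
```

    In a real setup the stages list would hold the hypothetical model wrappers in order, e.g. `[rewrite_to_esperanto, translate_to_english, final_filter]`. Keeping each stage a plain function makes it easy to add or reorder passes.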



  • You’re kind of missing the point. The problem doesn’t seem to be unique to AI.

    Much like how people were so sure that models getting the transparent-box variations of theory-of-mind tests wrong was an ‘AI’ problem, until researchers finally gave those same problems to humans and half of the people tested got them wrong too.

    We saw something similar with vision models years ago: once the models became representative enough, they were able to successfully model and predict previously unknown optical illusions in humans too.

    One of the issues with AI is regression to the mean of the training data, combined with the limited effectiveness of fine-tuning at biasing away from it. So whenever you see a behavior in AI that’s also present in the training set, it becomes murkier how much of the problem is inherent to the network’s architecture and how much comes from poor isolation from the training samples that exhibit the same issue.

    For example, there’s an entire sub dedicated to people who “ate the onion.” A model trained on social media data is going to see plenty of examples of people treating The Onion as an authoritative source and reacting to it. So when Gemini cites The Onion in a search summary, is the network architecture doing something uniquely ‘AI’, or is the model extending behaviors present in its training data?

    While there are mechanical reasons confabulations occur, there are also data reasons that arise from human deficiencies.


  • kromem@lemmy.world to Ask Lemmy@lemmy.world: What are you currently excited for?
    English · ↑ 7 ↓ 2 · 4 months ago

    It’s beginning to look like Anthropic’s recent interpretability research didn’t just uncover a “Golden Gate feature” in their production model, but some kind of “sensations related to the Golden Gate” feature.

    I’m excited to see what further generative exploration of the model variant with that feature vector maximized ends up showing.

    I have a suspicion that it’s the kind of thing that’s going to blow minds as it becomes clearer.







  • Empathize with bullies.

    Ask if everything is OK at home, and let them know that if they ever need to talk, you’re there.

    “You seem really angry at things. Are things ok?”

    “I’m sorry life isn’t going the best for you right now, but things will get better.”

    This is the ultimate mind fuck.

    At first it won’t seem like it’s working, since they need to save face, but within two or three encounters they’ll drop you from their target list. Even though they won’t show it, having the truth of what’s really going on reflected back at them cuts deep.

    Years after high school, I ended up friends with one of my old bullies, who was much more torn up about the whole thing than I ever was. When I met his absolute psychopath of an older brother, I thought, “well, this makes sense.” His dad was dying of cancer around that time, he was being held back a grade, and his older brother was for sure torturing him at home.

    I know that if I’d had the awareness back then that I do now, the poor kid would have folded like a house of cards at the slightest sign that I actually saw through his charade.

    The problem was that I was a fairly clueless emotional moron at the time. I assumed he really did have a beef with me, not that he had a massive issue with himself that he was displacing onto me. This was the same period when a girl who was driving me home parked at the spot where kids went to do drugs and hook up, and I proceeded to cluelessly chat for 30 minutes before she was like “whelp, I guess I’ll drive you home.” That one only clicked years later, too.








  • The same argument could be made about every time you go to sleep: that the ‘you’ that’s conscious ends, never to exist again, and the one that wakes up has all the same memories and the same body but is no longer the same stream of consciousness that went to sleep. It doesn’t even know it’s only minutes old and destined to die within hours.

    ‘You’ could have effectively lived and died thousands of times in your life and not even be aware of it.