• 0 Posts
  • 9 Comments
Joined 1 year ago
Cake day: June 12th, 2023

  • In the case I mentioned, it was just a poorly aligned LLM. The ones from OpenAI would almost certainly not do that, because they go through a process called RLHF where those sorts of negative responses mostly get trained out of them. Some stuff still gets through, of course, but unless you're really trying to get it to say something bad, it's unlikely to do anything like what's in that article.

    That's not to say they won't say something accidentally harmful. They're really good at telling you things that sound extremely plausible but are actually false, because by default they don't really have any way of checking. I have to cross-check the output of my system for accuracy all the time, and I've spent a lot of effort building in checks to keep it accurate; it generally is on the important stuff (a rough sketch of that kind of check is at the end of this comment).

    Tonight it did have an inaccuracy, but I sort of don't blame it, because the average person could have made the same mistake. I had it looking up contractors to work on a bathroom remodel (a fake test task), and it googled for the phone number of the one I picked from its suggestions. Google gave back a phone number in a big box, with tiny text showing a different company's name. Anyone not paying close attention (including my AI) would have called that number instead. It wasn't an ad or anything; that other company just somehow came up in the little info box any time you searched for the one I wanted.

    Anyway, as to your question, they’re actually pretty good at knowing what’s harmful when they are trained with RLHF. Figuring out what’s missing to prevent them from saying false things is an open area of research right now, so in effect, nobody knows how to fix that yet.
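
    A rough sketch of the kind of cross-check I mean, assuming a hypothetical search-result dict and a made-up verify_phone_result helper (this isn't my actual system, just an illustration of matching the looked-up listing against the company that was asked for):

    ```python
    import re

    def normalize(name: str) -> str:
        """Lowercase and strip punctuation so company names compare loosely."""
        return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()

    def verify_phone_result(query_company: str, result: dict) -> bool:
        """Accept a looked-up phone number only if the result actually
        mentions the company that was asked about."""
        text = normalize(result.get("company_name", "") + " " + result.get("snippet", ""))
        return normalize(query_company) in text

    # Hypothetical search result where the info box names a different company
    result = {
        "phone": "555-0142",
        "company_name": "Other Plumbing Co.",
        "snippet": "Other Plumbing Co. - bathroom remodels and repairs",
    }

    if verify_phone_result("Acme Contractors", result):
        print("Use", result["phone"])
    else:
        print("Name mismatch; flag for human review instead of calling")
    ```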





  • SirGolan@lemmy.sdf.org to asklemmy@lemmy.ml • Deleted
    1 · 1 year ago

    Yeah, it still definitely does! The interesting thing is that it seems to be very good at estimating, and in my experience the final answer it gives is usually pretty close to correct. Of course, close doesn't really count in math problems.


  • SirGolan@lemmy.sdf.org to asklemmy@lemmy.ml • Deleted
    2 · 1 year ago

    Not surprised. I got access to Bard a while back, and it hallucinates quite a lot more than even GPT-3.5.

    Though doubling down on the wrong answer even when corrected is something I've seen GPT-4 do in some cases too. It seems like once it says something, it usually sticks to it.



  • SirGolan@lemmy.sdf.org to asklemmy@lemmy.ml • Deleted
    7 · 1 year ago

    From ChatGPT 4:

    Yes, the box is awesome.

    According to the provided statements, the box is yellow and red.

    “If the box is yellow, it is good.” - So, since the box is yellow, it is good.

    “If the box is red, it is happy.” - And since the box is red, it is happy.

    Finally, “If the box is good and happy, the box is awesome.” - Therefore, because the box is both good (due to being yellow) and happy (due to being red), the box is indeed awesome.
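
    The reasoning here is just a chain of simple implications. A minimal sketch of the same inference with the facts and rules hard-coded (purely illustrative):

    ```python
    # Facts from the puzzle
    yellow = True
    red = True

    # Rules: yellow -> good, red -> happy, (good and happy) -> awesome
    good = yellow
    happy = red
    awesome = good and happy

    print("The box is awesome:", awesome)  # prints True
    ```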