Why is no one talking about how unproductive it is to have to verify every "hallucination" ChatGPT gives you?

I’m convinced people who can’t tell when a chat bot is hallucinating are also bad at telling whether something else they’re reading is true or not. What online are you reading that you’re not fact checking anyway? If you’re writing a report you don’t pull the first fact you find and call it good, you need to find a couple citations for it. If you’re writing code, you don’t just write the program and assume it’s correct, you test it. It’s just a tool and I think most people are coping because they’re bad at using it

Yeah. GPT models are in a good place for coding tbh, I use it every day to support my usual practice, it definitely speeds things up. It's particularly good for things like identifying niche python packages & providing example use cases so I don't have to learn shit loads of syntax that I'll never use again.
- In other words, it's the new version of copying code from Stack Overflow without going to the trouble of properly understanding what it does.

I just tried out Gemini.

I asked it several questions of the form 'are there any things of category x which are also in category y?'

It would often confidently reply 'No, here's a summary of things that meet all your conditions to fall into category x, but sadly none also fall into category y'.

Then I would reply, 'wait, you don't know about thing gamma, which does fall into both x and y?'

To which it would reply 'Wow, you're right! It turns out gamma does fall into x and y' and then give a bit of a description of how/why that is the case.

After that, I would say '... so you... lied to me. ok. well anyway, please further describe thing gamma that you previously said you did not know about, but now say that you do know about.'

And that is where it gets ... fun?

It always starts with an apology template.

Then, if it's some kind of topic that it has almost certainly been manually dissuaded from talking about, it lies again and says 'actually, I do not know about thing gamma, even though I just told you I did'.

If it is not a topic that it has been manually dissuaded from talking about, it does the apology template and then also further summarizes thing gamma.

...

I asked it 'do you write code?' and it gave a moderately lengthy explanation of how it is comprised of code, but does not write its own code.

Cool, not really what I asked. Then command 'write an implementation of bogo sort in python 3.'

... and then it does that.
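
For anyone who hasn't met it: bogo sort is the joke algorithm that keeps shuffling the list until it happens to come out sorted. The comment doesn't show what Gemini actually produced, so this is just a minimal Python 3 sketch of the idea, with made-up helper names (`is_sorted`, `bogo_sort`):

```python
import random


def is_sorted(items):
    """Return True if every element is <= the one after it."""
    return all(a <= b for a, b in zip(items, items[1:]))


def bogo_sort(items, rng=None):
    """Shuffle a copy of `items` until it is sorted, then return it.

    Expected running time grows factorially, so only try this on tiny lists.
    """
    rng = rng or random.Random()
    result = list(items)  # work on a copy, leave the input untouched
    while not is_sorted(result):
        rng.shuffle(result)
    return result


if __name__ == "__main__":
    print(bogo_sort([3, 1, 2], rng=random.Random(0)))
```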

...

Awesome. Hooray. Billions and billions of dollars for a shitty way to reform web search results into a conversational form, which is very often confidently wrong and misleading.

Idk why we have to keep re-hashing this debate about whether AI is a trustworthy source or summarizer of information when it's clear that it isn't - at least not often enough to justify this level of attention.

It's not as valuable as the marketing suggests, but it does have some applications where it may be helpful, especially if given a conscious effort to direct it well. It's better understood as a mild curiosity and a proof of concept for transformer-based machine learning that might eventually lead to something more profound down the road but certainly not as it exists now.

What is really un-compelling, though, is the constant stream of anecdotes about how easy it is to fool into errors. It's like listening to an adult brag about tricking a kid into thinking chocolate milk comes from brown cows. It makes it seem like there's some marketing battle being fought over public perception of its value as a product that's completely detached from how anyone actually uses or understands it as a novel piece of software.
- Probably it keeps getting rehashed because people who actually understand how computers work are extremely angry and horrified that basically every idiot executive believes the hype and then asks their underlings to implement it, and will then blame them for doing what they asked them to do when it turns out their idea was really, unimaginably stupid, but idiot executive gets golden parachute and software person gets fired.
  
  That, and/or the widespread proliferation of this bullshit is making stupid people more stupid, and just making more people stupid in general.
  
  Or how like all the money and energy spent on this is actively murdering the environment and dooming the vast majority of our species, when it could be put toward building affordable housing or renovating crumbling infrastructure.
  
  Don't worry, if we keep throwing exponential increasing amounts of effort at the thing with exponentially diminishing returns, eventually it'll become God!
- to fool into errors
  
  tricking a kid
  
  I've never tried to fool or trick AI with excessively complex questions. When I tried to test it (a few different models over some period of time - ChatGPT, Bing AI, Gemini) I asked stuff as simple as "what's the etymology of this word in that language", "what is [some phenomenon]". The models still produced responses ranging from shoddy to absolutely ridiculous.
  
  completely detached from how anyone actually uses
  
  I've seen numerous people use it the same way I tested it, basically a Google search that you can talk with, with similarly shit results.
And then more money spent on adding that additional garbage filter to the beginning and the end of the process which certainly won't improve the results.
copilot did the same with basic math. just to test it I said "let's say I have a 10x6 rectangle. what number would I have to divide width and height by, in order to end up with a rectangle that's half the area?"

it said "in order to make it half, you should divide them by 2. so [pointlessly lengthy steps explaining the divisions]"

I said "but that would make the area 5x3 = 15 units which is not half the area of 60"

it said "you're right! in order to ... [fixing the answer to √2 using approximation]"
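
The arithmetic here is easy to check yourself. Dividing both sides by the same factor k divides the area by k², so halving the area needs k² = 2, i.e. k = √2 ≈ 1.414. A quick Python sketch of both answers (the variable names are just illustrative):

```python
import math

width, height = 10, 6
full_area = width * height  # 60

# Copilot's first answer: divide both sides by 2.
naive_area = (width / 2) * (height / 2)  # 5 * 3 = 15, a quarter of the area

# Dividing both sides by k scales the area by 1 / k**2,
# so half the area needs k**2 == 2, i.e. k == sqrt(2).
k = math.sqrt(2)
halved_area = (width / k) * (height / k)

print(f"divide by 2:       area = {naive_area}")
print(f"divide by sqrt(2): area = {halved_area:.4f} (target {full_area / 2})")
```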

I don't know if I said it then, or after some other fucking nonsense but when I said "you're useless" it had the fucking audacity to take offense and end the conversation!

like fuck off, you don't get to have fake pride if you don't have basic fake intelligence but use it in your description.
- It's a perfect encapsulation of the corpo mindset:
  
  Whatever I do is profound, meaningful, with endless possibilities for future greatness...
  
  ... even though I'm just talking out of my ass 99% of the time...
  
  ... and if you have the audacity, the nerve, to have a completely normal reaction when you determine that that is what I am doing, pshaw, how uncouth, I won't stand for your abuse!
  
  ...
  
  They've done it. They've made a talking (not thinking) machine in their own image.
  
  And it was not good.
  
  You start a conversation you can't even finish it / You're talkin' a lot, but you're not sayin' anything / When I have nothing to say, my lips are sealed / Say something once, why say it again?
  
  Psycho Killer Qu'est-ce que c'est
please further describe thing gamma that you previously said you did not know about, but now say that you do know about.'

It's quite amusing to ask it about conspiracy theories. There's a huge amount in its training set (not because the theories are true, just that they are often written about) that it has been dissuaded from discussing.
Cool, not really what I asked. Then command ‘write an implementation of bogo sort in python 3.’

… and then it does that.

Alright, but... it did the thing. That's a feature older search engines couldn't reliably perform. The output is wonky and the conversational style is misleading. But it's not materially worse than sifting through wrong answers on StackExchange or digging through a stack of physical textbooks looking for Python 3 Bogo Sort IRL.

I agree AI has annoying flaws and flubs. And it does appear we're spending vast resources doing what a marginal improvement to Google five years ago could have done better. But this is better than previous implementations of search, because it gives you discrete applicable answers rather than a collection of dubiously associated web links.
- But this is better than previous implementations of search, because it gives you discrete applicable answers rather than a collection of dubiously associated web links.
  
  Except for when you ask it to determine if a thing exists by describing its properties, and then it says no such thing exists while providing a discrete response explaining in detail how there are things that have some, but not all of those properties...
  
  ... And then when you ask it specifically about a thing you already know about that has all those properties, it tells you about how it does exist and describes it in detail.
  
  What is the point of a 'conversational search engine' if it cannot help you find information unless you already know about said information?!
  
  The whole, entire point of formatting it into a conversational format is to trick people into thinking they are talking to an expert, an archivist with encyclopedic knowledge, who will give them accurate answers.
  
  Yet it gatekeeps information that it does have access to but omits.
  
  The format of providing a bunch of likely related links to a query is much more reminiscent of doing actual research, with no impression that you will find what you want right away, only that this is a tool to aid you in your research process.
  
  This is only an improvement if you want to further unteach people how to do actual research and critical thinking.
- I don’t feel like off-the-cuff summaries by AI can replace web sites and detailed articles written by knowledgeable humans. Maybe if you’re looking for a basic summary of a topic.

I beg someone to help me. There is this new guy at my workplace, officially a developer, who can't write code at all. He pasted an entire project I did into ChatGPT with "optimize this" and pull requested it. I swear.

Report up the chain, if it's safe to do so and they are likely to understand.

Also, check what your company's rules regarding data security and LLM use are. My understanding is that at many places putting private company or customer data into an outside LLM is seen as shouting company secrets out to the open internet. At least that's the policy where I'm at. Pasting an entire project in would definitely violate things for my workplace.

In general that's rude as hell. New guy comes in, grabs an entire project they have no background with, and just chucks it at an LLM? No actual review of it themselves, just an assumption that your code is so shit that a general use text generator will do better? Doesn't sound like a "team player" to me (management eats that kind of talk up).

Maybe couch it as "I want to make sure that as a team, we're utilizing the tools available to us in the best way possible to multiply our strengths. That said, I'm concerned the approach that [LLM idiot] is using will only result in more work for the team. Using chatGPT as he has is an explosive approach, when I feel that a more scalpel-like approach to address specific areas for improvement would be the best method moving forward. We should be using these tools to address specific concerns, not chucking everything at the wall in some never ending chase of an undefined idea of 'more optimized'."

Perhaps frame it in terms of man hours? The immediateness of 5 minutes in chatGPT can cost the team multiple workdays in reviewing the output, whereas more focused code review up front can reduce the man hour cost significantly.

There's also a bunch of articles out there online about how overuse of LLMs is leading to a measurable decrease in code quality and increase in security issues in code bases.
- Such a great answer, thank you lots!

Because if I haven't found anyone asking the same question on a search index, ChatGPT won't tell me to just use Google or close my question as a duplicate when it's not a duplicate.

Reminder that all these Chat-formatted LLMs are just text-completion engines trained on text formatted like a chat. You're not having a conversation with it, it's "completing" the chat history you're providing it, by randomly(!) choosing the next text tokens that seem like they best fit the text provided.

If you don't directly provide, in the chat history and/or the text completion prompt, the information you're trying to retrieve, you're essentially fishing for text in a sea of random text tokens that seems like it fits the question.

It will always complete the text. Even if the tokens it chooses only minimally fit the context, it chooses the best text it can, but it will always complete the text.

This is how they work, and anything else is usually the company putting in a bunch of guide bumpers to reformat prompts into coaxing the models to respond in a "smarter" way (see GPT-4o and "chain of reasoning")
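
To make the "randomly(!) choosing the next token" part concrete, here is a toy Python sketch of temperature sampling. It is not any vendor's actual implementation, and the token scores are invented for illustration. A real model would compute scores for tens of thousands of tokens from the chat history, but the last step looks roughly like this:

```python
import math
import random


def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick one token at random, weighted by softmax(logits / temperature)."""
    rng = rng or random.Random()
    tokens = list(logits)
    scaled = [logits[tok] / temperature for tok in tokens]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for the token following "The capital of France is".
toy_logits = {" Paris": 5.0, " Lyon": 2.0, " banana": 0.5}

rng = random.Random(0)
draws = [sample_next_token(toy_logits, temperature=1.0, rng=rng) for _ in range(1000)]
print({tok: draws.count(tok) for tok in toy_logits})
```

Note that the function always returns some token. There is no "I have nothing that fits" branch, which is the point being made above: a poorly fitting completion still gets emitted, just with different odds.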

They were trained on reddit. How much would you trust a chatbot whose brain consists of the entirety of reddit put in a blender?

I am amazed it works as well as it does. Gemini only occasionally tells people to kill themselves.

sigh people do talk about this, they complain about it non-stop. These same people probably aren't using it as intended, or are deliberately trying to farm a "gotcha" response. AI is a very neat tool which can do a lot of things well, but it's important to recognize its limitations. I don't use it for things I don't understand because I won't recognize if it's spitting out nonsense, but for topics I do understand it's hard to overstate how efficient and time saving it is.

The FuckAI people are valid for their concerns.

Unfortunately, their anger seems to constantly be misdirected at the weirdest things, instead of root issues.
- Oh, there is plenty of hate for the hype cycle in general which is about as close to the root of the issue as you can get.
- My take is they should be fighting the corporate API vs open source models war, instead of just "screw all AI" which really means "screw open source AI and let Sam Altman enshittify everything"
  
  Especially on Lemmy.
  
  It'd be like blanket railing against social media and ultimately getting the Fediverse banned, while Facebook and X walk away.
"Give me a vegan recipe using <ingredient>" has been flawless. The recipes are decent, although they tend to use the same spices over and over.
I sometimes use it to "convert" preexisting bulletpoints or informal notes into a professional sounding business email. I already know all the information so proofreading the final product doesn't take a lot of time.

I think a lot of people who shit on AI forget that some people struggle with putting their thoughts into words. Especially if they aren't writing in their native language.
Efficiency depends on the cost, doesn't it?
- The cost to me, the user, is nothing

Because in a lot of applications you can bypass hallucinations.

getting sources for something
as a jump off point for a topic
to get a second opinion
to help argue for or against your position on a topic
get information in a specific format

In all these applications you can bypass hallucinations because either its task is non-factual, or it's verifiable while prompting, or because you will be able to verify in any of the subsequent tasks.

Just because it makes shit up sometimes doesn't mean it's useless. Like an idiot friend, you can still ask it for opinions or something and it will definitely start you off somewhere helpful.

Also just searching the web in general.

Google is useless for searching the web today.
- Not if you want that thing that everyone is on about. Don't you want to be in with the crowd?! /s
All LLMs are text completion engines, no matter what fancy bells they tack on.

If your task is some kind of text completion or repetition of text provided in the prompt context, LLMs perform wonderfully.

For everything else you are wading through territory you could probably do easier using other methods.
- I love the people who are like "I tried to replace Wolfram Alpha with ChatGPT why is none of the math right?" And blame ChatGPT when the problem is all they really needed was a fucking calculator
so, basically, even a broken clock is right twice a day?
- No, maybe more like, even a functional clock is wrong every 0.8 days.
  https://superuser.com/questions/759730/how-much-clock-drift-is-considered-normal-for-a-non-networked-windows-7-pc
  
  The frequency is probably way higher for most LLMs though lol
- Yes, but for some tasks mistakes don't really matter, like "come up with names for my project that does X". No wrong answers here really, so an LLM is useful.

Because most people are too lazy to bother with making sure the results are accurate when they sound plausible. They want to believe the hype, and lack critical thinking.

I don't want to believe any hype! I just want to be able to ask "hey ChatGPT, I'm looking for a YouTube video by Technology Connections where he discusses dryer heat pumps" and not have it spit out "it's called 'The neat ways your dryer heat pumps save energy!'"

And it is not, that video doesn't exist. And it's even harder to disprove at first glance because the LLM is mimicking what Alex would have called the video. So you look and look with your sister's very inefficient PS4 controller-to-YouTube interface... And finally ask it again and it shy flowers you....

But I swear he talked about it ?!?! Anyone?!?
- He hasn't
  
  I think in a recent video he mentioned he will soon, but he hasn't done a video with even a segment on heat pumps in dryers yet
  
  Fairly confident in this, recently finished a rewatch of basically all his content
This sounds awfully familiar, like almost exactly what people were saying about Wikipedia 20 years ago...
- Pretty weak analogy. Wikipedia was technologically trivial and did a really good job of avoiding vested interests. Also the hype is orders of magnitude different, no one ever claimed Wikipedia was going to lead to superhuman intelligences or to replacement of swathes of human creative/service workers.
  
  Actually since you mention it, my hot take is that Wikipedia might have been a more significant step forward in AI than OpenAI/latest generation LLMs. The creation of that corpus is hugely valuable in training and benchmarking models of natural language. Also it actually disrupted an industry (conventional encyclopedias), whereas I'm struggling to think of anything that LLMs have replaced in the same way thus far.
- Those people were wrong because wikipedia requires actual citations from credible sources, not comedic subreddits and infowars. Wikipedia is also completely open about the information being summarized, both in who is presenting it and where someone can confirm it is accurate.
  
  AI is presented to the user as a black box, and it gets portrayed as equivalent to a human with terms like 'hallucinations', which really mean 'is wrong a bunch, lol'.

Depending on the task, it’s quicker to verify the AI response than work through the blank page phase.

Probably because they're not checking them

My job uses a data science platform that has a special ai assistant trained on its own docs.

The first time I tried using it, it used the wrong language. The second time I used it, it was hallucinating its own functions, but after looking up the docs I told it what function to use and it gave me code that worked

I have not used it a third time. I don’t think i will.

I only use it for complex searches with results I can usually parse myself like ''list 30 typical household items without descriptions or explainations with no repeating items'' kind of thing.

great value for all that energy it expends, indeed!
it's because everyone stopped using it, right?

at least months ago?

They don't give you the answer, they give you a rough idea of where to look for the answer.

I've used them to generate chunks of boilerplate code that was 80% of what I needed, because I knew what I needed and wanted to save time.

There are ways of doing that which don't require burning an acre of rainforest
- Yep. The overwhelming majority of IDEs have support for making templates/snippets.
  
  VScode/VScodium has a very robust snippet system where you can set parts as "fill in the blank" that you can tab between, with optional drop down menus for choices. You can even link different "fill in" sections so you can do stuff like type in an argument name and have it propagate that same name through multiple places in your snippet.
  
  If that's too much, how the fuck can any dev (or even someone hacking together scripts) survive without at least one file of common shit they made before that they can copy paste from? I really feel like that's bare minimum.
  
  Either it's boilerplate you can already copy from somewhere else (documentation or previous work), or it's something you should probably review (at least briefly) and make into a template or snippet you can copy and paste later. That's part of the magic of programming: you get to build your own toolbox over time.

It's usually good for ecosystems with good and loads of docs. Whenever docs are scarce the results become shitty. To me it's mostly a more targeted search engine without the crap (for now)

The only reason i use ChatGPT for some quick stuff is just that search engines suck so bad.

Perplexity (or open source equivalents) are much better for this.

Treat it like a janitor rather than an answer machine and you'll have a better time. I call it my bitch bot.

They're trying not to lose money on the developments

Big businesses know, they even ask people like me to put extra measures in place. I like to call it the Concorde effect. You're trying to make a plane that can shove air out of the way faster than it wants to move, and this takes an enormous amount of energy that isn't worth the time saved, or the cost. Even if you have higher airspeed when it works, if your plane doesn't make it to its destination it isn't "faster".

We hear a lot about the downsides of AI, except that doesn't fit the big corpo narrative and people don't care enough really. If you're just a consumer who has no idea how this really works, the investments companies make into shoving it everywhere make it seem like it's not a problem and it looks like there's only AI hype and no party poopers.

It depends upon what you use ChatGPT for and if you know how to use it productively. For example if I ask ChatGPT coding questions it is often very helpful. If I ask it history questions it constantly makes things up. You also again need to know how to use it, like people who claim ChatGPT is not helpful for coding you ask them how they use it and they basically just ask ChatGPT to do their whole project for them and when it fails they claim it is useless. But that's not the productive way to use it, the productive way to use it is like a replacement for StackOverflow or to provide you examples of how to use some library, or things like that, not doing your whole project for you. Of course, people often use it incorrectly so it's probably not a good idea to allow its use in the workplace, but for individual use it can be very helpful.

For coding it heavily depends on the language. For example, it's quite decent at writing C#, but whenever I try to ask it any question about Rust, the answer is either flat out wrong or doesn't even fucking compile.

Also found it most useful when I know exactly what I want, just don't know the syntax. Like when I was writing C# code generation for the first time. Also unsurprisingly sucks at working with libraries.
I used it today to find out how to do something on my Juniper that would have taken 45 minutes of sifting bullshit documentation. One question and I figured it out in 2 minutes.

This is similar to Gabe Newell's idea of piracy. This is a convenience issue. And GPT solves some of it.
And thank god it doesn't get them all the way there, because if it were able to completely do everything accurately with the level of ambiguous prompts the layperson gives it, anyone technical would essentially be out of a job.

And honestly, the world would be better off not making people complacent just being end users of everything, and instead have to have a modicum of understanding what they are doing.

I used to think it's just neophobia, having all these kids using smart phones and touch screens for everything at increasingly earlier ages, but it's like they only know how to use/consume things, never an inkling of trying to tinker with things and understand how to repurpose the mechanisms, figure out how things work (tbf everything now is super integrated, much harder to repair).

It just doesn't bode well to me when it seems like the future labor force is so disconnected from the underlying systems they use.

What are you talking about? We mention this on a daily basis. That's the #1 complaint about ChatGPT when used for factual purposes

when used for factual purposes

I think the point of the post is that anyone who uses it for this is a fucking moron.
- Literally the only use I've found for it that's better than any other alternative is describing a thing to it that you can't remember the name of. It's usually right, and when it's wrong you were probably never gonna find the thing on your own anyway.
  
  But I don't go to it first, only when I can't figure out how to find the name any other way.
- That's true, and my point is that the post didn't say that. The post specifically said something different that was not true.

chatgpt has been really good for teaching me code. As long as I write the code myself and just ask for clarity or best practices i haven't had any bad hallucinations.

For example I wanted to change a character in an array with another one but it would give some error about data types that were way out of my league. Anyways apparently I needed to run list(string) first even though string[5] will return the character.

However that's in Python, which I assume is well understood due to the ton of Stack Overflow questions and alternative docs. I did ask it to do something in Google Docs scripting once and it had no idea what was going on and just hoped it worked. Fair enough, I also had no idea what was going on.

The reason why string[5] = '5' doesn't work is that strings in Python are immutable (cannot be changed). By doing list(string) you are actually creating a new list with the contents of the string and then modifying the list.

I wonder if ChatGPT explains this or just tells you to do this... as this works but can be quite inefficient.
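
A short sketch of what's going on, for anyone following along. The example string is made up, and the slicing version is just one common alternative rather than the one true fix:

```python
text = "hello world"

# Reading a character by index works fine on a str...
print(text[5])

# ...but assigning to an index raises TypeError, because str is immutable.
try:
    text[5] = "5"
except TypeError as exc:
    print(f"TypeError: {exc}")

# The list(string) route: copy the characters into a mutable list,
# change one, then join them back into a brand new string.
chars = list(text)
chars[5] = "5"
via_list = "".join(chars)

# Slicing builds the new string directly, without the intermediate list.
via_slice = text[:5] + "5" + text[6:]

print(via_list, via_slice)
```

Either way the original string is never modified. Both approaches create a new one, which is why doing this in a loop over a long string can get expensive.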

To me this highlights the danger with using AI... sure you can complete a task, but you may not understand why or learn important concepts.
- Yeah, it's a gift and a curse for exploring a new domain. It can help you move faster, but you'll definitely lose some understanding you'd get from struggling on those topics longer.

What are you talking about? I don’t verify anything that ChatGPT gives me.

Gippity is pretty good at getting me 90% of the way there.

It usually sets me up with at least all the terms etc. that I now know to google, whereas before I wouldn't even know what I was looking for in the first place.

Also not gonna lie, search engines are even worse than gippity for accuracy often.

And I've had to fight with so many cases of garbage documentation lately that gippity genuinely does the job better, because it has all the random comments from issues and solutions in its data.

Usually once I have my sort of key terms I need to dig into, I can use YouTube/Google and get more specific information though, and that's the last 10%

Remember when you had to have extremely niche knowledge of "banks" in a microcontroller to be able to use PWM on 2 pins with different frequencies?

Yes, I remember what a pile of shit it was to try and find out why xyz is not working while x and y and z work on their own. GPT usually gets me there after some tries. Not to mention how much faster most of the code is there, from A to Z, with only little to tweak to get it where I want (since I do not want to be hyper specific and/or it gets those details wrong anyway, as would a human without massive context).

in my use case, the hallucinations are a good thing. I write fiction, in a fictional setting that will probably never actually become a book. If i like what gpt makes up, I might keep it.

Usually, I'll have a conversation going into detail about a subject, this is me explaining the subject to gpt, then having gpt summarize everything it learned about the subject. I then plug that summary into my wiki of lore that nobody will ever see. Then move on to the next subject. Also gpt can identify potential connections between subjects that I didn't think about, and wouldn't have if it didn't hallucinate them.

bold of u to assume there are docs

Or docs are far too extensive... reading imagemagick docs is like reading through some old tech wizard's personal diary.. "i was inspired to shape this spell like this because of such and such...." like, bro.. come on, I just want the command, the args, and some examples... 🤷‍♂️

who the fuck is scraeming 'RTFM' at my house. show yourself, coward. i will never r any fm

You have to understand it well enough to know what stuff you can rely on. On the other hand nowadays there are often sources there, so it's easy to check.

I usually tell it "using only information found on applicationwebsite.com <question>" that works pretty well at least to get me in the ballpark to find the answer I'm looking for.

All tools get misused.

In another thread, I was curious about the probability of reaching the age of 60 while living in the US.

Google gave me an assortment of links to people asking similar questions on Quora, and to some generic actuarial data, and to some totally unrelated bullshit.

ChatGPT gave me a multi-paragraph response referencing its data sources and providing both a general life expectancy and a specific answer broken out by gender. I asked ChatGPT how it reached this answer, and it proceeded to show its work. If I wanted to verify the work myself, ChatGPT gave me source material to cross-check and the calculations it used to find the answer. Google didn't even come close to answering the question, much less producing the data it used to reach the answer.

I'm as big an AI skeptic as anyone, but it can't be denied that generic search engines have degraded significantly. I feel like I'm using Alta Vista in the 90s whenever I query Google in the modern day. The AI systems do a marginally better job than old search engines were doing five years ago, before enshittification hit with full force.

It sucks that AI is better, but it IS better.

referencing its data sources

Have you actually checked whether those sources exist yourself? It's been quite a while since I've used GPT, and I would be positively surprised if they've managed to prevent its generation of nonexistent citations.
- Have you actually checked whether those sources exist yourself
  
  When I'm curious enough, yes. While you can find plenty of "AI lied to me" examples online, they're much harder to fish for in the application itself.
  
  99 times out of 100, the references are good. But those cases aren't fun to dunk on.

Because realistically, that time is zero.

Depends. I asked it to add missing props to a react component just yesterday and it generated a bunch of code that looked pretty good but then I discovered it just made up some props that didn't even exist and passed those in too lol. Like wtf that's super annoying. I guess it still saved me time though.

Eh I just let it write my bash scripts. A bit of trial and error with ChatGPT beats having to read the ffmpeg or imagemagick docs.

Good thinking, that way you won't accidentally learn anything