
Bear with me.
As tasks require more computing power to run, decisions are being taken on our behalf about how much effort (compute time, so money) to expend when carrying out our requests. GPT-5 routes requests to different models, partly to get the best result in the shortest time, but also to stop simple requests costing too much.
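To give a flavour of what that kind of routing decision might look like, here is a toy sketch. The heuristics, thresholds and model names are entirely my own invention for illustration; OpenAI has not published how GPT-5's router actually decides.

```python
# Purely illustrative toy of cost-aware routing.
# None of these names or thresholds are real; they just show the shape of the idea:
# guess how hard a request is, and only pay for the big model when it seems worth it.

def estimate_difficulty(prompt: str) -> float:
    """Crude proxy for how much 'thinking' a request needs."""
    hard_words = ("prove", "analyse", "step by step", "chapter breakdown")
    score = min(len(prompt) / 2000, 1.0)  # longer prompts look harder
    score += 0.3 * sum(w in prompt.lower() for w in hard_words)
    return min(score, 1.0)

def route(prompt: str) -> str:
    """Send cheap-looking requests to a cheap model, hard ones to the expensive one."""
    difficulty = estimate_difficulty(prompt)
    if difficulty < 0.3:
        return "small-fast-model"       # pennies, answers in a second
    elif difficulty < 0.7:
        return "mid-model"
    return "large-reasoning-model"      # slow and expensive, hopefully thorough

print(route("What's the capital of France?"))   # -> small-fast-model
print(route("Give me a chapter breakdown of the novel, with point of view "
            "and plot developments for each chapter."))  # -> mid-model
```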
In other words, OpenAI is making a decision about how much effort (money) they should spend answering a question. Which is, essentially, how humans operate. We look at a task and decide how much effort to put in to complete it.
So when we offload boring, time consuming tasks to an LLM, it too is deciding whether or not it can be arsed to spend enough time on it. And if it chooses to spend less than necessary it can produce bad answers. Exactly like a lazy human would.
As an example, the first set text on my new creative writing course is a fairly challenging book; stream-of-consciousness prose and a heavy reliance on the Rashomon Effect leave you completely unsure of what’s going on, while in places the content is pretty hard to stomach. I guess they want to weed out the tyre-kickers nice and early.
After gamely putting together my own thoughts on the book, I thought I’d ask ChatGPT for a bit of assistance. Not “please write my essay”, but “please help me with the admin tasks that will enable me to write a better essay”. Specifically I wanted a breakdown of the book by chapter, with point of view and major plot points, so that I had something to quickly refer back to as I wrote. I started the session with:
“What do you know about the novel XXXXXXXXX?”
It came back with some potted details about the book (author, translator, publication date, number of pages, genre) followed by a plot synopsis, a short summary of the major themes of the book, a discussion of the style of writing, and a summary of the critical reception. “Sorted,” I thought, and followed up with:
“Can you give me a chapter breakdown that specifically highlights the point of view that each chapter is told from (or the main character that the chapter is about), plus a brief synopsis of the plot developments of each chapter”
It chewed away for a moment or two and came back with a nicely formatted table, with chapter number, main character, and 2 sentences of plot for each chapter.
Perfect. Something that I had been planning to pull together by hand to help my essay was done in mere seconds. Exactly what an LLM is for.
Except. The book is 8 chapters long, and the table listed 9. And the new chapter was - apparently - written from the perspective of a character that I did not recognise at all.
That is quite the hallucination. At this point it would be easy to put it to one side and add another tick in the “LLMs can’t be trusted” column. But I am a curious creature, and have seen LLMs put to very good use, particularly when summarising and interpreting text. And given the spot-on summarisation of plot, theme and style, I wondered what was going on.
First I told it, quite gently, because I am polite and do not want to be first up against the wall when AGI bursts into the world, that I was pretty sure that the book was only 8 chapters long.
Its response was to merge the first two chapters (which it had got right) while keeping the ‘hallucinated’ chapter.
I followed up with a firmer response, pointing out that the invented chapter 4 didn’t exist, and that the original chapters 1 and 2 were correct.
This time it spent over three minutes thinking and searching the web, before finally coming back with a new version of the table that correctly outlined the structure of the book.
So what on earth was going on?
In the spirit of Mike Caulfield’s post (“Is the LLM response wrong or have you just failed to iterate it?”), which I’d happened to read the day before, I asked: “I’m interested in how many sources you used for your first answer? Can you list them? And how many did you use for your last, correct, answer? Can you list them as well?”
And lo! We come upon the answer. For the first answer it used the following sources:
- A few newspaper reviews
- A few book reviews from book bloggers
- Reviews and book info from Amazon and Goodreads
- And a semi-professional passnotes/teaching-resources website, which is broken in such a way that when you follow the link to the free sample of passnotes for the novel in question, you get a free sample of passnotes for a different book altogether
And for the final, correct answer it added… drum roll please… the full text of the book itself.
ChatGPT was not hallucinating. It was replaying the most easily accessed information that it had harvested. What it could not tell was that the only apparently authoritative source was broken. And it was only when pressed - like the example in Mike’s post - that it took in the text of the book and realised it had been led astray.
So. Pretty much like a lazy Literature student would have done: “what can I get away with without actually reading the book?”
So now we find ourselves in a situation where, as we offload tasks that look like too much effort onto an LLM, the LLM might decide they’re too much effort too. In our own laziness, we’re handing off tasks to lazy assistants.
If we're not careful it’s lazy all the way down.