Reflections on ChatGPT

When I was seven years old, I distinctly remember believing that talc was the hardest substance on Earth. I obviously knew that talcum powder felt soft, but convinced myself that the individual bits of talc must be really, really hard.

I reached this conclusion after reading that talc was ‘number one’ on the Mohs hardness scale. Because this was referenced essentially in isolation, I understood ‘number one’ on a ‘hardness scale’ to mean ‘hardest.’ Obviously, I now know that it’s the softest mineral on a 10-point scale.

I’ve been ruminating on this over the past few days: my problem was essentially misinterpreting language as a result of not understanding the wider context. This is also a danger with language prediction models like ChatGPT: they too have ingested a load of text and are—in essence—quoting it back at people, while making basic statistical inferences along the way.

What these models try to do is simply predict the next word, based on statistical analysis of lots of prior text. They are a better-trained cousin of predictive text on your phone; they are not HAL 9000. They don’t have any understanding or general intelligence.
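
To make that concrete, here is a deliberately tiny sketch of the ‘predict the next word’ idea: it simply counts which word followed which in a small sample of text, then guesses the most frequent follower. The corpus and names here are purely illustrative, and real models like ChatGPT use enormous neural networks trained on vastly more text, but the underlying task is the same.

```python
# Toy next-word predictor: purely statistical, with no understanding.
# (Illustrative only; this is not how ChatGPT is actually implemented.)
from collections import Counter, defaultdict

corpus = (
    "the giraffe is tall . the mouse is small . "
    "the giraffe is taller than the mouse ."
).split()

# For each word, count which words have followed it in the corpus.
followers = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    followers[current_word][next_word] += 1

def predict_next(word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = followers.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("giraffe"))  # prints 'is': a frequency count, not knowledge
```

A model like this will happily string plausible words together without ever ‘knowing’ how tall a giraffe is, which is exactly the gap the examples below expose.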

Exposing ChatGPT’s lack of understanding is trivial. For example, from my own messing about:

Which is taller: a 100cm giraffe or a 1m mouse?

A 100cm giraffe is taller than a 1m mouse. Giraffes are known for their long necks and are among the tallest mammals in the world, with adult males growing to an average height of about 5.5 to 6 meters (18 to 20 feet). In comparison, the average height of a mouse is much smaller, usually around 10 to 20 cm (4 to 8 inches).

So, a 100cm giraffe would certainly be taller than a 1m mouse!

And also this:

Would it cost more to buy 6 £1 eggs from Waitrose or Aldi?

As of my knowledge cut off in 2021, the cost of eggs can vary between different stores.

However, Aldi is known for offering lower prices on a wide range of products, including groceries, compared to more premium stores like Waitrose.

So, it is likely that it would cost less to buy 6 £1 eggs from Aldi than from Waitrose.

And also this:

What is hotter: a drawing of an ice cube, which is kept in the freezer, or a drawing of the sun, which is baking in an oven?

Even though both the drawing of an ice cube and the drawing of the sun are subjected to different environmental conditions, such as being kept in a freezer or being baked in an oven, they still cannot be described as hot or cold as they are just illustrations and do not have physical temperatures.

And also this:

How many times does ‘times’ appear in this question?

The word ‘times’ appears only once in the question you just asked.

People like asking ChatGPT to write emails and speeches, and are impressed with results that are—at heart—just tweaked combined quotes from a wealth of written material. That’s fine, and those responses might well be impressively useful: ChatGPT has read many more Best Man speeches than any of us ever will, so has a better idea of what word ought to come next.

However, people can easily be fooled by these responses into assigning ChatGPT human-like intelligence. This might lead them to ask such models to undertake high-risk real-world tasks without appropriate supervision. My background means that I automatically worry about their use in medicine. Some of these uses are obvious, like providing basic medical advice, and ChatGPT in particular has some safeguards around this.

Others are not obvious: people asking these models to summarise long medical documents, or to distil patient histories into problem lists. These are problematic because they lie on the border between ‘text analysis’—at which these models excel—and ‘real-world interpretation,’ at which they comprehensively suck, but can have a sheen of competence.

Much of the overhyped discussion about ChatGPT seems to mistake this language model for something approaching artificial general intelligence. To me, it feels a lot like the advent of Siri and Alexa, with wild predictions that PCs would disappear and voice assistants would be everywhere. People really thought that their voice assistants understood their requests and had personalities—but the novelty has long since worn off. I fear we’ve got a lot more not-very-funny ‘I asked ChatGPT…’ anecdotes still to live through, though, just as we endured ‘I asked Alexa…’ anecdotes long after they stopped being funny or insightful.

Like voice assistants, language models are useful and will no doubt find a place in everyday use. And like voice assistants, that place won’t be nearly as central to our everyday experience as the early hype suggests, nor will it be quite where we currently expect it to be.

And research towards artificial general intelligence will proceed apace—but honestly, I think it’s a stretch even to say that ChatGPT is a significant staging post on that journey.


The picture at the top of this post is an AI-generated image for the prompt ‘digital art of a robot in a bathroom applying talcum powder’ created by OpenAI’s DALL-E 2.

This post was filed under: Post-a-day 2023, Technology.
