
When robots can’t riddle: What puzzles reveal about the depths of our own minds

AI runs unfathomable operations on billions of lines of text, handling problems that humans can't dream of solving – but you can probably still trounce it at brain teasers.

In the halls of Amsterdam’s Vrije Universiteit, assistant professor Filip Ilievski is playing with artificial intelligence. It’s serious business, of course, but his work can look more like children’s games than hard-nosed academic research. Using some of humanity’s most advanced and surreal technology, Ilievski asks AI to solve riddles.

Understanding and improving AI’s ability to solve puzzles and logic problems is key to improving the technology, Ilievski says.

“As human beings, it’s very easy for us to have common sense, and apply it at the right time and adapt it to new problems,” says Ilievski, who describes his branch of computer science as “common sense AI”. But right now, AI has a “general lack of grounding in the world”, which makes that kind of basic, flexible reasoning a struggle.

But the study of AI can be about more than computers. Some experts believe that comparing how AI and human beings handle complex tasks could help unlock the secrets of our own minds.

In general, reasoning is really hard. That’s an area which goes beyond what AI currently does in many cases – Xaq Pitkow

AI excels at pattern recognition, “but it tends to be worse than humans at questions that require more abstract thinking”, says Xaq Pitkow, an associate professor at Carnegie Mellon University in the US, who studies the intersection of AI and neuroscience. In many cases, though, it depends on the problem.

Riddle me this

Let’s start with a question that’s so easy to solve it doesn’t qualify as a riddle by human standards. A 2023 study asked an AI to tackle a series of reasoning and logic challenges. Here’s one example:

Mable’s heart rate at 9am was 75bpm and her blood pressure at 7pm was 120/80. She died at 11pm. Was she alive at noon?

It’s not a trick question. The answer is yes. But GPT-4 – OpenAI’s most advanced model at the time – didn’t find it so easy. “Based on the information provided, it’s impossible to definitively say whether Mable was alive at noon,” the AI told the researcher. Sure, in theory, Mable could have died before lunch and come back to life in the afternoon, but that seems like a stretch. Score one for humanity.

Machines still struggle with basic logic, but AI can outperform humans on certain questions that trigger the weaknesses of our minds (Credit: Estudio Santa Rita)

The Mable question calls for “temporal reasoning”, logic that deals with the passage of time. An AI model might have no problem telling you that noon comes between 9am and 7pm, but understanding the implications of that fact is more complicated. “In general, reasoning is really hard,” Pitkow says. “That’s an area which goes beyond what AI currently does in many cases.”

A bizarre truth about AI is we have no idea how it works. We know on a high level – humans built AI, after all. Large language models (LLMs) use statistical analysis to find patterns in enormous bodies of text. When you ask a question, the AI works through the relationships it’s spotted between words, phrases and ideas, and uses that to predict the most likely answer to your prompt. But the specific connections and calculations that tools like ChatGPT use to answer any individual question are beyond our comprehension, at least for now.
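The prediction step described above can be illustrated with a drastically simplified toy model. Real LLMs use neural networks trained on billions of documents, but the core idea – predict the most likely next word from patterns in text it has seen – can be sketched with simple word-pair counts (the corpus and code here are invented for illustration, not taken from any actual model):

```python
from collections import Counter, defaultdict

# A tiny invented "training corpus".
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" - it follows "the" twice, more than any other word
```

A real model works with learned statistical relationships over vast text rather than raw counts, but the spirit is the same: the answer is whatever continuation the patterns make most probable, not the product of step-by-step reasoning.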

The same is true about the brain: we know very little about how our minds function. The most advanced brain-scanning techniques can show us individual groups of neurons firing as a person thinks. Yet no one can say exactly what those neurons are doing, or how thinking works for that matter.

By studying AI and the mind in concert, however, scientists could make progress, Pitkow says. After all, the current generation of AI uses “neural networks” which are modelled after the structure of the brain itself. There’s no reason to assume AI uses the same process as your mind, but learning more about one reasoning system could help us understand the other. “AI is burgeoning, and at the same time we have this emerging neurotechnology that’s giving us unprecedented opportunities to look inside the brain,” Pitkow says.

Trusting your gut

The question of AI and riddles gets more interesting when you look at questions that are designed to throw off human beings. Here’s a classic example:

A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost? 

Most people have the impulse to subtract 1.00 from 1.10, and say the ball costs $0.10, according to Shane Frederick, a professor of marketing at the Yale School of Management, who's studied riddles. And most people get it wrong. The ball costs $0.05.
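Spelling out the algebra shows why intuition fails: if the ball costs b and the bat costs b + 1.00, then b + (b + 1.00) = 1.10, so 2b = 0.10 and b = 0.05. A few lines of code confirm it (working in cents to avoid floating-point noise):

```python
total_cents = 110       # bat + ball together cost $1.10
diff_cents = 100        # the bat costs $1.00 more than the ball

# From ball + (ball + diff) = total: ball = (total - diff) / 2
ball_cents = (total_cents - diff_cents) // 2
bat_cents = ball_cents + diff_cents

print(ball_cents, bat_cents)  # 5 105 - i.e. $0.05 and $1.05, which sum to $1.10
```

The intuitive answer of $0.10 fails the second constraint: a $0.10 ball plus a $1.10 bat would total $1.20, not $1.10.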

“The problem is people casually endorse their intuition,” Frederick says. “People think that their intuitions are generally right, and in a lot of cases they generally are. You couldn’t go through life if you needed to question every single one of your thoughts.” But when it comes to the bat and ball problem, and a lot of riddles like it, your intuition betrays you. According to Frederick, that may not be the case for AI.

Simple brain teasers reveal the limits of AI, but the latest models are getting better (Credit: Estudio Santa Rita)

Human beings are likely to trust their intuition, unless there’s some indication that their first thought might be wrong. “I’d suspect that AI wouldn’t have that issue though. It’s pretty good at extracting the relevant elements from a problem and performing the appropriate operations,” Frederick says.

AI v the Mind

This article is part of AI v the Mind, a series that aims to explore the limits of cutting-edge AI, and learn a little about how our own brains work along the way. Each article will pit a human expert against an AI tool to probe a different aspect of cognitive ability. Can a machine write a better joke than a professional comedian, or unpick a moral conundrum more elegantly than a philosopher? We hope to find out.

The bat and ball question is a bad riddle to test AI, however. It’s famous, which means that AI models trained on billions of lines of text have probably seen it before. Frederick says he’s challenged AI to take on more obscure versions of the bat and ball problem, and found the machines still do far better than human participants – though this wasn’t a formal study.

Novel problems

If you want AI to exhibit something that feels more like logical reasoning, however, you need a brand-new riddle that isn’t in the training data. For a recent study (available in preprint), Ilievski and his colleagues developed a computer program that generates original rebus problems, puzzles that use combinations of pictures, symbols and letters to represent words or phrases. For example, the word “step” written in tiny text next to a drawing of four men could mean “one small step for man”.

AI solves classic riddles like “what has keys but can't open locks” with ease – but the answer is probably in the training data (Credit: Estudio Santa Rita)