Advertisement

Column: These Apple researchers just showed that AI bots can’t think, and possibly never will

OpenAI Chief Executive Sam Altman: Can his products really "think"? Researchers say they can't.
OpenAI Chief Executive Sam Altman: Can his products really “think”? Researchers say they can’t.
(Eric Risberg / Associated Press)
Share via

See if you can solve this arithmetic problem:

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?

If you answered “190,” congratulations: You did as well as the average grade school kid by getting it right. (Friday’s 44 plus Saturday’s 58 plus Sunday’s 44 multiplied by 2, or 88, equals 190.)

You also did better than more than 20 state-of-the-art artificial intelligence models tested by an AI research team at Apple. The AI bots, they found, consistently got it wrong.

Advertisement

The fact that Apple did this has gotten a lot of attention, but nobody should be surprised at the results.

— AI critic Gary Marcus

The Apple team found “catastrophic performance drops” by those models when they tried to parse simple mathematical problems written in essay form. In this example, the systems tasked with the question often didn’t understand that the size of the kiwis have nothing to do with the number of kiwis Oliver has. Some, consequently, subtracted the five undersized kiwis from the total and answered “185.”

Human schoolchildren, the researchers posited, are much better at detecting the difference between relevant information and inconsequential curveballs.

Advertisement

The Apple findings were published in October in a technical paper that has attracted widespread attention in AI labs and the lay press, not only because the results are well-documented, but also because the researchers work for the nation’s leading high-tech consumer company — and one that has just rolled out a suite of purported AI features for iPhone users.

“The fact that Apple did this has gotten a lot of attention, but nobody should be surprised at the results,” says Gary Marcus, a critic of how AI systems have been marketed as reliably, well, “intelligent.”

Indeed, Apple’s conclusion matches earlier studies that have found that large language models, or LLMs, don’t actually “think” so much as match language patterns in materials they’ve been fed as part of their “training.” When it comes to abstract reasoning — “a key aspect of human intelligence,” in the words of Melanie Mitchell, an expert in cognition and intelligence at the Santa Fe Institute — the models fall short.

Advertisement

Shares in AI companies have powered a huge runup in the stock market this year, but users are beginning to question whether the craze will fall flat.

“Even very young children are adept at learning abstract rules from just a few examples,” Mitchell and colleagues wrote last year after subjecting GPT bots to a series of analogy puzzles. Their conclusion was that “a large gap in basic abstract reasoning still remains between humans and state-of-the-art AI systems.”

That’s important because LLMs such as GPT undergird the AI products that have captured the public’s attention. But the LLMs tested by the Apple team were consistently misled by the language patterns they were trained on.

The Apple researchers set out to answer the question, “Do these models truly understand mathematical concepts?” as one of the lead authors, Mehrdad Farajtabar, put it in a thread on X. Their answer is no. They also pondered whether the shortcomings they identified can be easily fixed, and their answer is also no: “Can scaling data, models, or compute fundamentally solve this?” Farajtabar asked in his thread. “We don’t think so!”

The Apple research, along with other findings about the limitations of AI bots’ cogitative limitations, is a much-needed corrective to the sales pitches coming from companies hawking their AI models and systems, including OpenAI and Google’s DeepMind lab.

The promoters generally depict their products as dependable and their output as trustworthy. In fact, their output is consistently suspect, posing a clear danger when they’re used in contexts where the need for rigorous accuracy is absolute, say in healthcare applications.

That’s not always the case. “There are some problems which you can make a bunch of money on without having a perfect solution,” Marcus told me. Recommendation engines powered by AI — those that steer buyers on Amazon to products they might also like, for example. If those systems get a recommendation wrong, it’s no big deal; a customer might spend a few dollars on a book he or she didn’t like.

Advertisement

“But a calculator that’s right only 85% of the time is garbage,” Marcus says. “You wouldn’t use it.”

A federal judge rejected most of Sarah Silverman’s lawsuit against OpenAI for copyright infringement. But what he left in tells us more about where this dispute is headed.

The potential for damagingly inaccurate outputs is heightened by AI bots’ natural language capabilities, with which they offer even absurdly inaccurate answers with convincingly cocksure elan. Often they double down on their errors when challenged.

These errors are typically described by AI researchers as “hallucinations.” The term may make the mistakes seem almost innocuous, but in some applications, even a minuscule error rate can have severe ramifications.

That’s what academic researchers concluded in a recently published analysis of Whisper, an AI-powered speech-to-text tool developed by OpenAI, which can be used to transcribe medical discussions or jailhouse conversations monitored by correction officials.

The researchers found that about 1.4% of Whisper-transcribed audio segments in their sample contained hallucinations, including the addition to transcribed conversation of wholly fabricated statements including portrayals of “physical violence or death ... [or] sexual innuendo,” and demographic stereotyping.

That may sound like a minor flaw, but the researchers observed that the errors could be incorporated in official records such as transcriptions of court testimony or prison phone calls — which could lead to official decisions based on “phrases or claims that a defendant never said.”

Advertisement

Updates to Whisper in late 2023 improved its performance, the researchers said, but the updated Whisper “still regularly and reproducibly hallucinated.”

Will self-driving cars and artificial intelligence take over the world? A distinguished technology expert says don’t hold your breath.

That hasn’t deterred AI promoters from unwarranted boasting about their products. In an Oct. 29 tweet, Elon Musk invited followers to submit “x-ray, PET, MRI or other medical images to Grok [the AI application for his X social media platform] for analysis.” Grok, he wrote, “is already quite accurate and will become extremely good.”

It should go without saying that, even if Musk is telling the truth (not an absolutely certain conclusion), any system used by healthcare providers to analyze medical images needs to be a lot better than “extremely good,” however one might define that standard.

That brings us to the Apple study. It’s proper to note that the researchers aren’t critics of AI as such but believers that its limitations need to be understood. Farajtabar was formerly a senior research scientist at DeepMind, where another author interned under him; other co-authors hold advanced degrees and professional experience in computer science and machine learning.

The team plied their subject AI models with questions drawn from a popular collection of more than 8,000 grade school arithmetic problems testing schoolchildren’s understanding of addition, subtraction, multiplication and division. When the problems incorporated clauses that might seem relevant but weren’t, the models’ performance plummeted.

That was true of all the models, including versions of the GPT bots developed by OpenAI, Meta’s Llama, Microsoft’s Phi-3, Google’s Gemma and several models developed by the French lab Mistral AI.

Advertisement

Some did better than others, but all showed a decline in performance as the problems became more complex. One problem involved a basket of school supplies including erasers, notebooks and writing paper. That requires a solver to multiply the number of each item by its price and add them together to determine how much the entire basket costs.

AI investors say their work is so important that they should be able to trample copyright law on their pathway to riches. Here’s why you shouldn’t believe them.

When the bots were also told that “due to inflation, prices were 10% cheaper last year,” the bots reduced the cost by 10%. That produces a wrong answer, since the question asked what the basket would cost now, not last year.

Why did this happen? The answer is that LLMs are developed, or trained, by feeding them huge quantities of written material scraped from published works or the internet — not by trying to teach them mathematical principles. LLMs function by gleaning patterns in the data and trying to match a pattern to the question at hand.

But they become “overfitted to their training data,” Farajtabar explained via X. “They memorized what is out there on the web and do pattern matching and answer according to the examples they have seen. It’s still a [weak] type of reasoning but according to other definitions it’s not a genuine reasoning capability.” (the brackets are his.)

That’s likely to impose boundaries on what AI can be used for. In mission-critical applications, humans will almost always have to be “in the loop,” as AI developers say—vetting answers for obvious or dangerous inaccuracies or providing guidance to keep the bots from misinterpreting their data, misstating what they know, or filling gaps in their knowledge with fabrications.

To some extent, that’s comforting, for it means that AI systems can’t accomplish much without having human partners at hand. But it also means that we humans need to be aware of the tendency of AI promoters to overstate their products’ capabilities and conceal their limitations. The issue is not so much what AI can do, but how users can be gulled into thinking what it can do.

Advertisement

“These systems are always going to make mistakes because hallucinations are inherent,” Marcus says. “The ways in which they approach reasoning are an approximation and not the real thing. And none of this is going away until we have some new technology.”

Advertisement
seductrice.net
universo-virtual.com
buytrendz.net
thisforall.net
benchpressgains.com
qthzb.com
mindhunter9.com
dwjqp1.com
secure-signup.net
ahaayy.com
tressesindia.com
puresybian.com
krpano-chs.com
cre8workshop.com
hdkino.org
peixun021.com
qz786.com
utahperformingartscenter.org
worldqrmconference.com
shangyuwh.com
eejssdfsdfdfjsd.com
playminecraftfreeonline.com
trekvietnamtour.com
your-business-articles.com
essaywritingservice10.com
hindusamaaj.com
joggingvideo.com
wandercoups.com
wormblaster.net
tongchengchuyange0004.com
internetknowing.com
breachurch.com
peachesnginburlesque.com
dataarchitectoo.com
clientfunnelformula.com
30pps.com
cherylroll.com
ks2252.com
prowp.net
webmanicura.com
sofietsshotel.com
facetorch.com
nylawyerreview.com
apapromotions.com
shareparelli.com
goeaglepointe.com
thegreenmanpubphuket.com
karotorossian.com
publicsensor.com
taiwandefence.com
epcsur.com
southstills.com
tvtv98.com
thewellington-hotel.com
bccaipiao.com
colectoresindustrialesgs.com
shenanddcg.com
capriartfilmfestival.com
replicabreitlingsale.com
thaiamarinnewtoncorner.com
gkmcww.com
mbnkbj.com
andrewbrennandesign.com
cod54.com
luobinzhang.com
faithfirst.net
zjyc28.com
tongchengjinyeyouyue0004.com
nhuan6.com
kftz5k.com
oldgardensflowers.com
lightupthefloor.com
bahamamamas-stjohns.com
ly2818.com
905onthebay.com
fonemenu.com
notanothermovie.com
ukrainehighclassescort.com
meincmagazine.com
av-5858.com
yallerdawg.com
donkeythemovie.com
corporatehospitalitygroup.com
boboyy88.com
miteinander-lernen.com
dannayconsulting.com
officialtomsshoesoutletstore.com
forsale-amoxil-amoxicillin.net
generictadalafil-canada.net
guitarlessonseastlondon.com
lesliesrestaurants.com
mattyno9.com
nri-homeloans.com
rtgvisas-qatar.com
salbutamolventolinonline.net
sportsinjuries.info
wedsna.com
rgkntk.com
bkkmarketplace.com
zxqcwx.com
breakupprogram.com
boxcardc.com
unblockyoutubeindonesia.com
fabulousbookmark.com
beat-the.com
guatemala-sailfishing-vacations-charters.com
magie-marketing.com
kingstonliteracy.com
guitaraffinity.com
eurelookinggoodapparel.com
howtolosecheekfat.net
marioncma.org
oliviadavismusic.com
shantelcampbellrealestate.com
shopleborn13.com
topindiafree.com
v-visitors.net
djjky.com
053hh.com
originbluei.com
baucishotel.com
33kkn.com
intrinsiqresearch.com
mariaescort-kiev.com
mymaguk.com
sponsored4u.com
crimsonclass.com
bataillenavale.com
searchtile.com
ze-stribrnych-struh.com
zenithalhype.com
modalpkv.com
bouisset-lafforgue.com
useupload.com
37r.net
autoankauf-muenster.com
bantinbongda.net
bilgius.com
brabustermagazine.com
indigrow.org
miicrosofts.net
mysmiletravel.com
selinasims.com
spellcubesapp.com
usa-faction.com
hypoallergenicdogsnames.com
dailyupdatez.com
foodphotographyreviews.com
cricutcom-setup.com
chprowebdesign.com
katyrealty-kanepa.com
tasramar.com
bilgipinari.org
four-am.com
indiarepublicday.com
inquick-enbooks.com
iracmpi.com
kakaschoenen.com
lsm99flash.com
nana1255.com
ngen-niagara.com
technwzs.com
virtualonlinecasino1345.com
wallpapertop.net
casino-natali.com
iprofit-internet.com
denochemexicana.com
eventhalfkg.com
medcon-taiwan.com
life-himawari.com
myriamshomes.com
nightmarevue.com
healthandfitnesslives.com
androidnews-jp.com
allstarsru.com
bestofthebuckeyestate.com
bestofthefirststate.com
bestwireless7.com
britsmile.com
declarationintermittent.com
findhereall.com
jingyou888.com
lsm99deal.com
lsm99galaxy.com
moozatech.com
nuagh.com
patliyo.com
philomenamagikz.net
rckouba.net
saturnunipessoallda.com
tallahasseefrolics.com
thematurehardcore.net
totalenvironment-inthatquietearth.com
velislavakaymakanova.com
vermontenergetic.com
kakakpintar.com
begorgeouslady.com
1800birks4u.com
2wheelstogo.com
6strip4you.com
bigdata-world.net
emailandco.net
gacapal.com
jharpost.com
krishnaastro.com
lsm99credit.com
mascalzonicampani.com
sitemapxml.org
thecityslums.net
topagh.com
flairnetwebdesign.com
rajasthancarservices.com
bangkaeair.com
beneventocoupon.com
noternet.org
oqtive.com
smilebrightrx.com
decollage-etiquette.com
1millionbestdownloads.com
7658.info
bidbass.com
devlopworldtech.com
digitalmarketingrajkot.com
fluginfo.net
naqlafshk.com
passion-decouverte.com
playsirius.com
spacceleratorintl.com
stikyballs.com
top10way.com
yokidsyogurt.com
zszyhl.com
16firthcrescent.com
abogadolaboralistamd.com
apk2wap.com
aromacremeria.com
banparacard.com
bosmanraws.com
businessproviderblog.com
caltonosa.com
calvaryrevivalchurch.org
chastenedsoulwithabrokenheart.com
cheminotsgardcevennes.com
cooksspot.com
cqxzpt.com
deesywig.com
deltacartoonmaps.com
despixelsetdeshommes.com
duocoracaobrasileiro.com
fareshopbd.com
goodpainspills.com
hemendekor.com
kobisitecdn.com
makaigoods.com
mgs1454.com
piccadillyresidences.com
radiolaondafresca.com
rubendorf.com
searchengineimprov.com
sellmyhrvahome.com
shugahouseessentials.com
sonihullquad.com
subtractkilos.com
valeriekelmansky.com
vipasdigitalmarketing.com
voolivrerj.com
zeelonggroup.com
1015southrockhill.com
10x10b.com
111-online-casinos.com
191cb.com
3665arpentunitd.com
aitesonics.com
bag-shokunin.com
brightotech.com
communication-digitale-services.com
covoakland.org
dariaprimapack.com
freefortniteaccountss.com
gatebizglobal.com
global1entertainmentnews.com
greatytene.com
hiroshiwakita.com
iktodaypk.com
jahatsakong.com
meadowbrookgolfgroup.com
newsbharati.net
platinumstudiosdesign.com
slotxogamesplay.com
strikestaruk.com
trucosdefortnite.com
ufabetrune.com
weddedtowhitmore.com
12940brycecanyonunitb.com
1311dietrichoaks.com
2monarchtraceunit303.com
601legendhill.com
850elaine.com
adieusolasomade.com
andora-ke.com
bestslotxogames.com
cannagomcallen.com
endlesslyhot.com
iestpjva.com
ouqprint.com
pwmaplefest.com
qtylmr.com
rb88betting.com
buscadogues.com
1007macfm.com
born-wild.com
growthinvests.com
promocode-casino.com
proyectogalgoargentina.com
wbthompson-art.com
whitemountainwheels.com
7thavehvl.com
developmethis.com
funkydogbowties.com
travelodgegrandjunction.com
gao-town.com
globalmarketsuite.com
blogshippo.com
hdbka.com
proboards67.com
outletonline-michaelkors.com
kalkis-research.com
thuthuatit.net
buckcash.com
hollistercanada.com
docterror.com
asadart.com
vmayke.org
erwincomputers.com
dirimart.org
okkii.com
loteriasdecehegin.com
mountanalog.com
healingtaobritain.com
ttxmonitor.com
nwordpress.com
11bolabonanza.com