A test that’s correct nine times out of ten is 90 per cent accurate, right? Sometimes not, says Professor Ian Stewart.
By Ian Stewart, Published: 14 Oct 2010
A paper in the Journal of Neuroscience received widespread publicity because it reported a method for diagnosing autism in adults using brain scans, with 90 per cent accuracy. It used pattern recognition methods, training a computer to distinguish between scans of people with and without autism. The resulting test is far cheaper than the usual ones, so it all sounds promising. But ‘90 per cent accuracy’ focuses on the question ‘how often is this test right?’ The vital question is: ‘how often is this test wrong?’
Why? Let’s do the sums.
About one person in 100 suffers from autism. Suppose we applied the test to a million people, of whom 10,000 would actually be autistic. Then among those suffering from autism, we’d get 9000 diagnoses right and 1000 wrong - not bad for a cheap 15-minute test. But among the 990,000 without the condition, the test would produce 891,000 correct diagnoses and wrongly diagnose autism in 99,000 cases. So, out of a total of 108,000 people diagnosed as having autism, 99,000 would not suffer from the condition. That’s a success rate of only 9000 in 108,000, or about 8 per cent.
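These sums can be checked with a few lines of Python. The sketch below makes the same assumption as the calculation in the text: ‘90 per cent accuracy’ means the test is right for 90 per cent of autistic people (its sensitivity) and for 90 per cent of everyone else (its specificity). The function name `screening_outcome` is purely illustrative.

```python
def screening_outcome(population, prevalence, sensitivity, specificity):
    """Count true and false positives when a test screens a whole population."""
    affected = population * prevalence
    unaffected = population - affected
    true_pos = affected * sensitivity            # autistic people correctly flagged
    false_pos = unaffected * (1 - specificity)   # non-autistic people wrongly flagged
    ppv = true_pos / (true_pos + false_pos)      # chance that a positive result is right
    return true_pos, false_pos, ppv

# One million people, 1 in 100 autistic, test right 90% of the time either way.
tp, fp, ppv = screening_outcome(1_000_000, 0.01, 0.9, 0.9)
print(round(tp), round(fp), f"{ppv:.1%}")  # → 9000 99000 8.3%
```

The last figure, the probability that a positive result is correct, is what statisticians call the positive predictive value.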
This is a rough-and-ready calculation. Taking more details into account changes the result a little - to about 5 per cent.
If there are independent reasons to suspect that someone suffers from autism, then false negatives - failing to diagnose the condition even though it is present - are the big worry. But if you’re screening a substantial section of the population, most of whom are not autistic, then it’s the false positives that cause trouble. The same issues arise for any relatively rare condition. In the early days of AIDS, some people committed suicide when told that a very accurate test had shown them to be HIV-positive, even though a large proportion of the test’s positive results were actually false positives.
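How the balance shifts can be sketched with Bayes’ theorem. The hypothetical helper below keeps the 90 per cent sensitivity and specificity assumed earlier and reports, for a few illustrative prevalences, how often a positive result is right and how often a negative result is wrong. A prevalence of 50 per cent stands in for a group where there are already strong reasons to suspect the condition.

```python
def result_reliability(prevalence, sensitivity=0.9, specificity=0.9):
    """Bayes' theorem: (P(condition | positive), P(condition | negative))."""
    p_positive = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    right_if_positive = prevalence * sensitivity / p_positive
    wrong_if_negative = prevalence * (1 - sensitivity) / (1 - p_positive)
    return right_if_positive, wrong_if_negative

# Illustrative prevalences: 50% mimics testing people already suspected of the condition.
for prevalence in (0.5, 0.1, 0.01, 0.001):
    pos_right, neg_wrong = result_reliability(prevalence)
    print(f"prevalence {prevalence:6.1%}: positive right {pos_right:5.1%}, "
          f"negative wrong {neg_wrong:5.2%}")
```

At 50 per cent, one negative result in ten is wrong, so missed cases are the main danger; at 1 per cent, fewer than one positive result in ten is right.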
There are other problems with the autism study. The computer was trained to distinguish brain scans of people with or without autism by looking at five different features of the scans, and working out which mathematical combination of them was most closely associated with the presence or absence of autism. However, the features were all large scale - the thickness of the cortex, how convoluted the brain’s surface was, and so on. Such features are not especially suited to diagnosing any specific condition: it’s a bit like trying to diagnose chickenpox by looking at weight, height, and hair colour.
Moreover, the number of people involved was small: 20 with autism, 20 without. With that small a group, it’s hard to tell whether any association that shows up is meaningful. You can train a computer using photos of the family cat, and it will calculate whichever combination of size, colour, and whisker length best detects autism in its owner. There are so many potential combinations that in all likelihood one of them will appear to perform pretty well. But try it on another bunch of people, and the odds are it will fail.
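The cat-photo effect is easy to reproduce with random numbers. In this sketch every ‘measurement’ is pure noise with no connection to autism; the group sizes, the 500 candidate measurements and the 0.5 cut-off are all assumptions chosen for illustration. The code picks whichever measurement best separates 20 autistic and 20 non-autistic people, then applies that same rule to a fresh group of 2000.

```python
import random

def chance_rule_demo(n_train=40, n_test=2000, n_features=500, seed=2010):
    """Choose the best-looking rule from pure noise, then try it on new people."""
    rng = random.Random(seed)

    def make_group(n):
        # Half the group is autistic (True), half is not (False).
        labels = [i < n // 2 for i in range(n)]
        # Hypothetical "cat measurements": random numbers with no link to autism.
        features = [[rng.random() for _ in range(n_features)] for _ in range(n)]
        return labels, features

    def accuracy(k, flip, labels, features):
        # Rule (k, flip): predict autism when measurement k exceeds 0.5,
        # or when it does not exceed 0.5 if flip is True.
        hits = sum(((row[k] > 0.5) != flip) == is_autistic
                   for row, is_autistic in zip(features, labels))
        return hits / len(labels)

    train_labels, train_features = make_group(n_train)
    # Try every measurement in both directions; keep the best fit to the first group.
    train_acc, best_k, best_flip = max(
        (accuracy(k, flip, train_labels, train_features), k, flip)
        for k in range(n_features)
        for flip in (False, True)
    )
    # Apply that same rule, unchanged, to a new group of people.
    test_labels, test_features = make_group(n_test)
    return train_acc, accuracy(best_k, best_flip, test_labels, test_features)

train_acc, test_acc = chance_rule_demo()
print(f"chosen group: {train_acc:.0%}, new group: {test_acc:.0%}")
```

The winning rule looks impressive on the 40 people it was chosen from, scoring well above 50 per cent, yet on the new group it does no better than tossing a coin.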
Studies of this type need to be carried out on large groups of people, and any potential association that shows up must be checked independently using a totally different group of people. If the association survives all that, you might well be on to something. But until then, it could easily be meaningless.
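The value of a larger group can be put into numbers with the binomial distribution. The sketch below computes how likely it is that a rule which is really no better than a coin toss will, purely by luck, be right for at least 70 per cent of a group. The 70 per cent threshold, the 1000 candidate rules, and the assumption that those rules behave independently are simplifications made for illustration; in practice, combinations of the same few features are correlated.

```python
from math import comb

def chance_of_fluke(n, percent=70):
    """Probability that a coin-toss rule is right for at least `percent`% of n people."""
    k_min = -(-percent * n // 100)  # smallest number of correct calls (exact integer ceiling)
    favourable = sum(comb(n, k) for k in range(k_min, n + 1))
    return favourable / 2 ** n

def chance_any_fluke(n, rules=1000, percent=70):
    """Probability that at least one of `rules` independent coin-toss rules hits the mark."""
    return 1 - (1 - chance_of_fluke(n, percent)) ** rules

for n in (40, 400):
    print(f"{n} people: one rule {chance_of_fluke(n):.2e}, "
          f"any of 1000 rules {chance_any_fluke(n):.2e}")
```

For 40 people, a single useless rule reaches 70 per cent about 0.8 per cent of the time, so among 1000 candidates a fluke is almost certain; for 400 people, it is vanishingly unlikely.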
Ian Stewart FRS is an Emeritus Professor of Mathematics at Warwick University.