High-profile study claims mice show “autism-like” behaviour. But does the evidence stack up?
Two weeks ago, a paper published in the journal Cell claimed to provide evidence that microbes in the gut contribute to the development of autism. The researchers, led by Gil Sharon and Sarkis Mazmanian at the California Institute of Technology, found that mice with gut bacteria from autistic children exhibited more “autistic-like” behaviours than mice whose gut bacteria came from non-autistic children.
We transplanted gut microbiota from human donors with ASD or TD controls into germ-free mice and reveal that colonization with ASD microbiota is sufficient to induce hallmark autistic behaviors.
The result, if true, could have important implications for understanding the causes of autism and, potentially, its treatment. Not surprisingly, the study received widespread media coverage, including articles in The Guardian and The Economist.
But soon after publication, scientists began expressing concerns about the paper on social media. These were echoed in a blogpost by drug discovery chemist Derek Lowe and then in a series of comments on the PubPeer website. On closer inspection, the results are a whole lot less compelling than the media coverage, the press releases, and even the paper itself suggest.
So what exactly did the researchers find? And why the skepticism?
Can gut microbes cause autism?
The idea that gut bacteria play a role in autism is not itself implausible. Gastro-intestinal problems are common amongst autistic people and numerous studies looking at faeces samples from autistic children and adults have identified differences in the bacteria that colonise their digestive system. Earlier this year, a systematic review identified 16 “medium to high quality” studies with a total of 381 autistic individuals and 283 non-autistic controls. It concluded that there are some fairly consistent differences in the gut microbiome of people with autism:
The overall changing of gut bacterial community in terms of β-diversity was consistently observed in ASD [autism spectrum disorder] patients compared with HCs [healthy controls]. Furthermore, Bifidobacterium, Blautia, Dialister, Prevotella, Veillonella, and Turicibacter were consistently decreased, while Lactobacillus, Bacteroides, Desulfovibrio, and Clostridium were increased in patients with ASD relative to HCs in certain studies.
But finding that autism is associated with altered gut bacteria does not necessarily mean that these bacteria cause autism. Autistic people often have quite a restricted diet, which could itself affect gut microbes. And there’s emerging evidence that genetic differences affecting brain development can also impact the functioning of the gut. In other words, differences in gut bacteria may not be why people are autistic. They may arise because they are autistic.
This is where Sharon and colleagues’ new study comes in. The researchers took faecal samples from autistic children and neurotypical (non-autistic) control children and gave them to “germ-free” mice — animals bred in an isolator that prevents exposure to micro-organisms. Growing up without bacteria (many of which are healthy and beneficial) means that these mice are not ideal test subjects. So the researchers bred the mice, making sure to pair male and female mice that had the same human “donor”. The resultant offspring each received their gut bacteria from a single human child. It was this second generation of mice that the researchers tested.
It’s an ingenious design because it allows separation of cause and effect. If gut microbes don’t cause autism, then we shouldn’t expect there to be any differences in the behaviour of mice receiving their microbes from autistic versus non-autistic donors. But if there are differences between the two groups of mice — if the mice with autistic donors show more autistic behaviour — it’s evidence that gut microbes cause or at least contribute to autism.
The big question, however, is how do you measure autism in a mouse?
The three chambers test of sociability
The central feature of autism is impairment in social interactions. This can manifest in different ways — from social awkwardness through to an almost complete avoidance of interaction with other people. But if researchers are claiming that their mice showed “hallmark autistic behaviours”, we should expect strong evidence of reduced or atypical social behaviour.
Sharon and colleagues used two tests of mouse sociability. In the first of these, the mouse was placed for 10 minutes in a three-chambered box. One chamber was empty, one contained an inanimate object, and the third contained another mouse of the same sex. Of interest was how much time the first mouse spent in the chamber containing the second mouse.
The graph below shows the Sociability Index for each of the mice tested — the amount of time spent with the other mouse versus the time with the inanimate object. Positive scores indicate more time being sociable. Each dot represents a single mouse and they are arranged according to the identity of their human donor. Boxes show the inter-quartile range — essentially the middle 50% of the data. There is a lot of variation between individual mice, even with the same donor. But — contrary to predictions — no difference between mice with autistic versus non-autistic donors.
This particular test has been widely used to study animal models of autism, as noted in this recent review.
The three-chambered social preference test was developed by Jacqueline Crawley’s group specifically for assessing social impairment phenotypes in autism rodent models (Kazdoba et al., 2016). This test is one of the most frequently used tests in studies involving rodent models of ASD (Amodeo et al., 2012; Crawley, 2012; Schwartzer et al., 2013; Silverman et al., 2010; Wöhr & Scattoni, 2013).
As such, the failure to find an effect in the current study is important evidence against the hypothesis that gut bacteria cause “hallmark autism behaviours”. However, it warrants just a single line in the paper with the data relegated to a supplementary figure.
The direct social interaction test
Having failed in their first attempt to find evidence of social impairment, Sharon and colleagues moved onto a second test, which they suggest is “more sensitive”. The mouse was placed in a cage, initially on its own and then for 6 minutes with another mouse. The researchers then scored videos of the interaction for social approach, aggression, and grooming behaviour.
This time, there was a significant difference — on average, mice with autistic donors spent less time approaching the other mouse in their cage. However, this second sociability test was only administered to mice from 8 of the donors — 5 autistic children and 3 controls. Two of the controls (N5 and C4) have data that are indistinguishable from the autistic children. And so everything rests on a single control donor (C1) with relatively sociable mice.
Communication difficulties are also a core feature of autism although, again, there is huge variation between autistic individuals. Some people have little or no speech. Others are highly verbal but have difficulties maintaining conversations or using nonverbal communication. And so in current diagnostic criteria, the focus is on the social aspects of communication, rather than language ability itself.
In Sharon et al.’s study, as in many other mouse studies of autism, communication was assessed by measuring ultrasonic (high-frequency) vocalizations. Male mice were held in near isolation for 3 days. Then on the fourth day, the researchers recorded their vocalizations during a 3-minute interaction with a female mouse.
The researchers reported that mice with autistic donors vocalised significantly less than mice with non-autistic donors. This was based on a comparison of data from just 4 control donors and 5 autistic donors (excluding those with “Mild ASD”). Because only male mice were assessed, the number of mice per child is smaller than for other tests.
As well as social and communication impairments, the other core feature of autism is what’s referred to as repetitive and restricted behaviours. This can include stereotyped motor patterns such as hand-flapping and rocking back and forth, unusual and obsessive interests, and a need for structure and a strict routine.
Here, as in other studies, the mouse analogue of these human behaviours was measured in a marble-burying task. The mouse was placed in a cage with 20 glass marbles arranged on top of woodchip bedding. After 10 minutes, the researchers recorded the number of marbles that the mouse had buried.
On average, the mice with autistic donors buried more marbles than those with non-autistic donors. The difference was statistically significant when the mice from donors with “Mild ASD” were excluded from the analysis.
The researchers also looked at correlations between mouse behaviour and the characteristics of their autistic donors. They noted a significant relationship between marble-burying and scores on the ADOS — a diagnostic assessment in which a clinician interacts with the child in a series of activities designed to elicit autistic behaviour.
The figure below shows the data from the nine autistic donors who completed the ADOS and whose mice completed the marble-burying task (colours here represent different donors). Children who displayed more autistic behaviours donated their faeces to mice who buried more marbles.
The open field test
Finally, mice completed an open field test, in which they were tracked as they explored a square plastic arena for 10 minutes. The researchers extracted 9 different measures of behaviour and reported that mice with autistic donors had “reduced locomotion” — they travelled a shorter distance than those with non-autistic donors. However, if there is a statistically significant effect, it’s clearly very small. It’s also not clear how reduced locomotion relates to any of the canonical autism behaviours.
Does the evidence really show that gut bacteria contribute to autism behaviours?
Sharon and colleagues summarise the findings from these tests as follows:
We report herein that colonization of mice with gut microbiota from human donors with ASD, but not from TD controls, is sufficient to promote behaviors in mice consistent with the core behavioral features of ASD.
So let’s look at the “core behavioural features of ASD” and review how the evidence stacks up:
- Social dysfunction: No evidence for reduced sociability on the three chambers test. Reduced sociability on the direct social interaction test although this appears to all hang on a single control participant.
- Communication impairment: Reduction in ultrasonic vocalizations if donors with Mild ASD are excluded from the analysis.
- Repetitive and restricted behaviours: Increased marble burying behaviour if donors with Mild ASD are excluded.
The researchers also claim that mice with ASD donors show reduced locomotion on the open field test. Even if this is true, there is no sense in which it relates to the “core behavioural features of autism”.
Given the fanfare accorded to this study, the evidence for its central claim is remarkably weak. The differences between mice with autistic and non-autistic donors are subtle if they exist at all. And there are reasons to be skeptical about even these small effects.
1. Mice are not tiny humans with tails
Autism is defined in terms of human behaviour. And so the claim that mice showed “autism-like” behaviour relies on an assumption that the mouse behaviours under investigation are in some sense equivalent to the behaviours that define autism in humans. We assume, for example, that reduced ultrasonic vocalizations are analogous to the language and communication difficulties experienced by autistic children, or that marble-burying equates to repetitive behaviours and interests. And those assumptions may be wrong.
In fairness, this isn’t a criticism of this study in particular. It’s a challenge for any researcher attempting to study autism via mouse behaviour. But it’s important to remember that when we say “autism-like behaviour”, what this really means is “behaviour in mice that can be described using the same words that we use to describe autism”.
2. By any standards, the sample size was tiny
Although an impressive number of mice were tested, the faecal samples came from a total of just 16 children. And for many analyses, the sample was even smaller. The only evidence for social impairments in the mice came from a comparison of just 5 autistic donors and 3 control donors.
As Derek Lowe put it in his blogpost:
“…the tiny number of human donors really makes me wonder. It should go without saying that the human microbiome is quite variable, person-to-person, and I have trouble believing that this is (or even can be) a representative sample.”
The whole point of the study is to tell us something about autism. The autistic children in this study represent autistic children in general. The 3, 4, or 5 control children represent the rest of the population. Just as an opinion poll only becomes meaningful when the pollsters sample a large number of voters, it’s premature to really draw any conclusions from a study with this few donors.
3. The researchers had a lot of flexibility in how they analysed the data
Conventional statistical analyses work on the assumption that the test you’re doing right now is the only test you’ve run. Without getting into technicalities, this means that the more different things you analyse and the more different analyses you run on the same data, the more likely it is that you pick up spurious effects. In this instance, that means finding “significant” differences between the mice of autistic and non-autistic donors when none exist in reality.
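To put a rough number on that, suppose each test has a 5% chance of throwing up a false positive and that the tests are independent of one another. That independence is an assumption made purely for illustration; the measures in a study like this are almost certainly correlated, so the real figure will differ. Under that assumption, the chance of at least one spurious “significant” result is 1 − (1 − 0.05)^k for k tests. A minimal Python sketch (the function name is my own):

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across `num_tests`
    independent tests, each run at significance level `alpha`.

    Independence is an illustrative assumption, not a property
    of the analyses in the paper.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # 9 is the number of measures extracted from the open field test.
    for k in (1, 3, 9, 20):
        rate = familywise_error_rate(k)
        print(f"{k:2d} tests: {rate:.1%} chance of at least one false positive")
```

With a single test the risk is the advertised 5%. With nine tests it is already above one in three.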
Sharon and colleagues ran a lot of analyses. And they had a lot of choices:
- Which measures to look at for each test
- What type of analysis to perform
- Whether to treat the autistic children as one group or to separate out those with “Mild ASD”
- Whether (and how) to transform the data before analysing it
The choices they made are probably all defensible. But each choice meant an opportunity to select the analysis that gave the effect they were looking for. And it increased the likelihood of finding an effect that wasn’t real. As neurogeneticist Kevin Mitchell argued, we should really await a replication before taking any of the findings seriously.
4. In several cases it’s unclear how the researchers found statistically significant effects
I’m not the only person looking at the data and wondering how the authors managed to find a statistically significant effect.
Some commentators have also attempted to replicate the analyses described in the paper and have not found significant effects. The likeliest explanation is that they haven’t analysed the data in exactly the same way as the original researchers.
Regardless of how these discrepancies are resolved, they illustrate that there were numerous sensible ways the researchers could have analysed their data that would have given them less favourable results than those they reported.
The central claim of this study is that gut bacteria can lead to autism-like symptoms. But even if we accept the premise that mouse behaviours are directly analogous to behaviours exhibited by autistic humans, the evidence is both weak and inconsistent. It’s fair to say, I think, that the authors have presented the data in its most flattering light.
I’ve focused here on the experiments asking if faecal transplants impact mouse behaviour. But that’s actually only a small part of the paper. Sharon and colleagues also reported a series of experiments investigating how that could have occurred. These involved detailed investigation of the gut microbiome of humans and mice; analysis of the metabolites produced by the gut bacteria in mice; and then the use of some of these metabolites in experiments, first with living mice, then with slices of mouse brain, and finally with individual neurons from rat brains. I’m far from qualified to critique these experiments. But the concern is that they are attempting to explain a connection between bacteria and behaviour that doesn’t exist.
That’s not to rule out the underlying premise entirely. It remains possible that there is a causal connection from gut bacteria to the development of autism. But this study does not appear to provide the evidence to support that idea.
Update (17th June 2019)
Since posting this critique last week, further developments have cast more doubt on the conclusions of this study. The authors responded to criticisms on PubPeer. In doing so, they released the code for their analyses, which appears to show important discrepancies between how the analyses were described in the paper and how they were actually conducted.
Credit for getting to the bottom of these discrepancies goes to Prof Thomas Lumley, a biostatistician at the University of Auckland, New Zealand. In a blogpost composed over the weekend, he worked out what the researchers must have done to get the results they reported. His inferences were then confirmed when the authors released their code.
Essentially, the authors implied that their analyses (correctly) accounted for the fact that faecal samples came from a handful of children, each acting as donor to dozens of mice. But Lumley says that the code they released had (incorrectly) treated each mouse as if it had a unique donor. That’s important. For example, in the Direct Social Interaction test, there were 128 mice in total receiving faeces from 8 donors. But the analysis conducted by the researchers imagined a situation with 128 donors — one for each mouse. This led the authors to conclude that mice with autistic donors showed reduced sociability on that particular test. When Lumley re-analysed the data using a more appropriate model, the effect disappeared.
This radically changes the conclusions of the study. Analysing the data correctly (and in line with the authors’ description in the paper) means that all but one of the effects reported in the paper disappear. There is no evidence of reduced social behaviour amongst mice with autistic donors, either on the Three Chambers test or now on the Direct Social Interaction test. There’s no reduced ultrasonic vocalization. No reduced locomotion. The only survivor is a barely significant effect of increased marble burying. This puts the statistical analysis in line with the intuitions of most people looking at those data. There’s no there there.
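To see why treating each mouse as if it had its own donor matters so much, here is a small simulation in Python. It is a sketch under stated assumptions, not a reconstruction of the authors’ or Lumley’s analysis. It borrows the shape of the Direct Social Interaction data (128 mice, 8 donors, 5 of them autistic) but assumes equal numbers of mice per donor, made-up values for the donor-to-donor and mouse-to-mouse variation, and no true difference between the groups. It then compares a t-test that counts every mouse as independent with a t-test on one average per donor, which stands in here for the mixed model Lumley used.

```python
import random
import statistics

# Design taken from the post: 128 mice, 8 donors (5 autistic, 3 control).
# Equal numbers of mice per donor is a simplifying assumption.
N_ASD_DONORS = 5
N_TD_DONORS = 3
MICE_PER_DONOR = 16

# Two-sided 5% critical values of Student's t for this fixed design.
T_CRIT_PER_MOUSE = 1.979  # df = 128 - 2 = 126
T_CRIT_PER_DONOR = 2.447  # df = 8 - 2 = 6


def pooled_t(a, b):
    """Classic two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    standard_error = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def simulate_false_positive_rates(n_sims=2000, donor_sd=1.0, mouse_sd=1.0, seed=0):
    """Simulate data with NO true group difference and return the share of
    runs declared 'significant' by (per-mouse analysis, per-donor analysis).

    donor_sd and mouse_sd are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    per_mouse_hits = 0
    per_donor_hits = 0
    for _sim in range(n_sims):
        groups = []
        for n_donors in (N_ASD_DONORS, N_TD_DONORS):
            donors = []
            for _donor in range(n_donors):
                # Mice sharing a donor share that donor's random offset.
                donor_effect = rng.gauss(0.0, donor_sd)
                donors.append([donor_effect + rng.gauss(0.0, mouse_sd)
                               for _mouse in range(MICE_PER_DONOR)])
            groups.append(donors)
        asd, td = groups

        # Analysis 1: pretend all 128 mice are independent observations.
        t_mouse = pooled_t([x for d in asd for x in d],
                           [x for d in td for x in d])
        # Analysis 2: one average per donor, so n = 8.
        t_donor = pooled_t([statistics.mean(d) for d in asd],
                           [statistics.mean(d) for d in td])

        per_mouse_hits += abs(t_mouse) > T_CRIT_PER_MOUSE
        per_donor_hits += abs(t_donor) > T_CRIT_PER_DONOR
    return per_mouse_hits / n_sims, per_donor_hits / n_sims


if __name__ == "__main__":
    per_mouse, per_donor = simulate_false_positive_rates()
    print(f"False positives, each mouse treated as independent: {per_mouse:.1%}")
    print(f"False positives, one average per donor:            {per_donor:.1%}")
```

Under these assumptions, the per-mouse analysis declares a “significant” difference far more often than the nominal 5%, even though none exists, while the per-donor analysis stays close to 5%. How bad the inflation gets depends on how much mice from the same donor resemble one another, which is exactly the information the per-mouse analysis throws away.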
Lumley suspects that the culprit is the confusing interface of the SPSS software the authors used for their analyses. There’s no reason to see this as anything other than an honest mistake. But, as Lumley notes in his post, the episode shows the importance of researchers sharing their analysis code as well as their data.
Of course, all the concerns outlined in my original post and by other commentators still hold. But the conclusion that gut bacteria contribute to autism-like behaviours now hangs by the slenderest of threads — one group of mice burying slightly more marbles than another group of mice. To reiterate the conclusion of the original post, it’s still possible that gut bacteria can in some circumstances contribute to the development of autism. But we shouldn’t take this particular study as support for that idea.
Footnote: In his post, Lumley re-analysed the data from Figure 1 of the Cell paper. This focused on 8 children (5 autistic) whose mice and faeces underwent detailed analyses. In my original post I worked from the larger dataset which included all the mice who completed each test. Running Lumley’s code on this larger dataset produces essentially the same results. All the effects disappear apart from marble-burying. You can see my re-analysis here.
Jon Brock is a former autism researcher, now science writer, medical grant writer, and co-founder of Frankl Open Science.