Wednesday, February 8, 2017

Science first, communication second. A prequel to the Food and Brand Lab episode

The Food and Brand Lab at Cornell University, led by Brian Wansink (@BrianWansink), has caused quite some academic turmoil in recent weeks. This contrasts sharply with the media turmoil usually generated by this lab's research. A series of academic social media posts and a critical article target the lab's inferior methodology and old-school approach to rendering null effects into (a set of) publishable papers. In this post, I want to give my account of a similar situation that I had with the same lab in 2012. I think the take-home messages of this post are the following. First, I want to claim that the recent controversy fits within a longer tradition that pertains to this lab (but maybe also to a large part of their discipline), which shows that a scientific revolution toward reproducible findings is not a matter of a few years. Second, I do think that, given previous criticism of their work, it is astonishing that it has not improved to some extent. Researchers should be more responsive to how the field reacts to their work. Third, the case seems to demonstrate a troublesome trend where labs and universities tend to invest more in PR than in research methodology and ethics. Although I am an enthusiast of science communication, we should not neglect the order of these words: science communication. "Science first, communication second."

In our KU Leuven Institute for Media Studies (@ims_kul) we also engage in food research. The differences with the Food and Brand Lab are that (1) we are far smaller and less productive, but more importantly, (2) we focus on consumer reactions to marketing communication rather than on the consumer behavior approach of Brian Wansink, and (3) we focus on children rather than adults. This means that I think I know this type of research well enough, while at the same time not really being part of the same in-group.

For those not aware of the recent turmoil
It is a double one. First, Brian Wansink very bluntly published a blog post on the lab's practice of turning null findings into publishable papers with so-called 'deep data dives'. Given a more current view on data analysis, these old-school deep data dives are textbook examples of p-hacking, a set of statistical tricks that largely invalidate your claims since these techniques do not follow the rationale of solid analysis, and we now (should) consider this to be an example of data fraud. Second, an article (Van der Zee, Anaya & Brown) focused on four of these papers and found no fewer than 150 errors and inconsistencies in these four papers' reported findings (blog report by the first author, @Research_Tim).

The story
In 2011, the Cornell researchers published an article (Zampollo, Kiffin, Wansink & Shimizu, 2011) on how children's food preferences, compared to those of adults, are differentially affected by how the foods are presented on a plate. Given that the study involved children, it touched upon my research and therefore drew more of my attention than the other Wansink-type studies. The study had some interesting findings with regard to ideas I was working on myself, but some of the findings were incomprehensible from the article (the article is still online, you can consult it). Being the polite, non-tenured, Belgian academic, at a time when we had just started to question reproducibility, I wrote a polite email asking for some specific information about the statistics.

This was the response I got.
Dear Tim, Thank you for being interested in our paper. Actually there are several errors in the results section and Table 1. What we did was two step chi-square tests for each sample (children and adults), so we did not do chi-square tests to compare children and adults. As indicated in the section of statistical analysis, we believe doing so is more conclusive to argue, for example, that children significantly prefer six colors whereas adults significantly prefer three colors (rather than that children and adults significantly differ in their preferred number of color). Thus, for each sample, we first compared the actual number of choices versus the equal distribution across possible number of choices. For the first hypothesis, say #1=0, #2=0, #3=1, #4=0, #5=2, #6=20 (n=23), then we did a chi-square test (df=5) to compare those numbers with 3.83 -- this verified the distribution is not equal. Then, we did second chi-square test (df=1) to compare 20 and 0.6 (the average of other choices), which should yield 18.3. However, as you might already notice, some of values in the text and the table are not correct -- according to my summary notes, the first 3 results for children should be: 18.3 (rather than 40.4), 16.1 (rather than 23.0), 9.3 (rather than 26.88). Also, the p-value for .94 (for disorganized presentation) should not be significant apparently. I am sorry about this confusion -- but I hope this clarify your question.
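For readers who want to check the arithmetic in that reply, here is a minimal Python sketch of the two steps as described. It is only one reading of the email, not the authors' actual code: the helper name `chi_square_stat` is made up for illustration, and the second step assumes that "compare 20 and 0.6" means testing those two numbers against an equal split between them. Under that assumption, the second step does come out at roughly the 18.3 quoted in the email.

```python
def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Counts quoted in the reply for the first hypothesis (n = 23, six options).
counts = [0, 0, 1, 0, 2, 20]
n = sum(counts)

# Step 1: goodness of fit against an equal distribution (df = 5).
# Each expected count is 23 / 6, i.e. about 3.83.
uniform = [n / len(counts)] * len(counts)
step1 = chi_square_stat(counts, uniform)

# Step 2 (assumed reading): compare the most chosen cell with the
# average of the other cells, against an equal split of the two (df = 1).
top = max(counts)
rest_mean = (n - top) / (len(counts) - 1)  # 3 / 5 = 0.6
pair = [top, rest_mean]
pair_expected = [sum(pair) / 2] * 2
step2 = chi_square_stat(pair, pair_expected)

print(f"step 1 chi-square (df=5): {step1:.2f}")
print(f"step 2 chi-square (df=1): {step2:.2f}")
```

Note that the second step feeds an average (0.6) into a test designed for counts, which is already a hint that the statistic does not mean what the article takes it to mean.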
Well, that was interesting. Just one email, and immediately a bunch of corrections followed. Too bad the answer was nonsensical. So I wrote back to them (bold added now):
When reading the paper, I did understand the first step of the chi-square tests. I was puzzled by the second step, and to be honest, I still am a bit. The test you performed in that second step boils down to a binomial test, examining the difference between the observed number of counts in the most preferred cell and the H0 expected number of counts. Though this is informative, it does not really tell you something about how significant the preferences were. For instance, if you would have the following hypothetical cell counts [0 ; 0 ; 11; 0; 0 ; 12], cell 6 would still be preferred the most, but a similar binomial test on cell 3 would also be strongly significant. In my opinion, I thus believe that the tests do not match their given interpretations in the article. From a mathematical point of view, your tests on how much preferred a certain type of plate is raise the alpha level to .5 instead of .05. What you do test on the .05 level is just the deviation in the observed cell count from the hypothesized count in that particular cell, but this is not really interesting
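The hypothetical cell counts in that email can be made concrete with a short Python sketch. It uses an exact one-sided binomial test against the equal-distribution null (probability 1/6 per cell); the one-sided choice and the helper name `binomial_upper_tail` are assumptions made for illustration, not something taken from the article or the emails.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical counts from the email: two cells share almost all choices.
counts = [0, 0, 11, 0, 0, 12]
n = sum(counts)
p0 = 1 / len(counts)  # null hypothesis: every cell is equally likely

p_cell3 = binomial_upper_tail(counts[2], n, p0)  # runner-up cell
p_cell6 = binomial_upper_tail(counts[5], n, p0)  # most preferred cell

print(f"cell 3: P(X >= {counts[2]}) = {p_cell3:.6f}")
print(f"cell 6: P(X >= {counts[5]}) = {p_cell6:.6f}")
```

Both cells deviate strongly from the 3.83 choices expected under the null, so a significant result for the top cell alone says nothing about whether it is preferred over the runner-up.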
Then, this remarkable response came. Note that they agree with the "shoddy statistics" (to paraphrase the recent nymag article by @jessesingal on the Food and Brand Lab).  Moreover, they immediately confess to having published this before.
I carefully read your comments and I think I have to agree with you regarding the problem in the second-step analysis. I employed this two-step approach because I employed similar analyses before (Shimizu & Pelham, 2008, BASP). But It is very clear that our approach is not appropriate test for several cases like the hypothetical case you suggested. Fortunately, such case did not happen so often (only case happened in for round position picture for adults). But more importantly, I have to acknowledge that raising the p-value to .5 in this analysis has to be taken seriously. Thus, like you suggested, I think comparing kids counts and adults counts (for preferred vs rest of cells) in 2x2 should be better idea. I will try to see if they are still significant as soon as I have time to do.
So, I decided to write a letter to the editor (Hugo Lagercrantz) addressing the mistakes, but also making clear that not only the authors but also reviewers should have seen these issues. As I wrote to the authors: "Given the impact of this type of very relevant research – an impact that is both academic and societal – I think the reviewers should have done a better job in giving you the exact same remarks as I had". I still believe this is an important point. Any research should be good and thoroughly executed, analyzed, interpreted, and reported, but when the research itself or the lab then scales the impact to the societal level, the PR should be untouchable. Within the corporate social responsibility literature, there are a few sin industries where the whole idea of doing "good" seems strange (think of cigarettes, alcohol, petroleum, ...), but science cannot ever be such a sin industry!

The editor accepted that letter, but rather than asking for a retraction, he just invited the authors to reply.

So, back to my take-home-messages:

  1. It is part of a research tradition (urgently to be corrected). Really, I don't believe this is just a single case. Within this discipline and others that I claim to know enough about, I've seen similar things. Maybe not to the same extent, but sometimes close enough. This does not at all justify the lack of methodological and ethical scrutiny in the Food and Brand Lab, but it does put things in perspective. Based on this episode from the Lab (from my 2012 criticism to the recent one), and based on my own experience and interaction with other academics, I really believe that the reproducibility revolution in science is going at two speeds. On the one hand, there still is a majority of old-school researchers and labs, still empowered by the false feedback of the publishing system that tends to reward such practices. On the other hand, there is a growing minority of researchers trying to change this and to improve the evidence base of science. Though the relation between both groups has always been tense, I fear for a rapid dichotomization which will make it more difficult than ever to come to a joint perspective on science and its presumed (hoped-for) self-correcting nature.
  2. Despite the research tradition, criticized labs should up their game. One cannot simply take a hit like this and then just continue. I've witnessed this before when we criticized Turkington and Morrison only to get invited for a "slaughtering" (sic) by these "academics". But how on earth can you just continue with "shoddy methodology" after someone else has pointed it out to you? Surely, every researcher makes mistakes (I guess), but I hope we all learn from them to improve science. If not, we are no better than the banking sector, politicians, or whatever profession academics like to mock on social media.
  3. The PR trend in science communication. Surely, the Food and Brand Lab is a well-oiled marketing machine for its own research. Most of it is probably still a valid suggestion, though not as evidence-based as they suggest (see also Andrew Gelman's post where he opens up that possibility; given my own experience with this type of finding, I want to stress this possibility). But it is troubling that they seem to be better at this science(?) communication than at research ethics and methodology. Just to take this one article as an example: their own press releases and outreach about that study did not show a single effort at self-correction. You can still find some of that material on their website. Similarly, despite the recent turmoil, I have seen them just continue their online communication efforts.


  1. Tim, do you not think that the publishers are also largely responsible for promoting this culture of "marketing the message" rather than the core science? I say slap the authors, but slap the journals/publishers, too.

  2. I read the article. Even before I got to the statistics, there seems to be a bit of a cultural validity problem:

    >participants were asked to choose their favourite from among plates with one through seven
    >food elements of a traditional English breakfast (starting with eggs and gradually adding
    >bacon, sausages, toast, tomatoes, mushrooms and beans).

    OK, the lead author seems to have an affiliation in the UK. But in my experience, despite the transatlantic commonality of the "bacon and eggs" meme, Americans are not especially familiar with most of the elements of a "full English". An English breakfast will typically feature what Americans call Canadian bacon, then big sausage links that are typically twice the diameter of American ones, then after the toast we have grilled tomato halves, some form of butter-fried mushrooms, and canned small white beans in tomato sauce, which despite being called "baked beans" are very different to what Americans expect to see when they hear those two words.

    I would expect most five-year-old American children, and for that matter a great many American adults if they have not travelled to the UK and had a hotel breakfast, to fail to recognise several of the items, or to indicate that they didn't much enjoy them (mushrooms and tomato for breakfast, on the same plate as bacon and eggs, are arguably a bit of an acquired taste). Why were these American children and adults not presented with pictures of, say, strips of streaky bacon, pancakes, blueberries, waffles, etc, to go with their eggs?

    1. Agreed, but I just wanted to focus on the most objective part.

  3. "Though the relation between both groups has always been tense, I fear for a rapid dichotomization which will make it more difficult than ever to come to a joint perspective on science and its presumed (hoped-for) self-correcting nature."

    Why do you fear this? A schism between science and pseudoscience would be great. Let the two methods compete.

    1. In an abstract sense, I agree. The problem is that I see both groups being colleagues at a local level. Such a real schism will lead to a lot of tension between colleagues, fights, depression, etc. I don't want to sacrifice my sanity.