A letter to the black goat

I wrote this letter to the black goat podcast … will update here if I hear back from them …


Dear goaters (is this a good way to address the three of you?),

I attended SIPS some weeks ago (first timer). I was unsure what to expect but got a lot of bang for my buck (which is great) – as a side note, I would recommend that first-timers go there with a concrete project, question, or problem; there is a good chance of finding people with similar issues who are interested in collaborations.
Here is an observation I would be interested to hear your thoughts on: in the SIPS world everything seems really straightforward – we want to ‘unfuck psych science’ (Lindsay, 2018), we want to pre-reg studies, upload data, learn R, comment on code, accelerate science, ‘do it right this time’, collaborate, respect, be inclusive (I really learned a lot in that regard listening to your podcast and talking to people at SIPS) …
All of that is great. I subscribe to all of these points.
Here is the twist – people @SIPS talk about ‘a movement’ (something that seems very American to me), and maybe a movement is needed; people @SIPS talk about ‘a revolution’ – again, great! There is obviously a need to rattle the cage and accelerate things beyond ‘paradigm shifts at funerals’ (Planck?).
What happens if you go ‘Outside the SIPS Bubble’ (OSIPSB)?
I have no data (other than my own experiences), but I wonder whether the world (psychology, other sciences) is actually that ready, that willing and open to adapt to these new standards and to actually make a paradigm shift in how we do science. I work at a business school (consumer psych and JDM), and there is a lot of finger pointing going on toward psychology (similarly, within psychology there seems to be a lot of finger pointing toward social psych) – ‘this is clearly a problem of psychologists but not of us [economists, consumer psychologists, business, accounting researchers …]’.
Another OSIPSB experience I had was talking to an Action Editor of the Journal of Consumer Research (JCR) last year – we got into a heated debate about the most basic issues, e.g., sharing data, pre-registering studies …
Is this an observation you share? What would be good steps to address these issues? Should we talk about this within SIPS to get a better balance between enthusiasm and real-world requirements (e.g., hiring decisions are still made mostly by senior faculty who, assuming my observation holds, are less interested in replication than in ‘new and exciting effects’ (quote: anonymous senior AE of JCR))?
Thanks for your thoughts!

Blind Haste (aka im Blindflug)

Chance encounters sometimes lead to interesting new projects. This is one of those cases … I got to know Emanuel de Bellis during my time at Nestlé, and we have never stopped collaborating since (he actually led my 2017 ‘skype to’ statistics, with my wife a close second …)

We got our hands on a rich dataset of speed measurements that the police in Zürich (Switzerland) collect throughout the year, unbeknownst to drivers, for planning purposes. The radar is put into a small black box that hardly anybody notices when driving by:

(it’s the black box above the bin – not the bin!)

So – we got these data from over a million cars and set to work, trying to answer a question in perception research: Do people perceive their environment differently when light conditions deteriorate? And (even more important): Do drivers change their driving speed accordingly?

Well – they don’t … and here is correlational proof 🙂

Stay tuned for a causal demonstration – oh yes!

Blind haste: As light decreases, speeding increases

Worldwide, more than one million people die on the roads each year. A third of these fatal accidents are attributed to speeding, with properties of the individual driver and the environment regarded as key contributing factors. We examine real-world speeding behavior and its interaction with illuminance, an environmental property defined as the luminous flux incident on a surface. Drawing on an analysis of 1.2 million vehicle movements, we show that reduced illuminance levels are associated with increased speeding. This relationship persists when we control for factors known to influence speeding (e.g., fluctuations in traffic volume) and consider proxies of illuminance (e.g., sight distance). Our findings add to a long-standing debate about how the quality of visual conditions affects drivers’ speed perception and driving speed. Policy makers can intervene by educating drivers about the inverse illuminance‒speeding relationship and by testing how improved vehicle headlights and smart road lighting can attenuate speeding.
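The abstract’s core claim – lower illuminance goes with higher speed, even when controlling for traffic volume – boils down to a multiple regression. Here is a minimal sketch on synthetic data; the variable names, effect sizes, and sample size are my own assumptions for illustration, not the actual Zürich dataset or analysis.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # stand-in for the 1.2 million real vehicle movements

# Hypothetical variables: illuminance (log lux), traffic volume (z-scored),
# and driving speed (km/h). All numbers are invented for this sketch.
log_lux = rng.uniform(-1, 4, n)        # darkness ... bright daylight
traffic = rng.normal(0, 1, n)
speed = 52 - 1.5 * log_lux + 2.0 * traffic + rng.normal(0, 5, n)

# Multiple regression via least squares: speed ~ illuminance + traffic
X = np.column_stack([np.ones(n), log_lux, traffic])
beta, *_ = np.linalg.lstsq(X, speed, rcond=None)
print(beta)  # beta[1] < 0: lower illuminance -> higher speed
```

The key point is that the illuminance coefficient stays negative while traffic volume is held in the model – the correlational pattern the abstract describes.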

Professor priming – or not

This was my first contribution to a Registered Replication Report (RRR). Being one of 40 participating labs was an interesting exercise – it might seem straightforward to run the same study in different labs, but we learned that such small things as ü, ä, and ö can generate a huge amount of problems and work (read this if you are into this kind of thing).
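The umlaut trouble is typically an encoding mismatch: UTF-8 bytes getting decoded as Latin-1 somewhere in the pipeline. A minimal illustration (the sample text is made up, not a stimulus from the study):

```python
# German umlauts survive UTF-8, but decoding those bytes as Latin-1
# produces the classic mojibake that plagues multi-lab materials.
text = "Müller übt Hörverstehen"   # invented example text
garbled = text.encode("utf-8").decode("latin-1")
print(garbled)  # MÃ¼ller Ã¼bt HÃ¶rverstehen

# The round trip is lossless, so the damage can be undone if caught:
repaired = garbled.encode("latin-1").decode("utf-8")
assert repaired == text
```

The fix is easy once spotted; the hard part across 40 labs is noticing which files took the wrong trip.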

Here is one of the central results:

So overall not a lot of action … our lab was actually the one with the largest effect size (in the predicted direction).

Here is the abstract of the whole paper and here the Commentary by Ap Dijksterhuis – naturally, he sees things a bit differently …

Dijksterhuis and van Knippenberg (1998) reported that participants primed with an intelligent category (“professor”) subsequently performed 13.1% better on a trivia test than participants primed with an unintelligent category (“soccer hooligans”). Two unpublished replications of this study by the original authors, designed to verify the appropriate testing procedures, observed a smaller difference between conditions (2-3%) as well as a gender difference: men showed the effect (9.3% and 7.6%) but women did not (0.3% and -0.3%). The procedure used in those replications served as the basis for this multi-lab Registered Replication Report (RRR). A total of 40 laboratories collected data for this project, with 23 laboratories meeting all inclusion criteria. Here we report the meta-analytic result of those 23 direct replications (total N = 4,493) of the updated version of the original study, examining the difference between priming with professor and hooligan on a 30-item general knowledge trivia task (a supplementary analysis reports results with all 40 labs, N = 6,454). We observed no overall difference in trivia performance between participants primed with professor and those primed with hooligan (0.14%) and no moderation by gender.
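The ‘meta-analytic result’ pooling the 23 labs is, at its core, an inverse-variance weighted average. Here is a sketch of that standard fixed-effect computation – the number of labs matches the RRR, but the per-lab estimates and standard errors below are invented, not the actual data:

```python
import numpy as np

rng = np.random.default_rng(7)
k = 23             # labs meeting inclusion criteria
true_effect = 0.0  # a null effect, as the RRR found

# Hypothetical per-lab effect estimates (difference in % correct) and SEs
se = rng.uniform(0.8, 1.6, k)
estimates = rng.normal(true_effect, se)

# Fixed-effect meta-analysis: weight each lab by its precision (1/SE^2)
w = 1 / se**2
pooled = np.sum(w * estimates) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
print(pooled, pooled_se)  # pooled estimate hovers near zero
```

Precise labs pull the pooled estimate hardest, and the pooled standard error shrinks as labs accumulate – which is why 23 direct replications can speak so much more clearly than one.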

The root of the problem

One of the root causes of where we are (as a science) in psychology and many other disciplines, in terms of reproducibility of key (and other) results, could not be summed up better than by the man himself, Daryl Bem (2002):

“If a datum suggests a new hypothesis, try to find additional evidence for it elsewhere in the data. If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something — anything — interesting. ”

‘Go on a fishing expedition’ – why should anything good come from such advice? Bem goes on …

“No, this is not immoral (SIC!). The rules of scientific and statistical inference that we overlearn in graduate school apply to the “Context of Justification.” They tell us what we can conclude in the articles we write for public consumption, and they give our readers criteria for deciding whether or not to believe us. But in the “Context of Discovery,” there are no formal rules, only heuristics or strategies.”

I disagree with this statement, because the idea of finding something through torturing the data (until they confess) is a huge source of false-positive results: we find an effect and falsely conclude that something is there when in fact there is nothing. I found the above quote when reading this paper by Zwaan, Etz, Lucas & Donnellan (2017) – a target article for BBS which presents six common arguments against replication and a set of really good responses for such discussions.
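Bem’s fishing expedition can be put in numbers with a small simulation: when there is no true effect anywhere, testing one pre-registered outcome keeps false positives near 5%, while reporting whichever of several outcomes ‘worked’ inflates them several-fold. The sample size and number of dependent variables below are arbitrary, and a z-approximation stands in for a proper t-test to keep the sketch dependency-free.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n_sims, n, n_dvs = 2000, 30, 5

def p_value(a, b):
    # Two-sample z-test (normal approximation; good enough for a sketch)
    z = (a.mean() - b.mean()) / math.sqrt(a.var(ddof=1)/len(a) + b.var(ddof=1)/len(b))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

honest = fishing = 0
for _ in range(n_sims):
    # No true effect anywhere: both groups drawn from the same distribution
    dvs_a = rng.normal(size=(n_dvs, n))
    dvs_b = rng.normal(size=(n_dvs, n))
    ps = [p_value(a, b) for a, b in zip(dvs_a, dvs_b)]
    honest += ps[0] < .05     # pre-registered single outcome
    fishing += min(ps) < .05  # report whichever outcome 'worked'

print(f"honest: {honest/n_sims:.2f}, fishing: {fishing/n_sims:.2f}")
```

With five outcomes to fish from, the false-positive rate roughly quadruples – and that is before adding optional stopping, dropped participants, or post-hoc subgroups.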

Here are the six ‘concerns’ the authors discuss:

Concern I: Context Is Too Variable
Concern II: The Theoretical Value of Direct Replications Is Limited
Concern III: Direct Replications Are Not Feasible in Certain Domains
Concern IV: Replications Are a Distraction
Concern V: Replications Affect Reputations
Concern VI: There Is No Standard Method to Evaluate Replication Results

Both are really good reads – for very different reasons.


Bem, D. J. (2002). Writing the empirical journal article. In J. M. Darley, M. P. Zanna, & H. L. Roediger III (Eds.), The Compleat Academic: A Career Guide. Washington, DC: American Psychological Association.

Zwaan, R. A., Etz, A., Lucas, R. E., & Donnellan, M. B. (2017, November 1). Making replication mainstream. Retrieved from psyarxiv.com/4tg9c

Growing up to be old

Some papers have somewhat weird starting points – this one had an awesome starting point – Lake Louise (Canada):

In a little suite we (Joe Johnson, Ulf Böckenholt, Dan Goldstein, Jay Russo, Nikki Sullivan, Martijn Willemsen) sat down during a conference called the ‘Choice Symposium‘ and started working on an overview paper about the history and current status of different process tracing methods. One central result (why can’t all papers be like this?) is the figure below, where we try to locate many process tracing methods on two dimensions: temporal resolution and distortion risk (i.e., how quickly a method can measure a process and how destructive that measurement is).

Schulte-Mecklenbeck, M., Johnson, J. G., Böckenholt, U., Goldstein, D., Russo, J., Sullivan, N., & Willemsen, M. (in press). Process tracing methods in decision making: On growing up in the 70s. Current Directions in Psychological Science.

Ah – everybody was trying to find a path all the time:




Something about reverse inference

Often, when we run process tracing studies (e.g., eye-tracking, mouse-tracking, thinking aloud), we talk about cognitive processes (things we can’t observe) as if they were directly observable. This is pretty weird – which becomes obvious when looking at the data from the paper below. In this paper we simply instruct participants to follow a strategy when making choices between risky gamble problems. Taking the example of fixation duration, we see that there is surprisingly little difference between calculating an expected value, using a heuristic (priority heuristic), and just making decisions without instructions (no instruction) … maybe we should rethink our mapping of observations to cognitive processes a bit?

Here is the paper:

Schulte-Mecklenbeck, M., Kühberger, A., Gagl, S., & Hutzler, F. (in press). Inducing thought processes: Bringing process measures and cognitive processes closer together. Journal of Behavioral Decision Making. [ PDF ]


The challenge in inferring cognitive processes from observational data is to correctly align overt behavior with its covert cognitive process. To improve our understanding of the overt–covert mapping in the domain of decision making, we collected eye-movement data during decisions between gamble-problems. Participants were either free to choose or instructed to use a specific choice strategy (maximizing expected value or a choice heuristic). We found large differences in looking patterns between free and instructed choices. Looking patterns provided no support for the common assumption that attention is equally distributed between outcomes and probabilities, even when participants were instructed to maximize expected value. Eye-movement data are to some extent ambiguous with respect to underlying cognitive processes.

Eye-Tracking with N > 1

This is one of the fastest papers I have ever written. It was a great collaboration with Tomás Lejarraga from the Universitat de les Illes Balears. Why was it great? Because it is one of the rare cases (at least in my academic life) where all people involved in a project contribute equally and quickly. Often the weight of a contribution lies with one person, which slows things down – with Tomás this was different: we were often sitting in front of a computer writing together (I had never done this before and thought it would not work). Surprisingly, this collaborative writing worked out very well, and we had the skeleton of the paper within an afternoon. This was followed by many hours of tuning and taking turns – but in principle we wrote the most important parts together – which was pretty cool.

Even cooler – you can do eye-tracking in groups, using our code.

Here is the [PDF] and abstract:

The recent introduction of inexpensive eye-trackers has opened up a wealth of opportunities for researchers to study attention in interactive tasks. No software package was previously available to help researchers exploit those opportunities. We created “the pyeTribe”, a software package that offers, among others, the following features: First, a communication platform between many eye-trackers to allow simultaneous recording of multiple participants. Second, the simultaneous calibration of multiple eye-trackers without the experimenter’s supervision. Third, data collection restricted to periods of interest, thus reducing the volume of data and easing analysis. We used a standard economic game (the public goods game) to examine data quality and demonstrate the potential of our software package. Moreover, we conducted a modeling analysis, which illustrates how combining process and behavioral data can improve models of human decision making behavior in social situations. Our software is open source and can thus be used and improved by others.
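One feature from the abstract – ‘data collection restricted to periods of interest’ – amounts to filtering gaze samples by time windows. A toy sketch of that idea; the sample format and field names here are invented for illustration, not the actual pyeTribe data structures:

```python
# Invented gaze samples: timestamp in seconds plus screen coordinates.
samples = [
    {"t": 0.10, "x": 512, "y": 300},
    {"t": 0.55, "x": 200, "y": 150},
    {"t": 1.20, "x": 640, "y": 480},
]

# Only keep data while, say, the contribution screen of the public
# goods game is visible.
periods_of_interest = [(0.5, 1.0)]  # (start, end) windows in seconds

kept = [s for s in samples
        if any(start <= s["t"] <= end for start, end in periods_of_interest)]
print(kept)  # only the sample at t = 0.55 falls inside a window
```

Dropping samples outside the windows at recording time keeps file sizes manageable when many eye-trackers stream simultaneously.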