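Was the “replication crisis” in social psychology really that bad? Have things improved?
Translator: Chen Ming (陈明) / Date: June 16, 2017
Source: Alex Fradera / Tags: replication, social psychology
www.psychspace.com (心理学空间网)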

Was the “crisis” in social psychology really that bad? Have things improved?
Alex Fradera
Translated by Chen Ming (陈明)

A new paper in the Journal of Personality and Social Psychology has taken a hard look at psychology’s crisis of replication and research quality, and we’re covering its findings in two parts.

Part One: the researchers’ perspective

The field of social psychology is reeling from a series of crises that call into question the everyday scientific practices of its researchers. The fuse was lit by statistician John Ioannidis in 2005, in a review that outlined why, thanks particularly to what are now termed “questionable research practices” (QRPs), over half of all published research in social and medical sciences might be invalid. Kaboom. This shook a large swathe of science, but the fires continue to burn especially fiercely in the fields of social and personality psychology, which marshalled their response through a 2012 special issue in Perspectives on Psychological Science that brought these concerns fully out in the open, discussing replication failure, publication biases, and how to reshape incentives to improve the field. The fire flared up again in 2015 with the publication of Brian Nosek and the Open Science Collaboration’s high-profile attempt to replicate 100 studies in these fields, which succeeded in only 36 per cent of cases. Meanwhile, and to the field’s credit, efforts to institute better safeguards like registered reports have gathered pace.

So how bad did things get, and have they really improved? A new article in pre-print at the Journal of Personality and Social Psychology tries to tackle the issue from two angles: first by asking active researchers what they think of the past and present state of their field, and how they now go about conducting psychology experiments, and second by analysing features of published research to estimate the prevalence of broken practices more objectively.

The paper comes from a large group of authors at the University of Illinois at Chicago. They worked under the guidance of Linda Skitka, a distinguished social psychologist who participated in the creation of the journal Social Psychological and Personality Science and who is on the editorial board of many more social psych journals. The group was led by Matt Motyl, a social and personality psychologist who has published with Nosek in the past, including on the issue of improving scientific practice.

Psychology research is the air that we breathe at the Digest, making it crucial that we understand its quality. So in this two-part series, we’re going to explore the issues raised in the University of Illinois at Chicago paper, to see if we can make sense of the state of social psychology, beginning in this post with the findings from Motyl et al’s survey of approximately 1,200 social and personality psychologists, from graduate students to full professors, mainly from the US, Europe and Australasia.

Motyl’s team began by asking their participants about the state of the field now as opposed to 10 years ago. On average, participants believed that older research would only replicate in 40 per cent of cases – quite close to Nosek’s figure – but they believed that research being conducted now would have a better rate, about 50 per cent, and that generally the field was improving itself in response to the crisis.

Motyl’s team also canvassed the respondents on a range of questionable research practices, sketchy behaviours like neglecting to report all the measures taken, or quietly dropping experimental conditions from your study. Thanks particularly to work by Joseph Simmons, Leif Nelson, and Uri Simonsohn, we understand just how much these practices compromise the assumptions of scientific significance testing, making it easy to produce false positive results even in the absence of fraudulent intent. In their words, QRPs are not wrong “in the way it’s wrong to jaywalk”, the way that researchers have often implicitly been encouraged to think of them, but “wrong the way it’s wrong to rob a bank.”
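
To make the false-positive point concrete, here is a minimal simulation sketch. It is not taken from the paper or from Simmons and colleagues; the function names, sample sizes and the simplifying assumption of a known standard deviation (which lets us use a z-test from the standard library) are all illustrative choices. Every simulated study has no true effect. The honest researcher collects one measure and reports it; the other collects two independent measures and reports whichever one reaches significance.

```python
import random
from statistics import NormalDist, mean


def p_value_two_sample_z(a, b, sd=1.0):
    """Two-sided p-value for a difference in means, treating sd as known."""
    se = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_measures, n_per_group=20, n_studies=5000,
                        alpha=0.05, seed=1):
    """Share of null studies declared 'significant' when the researcher
    reports only the best of n_measures outcome measures."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_measures):
            # Both groups come from the same distribution: no real effect.
            control = [rng.gauss(0, 1) for _ in range(n_per_group)]
            treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(p_value_two_sample_z(control, treated))
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print("one measure, reported as planned:", false_positive_rate(1))
    print("two measures, best one reported: ", false_positive_rate(2))
```

With one measure the rate should hover around the nominal 5 per cent. With two independent measures the theoretical rate is 1 − 0.95² ≈ 9.8 per cent, so a single undisclosed choice nearly doubles the chance of a false positive, with no fabrication involved.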

Previous surveys of researchers’ own QRP usage have uncovered high levels of admissions, as if the field was rushing to the confession box to purge its sins. Here, Motyl’s team used finer-grained questioning to look at frequency (often a “yes” turned out to be “rarely” or “once”) and justification. In some cases, a researcher’s justification showed that they had misinterpreted the question and that they were actually expressing strong disapproval of the QRP – in fact, this seemed to be the case in virtually all “confessions” of data fabrication. In other cases, the context provided by a justification painted the particular research practice in a completely different light.

For example, consider the seemingly dodgy decision to drop conditions from your study analysis. If your rationale is that the condition didn’t turn out to do what you wanted it to do – in an emotion and memory study, your sad video didn’t produce a sad mood in participants, for instance – it’s actually more problematic to keep what is effectively a bogus condition in your analysis than it is to exclude it (ideally in a principled way according to a registered procedure). For the new survey, independent judges evaluated all the stated justifications, and felt they legitimised the “questionable” practices in 90 per cent of cases.

Discovering these misunderstandings and justifiable practices littered through the QRP data led Motyl’s team to conclude that pre-explosion psychology practices aren’t as derelict as once feared, although the fact that 70 per cent of respondents said they are now less likely to engage in many of these practices than ten years ago suggests that all was not entirely virtuous back then.

So not perfect, but getting better, is the take within the field: a cautious optimism compared to some dire pronouncements on the state of psychology. In Part Two, we’ll look at the body of psychological research itself, to see if this optimism is justified.

Part Two

A new paper in the Journal of Personality and Social Psychology has taken a hard look at psychology’s crisis of replication and research quality, and we’re covering its findings in two parts.

In Part One, published yesterday, we reported the views of active research psychologists on the state of their field, as surveyed by Matt Motyl and his colleagues at the University of Illinois at Chicago. Researchers reported a cautious optimism: research practices hadn’t been as bad as feared, and are in any case improving.

But is their optimism warranted? After all, several high-profile replication projects have found that, more often than not, re-running previously successful studies produces only null results. But defenders of the state of psychology argue that replications fail for many reasons, including defects in the reproduction and differences in samples, so the implications aren’t settled.

To get closer to the truth, Motyl’s team complemented their survey findings with a forensic analysis of published data, uncovering results that seem to bolster their optimistic position. In Part Two of our coverage, we look at these findings and why they’re already proving controversial.

Motyl and his colleagues used a relatively new type of analysis to assess the quality and honesty of the data found in over 500 previously published papers in social psychology. Their approach is technical, involving weirdly-named statistics conducted upon even more statistics, so it helps to use an analogy: Just as a vegetable garden produces a variety of tomatoes, some bigger than others, some misshapen, some puny and poor for eating, an honestly-conducted body of research should bear a range of fruit in the same way. True experimental effects shouldn’t always come out exactly the same: they should vary in size from experiment to experiment, including instances when the effect is too small to be statistically significant.

These are the sorts of things you can evaluate in a body of research – in this case with the Test for Insufficient Variance, which Motyl’s study used alongside six other indices. When there were too many irregularities in the data, or bizarre regularity like identikit supermarket tomatoes, this suggested to Motyl and his colleagues that questionable research practices may have been used to make the weak results swell up to reach the desired appearance.
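
The “identikit tomatoes” idea can be sketched in a few lines. This is only an illustration of the general logic behind a variance check, not the procedure Motyl’s team actually ran, which the article does not spell out. The assumption here is that each reported two-sided p-value is converted to a z-score, and that honest sampling error alone should give those z-scores a variance of roughly 1 or more. A variance far below 1 means the results are more alike than chance should allow. The function names and the p-values below are invented for the example.

```python
from statistics import NormalDist, variance


def z_scores(p_values):
    """Convert two-sided p-values into absolute z-scores."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(1 - p / 2) for p in p_values]


def z_variance(p_values):
    """Sample variance of the z-scores; values well below 1 look too tidy."""
    return variance(z_scores(p_values))


# Hypothetical p-values, made up purely for illustration.
suspiciously_tidy = [0.04, 0.03, 0.045, 0.02, 0.049]  # all just under .05
naturally_varied = [0.001, 0.2, 0.04, 0.6, 0.0001]    # big, small and null

if __name__ == "__main__":
    print("tidy set variance:  ", round(z_variance(suspiciously_tidy), 3))
    print("varied set variance:", round(z_variance(naturally_varied), 3))
```

The tidy set, where every result scrapes in just under .05, produces a variance close to zero, while the varied set comes out above 1. A formal version would attach a significance test to that comparison. This sketch stops at the descriptive statistic, and with only five values either number is very noisy.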

Crucially, however, the study found that more often than not, the indices showed low levels of anomalies, suggesting research practices are more likely to be acceptable than questionable. This was the case for studies from 2003-4, before the crisis was fully acknowledged, and the researchers found an even better picture for more recent (2013-14) papers. The fruits of the research may have been tampered with from time to time, but there was no case that the entire enterprise was “rotten to the core”.

This optimistic conclusion conflicts with similar analyses performed in the past, but this might be explained by the different approaches of collecting the data – of gathering the fruit, if you will. Past approaches automatically scraped articles for every instance of a statistic, such as every listed p-value. But this is like a bulldozer ripping out a corner of a garden and measuring everything that looks anything like a tomato, including stones and severed gnome-heads. To take just one example, articles will often list p-values for manipulation checks: confirmations that an experimental condition was set up correctly (did participants agree that the violent kung-fu clip was more violent than the video of grass growing?). But these aren’t tests to determine new scientific knowledge, rather – turning to another analogy – the equivalent of a chemist checking their equipment works before running an experiment. So Motyl’s team took a more nuanced approach, reading through every article and picking out by hand only the relevant statistics.

However, all is not rosy in the garden. At their Datacolada blog, “state of science” researchers Joseph Simmons, Leif Nelson, and Uri Simonsohn have already responded to the new analysis and they’re sceptical. Simmons and co first note the daunting scale of the new enterprise: to correctly identify 1800 relevant test statistics from 500 papers. In an online response, Motyl’s team agreed that yes, it was time consuming, and yes, it required a lot of hands: “there are reasons this paper has many authors: It really took a village,” they said.

But Datacolada sampled some of the statistics that Motyl’s team used in their assessments and they argue that far too many of them were inappropriate, including data from manipulation checks that Motyl’s group had themselves categorised as statistica non grata. To the Datacolada team, this renders the whole enterprise suspect: “We are in no position to say whether their conclusions are right or wrong. But neither are they.” In their response, Motyl’s team make some concessions, but they argue that some of the statistic selection comes down to difference of opinion, and defend both their overall procedure and the number of coding errors they expect their study will contain. So….

So?

So doing high-quality science isn’t straightforward. Neither is doing high-quality science on the quality of science, nor is gathering everything together to form high-quality conclusions. But if we care about the validity of the more sexy findings in psychology – the amazing powers of power poses to make you physically more confident, how you can hack your happiness simply by changing your face, and how even subtle social signals about age, race or gender can transform how we perform at tasks – we need to care about psychological science itself, how it’s working and how it isn’t. (By the way, those findings I just listed? They’ve all struggled to replicate.) [Translator’s note on power poses: see Amy Cuddy, “Your body language shapes who you are”.]

There are surely ways to improve the methods of this new study – perhaps not coincidentally, Datacolada’s Leif Nelson is running a similar project – but even if the new assessment does include some irrelevant statistics, it will likely be an advance on past analyses that included every irrelevant statistic.

So … the new insights have budged my position on the state of science a little: I’m still worried, but I can see a little more light among the dark. Motyl’s group make the case that social psychology isn’t ruined, that the garden isn’t totally contaminated. I hope so. But it’s not hope on its own that will move our field forward, but research, debate, and making sense of the evidence. After all, psychology is too good to give up on.

The State of Social and Personality Science: Rotten to the Core, Not so Bad, Getting Better, or Getting Worse?

https://digest.bps.org.uk/2017/05/25/
