James Porto/Getty Images, James Group Studios/iStockphoto, adapted by S. Egts
In its idealized form, science resembles a championship boxing
match. Theories square off, each vying for the gold belt engraved with
“Truth.” Under the stern eyes of a host of referees, one theory triumphs
by best explaining available evidence — at least until the next bout.
But in the real world, science sometimes works more like a fashion
show. Researchers clothe plausible explanations of experimental findings
in glittery statistical suits and gowns. These gussied-up hypotheses
charm journal editors and attract media coverage with carefully
orchestrated runway struts, never having to battle competitors.
Then there’s psychology. Even more than other social scientists —
and certainly more than physical scientists — psychologists tend to
overlook or dismiss hypotheses that might topple their own, says Klaus
Fiedler of the University of Heidelberg in Germany. They explain
experimental findings with ambiguous terms that make no testable
predictions at all; they build careers on theories that have never
bested a competitor in a fair scientific fight. In many cases, no one
knows or bothers to check how much common ground one theory shares with
others that address the same topic. Problems like these, Fiedler and his
colleagues contended last November in Perspectives on Psychological Science,
afflict sets of related theories about such psychological phenomena as
memory and decision making. In the end, that affects how well these
phenomena are understood.
Fiedler’s critique comes at a time when psychologists are making a
well-publicized effort to clean up their research procedures, as
described in several reports published alongside his paper. In fact,
researchers generally concede that many published psychology studies
have been conducted in ways that conceal their statistical frailty, and
thus the questionable validity of their conclusions. But Fiedler suspects the new
push to sanitize psychology’s statistical house won’t make much
difference in the long run. Findings published in big-time journals draw
enough media coverage to bring the scrutiny of other researchers, who
eventually expose bogus and overblown effects. “Advances in psychology
will depend more on open-minded theoretical thinking than on better
monitoring of statistical practices,” he says.
Alternative blindness
When Fiedler gives talks to groups of psychologists, he tries to
identify open-minded theoretical thinkers by posing a couple of
questions.
First, he asks audience members to name a published study in
which investigators uncovered an interesting, statistically significant
effect that vanished in later reports. In a seminar conducted last year
by Fiedler at a major Dutch university, 38 research psychologists had no
problem citing flash-in-the-pan findings. Many remembered a well-known
but now contested report that college students react to subtle reminders
of old age by walking more slowly, allegedly because healthy young
people unconsciously act out prompted stereotypes of the elderly
(SN: 5/19/12, p. 26).
In that experiment, student volunteers were timed walking down a
corridor after unscrambling sentences that, for one group, contained
senior citizen–related words such as wrinkle and Florida.
Researchers who conducted the investigation concluded that students
weren’t aware of having registered the stereotypical words, but still
acted out an elderly stereotype by slowing their pace shortly after the
reading exercise.
But researchers did not consider the possibility that their
facial expressions or body language might subtly have encouraged the
student volunteers to walk more slowly. They didn’t ask themselves
whether some students noticed elder-related words while unscrambling
sentences and supposed that experimenters wanted them to mimic seniors.
They did not explore whether some students quickly drew
conclusions about what was expected of them and how to behave,
regardless of any unintended signals from experimenters. Nor did they
examine whether reading words related to any upsetting or
thought-provoking topic would make people walk more slowly.
A DIFFERENCE IN BEHAVIOR
One psychological theory proposes that symbols
of death make individuals aware of their own mortality and lead them to
adopt more cautious behaviors. Klaus Fiedler suggests that many other
factors (bottom) might also cause self-awareness and similarly
conservative behaviors.
S. Egts
Fiedler’s point: Blindness to additional, possibly superior,
explanations for experimental results plagues even prominent
psychological theories. “Psychologists too often fail to consider that
the truth may be broader than their hypotheses,” says psychologist
Barbara Spellman of the University of Virginia in Charlottesville.
Spellman edits the journal Perspectives on Psychological Science, in
which Fiedler’s article appears.
And indeed, as in other seminars Fiedler has run, only a few of
the psychologists at the Dutch seminar came up with anything when they
were asked to name an experiment that included a competing account for
any set of results.
Null and void
Geoffrey Loftus, a psychologist at the University of Washington
in Seattle, is an ally in Fiedler’s battle to broaden psychology’s
perspectives. As editor of Memory & Cognition from 1993 to
1997, Loftus implored researchers to avoid a standard statistical
practice in psychology known as null hypothesis significance testing
that, in his view, perpetuates theoretical chaos. He continued to attack
the practice in a talk last November at the Psychonomic Society’s
annual meeting in Minneapolis.
Null hypothesis refers to a default position: that there is no
relationship except chance between two measured phenomena in an
experiment (for example, it’s only by chance that college students walk
at different speeds after they’ve read words that refer to old age). To
conclude that there are grounds to say that a relationship exists
between two phenomena, the null hypothesis must be rejected. This
technique requires researchers to calculate whether an assumption that
no experimental effect exists can be rejected as statistically unlikely
based on measured differences between groups.
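
The logic of rejecting a null hypothesis can be sketched with a permutation test, one simple form of significance testing. This is an illustration only, not the analysis used in the walking study: the walking times are invented, and the function name `permutation_p_value` is hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often chance alone produces a gap between group means
    at least as large as the gap that was actually observed."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the group labels are arbitrary,
        # so reshuffling them should produce similar gaps.
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / n_shuffles


# Invented walking times in seconds (hypothetical, not from any study).
primed = [8.3, 8.1, 8.6, 8.4, 8.2, 8.5]
control = [7.3, 7.5, 7.2, 7.6, 7.4, 7.1]

p = permutation_p_value(primed, control)
print(f"Estimated p-value: {p:.4f}")
```

A small p-value licenses rejecting the null hypothesis that the two groups differ only by chance. It is silent on why they differ, which is the gap that Loftus objects to.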
This is a statistical charade, Loftus contends, since measures
taken before and after any test are virtually never the same. Rejecting a
null hypothesis doesn’t tell a researcher anything new, even if the
threat of finding an effect that doesn’t really exist has been
eliminated. “Significance testing is all about how the world isn’t,”
Loftus contends, “and says nothing about how the world is.”
The art of theory construction in psychology has withered during
the field’s 50-year romance with null hypothesis significance testing,
asserts psychologist Gerd Gigerenzer of the Max Planck Institute for
Human Development in Berlin: “The problem is not that researchers think
that theory is irrelevant, but that almost anything passes as a theory.”
Gigerenzer has identified three types of theory substitutes in
psychology. Each surrogate for theory is so vague and prediction-free
that it can’t be proven wrong.
No swapping
Psychologist Walter Mischel suggests that
psychologists often operate in isolation without trying to integrate
related theories: “Psychologists treat other people’s theories like
toothbrushes — no self-respecting person wants to use anyone else’s.”
Letizia McCall/Getty Images
First, Gigerenzer says, investigators sometimes explain their
findings by using a term for a theory that can be construed to explain
not only an observed effect but also its opposite. Consider
“representativeness,” which many decision researchers use to explain
gamblers’ frequent intuition that, after landing on a series of red
spaces on a roulette table, they’re going to land on a black space. In
this case, psychologists interpret representativeness to mean that
people assume that random sequences of two outcomes are best represented
by a short sequence containing both: reds and blacks when playing
roulette, or heads and tails when flipping a coin.
Yet investigators have also used representativeness to explain
the opposite intuition, in which people assume that a streak of outcomes
is likely to continue. Sports fans demonstrate this kind of intuition
when they attribute “hot hands” to basketball players who make several
shots in a row (SN: 2/12/11, p. 26). The fans expect the players to
sink their next try. In this case,
representativeness is interpreted to mean that people regard a run of
scores as characteristic of a larger random sequence containing streaks
of scores and misses.
Another theory-avoiding tactic consists of describing a finding
without trying to explain it, Gigerenzer says. The phrase “inequality
aversion” has been applied in some studies to describe the willingness
of subjects to divide a pot of money equally rather than to find some
other way to divide it. Inequality aversion addresses how participants
behaved, but it makes no prediction about why they behaved that way.
Perhaps the most popular theory surrogates are two-system
theories. Many psychologists now assume that we make decisions using two
mental systems: System 1, in which we make quick, intuitive decisions
based on fallible rules of thumb, and System 2, in which we make
logical, deliberate choices that require more time and brain power.
Psychologist Daniel Kahneman of Princeton University, a Nobel laureate
in economics, has done the most to popularize the System 1/System 2
distinction.
Gigerenzer contends that almost any behavior in a decision-making
study can be attributed to either System 1 or System 2. In the January
2011 Psychological Review, he and psychologist Arie Kruglanski
of the University of Maryland in College Park argued that intuitive and
deliberate judgments alike are based on shared rules of thumb, or
heuristics. Many parents intuitively allocate attention and love equally
to all of their children, for instance, and many investors deliberately
follow the same simple rule by allocating money equally to all of their
chosen stocks to reduce risk (SN: 6/4/11, p. 26).
Dividing the mind into a nebulous split between intuitive
heuristics and logical rule-following distracts scientists from
exploring how heuristics operate in both intuitive and deliberative ways
and in what situations heuristics work best, Gigerenzer argues.
Toothbrush culture
None of this is to say that psychology has no genuine theories,
but many of them exist in splendid isolation. Most psychologists work in
narrow communities, such as developmental psychology and social
psychology, where established theories are rarely challenged. As a
quotation cited in 2008 by psychologist Walter Mischel of Columbia
University in New York City puts it, “Psychologists treat other people’s
theories like toothbrushes — no self-respecting person wants to use
anyone else’s.” That kind of professional isolationism leads to
“theoretical disorganization,” write Eli Finkel of Northwestern
University in Evanston, Ill., and Paul Eastwick of the University of
Texas at Austin.
In a chapter in an upcoming book, Finkel and Eastwick discuss
theories about how men and women are attracted to each other. One
popular theory holds that people are attracted to others who satisfy
general needs for pleasure, belonging and a few other social prizes. A
second approach posits that people have evolved certain types of mating
strategies over the past few million years. A third perspective assumes
that individuals form relationship styles early in life with parents and
others that orchestrate choices of romantic partners decades later.
Finkel and Eastwick propose that all three approaches can be
organized around a principle, developed in related research, that
attraction depends on how well one person enables another to achieve
urgent goals for pleasure, reproduction, a good relationship fit — or
anything else. Research grounded in that principle has the potential to
produce a unified theory of attraction.
Opportunities to unify related theories often arise when
scientists from different disciplines collaborate on studies of broad
topics such as decision making or moral behavior, Gigerenzer says. He
heads a team of scientists with backgrounds ranging from ecology to
economics that studies heuristic reasoning. Members of this group have
found commonalities between a complex model of thinking and decision
making developed by psychologist John Anderson of Carnegie Mellon
University in Pittsburgh and a simple decision-making rule that is
surprisingly effective in certain situations.
The rule goes like this: If an experimental subject is asked to
make a choice where one of two options is recognized, the subject will
pick the familiar item. In studies of German and U.S. students, each
group did better at identifying the larger city from pairs of choices in
foreign countries than from pairs in their homelands. Partial ignorance
about foreign cities led the students to choose the more familiar city of each pair.
Since better-known cities tend to be especially large ones, the
students’ simple tactic worked surprisingly well. Recognition-guided
choices weren’t an option for pairs of familiar cities in students’
native lands.
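
The rule as described above can be written in a few lines. This is a minimal sketch under stated assumptions: the function name `recognition_choice` and the set of recognized cities are hypothetical, chosen only to show when the rule applies and when it does not.

```python
def recognition_choice(option_a, option_b, recognized):
    """Pick the recognized option when exactly one of the two is recognized.

    Returns None when both or neither option is recognized, because the
    rule gives no guidance in those cases.
    """
    a_known = option_a in recognized
    b_known = option_b in recognized
    if a_known != b_known:
        return option_a if a_known else option_b
    return None


# Hypothetical set of foreign cities that one student has heard of.
recognized_cities = {"Munich", "Berlin", "Hamburg"}

# Exactly one city is recognized, so the rule picks it.
print(recognition_choice("Munich", "Bielefeld", recognized_cities))

# Both cities are recognized, so the rule cannot be used.
print(recognition_choice("Munich", "Berlin", recognized_cities))
```

The second call mirrors the students' situation with cities in their own countries: when every option is familiar, recognition cannot separate them.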
Full disclosure
For decades, popular research tools, from statistical methods to
computers, have been proposed as models of how people think. Once a
research tool gains traction as a theory of the mind — say, the notion
of the mind as an information-processing computer — creative thinking
about alternative theories becomes increasingly difficult, Gigerenzer
says.
That may be so, but psychologist Uri Simonsohn of the University
of Pennsylvania in Philadelphia believes that researchers’ efforts
to upgrade statistical practices can coexist with hypothesis competition
and theory integration.
In a 2011 paper in Psychological Science that has become
a manifesto for those aiming to minimize published results that vanish
on closer inspection, Simonsohn and his colleagues recommended ways to
discourage researchers from cherry-picking data to include in final
reports, altering experimental conditions that don’t work as planned and
using other tactics that disguise statistical weakness.
Some researchers propose using a statistical technique known as
Bayesian analysis that estimates which of several hypotheses best
explains a set of results. But despite the strengths of Bayesian
statistics, investigators can still exclude inconvenient data or
hypotheses from this approach, Simonsohn holds.
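
A toy version of that comparison shows both the appeal and the weak point. The sketch below applies Bayes' rule to three rival hypotheses. Everything in it is an assumption made for illustration: the hypothesis labels, the rates, the equal priors and the invented count of 12 outcomes in 20 trials.

```python
from math import comb


def binomial_likelihood(successes, trials, rate):
    """Probability of the observed count if the true rate were `rate`."""
    return comb(trials, successes) * rate**successes * (1 - rate) ** (trials - successes)


def posterior(priors, likelihoods):
    """Bayes' rule: weight each prior by its likelihood, then normalize."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


# Hypothetical rival hypotheses about how often an effect appears.
rates = {"no effect": 0.5, "weak effect": 0.6, "strong effect": 0.8}
priors = {h: 1 / len(rates) for h in rates}  # equal prior weight (assumption)

# Invented data: the effect appeared in 12 of 20 trials.
likelihoods = {h: binomial_likelihood(12, 20, r) for h, r in rates.items()}

post = posterior(priors, likelihoods)
best = max(post, key=post.get)
for hypothesis, probability in post.items():
    print(f"{hypothesis}: {probability:.3f}")
print("Best supported:", best)
```

The ranking covers only the hypotheses listed in `rates` and only the trials that were counted. Leaving out a rival explanation or an inconvenient batch of data changes the result, which is the loophole Simonsohn points to.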
In the end, no statistical procedure can thrust psychological
research into the championship ring, where losses sting but unexpected
wins reap big rewards, Fiedler says. In scientific cultures that
encourage clear predictions and open debate, even vanquished predictions
get respect for having helped to advance knowledge.
“It is a good morning exercise for a research scientist to
discard a pet hypothesis every day before breakfast,” the late
ethologist Konrad Lorenz wrote. “It keeps him young.”
The lesson of Clever Hans
Karl Krall
Any scientist will admit that unconscious cuing by an
experimenter can introduce bias into testing. A German named Wilhelm von
Osten and his horse Hans unwittingly demonstrated that — and inspired
the term Clever Hans effect. Von Osten became famous in 1891 for public
displays of Hans’ ability to perform mathematical calculations and other
feats by tapping his hoof. No cheating was apparent, but in 1907
psychologist Oskar Pfungst investigated claims about Hans’ intelligence.
Pfungst had different experimenters ask questions standing at varying
distances from Hans. Sometimes Hans wore blinders; sometimes the
experimenters knew the answers to their own questions and sometimes they
didn’t. Pfungst discovered not only that Hans needed visual contact
with the questioner but also that Hans couldn’t answer a question when
the experimenter didn’t know the answer. Conclusion: Although
questioners were not consciously cuing Hans to start or stop tapping,
their facial expressions or involuntary movements were enough for Clever
Hans to catch on.
— Bruce Bower