Let’s move to slide two to begin. Experiments allow for causal
inference, that is, by running an experiment, we can determine whether
one factor causes another. Remember, with correlations we cannot do this.
Experiments allow for causal inference by using random assignment.
Random assignment means that as participants show up to participate in
an experiment, they are randomly assigned to one condition of an
experiment or another, that is, everyone has an equal opportunity of
being in either the control or treatment condition in your experiment.
This is not the same as random sampling. Random sampling means that from
the population in which we’re interested, we randomly choose people to
participate in our experiment. We also need to know what an independent
variable and a dependent variable are. An independent variable is what
the experimenter does, that is, what we’re manipulating. For example, if
we’re studying aggression, we might choose the temperature of the room
as our independent variable. This would be something that the
experimenter controls: some participants are randomly assigned to a hot
room and the others are randomly assigned to a cooler room. Then our
dependent variable might be how much they aggress, perhaps how
frequently they yell at other participants or choose to play very loud
music in the presence of the other participant. Let’s talk about another
study.
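
To make the difference between random sampling and random assignment
concrete, here is a minimal Python sketch. The population, the sample
size of 40, and the function names are all made up for illustration;
only the two ideas themselves come from the lecture.

```python
import random

def random_sample(population, n, seed=None):
    """Random sampling: choose n people from the population of interest."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def random_assign(participants, conditions=("control", "treatment"), seed=None):
    """Random assignment: shuffle participants, then deal them into conditions."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical population of 1000 people (illustrative names only).
population = [f"person_{i}" for i in range(1000)]

# Step 1: random sampling decides WHO takes part in the study.
participants = random_sample(population, 40, seed=1)

# Step 2: random assignment decides WHICH CONDITION each participant gets.
groups = random_assign(participants, ("hot room", "cool room"), seed=2)

for condition, members in groups.items():
    print(condition, len(members))
```

Shuffling and then dealing people into conditions is one design choice
among several; it keeps the groups the same size while still giving
every participant an equal chance of ending up in either room.
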
Let’s move on to slide three. Baumeister and colleagues in 1998 were
interested in studying willpower. Essentially they believed that
exerting willpower takes cognitive resources. By exerting willpower,
that is, trying not to engage in a tempting behavior, your cognitive
resources will be depleted. In this experiment, they had people
randomly assigned to one of three conditions: what they called a radish
condition, a control condition, or a cookie condition. Participants
came into a room, sat down and were told that they were participating in
a taste perception test. In this test, in the radish condition, they
were shown a bowl of radishes as well as a plate of cookies and were
asked to eat two or three of the radishes and none of the cookies. They
would not be provided with any water for several minutes. The
experimenter then left the room and determined whether or not the
participant actually ate the radishes by looking at what was still
remaining in terms of radishes and cookies upon return.
In the cookie condition, participants again were shown a bowl of
radishes and a plate of cookies. This time they were asked to eat two or
three cookies and no radishes. Then the experimenter left the room and,
upon returning, examined the cookies and radishes to determine what
had been eaten.
In the control condition, there was no food present. Participants in
every condition were then asked to work on a puzzle task. The puzzle
was impossible, meaning that there was no solution, and the dependent
variable here is how long the participants would continue to try to
solve it.
We see that in the radish condition, participants persisted for less
than 10 minutes, while in the cookie and control conditions, people
worked 15 to 20 minutes on the task. This implies that when we’re eating
radishes in the presence of cookies and trying very hard not to eat the
cookies, that is, using willpower to eat only radishes, we actually use
up cognitive resources that are then not available to allow us to
persist in a cognitive task.
Let’s move on to slide four. Construct versus operation. In an
experiment, we have both constructs and operations. Constructs are the
broad concepts or ideas that we’re interested in studying. For example,
Baumeister and colleagues were interested in studying the broad concept
of cognitive resources and the broad concept of willpower. Operations
are the specific conceptualizations, or operationalizations, of
constructs. What this means is that in the Baumeister study, they
specifically operationalized cognitive resource depletion by looking at
persistence on an impossible task. The broad concept of willpower was
operationalized as one’s ability to refuse to eat cookies, or to deny
oneself cookies, when required to eat radishes.
Let’s go on to talk about measures in slide five. When we’re measuring a
dependent variable, there are many ways we can do this. One is to use
repeated measures. This means that we look at your responses over time,
so if we’re interested in measuring your response to attractive faces,
instead of showing you one attractive face and measuring your response,
we might show you several attractive faces, perhaps 50. This would be a
repeated measure, getting your response to each face. We could also use
multiple measures. If we’re interested in how you respond to
attractiveness, showing you photographs of faces would be one way to
assess the construct of attractiveness and your response to it. We
could also look at how you interact with a person who would be
perceived as attractive, or we could provide not only still photos but
also videos of attractive people and then look at your reactions to
them. We can also measure dependent variables by other means, such as
observation, that is, simply watching people. For example, in
Baumeister’s study a confederate observed how many cookies had been
eaten. They could have also relied on a self-report measure, simply
asking people if they had eaten cookies in the radish condition or
radishes in the cookie condition. What they did look at was performance
on the cognitive task, that is, how long people persisted and how hard
they tried. If they had used a solvable task, they could also have
looked at performance in terms of how often people managed to solve the
puzzle.
Let’s move on to slide six. In our course, we’re often going to talk
about the differences between two groups of people or two conditions.
When we do this, we’re talking about mean differences. One example would
be that women are shorter than men. This is true; women are shorter than
men on average. This does not mean that all women are shorter than all
men or that all men are taller than all women. In fact, we all know that
there are some men who are shorter than most women. The d statistic is
one way we can talk about a mean difference. The d statistic is
calculated by taking the mean of group one minus the mean of group two
and dividing it by a pooled standard deviation. For the purposes of
this course, you need to understand what the d statistic means. You do
not need to be able to calculate a pooled standard deviation.
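
For anyone curious how the calculation goes, here is a minimal Python
sketch of the d statistic: the difference between the two group means
divided by a pooled standard deviation. The heights are invented numbers
chosen only to make the arithmetic easy to follow; they are not real
data.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Mean of group one minus mean of group two, over the pooled SD."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: each group's sample variance weighted by its
    # degrees of freedom (n - 1).
    pooled_var = ((n1 - 1) * variance(group1) +
                  (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

# Made-up heights in centimeters, for illustration only.
men_cm = [170, 175, 180, 185, 190]
women_cm = [158, 163, 168, 173, 178]

d = cohens_d(men_cm, women_cm)
print(round(d, 2))
```

A positive d here means group one has the higher mean; swapping the
groups flips the sign. Notice that the two made-up groups overlap even
though their means differ, which is exactly the point about mean
differences.
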
Let’s move on to slide seven. Many times during this course, as you hear
about these mean differences, you might want to raise your hand and say,
“Well, my Aunt Tilda would never buy that,” or “My Uncle Joe is not
attracted to people who have those features.” These people might be
outliers. They can influence our findings, but usually they would only
weaken our effect rather than strengthen it, and by randomly assigning
people to conditions, we remove the concern that we haven’t encompassed
everyone in our study. Remember, our concern in social psychology is
with the mean
difference, not the individual case. Clinicians, on the other hand, are
interested in individual cases. Whether or not that is a good thing is
something that should be considered and is often debated.
Let’s move on to slide eight. Significance. Many times you’ll hear in
this course that there was a significant finding. When researchers call
a finding significant, they typically mean that it is statistically
significant. We can also discuss findings in terms of their practical
significance. Statistical significance is not always practical
significance. For example, if I find that there’s a one-point difference
in openness on a seven-point scale between men and women, that may be a
statistically significant difference if I have enough people in my
sample, but it may not translate to a practical difference. That is, the
way in which men and women generally interact may not reflect this
one-point difference. There is need for careful consideration, however.
SAT scores are one example. A one-point difference, where women score
one point higher than men on the verbal SAT or men score one point
higher on the math section of the SAT, is one place where we need to be
very careful. It would certainly be statistically significant, as
hundreds of thousands of people take the SAT every year. The practical
significance lies not necessarily in whether that one point means
anything in terms of mental ability, but in the fact that it could mean
the difference between getting into a school or not. This is why we need
to carefully consider the way in which we use statistics and the way in
which we dismiss those statistics when thinking only about practical
significance.
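
The idea that a small difference becomes statistically significant once
the sample is large enough can be sketched in a few lines of Python.
This is a simplified illustration, not a full analysis: it assumes a
known standard deviation that is the same in both groups, equal group
sizes, and a normal (z) approximation, and the numbers are hypothetical.

```python
from math import erfc, sqrt

def two_sample_z_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference between two group means,
    using a z approximation with a known, shared standard deviation."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    return erfc(abs(z) / sqrt(2))

# Hypothetical numbers: the same small difference in both scenarios.
mean_diff, sd = 0.1, 1.0

p_small = two_sample_z_p(mean_diff, sd, n_per_group=20)
p_large = two_sample_z_p(mean_diff, sd, n_per_group=20000)

print(p_small > 0.05)   # small sample: not statistically significant
print(p_large < 0.05)   # huge sample: statistically significant
```

The size of the difference is identical in both cases; only the number
of people changes. Whether that difference matters in practice is a
separate question that the p-value cannot answer.
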
In order for something to be statistically significant, we typically
require that p be less than .05. This is only a convention, although it’s
very widely accepted. The idea here is that if you have a p less than
.05, then if only random chance were at work, you would expect to get
results at least as strong as the ones in your experiment fewer than 5
times out of 100. That is, it’s very rare that you would obtain data
like these by chance alone, implying that there is something about the
manipulated or independent variable that systematically influences the
dependent variable.
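
One way to see what “by random chance” means is a permutation test,
sketched below in Python. This is a technique chosen here for
illustration, not necessarily the test used in any study mentioned in
this lesson, and the aggression scores are invented. The sketch
repeatedly reshuffles the scores between the two rooms and counts how
often chance alone produces a difference at least as large as the one
observed.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=10000, seed=0):
    """Share of random reshuffles whose mean difference is at least as
    large (in absolute value) as the observed mean difference."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)   # pretend the room labels were arbitrary
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Hypothetical aggression scores, for illustration only.
hot_room = [7, 9, 8, 10, 9, 8]
cool_room = [4, 5, 3, 6, 4, 5]

p = permutation_p_value(hot_room, cool_room)
print(p < 0.05)
```

With these made-up scores, very few reshuffles reproduce a gap as large
as the observed one, so p comes out well below .05.
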
Let’s move on to slide nine. We can also talk about realism in
psychological experiments. There are two types: mundane realism and
psychological realism. Mundane realism asks how well the lab study
reflects the outside world. Is it realistic in terms of everyday
experience? Certainly most people don’t find themselves in a
psychological lab every day, so often lab studies do not have much
mundane realism. However, if we’re interested in how students score on
an exam in a room painted pink versus a room painted blue, the study
could have a high level of mundane realism. That is, students
often take exams in their day-to-day life.
Psychological realism is much more important in terms of lab studies.
That is, while most people don’t find themselves in a lab study, the
types of decisions or activities that we have them do are
psychologically very similar to the decisions and activities that they
do in the everyday world. For example, choosing between two blenders in
a lab setting would involve much the same psychological process you
would use if you were at a department store choosing between two
blenders.
Let’s go on to slide ten and talk about validity. There are three types
of validity with which you need to be concerned. The first is construct
validity. This is the idea that operations are good measures of
constructs. For example, if we’re interested in the construct of love,
we might operationalize showing love as the number of kisses given
during the day. This may have good construct validity or it may not,
depending on the couple. For example, if one person is ill and
contagious, there may be less kissing, so it may not be a very good
operation of the construct for that particular couple on that
particular day.
Internal validity is about establishing causality by ruling out
alternative explanations. An experiment is high in internal validity to
the extent that we can infer causation and cannot come up with another
reason these results might have been obtained. If we go back to our pink
and blue rooms in which we’re having students take exams and we find
that those in the pink room score much better than those in the blue
room, we might be able to say that the study has high internal validity.
There doesn’t seem to be any other reason that the people in the pink
room would do differently than the people in the blue room, as long as
they were randomly assigned. However, internal validity could be
questioned if students in the pink room had always been in the pink room
and also took their exam in the pink room, while students in the blue
room were originally in the pink room during the course but were later
moved to the blue room to take the exam.
External validity is our ability to generalize the findings. That is, a
study has good external validity to the extent that one can say that
this is true not only in this setting or this lab, but also in other
settings, in other labs, and with other types of people.
Let’s move on to slide eleven. Reliability. There are three types:
interjudge reliability, inter-item reliability and test/retest
reliability. Interjudge reliability applies when we have people code
interactions. For example, we might have someone interact with another
person and then
we need to know how friendly they were to one another. We would bring in
two coders, have them watch the tape and make several ratings about the
friendliness of each person. The extent to which the two judges agree
about the friendliness of each individual would be considered the
interjudge reliability. If they agree, we have good interjudge
reliability; if they disagree, then we would have poor interjudge
reliability.
Inter-item reliability means that on any given test, for example a test
in this course over this chapter, each item correlates well with the
other items. This would be good inter-item reliability. If on the exam
there was one question that did not correlate with performance on
the remaining items, then the test would have poor inter-item
reliability. That is, all the items on the test should be attempting to
measure essentially the same thing: your knowledge of this chapter.
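
One common way to put a number on how well items hang together is
Cronbach’s alpha. The lecture does not name this statistic, so treat the
Python sketch below as one possible illustration, with invented scores
for five students on a three-item quiz.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: one row per person, each row holding that person's score
    on every item. Values near 1 suggest the items measure the same thing."""
    k = len(scores[0])   # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical scores: 5 students (rows) by 3 quiz items (columns).
quiz_scores = [
    [5, 4, 5],
    [3, 3, 2],
    [4, 4, 4],
    [1, 2, 1],
    [2, 1, 2],
]

alpha = cronbach_alpha(quiz_scores)
print(round(alpha, 2))
```

In this made-up example, students who do well on one item tend to do
well on the others, so alpha comes out high. An item that did not track
the rest would pull the value down.
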
Test/retest reliability is frequently used for traits, ideas and
attitudes that we believe remain stable over time. This is not the same
as a pretest and a posttest. For example, if I believe that you’re high
in need for cognition and that I’ve developed a scale that can
accurately measure that, I might give you the test once in January and
measure your need for cognition then. I don’t expect it to change; I
expect it to be a stable trait. I would then retest your need for
cognition in May. If I get no change between the test and retest, then I
have good test/retest reliability. If there is a significant change,
then I have poor test/retest reliability.
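
A simple way to summarize test/retest reliability is to correlate the
January scores with the May scores. The Python sketch below uses a
hand-written Pearson correlation and invented need-for-cognition scores
for five people; it is an illustration under those assumptions, not a
prescribed procedure.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)

# Hypothetical scores for the same five people at two time points.
january = [12, 18, 25, 30, 22]
may = [13, 17, 26, 29, 23]

r = pearson_r(january, may)
print(round(r, 2))
```

A correlation close to 1 means people kept roughly the same standing
from test to retest. The same calculation could be applied to two
judges’ ratings of the same interactions as a rough index of interjudge
reliability.
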
Let’s move on to slide twelve. Replication and meta-analysis.
Replication is the idea that we need to find the same effect again and
again, that is finding the same effect in a different study or with
different people using essentially all the same methods used in the
original study. The more often a study is replicated, the more sure we
can be about the findings or results of those studies. Meta-analysis is
one way to statistically combine effects from all the studies done on a
particular construct. This used to be done via a literature review or
the tally box method. That is, prior to the development of
meta-analysis, researchers read through the literature and then simply
stated their opinion or wrote a literature review in which they
summarized the findings and then decided which effects were most
powerful and how large those effects were. A more sophisticated method
was the tally box method. In this method researchers went through the
literature and made a tally mark for each study that found the effect
and a tally mark in a different column for each study that did not find
the effect. In the end, if there were five studies that found the effect
and two studies that didn’t, the researcher would conclude that the
effect was there. That is, more studies found it than not. However, what
is not taken into account by these earlier methods is the power of each
study or the precision of its method. Meta-analysis takes all of these
into account, including how many people were involved in each study and
how strong the effect was, that is, if the means were different, how
different they were. Meta-analysis results can often be relied upon
because they combine the effects from all studies and therefore do not
leave anything out.
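
The contrast between the tally method and a meta-analysis can be
sketched in Python. The seven studies below are invented, and weighting
each effect by its sample size is a deliberate simplification; published
meta-analyses typically use more refined weights, such as the inverse of
each study’s variance.

```python
def tally(studies):
    """Tally method: count studies that found the effect vs. those that did not."""
    found = sum(1 for study in studies if study["significant"])
    return found, len(studies) - found

def weighted_mean_effect(studies):
    """Simplified meta-analysis: average the effect sizes (d),
    weighting each study by its number of participants."""
    total_n = sum(study["n"] for study in studies)
    return sum(study["d"] * study["n"] for study in studies) / total_n

# Hypothetical literature: five small studies that found the effect
# and two large studies that did not.
studies = [
    {"d": 0.90, "n": 24, "significant": True},
    {"d": 0.90, "n": 24, "significant": True},
    {"d": 0.90, "n": 24, "significant": True},
    {"d": 0.90, "n": 24, "significant": True},
    {"d": 0.90, "n": 24, "significant": True},
    {"d": 0.02, "n": 400, "significant": False},
    {"d": 0.02, "n": 400, "significant": False},
]

print(tally(studies))                           # five "yes" marks, two "no" marks
print(round(weighted_mean_effect(studies), 2))  # combined effect is small
```

The tally says five to two in favor of the effect, yet once the number
of participants is taken into account the combined effect is small,
because the two large studies carry most of the weight.
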
Let’s move on to slide thirteen. Zimbardo’s Stanford Prison
Experiment. You should visit this website. It marks the beginning of
ethics in social psychology. That is, we do not live on an island where
we can take people and have them marry or have relationships with
whomever we choose, just to see what happens if those relationships
exist. We cannot put people on an island and manipulate our independent
variables. There are ethical considerations. Serious attention to these
ethical considerations began with Zimbardo’s Stanford Prison
Experiment. For more information, please visit this website. This
material will be covered on the exam.
Let’s move on to slide fourteen to discuss the human subjects board.
After Zimbardo’s experiment, most universities developed a human
subjects board by which all protocols are reviewed. That is, before a
study can be done, a researcher must write a protocol and send it to the
human subjects board for review. The human subjects board then
determines whether or not the procedures are ethical, whether or not the
participants are going to be accurately informed and so on. This has had
a broad impact on science. Some students in social psychology courses
ask questions that we simply cannot answer due to ethical concerns. It
would be wonderful to know what happens if someone marries someone who
is less attractive than they are, but because we cannot randomly assign
people to get married, we have no way to answer these sorts of questions.
This concludes the lesson.