CHAPTER 11
Correlation

Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc. ISBN 0-558-46761-X

Chapter Outline
✪ Graphing Correlations: The Scatter Diagram
✪ Patterns of Correlation
✪ The Correlation Coefficient
✪ Significance of a Correlation Coefficient
✪ Correlation and Causality
✪ Issues in Interpreting the Correlation Coefficient
✪ Effect Size and Power for the Correlation Coefficient
✪ Controversy: What Is a Large Correlation?
✪ Correlation in Research Articles
✪ Summary
✪ Key Terms
✪ Example Worked-Out Problems
✪ Practice Problems
✪ Using SPSS
✪ Chapter Notes

This chapter is about a statistical procedure that allows you to look at the relationship between two groups of scores. To give you an idea of what we mean, let’s consider some common real-world examples. Among students, there is a relationship between high school grades and college grades. It isn’t a perfect relationship, but generally speaking students with better high school grades tend to get better grades in college. Similarly, there is a relationship between parents’ heights and the adult height of their children. Taller parents tend to give birth to children who grow up to be taller than the children of shorter parents. Again, the relationship isn’t perfect, but the general pattern is clear. Now we’ll look at an example in detail.
One hundred thirteen married people in the small college town of Santa Cruz, California, responded to a questionnaire in the local newspaper about their marriage. [This was part of a larger study reported by Aron and colleagues (2000).] As part of the questionnaire, they answered the question, “How exciting are the things you do together with your partner?” using a scale from 1, not exciting at all, to 5, extremely exciting. The questionnaire also included a standard measure of marital satisfaction (that included items such as, “In general, how often do you think that things between you and your partner are going well?”).
TIP FOR SUCCESS: You can learn most of the material in this chapter if you have mastered Chapters 1 and 2; but if you are reading this before having studied Chapters 3 through 7, you should not try to read the material near the end of this chapter on the significance of a correlation coefficient or on effect size and power.
The researchers were interested in finding out the relationship between doing exciting things with a marital partner and the level of marital satisfaction people reported. In other words, they wanted to look at the relationship between two groups of scores: the group of scores for doing exciting things and the group of scores for marital satisfaction. As shown in Figure 11–1, the relationship between these two groups of scores can be shown very clearly using a graph. The horizontal axis is for people’s answers to the question, “How exciting are the things you do together with your partner?” The vertical axis is for the marital satisfaction scores. Each person’s score on the two variables is shown as a dot.
The overall pattern is that the dots go from the lower left to the upper right. That is, lower scores on the variable “doing exciting activities with your partner” more often go with lower scores on the variable “marital satisfaction,” and higher with higher. So, in general, this graph shows that the more that people did exciting activities with their partner, the more satisfied they were in their marriage. Even though the pattern is far from one to one, you can see a general trend. This general pattern is of high scores on one variable going with high scores on the other variable, low scores going with low scores, and mediums with mediums. This is an example of a correlation.
Figure 11–1 Scatter diagram showing the correlation for 113 married individuals between doing exciting activities with their partner and their marital satisfaction. (Data from Aron et al., 2000.) [Axes: Exciting Activities with Partner (horizontal), Marital Satisfaction (vertical).]
correlation: association between scores on two variables.
A correlation describes the relationship between two variables. More precisely, the usual measure of a correlation describes the relationship between two equal-interval numeric variables. As you learned in Chapter 1, the differences between values for equal-interval numeric variables correspond to differences in the underlying thing being measured. (Most psychologists consider scales like a 1-to-10 rating scale as approximately equal-interval scales.) There are countless examples of correlations: in children, there is a correlation between age and coordination skills; among students, there is a correlation between amount of time studying and amount learned; in the marketplace, we often assume that a correlation exists between price and quality, that is, that high prices go with high quality and low with low.
This chapter explores correlation, including how to describe it graphically, different types of correlations, how to figure the correlation coefficient (which gives a number for the degree of correlation), the statistical significance of a correlation coefficient, issues about how to interpret a correlation coefficient, and effect size and power for a correlation coefficient.

Graphing Correlations: The Scatter Diagram
Figure 11–1 shows the correlation between exciting activities and marital satisfaction and is an example of a scatter diagram (also called a scatterplot). A scatter diagram shows you at a glance the pattern of the relationship between the two variables.

How to Make a Scatter Diagram
There are three steps to making a scatter diagram:
❶ Draw the axes and decide which variable goes on which axis. Often, it doesn’t matter which variable goes on which axis. However, sometimes the researchers are thinking of one of the variables as predicting or causing the other. In that case, the variable that is doing the predicting or causing goes on the horizontal axis and the variable that is being predicted or caused goes on the vertical axis. In Figure 11–1, we put exciting activities on the horizontal axis and marital satisfaction on the vertical axis. This was because the study was based on a theory that the more exciting the activities a couple does together, the more satisfied the couple is with their marriage. (We will have more to say about this later in the chapter when we discuss causality and also in Chapter 12 when we discuss prediction.)
❷ Determine the range of values to use for each variable and mark them on the axes. Your numbers should go from low to high on each axis, starting from where the axes meet. Your low value on each axis should be 0.
Each axis should continue to the highest value your measure can possibly have. When there is no obvious highest possible value, make the axis go to a value that is as high as people ordinarily score in the group of people of interest for your study. Note that scatter diagrams are usually made roughly square, with the horizontal and vertical axes being about the same length (a 1:1 ratio).
❸ Mark a dot for each pair of scores. Find the place on the horizontal axis for the first pair’s score on the horizontal-axis variable. Next, move up to the height for that pair’s score on the vertical-axis variable. Then mark a clear dot. Continue this process for the remaining pairs of scores. Sometimes the same pair of scores occurs twice (or more times). This means that the dots for these pairs would go in the same place. When this happens, you can put a second dot as near as possible to the first (touching, if possible) but making it clear that there are in fact two dots in the one place. Alternatively, you can put the number 2 in that place.
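The three steps above can also be sketched in code. The following Python fragment is only an illustration and is not part of the original procedure: the function name `plan_scatter` and the sample pairs are made up for this example, and it assumes the scores are whole numbers. Each pair is written as (horizontal-axis score, vertical-axis score), each axis runs from 0 to the highest possible value, and repeated pairs are tallied so they can be marked with a count.

```python
from collections import Counter

def plan_scatter(pairs, x_max, y_max):
    """Return the axis values and a tally of dots for a scatter diagram.

    pairs: list of (x, y) scores; x goes on the horizontal axis (step 1).
    x_max, y_max: highest value each measure can have (step 2).
    """
    # Step 2: each axis runs from 0 up to the highest possible value.
    x_axis = list(range(0, x_max + 1))
    y_axis = list(range(0, y_max + 1))
    # Step 3: one dot per pair; the Counter records pairs that occur more than once.
    dots = Counter(pairs)
    return x_axis, y_axis, dots

# Hypothetical scores (not from the chapter); the pair (2, 3) occurs twice.
example_pairs = [(1, 1), (2, 3), (2, 3), (4, 5)]
x_axis, y_axis, dots = plan_scatter(example_pairs, x_max=5, y_max=6)
for (x, y), count in sorted(dots.items()):
    label = "one dot" if count == 1 else f"{count} dots (or write the number {count})"
    print(f"x={x}, y={y}: {label}")
```

A tally like this mirrors the advice in step ❸: a position with a count above 1 is where you would draw touching dots or write the number instead.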
scatter diagram: graph showing the relationship between two variables; the values of one variable are along the horizontal axis and the values of the other variable are along the vertical axis; each score is shown as a dot in this two-dimensional space.
TIP FOR SUCCESS: If you’re in any way unsure about what a numeric equal-interval variable is, be sure to review the Chapter 1 material on kinds of variables.
TIP FOR SUCCESS: When making a scatter diagram, it is easiest if you use graph paper.

An Example
Suppose a researcher is studying the relationship of sleep to mood. As an initial test, the researcher asks six students in her morning seminar two questions:
1. How many hours did you sleep last night?
2. How happy do you feel right now on a scale from 0, not at all happy, to 8, extremely happy?
The (fictional) results are shown in Table 11–1. (In practice, a much larger group would be used in this kind of research. We are using an example with just six to keep things simple for learning. In fact, we have done a real version of this study. Results of the real study are similar to what we show here, except not as strong as the ones we made up to make the pattern clear for learning.)
❶ Draw the axes and decide which variable goes on which axis. Because sleep comes before mood in this study, it makes most sense to think of sleep as the predictor. Thus, as shown in Figure 11–2a, we put hours slept on the horizontal axis and happy mood on the vertical axis.
Figure 11–2 Steps for making a scatter diagram. (a) ❶ Draw the axes and decide which variable goes on which axis: the predictor variable (Hours Slept Last Night) on the horizontal axis, the other (Happy Mood) on the vertical axis. (b) ❷ Determine the range of values to use for each variable and mark them on the axes. (c) ❸ Mark a dot for the pair of scores for the first student. (d) ❸ continued: Mark dots for the remaining pairs of scores. [Axes in panels (b) through (d): Hours Slept Last Night, 0 to 12 (horizontal); Happy Mood, 0 to 8 (vertical).]
Table 11–1 Hours Slept Last Night and Happy Mood Example (Fictional Data)

Hours Slept    Happy Mood
7              4
5              2
8              7
6              2
6              3
10             6
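
The scores in Table 11–1 can also be plotted by a short program. The sketch below is an illustration only: the function name `text_scatter` is made up, the output is a rough text grid rather than a proper graph, and the axis ranges (0 to 12 hours, 0 to 8 for mood) are the ones used in Figure 11–2.

```python
hours_slept = [7, 5, 8, 6, 6, 10]   # Table 11-1, horizontal-axis variable
happy_mood = [4, 2, 7, 2, 3, 6]     # Table 11-1, vertical-axis variable

def text_scatter(xs, ys, x_max, y_max):
    """Return a rough text scatter diagram as a list of strings."""
    points = set(zip(xs, ys))
    rows = []
    # Build the highest vertical value first so the axis reads bottom to top.
    for y in range(y_max, -1, -1):
        cells = ["*" if (x, y) in points else "." for x in range(x_max + 1)]
        rows.append(f"{y:>2} " + " ".join(f"{c:>2}" for c in cells))
    # Horizontal-axis labels along the bottom.
    rows.append("   " + " ".join(f"{x:>2}" for x in range(x_max + 1)))
    return rows

diagram = text_scatter(hours_slept, happy_mood, x_max=12, y_max=8)
print("\n".join(diagram))
```

Each `*` is one student. Reading the grid from left to right, the stars tend to sit higher as hours slept increase, which is the same general trend you would see in a hand-drawn diagram.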
❷ Determine the range of values to use for each variable and mark them on the axes. For the horizontal axis, we start at 0 as usual. We do not know the maximum possible, but let us assume that students rarely sleep more than 12 hours. The vertical axis goes from 0 to 8, the lowest and highest scores possible on the happiness question. See Figure 11–2b.
❸ Mark a dot for each pair of scores. For the first student, the number of hours slept last night was 7. Move across to 7 on the horizontal axis. The happy mood rating for the first student was 4, so move up to the point across from the 4 on the vertical axis. Place a dot at this point, as shown in Figure 11–2c. Do the same for each of the other five students. The result should look like Figure 11–2d.

How are you doing?
1. What does a scatter diagram show, and what does it consist of?
2. (a) When it is the kind of study in which one variable can be thought of as predicting another variable, which variable goes on the horizontal axis? (b) Which goes on the vertical axis?
3. Make a scatter diagram for the following scores for four people who were each tested on two variables, X and Y. X is the variable we are predicting from; it can have scores ranging from 0 to 6. Y is the variable being predicted; it can have scores from 0 to 7.

Person    X    Y
A         3    4
B         6    7
C         1    2
D         4    6

Figure 11–3 Scatter diagram for scores in “How are you doing?” question 3. [Axes: X, 0 to 6 (horizontal); Y, 0 to 7 (vertical).]

Patterns of Correlation

Linear and Curvilinear Correlations
In each example so far, the pattern in the scatter diagram very roughly approximates a straight line. Thus, each is an example of a linear correlation. In the scatter diagram for the study of happy mood and sleep (Figure 11–2d), you could draw a line showing the general trend of the dots, as we have done in Figure 11–4. Notice that the scores do not all fall right on the line. Notice, however, that the line does describe the general tendency of the scores. (In Chapter 12 you learn the precise rules for drawing such a line.)
Sometimes, however, the general relationship between two variables does not follow a straight line at all, but instead follows the more complex pattern of a curvilinear correlation. Consider, for example, the relationship between a person’s level of kindness and the degree to which that person is desired by others as a potential romantic partner. There is evidence suggesting that, up to a point, a greater level of kindness increases a person’s desirability as a romantic partner. However, beyond that point, additional kindness does little to increase desirability (Li et al., 2002). This particular curvilinear pattern is shown in Figure 11–5. Notice that you could not draw a straight line to describe this pattern. Some other examples of curvilinear relationships are shown in Figure 11–6.
linear correlation: relation between two variables that shows up on a scatter diagram as the dots roughly following a straight line.
curvilinear correlation: relation between two variables that shows up on a scatter diagram as dots following a systematic pattern that is not a straight line.
Answers
1. A scatter diagram is a graph that shows the relation between two variables. One axis is for one variable; the other axis, for the other variable. The graph has a dot for each individual’s pair of scores. The dot for each pair is placed directly above the score for that pair on the horizontal-axis variable and directly across from the score for that pair on the vertical-axis variable.
2. (a) The variable that is doing the predicting goes on the horizontal axis. (b) The variable that is being predicted goes on the vertical axis.
3. See Figure 11–3.
Figure 11–4 Scatter diagram from Figure 11–2d with a line drawn to show the general trend. [Axes: Hours Slept Last Night (horizontal), Happy Mood (vertical).]
Figure 11–5 Example of a curvilinear relationship: desirability and kindness. [Axes: Kindness (horizontal), Desirability (vertical).]
Figure 11–6 Examples of curvilinear relationships: (a) the way we feel and the complexity of a stimulus; (b) the number of people who remember an item and its position on a list; and (c) children’s rate of and motivation for substituting digits for symbols. [Axes: (a) Stimulus Complexity (horizontal), Feeling, from − to + (vertical); (b) Position of Item in the List: Beginning, Middle, End (horizontal), Percent Who Remember Each Item (vertical); (c) Motivation (horizontal), Rate of Substitution of Digits for Symbols (vertical).]
no correlation: no systematic relationship between two variables.
Figure 11–7 Two variables with no association with each other: income and shoe size (fictional data). [Axes: Shoe Size (horizontal), Income (vertical).]
The usual way of figuring the correlation (the one you learn shortly in this chapter) gives the degree of linear correlation. If the true pattern of association is curvilinear, figuring the correlation in the usual way could show little or no correlation. Thus, it is important to look at scatter diagrams to identify these richer relationships rather than automatically figuring correlations in the usual way, assuming that the only relationship is a straight line.
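
To see how a strong curvilinear pattern can produce no linear correlation at all, consider the following sketch. It is an illustration under stated assumptions: the scores are invented (a perfect U shape), the function name `linear_correlation` is made up, and the function uses the standard Pearson formula for the linear correlation coefficient, which this chapter has not yet introduced at this point.

```python
import math

def linear_correlation(xs, ys):
    """Standard Pearson linear correlation: the sum of products of deviations
    divided by the square root of the product of the two sums of squared
    deviations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx ** 2 for dx in dev_x)
    ss_y = sum(dy ** 2 for dy in dev_y)
    return sum_products / math.sqrt(ss_x * ss_y)

# Invented scores: Y is completely determined by X, but in a U shape.
x_scores = [0, 1, 2, 3, 4, 5, 6]
y_scores = [(x - 3) ** 2 for x in x_scores]   # 9, 4, 1, 0, 1, 4, 9

r_curved = linear_correlation(x_scores, y_scores)
print(r_curved)
```

For these scores the linear correlation comes out to 0, even though knowing X tells you Y exactly. Only a look at the scatter diagram would reveal the relationship.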

No Correlation
It is also possible for two variables to be essentially unrelated to each other. For example, if you were to do a study of income and shoe size, your results might appear as shown in Figure 11–7. The dots are spread everywhere, and there is no line, straight or otherwise, that is any reasonable representation of a trend. There is simply no correlation.

Positive and Negative Linear Correlations
In the examples so far of linear correlations, such as exciting activities and marital satisfaction, high scores go with high scores, lows with lows, and mediums with mediums. This is called a positive correlation. (One reason for the term “positive” is that in geometry, the slope of a line is positive when it goes up and to the right on a graph like this. Notice that in Figure 11–4 the positive correlation between happy mood and sleep is shown by a line that goes up and to the right.)
Sometimes, however, high scores on one variable go with low scores on the other variable and lows with highs. This is called a negative correlation. For example, in the newspaper survey about marriage, the researchers also asked about boredom with the relationship and the partner. Not surprisingly, the more bored a person was, the lower was the person’s marital satisfaction. That is, high scores on one variable went with low scores on the other. Similarly, the less bored a person was, the higher the marital satisfaction. This is shown in Figure 11–8, where we also put in a line to emphasize the general trend. You can see that as it goes from left to right, the line slopes slightly downward.
Figure 11–8 Scatter diagram with the line drawn in to show the general trend for a negative correlation between two variables: greater boredom with the relationship goes with lower marital satisfaction. (Data from Aron et al., 2000.) [Axes: Bored with Relationship (horizontal), Marital Satisfaction (vertical).]
Another example of a negative correlation is from organizational psychology. A well-established finding in that field is that absenteeism from work has a negative linear correlation with satisfaction with the job (e.g., Mirvis & Lawler, 1977): that is, the higher the level of job satisfaction, the lower the level of absenteeism. Put another way, the lower the level of job satisfaction is, the higher the absenteeism becomes. Research on this topic has continued to show this pattern all over the world (e.g., Punnett et al., 2007), and the same pattern is found for university classes: the more satisfied students are, the less they miss class (Yorges et al., 2007).
negative correlation: relation between two variables in which high scores on one go with low scores on the other, mediums with mediums, and lows with highs; on a scatter diagram, the dots roughly follow a straight line sloping down and to the right.
positive correlation: relation between two variables in which high scores on one go with high scores on the other, mediums with mediums, and lows with lows; on a scatter diagram, the dots roughly follow a straight line sloping up and to the right.

Strength of the Correlation
What we mean by the strength of the correlation is how much there is a clear pattern of some particular relationship between two variables. For example, we saw that a positive linear correlation is when high scores go with highs, mediums with mediums, lows with lows. The strength (or degree) of such a correlation, then, is how much highs go with highs, and so on. Similarly, the strength of a negative linear correlation is how much the highs on one variable go with the lows on the other, and so forth. In terms of a scatter diagram, there is a “large” (or “strong”) linear correlation if the dots fall close to a straight line (the line sloping up or down depending on whether the linear correlation is positive or negative). A perfect linear correlation means all the dots fall exactly on the straight line. There is a “small” (or “weak”) correlation when you can barely tell there is a correlation at all; the dots fall far from a straight line. The correlation is “moderate” (also called a “medium” correlation) if the pattern of dots is somewhere between a small and a large correlation.

Importance of Identifying the Pattern of Correlation
The procedure you learn in the next main section is for figuring the direction and strength of linear correlation. As we suggested earlier, the best approach to such a problem is first to make a scatter diagram and to identify the pattern of correlation. If the pattern is curvilinear, then you would not go on to figure the linear correlation. This is important because figuring the linear correlation when the true correlation is curvilinear would be misleading. (For example, you might conclude that there is little or no correlation when in fact there is a quite strong relationship; it is just not linear.) You should assume that the correlation is linear, unless the scatter diagram shows a curvilinear correlation. We say this because when the linear correlation is small, the dots will fall far from a straight line. In such situations, it can sometimes be hard to imagine a straight line that roughly shows the pattern of dots.
If the correlation appears to be linear, it is also important to “eyeball” the scatter diagram a bit more. The idea is to note the direction (positive or negative) of the linear correlation and also to make a rough guess as to the strength of the correlation. Scatter diagrams with varying directions and strengths of correlation are shown in Figure 11–9. For example, scatter diagram (a) in Figure 11–9 shows a large positive correlation, because the dots fall relatively close to a straight line, with low scores going with low scores and highs with highs. Scatter diagram (d), however, shows a negative correlation (there is a general tendency for lows to be with highs and highs with lows) that is of a moderate size (the dots fall too far from a straight line to be a large correlation, but are not so far apart that it is a small correlation). Using a scatter diagram to examine the direction and approximate strength of correlation is important because it lets you check to see whether you have made a major mistake when you then do the figuring you learn in the next section.
Figure 11–9 Examples of scatter diagrams with different degrees of correlation. [Six panels, (a) through (f).]

How are you doing?
1. What is the difference between a linear and curvilinear correlation in terms of how they appear in a scatter diagram?
2. What does it mean to say that two variables have no correlation?
3. What is the difference between a positive and negative linear correlation? Answer this question in terms of (a) the patterns in a scatter diagram and (b) what those patterns tell you about the relationship between the two variables.
4. For each of the scatter diagrams shown in Figure 11–10, say whether the pattern is roughly linear, curvilinear, or no correlation. If the pattern is roughly linear, also say if it is positive or negative, and whether it is large, moderate, or small.
5. Give two reasons why it is important to identify the pattern of correlation in a scatter diagram before proceeding to figure the precise correlation.
Figure 11–10 Scatter diagrams for “How are you doing?” question 4. [Four panels, (a) through (d).]
product of deviation scores: the result of multiplying the deviation score on one variable by the deviation score on another variable.

The Correlation Coefficient
Looking at a scatter diagram gives you a rough idea of the relationship between two variables, but it is not a very precise approach. What you need is a number that gives the exact correlation (in terms of its direction and strength).

Logic of Figuring the Linear Correlation
A linear correlation (when it is positive) means that highs go with highs and lows with lows. Thus, the first thing you need in figuring the correlation is some consistent way to measure what is a high score and what is a low score. An efficient way to solve this problem is to use deviation scores, that is, the raw score minus the mean ($X - M_X$ for one variable and $Y - M_Y$ for the other variable). A raw score above the mean (that is, a high score) will always give a positive deviation score and a raw score below the mean (that is, a low score) will always give a negative deviation score.
There is an additional and very important reason why deviation scores are so useful when figuring the correlation. It has to do with what happens if you multiply a score on one variable by a score on the other variable and get the product. When using deviation scores, this is called a product of deviation scores (or product of deviations). If you multiply a positive deviation score on one variable by a positive deviation score on another variable (each positive deviation score represents a raw score above the mean), you will always get a positive product. Further, and here is where it gets interesting, if you multiply a negative deviation score by a negative deviation score (each negative deviation score represents a raw score below the mean), you also get a positive product.
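
As a concrete illustration of deviation scores and their products, the sketch below applies the idea to the fictional sleep and mood scores from Table 11–1. The function name `deviation_scores` and the variable names are made up for this example.

```python
hours_slept = [7, 5, 8, 6, 6, 10]   # X scores from Table 11-1
happy_mood = [4, 2, 7, 2, 3, 6]     # Y scores from Table 11-1

def deviation_scores(scores):
    """Raw score minus the mean, for every score."""
    mean = sum(scores) / len(scores)
    return [score - mean for score in scores]

dev_x = deviation_scores(hours_slept)   # X - M_X
dev_y = deviation_scores(happy_mood)    # Y - M_Y
products = [dx * dy for dx, dy in zip(dev_x, dev_y)]

for x, y, dx, dy, p in zip(hours_slept, happy_mood, dev_x, dev_y, products):
    print(f"X={x:>2}  Y={y}  X-Mx={dx:+.0f}  Y-My={dy:+.0f}  product={p:+.0f}")
```

In these scores, every student who is below the mean on sleep is also below the mean on mood, and every student above the mean on sleep is above the mean on mood, so none of the products is negative.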
Answers
1. In a linear correlation, the pattern of dots roughly follows a straight line (although with a small correlation, the dots will be spread widely around a straight line); in a curvilinear correlation, there is a clear systematic pattern to the dots, but it is not a straight line.
2. Two variables have no correlation when there is no pattern of relationship between them.
3. (a) In a scatter diagram for a positive linear correlation, the line that roughly describes the pattern of dots goes up and to the right; in a negative linear correlation, the line goes down and to the right. (b) In a positive linear correlation, the basic pattern is that high scores on one variable go with high scores on the other, mediums go with mediums, and lows go with lows; in a negative linear correlation, high scores on one variable go with low scores on the other, mediums go with mediums, and lows go with highs.
4. In Figure 11–10: (a) linear, negative, large; (b) curvilinear; (c) linear, positive, large; (d) no correlation.
5. Identifying whether the pattern of correlation in a scatter diagram is linear tells you whether it is appropriate to use the standard procedures for figuring a linear correlation. If it is linear, identifying the direction and approximate strength of correlation before doing the figuring lets you check the results of your figuring when you are done.
So, if highs on one variable go with highs on the other, and lows on one go with lows on the other, the products of deviation scores always will be positive. Considering a whole distribution of scores, suppose you take each person’s deviation score on one variable and multiply it by that person’s deviation score on the other variable. The result of doing this when highs go with highs and lows with lows is that the products all come out positive. If you sum up these products of deviation scores for all the people in the study, which are all positive, you will end up with a big positive number.
On the other hand, with a negative correlation, highs go with lows and lows with highs. In terms of deviation scores, this would mean positives with negatives and negatives with positives. Multiplied out, that gives all negative products of deviation scores. If you add all these negative products together, you get a big negative number.
Finally, suppose there is no linear correlation. In this situation, for some people highs on one variable would go with highs on the other variable (and some lows would go with lows), making positive products of deviations. For other people, highs on one variable would go with lows on the other variable (and some lows would go with highs), making negative products. Adding up these products for all the people in the study would result in the positive products and the negative products canceling each other out, giving a result around 0.
In each situation, we changed all the scores to deviation scores, multiplied the two deviation scores for each person by each other, and added up these products of deviations. The result was a large positive number if there was a positive linear correlation, a large negative number if there was a negative linear correlation, and a number around 0 if there was no linear correlation.
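
The three situations just described can be checked with a few lines of code. This is a minimal sketch, not a procedure from the chapter: the function name `sum_of_products` is made up, and the scores for four people are invented purely to show each pattern.

```python
def sum_of_products(xs, ys):
    """Sum over all people of (X - M_X) * (Y - M_Y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Invented scores for four people (illustration only).
x = [1, 2, 3, 4]
highs_with_highs = [1, 2, 3, 4]   # positive linear pattern
highs_with_lows = [4, 3, 2, 1]    # negative linear pattern
mixed = [2, 4, 1, 3]              # some highs with highs, some highs with lows

positive_sum = sum_of_products(x, highs_with_highs)
negative_sum = sum_of_products(x, highs_with_lows)
mixed_sum = sum_of_products(x, mixed)
print(positive_sum, negative_sum, mixed_sum)
```

With these particular scores the positive and negative products in the mixed case cancel exactly; with real data they would usually cancel only approximately, leaving a sum near 0 rather than exactly 0.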
Table 11–2 summarizes the logic up to this point. The table shows the effect on the correlation of different patterns of raw scores and resulting deviation scores. For example, the first row shows a high score on X going with a high score on Y. In this situation, the deviation score for variable X is a positive number (since X is a high number, above the mean of X), and similarly the deviation score for variable Y is a positive number (since Y is a high number, above the mean of Y). Thus, the product of these two positive deviation scores must be a positive number (since a positive number multiplied by a positive number always gives a positive number). The overall effect is that when a high score on X goes with a high score on Y, the pair of scores contributes toward making a positive correlation. The table shows that positive products of deviation scores contribute toward making a positive correlation, negative products of deviation scores contribute toward making a negative correlation, and products of deviation scores that are zero (or close to zero) contribute toward making a correlation of zero.

Table 11–2 The Effect on the Correlation of Different Patterns of Raw Scores and Deviation Scores

Pair of Scores     Deviation Scores                Product of Deviation Scores
X        Y         $X - M_X$       $Y - M_Y$       $(X - M_X)(Y - M_Y)$           Effect on Correlation
High     High      +               +               +                              Contributes to positive correlation
Low      Low       −               −               +                              Contributes to positive correlation
High     Low       +               −               −                              Contributes to negative correlation
Low      High      −               +               −                              Contributes to negative correlation
Middle   Any       Zero            +, −, or Zero   Zero                           Makes correlation near zero
Any      Middle    +, −, or Zero   Zero            Zero                           Makes correlation near zero
Note: + indicates a positive number; − indicates a negative number.

TIP FOR SUCCESS: Test your understanding of correlation by covering up portions of Table 11–2 and trying to recall the hidden information.
However, you are still left with the problem of figuring the precise strength of a positive or negative correlation. The larger the number is (that is, the farther from zero), the stronger the correlation will be. But how large is large, and how large is not very large? You can’t judge from the sum of the products of deviations alone, which gets bigger just by adding the products of more persons together. For example, a study with 100 people would have a larger sum of products of deviations than the same study with only 25 people. The sum of the products also gets larger if the scores are on a more spread-out scale. For example, a study in which the scores on the two variables have a lot of variation, so they range from, say, 0 to 50, will have much larger products of deviation scores (and thus a larger sum of the products) than a study in which the scores on the two variables have less variation and range from, say, 0 to 10. This is because you are multiplying larger deviation scores by each other.
The upshot of all this is that the sign (+ or −) of the sum of the products of deviation scores tells you the direction of the correlation. And the bigger it is (ignoring the sign), the more positive or negative it is. But it is hard to know from the sum of the products of deviation scores just how strong the correlation is because the number of people in the study and the amount of variation of the scores for each variable both affect the size of the sum of the products of deviation scores.
The solution to finding the precise degree of correlation is to divide this sum of the products of deviations by a number that corrects for both the number of people in the study and the variation of the scores for each variable. It turns out that this number is based on the sum of the squared deviations of each variable. This is because the more people there are in the study, the more squared deviations are being summed and because the more variation there is in the scores for each variable, the larger will be the squared deviations being summed. That is, to adjust our sum of products, we use a correction number that has two properties:
1. It gets larger with more people.
2. It gets larger as the scores for each variable have more variation.
These two properties of the correction number mean that it serves two very important purposes: it adjusts for the number of people in the study, and it adjusts for the different variation in scores for each variable.
The specific correction number that is used is what you get when you take the sum of squared deviations for each variable (the SS, or sum of squares, that you figure when figuring the variance), multiply the two sums of squares by each other, and take the square root: $\sqrt{(SS_X)(SS_Y)}$. However, we will turn to the formulas shortly.
So how do you actually use this number to make the correction? You divide the sum of products of deviations by this correction number. It turns out that the result of dividing the sum of the products of deviation scores by the correction number can never be more than +1, which would be a perfect positive linear correlation. It can never be less than −1, which would be a perfect negative linear correlation. In the situation of no linear correlation, the result is 0.
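A small Python sketch may help show why the correction is needed. The scores and the function names (`sum_of_products`, `sum_of_squares`, `corrected`) are hypothetical, chosen only for illustration. The same pattern of scores is figured three ways: as given, with four times as many people, and on a scale ten times more spread out.

```python
import math

def sum_of_products(xs, ys):
    """Sum of the products of deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

def sum_of_squares(scores):
    """Sum of squared deviations (SS) for one variable."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores)

def corrected(xs, ys):
    """Sum of products divided by the correction number sqrt(SS_X * SS_Y)."""
    return sum_of_products(xs, ys) / math.sqrt(sum_of_squares(xs) * sum_of_squares(ys))

# Hypothetical scores for five people (not data from the chapter)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# Same pattern of scores, but four times as many people
x_more, y_more = x * 4, y * 4
# Same pattern of scores, but on a scale ten times more spread out
x_wide, y_wide = [10 * v for v in x], [10 * v for v in y]

for label, xs, ys in [("original", x, y),
                      ("more people", x_more, y_more),
                      ("wider scale", x_wide, y_wide)]:
    print(label, sum_of_products(xs, ys), round(corrected(xs, ys), 2))
```

With these made-up scores the sum of products grows from 8 to 32 to 800, but the corrected value stays at .80 in all three cases.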
For a positive linear correlation that is not perfect (it is extremely rare to find a perfect correlation), the result of taking the sum of the products of deviation scores and dividing by the correction number is a number between 0 and +1. To put this another way, if the general trend of the dots is upward and to the right, but they do not fall exactly on a single straight line, the result of this process is between 0 and +1. The same rule holds for negative correlations: they fall between 0 and −1. So, overall, a correlation varies from −1 to +1.
Interpreting the Correlation Coefficient
The result of dividing the sum of the products of deviation scores by the correction number is called the correlation coefficient. It is also called the Pearson correlation coefficient (or the Pearson product-moment correlation coefficient, to be very traditional). It is named after Karl Pearson (whom you meet in Box 13–1). Pearson, along with Francis Galton (see Box 11–1 in this chapter), played a major role in developing the correlation coefficient. The correlation coefficient is abbreviated by the letter r, which is short for regression, an idea closely related to correlation (see Chapter 12).
TIP FOR SUCCESS If you figure a correlation coefficient to be larger than +1 or less than −1, you have made a mistake in your figuring.
correlation coefficient (r): measure of degree of linear correlation between two variables, ranging from −1 (a perfect negative linear correlation) through 0 (no correlation) to +1 (a perfect positive correlation).
BOX 11–1 Galton: Gentleman Genius
Francis Galton is credited with inventing the correlation statistic. (Karl Pearson, the hero of our Chapter 13, worked out the formulas, but Pearson was a student of Galton and gave Galton all the credit.) Statistics at this time (around the end of the 19th century) was a tight little British club. In fact, most of science was an only slightly larger club.
Galton also was influenced greatly by his own cousin, Charles Darwin.
Galton was a typical eccentric, independently wealthy gentleman scientist. Aside from his work in statistics, he possessed a medical degree, had explored “darkest Africa,” invented glasses for reading underwater, experimented with stereoscopic maps, dabbled in meteorology and anthropology, and wrote a paper about receiving intelligible signals from the stars.
Above all, Galton was a compulsive counter. Some of his counts are rather infamous. Once while attending a lecture he counted the fidgets of an audience per minute, looking for variations with the boringness of the subject matter. While twice having his picture painted, he counted the artist’s brush strokes per hour, concluding that each portrait required an average of 20,000 strokes. While walking the streets of various towns in the British Isles, he classified the beauty of the female inhabitants by fingering a recording device in his pocket to register good, medium, or bad.
Galton’s consuming interest, however, was the counting of geniuses, criminals, and other types in families. He wanted to understand how each type was produced so that science could improve the human race by encouraging governments to enforce eugenics (selective breeding for intelligence, proper moral behavior, and other qualities, to be determined, of course, by the eugenicists). (Eugenics has since been generally discredited.) The concept of correlation came directly from his first simple efforts in this area, the study of the relation of the height of children to their parents.
At first, Galton’s method of exactly measuring the tendency for “one thing to go with another” seemed almost the same as proving the cause of something. For example, if it could be shown mathematically that most of the brightest people came from a few highborn British families and most of the least intelligent people came from poor families, that seemed at first to “prove” that intelligence was caused by the inheritance of certain genes (provided that you were prejudiced enough to overlook the differences in educational opportunities). Now the study only proves that if you were a member of one of those highborn British families, history would make you a prime example of how easy it is to misinterpret the meaning of a correlation.
You can learn more about Galton on the following Web page: http://www-history.mcs.st-andrews.ac.uk/Biographies/Galton.html.
Sources: Peters (1987); Salsburg (2001); Tankard (1984).
Figure 11–11 Examples of scatter diagrams and correlation coefficients for different degrees of linear correlation. Panels: (a) r = .81; (b) r = −.75; (c) r = .46; (d) r = −.42; (e) r = .16; (f) r = −.18.
The sign (+ or −) of a correlation coefficient tells you the direction of the linear correlation between two variables (a positive correlation or a negative correlation). The actual value of the correlation coefficient (from a low of 0 to a high of 1, ignoring the sign of the correlation coefficient) tells you the strength of the linear correlation. So, a correlation coefficient of +.85 represents a larger linear correlation than a correlation of +.42. Similarly, a correlation of −.90 represents a larger linear correlation than +.85 (since .90 is bigger than .85). Another way of thinking of this is that, in a scatter diagram, the closer the dots are to falling on a single straight line, the larger the linear correlation. Figure 11–11 shows the scatter diagrams from Figure 11–9, with the correlation coefficient shown for each scatter diagram. Be sure that the correlation coefficient for each scatter diagram agrees roughly with the correlation coefficient you would expect based on the pattern of dots.
TIP FOR SUCCESS When changing the raw scores to deviation scores, it is easiest (and you will make fewer mistakes) if you do all the deviation scores for one variable and then all the deviation scores for the other variable. Also, to make sure you have done it correctly, when you finish all the deviation scores for a variable, add them up; they should add up to 0 (within rounding error).
Formula for the Correlation Coefficient
The correlation coefficient, as we have seen, is the sum of the products of deviation scores divided by a correction number that takes into account the number of people and the variation on each variable being correlated. Put as a formula,
$$r = \frac{\sum[(X - M_X)(Y - M_Y)]}{\sqrt{(SS_X)(SS_Y)}}$$ (11–1)
r is the correlation coefficient. $X - M_X$ is the deviation score for each person on the X variable and $Y - M_Y$ is the deviation score for each person on the Y variable; $(X - M_X)(Y - M_Y)$ is the product of deviation scores for each person; and $\sum[(X - M_X)(Y - M_Y)]$ is the sum of the products of deviation scores over all the people in the study. $SS_X$ is the sum of squared deviations for the X variable and $SS_Y$ is the sum of squared deviations for the Y variable.¹
Steps for Figuring the Correlation Coefficient
Here are the steps for figuring the correlation coefficient.
❶ Change the scores for each variable to deviation scores. Figure the mean of each variable. Then subtract each variable’s mean from each of its scores. (This is just what you have been doing all along as part of figuring the variance.)
❷ Figure the product of the deviation scores for each pair of scores. That is, for each pair of scores, multiply the deviation score on one variable by the deviation score on the other variable.
❸ Add up all the products of the deviation scores.
❹ For each variable, square each deviation score.
❺ Add up the squared deviation scores for each variable.
❻ Multiply the two sums of squared deviations and take the square root of the result. This creates a correction number.
❼ Divide the sum of the products of deviation scores from Step ❸ by the correction number from Step ❻.
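The seven steps can be sketched as a short Python function. This is a minimal sketch, not a replacement for working the steps by hand: the function name `correlation_coefficient` is an assumption made for illustration, and the scores are the fictional sleep and mood data from Table 11–3.

```python
import math

def correlation_coefficient(xs, ys):
    """Figure r by following the seven steps in the text."""
    # Step 1: change the scores for each variable to deviation scores
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Step 2: product of the deviation scores for each pair of scores
    products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
    # Step 3: add up all the products of the deviation scores
    sum_products = sum(products)
    # Step 4: for each variable, square each deviation score
    squared_x = [dx ** 2 for dx in dev_x]
    squared_y = [dy ** 2 for dy in dev_y]
    # Step 5: add up the squared deviation scores for each variable
    ss_x = sum(squared_x)
    ss_y = sum(squared_y)
    # Step 6: multiply the two sums of squares and take the square root
    correction = math.sqrt(ss_x * ss_y)
    # Step 7: divide the sum of products by the correction number
    return sum_products / correction

# Fictional sleep and mood scores from Table 11-3
hours_slept = [7, 5, 8, 6, 6, 10]
happy_mood = [4, 2, 7, 2, 3, 6]

r = correlation_coefficient(hours_slept, happy_mood)
print(round(r, 2))  # 0.85
```

Note that the function divides by zero if every score on either variable is the same, since that variable's sum of squares is then 0.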
$r = \frac{\sum[(X - M_X)(Y - M_Y)]}{\sqrt{(SS_X)(SS_Y)}}$ The correlation coefficient is the sum, over all the people in the study, of the product of each person’s two deviation scores, divided by the result of taking the square root of what you get when you multiply the sum of everyone’s squared deviation scores on the X variable by the sum of everyone’s squared deviation scores on the Y variable.
An Example
Let us try these steps with the sleep and mood example.
❶ Change the scores for each variable to deviation scores. Starting with the number of hours slept last night, the mean is 7 (sum of 42 divided by 6 students). The deviation score for the first student’s sleep score is 7 − 7 = 0. We figured the rest of the deviation scores for each variable and show them in the $X - M_X$ and $Y - M_Y$ columns in Table 11–3.
❷ Figure the product of the deviation scores for each pair of scores. For the first student, multiply 0 by 0 to give 0. The products of deviation scores for all the students are shown in the last column of Table 11–3.
❸ Add up all the products of the deviation scores. Adding up all the products of the deviation scores, as shown in Table 11–3, gives a sum of 16.
❹ For each variable, square each deviation score. For the first student, the squared deviation for the sleep variable is 0 multiplied by 0, which is 0. The squared deviation scores for all the students for the sleep variable are shown in the $(X - M_X)^2$ column of Table 11–3. The squared deviation scores for all the students for the happy mood variable are shown in the $(Y - M_Y)^2$ column.
❺ Add up the squared deviation scores for each variable. As shown in Table 11–3, the sum of squared deviations for the sleep variable is 16 and the sum of squared deviations for the happy mood variable is 22.
❻ Multiply the two sums of squared deviations and take the square root of the result. Multiplying 16 by 22 is 352, and the square root of 352 is 18.76.
❼ Divide the sum of the products of deviation scores from Step ❸ by the correction number from Step ❻. Dividing 16 by 18.76 gives a result of .85. This is the correlation coefficient. (Note that correlation coefficients are usually rounded to two decimal places.)
In terms of the correlation coefficient formula,
$$r = \frac{\sum[(X - M_X)(Y - M_Y)]}{\sqrt{(SS_X)(SS_Y)}} = \frac{16}{18.76} = .85$$
Because this correlation coefficient is positive and near 1, the highest possible value, this is a very large positive linear correlation.
Table 11–3 Figuring the Correlation Coefficient for the Sleep and Mood Study (Fictional Data)
Number of Hours Slept (X)                       Happy Mood (Y)                                  Products of Deviation Scores ❷
X        Deviation ❶    Deviation Squared ❹     Y        Deviation ❶    Deviation Squared ❹
         X − M_X        (X − M_X)²                       Y − M_Y        (Y − M_Y)²              (X − M_X)(Y − M_Y)
7         0              0                      4         0              0                       0
5        −2              4                      2        −2              4                       4
8         1              1                      7         3              9                       3
6        −1              1                      2        −2              4                       2
6        −1              1                      3        −1              1                       1
10        3              9                      6         2              4                       6
Σ = 42                  Σ = SS_X = 16 ❺         Σ = 24                  Σ = SS_Y = 22 ❺          Σ = 16 ❸
M = 7                                           M = 4
❻ $\sqrt{(SS_X)(SS_Y)} = \sqrt{(16)(22)} = \sqrt{352} = 18.76$
❼ $r = \frac{\sum[(X - M_X)(Y - M_Y)]}{\sqrt{(SS_X)(SS_Y)}} = \frac{16}{18.76} = .85$
A Second Example
Suppose that a memory researcher does an experiment to test a theory predicting that the number of exposures to a word increases the chance that the word will be remembered. One research participant is randomly assigned to be exposed to the list of 10 words once, one participant to be exposed to the list twice, and so forth, up to a total of eight exposures to each word. This makes eight participants in all, one for
each of the eight levels of exposure. The researchers record how many of the 10 words each participant is able to remember. Results are shown in Table 11–4. (An actual study of this kind would probably show a pattern in which the relative improvement in recall is less at higher numbers of exposures.) The steps for figuring the correlation coefficient are shown in Table 11–5.
❶ Change the scores for each variable to deviation scores. The mean of the number of exposures is 4.5. Thus, the first exposure score of 1 gives a deviation score of 1 − 4.5 = −3.5. Using the same procedure for all the other scores gives the deviation scores shown in the $X - M_X$ and $Y - M_Y$ columns in Table 11–5.
❷ Figure the product of the deviation scores for each pair of scores. For the first person, multiply −3.5 by −2 to give 7. The products of deviation scores for all the scores are shown in the last column of Table 11–5.
❸ Add up all the products of the deviation scores. Adding up all the products of the deviation scores, as shown in Table 11–5, gives a sum of 30.
❹ For each variable, square each deviation score. For the first person, the squared deviation for the number of exposures variable is −3.5 multiplied by −3.5, which is 12.25. The squared deviation scores for all the scores are shown in the $(X - M_X)^2$ and $(Y - M_Y)^2$ columns of Table 11–5.
❺ Add up the squared deviation scores for each variable. As shown in Table 11–5, the sum of squared deviations for the number of exposures variable is 42, and the sum of squared deviations for the number of words recalled variable is 32.
❻ Multiply the two sums of squared deviations and take the square root of the result. Multiplying 42 by 32 is 1344, and the square root of 1344 is 36.66.
❼ Divide the sum of the products of deviation scores from Step ❸ by the correction number from Step ❻. Dividing 30 by 36.66 gives a result of .82. This is the correlation coefficient.
Table 11–4 Effect of Number of Exposures to Words on the Number of Words Recalled (Fictional Data)
Number of Exposures    Number of Words Recalled
1                      3
2                      2
3                      6
4                      4
5                      5
6                      5
7                      6
8                      9
Table 11–5 Figuring the Correlation Coefficient for the Effect of Number of Exposures to Each Word on the Number of Words Recalled (Fictional Data)
Number of Exposures (X)                         Number of Words Recalled (Y)                    Products of Deviation Scores ❷
X        Deviation ❶    Deviation Squared ❹     Y        Deviation ❶    Deviation Squared ❹
         X − M_X        (X − M_X)²                       Y − M_Y        (Y − M_Y)²              (X − M_X)(Y − M_Y)
1        −3.5           12.25                   3        −2              4                       7.0
2        −2.5            6.25                   2        −3              9                       7.5
3        −1.5            2.25                   6         1              1                      −1.5
4         −.5             .25                   4        −1              1                        .5
5          .5             .25                   5         0              0                       0
6         1.5            2.25                   5         0              0                       0
7         2.5            6.25                   6         1              1                       2.5
8         3.5           12.25                   9         4             16                      14
Σ = 36                  Σ = SS_X = 42 ❺         Σ = 40                  Σ = SS_Y = 32 ❺          Σ = 30 ❸
M = 4.5                                         M = 5
❻ $\sqrt{(SS_X)(SS_Y)} = \sqrt{(42)(32)} = \sqrt{1344} = 36.66$
❼ $r = \frac{\sum[(X - M_X)(Y - M_Y)]}{\sqrt{(SS_X)(SS_Y)}} = \frac{30}{36.66} = .82$
In terms of the correlation coefficient formula,
$$r = \frac{\sum[(X - M_X)(Y - M_Y)]}{\sqrt{(SS_X)(SS_Y)}} = \frac{30}{36.66} = .82$$
Because this correlation coefficient is positive and near 1, the highest possible value, this is a very large positive linear correlation.
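The figuring for the second example can be checked with a short Python sketch that rebuilds each row of the figuring table for the fictional exposure and recall scores. The variable names are assumptions made for illustration. The sketch also adds up the deviation scores, which should come to 0 for each variable (within rounding error).

```python
import math

# Fictional scores from the second example (number of exposures, words recalled)
exposures = [1, 2, 3, 4, 5, 6, 7, 8]
recalled = [3, 2, 6, 4, 5, 5, 6, 9]

mean_x = sum(exposures) / len(exposures)  # 4.5
mean_y = sum(recalled) / len(recalled)    # 5.0

# One row per person: X, deviation, squared deviation, Y, deviation, squared deviation, product
rows = []
for x, y in zip(exposures, recalled):
    dev_x = x - mean_x
    dev_y = y - mean_y
    rows.append((x, dev_x, dev_x ** 2, y, dev_y, dev_y ** 2, dev_x * dev_y))

for row in rows:
    print(row)

# Check: the deviation scores for each variable should add up to 0
total_dev_x = sum(row[1] for row in rows)
total_dev_y = sum(row[4] for row in rows)

ss_x = sum(row[2] for row in rows)          # sum of squared deviations for X
ss_y = sum(row[5] for row in rows)          # sum of squared deviations for Y
sum_products = sum(row[6] for row in rows)  # sum of products of deviation scores

r = sum_products / math.sqrt(ss_x * ss_y)
print(ss_x, ss_y, sum_products, round(r, 2))
```

The totals agree with the figures in the text: sums of squares of 42 and 32, a sum of products of 30, and a correlation coefficient that rounds to .82.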
How are you doing?
1. Why do we change the scores for each variable into deviation scores in the first step of figuring the correlation coefficient?
2. Explain the logic of using the sum of the products of deviation scores as the numerator of the formula for the correlation coefficient.
3. When figuring the correlation coefficient, why do you divide the sum of the products of deviation scores by a correction number?
4. Write the formula for the correlation coefficient and define each of the symbols.
5. Figure the correlation coefficient for the following scores for three people who were each tested on two variables, X and Y.
Person    X    Y
K         5    10
L         4    10
M         3    13
Answers
1. We change the scores for each variable into deviation scores because deviation scores show directly what is a high score and what is a low score.
2. When both deviation scores are positive (which represent scores above the mean) or both deviation scores are negative (which represent scores below the mean), the products of the deviation scores in each case are positive. Across a whole distribution of high with high and low with low scores, the sum of the products of deviation scores gives a large positive number (indicating a positive correlation between the two variables). However, when one deviation score is positive (which represents a score above the mean) and the other deviation score is negative (which represents a score below the mean), the product of the deviation scores is negative. Across a whole distribution of high with low (and low with high) scores, the sum of the products of deviation scores gives a large negative number (indicating a negative correlation between the two variables). However, when there is no linear correlation, the sum of the products of deviation scores will be close to zero, because the positive and negative products of deviation scores will cancel each other out.
3. You divide the sum of the products of deviation scores by a correction number because, otherwise, the more people there are in the study, and the greater the variability of each variable’s scores, the bigger the sum of the products of deviation scores will be, even if the degree of correlation is the same. Dividing by the correction number (which is the result of taking the square root of the result of multiplying the sum of squares for the X variable by the sum of squares for the Y variable) corrects for this.
4. Formula for the correlation coefficient: $r = \frac{\sum[(X - M_X)(Y - M_Y)]}{\sqrt{(SS_X)(SS_Y)}}$. r is the correlation coefficient; $\sum$ is the symbol for sum of (add up all the scores that follow; in this formula, you add up all the products of deviation scores that follow); $X - M_X$ is the deviation score for each person on the X variable; $Y - M_Y$ is the deviation score for each person on the Y variable; $SS_X$ is the sum of squared deviations for the X variable; $SS_Y$ is the sum of squared deviations for the Y variable.
5. As shown in Table 11–6, r = −.87.
Table 11–6 Figuring the Correlation Coefficient for “How are you doing?” Question 5
X        Deviation ❶    Deviation Squared ❹     Y        Deviation ❶    Deviation Squared ❹     Products of Deviation Scores ❷
         X − M_X        (X − M_X)²                       Y − M_Y        (Y − M_Y)²              (X − M_X)(Y − M_Y)
5         1              1                      10       −1              1                      −1
4         0              0                      10       −1              1                       0
3        −1              1                      13        2              4                      −2
Σ = 12                  Σ = SS_X = 2 ❺          Σ = 33                  Σ = SS_Y = 6 ❺           Σ = −3 ❸
M = 4                                           M = 11
❻ $\sqrt{(SS_X)(SS_Y)} = \sqrt{(2)(6)} = \sqrt{12} = 3.46$
❼ $r = \frac{\sum[(X - M_X)(Y - M_Y)]}{\sqrt{(SS_X)(SS_Y)}} = \frac{-3}{3.46} = -.87$
TIP FOR SUCCESS You will not be able to make much sense of this section if you have not yet studied Chapters 3 through 7.
Significance of a Correlation Coefficient
The correlation coefficient is a descriptive statistic, like the mean or standard deviation. The correlation coefficient describes the linear relationship between two variables. However, in addition to describing this relationship, we may also want to test whether it is statistically significant. In the case of a correlation, the question is usually whether it is significantly different from zero. That is, the null hypothesis in hypothesis testing for a correlation is usually that in the population the true relation between the two variables is no correlation (r = 0).²
The overall logic is much like that we have considered for the various t test and analysis of variance situations discussed in previous chapters. Suppose for a particular population we had the distribution of two variables, X and Y. And suppose further that in this population there was no correlation between these two variables. The scatter diagram might look like that shown in Figure 11–12. Thus, if you were to consider the dot for one random person from this scatter diagram, the scores might be X = 4 and Y = 2. For another random person, it might be X = 2 and Y = 1. For a third person, X = 3 and Y = 5. The correlation for these three persons would be r = .24. If you then took out another three persons and figured the correlation it might come out to r = −.12. Presuming there was no actual correlation in the population, if you did this lots and lots of times, you would end up with a distribution of correlations with a mean of zero. This is a distribution of correlations of three persons each. As shown in Figure 11–13, it would have a mean of zero and be spread out in both directions up to a maximum of 1 and a minimum of −1.
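The idea of a distribution of correlations can be illustrated with a small simulation. This is only a sketch under stated assumptions: X and Y are drawn independently from a standard normal population (so the true correlation is zero), the seed and the number of samples are arbitrary, and the function name `correlation` is hypothetical.

```python
import math
import random

def correlation(xs, ys):
    """Correlation coefficient: sum of products over sqrt(SS_X * SS_Y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / math.sqrt(ss_x * ss_y)

random.seed(11)      # arbitrary seed so the run is repeatable
n_samples = 20000    # number of three-person samples (an arbitrary choice)
sample_size = 3

rs = []
for _ in range(n_samples):
    # X and Y are drawn independently, so the population correlation is zero
    xs = [random.gauss(0, 1) for _ in range(sample_size)]
    ys = [random.gauss(0, 1) for _ in range(sample_size)]
    rs.append(correlation(xs, ys))

mean_r = sum(rs) / len(rs)
print("mean of the sample correlations:", round(mean_r, 3))
print("smallest and largest:", round(min(rs), 3), round(max(rs), 3))
```

Individual three-person samples often show sizable correlations just by chance, but across many samples the correlations average out close to zero and never fall outside the range from −1 to +1.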
Figure 11–12 Scatter diagram for variables X and Y for a population in which there is no relationship between X and Y. (Horizontal axis: X; vertical axis: Y.)
Figure 11–13 Distribution of correlation coefficients for a large number of samples (N = 3) drawn from a population with no correlation between variables X and Y. (Horizontal axis: r, the correlation coefficient, running from −1 through 0 to +1.)
It would actually be possible to figure out the cutoffs for significance on such a distribution of correlation coefficients, just as we did, for example, for the F distribution. Then you could just compare your actual r to that cutoff to see if it was significant. However, we do not need to introduce a whole new distribution with its own tables and such. It turns out that we can figure out a number based on the correlation coefficient that will follow a t distribution. This number is figured using the following formula:
$$t = \frac{r}{\sqrt{(1 - r^2)/(N - 2)}}$$ (11–2)
The t score for a correlation coefficient is the result of dividing the correlation coefficient by the square root of what you get when you divide one minus the correlation coefficient squared by two less than the number of people in the study.
Notice that in this formula if r = 0, t = 0. This is because the numerator would be 0 and the result of dividing 0 by any number is 0. Also notice that the bigger the r, the bigger the t.
If you were to take three persons’ scores at random from the distribution with no true correlation, you could figure this t value. For example, for the first three-person example we just considered, the correlation was .24. So,
$$t = \frac{.24}{\sqrt{(1 - .24^2)/(3 - 2)}} = \frac{.24}{\sqrt{(.9424)/(1)}} = .25$$
If you took a large number of such samples of three persons each, computed the correlation and then the t for each, you would eventually have a distribution of t scores. And here is the main point: you could then compare the t score figured in this way for the actual correlation in the study, using the standard t table cutoffs.
As usual with the t statistic, there are different t distributions for different degrees of freedom. In the case of the t test for a correlation, df is the number of people in the sample minus 2. (We subtract 2 because the whole figuring involved two different means, the mean of X and the mean of Y.) In terms of a formula,
$$df = N - 2$$ (11–3)
Finally, note that the t value will be positive or negative, according to whether your correlation is positive or negative. Thus, as with any t test, the t test for a correlation can be either one-tailed or two-tailed. A one-tailed test means that the researcher has predicted the sign (+ or −) of the correlation. However, in practice, even when a researcher expects a certain direction of correlation, correlations are usually tested with two-tailed tests.
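Formulas 11–2 and 11–3 can be sketched as two small Python functions. The function names are assumptions made for illustration. The example figures are the three-person correlation of .24 discussed earlier in this section.

```python
import math

def t_for_correlation(r, n):
    """t score for a correlation coefficient: r / sqrt((1 - r^2) / (N - 2))."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def degrees_of_freedom(n):
    """Degrees of freedom for the t test for a correlation: N - 2."""
    return n - 2

# The three-person example from the text: r = .24, N = 3
t = t_for_correlation(0.24, 3)
print(round(t, 2), degrees_of_freedom(3))  # 0.25 1

# With r = 0 the numerator is 0, so t is 0
print(t_for_correlation(0, 3))

# For the same N, a bigger r gives a bigger t
print(t_for_correlation(0.5, 10) < t_for_correlation(0.8, 10))  # True
```

The sketch assumes N is greater than 2 and that r is not exactly +1 or −1. Otherwise the denominator is zero and the formula cannot be figured.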
An Example
In the sleep and mood study example, let’s suppose that the researchers predicted a correlation between number of hours slept and happy mood the next day, to be tested at the .05 level, two-tailed.
❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:
Population 1: People like those in this study.
Population 2: People for whom there is no correlation between number of hours slept the night before and mood the next day.
The null hypothesis is that the two populations have the same correlation. The research hypothesis is that the two populations do not have the same correlation.
❷ Determine the characteristics of the comparison distribution. The comparison distribution is a t distribution with df = 4. (That is, df = N − 2 = 6 − 2 = 4.)
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. The t table (Table A–2 in the Appendix) shows that for a two-tailed test at the .05 level, with 4 degrees of freedom, the cutoff t scores are 2.776 and −2.776.
❹ Determine your sample’s score on the comparison distribution. We figured a correlation of r = .85. Applying the formula to find the equivalent t, we get
$$t = \frac{r}{\sqrt{(1 - r^2)/(N - 2)}} = \frac{.85}{\sqrt{(1 - .85^2)/(6 - 2)}} = \frac{.85}{\sqrt{.0694}} = 3.23$$
❺ Decide whether to reject the null hypothesis. The t score of 3.23 for our sample correlation is more extreme than a cutoff t score of 2.776. Thus, we can reject the null hypothesis and the research hypothesis is supported.
$df = N - 2$ The degrees of freedom for the t test for a correlation are the number of people in the sample minus 2.
Assumptions for the Significance Test of a Correlation Coefficient
The assumptions for testing the significance of a correlation coefficient are similar to those for the t test for independent means and analysis of variance. In those situations you have to assume the population for each group follows a normal distribution and has the same variance as the population for the other groups. With the correlation you have to assume that:
1. The population of each variable (X and Y) follows a normal distribution. Actually you also assume that the relationship between the two variables also follows a normal curve. This creates what is called a bivariate normal distribution. In practice, however, we usually check whether we have met the requirement by checking whether the distribution in the sample for each of our variables is roughly normal.
2. There is an equal distribution of each variable at each point of the other variable. For example, in a scatter diagram, if there is much more variation at the low end than at the high end (or vice versa), this suggests a problem. In practice, you should look at the scatter diagram for your study to see if it looks like the dots are much more spread out at the low or high end (or both). A lot of dots in the middle are to be expected. So long as the greater number of dots in the middle are not a lot more spread out than those at either end, this does not suggest a problem with the assumptions.
Like the t tests you have already learned and like the analysis of variance, the t test for the significance of a correlation coefficient is pretty robust to all but extreme violations of its assumptions.
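The decision in the sleep and mood significance test above can be sketched in Python. This is a minimal sketch: the function name `significance_of_r` is hypothetical, and the cutoff is not computed by the code but copied from the t table value given in the text (2.776 for df = 4, .05 level, two-tailed).

```python
import math

def significance_of_r(r, n, cutoff):
    """Return (t, df, reject) for a two-tailed test of a correlation coefficient.

    cutoff is the positive t table cutoff for df = n - 2; it must be looked up,
    since this sketch does not compute t distribution cutoffs.
    """
    df = n - 2
    t = r / math.sqrt((1 - r ** 2) / df)
    reject = abs(t) > cutoff  # more extreme than the cutoff in either tail
    return t, df, reject

# Sleep and mood example: r = .85, N = 6, cutoff 2.776 from the t table (df = 4)
t, df, reject = significance_of_r(0.85, 6, 2.776)
print(round(t, 2), df, reject)  # 3.23 4 True
```

Comparing the absolute value of t with the positive cutoff covers both tails at once, which matches the two-tailed test used in the example.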
How are you doing?
1. What is the usual null hypothesis in hypothesis testing with a correlation coefficient?
2. Write the formula for testing the significance of a correlation coefficient, and define each of the symbols.
3. Use the five steps of hypothesis testing to determine whether a correlation coefficient of r = -.31 from a study with a sample of 60 people is significant at the .05 level, two-tailed.
4. What are the assumptions for the significance test of a correlation coefficient?
Answers
1. In hypothesis testing with a correlation coefficient, the usual null hypothesis is that in the population the true relation between the two variables is no correlation (r = 0).
2. Formula for testing the significance of a correlation coefficient: $t = \dfrac{r}{\sqrt{(1 - r^2)/(N - 2)}}$. t is the t statistic for testing the significance of the correlation coefficient; r is the correlation coefficient; N is the number of people in the study.
3. ❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:
Population 1: People like those in this study.
Population 2: People for whom there is no correlation between the two variables.
The null hypothesis is that the two populations have the same correlation. The research hypothesis is that the two populations do not have the same correlation.
❷ Determine the characteristics of the comparison distribution. The comparison distribution is a t distribution with df = 58. (That is, df = N - 2 = 60 - 2 = 58.)
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. The t table (Table A–2 in the Appendix) shows that for a two-tailed test at the .05 level, with 58 degrees of freedom, the cutoff t scores are 2.004 and -2.004 (we used the cutoffs for df = 55, the closest df in the table below 58).
❹ Determine your sample’s score on the comparison distribution. The correlation in the study was -.31. Applying the formula to find the equivalent t, we get
$t = \dfrac{r}{\sqrt{(1 - r^2)/(N - 2)}} = \dfrac{-.31}{\sqrt{(1 - (-.31)^2)/58}} = \dfrac{-.31}{.125} = -2.48.$
❺ Decide whether to reject the null hypothesis. The t score of -2.48 for our sample correlation is more extreme than a cutoff t score of -2.004. Thus, we can reject the null hypothesis and the research hypothesis is supported.
4. The population of each variable (and the relationship between them) follows a normal distribution, and there is an equal distribution of each variable at each point of the other variable.
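The conversion from r to t in the answer to question 3 can be checked with a few lines of code. The following is a minimal sketch, not part of the original text: the function name `t_for_correlation` is made up for illustration, and the cutoff of 2.004 is simply copied from the t table value quoted in the answer rather than computed.

```python
import math

def t_for_correlation(r, n):
    """Return the t score for a correlation r from n people, and its degrees of freedom."""
    df = n - 2
    t = r / math.sqrt((1 - r ** 2) / df)
    return t, df

r, n = -0.31, 60
t_score, df = t_for_correlation(r, n)

# Two-tailed .05 cutoff for df = 55, copied from the worked answer (an assumption
# of this sketch; a t table or statistics package would supply it in practice).
cutoff = 2.004
reject_null = abs(t_score) > cutoff

print(f"t({df}) = {t_score:.2f}; reject null hypothesis: {reject_null}")
```

Run as written, this gives a t of about -2.48 with 58 degrees of freedom, which is more extreme than the cutoff, matching the conclusion of the worked answer.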
Correlation and Causality
If two variables have a significant linear correlation, we normally assume that there is something causing them to go together. However, you can’t know the direction of causality (what is causing what) just from the fact that the two variables are correlated.
Three Possible Directions of Causality
Consider the example with which we started the chapter, the correlation between doing exciting activities with your partner and satisfaction with the relationship. There are three possible directions of causality for these two variables:
1. It could be that doing exciting activities together causes the partners to be more satisfied with their relationship.
2. It could also be that people who are more satisfied with their relationship choose to do more exciting activities together.
3. Another possibility is that something like having less pressure (versus more pressure) at work makes people happier in their marriage and also gives them more time and energy to do exciting activities with their partner.
These three possible directions of causality are shown in Figure 11–14a.
The principle is that for any correlation between variables X and Y, there are at least three possible directions of causality:
1. X could be causing Y.
2. Y could be causing X.
3. Some third factor could be causing both X and Y.
These three possible directions of causality are shown in Figure 11–14b.
direction of causality: path of causal effect; if X is thought to cause Y, then the direction of causality is from X to Y.
It is also possible (and often likely) that there is more than one direction of causality making two variables correlated.
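The third possibility (some other factor causing both X and Y) can be illustrated with made-up numbers. In the sketch below, which is an illustration under stated assumptions and not data from any study, a hypothetical third factor Z feeds into both X and Y, while X and Y never influence each other. The variable names, the seed, and the noise levels are all arbitrary choices.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

rng = random.Random(11)  # fixed seed so the sketch is repeatable

# Hypothetical third factor Z (for example, low work pressure).
z = [rng.gauss(0, 1) for _ in range(2000)]
# X and Y each depend on Z plus their own unrelated noise; neither causes the other.
x = [zi + rng.gauss(0, 1) for zi in z]  # stands in for exciting activities
y = [zi + rng.gauss(0, 1) for zi in z]  # stands in for marital satisfaction

r_xy = pearson_r(x, y)
print(f"r between X and Y = {r_xy:.2f}")
```

X and Y come out clearly correlated (around .5 with these settings) even though the only causal arrows in the simulation run from Z to X and from Z to Y. The correlation alone cannot tell you that.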
Ruling Out Some Possible Directions of Causality
Sometimes you can rule out one or more of these three possible directions based on additional knowledge of the situation. For example, the correlation between sleep the night before and a happy mood the next day cannot be due to happy mood the next day causing you to sleep more the night before (causality doesn’t go backward in time). But we still do not know whether the sleep the night before caused the happy mood or some third factor, such as a general tendency to be happy, caused people both to sleep well and to be happy on any particular day.
Another way we can rule out alternative directions of causality is by conducting a true experiment. In a true experiment, participants are randomly assigned to a particular level of a variable and then measured on another variable. An example of this is the study in which participants were randomly assigned (say, by flipping a coin) to different numbers of exposures to a list of words, and then the number of words they could remember was measured. There was an .82 correlation between number of exposures and number of words recalled. In this situation, any causality has to be from the variable that was manipulated (number of exposures) to the variable that is measured (words recalled). The number of words recalled can’t cause more exposures, because the exposures came first. And a third variable can’t be causing both number of exposures and words recalled because number of exposures was determined randomly; nothing can be causing it other than the random method we used (such as flipping a coin).
Correlational Statistical Procedures versus Correlation Research Methods
Discussions of correlation and causality in psychology research are often confused by there being two uses of the word correlation. Sometimes the word is used as the name of a statistical procedure, the correlation coefficient (as we have done in this chapter). At other times, the term correlation is used to describe a kind of research design. A correlational research design is any research design other than a true experiment. A correlational research design is not necessarily statistically analyzed using the correlation coefficient, and some studies using experimental research designs are most appropriately analyzed using a correlation coefficient. Hence the confusion. We recommend you take one or more research methods courses to learn more about research designs used in research in psychology.
Figure 11–14 Three possible directions of causality (shown with arrows) for a correlation for (a) the exciting activities and marital satisfaction example and (b) the general principle for any two variables X and Y.
How are you doing?
1. If anxiety and depression are correlated, what are three possible directions of causality that might explain this correlation?
2. If high school and college grades are correlated, what directions of causality can and cannot be ruled out by the situation?
3. A researcher randomly assigns participants to eat either zero or four cookies and then asks them how full they feel. The number of cookies eaten and feeling full are highly correlated. What directions of causality can and cannot be ruled out?
4. What is the difference between correlation as a statistical procedure and a correlational research design?
Answers
1. Being depressed can cause a person to be anxious; being anxious can cause a person to be depressed; some third variable (such as some aspect of heredity or childhood traumas) could be causing both anxiety and depression.
2. College grades cannot be causing high school grades (causality doesn’t go backward), but high school grades could be causing college grades (maybe knowing you did well in high school gives you more confidence), and some third variable (such as general academic ability) could be causing students to do well in both high school and college.
3. Eating more cookies can cause participants to feel full. Feeling full cannot have caused participants to have eaten more cookies, because how many cookies were eaten was determined randomly. Third variables can’t cause both, because how many cookies were eaten was determined randomly.
4. The statistical procedure of correlation is about using the formula for the correlation coefficient, regardless of how the study was done. A correlational research design is any research design other than a true experiment.
correlational research design: any research design other than a true experiment.
Issues in Interpreting the Correlation Coefficient
There are a number of subtle cautions in interpreting a correlation coefficient.
The Correlation Coefficient and the Proportionate Reduction in Error or Proportion of Variance Accounted For
A correlation coefficient tells you the direction and strength of a linear correlation. Bigger rs (values farther from 0) mean a higher degree of correlation. That is, an r of .60 is a larger correlation than an r of .30. However, most researchers would hold that an r of .60 is more than twice as large as an r of .30. To compare correlations with each other, most researchers square the correlations (that is, they use $r^2$ instead of r). This is called, for reasons you will learn in an Advanced Topic section of Chapter 12, the proportionate reduction in error (and also the proportion of variance accounted for).
For example, a correlation of .30 is an $r^2$ of .09 and a correlation of .60 is an $r^2$ of .36. Thus, a correlation of .60 is actually four times as large as one of .30 (that is, .36 is four times as big as .09).
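The comparison of squared correlations is simple arithmetic, and a short sketch makes the steps explicit. The function names here are made up for illustration.

```python
def proportion_of_variance(r):
    """Proportionate reduction in error: the correlation coefficient squared."""
    return r ** 2

def times_as_large(r_bigger, r_smaller):
    """Compare two correlations on their squared values, not on r itself."""
    return proportion_of_variance(r_bigger) / proportion_of_variance(r_smaller)

# The example from the text: .60 versus .30.
print(round(proportion_of_variance(0.30), 2))  # squared value for r = .30
print(round(proportion_of_variance(0.60), 2))  # squared value for r = .60
print(round(times_as_large(0.60, 0.30), 2))    # how many times larger
```

The ratio comes out to 4, not 2, which is why a correlation of .60 is treated as four times as large as one of .30.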
Restriction in Range
Suppose an educational psychologist studies the relation of grade level to knowledge of geography. If this researcher studied students from the entire range of school grade levels, the results might appear as shown in the scatter diagram in Figure 11–15a. That is, the researcher might find a large positive correlation. But suppose the researcher had studied students only from the first three grades. The scatter diagram (see Figure 11–15b) would show a much smaller correlation (the general increasing tendency is in relation to much more noise). However, the researcher would be making a mistake by concluding that grade level is only slightly related to knowledge of geography over all grades.
The problem in this situation is that the correlation is based on people who include only a limited range of the possible values on one of the variables. (In this example, there is a limited range of grade levels.) It is misleading to think of the correlation as if it applied to the entire range of values the variable might have. This situation is called restriction in range.
It is easy to make such mistakes in interpreting correlations. (You will occasionally see them even in published research articles.) Consider another example. Businesses sometimes try to decide whether their hiring tests are correlated with how successful the persons hired turn out on the job. Often, they find very little relationship. What they fail to take into account is that they hired only people who did well on the tests. Their study of job success included only the subgroup of high scorers. This example is shown in Figure 11–16.
proportionate reduction in error ($r^2$): measure of association between variables that is used when comparing associations. Also called proportion of variance accounted for.
restriction in range: situation in which you figure a correlation but only a limited range of the possible values on one of the variables is included in the group studied.
Figure 11–15 Example of restriction in range comparing two scatter diagrams (a) when the entire range is shown (of school grade level and knowledge of geography) and (b) when the range is restricted (to the first three grades) (fictional data). In both panels the horizontal axis is School Grade Level (0 to 12 in panel a, 0 to 3 in panel b) and the vertical axis is Knowledge of Geography.
Yet another example is any study that tries to correlate intelligence with other variables that uses only college students. The problem here is that college students do not include many lower or below-average intelligence students. Thus, a researcher could find a low correlation in such a study. But if the researcher did the same study with people who included the full range of intelligence levels, there could well be a high correlation.
Figure 11–16 Additional example of restriction in range comparing two scatter diagrams (a) when the entire range is shown (of all persons tested) and (b) when the range is restricted (to just those persons who were hired) (fictional data). In both panels the horizontal axis is Test Score and the vertical axis is Job Success.
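Restriction in range is easy to see with simulated scores. The sketch below uses entirely made-up data loosely modeled on the hiring example: the score range, the hiring cutoff of 75, the noise level, and the seed are all assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

rng = random.Random(16)  # fixed seed so the sketch is repeatable

# Fictional applicants: test scores from 0 to 100; job success rises with
# test score but also includes a good deal of unrelated variation.
scores = [rng.uniform(0, 100) for _ in range(1000)]
success = [s / 10 + rng.gauss(0, 2) for s in scores]

# Correlation among everyone tested.
r_tested = pearson_r(scores, success)

# Correlation among only those "hired" (hypothetical cutoff: 75 or above).
hired = [(s, j) for s, j in zip(scores, success) if s >= 75]
r_hired = pearson_r([s for s, _ in hired], [j for _, j in hired])

print(f"all persons tested: r = {r_tested:.2f}")
print(f"persons hired only: r = {r_hired:.2f}")
```

The relationship between test score and job success is the same for everyone in the simulation, yet the correlation among the hired subgroup is much smaller than the correlation among all persons tested, simply because the hired group covers a narrow slice of the test scores.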
BOX 11–2 Illusory Correlation: When You Know Perfectly Well That If It’s Big, It’s Fat—and You Are Perfectly Wrong
The concept of correlation was not really invented by statisticians. It is one of the most basic of human mental processes. The first humans must have thought in terms of correlation all the time, at least those who survived. “Every time it snows, the animals we hunt go away. Snow belongs with no animals. When the snow comes again, if we follow the animals, we may not starve.”
In fact, correlation is such a typically human and highly successful thought process that we seem to be psychologically organized to see more correlation than is there, like the Aztecs, who thought that good crops correlated with human sacrifices (let’s hope they were wrong), and like the following examples from social psychology of what is called illusory correlation (Hamilton, 1981; Hamilton & Gifford, 1976; Johnson & Mullen, 1994).
Illusory correlation is the term for the overestimation of the strength of the relationship between two variables (the term has also had other special meanings in the past). Right away, you may think of some harmful illusory correlations related to ethnicity, race, gender, and age. One source of illusory correlation is the tendency to link two infrequent and therefore highly memorable events. Suppose Group B is smaller than Group A, and in both groups one-third of the people are known to commit certain infrequent but undesirable acts. In this kind of situation, research shows that Group B, whose members are less frequently encountered, will in fact be blamed for far more of these undesirable acts than Group A. This is true even though the odds are greater that a particular act was committed by a member of Group A, since Group A has more members. The problem is that infrequent events stick together in memory. Membership in the less frequent group and the occurrence of less frequent behaviors form an illusory correlation. One obvious consequence is that we remember anything unusual done by a member of a minority group better than we remember anything unusual done by a member of a majority group.
Illusory correlation due to “paired distinctiveness” (two unusual events being linked in our minds) may occur because when we first encounter distinctive experiences, we think more about them, processing them more deeply so that they are more accessible in memory later (Johnson & Mullen, 1994). If we encounter, for example, members of a minority we don’t see often, or negative acts that we rarely see or hear about, we really think about them. If they are paired, we study them both and they are quicker to return to memory. It also seems that we can continue to process information about groups, people, and their behaviors without any awareness of doing so. Sometime along the way, or when we go to make a judgment, we overassociate the unusual groups or people with the unusual (negative) behaviors (McConnell et al., 1994). This effect is stronger when information about the groups or people is sparse, as if we try even harder in ambiguous situations to make sense of what we have seen (Berndsen et al., 2001).
Indeed, observing a single instance of a rare group showing some unusual behavior, a “one-shot illusory correlation,” is sufficient to create the effect (Risen et al., 2007).
Most illusory correlations, however, occur simply because of prejudices. Prejudices are implicit, erroneous theories that we carry around with us. For example, we estimate that we have seen more support for an association between two social traits than we have actually seen: driving skills and a particular age group; level of academic achievement and a specific ethnic group; certain speech, dress, or social behaviors and residence in some region of the country. One especially interesting example is that most people in business believe that job satisfaction and job performance are closely linked, when in fact the correlation is quite low. People who do not like their jobs can still put in a good day’s work; people who rave about their job can still be lazy about doing it.
By the way, some people form their implicit theories impulsively and hold them rigidly; others seem to base them according to what they remember about people and change their theories as they have new experiences (McConnell, 2001). Which are you?
The point is, the next time you ask yourself why you are struggling to learn statistics, it might help to think of it as a quest to make ordinary thought processes more moral and fair. So, again, we assert that statistics can be downright romantic: it can be about conquering dark, evil mistakes with the pure light of numbers, subduing the lie of prejudices with the honesty of data.
Unreliability of Measurement
Suppose the number of hours slept and mood the next day have a very high degree of correlation. However, suppose also that in a particular study the researcher had asked people about their sleep on a particular night three weeks ago and about their mood on the day after that particular night. There are many problems with this kind of study, but one is that the measurement of hours slept and mood would not be very accurate. For example, what a person recalls about how many hours were slept on a particular night three weeks ago is probably not very close to how many hours the person actually slept. Thus, the true correlation between sleep and mood could be high, but the correlation in the particular study might be quite low, just because there is lots of “random noise” (random inaccuracy) in the scores.
Here is another way to understand this issue: think of a correlation in terms of how close the dots in the scatter diagram fall to a straight line. One of the reasons why dots may not fall close to the line is inaccurate measurement.
Consider another example. Height and social power have been found in many studies to have a moderate degree of correlation. However, if someone were to do this study and measure each person’s height using an elastic measuring tape, the correlation would be much lower. Some other examples of not fully accurate measurement are personality questionnaires that include items that are difficult to understand (or are understood differently by different people), ratings of behavior (such as children’s play activity) that require some subjective judgment, or physiological measures that are influenced by things like ambient magnetic fields.
Often in psychology research our measures are not perfectly accurate or reliable (this idea is discussed in more detail in Chapter 15). The result is that a correlation between any two variables is lower than it would be if you had perfect measures of the two variables.
The reduction in a correlation due to unreliability of measures is called attenuation. More advanced statistics texts and psychological measurement texts describe formulas for correction for attenuation that can be used under some conditions. However, studies using such procedures are relatively rare in most areas of psychology research.
The main thing to remember from all of this is that, to the extent the measures used in a study are less than perfectly accurate, the correlations reported in that study usually underestimate the true correlation between the variables (the correlation that would be found if there was perfect measurement).
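Attenuation can be demonstrated by adding random inaccuracy to scores that are strongly correlated to begin with. The following is a sketch with invented numbers only: the "true" scores, the amount of recall error, and the seed are assumptions, and no correction-for-attenuation formula is applied.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

rng = random.Random(15)  # fixed seed so the sketch is repeatable
n = 2000

# Hypothetical "true" scores (in standardized units) with a strong relationship.
true_sleep = [rng.gauss(0, 1) for _ in range(n)]
true_mood = [0.8 * s + 0.6 * rng.gauss(0, 1) for s in true_sleep]

# What people report three weeks later: the true score plus random inaccuracy.
recalled_sleep = [s + rng.gauss(0, 1) for s in true_sleep]
recalled_mood = [m + rng.gauss(0, 1) for m in true_mood]

r_true = pearson_r(true_sleep, true_mood)
r_observed = pearson_r(recalled_sleep, recalled_mood)

print(f"correlation with perfect measurement:    r = {r_true:.2f}")
print(f"correlation with inaccurate measurement: r = {r_observed:.2f}")
```

Nothing about the underlying relationship changes between the two calculations. Only the measurement gets noisier, and the observed correlation drops well below the true one.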
Influence of Outliers
The direction and strength of a correlation can be drastically distorted by one or more individuals’ scores on the two variables if each pair of scores is a very unusual combination. For example, suppose in the sleep and mood example that an additional person was added to the study who had not slept at all (0 hours sleep) and yet was extremely happy the next day (8 on the happiness scale). (Maybe the person was going through some sort of manic phase!) We have shown this situation in the scatter diagram in Figure 11–17. It turns out that the correlation, which without this added person was a large positive correlation (r = .85), now becomes a small to moderate negative correlation (r = -.18)!
As we mentioned in Chapter 2, extreme scores are called outliers (they lie outside of the usual range of scores, a little like “outlaws”). Outliers are actually a problem in most kinds of statistical analyses and we will have more to say about them in Chapter 14. However, the main point for now is this: if the scatter diagram shows one or more unusual combinations, you need to be aware that these individuals have an especially large influence on the correlation.
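The effect of a single outlier combination can be checked directly. Table 11–1 is not reproduced in this part of the chapter, so the six score pairs in the sketch below are an assumption: they are chosen because they reproduce the correlation reported for the original data, and they should be checked against the table before being relied on.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Assumed scores for the six students in the sleep and mood example
# (not copied from this section; verify against Table 11-1).
hours_slept = [5, 7, 8, 6, 6, 10]
happy_mood = [2, 4, 7, 2, 3, 6]

r_original = pearson_r(hours_slept, happy_mood)

# Add the extra person described in the text: 0 hours slept, happy mood of 8.
r_with_outlier = pearson_r(hours_slept + [0], happy_mood + [8])

print(f"without the extra person: r = {r_original:.2f}")
print(f"with the extra person:    r = {r_with_outlier:.2f}")
```

With these scores the correlation goes from .85 to -.18 when the one unusual pair is added, the same change described in the text.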
Figure 11–17 A scatter diagram for the hours slept last night and happy mood example (see Table 11–1 and Figure 11–2d) with an outlier combination of scores (0 hours slept and happy mood of 8) for an extra person (correlation is now r = -.18 compared to r = .85 without the extra person). The horizontal axis is Hours Slept Last Night (0 to 12) and the vertical axis is Happy Mood (0 to 8).
outliers: scores with an extreme (very high or very low) value in relation to the other scores in the distribution.
TIP FOR SUCCESS: If you feel you need some extra practice figuring a correlation coefficient, add the scores for this extra person to the scores shown in Table 11–1 and verify that the correlation is now indeed r = -.18.
What If There Is Some Curvilinearity? The Spearman Rho
The correlation coefficient, as we have seen, describes the direction and strength of the linear relationship between two variables. It shows us how well the dots in a scatter diagram follow a straight line in which highs go with highs and lows go with lows (a positive correlation) or highs go with lows and lows with highs (a negative correlation). Sometimes, however, as you saw earlier in the chapter, the dots follow a precise pattern, but that pattern is curved. For example, consider Figure 11–6b. In this example, highs go with highs, middle scores go with lows, and low scores go with highs. It is a kind of U shape. There are methods of figuring the degree to which the dots follow such a curved line; these procedures are considered in advanced textbooks (e.g., Cohen et al., 2003).
Sometimes, however, as shown in Figure 11–5, highs go with highs and lows with lows, but the pattern is still not quite linear. In these particular kinds of situations we can in a sense straighten out the line and then use the ordinary correlation. One way this can be done is by changing all the scores to their rank order. So, separately for each variable, you would rank the scores from lowest to highest (starting with 1 for the lowest score and continuing until all the scores have been ranked). This makes the pattern more linear. In fact, we could now proceed to figure the correlation coefficient in the usual way, but using the rank-order scores instead of the original scores. A correlation figured in this way is called Spearman’s rho. (It was developed in the 1920s by Charles Spearman, an important British psychologist who invented many statistical procedures to help him solve the problems he was working on, mainly involving the nature and measurement of human intelligence.)
We discuss changing scores to ranks more generally in Chapter 14, and consider Spearman’s rho again in that context. We bring it up now, however, because in some areas of psychology it is common practice to use Spearman’s rho instead of the ordinary correlation coefficient, even if the dots do not show curvilinearity. Some researchers prefer Spearman’s rho because it works correctly even if the original scores are not based on true equal-interval measurement (as we discussed in Chapter 1). Finally, many researchers like to use Spearman’s rho because it is much less affected by outliers.
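The rank-then-correlate procedure described above can be sketched in a few lines. The function names and the example scores are made up for illustration, and giving tied scores the average of their ranks is an assumption of this sketch (a common convention, not one stated in this section).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def ranks(scores):
    """Rank scores from lowest (1) to highest; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of scores tied with the score at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rho: the ordinary correlation figured on rank-order scores."""
    return pearson_r(ranks(xs), ranks(ys))

# Made-up scores: Y always rises with X, but along a curve rather than a line.
x = [1, 2, 3, 4, 5, 6]
y = [xi ** 3 for xi in x]

print(f"ordinary r     = {pearson_r(x, y):.2f}")
print(f"Spearman's rho = {spearman_rho(x, y):.2f}")
```

Because every increase in X goes with an increase in Y, the ranks line up perfectly and rho is 1, while the ordinary correlation on the original scores falls short of 1 because the pattern is curved.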
Spearman’s rho: the equivalent of a correlation coefficient for rank-ordered scores.
How are you doing?
1. (a) What numbers do psychologists use when they compare the size of two correlation coefficients? (b) What are these numbers called? (c) How much larger is a correlation of .80 than a correlation of .20?
2. (a) What is restriction in range? (b) What is its effect on the correlation coefficient?
3. (a) What is unreliability of measurement? (b) What is its effect on the correlation coefficient?
4. (a) What is the outlier combination of scores in the set of scores below? (b) Why are outliers a potential problem with regard to correlation?
X     Y
10    41
8     35
12    46
9     37
2     70
5. Give three reasons why a researcher might choose to use Spearman’s rho instead of the regular correlation coefficient.
Effect Size and Power for the Correlation Coefficient
The correlation coefficient itself is a measure of effect size. (Thus, in the study of sleep and mood, effect size was r = .85.) Cohen’s (1988) conventions for the correlation coefficient are .10 for a small effect size, .30 for a medium (or moderate) effect size, and .50 for a large effect size.
Power for a correlation can be determined using a power table, a power software package, or an Internet power calculator. Table 11–7 gives the approximate power for the .05 significance level for small, medium, and large correlations, and one-tailed or two-tailed tests (see Chapter Note 3). For example, the power for a study with an expected medium effect size (r = .30), two-tailed, with 50 participants, is .57 (which is below the standard desired level of at least .80 power). This means that even if the research hypothesis is in fact true and has a medium effect size (that is, the two variables are correlated at r = .30 in the population), there is only a 57% chance that the study will produce a significant correlation.
Planning Sample Size
Table 11–8 gives the approximate number of participants needed for 80% power for estimated small, medium, and large correlations, using one-tailed and two-tailed tests, all using the .05 significance level (see Chapter Note 4).
Answers
1. (a) When psychologists compare the size of two correlation coefficients, they use the correlation coefficients squared. (b) The correlation coefficient squared is called the proportionate reduction in error (or proportion of variance accounted for). (c) A correlation of .80 is 16 times larger than a correlation of .20 (for r = .80, $r^2$ = .64; for r = .20, $r^2$ = .04; and .64 is 16 times larger than .04).
2. (a) Restriction in range is a situation in correlation in which the scores of the group of people studied on one of the variables do not include the full range of scores that are found among people more generally. (b) The effect is often to drastically reduce the correlation compared to what it would be if people more generally were included in the study (presuming there would be a correlation among people more generally).
3. (a) Unreliability of measurement is when the procedures used to measure a particular variable are not perfectly accurate. (b) The effect is to make the correlation smaller than it would be if perfectly accurate measures were used (presuming there would be a correlation if perfectly accurate measures were used).
4. (a) The outlier combination of scores is the final pair of scores (X = 2 and Y = 70). The other four pairs of scores all suggest a positive correlation between variables X and Y, but the final pair of scores is a very low score for variable X and a very high score for variable Y. (b) Outliers have a larger effect on the correlation than other combinations of scores.
5. First, Spearman’s rho can be used in certain situations when the scatter diagram suggests a curvilinear relationship between two variables. Second, Spearman’s rho can be used in certain situations to figure a correlation when the original scores are not based on true equal-interval measurement. Finally, Spearman’s rho is less affected by outliers than the regular correlation coefficient.
TIP FOR SUCCESS: Do not read this section if you have not studied Chapters 3 through 7.
Table 11–7 Approximate Power of Studies Using the Correlation Coefficient (r) for Testing Hypotheses at the .05 Level of Significance
                          Effect Size
               Small        Medium       Large
Total N        (r = .10)    (r = .30)    (r = .50)
Two-tailed
   10          .06          .13          .33
   20          .07          .25          .64
   30          .08          .37          .83
   40          .09          .48          .92
   50          .11          .57          .97
  100          .17          .86          *
One-tailed
   10          .08          .22          .46
   20          .11          .37          .75
   30          .13          .50          .90
   40          .15          .60          .96
   50          .17          .69          .98
  100          .26          .92          *
*Power is nearly 1.
How are you doing?
1. What are the conventions for effect size for correlation coefficients?
2. What is the power of a study using a correlation, with a two-tailed test at the .05 significance level, in which the researchers predict a large effect size and there are 50 participants?
3. How many participants do you need for 80% power in a planned study in which you predict a small effect size and will be using a correlation, two-tailed, at the .05 significance level?
Answers
1. The conventions for effect size and correlation coefficients: r = .10, small effect size; r = .30, medium effect size; r = .50, large effect size.
2. Power is .97.
3. The number of participants needed is 783.
Table 11–8 Approximate Number of Participants Needed for 80% Power for a Study Using the Correlation Coefficient (r) for Testing a Hypothesis at the .05 Significance Level
                          Effect Size
               Small        Medium       Large
               (r = .10)    (r = .30)    (r = .50)
Two-tailed     783          85           28
One-tailed     617          68           22
Controversy: What Is a Large Correlation?
An ongoing controversy about the correlation coefficient is, “What is a large r?” Traditionally in psychology, a large correlation is considered to be about .50 or above, a moderate correlation to be about .30, and a small correlation to be about .10 (Cohen, 1988). In fact, in many areas of psychology it is rare to find correlations that are greater than .40. Even when we are confident that X causes Y, X will not be the only cause of Y. For example, doing exciting activities together may cause people to be happier in their marriage. (In fact, we have done a number of true experiments supporting this direction of causality; Aron et al., 2000.) However, exciting activities is still only one of a great many factors that affect marital satisfaction. All those other factors are not part of our correlation. No one correlation could possibly tell the whole story. Small correlations are also due to the unavoidably low reliability of many measures in psychology.
It is traditional to caution that a low correlation is not very important even if it is statistically significant. (A small correlation can be statistically significant if the study includes a very large number of participants.)
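To see why a small correlation can reach significance with enough participants, here is a minimal sketch that converts the same r = .10 into a t score for two sample sizes. The sample sizes of 30 and 1,000 are made up for illustration, and `t_for_r` is a hypothetical helper name.

```python
import math


def t_for_r(r, n):
    """t score for a correlation r based on n people (df = n - 2)."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Same small correlation, two hypothetical sample sizes.
for n in (30, 1000):
    print(f"N = {n}: t = {t_for_r(0.10, n):.2f}")
```

With 30 people the t score is well under the two-tailed .05 cutoff (a little above 2 at that sample size), while with 1,000 people it is comfortably past the cutoff (about 1.96), even though the correlation itself has not changed.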
Further, even experienced research psychologists tend to treat any particular size of correlation as meaning more of an association between two variables than it actually does. Michael Oakes (1982) at the University of Sussex gave 30 research psychologists the two columns of numbers shown in Table 11–9. He then asked them to estimate r (without doing any calculations). What is your guess? The intuitions of the British researchers (who are as a group at least as well trained in statistics as psychologists anywhere in the world) ranged from -.20 to +.60, with a mean of .24. You can figure the true correlation for yourself. It comes out to .50! That is, what psychologists think a correlation of .50 means in the abstract is a much stronger degree of correlation than what they think when they see the actual numbers (which even at r = .50 only look like .24).
Oakes (1982) gave a different group of 30 researchers just the X column and asked them to fill in numbers in the Y column that would come out to a correlation of .50 (again, just using their intuition and without any figuring). When Oakes figured the actual correlations from their answers, these correlations averaged .68. In other words, once again, even experienced researchers think of a correlation coefficient as meaning more linkage between the two variables than it actually does.
In contrast, other psychologists hold that small correlations can be very important theoretically. They also can have major practical implications in that small effects may accumulate over time (Prentice & Miller, 1992).
To demonstrate the practical importance of small correlations, Rosnow and Rosenthal (1989) give an example of a now famous study (Steering Committee of the Physicians’ Health Study Research Group, 1988) in which doctors either did or did not take aspirin each day. Whether or not they took aspirin each day was then correlated with heart attacks. The results were that taking aspirin was correlated -.034 with heart attacks (see note 5). This means that taking aspirin explains only .1% (r² = -.034 × -.034 = .001, which is .1%) of the variation in whether people get heart attacks. So taking aspirin is only a small part of what affects people getting heart attacks; 99.9% of the variation in whether people get heart attacks is due to other factors (diet, exercise, genetic factors, etc.). However, Rosnow and Rosenthal point out that this correlation of “only -.034” meant that among the more than 20,000 doctors who were in the study, there were 72 more heart attacks in the group that did not take aspirin. (In fact, there were also 13 more heart attack deaths in the group that did not take aspirin.) Certainly, this difference in getting heart attacks is a difference we care about.
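The arithmetic behind the aspirin example is short enough to check directly. This is a minimal sketch; `proportion_of_variance` is a hypothetical helper name, and the only input is the correlation of -.034 reported for the study.

```python
def proportion_of_variance(r):
    """Proportion of variance accounted for: the correlation squared."""
    return r ** 2


r_aspirin = -0.034
r_squared = proportion_of_variance(r_aspirin)

print(f"r squared = {r_squared:.3f}")                  # rounds to .001
print(f"variance explained = {r_squared * 100:.1f}%")  # about .1%
print(f"left to other factors = {(1 - r_squared) * 100:.1f}%")
```

Squaring removes the sign, so the negative direction of the correlation plays no part in how much variation is accounted for.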
Table 11–9 Table Presented to 30 Psychologists to Estimate r
X Y
1 1
2 10
3 2
4 9
5 5
6 4
7 6
8 3
9 11
10 8
11 7
12 12
Source: Based on Oakes (1982).
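You can check the correlation for the numbers in Table 11–9 with a few lines of code. This is a minimal sketch using the deviation-score formula from this chapter; `pearson_r` is a hypothetical helper name, not a function from any statistics package.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient from deviation scores."""
    m_x = sum(xs) / len(xs)
    m_y = sum(ys) / len(ys)
    sum_products = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    ss_x = sum((x - m_x) ** 2 for x in xs)
    ss_y = sum((y - m_y) ** 2 for y in ys)
    return sum_products / math.sqrt(ss_x * ss_y)


# The two columns shown in Table 11-9.
x = list(range(1, 13))
y = [1, 10, 2, 9, 5, 4, 6, 3, 11, 8, 7, 12]

print(f"r = {pearson_r(x, y):.2f}")
```

The result is very close to .50, which is much higher than the mean estimate of .24 that the researchers gave by eye.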
Another argument for the importance of small correlations emphasizes research methods. Prentice and Miller (1992) explain:
Showing that an effect holds even under the most unlikely circumstances possible can be as impressive as (or in some cases, perhaps even more impressive) than showing that it accounts for a great deal of variance. (p. 163)
Some examples they give are studies showing correlations between attractiveness and judgments of guilt or innocence in court cases (e.g., Sigall & Ostrove, 1975). The point is that “legal judgments are supposed to be unaffected by such extraneous factors as attractiveness.” Thus, if studies show that attractiveness is associated with legal judgments even slightly, we are persuaded of just how important attractiveness could be in influencing social judgments in general.
Finally, you should be aware that there is even controversy about the widespread use of Cohen’s (1988) conventions for the correlation coefficient (that is, .10 for a small effect size, .30 for a medium effect size, and .50 for a large effect size). When proposing conventions for effect size estimates, such as the correlation coefficient (r), Cohen himself noted: “. . . these proposed conventions were set forth throughout with much diffidence, qualifications, and invitations not to employ them if possible. The values chosen had no more a reliable basis than my own intuition. They were offered as conventions because they were needed in a research climate characterized by a neglect of issues of [effect size] magnitude” (p. 532). Thus, some researchers strongly suggest that the magnitude of effects found in research studies should not be compared with Cohen’s conventions, but rather with the effects reported in previous similar research studies (Thompson, 2007).
Correlation in Research Articles
Scatter diagrams are occasionally included in research articles. For example, Gump and colleagues (2007) conducted a study of the level of lead in children’s blood and the socioeconomic status of their family. The participants were 122 children who were taking part in an ongoing study of the developmental effects of environmental toxicants. Between the ages of 2 and 3 years, a blood sample was taken from each child (with parental permission), and the amount of lead in each sample was determined with a laboratory test. The researchers measured the socioeconomic status of each child’s family based on the parents’ self-reported occupation and education level. As shown in Figure 11–18, Gump et al. (2007) used a scatter diagram to describe the relationship between childhood blood lead levels and family socioeconomic status. There was a clear linear negative trend, with the researchers noting “. . . increasing family SES [socioeconomic status] was significantly associated with declining blood levels” (p. 300). The scatter diagram shows that children from families with a higher socioeconomic status had lower levels of lead in their blood. Of course, this is a correlational result; so it does not necessarily mean that family socioeconomic status directly influences the amount of lead in children’s blood. It is possible that some other factor may explain this association, such as a person’s level of education.
Correlation coefficients are very commonly reported in research articles, both in the text of articles and in tables. The result with which we started the chapter would be described as follows: there was a positive correlation (r = .51) between excitement of activities done with partner and marital satisfaction. Usually, the statistical significance of the correlation will also be reported; in this example, it would be r = .51, p < .05.
Tables of correlations are common when several variables are involved. Usually, the table is set up so that each variable is listed down the left and also across the top. The correlation of each pair of variables is shown inside the table. This is called a correlation matrix.
Table 11–10 is a correlation matrix from a study of 114 expert Scrabble players (Halpern & Wai, 2007). (You may remember that we first mentioned this study in
[Figure 11–18: scatter diagram. Horizontal axis: Family Socioeconomic Status. Vertical axis: Childhood Blood Lead Levels (μg/dL).]
Figure 11–18 Children’s family socioeconomic status (Hollingshead Index) as a function of childhood lead levels. Source: Gump, B. B., Reihman, J., Stewart, P., Lonky, E., Darvill, T., & Matthews, K. A. (2007). Blood lead (Pb) levels: A potential environmental mechanism explaining the relation between socioeconomic status and cardiovascular reactivity in children. Health Psychology, 26, 296–304. Published by the American Psychological Association. Reprinted with permission.
Table 11–10 Correlations with Official Scrabble Ratings (Experts Only)
Variable 1 2 3 4 5 6 7 8 9
1. Official Scrabble rating — .116 * .021 .227* .224*
2. Gender — .318* .094 .265* .104 .220* .242*
3. Current age — .167 .727** .088 .769** .515**
4. Age started playing Scrabble — .355* .233* .094 ** .058
5. Age started competing — .096 .112 .386* .121
6. Days of year playing Scrabble — .050
7. Hours per day playing Scrabble — .377*
8. Years of practice — .492**
9. Total hours playing (Years × Hours) —
* p < .05. ** p < .01.
Source: Halpern, D. F., & Wai, J. (2007). The world of competitive Scrabble: Novice and expert differences in visiospatial and verbal abilities. Journal of Experimental Psychology: Applied, 13, 79–94. Published by the American Psychological Association. Reprinted with permission.
- .134 - .196- .093
- .501 - .094 - .181 - .128- .202- .173- .178
correlation matrix: common way of reporting the correlation coefficients among several variables in a research article; table in which the variables are named on the top and along the side and the correlations among them are all shown.
Chapter 4.) The researchers asked the expert Scrabble players a series of questions about their Scrabble playing, including the age at which they started playing and the age at which they started competing, the number of days a year and the number of hours per day they play Scrabble, and the number of years they had been practicing. The expert Scrabble players also provided their official Scrabble rating to the researchers. Table 11–10 shows the correlations among all the study measures.
This example shows several features that are typical of the way correlation matrixes are laid out. First, notice that the correlation of a variable with itself is not given. In this example, a short line is put in instead; sometimes they are just left blank. Also notice that only the upper triangle is filled in. This is because the lower left triangle would contain exactly the same information. For example, the correlation of official Scrabble rating with current age (which is .116) has to be the same as the correlation of current age with official Scrabble rating. Another shortcut saves space across the page: the names of the variables are listed only on the side of the table, with the numbers for them put across the top.
Looking at this example, among other results, you can see that there is a small to moderate negative correlation between official Scrabble rating and the age at which a person started competing in Scrabble. Also, there is a small to moderate correlation between official Scrabble rating and the years of practice. The asterisks (* and **) after some of the correlation coefficients tell you that those correlations are statistically significant. The note at the bottom of the table tells you the significance levels associated with the asterisks.
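The layout conventions just described (a dash on the diagonal, only the upper triangle filled in, variable names down the side and numbers across the top) can be sketched in code. The three variables and their scores below are made up purely for illustration, and `pearson_r` and `print_upper_triangle` are hypothetical helper names.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient from deviation scores."""
    m_x = sum(xs) / len(xs)
    m_y = sum(ys) / len(ys)
    sum_products = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    ss_x = sum((x - m_x) ** 2 for x in xs)
    ss_y = sum((y - m_y) ** 2 for y in ys)
    return sum_products / math.sqrt(ss_x * ss_y)


def print_upper_triangle(data):
    """Print a correlation matrix with only the upper triangle filled in."""
    names = list(data)
    # Numbers across the top stand in for the variable names.
    print(" " * 22 + "".join(f"{k:>7}" for k in range(1, len(names) + 1)))
    for i, row_name in enumerate(names):
        cells = []
        for j, col_name in enumerate(names):
            if j < i:
                cells.append(" " * 7)          # lower triangle left blank
            elif j == i:
                cells.append(f"{'--':>7}")     # variable with itself
            else:
                cells.append(f"{pearson_r(data[row_name], data[col_name]):>7.2f}")
        print(f"{i + 1}. {row_name:<19}" + "".join(cells))


# Hypothetical scores for five people on three measures.
scores = {
    "Hours of practice": [1, 2, 3, 4, 5],
    "Rating": [2, 4, 5, 4, 5],
    "Errors": [9, 7, 6, 6, 3],
}
print_upper_triangle(scores)
```

Only the pairs above the diagonal are figured, because the correlation of A with B is the same number as the correlation of B with A.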
Summary
1. When two variables are associated in a clear pattern (for example, when high scores on one consistently go with high scores on the other, and lows on one go with lows on the other) the two variables are correlated.
2. A scatter diagram shows the relation between two variables. The lowest to highest possible values of one variable (the one you are predicting from if one variable can be thought of as predicting the other variable) are marked on the horizontal axis. The lowest to highest possible values of the other variable are marked on the vertical axis. Each individual’s pair of scores is shown as a dot.
3. When the dots in the scatter diagram generally follow a straight line, this is called a linear correlation. In a curvilinear correlation, the dots follow a line pattern other than a simple straight line. There is no correlation when the dots do not follow any kind of line. In a positive linear correlation, the line goes upward to the right (so that low scores go with lows, mediums with mediums, and highs with highs). In a negative linear correlation, the line goes downward to the right (so that low scores go with highs, mediums with mediums, and highs with lows). The strength of the correlation refers to the degree to which there is a clear pattern of relationship between the two variables.
4. The correlation coefficient (r) gives the precise linear correlation between two equal-interval numeric variables. The correlation coefficient is the sum of the products of the deviation scores (X - M_X and Y - M_Y) divided by a correction number that takes into account the number of people in the study and the variation of each variable’s scores. The correction number is figured as the square root of the result of multiplying the sum of squared deviations for one variable (SS_X) by the sum of squared deviations for the other variable (SS_Y). The correlation coefficient is highly positive when there is a large positive linear correlation. This is because positive deviation scores are multiplied by positive, and negative by negative (giving all positive products of deviation scores). The correlation coefficient is highly negative when there is a large negative linear correlation. This is because negative deviation scores are multiplied by positive deviation scores and positive by negative (giving all negative products of deviation scores). The correlation coefficient is 0 when there is no linear correlation. This is because positives are sometimes multiplied by positives and sometimes by negatives (and vice versa), so that positive and negative products of deviation scores cancel each other out.
5. The sign (+ or -) of a correlation coefficient tells you the direction of the linear correlation between two variables. The actual value of the correlation coefficient (ignoring the sign) tells you the strength of the linear correlation. The maximum positive value of r is +1. r = +1 when there is a perfect positive linear correlation. The maximum negative value of r is -1. r = -1 when there is a perfect negative linear correlation.
6. The statistical significance of a correlation coefficient can be tested by changing the correlation coefficient into a t score and using cutoffs on a t distribution with degrees of freedom equal to the number of people in the study minus two. The t score for a correlation coefficient is the result of dividing the correlation coefficient by the square root of what you get when you divide one minus the correlation coefficient squared by two less than the number of people in the study. The null hypothesis for hypothesis testing with a correlation coefficient is that the true relation between the two variables in the population is no correlation (r = 0).
7. The assumptions for the significance test of a correlation coefficient are that the population of each variable (and the relationship between them) follows a normal distribution, and that there is an equal distribution of each variable at each point of the other variable.
8. Correlation does not tell you the direction of causation. If two variables, X and Y, are correlated, the correlation could be because X is causing Y, Y is causing X, or a third factor is causing both X and Y.
9. Comparisons of the degree of linear correlation are considered most accurate in terms of the correlation coefficient squared (r²), called the proportionate reduction in error or proportion of variance accounted for.
10. A correlation coefficient will be lower (closer to 0) than the true correlation if it is based on scores from a group selected for study that is restricted in its range of scores (compared to people in general) or if the scores are based on unreliable measures.
11. The direction and strength of a correlation can be drastically distorted by extreme combinations of scores called outliers.
12. Spearman’s rho is a special type of correlation based on rank-order scores. It can be used in certain situations when the scatter diagram suggests a curvilinear relationship between two variables. Spearman’s rho is less affected than the regular correlation by outliers, and it works correctly even if the original scores are not based on true equal-interval measurement.
13. The correlation itself is a measure of effect size. Power and needed sample size for 80% power for a correlation coefficient can be determined using special power tables, a power software package, or an Internet power calculator.
14. Studies suggest that psychologists tend to think of any particular correlation coefficient as meaning more association than actually exists. However, small correlations may have practical importance and may also be impressive in demonstrating the importance of a relationship when a study shows that the correlation holds even under what would seem to be unlikely conditions.
15. Correlational results are usually presented in research articles either in the text with the value of r (and usually the significance level) or in a special table (a correlation matrix) showing the correlations among several variables.
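For reference, the two verbal descriptions in summary points 4 and 6 can be written compactly as formulas. The notation here (M_X and M_Y for the means, SS_X and SS_Y for the sums of squared deviations, N for the number of people) follows the worked-out problems in this chapter.

```latex
% Correlation coefficient: sum of the products of deviation scores
% divided by the correction number.
r = \frac{\sum \left[ (X - M_X)(Y - M_Y) \right]}{\sqrt{(SS_X)(SS_Y)}}

% t score for testing the significance of r, with df = N - 2.
t = \frac{r}{\sqrt{(1 - r^2)/(N - 2)}}, \qquad df = N - 2
```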
Key Terms
correlation
scatter diagram
linear correlation
curvilinear correlation
no correlation
positive correlation
negative correlation
product of deviation scores
correlation coefficient
direction of causality
correlational research design
proportionate reduction in error
restriction in range
outliers
Spearman’s rho
correlation matrix
Example Worked-Out Problems
Making a Scatter Diagram and Describing the General Pattern of Association
Based on the class size and average achievement test scores for five elementary schools in the following table, make a scatter diagram and describe in words the general pattern of association.

Elementary School    Class Size    Achievement Test Score
Main Street          25            80
Casat                14            98
Harland              33            50
Shady Grove          28            82
Jefferson            20            90
Answer The steps in solving the problem follow; Figure 11–19 shows the scatter diagram with markers for each step.
❶ Draw the axes and decide which variable goes on which axis. It seems more reasonable to think of class size as predicting achievement test scores rather than the other way around. Thus, you can draw the axis with class size along the bottom. (However, the prediction was not explicitly stated in the problem; so the other direction of prediction is certainly possible. Thus, putting either variable on either axis would be acceptable.)
❷ Determine the range of values to use for each variable and mark them on the axes. We will assume that the achievement test scores go from 0 to 100. We don’t know the maximum class size; so we guessed 50. (The range of the variables was not given in the problem; thus any reasonable range would be acceptable as long as it includes the values of the scores in the actual study.)
❸ Mark a dot for each pair of scores. For example, to mark the dot for Main Street School, you go across to 25 and up to 80.
The general pattern is roughly linear. Its direction is negative (it goes down and to the right, with larger class sizes going with smaller achievement scores and vice versa). It is a quite large correlation, since the dots all fall fairly close to a straight line; it should be fairly close to –1. In words, it is a large, linear, negative correlation.
Figuring the Correlation Coefficient
Figure the correlation coefficient for the class size and achievement test in the preceding example.

Answer
You can figure the correlation using either the formula or the steps. The basic figuring is shown in Table 11–11 with markers for each of the steps.

Using the formula,

$$r = \frac{\sum[(X - M_X)(Y - M_Y)]}{\sqrt{(SS_X)(SS_Y)}} = \frac{-482}{533.10} = -.90$$

Using the steps,
❶ Change the scores for each variable to deviation scores. The mean of the class size is 24. Thus, the first class size score of 25 gives a deviation score of 25 - 24 = 1. Using the same procedure for all the other scores gives the deviation scores shown in the X - M_X and Y - M_Y columns in Table 11–11.
❷ Figure the product of the deviation scores for each pair of scores. For the first school, multiply 1 by 0 to give 0. The products of deviation scores for all the scores are shown in the last column of Table 11–11.
[Figure 11–19: scatter diagram. Horizontal axis: Class Size. Vertical axis: Achievement Test Score.]
Figure 11–19 Scatter diagram for scores in Example Worked-Out Problem. ❶ Draw the axes and decide which variable goes on which axis. ❷ Determine the range of values to use for each variable and mark them on the axes. ❸ Mark a dot for each pair of scores.
❸ Add up all the products of the deviation scores. Adding up all the products of the deviation scores, as shown in Table 11–11, gives a sum of -482.
❹ For each variable, square each deviation score. For the first school, the squared deviation for the class size variable is 1 multiplied by 1, which is 1. The squared deviation scores for all the scores are shown in the (X - M_X)² and (Y - M_Y)² columns of Table 11–11.
➎ Add up the squared deviation scores for each variable. As shown in Table 11–11, the sum of squared deviations for the class size variable is 214 and the sum of squared deviations for the achievement test score variable is 1,328.
➏ Multiply the two sums of squared deviations and take the square root of the result. Multiplying 214 by 1,328 is 284,192 and the square root of 284,192 is 533.10.
❼ Divide the sum of the products of deviation scores from Step ❸ by the correction number from Step ➏. Dividing -482 by 533.10 gives a result of -.90. This is the correlation coefficient.
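The seven steps can also be followed in code. This is a minimal sketch using the class size and achievement test scores from the example; the function name `correlation_steps` and the variable names are our own, chosen only to mirror the step descriptions.

```python
import math

# Scores from the example (Main Street, Casat, Harland, Shady Grove, Jefferson).
class_size = [25, 14, 33, 28, 20]
test_score = [80, 98, 50, 82, 90]


def correlation_steps(xs, ys):
    """Follow the seven steps and return the intermediate results."""
    m_x = sum(xs) / len(xs)
    m_y = sum(ys) / len(ys)
    # Step 1: change the scores to deviation scores.
    dev_x = [x - m_x for x in xs]
    dev_y = [y - m_y for y in ys]
    # Step 2: figure the product of the deviation scores for each pair.
    products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
    # Step 3: add up the products of the deviation scores.
    sum_products = sum(products)
    # Steps 4 and 5: square each deviation score and add them up.
    ss_x = sum(dx ** 2 for dx in dev_x)
    ss_y = sum(dy ** 2 for dy in dev_y)
    # Step 6: multiply the two sums and take the square root.
    correction = math.sqrt(ss_x * ss_y)
    # Step 7: divide the sum of products by the correction number.
    r = sum_products / correction
    return sum_products, ss_x, ss_y, correction, r


sum_products, ss_x, ss_y, correction, r = correlation_steps(class_size, test_score)
print("Sum of products:", sum_products)
print("SS_X:", ss_x, "SS_Y:", ss_y)
print("Correction number:", round(correction, 2))
print("r:", round(r, 2))
```

The printed values should agree with the hand figuring above: a sum of products of -482, sums of squared deviations of 214 and 1,328, a correction number of about 533.10, and a correlation of about -.90.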
Figuring the Significance of a Correlation Coefficient
Figure whether the correlation between class size and achievement test score in the preceding example is statistically significant (use the .05 level, two-tailed).

Answer
❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: Schools like those in this study.
Population 2: Schools for whom there is no correlation between the two variables.
Table 11–11 Figuring the Correlation Coefficient Between Class Size and Achievement Test Score for the Example Worked-Out Problem

Class Size (X)                        Achievement Test Score (Y)              Products of Deviation Scores ❷
X      Deviation ❶   Deviation        Y      Deviation ❶   Deviation
                     Squared ❹                             Squared ❹
       X - M_X       (X - M_X)²              Y - M_Y       (Y - M_Y)²         (X - M_X)(Y - M_Y)
25       1             1              80       0             0                   0
14     -10           100              98      18           324                -180
33       9            81              50     -30           900                -270
28       4            16              82       2             4                   8
20      -4            16              90      10           100                 -40
Σ = 120        Σ = SS_X = 214 ❺       Σ = 400        Σ = SS_Y = 1328 ❺         Σ = -482 ❸
M = 24                                M = 80

$$r = \frac{\sum[(X - M_X)(Y - M_Y)]}{\sqrt{(SS_X)(SS_Y)}} = \frac{-482}{\sqrt{(214)(1328)}} = \frac{-482}{533.10} = -.90$$

(❻ marks the correction number, 533.10; ❼ marks the final division, which gives -.90.)
The null hypothesis is that the two populations have the same correlation. The research hypothesis is that the two populations do not have the same correlation.
❷ Determine the characteristics of the comparison distribution. The comparison distribution is a t distribution with df = 3. (That is, df = N - 2 = 5 - 2 = 3.)
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. The t table (Table A–2 in the Appendix) shows that for a two-tailed test at the .05 level, with 3 degrees of freedom, the cutoff t scores are 3.182 and -3.182.
❹ Determine your sample’s score on the comparison distribution. The correlation in the study was -.90. Applying the formula to find the equivalent t, we get

$$t = \frac{r}{\sqrt{(1 - r^2)/(N - 2)}} = \frac{-.90}{\sqrt{(1 - (-.90)^2)/3}} = \frac{-.90}{\sqrt{.0633}} = -3.58.$$

❺ Decide whether to reject the null hypothesis. The t score of -3.58 for our sample correlation is more extreme than a cutoff t score of -3.182. Thus, we can reject the null hypothesis and the research hypothesis is supported.

Outline for Writing Essays on the Logic and Figuring of a Correlation Coefficient
1. If the question involves creating a scatter diagram, explain how and why you created the diagram to show the pattern of relationship between the two variables. Explain the meaning of the term correlation. Mention the type of correlation (e.g., linear; positive or negative; small, moderate, or large) shown by the scatter diagram.
2. Explain the idea that a correlation coefficient tells you the direction and strength of linear correlation between two variables.
3. Outline and explain the steps for figuring the correlation coefficient. Be sure to mention that the first step involves changing the scores for each variable to deviation scores. Describe how to figure the product of the deviation scores. Explain why the product of deviation scores will tend to be positive if the correlation is positive and will tend to be negative if the correlation is negative. Explain the two reasons why it is necessary to use a correction number to adjust the sum of the products of deviation scores. Describe how that correction number is figured and how it acts to adjust the sum of the products of deviation scores. Explain what the value of the correlation coefficient means in terms of the direction and strength of linear correlation.
4. Be sure to discuss the direction and strength of correlation of your particular result. As needed for the specific question you are answering, discuss whether the correlation is statistically significant.
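The significance test in the worked-out problem above (class size and achievement test score) can be checked with a short sketch. The cutoff of 3.182 is taken from the t table as stated in that problem, not computed here, and `correlation_t_test` is a hypothetical helper name.

```python
import math


def correlation_t_test(r, n, cutoff):
    """Convert r to a t score and compare it with a two-tailed cutoff.

    The cutoff must be looked up in a t table for df = n - 2.
    """
    df = n - 2
    t = r / math.sqrt((1 - r ** 2) / df)
    reject_null = abs(t) > cutoff  # two-tailed: either extreme counts
    return df, t, reject_null


# Values from the worked-out problem: r = -.90 based on 5 schools.
df, t, reject_null = correlation_t_test(r=-0.90, n=5, cutoff=3.182)
print("df:", df)
print("t:", round(t, 2))
print("Reject the null hypothesis:", reject_null)
```

Comparing the absolute value of t with the cutoff is what makes the test two-tailed; a one-tailed test would use a different cutoff and would check the sign of t as well.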
Practice Problems
These problems involve figuring. Most real-life statistics problems are done on a computer with special statistical software. Even if you have such software, do these problems by hand to ingrain the method in your mind. To learn how to use a computer to solve statistics problems like those in this chapter, refer to the Using SPSS section at the end of this chapter and the Study Guide and Computer Workbook that accompanies this text.
All data are fictional unless an actual citation is given.
Set I (for Answers to Set I Problems, see pp. 690–692)
1. For each of the following scatter diagrams, indicate whether the pattern is linear, curvilinear, or no correlation; if it is linear, indicate whether it is positive or negative and the approximate strength (large, moderate, small) of the correlation.
[Scatter diagrams (a) through (f)]
2. A researcher studied the relation between psychotherapists’ degree of empathy and their patients’ satisfaction with therapy. As a pilot study, four patient–therapist pairs were studied. Here are the results:

Pair Number    Therapist Empathy    Patient Satisfaction
1              70                   4
2              94                   5
3              36                   2
4              48                   1

(a) Make a scatter diagram of the scores; (b) describe in words the general pattern of correlation, if any; (c) figure the correlation coefficient; (d) figure whether the correlation is statistically significant (use the .05 significance level, two-tailed); (e) explain the logic of what you have done, writing as if you are speaking to someone who has never heard of correlation (but who does understand the mean, deviation scores, and hypothesis testing); and (f) give three logically possible directions of causality, saying for each whether it is a reasonable direction in light of the variables involved (and why).

3. An instructor asked five students how many hours they had studied for an exam. Here are the hours studied and the students’ grades:

Hours Studied    Test Grade
0                52
10               95
6                83
8                71
6                64

(a) Make a scatter diagram of the scores; (b) describe in words the general pattern of correlation, if any; (c) figure the correlation coefficient; (d) figure whether the correlation is statistically significant (use the .05 significance level, two-tailed); (e) explain the logic of what you have done, writing as if you are speaking to someone who has never heard of correlation (but who does understand the mean, deviation scores, and hypothesis testing); and (f) give three logically possible directions of causality, saying for each whether it is a reasonable direction in light of the variables involved (and why).

4. In a study of people first getting acquainted with each other, researchers examined the amount of self-disclosure of one’s partner and one’s liking for one’s partner. Here are the results:

Partner’s Self-Disclosure    Liking for Partner
8                            7
7                            9
10                           6
3                            7
1                            4

(a) Make a scatter diagram of the scores; (b) describe in words the general pattern of correlation, if any; (c) figure the correlation coefficient; and (d) figure whether the correlation is statistically significant (use the .05 significance level, two-tailed).
5. The following have been prepared so that data sets B through D are slightly modified versions of data set A. For each data set, (a) make a scatter diagram, (b) figure the correlation coefficient, and (c) figure whether the correlation is statistically significant (use the .05 significance level, two-tailed).

Data Set A    Data Set B    Data Set C    Data Set D
X    Y        X    Y        X    Y        X    Y
1    1        1    1        1    5        1    1
2    2        2    2        2    2        2    4
3    3        3    3        3    3        3    3
4    4        4    5        4    4        4    2
5    5        5    4        5    1        5    5
6. For each of the following situations, indicate why the correlation coefficient might be a distorted estimate of the true correlation (and what kind of distortion you would expect):
(a) Scores on two questionnaire measures of personality are correlated.
(b) Comfort of living situation and happiness are correlated among a group of millionaires.

7. What is the power of each of the following studies using a correlation coefficient and the .05 significance level?

       Effect Size (r)    N      Tails
(a)    .10                50     2
(b)    .30                100    1
(c)    .50                30     2
(d)    .30                40     1
(e)    .10                100    2

8. About how many participants are needed for 80% power in each of the following planned studies that will use a correlation coefficient and the .05 significance level?

       Effect Size (r)    Tails
(a)    .50                2
(b)    .30                1
(c)    .10                2

9. Chapman et al. (1997) interviewed 68 inner city pregnant women and their husbands (or boyfriends) twice during their pregnancy, once between three and six months into the pregnancy and again between six and nine months into the pregnancy. Table 11–12 shows the correlations among several of their measures. (“Zero-Order Correlations” means the same thing as ordinary correlations.) Most important in this table are the correlations among women’s reports of their own stress, men’s reports of their partners’ stress, women’s perception of their partners’ support at the first and at the second interviews, and women’s depression at the first and at the second interviews.
Table 11–12 Zero-Order Correlations for Study Variables
Variable 1 2 3 4 5 6 7 8 9 10
1. Women’s report of stress —
2. Men’s report of women’s stress .17 —
3. Partner Support 1 * —
4. Partner Support 2 * .44*** —
5. Depressed Mood 1 .23* .10 ** —
6. Depressed Mood 2 .50*** .14 *** *** .55*** —
7. Women’s age .06 .16 .04 * * —
8. Women’s ethnicity .11 .13 —
9. Women’s marital status .01 .12 .24* .05 ** —
10. Parity .19 .13 .10 .16 .26* .31* —
* p < .05, ** p < .01, *** p < .001.
Source: Chapman, H. A., Hobfoll, S. E., & Ritter, C. (1997). Partners’ stress underestimations lead to women’s distress: A study of pregnant inner-city women. Journal of Personality and Social Psychology, 73, 418–425. Published by the American Psychological Association. Reprinted with permission.
[Negative correlations in the table (cells left blank above): -.12, -.17, -.11, -.34, -.20, -.04, -.18, -.02, -.14, -.16, -.09, -.19, -.09, -.35, -.24, -.41, -.42, -.17, -.34, -.18, -.27, -.18, -.28]
Explain the results on these measures as if you were writing to a person who has never had a course in statistics. Specifically, (a) explain what is meant by a correlation coefficient using one of the correlations as an example; (b) study the table and then comment on the patterns of results in terms of which variables are relatively strongly correlated and which are not very strongly correlated; and (c) comment on the limitations of making conclusions about the direction of causality based on these data, using a specific correlation as an example (noting at least one plausible alternative causal direction and why that alternative is plausible).
Set II
10. For each of the following scatter diagrams, indicate whether the pattern is linear, curvilinear, or no correlation; if it is linear, indicate whether it is positive or negative and the approximate strength (large, moderate, small) of the correlation.
[Scatter diagrams (a) through (f)]
11. Make up a scatter diagram with 10 dots for each of the following situations: (a) perfect positive linear correlation, (b) large but not perfect positive linear correlation, (c) small positive linear correlation, (d) large but not perfect negative linear correlation, (e) no correlation, (f) clear curvilinear correlation.
For problems 12 to 14, do the following: (a) Make a scatter diagram of the scores; (b) describe in words the general pattern of correlation, if any; (c) figure the correlation coefficient; (d) figure whether the correlation is statistically significant (use the .05 significance level, two-tailed); (e) explain the logic of what you have done, writing as if you are speaking to someone who has never heard of correlation (but who does understand the mean, deviation scores, and hypothesis testing); and (f) give three logically possible directions of causality, indicating for each direction whether it is a reasonable explanation for the correlation in light of the variables involved (and why).
12. Four research participants take a test of manual dexterity (high scores mean better dexterity) and an anxiety test (high scores mean more anxiety). The scores are as follows.
Person    Dexterity    Anxiety
1         1            10
2         1            8
3         2            4
4         4            -2
13. Four young children were monitored closely over a period of several weeks to measure how much they watched violent television programs and their amount of violent behavior toward their playmates. The results were as follows:
Child’s Code Number    Weekly Viewing of Violent TV (hours)    Number of Violent or Aggressive Acts Toward Playmates
G3368                  14                                      9
R8904                  8                                       6
C9890                  6                                       1
L8722                  12                                      8
14. Five college students were asked about how important a goal it is to them to have a family and about how important a goal it is for them to be highly successful in their work. Each variable was measured on a scale from 0, not at all important goal, to 10, very important goal.
Student    Family Goal    Work Goal
A          7              5
B          6              4
C          8              2
D          3              9
E          4              1
For problems 15 and 16, (a) make a scatter diagram of the scores; (b) describe in words the general pattern of correlation, if any; (c) figure the correlation coefficient; and (d) figure whether the correlation is statistically significant (use the .05 significance level, two-tailed).
15. The Louvre Museum is interested in the relation of the age of a painting to public interest in it. The number of people stopping to look at each of 10 randomly selected paintings is observed over a week. The results are as shown:
Painting Title              Approximate Age (Years) X    Number of People Stopping to Look Y
The Entombment              465                          68
Mys Mar Sainte Catherine    515                          71
The Bathers                 240                          123
The Toilette                107                          112
Portrait of Castiglione     376                          48
Charles I of England        355                          84
Crispin and Scapin          140                          66
Nude in the Sun             115                          148
The Balcony                 122                          71
The Circus                  99                           91
16. A developmental psychologist studying people in their eighties was interested in the relation between number of very close friends and overall health. The scores for six research participants follow.
Research Participant    Number of Friends X    Overall Health Y
A                       2                      41
B                       4                      72
C                       0                      37
D                       3                      84
E                       2                      52
F                       1                      49
17. What is the power of each of the following studies using a correlation coefficient and the .05 significance level?
     Effect Size (r)    N      Tails
(a)  .10                30     1
(b)  .30                40     2
(c)  .50                50     2
(d)  .30                100    2
(e)  .10                20     1
18. About how many participants are needed for 80% power in each of the following planned studies that will use a correlation coefficient and the .05 significance level?
     Effect Size (r)    Tails
(a)  .10                1
(b)  .30                2
(c)  .50                1
19. As part of a larger study, Speed and Gangstead (1997) collected ratings and nominations on a number of characteristics for 66 fraternity men from their fellow fraternity members. The following paragraph is taken from their Results section:
. . . men’s romantic popularity significantly correlated with several characteristics: best dressed (r = .48), most physically attractive (r = .47), most outgoing (r = .47), most self-confident (r = .44), best trendsetters (r = .38), funniest (r = .37), most satisfied (r = .32), and most independent (r = .28). Unexpectedly, however, men’s potential for financial success did not significantly correlate with romantic popularity (r = .10). (p. 931)
Explain these results as if you were writing to a person who has never had a course in statistics. Specifically, (a) explain what is meant by a correlation coefficient using one of the correlations as an example; (b) explain in a general way what is meant by “significantly” and “not significantly,” referring to at least one specific example; and (c) speculate on the meaning of the pattern of results, taking into account the issue of direction of causality.
20. Gable and Lutz (2000) studied 65 children, 3 to 10 years old, and their parents. One of their results was “Parental control of child eating showed a negative association with children’s participation in extracurricular activities (r = -.34; p < .01)” (p. 296). Another result was “Parents who held less appropriate beliefs about children’s nutrition reported that their children watched more hours of television per day (r = .36; p < .01)” (p. 296). Explain these results as if you were writing to a person who has never had a course in statistics. Be sure to comment on possible directions of causality for each result.
21. Table 11–13 is from a study by Baldwin and colleagues (2006) that examined the associations among feelings of shame, guilt, and self-efficacy in a sample of 194 college students. Self-efficacy refers to people’s beliefs about their ability to be successful at various things they may try to do. (For example, the students indicated how much they agreed with statements such as, “When I make plans, I am certain I can make them work.”) Table 11–13 shows the correlations among the questionnaire measures of shame, guilt, general self-efficacy, social self-efficacy, and total self-efficacy (general self-efficacy plus social self-efficacy).
Explain the results as if you were writing to a person who has never had a course in statistics. Specifically, (a) explain what is meant by a correlation coefficient using one of the correlations as an example; (b) study the table and then comment on the patterns of results in terms of which variables are relatively strongly correlated and which are not very strongly correlated; and (c) comment on the limitations of making conclusions about the direction of causality based on these data, using a specific correlation as an example (noting at least one plausible alternative causal direction and why that alternative is plausible).
22. Arbitrarily select eight people from your class. Do each of the following: (a) Make a scatter diagram for the relation between the number of letters in each person’s first and last name; (b) figure the correlation coefficient for the relation between the number of letters in each person’s first and last name; (c) figure whether the correlation is statistically significant (use the .05 significance level, two-tailed); (d) describe the result in words; and (e) suggest a possible interpretation for your results.
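The power and sample-size problems in this set are meant to be answered from the chapter's power and sample-size tables. As a rough check on a table lookup, the following Python sketch uses a standard approximation based on the Fisher z transformation. It is not the method used in this chapter, and its answers can differ slightly from the table values. The function names and the example values (r = .40, N = 60) are made up for illustration.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal curve (mean 0, SD 1)


def approx_power(r, n, tails=2, alpha=0.05):
    """Approximate power for a true correlation r with n participants.

    Fisher z approximation (an assumption of this sketch, not the chapter's table).
    For a two-tailed test, only the tail in the direction of the effect is counted.
    """
    z_cutoff = STD_NORMAL.inv_cdf(1 - alpha / tails)  # cutoff on the normal curve
    shift = abs(atanh(r)) * sqrt(n - 3)               # expected Fisher z in standard-error units
    return STD_NORMAL.cdf(shift - z_cutoff)


def approx_n_for_power(r, power=0.80, tails=2, alpha=0.05):
    """Approximate number of participants needed to reach the desired power."""
    z_cutoff = STD_NORMAL.inv_cdf(1 - alpha / tails)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(((z_cutoff + z_power) / atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    # Hypothetical planned study: r = .40, two-tailed, .05 significance level.
    print("approximate power with N = 60:", round(approx_power(0.40, 60, tails=2), 2))
    print("approximate N for 80% power:", approx_n_for_power(0.40, tails=2))
```

Because the approximation treats the sampling distribution as normal, it is least trustworthy with very small samples, which is one reason to treat the table values as the reference answer.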
Table 11–13 Correlations Among Shame, Guilt, and Self-Efficacy Subscales
                            1         2        3        4        5
1. Shame
2. Guilt                    .34**
3. General Self-efficacy    -.29**    .12
4. Social Self-efficacy     -.18*     -.06     .47**
5. Total Self-efficacy      -.29**    .07      .94**    .74**
* p < .01, ** p < .001. For all correlations, n is between 184 and 190.
Source: Baldwin, K. M., Baldwin, J. R., & Ewald, T. (2006). The relationship among shame, guilt, and self-efficacy. American Journal of Psychotherapy, 60, 1–21. Copyright © 2006 by The Association for the Advancement of Psychotherapy. Reprinted by permission of the publisher.
Using SPSS
The U in the following steps indicates a mouse click. (We used SPSS version 15.0 for Windows to carry out these analyses. The steps and output may be slightly different for other versions of SPSS.)
In the following steps for the scatter diagram and correlation coefficient, we will use the example of the sleep and happy mood study. The scores for that study are shown in Table 11–1 on p. 435, the scatter diagram is shown in Figure 11–2 on p. 435, and the figuring for the correlation coefficient and its significance is shown in Table 11–3 on p. 449.
Creating a Scatter Diagram
❶ Enter the scores into SPSS. Enter the scores as shown in Figure 11–20.
❷ U Graphs.
❸ U Legacy Dialogs, U Scatter/Dot. A box will appear that allows you to select different types of scatter diagrams. You want the “Simple Scatter” diagram. This is selected as the default type of diagram; so you just need to U Define.
❹ U the variable called “mood” and then U the arrow next to the box labeled “Y axis.” This tells SPSS that the scores for the “mood” variable should go on the vertical (or Y) axis of the scatter diagram. U the variable called “sleep” and then U the arrow next to the box labeled “X axis.” This tells SPSS that the scores for the “sleep” variable should go on the horizontal (or X) axis of the scatter diagram.
❺ U OK. Your SPSS output window should look like Figure 11–21.
Finding the Correlation Coefficient
❶ Enter the scores into SPSS. Enter the scores as shown in Figure 11–20.
❷ U Analyze.
❸ U Correlate.
❹ U Bivariate.
❺ U on the variable called “mood” and then U the arrow next to the box labeled “Variables.” U on the variable called “sleep” and then U the arrow next to the box labeled “Variables.” This tells SPSS to figure the correlation between the “mood” and “sleep” variables. (If you wanted to find the correlation between each of several variables, you would put all of them into the “Variables” box.) Notice that by default SPSS carries out a Pearson correlation (the type of correlation you have learned in this chapter), gives the significance level using a two-tailed test, and flags statistically significant correlations using the .05 significance level. (Clicking the box next to “Spearman” requests Spearman’s rho, which is a special type of correlation we briefly discussed earlier in the chapter.)
❻ U OK. Your SPSS output window should look like Figure 11–22.
The table shown in Figure 11–22 is a small correlation matrix (there are only two variables). (If you were interested in the correlations among more than two variables, which is often the case in psychology research, SPSS would produce a larger correlation matrix.) The correlation matrix shows the correlation coefficient (“Pearson Correlation”), the exact significance level of the correlation coefficient [“Sig. (2-tailed)”], and the number of people in the correlation analysis (“N”). Note that two of the cells of the correlation matrix show a correlation coefficient of exactly 1. You can ignore these cells; they simply show that each variable is perfectly correlated with itself. (In larger correlation matrixes all of the cells on the diagonal from the top left to the bottom right of the table will have a correlation coefficient of 1.) You will also notice that the remaining two cells provide identical information. This is because the table shows the correlations between sleep and mood and also between mood and sleep (which are, of course, identical correlations). So you can look at either one. (In a larger correlation matrix, you need only look either at all of the correlations above the diagonal that goes from top left to bottom right or at all of the correlations below that diagonal.) The correlation coefficient is .853 (which is usually rounded to two decimal places in research articles). The significance level of .031 is less than our .05 cutoff, which means that it is a statistically significant correlation. The asterisk (*) by the correlation of .853 also shows that it is statistically significant (at the .05 significance level, as shown by the note under the table).
Figure 11–20 SPSS data editor window for the fictional study of the relationship between hours slept last night and mood.
Figure 11–21 An SPSS scatter diagram showing the relationship between hours slept last night and mood (fictional data).
Figure 11–22 SPSS output window for the correlation between hours slept and mood (fictional data).
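The figuring behind the SPSS output can also be sketched outside SPSS. The short Python example below is only an illustration under stated assumptions: the six pairs of scores are assumed values for the sleep and mood example (check them against Table 11–1), the deviation-score calculation is assumed to correspond to the chapter's definitional formula, and the function names are made up here. It builds the same kind of two-variable correlation matrix and, instead of the exact significance level that SPSS reports, compares the t score for the correlation with the two-tailed .05 cutoff from a t table.

```python
from math import sqrt

# Assumed scores for the sleep and mood example (check against Table 11-1).
sleep = [5, 7, 8, 6, 6, 10]   # hours slept last night
mood = [2, 4, 7, 2, 3, 6]     # happy mood rating


def correlation(x, y):
    """Pearson r from deviation scores: sum of cross-products over sqrt(SSx * SSy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [score - mean_x for score in x]
    dev_y = [score - mean_y for score in y]
    cross_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx ** 2 for dx in dev_x)
    ss_y = sum(dy ** 2 for dy in dev_y)
    return cross_products / sqrt(ss_x * ss_y)


def t_for_correlation(r, n):
    """t score for testing r against a population correlation of 0 (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


variables = {"sleep": sleep, "mood": mood}
# Correlation matrix: every variable paired with every variable, as in the SPSS table.
matrix = {
    row: {col: correlation(variables[row], variables[col]) for col in variables}
    for row in variables
}

r = matrix["sleep"]["mood"]
t = t_for_correlation(r, len(sleep))
T_CUTOFF_DF4 = 2.776  # two-tailed .05 cutoff for df = 4, taken from a t table

for row, cells in matrix.items():
    print(row, {col: round(value, 3) for col, value in cells.items()})
print("t =", round(t, 2), "| significant at .05, two-tailed:", abs(t) > T_CUTOFF_DF4)
```

With these assumed scores the off-diagonal cells come out to .853, the diagonal cells are 1, and the two off-diagonal cells are identical, which is the pattern described for the SPSS correlation matrix.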
Chapter Notes
1. There is also a “computational” version of this formula that is mathematically equivalent and thus gives the same result:
$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$
This formula is easier to use when computing by hand (or with a hand calculator) when you have a large number of people in the study, because you don’t have to first figure out all the deviation scores. However, researchers rarely use computational formulas like this any more because the actual figuring is done by a computer. As a student learning statistics, it is much better to use the definitional formula (11–1). This is because when solving problems using the definitional formula, you are strengthening your understanding of what the correlation coefficient means. In all examples in this chapter, we use the definitional formula and we urge you to use it in doing the chapter’s practice problems.
2. As we noted in Chapter 3, statisticians usually use Greek letters to denote a population parameter. The population parameter for a correlation is ρ (rho). However, for ease of learning (and to avoid potential confusion with a term we introduce later in the chapter) we use the ordinary letter r for both the correlation you figure from a sample and the correlation in a population.
3. More complete tables are provided in Cohen (1988, pp. 84–95).
4. More complete tables are provided in Cohen (1988, pp. 101–102).
5. To figure the correlation between getting a heart attack and taking aspirin, you would have to make the two variables into numbers. For example, you could make getting a heart attack equal 1 and not getting a heart attack equal 0; similarly, you could make being in the aspirin group equal 1 and being in the control group equal 0. It would not matter which two numbers you used for the two values for each variable. Whichever two numbers you use, the result will come out the same after converting to deviation scores and using the correction number. The only difference that the two numbers you use makes is that the value that gets the higher number determines whether the correlation will be positive or negative.
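Two claims in these notes are easy to check numerically: that the computational formula in note 1 gives the same result as figuring from deviation scores, and that the particular numbers chosen for a two-valued variable in note 5 change only the sign of the correlation. The Python sketch below does both. The eight pairs of scores are hypothetical (they are not data from the aspirin study), the function names are made up for illustration, and the deviation-score version is assumed to correspond to the definitional formula (11–1).

```python
from math import sqrt


def r_definitional(x, y):
    """Deviation-score version: sum of cross-products over sqrt(SSx * SSy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross_products / sqrt(ss_x * ss_y)


def r_computational(x, y):
    """Raw-score version from note 1; no deviation scores are needed."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


def recode(scores, low, high):
    """Replace every 0 with `low` and every 1 with `high`."""
    return [high if score == 1 else low for score in scores]


# Hypothetical scores for eight people.
aspirin = [1, 1, 1, 1, 0, 0, 0, 0]        # 1 = aspirin group, 0 = control group
heart_attack = [0, 0, 0, 1, 1, 1, 0, 1]   # 1 = heart attack, 0 = no heart attack

r_def = r_definitional(aspirin, heart_attack)
r_comp = r_computational(aspirin, heart_attack)

# Different numbers, same ordering of the two values: r is unchanged.
r_other_numbers = r_definitional(recode(aspirin, 10, 20), recode(heart_attack, 3, 7))
# Giving the control group the higher number flips only the sign.
r_reversed = r_definitional(recode(aspirin, 1, 0), heart_attack)

print(r_def, r_comp, r_other_numbers, r_reversed)
```

The first three values agree and the fourth has the same size with the opposite sign, which is the pattern notes 1 and 5 describe.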