
CHAPTER 3

Some Key Ingredients for Inferential Statistics

Z Scores, the Normal Curve, Sample versus Population, and Probability

Chapter Outline

• Z Scores
• The Normal Curve
• Sample and Population
• Probability
• Controversies: Is the Normal Curve Really So Normal? and Using Nonrandom Samples
• Z Scores, Normal Curves, Samples and Populations, and Probabilities in Research Articles
• Advanced Topic: Probability Rules and Conditional Probabilities
• Summary
• Key Terms
• Example Worked-Out Problems
• Practice Problems
• Using SPSS
• Chapter Notes

Ordinarily, psychologists conduct research to test a theoretical principle or the effectiveness of a practical procedure. For example, a psychophysiologist might measure changes in heart rate from before to after solving a difficult problem. The measurements are then used to test a theory predicting that heart rate should change following successful problem solving. An applied social psychologist might examine the effectiveness of a program of neighborhood meetings intended to promote water conservation. Such studies are carried out with a particular group of research participants. But researchers use inferential statistics to make more general conclusions about the theoretical principle or procedure being studied. These conclusions go beyond the particular group of research participants studied.

Before beginning this chapter, be sure you have mastered the material in Chapter 1 on the shapes of distributions and the material in Chapter 2 on the mean and standard deviation.

Z score: number of standard deviations that a score is above (or below, if it is negative) the mean of its distribution; it is thus an ordinary score transformed so that it better describes the score's location in a distribution.

This chapter and Chapters 4, 5, and 6 introduce inferential statistics. In this chapter, we consider four topics: Z scores, the normal curve, sample versus population, and probability. This chapter prepares the way for the next ones, which are more demanding conceptually.

Z Scores

In Chapter 2, you learned how to describe a group of scores in terms of the mean and variation around the mean. In this section you learn how to describe a particular score in terms of where it fits into the overall group of scores. That is, you learn how to use the mean and standard deviation to create a Z score; a Z score describes a score in terms of how much it is above or below the average.

Suppose you are told that a student, Jerome, is asked the question, "To what extent are you a morning person?" Jerome responds with a 5 on a 7-point scale, where 1 = not at all and 7 = extremely. Now suppose that we do not know anything about how other students answer this question. In this situation, it is hard to tell whether Jerome is more or less of a morning person in relation to other students. However, suppose that we know for students in general, the mean rating (M) is 3.40 and the standard deviation (SD) is 1.47. (These values are the actual mean and standard deviation that we found for this question in a large sample of statistics students from eight different universities across the United States and Canada.) With this knowledge, we can see that Jerome is more of a morning person than is typical among students. We can also see that Jerome is above the average (1.60 units more than average; that is, 5 - 3.40 = 1.60) by a bit more than students typically vary from the average (that is, students typically vary by about 1.47, the standard deviation). This is all shown in Figure 3-1.

What Is a Z Score?

A Z score makes use of the mean and standard deviation to describe a particular score. Specifically, a Z score is the number of standard deviations the actual score is above or below the mean. If the actual score is above the mean, the Z score is positive. If the actual score is below the mean, the Z score is negative. The standard deviation now becomes a kind of yardstick, a unit of measure in its own right.

In our example, Jerome has a score of 5, which is 1.60 units above the mean of 3.40. One standard deviation is 1.47 units; so Jerome's score is a little more than 1 standard deviation above the mean. To be precise, Jerome's Z score is +1.09 (that is, his score of 5 is 1.09 standard deviations above the mean). Another student, Michelle, has a score of 2. Her score is 1.40 units below the mean. Therefore, her score is a little less than 1 standard deviation below the mean (a Z score of -.95). So, Michelle's score is below the average by about as much as students typically vary from the average.

Figure 3-1 Score of one student, Jerome, in relation to the overall distribution on the measure of the extent to which students are morning people. [Figure: scale marked in steps of 1 SD at .46, 1.93, 3.40 (the mean), 4.87, and 6.34, with Jerome's score of 5 indicated.]

Figure 3-2 Scales of Z scores and raw scores for the example of the extent to which students are morning people. [Figure: Z scores of -2, -1, 0, +1, and +2 line up with raw scores of .46, 1.93, 3.40, 4.87, and 6.34.]

Z scores have many practical uses. As you will see later in this chapter, they are especially useful for showing exactly where a particular score falls on the normal curve.

Z Scores as a Scale

Figure 3-2 shows a scale of Z scores lined up against a scale of raw scores for our example of the degree to which students are morning people. A raw score is an ordinary score as opposed to a Z score. The two scales are something like a ruler with inches lined up on one side and centimeters on the other.

Changing a number to a Z score is a bit like converting measurements expressed in the units of various obscure languages into a single unit that everyone understands: for example, converting inches, cubits, and zingles (we made up that last one) into centimeters. It is a very valuable tool.

Suppose that a developmental psychologist observed 3-year-old David in a laboratory situation playing with other children of the same age. During the observation, the psychologist counted the number of times David spoke to the other children. The result, over several observations, is that David spoke to other children about 8 times per hour of play. Without any standard of comparison, it would be hard to draw any conclusions from this. Let's assume, however, that it was known from previous research that under similar conditions, the mean number of times children speak is 12, with a standard deviation of 4. With that information, we can see that David spoke less often than other children in general, but not extremely less often. David would have a Z score of -1 (M = 12 and SD = 4, thus a score of 8 is 1 SD below M), as shown in Figure 3-3.

Suppose Ryan was observed speaking to other children 20 times in an hour. Ryan would clearly be unusually talkative, with a Z score of +2 (see Figure 3-3). Ryan speaks not merely more than the average but more by twice as much as children tend to vary from the average!

raw score: ordinary score (or any number in a distribution before it has been made into a Z score or otherwise transformed).

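Figure 3-3 Number of times each hour that two children spoke, shown as raw scores and Z scores. [Figure: Z scores of -3 through +3 line up with 0, 4, 8, 12, 16, 20, and 24 times spoken per hour; David is marked at 8 and Ryan at 20.]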


Formula to Change a Raw Score to a Z Score

A Z score is the number of standard deviations by which the raw score is above or below the mean. To figure a Z score, subtract the mean from the raw score, giving the deviation score. Then divide the deviation score by the standard deviation. The formula is

$$Z = \frac{X - M}{SD} \tag{3-1}$$

A Z score is the raw score minus the mean, divided by the standard deviation.

The raw score is the Z score multiplied by the standard deviation, plus the mean.

For example, using the formula for David, the child who spoke to other children 8 times in an hour (where the mean number of times children speak is 12 and the standard deviation is 4),

$$Z = \frac{X - M}{SD} = \frac{8 - 12}{4} = \frac{-4}{4} = -1$$

Steps to Change a Raw Score to a Z Score

❶ Figure the deviation score: subtract the mean from the raw score.
❷ Figure the Z score: divide the deviation score by the standard deviation.

Using these steps for David, the child who spoke with other children 8 times in an hour,

❶ Figure the deviation score: subtract the mean from the raw score. 8 - 12 = -4.

❷ Figure the Z score: divide the deviation score by the standard deviation. -4/4 = -1.
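The two steps can be written as a short Python function. This is our own illustration rather than anything from the text: the helper name `z_score` is hypothetical, and the numbers are the examples already worked in this section (M = 12, SD = 4 for times spoken per hour; M = 3.40, SD = 1.47 for the morning-person rating).

```python
def z_score(raw, mean, sd):
    """Convert a raw score to a Z score (hypothetical helper name)."""
    deviation = raw - mean   # Step 1: figure the deviation score
    return deviation / sd    # Step 2: divide by the standard deviation

# David spoke 8 times per hour and Ryan 20 times (M = 12, SD = 4).
print(z_score(8, 12, 4))    # -1.0
print(z_score(20, 12, 4))   # 2.0

# Jerome rated himself 5 and Michelle 2 on the morning-person scale (M = 3.40, SD = 1.47).
print(round(z_score(5, 3.40, 1.47), 2))   # 1.09
print(round(z_score(2, 3.40, 1.47), 2))   # -0.95
```

Keeping the deviation score as a separate step mirrors the two-step procedure, which makes it easy to check each intermediate number against a hand calculation.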

Formula to Change a Z Score to a Raw Score

To change a Z score to a raw score, the process is reversed: multiply the Z score by the standard deviation and then add the mean. The formula is

$$X = (Z)(SD) + M \tag{3-2}$$

Suppose a child has a Z score of 1.5 on the number of times spoken with another child during an hour. This child is 1.5 standard deviations above the mean. Because the standard deviation in this example is 4 raw score units (times spoken), the child is 6 raw score units above the mean, which is 12. Thus, 6 units above the mean is 18. Using the formula,

X = (Z)(SD) + M = (1.5)(4) + 12 = 6 + 12 = 18

Steps to Change a Z Score to a Raw Score

❶ Figure the deviation score: multiply the Z score by the standard deviation.
❷ Figure the raw score: add the mean to the deviation score.

Using these steps for the child with a Z score of 1.5 on the number of times spoken with another child during an hour:

❶ Figure the deviation score: multiply the Z score by the standard deviation. 1.5 × 4 = 6.

❷ Figure the raw score: add the mean to the deviation score. 6 + 12 = 18.
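The reverse conversion is just as short. As before, this is a minimal sketch of our own; the helper name `raw_score` is hypothetical, and the values come from the examples in this chapter.

```python
def raw_score(z, mean, sd):
    """Convert a Z score back to a raw score (hypothetical helper name)."""
    deviation = z * sd       # Step 1: figure the deviation score
    return deviation + mean  # Step 2: add the mean

# Child with Z = 1.5 on times spoken per hour (M = 12, SD = 4).
print(raw_score(1.5, 12, 4))   # 18.0

# Going back from David's Z score of -1 recovers his raw score of 8.
print(raw_score(-1, 12, 4))    # 8

# Morning-person student with Z = -1.63 (M = 3.40, SD = 1.47).
print(round(raw_score(-1.63, 3.40, 1.47), 2))   # 1.0
```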

Figure 3-4 Scales of Z scores and raw scores for the example of the extent to which students are morning people, showing the scores of two sample students. [Figure: Z scores of -2 through +2 line up with raw scores of .46, 1.93, 3.40, 4.87, and 6.34; Student 1 is marked at 6.00 and Student 2 at 1.00.]

Additional Examples of Changing Z Scores to Raw Scores and Vice Versa

Consider again the example from the start of the chapter in which students were asked the extent to which they were a morning person. Using a scale from 1 (not at all) to 7 (extremely), the mean was 3.40 and the standard deviation was 1.47. Suppose a student's raw score is 6. That student is well above the mean. Specifically, using the formula,

$$Z = \frac{X - M}{SD} = \frac{6 - 3.40}{1.47} = \frac{2.60}{1.47} = 1.77$$

That is, the student's raw score is 1.77 standard deviations above the mean (see Figure 3-4, Student 1). Using the 7-point scale (from 1 = not at all to 7 = extremely), to what extent are you a morning person? Now figure the Z score for your raw score.

Another student has a Z score of -1.63, a score well below the mean. (This student is much less of a morning person than is typically the case for students.) You can find the exact raw score for this student using the formula

X = (Z)(SD) + M = (-1.63)(1.47) + 3.40 = -2.40 + 3.40 = 1.00

That is, the student's raw score is 1.00 (see Figure 3-4, Student 2). Let's also consider some examples from the study of students' stress ratings.

The mean stress rating of the 30 statistics students (using a 0-10 scale) was 6.43 (see Figure 2-3), and the standard deviation was 2.56. Figure 3-5 shows the raw score and Z score scales. Suppose a student's stress raw score is 10. That student is well above the mean. Specifically, using the formula

$$Z = \frac{X - M}{SD} = \frac{10 - 6.43}{2.56} = \frac{3.57}{2.56} = 1.39$$

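Figure 3-5 Scales of Z scores and raw scores for 30 statistics students' ratings of their stress level, showing the scores of two sample students. (Data based on Aron et al., 1995.) [Figure: Z scores of -3 through +3 line up with stress ratings of -1.25, 1.31, 3.87, 6.43, 8.99, 11.55, and 14.11; Student 1 is marked at 10.00 and Student 2 at 2.00.]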

The student's stress level is 1.39 standard deviations above the mean (see Figure 3-5, Student 1). On a scale of 0-10, how stressed have you been in the last TA weeks? Figure the Z score for your raw stress score.

Another student has a Z score of -1.73, a stress level well below the mean. You can find the exact raw stress score for this student using the formula

X = (Z)(SD) + M = (-1.73)(2.56) + 6.43 = -4.43 + 6.43 = 2.00

That is, the student's raw stress score is 2.00 (see Figure 3-5, Student 2).

The Mean and Standard Deviation of Z Scores

The mean of any distribution of Z scores is always 0. This is so because when you change each raw score to a Z score, you take the raw score minus the mean. So the mean is subtracted out of all the raw scores, making the overall mean come out to 0. In other words, in any distribution, the positive Z scores exactly balance the negative Z scores: the sum of the positive Z scores is equal in size, but opposite in sign, to the sum of the negative Z scores. Thus, when you add them all up, you get 0.

The standard deviation of any distribution of Z scores is always 1. This is because when you change each raw score to a Z score, you divide by the standard deviation.
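A quick way to convince yourself of both facts is to convert a set of scores to Z scores and then figure the mean and standard deviation of the result. The scores below are hypothetical values chosen only for illustration, the helper name `to_z_scores` is our own, and the standard deviation is figured by dividing by N (Python's `statistics.pstdev`).

```python
from statistics import mean, pstdev

def to_z_scores(scores):
    """Convert every score to a Z score using the group's mean and its
    standard deviation figured by dividing by N (hypothetical helper)."""
    m = mean(scores)
    sd = pstdev(scores)
    return [(x - m) / sd for x in scores]

# Hypothetical times-spoken-per-hour scores, for illustration only.
scores = [4, 8, 12, 12, 16, 20]
z = to_z_scores(scores)

print([round(value, 2) for value in z])
print(round(mean(z), 10))    # mean of the Z scores: 0 (up to rounding)
print(round(pstdev(z), 10))  # SD of the Z scores: 1 (up to rounding)
```

Any other set of scores with some variation should give the same mean of 0 and standard deviation of 1, apart from tiny floating-point rounding.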

A Z score is sometimes called a standard score. There are two reasons: Z scores have standard values for the mean and the standard deviation, and, as we saw earlier, Z scores provide a kind of standard scale of measurement for any variable. (However, sometimes the term standard score is used only when the Z scores are for a distribution that follows a normal curve.)1

1. How is a Z score related to a raw score?
2. Write the formula for changing a raw score to a Z score, and define each of the symbols.
3. For a particular group of scores, M = 20 and SD = 5. Give the Z score for (a) 30, (b) 15, (c) 20, and (d) 22.5.
4. Write the formula for changing a Z score to a raw score, and define each of the symbols.
5. For a particular group of scores, M = 10 and SD = 2. Give the raw score for a Z score of (a) +2, (b) +.5, (c) 0, and (d) -3.
6. Suppose a person has a Z score for overall health of +2 and a Z score for overall sense of humor of +1. What does it mean to say that this person is healthier than she is funny?

Answers

1. A Z score is the number of standard deviations a raw score is above or below the mean.
2. Z = (X - M)/SD. Z is the Z score; X is the raw score; M is the mean; SD is the standard deviation.
3. (a) Z = (X - M)/SD = (30 - 20)/5 = 10/5 = 2; (b) -1; (c) 0; (d) .5.
4. X = (Z)(SD) + M. X is the raw score; Z is the Z score; SD is the standard deviation; M is the mean.
5. (a) X = (Z)(SD) + M = (2)(2) + 10 = 4 + 10 = 14; (b) 11; (c) 10; (d) 4.
6. This person is more above the average in health (in terms of how much people typically vary from average in health) than she is above the average in humor (in terms of how much people typically vary from the average in humor).


The Normal Curve

As noted in Chapter 1, the graphs of the distributions of many of the variables that psychologists study follow a unimodal, roughly symmetrical, bell-shaped curve. These bell-shaped smooth histograms approximate a precise and important mathematical distribution called the normal distribution, or, more simply, the normal curve.2 The normal curve is a mathematical (or theoretical) distribution. Researchers often compare the actual distributions of the variables they are studying (that is, the distributions they find in research studies) to the normal curve. They don't expect the distributions of their variables to match the normal curve perfectly (since the normal curve is a theoretical distribution), but researchers often check whether their variables approximately follow a normal curve. (The normal curve or normal distribution is also often called a Gaussian distribution after the astronomer Karl Friedrich Gauss. However, if its discovery can be attributed to anyone, it should really be to Abraham de Moivre; see Box 3-1.) An example of the normal curve is shown in Figure 3-6.

Why the Normal Curve Is So Common in Nature

Take, for example, the number of different letters a particular person can remember accurately on various testings (with different random letters each time). On some testings the number of letters remembered may be high, on others low, and on most somewhere in between. That is, the number of different letters a person can recall on various testings probably approximately follows a normal curve. Suppose that the person has a basic ability to recall, say, seven letters in this kind of memory task. Nevertheless, on any particular testing, the actual number recalled will be affected by various influences: noisiness of the room, the person's mood at the moment, a combination of random letters confused with a familiar name, and so on.

These various influences add up to make the person recall more than seven on some testings and fewer than seven on others. However, the particular combination of such influences that comes up at any testing is essentially random; thus, on most testings, positive and negative influences should cancel out. It is unlikely that all the negative influences will happen to come together on a testing when none of the positive influences show up. Thus, in general, the person remembers a middle amount, an amount in which all the opposing influences cancel each other out. Very high or very low scores are much less common.

normal distribution: frequency distribution that follows a normal curve.

normal curve: specific, mathematically defined, bell-shaped frequency distribution that is symmetrical and unimodal; distributions observed in nature and in research commonly approximate it.

Figure 3-6 A normal curve.

BOX 3-1 de Moivre, the Eccentric Stranger Who Invented the Normal Curve

The normal curve is central to statistics and is the foundation of most statistical theories and procedures. If any one person can be said to have discovered this fundamental of the field, it was Abraham de Moivre. He was a French Protestant who came to England at the age of 21 because of religious persecution in France, which in 1685 denied Protestants all their civil liberties. In England, de Moivre became a friend of Isaac Newton, who was supposed to have often answered questions by saying, "Ask Mr. de Moivre; he knows all that better than I do." Yet because he was a foreigner, de Moivre was never able to rise to the same heights of fame as the British-born mathematicians who respected him so greatly.

Abraham de Moivre was mainly an expert on chance. In 1733, he wrote a "method of approximating the sum of the terms of the binomial expanded into a series." His paper essentially described the normal curve. The description was only in the form of a law, however; de Moivre never actually drew the curve itself. In fact, he was not very interested in it.

Credit for discovering the normal curve is often given to Pierre Laplace, a Frenchman who stayed home; or Karl Friedrich Gauss, a German; or Thomas Simpson, an Englishman. All worked on the problem of the distribution of errors around a mean, even going so far as describing the curve or drawing approximations of it. But even without drawing it, de Moivre was the first to compute the areas under the normal curve at 1, 2, and 3 standard deviations, and Karl Pearson (discussed in Chapter 13, Box 13-1), a distinguished later statistician, felt strongly that de Moivre was the true discoverer of this important concept.

In England, de Moivre was highly esteemed as a man of letters as well as of numbers, being familiar with all the classics and able to recite whole scenes from his beloved Moliere's Misanthropist. But for all his feeling for his native France, it was only just before his death that the French Academy elected him a foreign member of the Academy of Sciences. In England, he was ineligible for a university position because he was a foreigner there as well. He remained in poverty, unable even to marry. In his earlier years, he worked as a traveling teacher of mathematics. Later, he was famous for his daily sittings in Slaughter's Coffee House in Long Acre, making himself available to gamblers and insurance underwriters (two professions equally uncertain and hazardous before statistics were refined), who paid him a small sum for figuring odds for them.

De Moivre's unusual death generated several legends. He worked a great deal with infinite series that converge to a certain limit. One story has it that de Moivre began sleeping 15 more minutes each night until he was asleep all the time, then died. Another version claims that his work at the coffeehouse drove him to such despair that he simply went to sleep until he died. At any rate, in his 80s he could stay awake only four hours a day, although he was said to be as keenly intellectual in those hours as ever. Then his wakefulness was reduced to 1 hour, then none at all. At the age of 87, after eight days in bed, he failed to wake and was declared dead from "somnolence" (sleepiness).

Sources: Pearson (1978); Tankard (1984).

This creates a unimodal distribution with most of the scores near the middle and fewer at the extremes. It also creates a distribution that is symmetrical, because the number of letters recalled is as likely to be above as below the middle. However, a distribution that is unimodal and symmetrical is not necessarily a normal curve; it could be too flat or too pointed. Still, it can be shown mathematically that in the long run, if the influences are truly random and the number of different influences being combined is large, the distribution comes ever closer to a precise normal curve. Mathematical statisticians call this principle the central limit theorem. We have more to say about this principle in Chapter 5.
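The argument that many small random influences add up to a bell shape can be illustrated with a quick simulation. This is a sketch under our own assumptions, not a model from the text: each testing's score is a basic ability of 7 plus the sum of 12 independent influences, each drawn uniformly between -0.5 and +0.5, so the simulated scores are not whole numbers of letters. The function name `simulate_recall`, the number of influences, and the number of testings are all hypothetical choices.

```python
import random
from statistics import mean, pstdev

def simulate_recall(n_testings=10_000, n_influences=12, basic_ability=7.0, seed=1):
    """Simulate scores as a basic ability plus many small random influences.
    Each influence is uniform between -0.5 and +0.5 (an illustrative assumption)."""
    rng = random.Random(seed)
    return [basic_ability + sum(rng.uniform(-0.5, 0.5) for _ in range(n_influences))
            for _ in range(n_testings)]

scores = simulate_recall()
m, sd = mean(scores), pstdev(scores)
within_1 = sum(abs(x - m) <= sd for x in scores) / len(scores)
within_2 = sum(abs(x - m) <= 2 * sd for x in scores) / len(scores)
print(f"mean = {m:.2f}, SD = {sd:.2f}")
print(f"within 1 SD: {within_1:.1%}, within 2 SD: {within_2:.1%}")

# Crude text histogram: one bar per 1-point bin, one '#' per 100 testings.
for low in range(3, 11):
    count = sum(low <= x < low + 1 for x in scores)
    print(f"{low:>2}-{low + 1:<2} {'#' * (count // 100)}")
```

The histogram comes out unimodal and symmetrical, and roughly 68% of the simulated scores fall within 1 standard deviation of the mean, close to what a normal curve predicts. Using fewer influences (say, 2) gives a noticeably less bell-shaped result.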

The Normal Curve and the Percentage of Scores Between the Mean and 1 and 2 Standard Deviations from the Mean

The shape of the normal curve is standard. Thus, there is a known percentage of scores above or below any particular point. For example, exactly 50% of the scores in a normal curve are below the mean, because in any symmetrical distribution half the scores are below the mean. More interestingly, as shown in Figure 3-7, approximately 34% of the scores are always between the mean and 1 standard deviation from the mean.

Figure 3-7 Normal curve with approximate percentages of scores between the mean and 1 and 2 standard deviations above and below the mean. [Figure: horizontal axis in Z scores from -3 to +3; about 34% of scores lie between the mean and 1 SD on each side, about 14% between 1 and 2 SD on each side, and about 2% beyond 2 SD on each side.]

Consider IQ scores. On many widely used intelligence tests, the mean IQ is 100, the standard deviation is 16, and the distribution of IQs is roughly a normal curve (see Figure 3-8). Knowing about the normal curve and the percentage of scores between the mean and 1 standard deviation above the mean tells you that about 34% of people have IQs between 100, the mean IQ, and 116, the IQ score that is 1 standard deviation above the mean. Similarly, because the normal curve is symmetrical, about 34% of people have IQs between 100 and 84 (the score that is 1 standard deviation below the mean), and 68% (34% + 34%) have IQs between 84 and 116.

There are far fewer scores between 1 and 2 standard deviations from the mean than there are between the mean and 1 standard deviation from the mean. It turns out that about 14% of the scores are between 1 and 2 standard deviations above the mean (see Figure 3-7). (Similarly, about 14% of the scores are between 1 and 2 standard deviations below the mean.) Thus, about 14% of people have IQs between 116 (1 standard deviation above the mean) and 132 (2 standard deviations above the mean).
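The 34% and 14% figures are rounded versions of exact areas under the normal curve. A minimal sketch using Python's standard-library `statistics.NormalDist` (our choice of tool; the book itself uses a printed table) shows where they come from, including the IQ example with M = 100 and SD = 16.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # normal curve on the Z score scale

mean_to_1 = (std_normal.cdf(1) - std_normal.cdf(0)) * 100  # mean to +1 SD
one_to_2 = (std_normal.cdf(2) - std_normal.cdf(1)) * 100   # +1 SD to +2 SD
beyond_2 = (1 - std_normal.cdf(2)) * 100                   # above +2 SD
print(f"{mean_to_1:.2f}% {one_to_2:.2f}% {beyond_2:.2f}%")  # 34.13% 13.59% 2.28%

# IQ example from the text: M = 100, SD = 16.
iq = NormalDist(mu=100, sigma=16)
print(f"{(iq.cdf(116) - iq.cdf(84)) * 100:.2f}%")   # 68.27% between 84 and 116
print(f"{(iq.cdf(132) - iq.cdf(116)) * 100:.2f}%")  # 13.59% between 116 and 132
```

Because `cdf(x)` gives the proportion of scores below x, the percentage between two points is the difference of two `cdf` values.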

You will find it very useful to remember the 34% and 14% figures. These figures tell you the percentages of people above and below any particular score whenever you know that score's number of standard deviations above or below the mean. You can also reverse this approach and figure out a person's number of standard deviations from the mean from a percentage. Suppose you are told that a person scored in the top 2% on a test. Assuming that scores on the test are approximately normally distributed, the person must have a score that is at least 2 standard deviations above the mean. This is because a total of 50% of the scores are above the mean, but 34% are between the mean and 1 standard deviation above the mean, and another 14% are between 1 and 2 standard deviations above the mean. That leaves 2% of scores (that is, 50% - 34% - 14% = 2%) that are 2 standard deviations or more above the mean.

Figure 3-8 Distribution of IQ scores on many standard intelligence tests (with a mean of 100 and a standard deviation of 16). [Figure: normal curve with the IQ axis marked at 68, 84, 100, 116, and 132.]

Remember that negative Z scores are scores below the mean and positive Z scores are scores above the mean.

normal curve table: table showing percentages of scores associated with the normal curve; the table usually includes percentages of scores between the mean and various numbers of standard deviations above the mean and percentages of scores more positive than various numbers of standard deviations above the mean.
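Going from a percentage back to a Z score can also be done exactly with the inverse of the normal curve. This sketch again relies on Python's `statistics.NormalDist` (our choice, not the book's method); `inv_cdf(p)` returns the Z score below which a proportion p of scores fall.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, SD 1

# Z score that cuts off the top 2%: 98% of scores fall below it.
z_top_2 = std_normal.inv_cdf(1 - 0.02)
print(round(z_top_2, 2))   # 2.05

# Percentage of scores actually above Z = 2.
print(round((1 - std_normal.cdf(2)) * 100, 2))   # 2.28
```

The exact cutoff (about 2.05) agrees with the rule-of-thumb answer of "at least 2 standard deviations"; the 2% figure is an approximation, since about 2.28% of scores lie above Z = 2.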

Similarly, suppose you were selecting animals for a study and needed to consider their visual acuity. Suppose also that visual acuity was normally distributed and you wanted to use animals in the middle two-thirds (a figure close to 68%) for visual acuity. In this situation, you would select animals that scored between 1 standard deviation above and 1 standard deviation below the mean. (That is, about 34% are between the mean and 1 standard deviation above the mean and another 34% are between the mean and 1 standard deviation below the mean.) Also, remember that a Z score is the number of standard deviations that a score is above or below the mean, which is just what we are talking about here. Thus, if you knew the mean and the standard deviation of the visual acuity test, you could figure out the raw scores (the actual level of visual acuity) for being 1 standard deviation below and 1 standard deviation above the mean (that is, Z scores of -1 and +1). You would do this using the methods of changing raw scores to Z scores and vice versa that you learned earlier in this chapter, which are Z = (X - M)/SD and X = (Z)(SD) + M.
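The visual acuity example can be sketched in a few lines. The text gives no actual mean or standard deviation for the acuity test, so `ACUITY_MEAN = 50` and `ACUITY_SD = 10` below are made-up values, and `raw_from_z` is a hypothetical helper applying X = (Z)(SD) + M. The last step uses `statistics.NormalDist` to show how close Z = ±1 comes to an exact middle two-thirds.

```python
from statistics import NormalDist

# Hypothetical visual acuity test: these values are made up for illustration.
ACUITY_MEAN = 50.0
ACUITY_SD = 10.0

def raw_from_z(z, mean, sd):
    """X = (Z)(SD) + M (hypothetical helper name)."""
    return z * sd + mean

# Rule-of-thumb cutoffs from the text: Z = -1 and Z = +1 (about 68% in between).
low = raw_from_z(-1, ACUITY_MEAN, ACUITY_SD)
high = raw_from_z(1, ACUITY_MEAN, ACUITY_SD)
print(low, high)   # 40.0 60.0

# Exact Z cutoff for the middle two-thirds: 1/6 of scores lie in each tail.
z_cut = NormalDist().inv_cdf(1 - 1 / 6)
print(round(z_cut, 2))   # 0.97
```

The exact cutoff of about Z = .97 is very close to 1, which is why using 1 standard deviation on either side is a reasonable approximation here.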

The Normal Curve Table and Z Scores

The 50%, 34%, and 14% figures are important practical rules for working with a group of scores that follow a normal distribution. However, in many research and applied situations, psychologists need more accurate information. Because the normal curve is a precise mathematical curve, you can figure the exact percentage of scores between any two points on the normal curve (not just those that happen to be right at 1 or 2 standard deviations from the mean). For example, exactly 68.59% of scores have a Z score between +.62 and -1.68; exactly 2.81% of scores have a Z score between +.79 and +.89; and so forth.
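The percentage between any two Z scores is the difference between the proportions of scores falling below each of them. A minimal sketch, assuming Python's `statistics.NormalDist` and a hypothetical helper named `percent_between`:

```python
from statistics import NormalDist

def percent_between(z_low, z_high):
    """Percentage of a normal curve lying between two Z scores (hypothetical helper)."""
    std_normal = NormalDist()   # mean 0, SD 1
    return (std_normal.cdf(z_high) - std_normal.cdf(z_low)) * 100

print(f"{percent_between(-1.68, 0.62):.2f}%")   # 68.59%
print(f"{percent_between(0.79, 0.89):.2f}%")    # 2.80%
```

Computed directly, the second range comes out to about 2.80% rather than 2.81%. The text's figure appears to come from subtracting rounded two-decimal table entries (31.33% for .89 minus 28.52% for .79), so small differences in the last decimal place are expected.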

You can figure these percentages using calculus, based on the formula for the normal curve. However, you can also do this much more simply (which you are probably glad to know!). Statisticians have worked out tables for the normal curve that give the percentage of scores between the mean (a Z score of 0) and any other Z score (as well as the percentage of scores in the tail for any Z score).

We have included a normal curve table in the Appendix (Table A-1, pp. 664-667). Table 3-1 shows the first part of the full table. The first column in the table lists the Z score. The second column, labeled "% Mean to Z," gives the percentage of scores between the mean and that Z score. The shaded area in the curve at the top of the column gives a visual reminder of the meaning of the percentages in the column. The third column, labeled "% in Tail," gives the percentage of scores in the tail for that Z score. The shaded tail area in the curve at the top of the column shows the meaning of the percentages in the column. Notice that the table lists only positive Z scores. This is because the normal curve is perfectly symmetrical. Thus, the percentage of scores between the mean and, say, a Z of +.98 (which is 33.65%) is exactly the same as the percentage of scores between the mean and a Z of -.98 (again 33.65%); and the percentage of scores in the tail for a Z score of +1.77 (3.84%) is the same as the percentage of scores in the tail for a Z score of -1.77 (again, 3.84%). Notice that for each Z score, the "% Mean to Z" value and the "% in Tail" value sum to 50.00. This is because exactly 50% of the scores are above the mean for a normal curve. For example, for the Z score of .57, the "% Mean to Z" value is 21.57% and the "% in Tail" value is 28.43%, and 21.57% + 28.43% = 50.00%.
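Rows of a normal curve table like Table A-1 can be reproduced from the normal curve itself. This sketch uses Python's `statistics.NormalDist`; the helper `table_row` is our own, and it only mimics the two percentage columns described above (it is not the book's table).

```python
from statistics import NormalDist

def table_row(z):
    """Return ("% Mean to Z", "% in Tail") for a positive Z score (hypothetical helper)."""
    below = NormalDist().cdf(z)         # proportion of scores below z
    mean_to_z = (below - 0.5) * 100     # area between the mean and z
    in_tail = (1 - below) * 100         # area beyond z
    return round(mean_to_z, 2), round(in_tail, 2)

print("Z     % Mean to Z  % in Tail")
for z in (0.57, 0.98, 1.77):
    mean_to_z, in_tail = table_row(z)
    print(f"{z:<5} {mean_to_z:>11.2f} {in_tail:>10.2f}")
```

The rows for .57, .98, and 1.77 match the values quoted above (21.57/28.43, 33.65, and a tail of 3.84), and each row's two percentages sum to 50. For a negative Z score, look up its absolute value, as the text explains.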

Suppose you want to know the percentage of scores between the mean and a Z score of .64. You just look up .64 in the "Z" column of the table and the "% Mean
