Statistics 200
WEEK 4 HOMEWORK: LANE CHAPTER 7 AND ILLOWSKY CHAPTERS 6 AND 7
THE NORMAL DISTRIBUTION Z-TABLES ARE ATTACHED AND YOU ARE TO USE THEM RATHER THAN SOFTWARE TO SOLVE THESE PROBLEMS. (THIS IS STRAIGHTFORWARD TABLE READING.)
THIS WEEK’S CONCEPTS ARE REALLY THE HEART OF OUR COURSE. PROBABILITY (FROM LAST WEEK) IS THE DRIVING FORCE BEHIND STATISTICS (AND THEORETICAL PHYSICS – WATCH THE PBS MOVIE: “PARTICLE FEVER”).
PRINT OUT THE NORMAL DISTRIBUTION TABLES (ONE PAGE) AND REVIEW THEM AS YOU READ ON. THESE AND OTHER TABLES ARE IN OUR COURSE: COURSE CONTENT > COURSE RESOURCES > STATISTICAL RESOURCES > STANDARD NORMAL DISTRIBUTION TABLE.
THE AREAS UNDER PARTS OF THIS GRAPH ARE ALL WE ARE TRYING TO FIGURE OUT IN STATISTICS. IT’S THAT EASY.
HERE IS HOW WE DO IT:
· WE HAVE OUR DATA SET OF X-VALUES, WHICH CAN BE NUMBERS REPRESENTING HEIGHTS, WEIGHTS, AGES, ETC., OR THE NUMBERS CAN BE THE MEANS OF OUR SAMPLES, OR THE VARIANCES OF OUR SAMPLES OR THE STANDARD DEVIATIONS OF OUR SAMPLES. WE ALSO HAVE THE MEANS, VARIANCES AND STANDARD DEVIATIONS OF THESE STATISTICS DATA SETS
· WE “STANDARDIZE” OR CONVERT THESE INDIVIDUAL DATA VALUES TO Z-VALUES: Z = (X – MEAN) / STANDARD DEVIATION (DO THE SUBTRACTION FIRST). THE STANDARDIZED “Z-VALUE” IS SIMPLY THE NUMBER OF STANDARD DEVIATIONS THAT OUR CONVERTED X-VALUE IS FROM THE MEAN (AND THE STANDARDIZED MEAN IS ALWAYS ZERO). THE Z-VALUES CAN BE POSITIVE (TO THE RIGHT OF THE MEAN) OR NEGATIVE (TO THE LEFT OF THE MEAN), JUST AS YOUR 30 DATA POINTS WERE ABOVE AND BELOW THE MEAN.
· IF WE HAVE A Z-VALUE AND WANT TO DETERMINE WHAT THE RAW DATA POINT (X-VALUE) WAS WE USE: X = Z * STANDARD DEVIATION + MEAN (DO THE MULTIPLICATION FIRST)
· TAKE ANOTHER LOOK AT THE GRAPH YOU MADE OF YOUR 30 DATA POINTS. SOME DATA POINTS WERE ABOVE THE MEAN AND SOME BELOW. WHEN YOU STANDARDIZED THEM THE MEAN BECAME ZERO, THE POINTS BELOW THE MEAN HAD A NEGATIVE Z-VALUE AND THOSE ABOVE IT HAD A POSITIVE Z-VALUE.
· THE AREAS IN THE TABLE SIMPLY CORRESPOND TO THE PROBABILITY OF A DATA POINT BEING LESS THAN OR EQUAL TO THAT Z-VALUE. SUBTRACT THAT AREA FROM 1.0000 AND WE HAVE THE PROBABILITY THAT OUR DATA POINT IS GREATER THAN OUR Z-VALUE. SO WHAT?
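THE STEPS ABOVE CAN BE SKETCHED IN A FEW LINES OF PYTHON. THIS IS ONLY AN ILLUSTRATION FOR CHECKING YOUR TABLE READING, NOT A REPLACEMENT FOR IT. THE FUNCTION NAMES (z_score, x_from_z, area_left) AND THE DATA VALUES (MEAN 50, STANDARD DEVIATION 8, X = 62) ARE MADE UP FOR THIS EXAMPLE; THE AREA COMES FROM PYTHON'S STANDARD-LIBRARY NormalDist RATHER THAN THE PRINTED TABLE.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standardize: do the subtraction first, then divide by the standard deviation."""
    return (x - mean) / sd

def x_from_z(z, mean, sd):
    """Undo the standardization: do the multiplication first, then add the mean."""
    return z * sd + mean

def area_left(z):
    """Area under the standard normal curve to the left of z (what the table lists)."""
    return NormalDist().cdf(z)

# Hypothetical data: mean 50, standard deviation 8, one raw value of 62.
z = z_score(62, 50, 8)      # 1.5 standard deviations above the mean
left = area_left(z)         # probability of a value less than or equal to 62
right = 1.0 - left          # probability of a value greater than 62
print(round(z, 2), round(left, 4), round(right, 4))  # 1.5 0.9332 0.0668
```

THE 0.9332 IS THE SAME NUMBER YOU WOULD READ FROM THE TABLE AT Z = 1.50.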
LONG STORY SHORT: THERE ARE SPECIFIC Z-VALUES OF INTEREST REFERRED TO AS “CRITICAL VALUES” AND THOSE ARE THE ONES THAT CORRESPOND TO THE SMALL (RARE) AREAS IN ONE OR BOTH “TAILS” OF OUR NORMAL DISTRIBUTION.
THESE CRITICAL Z-VALUES CORRESPOND TO “SIGNIFICANCE LEVELS” WHICH ARE THE AREAS TO THE LEFT OR RIGHT OF THAT CRITICAL Z-VALUE. THE COMMON SIGNIFICANCE LEVELS ARE 1%, 5%, AND 10% (OR 0.0100, 0.0500 AND 0.1000) AND THESE ARE THE AREAS IN THE BODY OF THE TABLES THAT ARE ALSO THE PROBABILITIES.
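FINDING A CRITICAL VALUE IS THE TABLE LOOKUP RUN BACKWARDS: START FROM AN AREA AND FIND THE Z-VALUE. A MINIMAL SKETCH, ASSUMING PYTHON'S STANDARD-LIBRARY NormalDist IN PLACE OF THE PRINTED TABLE. THE HELPER NAME critical_z IS HYPOTHETICAL, AND THE 2.5% TAIL IS USED ONLY AS AN ILLUSTRATION.

```python
from statistics import NormalDist

def critical_z(tail_area, tail="left"):
    """Z-value that cuts off `tail_area` in the chosen tail of the standard normal."""
    if tail == "left":
        return NormalDist().inv_cdf(tail_area)
    # The table only gives areas to the left, so a right-tail area of
    # `tail_area` means looking up the area 1 - tail_area instead.
    return NormalDist().inv_cdf(1.0 - tail_area)

# Illustration with a 2.5% tail.
print(round(critical_z(0.025, "left"), 2))   # -1.96
print(round(critical_z(0.025, "right"), 2))  # 1.96
```

BECAUSE THE CURVE IS SYMMETRIC, THE LEFT AND RIGHT CRITICAL VALUES FOR THE SAME TAIL AREA DIFFER ONLY IN SIGN.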
1) WHAT CRITICAL Z-VALUES (STANDARD DEVIATIONS) CORRESPOND TO THE -1%, -5% AND -10% AREAS UNDER THE CURVE? [THIS IS THE FAR LEFT AREA OF THE CURVE AND REMEMBER THAT THE TABLES GIVE AREAS TO THE LEFT SO YOU CAN JUST READ THESE Z-VALUES FROM THE TABLE.]
2) WHAT CRITICAL Z-VALUES (STANDARD DEVIATIONS) CORRESPOND TO THE +1%, +5% AND +10% AREAS UNDER THE CURVE? THESE ARE THE AREAS TO THE FAR RIGHT BUT YOU ONLY GET AREAS TO THE LEFT FROM THE TABLE, SO WHAT AREAS ARE YOU LOOKING FOR IN THE TABLE? [HINT: IF IT’S 1% TO THE RIGHT WHAT PERCENT MUST IT BE TO THE LEFT?]
THESE CRITICAL VALUES DON’T CHANGE AND YOU WILL USE THEM OFTEN, SO KEEP THESE CRITICAL VALUES HANDY.
REMEMBER WHEN WE WERE TRYING TO IDENTIFY “OUTLIERS” IN A DATA SET? ONE WAY WAS TO SEE IF OUR DATA POINT WAS MORE THAN 2 STANDARD DEVIATIONS ABOVE OR BELOW THE MEAN. NOTE THAT IN THE ABOVE GRAPH, 95.4% OF DATA IN A NORMAL DISTRIBUTION ARE IN THAT AREA OF THE CURVE (± 2 SD’S FROM THE MEAN). ABOUT 5% ARE NOT (ROUGHLY 2.5% AT EACH EXTREME). THIS IS A RULE OF THUMB SUBSTITUTE FOR THE CRITICAL VALUE.
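THE 95.4% FIGURE CAN BE CHECKED DIRECTLY AS THE AREA TO THE LEFT OF Z = +2 MINUS THE AREA TO THE LEFT OF Z = -2. A SHORT SKETCH, ASSUMING PYTHON'S STANDARD-LIBRARY NormalDist STANDS IN FOR THE TABLE; THE VARIABLE NAMES ARE ILLUSTRATIVE.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standardized curve: mean 0, standard deviation 1

# Area between -2 and +2 standard deviations.
within_2sd = std_normal.cdf(2.0) - std_normal.cdf(-2.0)
# Area in the far left tail alone (the right tail is the same size).
each_tail = std_normal.cdf(-2.0)

print(round(within_2sd, 4))  # 0.9545
print(round(each_tail, 4))   # 0.0228
```

EACH TAIL BEYOND 2 STANDARD DEVIATIONS IS CLOSER TO 2.3% THAN 2.5%, WHICH IS WHY THE 2 SD TEST IS ONLY A RULE OF THUMB.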
HERE IS HOW WE USE THESE Z-VALUES TO SEE IF OUR DATA ARE IN THE “RARE” OR “UNUSUAL” AREAS TO THE FAR LEFT (OR RIGHT) OF OUR NORMAL DISTRIBUTION. WHY DO WE CARE IF DATA ARE RARE? YOU WILL SEE.
3) LET’S SEE IF ANY OF YOUR 30 DATA POINTS WOULD BE CONSIDERED “UNUSUAL”.
a) STANDARDIZE ALL 30 OF YOUR DATA POINTS (X-VALUES) TO GET 30 Z-VALUES. THESE Z-VALUES ARE THE NUMBER OF STANDARD DEVIATIONS THAT DATA POINT IS FROM THE MEAN. LIST THE X AND Z VALUES SIDE BY SIDE.
WE THEN DECIDE AT WHAT SIGNIFICANCE LEVEL WE WOULD CONSIDER A DATA POINT “UNUSUAL”. IF WE CHOSE A SIGNIFICANCE LEVEL OF 10% THAT MEANS THAT A DATA POINT WOULD HAVE TO HAVE A POSITIVE Z-VALUE (STANDARD DEVIATION) THAT CORRESPONDED TO A TABLE AREA OF 0.9000 TO THE LEFT (SINCE THIS DATA POINT IS IN THE 10% AREA IN THE FAR RIGHT TAIL). OR, IF THE DATA POINT HAD A NEGATIVE Z-VALUE, ITS VALUE WOULD HAVE TO CORRESPOND TO A TABLE AREA OF 0.1000 (10%) IN THE FAR LEFT TAIL OF THE CURVE.
A SIGNIFICANCE LEVEL OF 5% WOULD NEED A +Z VALUE CORRESPONDING TO A TABLE AREA OF 0.9500 TO THE LEFT (LEAVING 0.0500 TO THE FAR RIGHT). OR A –Z-VALUE CORRESPONDING TO A TABLE AREA OF SIMPLY 0.0500 TO THE LEFT IN THE FAR LEFT TAIL.
b) FILL IN THE BLANKS : A SIGNIFICANCE LEVEL OF 1% WOULD NEED A +Z VALUE CORRESPONDING TO A TABLE AREA OF ________ TO THE LEFT (LEAVING ________ TO THE FAR RIGHT). OR A –Z-VALUE CORRESPONDING TO A TABLE AREA OF SIMPLY _________ TO THE LEFT IN THE FAR LEFT TAIL
c) IN QUESTIONS (1) AND (2) YOU DETERMINED THE CRITICAL Z-VALUES FOR SIGNIFICANCE LEVELS OF 10%, 5% AND 1% IN EACH TAIL, SO COMPARE YOUR STANDARDIZED DATA TO THEM AND LIST YOUR Z-VALUES AND ORIGINAL X-VALUES THAT ARE “UNUSUAL” AT THESE SIGNIFICANCE LEVELS. DOES THIS TELL YOU ANYTHING ABOUT YOUR NUMBER PICKING? WHAT?
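THE PROCEDURE IN QUESTION 3 (STANDARDIZE EVERY POINT, THEN COMPARE EACH Z-VALUE TO THE CRITICAL VALUE) CAN BE SKETCHED AS FOLLOWS. THE DATA SET IS HYPOTHETICAL, THE HELPER NAME flag_unusual IS MADE UP FOR THIS EXAMPLE, AND THE USE OF THE SAMPLE STANDARD DEVIATION IS AN ASSUMPTION; USE WHICHEVER STANDARD DEVIATION YOU CALCULATED FOR YOUR OWN 30 POINTS.

```python
from statistics import NormalDist, mean, stdev

def flag_unusual(data, tail_area):
    """Return (x, z) pairs whose z-value lands in either tail of size `tail_area`."""
    m = mean(data)
    s = stdev(data)  # sample standard deviation (assumption; pstdev is the alternative)
    cutoff = NormalDist().inv_cdf(1.0 - tail_area)  # positive critical z-value
    pairs = [(x, (x - m) / s) for x in data]        # x and z side by side
    return [(x, z) for x, z in pairs if z > cutoff or z < -cutoff]

# Hypothetical data set (ten values instead of thirty, to keep the listing short).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
for x, z in flag_unusual(data, 0.05):
    print(x, round(z, 2))
```

WITH THESE MADE-UP NUMBERS ONLY THE VALUE 50 IS FLAGGED AT THE 5% LEVEL; EVERY OTHER POINT IS WITHIN ABOUT 0.6 STANDARD DEVIATIONS OF THE MEAN.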
WE USE THIS SAME METHODOLOGY IN STATISTICAL HYPOTHESIS TESTING. WE CALCULATE A “TEST STATISTIC” BASED ON THE SAMPLE AND POPULATION DATA WE HAVE AND THEN COMPARE IT TO THE CRITICAL VALUES AT THE SIGNIFICANCE LEVEL WE HAVE CHOSEN (THE SAME CRITICAL VALUES WE HAVE IDENTIFIED ABOVE STILL APPLY).
IF THE TEST STATISTIC IS GREATER THAN THE POSITIVE (+) CRITICAL Z-VALUE WE ARE IN THE “UNUSUAL” OR RARE AREA IN THE RIGHT TAIL OF THE NORMAL DISTRIBUTION AND WE WOULD “REJECT” OUR HYPOTHESIS. OR, IF THE TEST STATISTIC IS LESS THAN THE NEGATIVE (–) CRITICAL VALUE IT IS ALSO IN THE RARE AREA IN THE LEFT TAIL AND AGAIN WE REJECT OUR HYPOTHESIS. BE CAREFUL WITH THE NEGATIVES: WHILE A Z-VALUE OR STANDARD DEVIATION OF +2.36 IS GREATER THAN +2.34 (HENCE REJECT), -2.36 IS SMALLER (FURTHER LEFT IN THE TAIL) THAN -2.34 AND AGAIN WE REJECT.
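THE COMPARISON RULE ABOVE, INCLUDING THE CARE NEEDED WITH NEGATIVES, CAN BE WRITTEN AS ONE SMALL FUNCTION. THIS IS A SKETCH ONLY: THE NAME in_rejection_region IS HYPOTHETICAL, AND 2.34 IS TREATED AS THE CRITICAL VALUE PURELY BECAUSE IT IS THE NUMBER USED IN THE EXAMPLE ABOVE.

```python
def in_rejection_region(test_stat, critical):
    """True if the test statistic lies beyond +critical or below -critical.

    `critical` is the positive critical z-value for the chosen significance level.
    """
    critical = abs(critical)
    return test_stat > critical or test_stat < -critical

# The comparisons from the text, with 2.34 standing in as the critical value.
print(in_rejection_region(2.36, 2.34))    # True: further right than +2.34
print(in_rejection_region(-2.36, 2.34))   # True: further left than -2.34
print(in_rejection_region(-2.30, 2.34))   # False: between -2.34 and +2.34
```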
4) THE SPEED OF VEHICLES ALONG A STRETCH OF I-95 HAS AN APPROXIMATELY NORMAL DISTRIBUTION WITH A MEAN OF 75 MPH AND A STANDARD DEVIATION OF 10 MPH.
(a). THE SPEED LIMIT IS 70 MPH. WHAT IS THE PROPORTION OF VEHICLES GOING LESS THAN OR EQUAL TO THE SPEED LIMIT?
(b) WHAT PROPORTION OF THE VEHICLES WOULD BE GOING LESS THAN 60 MPH?
(c) A NEW SPEED LIMIT WILL BE INITIATED SUCH THAT APPROXIMATELY 10% OF VEHICLES WILL BE OVER THAT SPEED LIMIT. WHAT IS THE NEW SPEED LIMIT BASED ON THIS CRITERION?
(d) DO YOU THINK THE ACTUAL DISTRIBUTION (HOW THE CURVE LOOKS) OF SPEEDS DIFFERS FROM A NORMAL BELL-SHAPED DISTRIBUTION?
5) STUDENTS TAKE A STATISTICS TEST. THE GRADE DISTRIBUTION IS NORMAL WITH A MEAN OF 30 AND A STANDARD DEVIATION OF 6.
(a) ANYONE WHO SCORES IN THE TOP 20% OF THE DISTRIBUTION GETS A GRADE OF “A” OR “B”. WHAT IS THE LOWEST SCORE SOMEONE CAN GET AND STILL GET A “B”?
(b) THE BOTTOM 20% GET A “D” OR “F”. WHAT IS THE LOWEST SCORE THAT STILL PASSES WITH A “C”?
6) WE CAN USE THE NORMAL DISTRIBUTION TO APPROXIMATE THE BINOMIAL DISTRIBUTION. YOU REMEMBER THE COMPLICATED BINOMIAL EQUATIONS (WEEK 1 BONUS)? THE EQUATIONS USING THE NORMAL TO APPROXIMATE THE BINOMIAL ARE MUCH SIMPLER, BUT THEY ARE NOT AS PRECISE. HERE IS A BINOMIAL PROBLEM:
(a) WHAT IS THE PROBABILITY OF GETTING 16 TO 18 HEADS OUT OF 25 FLIPS? WE COULD USE THE PRECISE BINOMIAL EQUATION TO CALCULATE THE PROBABILITY OF GETTING 16, 17 AND 18 HEADS AND ADD THOSE INDIVIDUAL PROBABILITIES UP TO GET THE FINAL PROBABILITY. THIS IS NOT REQUIRED, BUT YOU CAN DO IT FOR 0.5 BONUS POINTS.
(b) THE NORMAL DISTRIBUTION APPROXIMATION FOR THIS BINOMIAL WOULD USE A RANGE FROM 15.5 TO 18.5 WITH A CONTINUOUS DISTRIBUTION (THAT IS, THE BINOMIAL OF 16 IS APPROXIMATED BY THE NORMAL OF 15.5 TO 16.5, ETC.). WHAT IS THE TOTAL PROBABILITY YOU GET WHEN USING THE NORMAL APPROXIMATION EQUATIONS? THIS ONE IS REQUIRED. WRITE YOUR ANSWERS OUT TO FOUR DECIMAL PLACES.
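TO SEE HOW CLOSE THE APPROXIMATION GETS, HERE IS A SKETCH THAT COMPUTES BOTH THE EXACT BINOMIAL SUM AND THE NORMAL APPROXIMATION WITH THE HALF-UNIT ADJUSTMENT DESCRIBED ABOVE. IT USES A DIFFERENT, HYPOTHETICAL PROBLEM (10 TO 12 HEADS IN 20 FLIPS) SO THAT THE HOMEWORK ANSWER IS STILL YOURS TO WORK OUT. THE FUNCTION NAMES ARE MADE UP, AND THE SKETCH ASSUMES THE USUAL APPROXIMATION WITH MEAN n*p AND STANDARD DEVIATION sqrt(n*p*(1-p)).

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_exact(n, p, lo, hi):
    """Exact P(lo <= X <= hi) for a binomial, adding up each individual probability."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

def binomial_normal_approx(n, p, lo, hi):
    """Normal approximation over the continuous range lo - 0.5 to hi + 0.5."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    dist = NormalDist(mean, sd)
    return dist.cdf(hi + 0.5) - dist.cdf(lo - 0.5)

# Hypothetical example: 10 to 12 heads in 20 flips of a fair coin.
exact = binomial_exact(20, 0.5, 10, 12)
approx = binomial_normal_approx(20, 0.5, 10, 12)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

FOR THIS MADE-UP CASE THE TWO RESULTS AGREE TO ABOUT THREE DECIMAL PLACES, WHICH SHOWS WHAT “NOT AS PRECISE” MEANS IN PRACTICE.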
7) HERE IS A GRAPH OF A NORMAL DISTRIBUTION. DRAW OVER IT WHAT A DISTRIBUTION WOULD LOOK LIKE:
(a) IF IT HAD THE SAME MEAN BUT A SMALLER STANDARD DEVIATION.
(b) IF IT HAD THE SAME MEAN BUT A LARGER STANDARD DEVIATION.
(c) WHY DON’T THE ENDS TOUCH THE ZERO LINE?
8) HEIGHT AND WEIGHT ARE TWO MEASUREMENTS USED TO TRACK A CHILD’S DEVELOPMENT. THE WEIGHTS FOR ALL 11 YEAR OLD GIRLS, 4’ 8” TALL IN A REFERENCE POPULATION HAD A MEAN OF µ = 74 POUNDS WITH A STANDARD DEVIATION OF σ = 2 LBS. ASSUME THESE WEIGHTS ARE NORMALLY DISTRIBUTED. CALCULATE THE Z-SCORES THAT CORRESPOND TO THE FOLLOWING WEIGHTS AND INTERPRET THEM. THIS IS USEFUL STATISTICS.
(a) 70 LBS
(b) 86 LBS
(c) 60 LBS
(d) IF YOU WERE THE PARENT OF ANY OF THESE CHILDREN, WOULD YOU BE CONCERNED? WHY OR WHY NOT?
9) A LEGAL STATISTICAL PROBLEM: A PATERNITY LAWSUIT. THE LENGTH OF A PREGNANCY IS NORMALLY DISTRIBUTED WITH A MEAN OF 280 DAYS AND A STANDARD DEVIATION OF 13 DAYS. AN ALLEGED FATHER WAS OUT OF THE COUNTRY FROM 240 TO 306 DAYS BEFORE THE BIRTH OF THE CHILD, SO THE PREGNANCY WOULD HAVE BEEN LESS THAN 240 DAYS OR MORE THAN 306 DAYS LONG IF HE WAS THE FATHER. A HEALTHY CHILD WAS BORN WITH NO COMPLICATIONS, BUT:
(a) WHAT IS THE PROBABILITY THAT HE WAS NOT THE FATHER?
(b) WHAT IS THE PROBABILITY THAT HE COULD BE THE FATHER?
(HINT: CALCULATE THE Z-SCORES FIRST, AND THEN USE THOSE TO DETERMINE THE PROBABILITIES)
10) SUPPOSE THAT THE DISTANCE OF FLY BALLS HIT TO THE OUTFIELD (IN BASEBALL) IS NORMALLY DISTRIBUTED WITH A MEAN OF 240 FEET AND A STANDARD DEVIATION OF 40 FEET. WE RANDOMLY SAMPLE 50 FLY BALLS. IF X̄ = AVERAGE DISTANCE IN FEET FOR 50 FLY BALLS, THEN
(a) WHAT IS THE PROBABILITY THAT THE 50 FLY BALLS TRAVELED AN AVERAGE OF LESS THAN 230 FEET? SKETCH THE GRAPH. SCALE THE HORIZONTAL AXIS FOR X̄. SHADE THE REGION CORRESPONDING TO THE PROBABILITY. FIND THE PROBABILITY.
(b) FIND THE 75TH PERCENTILE OF THE DISTRIBUTION OF THE AVERAGE OF 50 FLY BALLS.
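WHEN THE X-VALUES ARE SAMPLE MEANS RATHER THAN INDIVIDUAL DATA POINTS, THE SAME Z = (X – MEAN) / STANDARD DEVIATION CONVERSION APPLIES, BUT THE STANDARD DEVIATION OF THE SAMPLE MEAN IS THE POPULATION STANDARD DEVIATION DIVIDED BY THE SQUARE ROOT OF THE SAMPLE SIZE. A SKETCH UNDER THAT ASSUMPTION, USING A HYPOTHETICAL POPULATION (MEAN 100, STANDARD DEVIATION 15, SAMPLES OF 36) AND A MADE-UP HELPER NAME, sample_mean_dist:

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_dist(pop_mean, pop_sd, n):
    """Distribution of the mean of n draws: same mean, sd divided by sqrt(n)."""
    return NormalDist(pop_mean, pop_sd / sqrt(n))

# Hypothetical population: mean 100, standard deviation 15, samples of size 36.
xbar = sample_mean_dist(100, 15, 36)
print(round(xbar.stdev, 2))          # 2.5
print(round(xbar.cdf(95), 4))        # 0.0228, since z = (95 - 100) / 2.5 = -2
print(round(xbar.inv_cdf(0.90), 2))  # 90th percentile of the sample mean
```

NOTE HOW MUCH NARROWER THE CURVE FOR THE SAMPLE MEAN IS THAN THE CURVE FOR INDIVIDUAL VALUES: 2.5 INSTEAD OF 15.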