# LAB 2

hp17
Lab2Questions.docx

1. Question #2 - Page # 88, Use data P02_03.xlsx.

The file P02_03.xlsx contains data from a survey of 399

people regarding a government environmental policy.

a. Create a crosstabs and an associated column chart for

Gender versus Opinion. Express the counts as percentages

so that for either gender, the percentages add

to 100%. Discuss your findings. Specifically, do the

two genders tend to differ in their opinions about the

environmental policy?

b. Repeat part a with Age versus Opinion.

c. Recode Salary to be categorical with categories “Less

than \$40K,” “Between \$40K and \$70K,” “Between \$70K

and \$100K,” and “Greater than \$100K” (where you can

treat the breakpoints however you like). Then repeat

part a with this new Salary variable versus Opinion.

2.  Question #20 - Page #96, Use data P02_56.xlsx

The file P02_56.xlsx contains monthly values of indexes

that measure the amount of energy necessary to heat or cool

buildings due to outside temperatures. (See the explanation

in the Source sheet of the file.) These are reported for each

state in the United States and also for several regions, as

listed in the Locations sheet, from 1931 to 2000.

a. For each of the Heating Degree Days and Cooling

Degree Days sheets, create a new Season variable

with values “Winter,” “Spring,” “Summer,” and “Fall.”

Winter consists of December, January, and February;

Spring consists of March, April, and May; Summer

consists of June, July, and August; and Fall consists of

September, October, and November.

b. Find the mean, median, and standard deviation

of Heating Degree Days (HDD), broken down by

Season,

for the 48 contiguous states location (code

5999). (Ignore the first and last rows for the given location,

the ones that contain -9999, the code for missing

values.)

Also, create side-by-side box plots of HDD,

broken down by season. Comment on the results. Do they go in the direction you would expect? Do the

same for Cooling Degree Days (which has no missing data).

c. Repeat part b for California (code 0499).

d. Repeat part b for the New England group of states

(code 5801).

3.  Question #30 - Page #105, Use data P03_30.xlsx

rates of various currencies versus the U.S. dollar. It is of

interest to financial analysts and economists to see whether

exchange rates move together through time. You could find

the correlations between the exchange rates themselves,

but it is often more useful with time series data to check for

correlations between differences from day to day.

a. Create a column of differences for each currency.

b. Create a table of correlations between all of the original

variables. Then on the same sheet, create a second

table of correlations between the difference variables.

On this same sheet, enter two cutoff values, one positive

such as 0.6 and one negative such as 20.5, and

use conditional formatting to color all correlations (in

both tables) above the positive cutoff green and all

correlations below the negative cutoff red. Do it so

that the 1’s on the diagonal are not colored.

c. Based on the second table and your coloring, can you

conclude that these currencies tend to move together

in the same direction? If not, what can you conclude?

d. Repeat part c for Ratio2.

1. Question #49 - Page #125, Use data P03_22.xlsx

The file P03_22.xlsx lists financial data on movies

released from 1980 to 2011 with budgets of at least

\$20 million.

a. Create three new variables, Ratio1, Ratio2, and

Decade. Ratio1 should be US Gross divided by

Budget,

Budget, and Decade should list 1980s, 1990s, or

2000s, depending on the year of the release date.

If either US Gross or Worldwide Gross is listed as

“Unknown,” the corresponding ratio should be blank.

(Hint: For Decade, use the YEAR function to fill in a

new Year column. Then use a lookup table to populate

b. Use a pivot table to find counts of movies by various

distributors. Then go back to the data and create one

more column, Distributor New, which lists the distributor

for distributors with at least 30 movies and lists

Other for the rest. (Hint: Use a lookup table to populate

Distributor New, but also use an IF to fill in Other

where the distributor is missing.)

c. Create a pivot table and corresponding pivot chart

that shows average and standard deviation of Ratio1,

broken down by Distributor New, with Decade in the

Filters area. Comment on any

d. Repeat part c for Ratio2.

1.  Question #63 - Page #129, Use data P03_63.xlsx

63. The file P03_63.xlsx contains financial data on 85

U.S. companies in the Computer and Electronic Product

Manufacturing sector (NAICS code 334) with 2009

earnings before taxes of at least \$10,000. Each of these

companies listed R&D (research and development)

expenses on its income statement. Create a table of correlations

between all of the variables and use conditional

formatting to color green all correlations involving

R&D that are strongly positive or negative. (Use cutoff

values of your choice to define “strongly.”) Then create

scatterplots of R&D (Y axis) versus each of the other

most highly correlated variables. Comment on any patterns

you see in these scatterplots, including any obvious

outliers, and explain why (or if) it makes sense that

these variables are highly correlated with R&D. If there

are highly correlated variables with R&D, can you tell

which way the causality goes?