Using Excel for Statistical Analysis and Visualization


Lab: Using Excel for Statistical Analysis and Visualization

PUP 424—Planning Methods

Assignment Overview

In addition to giving you more practice accessing data from the US Census website, this assignment gives you the opportunity to use Excel to perform various statistical analyses of your data and to visualize your data as information ready to be presented. Create a Word document in which you will answer the questions listed below. Submit your answers via the link on Canvas found under the “Assignments” tab. This assignment is due by midnight on Monday night, 3/23.

Assignment Steps

1. Open the file titled “Lab 5 Data”. You can find it as an attachment to this assignment on Canvas. This table provides data by zip code in the Phoenix metro area for housing and transportation-related characteristics. Let’s say you want to create some charts from the data. You want to create two pie charts that compare the proportion of vacant vs. occupied housing units in the 85003 and 85251 zip codes. First, highlight the cells containing the number of occupied and vacant housing units for 85003 (cells C3 and D3), as well as the cells containing the column headings (cells C2 and D2). Under the “Insert” tab, select the 2-D pie chart option to create the first visual. Under the “Chart Layouts” option at the top left, select a layout that includes a chart title and shows the percentages. Change the chart title to “Housing Occupancy in 85003” and adjust the text font and color so that it can easily be seen on the chart. (Your chart might look something like Figure 3.)

Question #1: Cut and paste your chart from the Excel document into your Word document.

Question #2: Now complete the same steps to create a pie chart for the 85251 zip code. Cut and paste the results in your Word document.

Figure 3

4. Now let’s say you want to determine the mean (average) number of mobile home units in Phoenix zip codes. You can do this either by adding the mobile home counts (using Excel’s SUM function) and dividing the total by the number of zip codes in the sample OR by using Excel’s built-in AVERAGE function. Ask a classmate or your TA if you need help!
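
The two approaches to the mean give the same answer. As a minimal sketch outside Excel, the Python snippet below computes a mean both ways; the five counts are made-up illustration values, not the Lab 5 data.

```python
from statistics import mean

# Hypothetical mobile home counts for five zip codes (NOT the Lab 5 data).
mobile_homes = [0, 12, 45, 130, 313]

# Method 1: add the counts and divide by the number of zip codes,
# comparable to dividing a SUM result by the number of rows.
mean_by_sum = sum(mobile_homes) / len(mobile_homes)

# Method 2: use a built-in average, comparable to Excel's AVERAGE function.
mean_builtin = mean(mobile_homes)

print(mean_by_sum, mean_builtin)
```

If the two results ever differ in your spreadsheet, check that both formulas cover the same range of rows.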

Question #3: What is the average number of mobile home units?

You also want to compare the median to the mean that you just calculated. Again, there are two ways to do this. One is to sort your data by the mobile home count column (M), and the value in the middle of your data will be the median (take the total number of data points + 1, and then divide by 2 to determine which record contains the median). To perform the sort, first highlight all the cells in the spreadsheet that contain data (rows 3 through 165 from columns A to AH), but do not highlight the column headings (in rows 1 or 2). Under the “Data” tab select “Sort” and choose “Sort by” Column M (the column containing the mobile home counts) and choose order “Smallest to Largest.” Now that the data are ordered, find the median. Alternatively, you can use Excel’s MEDIAN function.
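
To see how the "(data points + 1) divided by 2" rule works, here is a minimal Python sketch using seven made-up counts (not the Lab 5 data). If every row from 3 through 165 holds a value, the same rule applied to those 163 records gives (163 + 1) / 2 = 82, so the median is the 82nd record after sorting.

```python
from statistics import median

# Hypothetical mobile home counts for seven zip codes (NOT the Lab 5 data).
counts = [40, 0, 313, 12, 7, 130, 45]

# Sort smallest to largest, as the Data > Sort step does.
ordered = sorted(counts)

# For an odd number of records, the median sits at position (n + 1) / 2,
# counting from 1.
n = len(ordered)
position = (n + 1) // 2
median_by_position = ordered[position - 1]  # Python lists count from 0

# Built-in equivalents of Excel's MEDIAN, MIN, and MAX functions.
median_builtin = median(counts)
smallest = min(counts)
largest = max(counts)

print(position, median_by_position, median_builtin, smallest, largest)
```

Once the data are sorted, the minimum and maximum are simply the first and last records.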

Question #4: What is the median number of mobile home units? What is the minimum number of mobile home units in a zip code? What is the maximum number?

5. Finally, you want to conduct an analysis of the impacts of low-density single-family “sprawled” development on commuting travel time. You want to analyze the correlation between the percentage of 1-unit detached housing structures and the percentage of workers commuting longer than 30 minutes. To do this, you will first need to add a few columns to your table to calculate the percentages. First, add a column to the right of the “1-unit, detached” column. (To do this, highlight the column to the right of “1-unit, detached,” right-click, and select “Insert” to make a new column appear.) Label the new column “Percent Single-Family” (in cell G2). Next, in cell G3 type “=F3/E3” to divide the number of 1-unit detached houses by the total number of housing units in the zip code. With cell G3 highlighted, under the “Home” tab click the “%” symbol to convert the result into a percentage. Copy and paste this formula into the remaining cells in column G (or drag cell G3 down). You should now have percentages of single-family units for each zip code.
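
The arithmetic behind the “=F3/E3” formula can be sketched in a few lines of Python. The zip labels and unit counts below are hypothetical placeholders, not values from the Lab 5 table.

```python
# Hypothetical rows: (label, total housing units, 1-unit detached units).
# These are NOT values from the Lab 5 table.
rows = [("zip A", 2000, 500), ("zip B", 8000, 6000), ("zip C", 5000, 1250)]

percent_single_family = {}
for label, total_units, detached in rows:
    # Same arithmetic as =F3/E3: detached units divided by total units.
    percent_single_family[label] = detached / total_units

# Showing the value as a percentage only changes the display, not the number,
# just like clicking the "%" button in Excel.
for label, share in percent_single_family.items():
    print(f"{label}: {share:.0%}")
```

Each result is a fraction between 0 and 1; the percentage format simply displays 0.25 as 25%.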

Question #5: What percentage of units in the 85013 zip code are single-family?

Question #6: Calculate the mean percentage of single-family homes per zip code (see step 4 above). What is the mean percentage?

Question #7: Sort the data based on the numbers in column G (see step 4 above). What is the median percentage of single-family homes per zip code?

Now add a column on the right side of the table and label it “Total Commuters.” Since you want to find out how many workers commute in each zip code, in cell AJ3 type “=SUM(X3:AI3)” to add across the row. Copy and paste this formula into the remaining cells down column AJ to get your totals for each zip code.

Next, add a column to the right labeled “30 Minutes Plus” and type “=SUM(AD3:AI3)” in cell AK3. This will add the counts for all commuters who travel 30 or more minutes to work (columns AD through AI). Copy and paste this formula into the remaining cells down column AK to get your totals for each zip code.

Now, add a column to the right labeled “% Commute 30+” and type “=AK3/AJ3” in cell AL3. This will divide the number of 30+ commuters by the total number of commuters to give you a percentage. (Be sure to change the value to a %, as you did for column G in step 5 above.) Copy and paste this formula into the remaining cells down column AL to get your percentages for each zip code.
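
The three commuter columns build on one another. As a rough sketch, the Python below repeats the same arithmetic for a single row; the twelve bracket counts are invented for illustration and stand in for columns X through AI, with the last six standing in for columns AD through AI.

```python
# Hypothetical commuter counts for ONE zip code in 12 travel-time brackets,
# ordered shortest to longest (standing in for columns X through AI).
# These are NOT values from the Lab 5 table.
brackets = [50, 100, 150, 200, 150, 100, 90, 60, 40, 30, 20, 10]

# Total commuters, like =SUM(X3:AI3).
total_commuters = sum(brackets)

# Commuters traveling 30+ minutes, like =SUM(AD3:AI3): the last six brackets.
thirty_plus = sum(brackets[6:])

# Share of commuters traveling 30+ minutes, like =AK3/AJ3.
share_thirty_plus = thirty_plus / total_commuters

print(total_commuters, thirty_plus, f"{share_thirty_plus:.0%}")
```

A quick sanity check for your own table: the 30+ count can never exceed the total, so every value in column AL should fall between 0% and 100%.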

Question #8: What percentage of workers living in the 85258 zip code commute 30 minutes or longer?

Now, create a scatter plot that compares the percentage of single-family homes (column G) with the percentage of workers commuting 30 minutes or longer (column AL). To do this, hold down the Ctrl key and highlight the data in column G (cells G3 through G165) and the data in column AL (cells AL3 through AL165). (Make sure all the data are highlighted.) Under the “Insert” tab select the Scatter chart (with only markers) option. Change the layout so your scatter plot includes a chart title and axis titles. Change the chart title to “Long Commutes and Single-Family Living,” the X-axis to “Percent Single-Family Homes” and the Y-axis to “Percent Commutes 30+ Minutes.” (Delete the legend from your scatter plot, since there is only one series.)

To complete your scatter plot, insert a line that represents the line of best fit. Click on the scatter plot points, right-click, and go to “Add Trendline”. Then choose “Linear” as the type.
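
A linear trendline is a least-squares line of best fit. The sketch below computes the slope, the intercept, and the correlation coefficient by hand in Python for four made-up points (not the Lab 5 data), so you can see what the trendline summarizes.

```python
from math import sqrt

# Hypothetical percentages for four zip codes (NOT the Lab 5 data).
pct_single_family = [20, 40, 60, 80]   # x values
pct_commute_30_plus = [25, 30, 40, 45]  # y values

n = len(pct_single_family)
mean_x = sum(pct_single_family) / n
mean_y = sum(pct_commute_30_plus) / n

# Sums of squared and cross deviations from the means.
sxx = sum((x - mean_x) ** 2 for x in pct_single_family)
syy = sum((y - mean_y) ** 2 for y in pct_commute_30_plus)
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(pct_single_family, pct_commute_30_plus))

# Least-squares line of best fit: y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Pearson correlation coefficient between the two variables.
r = sxy / sqrt(sxx * syy)

print(slope, intercept, round(r, 3))
```

An upward-sloping line with a correlation near 1 would suggest that zip codes with more single-family homes also tend to have more long commutes; a flat line or a correlation near 0 would suggest little linear relationship.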

Question #9: Cut and paste your scatter plot (with line of best fit drawn) in your Word document.

Finally, you’ll create one last chart. This one will represent the distribution of vehicles available to households in the entire dataset. First, you’ll need to sum up the number of households in each vehicles-available category across all zip codes. Use Excel’s SUM function for this. Then, select your newly summed cells and create a pie chart of these values (Insert -> Chart, and then choose a pie chart option). Label your chart with a title, and ideally with labels for each pie slice (you can get Excel to label them for you if you select both the newly summed cells and the pie slice titles when you create your chart).
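
The pie chart shows each category's share of the summed totals. As an illustration only, the Python sketch below sums made-up household counts down each column and converts the totals to shares; the category names and numbers are hypothetical, so use the headings and values in your own table.

```python
# Hypothetical vehicle categories and household counts for three zip codes.
# Neither the names nor the numbers come from the Lab 5 table.
categories = ["No vehicle", "1 vehicle", "2 vehicles", "3 or more vehicles"]
zip_rows = [
    [100, 300, 250, 100],
    [50, 300, 250, 100],
    [50, 200, 200, 100],
]

# Sum each category down the column, like one SUM formula per column.
totals = [sum(column) for column in zip(*zip_rows)]

# Each pie slice is that category's share of all households.
all_households = sum(totals)
shares = [total / all_households for total in totals]

for name, total, share in zip(categories, totals, shares):
    print(f"{name}: {total} households ({share:.0%})")
```

The shares should add up to 100%, which is a quick way to confirm that every category was included in the chart.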

Question #10: Cut and paste your pie chart into your Word document.

Your Word document with answers to these questions and your Excel table from step 2 are both due via the link on Canvas found under the “Assignments” tab before midnight on Monday night, 3/23.
