# RStudio, dplyr, ggplot2

Rshah250

View(d)

# Name of the dataset is "d".

# View the dataset. It describes US counties:

# each county's state, population, income, etc.

# QUESTIONS

# 1) What percentage of the dataset is populated with NA values?

#    Hint: Calculate total NAs then divide by (number of rows X number of columns)
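# A minimal sketch of the hint for question 1. The data frame `toy` is a
# made-up stand-in for d (its values are illustrative only); swap in d
# to get the real answer.

```r
# Toy stand-in for d; values are invented for illustration.
toy <- data.frame(
  name    = c("County A", "County B", "County C"),
  pop2017 = c(1000, NA, 3000),
  poverty = c(NA, 12.5, 20.1)
)

total_na    <- sum(is.na(toy))        # number of NA cells
total_cells <- nrow(toy) * ncol(toy)  # rows x columns
pct_na      <- 100 * total_na / total_cells
pct_na
```

# Equivalently, 100 * mean(is.na(toy)) gives the same percentage in one step.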

# 2) Show the county name and population in 2017 for the state of Connecticut

#    where unemployment rate is greater than 5.0.
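# One possible approach to question 2, sketched on a made-up stand-in `toy`.
# The column names `name` and `state` are assumptions (the questions only
# name pop2017 and unemployment_rate); check names(d) before reusing this.

```r
# Toy stand-in for d; values are invented for illustration.
toy <- data.frame(
  name              = c("County A", "County B", "County C", "County D"),
  state             = c("Connecticut", "Connecticut", "Other State", "Connecticut"),
  pop2017           = c(1000, 2000, 3000, 4000),
  unemployment_rate = c(4.6, 5.2, 6.0, NA)
)

# subset() keeps rows where the condition is TRUE and drops rows where it is NA
res <- subset(toy,
              state == "Connecticut" & unemployment_rate > 5.0,
              select = c(name, pop2017))
res
```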

# 3) Show the county name, state and per capita income for counties whose population

#    change is positive and unemployment rate is less than 2.
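# A sketch for question 3 using bracket indexing instead of subset().
# `toy` is a made-up stand-in for d, and the column names `name` and `state`
# are assumptions. which() is used so that rows with NA in either condition
# are dropped rather than returned as all-NA rows.

```r
# Toy stand-in for d; values are invented for illustration.
toy <- data.frame(
  name              = c("County A", "County B", "County C", "County D"),
  state             = c("State X", "State X", "State Y", "State Y"),
  per_capita_income = c(25000, 31000, 28000, 40000),
  pop_change        = c(1.2, -0.5, 3.4, NA),
  unemployment_rate = c(1.8, 1.5, 2.6, 1.9)
)

# Row positions where both conditions are TRUE (NA results are discarded)
keep <- which(toy$pop_change > 0 & toy$unemployment_rate < 2)
res  <- toy[keep, c("name", "state", "per_capita_income")]
res
```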

# 4) We are wondering about population change from 2010 to 2017

#    in counties where poverty rate is greater than 20 and

#    unemployment rate is greater than 8.

#    To do this, add a new column to the dataset d

#    using the following formula:

#    popChng17_20 = (pop2017 - pop2010)/pop2010.

#
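# A minimal base-R sketch for question 4, run on a made-up stand-in `toy`
# rather than on d itself. The new column follows the formula given above;
# the object name `high_need` is hypothetical.

```r
# Toy stand-in for d; values are invented for illustration.
toy <- data.frame(
  pop2010           = c(1000, 2000, 500, 800),
  pop2017           = c(900, 2100, 450, NA),
  poverty           = c(25, 10, 30, 22),
  unemployment_rate = c(9, 4, 8.5, 10)
)

# Relative population change from 2010 to 2017, added as a new column
toy$popChng17_20 <- (toy$pop2017 - toy$pop2010) / toy$pop2010

# Counties with poverty rate > 20 and unemployment rate > 8
high_need <- subset(toy, poverty > 20 & unemployment_rate > 8)
summary(high_need$popChng17_20)
```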

# 5) The population in "Hoonah Angoon Census Area" of Alaska in 2017

#    is missing (i.e. NA). However, a quick Google search showed that

#    this population is actually 2139. Now replace the NA in this

#    spot with 2139.
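# A sketch of the replacement in question 5 on a made-up stand-in `toy`.
# The column names `name` and `state` are assumptions, and the exact spelling
# of the county name in d may differ (for example, it may contain a hyphen),
# so it is worth checking that `idx` is not empty before assigning.

```r
# Toy stand-in for d; values are invented for illustration.
toy <- data.frame(
  name    = c("Hoonah Angoon Census Area", "County B"),
  state   = c("Alaska", "Alaska"),
  pop2017 = c(NA, 5000)
)

# Locate the row, then overwrite only that cell
idx <- which(toy$name == "Hoonah Angoon Census Area" & toy$state == "Alaska")
toy$pop2017[idx] <- 2139
toy
```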

# 6) We are wondering about mean poverty level for metro and non-metro counties

#    in the state of Connecticut. Calculate them. Which one is higher?

#    (NOTE: To ignore NAs, use na.rm = TRUE when calculating means)
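# One way to sketch question 6 with tapply(), on a made-up stand-in `toy`.
# The column name `state` and the coding of metro as "yes"/"no" are
# assumptions; check unique(d$metro) for the actual values.

```r
# Toy stand-in for d; values are invented for illustration.
toy <- data.frame(
  state   = c("Connecticut", "Connecticut", "Connecticut", "Connecticut", "Other State"),
  metro   = c("yes", "yes", "no", "no", "no"),
  poverty = c(10, NA, 8, 14, 30)
)

ct <- subset(toy, state == "Connecticut")

# Mean poverty per metro category, ignoring NAs
m <- tapply(ct$poverty, ct$metro, mean, na.rm = TRUE)
m
names(which.max(m))  # category with the higher mean
```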

# 7) Which year has the highest variation in terms of county populations?

#    Is it 2000, 2010 or 2017? (NOTE: To ignore NAs, use na.rm = TRUE )
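# A sketch for question 7, assuming "variation" means the sample variance
# (sd() would give the same ranking). `toy` is a made-up stand-in for d, and
# the column name pop2000 is an assumption modelled on pop2010 and pop2017.

```r
# Toy stand-in for d; values are invented for illustration.
toy <- data.frame(
  pop2000 = c(100, 200, 300),
  pop2010 = c(100, 250, NA),
  pop2017 = c(90, 300, 500)
)

# Variance of each population column, ignoring NAs
v <- sapply(toy[c("pop2000", "pop2010", "pop2017")], var, na.rm = TRUE)
v
names(which.max(v))  # column with the largest variance
```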

# 8) Create a histogram for homeownership variable

#    with 40 bars. Comment on the skewness of the distribution.
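# A sketch for question 8 on simulated values (not real homeownership data).
# In hist(), a single number passed to `breaks` is only a suggestion, so this
# sketch passes 41 explicit break points to get exactly 40 bars.

```r
# Simulated stand-in for d$homeownership, with a couple of NAs added.
set.seed(1)
toy <- data.frame(homeownership = 100 * rbeta(500, 8, 3))
toy$homeownership[c(5, 50)] <- NA

x <- toy$homeownership
# 41 equally spaced break points span the data and give 40 bins
brks <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = 41)

h <- hist(x, breaks = brks,
          main = "Homeownership", xlab = "Homeownership (%)")
length(h$counts)  # number of bars drawn
```

# A longer tail on the left indicates left (negative) skew; a longer tail on
# the right indicates right (positive) skew.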

# 9) Create a boxplot of the poverty variable for metro and non-metro counties

#    (HINT: use y ~ x notation in the boxplot.) Comment on the plot:

#    which category has higher poverty overall?
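# A sketch of the formula notation from the hint, on a made-up stand-in `toy`.
# The coding of metro as "yes"/"no" is an assumption.

```r
# Toy stand-in for d; values are invented for illustration.
toy <- data.frame(
  metro   = c("yes", "yes", "yes", "no", "no", "no"),
  poverty = c(9, 12, 15, 14, 18, 25)
)

# y ~ x: poverty on the y-axis, one box per metro category
b <- boxplot(poverty ~ metro, data = toy,
             xlab = "Metro area", ylab = "Poverty rate")
b$names  # category labels, in plotting order
```

# Comparing the medians (the thick lines inside the boxes) is a quick way to
# judge which category has higher poverty overall.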

# 10) Assume that you are investigating the variables that

#     could be associated with the 'poverty' variable. Create scatter

#     plots on a 2 x 2 panel for poverty vs. unemployment_rate,

#     homeownership, per_capita_income, and pop_change.

#     Which variables seem to be associated with poverty?

#     (NOTE: If you get "figure margins too large", enlarge the plotting

#     window to the left and upward)
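# A sketch of the 2 x 2 panel for question 10 using par(mfrow = ...), drawn
# on simulated data (the relationships in `toy` are invented, not findings).
# Poverty is placed on the y-axis in every panel.

```r
# Simulated stand-in for d; values are invented for illustration.
set.seed(1)
n <- 50
toy <- data.frame(
  unemployment_rate = runif(n, 2, 12),
  homeownership     = runif(n, 40, 90),
  per_capita_income = runif(n, 15000, 45000),
  pop_change        = runif(n, -5, 10)
)
toy$poverty <- 5 + 1.5 * toy$unemployment_rate + rnorm(n)

vars <- c("unemployment_rate", "homeownership", "per_capita_income", "pop_change")

op <- par(mfrow = c(2, 2))  # 2 x 2 grid; keep old settings in op
for (v in vars) {
  plot(toy[[v]], toy$poverty, xlab = v, ylab = "poverty")
}
par(op)                     # restore the previous layout
```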

#################################################

# In this part, use ggplot2 and dplyr packages

#################################################

library(ggplot2)

library(dplyr)

# 11) We want to see how the unemployment rate changes

#     as education level changes.

#     For this purpose, create a boxplot for unemployment_rate vs. median_edu.

#     Label x-axis as "Education Level"

#     Comment on the chart. How does the unemployment rate change?
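# A ggplot2 sketch for question 11 on a made-up stand-in `toy`. The education
# level labels used here are assumptions; check unique(d$median_edu) for the
# actual categories.

```r
library(ggplot2)

# Toy stand-in for d; values are invented for illustration.
toy <- data.frame(
  median_edu        = rep(c("hs_diploma", "some_college", "bachelors"), each = 4),
  unemployment_rate = c(6.1, 7.4, 5.9, 8.0, 4.8, 5.2, 4.1, 5.5, 3.0, 2.7, 3.4, 2.9)
)

# One box per education level, with the requested x-axis label
p <- ggplot(toy, aes(x = median_edu, y = unemployment_rate)) +
  geom_boxplot() +
  labs(x = "Education Level", y = "Unemployment rate")
print(p)
```

# Character categories are drawn in alphabetical order; converting median_edu
# to a factor with explicit levels would order the boxes by education level.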

# 12) Using dplyr and ggplot2, find population change from 2010 to 2017

#     in counties where poverty rate is greater than 20 and

#     unemployment rate is greater than 8.

#     For this purpose, mutate a new variable

#     called popChng17_20 = (pop2017 - pop2010)/pop2010.

#     Then, create a ggplot boxplot showing popChng17_20 vs metro.
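# A dplyr + ggplot2 sketch for question 12 on a made-up stand-in `toy`.
# The object name `high_need` is hypothetical, and the coding of metro as
# "yes"/"no" is an assumption.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for d; values are invented for illustration.
toy <- data.frame(
  metro             = c("yes", "no", "no", "yes"),
  pop2010           = c(1000, 2000, 500, 800),
  pop2017           = c(900, 2100, 450, 880),
  poverty           = c(25, 10, 30, 22),
  unemployment_rate = c(9, 4, 8.5, 10)
)

# Add the relative change, then keep high-poverty, high-unemployment counties
high_need <- toy %>%
  mutate(popChng17_20 = (pop2017 - pop2010) / pop2010) %>%
  filter(poverty > 20, unemployment_rate > 8)

p <- ggplot(high_need, aes(x = metro, y = popChng17_20)) +
  geom_boxplot()
print(p)
```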

# 13) Group the dataset by State, and then summarize using

#     count, mean unemployment rate, and mean per capita income.

#     Sort the result by mean unemployment rate.

#     (Hint: Remove NAs when calculating means: na.rm = TRUE)
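# A dplyr sketch for question 13 on a made-up stand-in `toy`. The column name
# `state` and the summary column names (count, mean_unemp, mean_income) are
# assumptions chosen for illustration.

```r
library(dplyr)

# Toy stand-in for d; values are invented for illustration.
toy <- data.frame(
  state             = c("State A", "State A", "State B", "State B"),
  unemployment_rate = c(4, 6, 3, NA),
  per_capita_income = c(30000, NA, 20000, 40000)
)

res <- toy %>%
  group_by(state) %>%
  summarise(
    count       = n(),                                    # counties per state
    mean_unemp  = mean(unemployment_rate, na.rm = TRUE),
    mean_income = mean(per_capita_income, na.rm = TRUE)
  ) %>%
  arrange(mean_unemp)                                     # ascending order
res
```

# Wrapping the sort key as arrange(desc(mean_unemp)) would list the states
# with the highest mean unemployment rate first.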
