For this assignment you must have at least three commits and all of your code chunks must have meaningful names.
For your first commit, update your author name in the YAML header of the template R Markdown file.
You can access the assignment at https://classroom.github.com/a/qRRhVDce.
Clone the repository and open a new project in RStudio. See the earlier lab and lecture for additional instructions.
We will work with the tidyverse package as usual. We will also use the viridis and ggridges packages.
library(tidyverse)
library(viridis)
library(ggridges)
anes <- read_csv("anes2020_subset.csv")
The data for this homework assignment comes from the 2020 American National Election Study.
A subset of variables is provided here. Some have been recoded; others you may need to recode to carry out your analysis. The variables are as follows:
CASEID: a case ID for the respondent.
hunt_fish: a dummy variable indicating whether the respondent has gone hunting or fishing in the past year.
scientists: a feeling thermometer question that asks how warmly respondents feel towards scientists. A score of 0 represents the coolest rating, while a score of 100 represents the warmest rating.
education: an ordinal categorical variable representing the respondent’s highest level of education, ranging from less than high school to a professional degree.
ideology: a seven-point self-rating scale for the respondent’s ideology, ranging from most liberal to most conservative.
urbanrural: a variable indicating how rural or urban the respondent’s home community is, with four possible values: rural, small town, suburb, or city.
All plots should follow the best visualization practices discussed in lecture. Plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.
In addition, lines of code and narrative should not exceed the 80-character limit. See the Lab #01 instructions for setting a vertical line at 80 characters in your R Markdown file.
How many rows are in the anes dataset? How many columns? Please include code and output to support your response.
Now would be a good time to do your first knit, commit, and push.
Create a bar chart showing the ideology of the respondents, with the count on the y-axis. Please remember to include labels. What is the most common ideology? Do respondents tend to be moderate or more ideologically extreme?
Now, let’s examine whether ideologies differ based on where people live. Please make a filled bar plot showing one bar for each ideology, with the proportion of respondents on the y-axis (from 0 to 1) and the fill determined by urbanrural. You are encouraged but not required to use viridis colors. Please remember to include labels.
Where do people of different ideologies tend to live? Does the percentage of non-responses (i.e., people who said NA) vary much by ideology?
Now would be a good time to knit, commit, and push again.
How do people view scientists? Please make a histogram with the feeling thermometer on the x-axis and the number of respondents on the y-axis. Then, please comment on features of the histogram such as skewness and peaks.
Does the ideology of those who have gone hunting or fishing in the past year differ from those who haven’t? Please make side-by-side boxplots of these two groups.
You should start your code with:
anes %>%
  drop_na(hunt_fish) %>%
  mutate(hunted_fished = ifelse(hunt_fish == 0,
                                "Did Not Hunt or Fish",
                                "Hunted or Fished")) %>%
Please note that drop_na() removes observations that are NA for the hunt_fish variable.
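To illustrate how drop_na() behaves when given a single column, here is a minimal sketch on a made-up tibble. The `toy` data and its values are hypothetical and are not taken from the ANES file; only the column names mirror the assignment.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: row 3 is missing hunt_fish, row 2 is missing scientists
toy <- tibble(
  id = 1:4,
  hunt_fish = c(0, 1, NA, 1),
  scientists = c(85, NA, 60, 70)
)

# Only rows where hunt_fish is NA are dropped;
# the NA in scientists (row 2) is left in place
cleaned <- drop_na(toy, hunt_fish)
nrow(cleaned)
```

Naming the column matters: calling drop_na() with no column arguments would instead remove every row that has an NA in any column.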
Then construct side-by-side ridgeline density plots using geom_density_ridges(). You can read more about ridgeline plots in the ggridges package documentation.
Please describe what you observe in both plots and what you learn from one plot that you do not see in the other or that adds additional context to the other.
method = "lm". Do you think this is an especially useful visualization? Why or why not?
Now would be a good time to do a final knit, commit, and push again.
There are a lot of data points in this dataset. For this exercise, you are going to begin by taking a sample, using the code below:
set.seed(18)
anes2 <- anes %>%
  sample_frac(.10)
This code takes a random 10% sample of the dataset. Including set.seed() ensures that the same subset is drawn each time.
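As a quick check of why the seed matters, the sketch below draws two samples from a made-up 100-row tibble, resetting the seed before each draw. The `toy` data is hypothetical and stands in for the real dataset.

```r
library(dplyr)

# Hypothetical stand-in for a larger dataset
toy <- tibble(id = 1:100)

# Resetting the same seed before each draw reproduces the same sample
set.seed(18)
first_draw <- sample_frac(toy, 0.10)

set.seed(18)
second_draw <- sample_frac(toy, 0.10)

identical(first_draw, second_draw)  # TRUE
nrow(first_draw)                    # 10
```

Without the second set.seed() call, the two draws would generally contain different rows, which would make knitted results change from run to run.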
Make a scatterplot using this subset and facet by whether the person hunted or fished in the past year, with labels in words identifying which group the subplot represents.
Then, please add a geom_smooth() with method = "lm" for each group and add the argument se = FALSE to omit the confidence bands surrounding the line. Describe what you observe.
Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.
Upload only your PDF document to Gradescope. Before you submit the uploaded document, mark where each answer to the exercises is located. If any answer spans multiple pages, then mark all pages. Associate the “Overall” section with the first page.
Total: 50 pts.
Exercise 1: 2 pts
Exercise 2: 6 pts
Exercise 3: 6 pts
Exercise 4: 4 pts
Exercise 5: 6 pts
Exercise 6: 10 pts
Exercise 7: 10 pts
Workflow and formatting: 6 pts