Lab #02: Data Visualization

due January 19th at 11:59 PM

Goals

Configuring SSH and GitHub

In case you need to generate the public key again, here is a reminder of the steps from last week’s lab.

  1. First, type credentials::ssh_setup_github() into your console.
  2. Second, R will ask “No SSH key found. Generate one now?” Type 1 for yes.
  3. Third, R will generate a key that begins with “ssh-rsa….” R will then ask “Would you like to open a browser now?” Type 1 for yes.
  4. Fourth, you may be asked for your username and password to log into GitHub. These are the ones associated with the account you set up. After entering this information, paste the key in and give it a name.

Getting started

You can find lab 2 here: https://classroom.github.com/a/HbnmJCpS.

Configure git by running the following code in the terminal. Fill in your GitHub username and the email address associated with your GitHub account.

git config --global user.name 'username'
git config --global user.email 'email'

Write your answers in the lab02.Rmd template file. Update the YAML header with your name and today’s date. Then, knit the document and make sure the resulting PDF file has the correct date. Stage, commit, and push your changes.

Packages

We will be using tidyverse and viridis to make plots in R! Our data comes from the fivethirtyeight package.

library(tidyverse)
library(fivethirtyeight)
library(viridis)

Bechdel Test

In 2014, FiveThirtyEight published an article on the Bechdel Test, which is used to measure gender imbalance in movies. The test examines whether the movie has at least two female characters who have a conversation that is not about a man.

The data for this lab is inside the fivethirtyeight package in a dataset called bechdel.
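Before plotting, it can help to look at what the dataset contains. The following is a minimal sketch, assuming the tidyverse and fivethirtyeight packages are installed; it only inspects the data and does not answer any of the exercises.

```r
library(tidyverse)
library(fivethirtyeight)

# Column names, types, and the first few values of each variable
glimpse(bechdel)

# Number of films in each Bechdel Test category
bechdel %>%
  count(clean_test)
```

Running ?bechdel in the console opens the package's description of each variable.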

All plots should follow the best visualization practices we have discussed in lecture. Plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.

In addition, code and narrative should not exceed the 80-character limit. To help police this, add a vertical line at 80 characters by clicking “Tools” → “Global Options” → “Code” → “Display”, then set “Margin Column” to 80 and click “Apply”.
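If you would like to double-check the limit from the console, here is one possible sketch. The function find_long_lines() is a hypothetical helper written for this illustration; it is not part of the lab template or of any package.

```r
# Hypothetical helper (not part of the lab template): report the lines
# of a file that are longer than `limit` characters.
find_long_lines <- function(path, limit = 80) {
  lines <- readLines(path, warn = FALSE)
  widths <- nchar(lines, type = "chars")
  too_long <- which(widths > limit)
  data.frame(line = too_long, width = widths[too_long])
}

# Example: check the lab template if it is in the working directory
if (file.exists("lab02.Rmd")) {
  print(find_long_lines("lab02.Rmd"))
}
```

An empty result means no line is over the limit. Long URLs may be flagged as well, so treat the output as a guide rather than a rule.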

Your assignment should have at least three meaningful commits.

  1. Make a histogram of the domestic gross in 2013 dollars. (There is a specific variable in the dataset for this with an intuitive name.) Please set the number of bins at 10. Please label your axes and give the plot a title. Does it appear that there are any outliers? Does the distribution appear to be symmetric or skewed?
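As a reminder of the general pattern, here is a minimal sketch of a labeled histogram with 10 bins. It deliberately uses the mpg dataset that ships with ggplot2 rather than bechdel, so the variable and labels are illustrative assumptions, not the answer to the exercise.

```r
library(tidyverse)

# Histogram of highway fuel economy with exactly 10 bins
hwy_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 10) +
  labs(
    title = "Distribution of highway fuel economy",
    x = "Highway miles per gallon",
    y = "Number of car models"
  )

hwy_hist
```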

Before going to Exercise 2, now would be a good time to do your first knit and then commit and push.

  2. Generate a scatterplot of a film’s budget in 2013 dollars (budget_2013) as your x variable versus 2013 domestic gross (domgross_2013) as your y variable, with points colored by binary, which indicates whether the film passed the Bechdel Test. Please label your axes and legend and give the plot a title.

(Hint: if you would like to use the viridis palette, the line scale_color_viridis(discrete = TRUE, option = "C", name = "Passed Bechdel Test?") will be useful to your code.)
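The sketch below shows how a scale_color_viridis() line like the one in the hint fits into a full plot. It again uses ggplot2's built-in mpg dataset as a stand-in, so the variables and label text are assumptions chosen for illustration.

```r
library(tidyverse)
library(viridis)

# Scatterplot with points colored by a categorical variable
drv_scatter <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_color_viridis(
    discrete = TRUE,
    option = "C",
    name = "Drive train"
  ) +
  labs(
    title = "Highway fuel economy versus engine size",
    x = "Engine displacement (liters)",
    y = "Highway miles per gallon"
  )

drv_scatter
```

The name argument sets the legend title, so a separate legend label in labs() is not needed.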

  3. Describe what you observe in the plot for Exercise 2. Do you observe the same pattern for movies that pass and those that do not?

  4. Now, examine the relationship between the same two variables, with a separate plot for those that passed and those that did not. Please label your axes and add a line of best fit without standard errors (i.e., please set se = FALSE). Which plot do you prefer? Briefly explain your choice.
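One common way to get a separate panel per group is faceting. The following is a minimal sketch on ggplot2's built-in mpg dataset (an assumption made for illustration, not the lab data), with a linear fit drawn without its standard-error band.

```r
library(tidyverse)

# One panel per drive train, each with its own line of best fit
drv_facets <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ drv) +
  labs(
    title = "Highway fuel economy versus engine size, by drive train",
    x = "Engine displacement (liters)",
    y = "Highway miles per gallon"
  )

drv_facets
```

Here method = "lm" requests a straight line; without it, geom_smooth() picks a smoother based on the size of the data.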

Now would be a good time to knit, commit, and push, again.

  5. While it might seem straightforward as to whether a movie passes the test or not, sometimes it is less clear. The variable clean_test divides movies into five separate categories in terms of whether they pass. “Dubious” and “OK” indicate some degree of passage, while the other three categories indicate some degree of failure.

Does the international gross of a movie depend upon whether the movie passes the Bechdel Test?

Create side-by-side boxplots of a movie’s 2013 international gross (intgross_2013), with one box per Bechdel Test category (clean_test). Briefly comment on what you notice. Which categories have movies that have grossed above 3 billion dollars? How do you know based upon the plot?
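For reference, side-by-side boxplots come from mapping a categorical variable to x and a numeric variable to y. The sketch below uses ggplot2's built-in mpg dataset as a stand-in, so the variables and labels are illustrative assumptions.

```r
library(tidyverse)

# One box per vehicle class, comparing highway fuel economy
class_box <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  labs(
    title = "Highway fuel economy by vehicle class",
    x = "Vehicle class",
    y = "Highway miles per gallon"
  )

class_box
```

Points drawn beyond the whiskers are individual observations that geom_boxplot() flags as potential outliers.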

  6. Has the percentage of movies that pass the Bechdel Test increased over time?

Please create a segmented bar chart with one bar per year, each bar going from 0 to 1, with the fill determined by whether the movie passed the Bechdel Test. What do you notice?
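A segmented bar chart whose bars all run from 0 to 1 can be made with position = "fill". Here is a minimal sketch using ggplot2's built-in mpg dataset (an assumption for illustration only; it has just two model years).

```r
library(tidyverse)

# Each bar shows the proportion of drive trains within a model year
year_bars <- ggplot(mpg, aes(x = factor(year), fill = drv)) +
  geom_bar(position = "fill") +
  labs(
    title = "Share of drive trains by model year",
    x = "Model year",
    y = "Proportion of car models",
    fill = "Drive train"
  )

year_bars
```

Because every bar is rescaled to the same height, the chart compares proportions across years rather than counts.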

Before going to the last exercise, try knitting, committing, and pushing one more time.

  7. Recreate the plot below. This page will be helpful in determining the theme. The size of the points is 0.75. (Please note: the points are colored different shades of blue.)

Note: The y-axis represents domestic gross in 2013 dollars.

Submission

Once you are fully satisfied with your lab, Knit to PDF to create a PDF document.

Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes.

Remember – you must turn in a PDF file to the Gradescope page before the submission deadline for full credit.

Once your work is finalized in your GitHub repo, you will submit it to Gradescope. Your assignment must be submitted on Gradescope by the deadline to be considered “on time”.

Grading

Total: 50 pts