You can find the repo for homework 3 here: https://classroom.github.com/a/D5Fe1IWz.
Clone the repository and open it as a new project in RStudio.
We will work with the tidyverse package as usual. Our data comes from the fivethirtyeight package. You may also want to use viridis.
library(tidyverse)
library(fivethirtyeight)
library(viridis)
Bob Ross was a painter most famous for his PBS television show The Joy of Painting. In each episode, Ross created a new oil painting and provided instructions and commentary as he painted it. Ambitious viewers could paint along, but many simply enjoyed watching and listening to Ross’s soothing voice as he painted an outdoor scene in 30 minutes.
In 2014, Walt Hickey wrote an article for FiveThirtyEight using statistics to analyze the paintings created on the show. The article, titled “A Statistical Analysis of the Work of Bob Ross,” focused on features that were often seen in Ross’s paintings, such as trees, clouds, and cabins.
In this assignment, you will analyze the data that was used for the article. The data is in the bob_ross data set in the fivethirtyeight R package. Each observation represents an episode of the TV show, and one painting was created in each episode. To access the full codebook of variables, explore the documentation using ?bob_ross.
We’ll focus on the following variables in this assignment:
tree: Whether or not the painting contains at least 1 tree
guest: Whether or not the episode featured a guest painter
steve_ross: Whether or not Steve Ross was the featured guest
cirrus: Whether the painting contains cirrus clouds
cumulus: Whether the painting contains cumulus clouds
mountain: Whether the painting contains a mountain
river: Whether the painting contains a river
cabin: Whether the painting contains a cabin
lake: Whether the painting contains a lake
“There’s nothing wrong with having a tree as a friend.”
In how many episodes was a tree painted?
What is the probability a randomly selected episode featured a tree?
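Because the indicator variables are coded 0 and 1, a count is a sum and a proportion is a mean. The following is a minimal base R sketch on made-up data; toy and has_feature are hypothetical names used only for illustration and are not part of bob_ross.

```r
# Hypothetical toy data: 1 means the feature is present, 0 means absent.
toy <- data.frame(has_feature = c(1, 0, 1, 1, 0, 1, 0, 1))

# Summing a 0/1 column counts the rows where the feature is present.
n_with_feature <- sum(toy$has_feature)

# The mean of a 0/1 column is the proportion of rows with the feature,
# which is the probability that a randomly selected row has it.
p_feature <- mean(toy$has_feature)

n_with_feature
p_feature
```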
The Joy of Painting occasionally featured a guest painter other than Bob Ross. One guest painter was Bob’s son Steve Ross.
What’s the probability the show featured Steve Ross given there was a guest painter?
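A conditional probability such as P(A | B) can be computed by restricting attention to the rows where B occurred and asking what share of them also have A. The sketch below uses made-up data; the columns a and b are hypothetical 0/1 indicators, not variables from bob_ross.

```r
# Hypothetical toy data with two 0/1 indicator columns.
toy <- data.frame(
  b = c(1, 1, 1, 1, 0, 0, 0, 0),
  a = c(1, 0, 0, 1, 0, 0, 1, 0)
)

# P(A | B) = (number of rows with both A and B) / (number of rows with B).
p_a_given_b <- sum(toy$a == 1 & toy$b == 1) / sum(toy$b == 1)

p_a_given_b
```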
Did Steve Ross like to paint mountains more or less than the other guest painters? Create a stacked bar plot of guest painters. Have whether or not Steve was the guest painter on the x-axis and fill the bars according to whether or not a mountain exists in the painting. Note: rename observations so that they are more informative than 0 and 1.
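One possible way to relabel 0/1 codes and draw stacked bars is sketched below on made-up data. The data frame toy, its columns group and feature, and the labels are all hypothetical; the plotting step assumes ggplot2 is installed (it is loaded with the tidyverse) and is skipped otherwise.

```r
# Hypothetical toy data; group and feature are made-up 0/1 columns.
toy <- data.frame(
  group   = c(0, 0, 0, 1, 1, 1),
  feature = c(1, 0, 1, 1, 1, 0)
)

# Replace the 0/1 codes with descriptive labels.
toy$group <- factor(toy$group, levels = c(0, 1),
                    labels = c("Group A", "Group B"))
toy$feature <- factor(toy$feature, levels = c(0, 1),
                      labels = c("Absent", "Present"))

# Cross-tabulate to see the counts behind each bar segment.
counts <- table(toy$group, toy$feature)
counts

# Stacked bars: one bar per group, filled by whether the feature is present.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = group, fill = feature)) +
    geom_bar(position = "stack") +
    labs(x = "Group", y = "Number of rows", fill = "Feature")
  print(p)
}
```

Recoding before plotting means the axis and legend show the descriptive labels automatically, so no manual relabeling is needed inside the plot.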
The next few questions will focus only on paintings created by Bob Ross. Make a new data frame called ross_paintings that only includes episodes (and thus paintings) made by Bob Ross. Save this data frame and use it for exercises 4 - 6.
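Keeping only the rows that satisfy a condition can be sketched as follows on made-up data. The names toy, flag, and kept are hypothetical; with the tidyverse loaded, filter(toy, flag == 0) would give the same rows.

```r
# Hypothetical toy data: flag is a made-up 0/1 indicator column.
toy <- data.frame(id = 1:6, flag = c(0, 1, 0, 0, 1, 0))

# Keep only the rows where flag is 0 and save them under a new name.
kept <- toy[toy$flag == 0, ]

nrow(kept)
```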
“Let’s build us a happy, little cloud that floats around the sky.”
Are the following two events disjoint? Why or why not?
cirrus cloud
cumulus cloud
In the FiveThirtyEight article, Walt Hickey calculates various probabilities to describe the combination of features typically found in Bob Ross paintings. He states the following about the presence of cabins and lakes in Ross’s paintings: “About 18 percent of his paintings feature a cabin. Given that Ross painted a cabin, there’s a 35 percent chance that it’s on a lake…”
How many of Bob Ross’s paintings feature a cabin? Call this number “M”.
How many of those that feature a cabin also feature a lake? Call this number “X” for the next question.
Imagine the following: every time Bob Ross paints a cabin, he flips a fair coin to decide whether or not to paint a lake. Given a collection of M Bob Ross paintings with cabins, what is the probability Bob Ross painted X or fewer lakes?
Hint: use the code below as a template; you can read more about rbinom by running ?rbinom.
set.seed(2182022) # don't change the seed
num_lakes = rbinom(100000, M, prob = ?)
cabin_lakes = data.frame(num_lakes)
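The idea behind the template is to simulate many binomial experiments and take the share of them at or below a threshold. The sketch below illustrates this with assumed values (n_trials, p_success, threshold, and the seed are chosen only for illustration and are unrelated to M, X, or the assignment's seed) and compares the simulated value to the exact one from pbinom.

```r
set.seed(1)  # arbitrary seed for this illustration only

# Assumed illustration values, unrelated to the assignment's M and X.
n_trials  <- 20    # number of coin flips per simulated experiment
p_success <- 0.3   # probability of success on each flip
threshold <- 4     # we want P(number of successes <= threshold)

# Each draw is the number of successes in n_trials flips.
sims <- rbinom(100000, size = n_trials, prob = p_success)

# Simulated probability: share of experiments at or below the threshold.
p_sim <- mean(sims <= threshold)

# Exact binomial probability for comparison.
p_exact <- pbinom(threshold, size = n_trials, prob = p_success)

c(simulated = p_sim, exact = p_exact)
```

With 100,000 draws the two numbers should agree closely, which is a useful sanity check on any simulation of this kind.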
Suppose you randomly select a Bob Ross painting and see that it features a mountain. Use Bayes’ Theorem to calculate the probability that this painting also features a river. Show your work by using a code chunk as a calculator.
Hint: p(mountain | river) = 0.39
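Bayes' Theorem reverses a conditional probability: P(B | A) = P(A | B) P(B) / P(A). Using a code chunk as a calculator might look like the sketch below; the three input values are hypothetical numbers for two generic events A and B, not values from the data.

```r
# Hypothetical inputs for two generic events A and B.
p_a_given_b <- 0.6   # P(A | B)
p_b         <- 0.2   # P(B)
p_a         <- 0.3   # P(A)

# Bayes' Theorem: P(B | A) = P(A | B) * P(B) / P(A)
p_b_given_a <- p_a_given_b * p_b / p_a

p_b_given_a
```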
Follow-up question: Does Bob Ross paint mountains independently of whether or not he paints rivers? Why? In other words, is event A independent of event B? Here we define the events:
A: Bob Ross paints a mountain
B: Bob Ross paints a river
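Two events are independent when P(A | B) = P(A), so one way to check is to compute both quantities and compare them. The sketch below does this on made-up data; the columns a and b are hypothetical 0/1 indicators. With real data, a small gap between the two values can come from sampling variability, so the size of the difference matters.

```r
# Hypothetical toy data with two 0/1 indicator columns.
toy <- data.frame(
  a = c(1, 1, 0, 0, 1, 1, 0, 0),
  b = c(1, 0, 1, 0, 1, 0, 1, 0)
)

# Unconditional probability of A.
p_a <- mean(toy$a == 1)

# Probability of A among the rows where B occurred.
p_a_given_b <- mean(toy$a[toy$b == 1] == 1)

# If these two values match, A and B are independent in this toy data.
c(p_a = p_a, p_a_given_b = p_a_given_b)
```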
Your turn! Use this data to explore a question of your choice about paintings created in the TV show The Joy of Painting. Your question should explore the relationship between 3 variables in the data set; at least one of the variables must be one that hasn’t been used in exercises 1 - 6. You may use the entire data set or focus the analysis on paintings made by Bob Ross.
Hint: ggplot2 provides functions such as scale_fill_manual() and scale_color_manual() for manually creating color palettes.
Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo. Upload only your PDF document to Gradescope. Before you submit the uploaded document, mark the pages where the answer to each exercise appears. If any answer spans multiple pages, mark all corresponding pages. Associate the “Overall” section with the first page.