library(tidyverse)
library(knitr)
sta199 <- read_csv("sta199-fa21-year-major.csv")

Learning goals

Coming Up

Definitions

Let A and B be events.

Part 1: STA 199 years & majors

For this portion of the AE, we will continue using the data including the year in school and majors for students taking STA 199 in Fall 2021, i.e., you! The data set includes the following variables:

Let’s start with the contingency table from the last class:

sta199 %>% 
  count(year, major_category) %>%           # one row per year/major combination
  pivot_wider(id_cols = year,               # how we identify unique obs (one row per year)
              names_from = major_category,  # how we will name the columns
              values_from = n,              # values used for each cell
              values_fill = 0) %>%          # how to fill cells with 0 observations
  kable()                                   # neatly display the results
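| year | compsci only | econ only | other | pubpol only | stat + other major | stats only | undecided |
|:---|---:|---:|---:|---:|---:|---:|---:|
| First-year | 8 | 6 | 39 | 22 | 26 | 7 | 5 |
| Junior | 7 | 3 | 12 | 4 | 1 | 0 | 0 |
| Senior | 2 | 0 | 5 | 1 | 1 | 0 | 0 |
| Sophomore | 23 | 6 | 42 | 11 | 8 | 3 | 5 |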
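If the CSV is not at hand, you can still check answers by hand. The sketch below rebuilds the table in base R from the printed counts and appends the row and column totals with `addmargins()`. The object names `counts` and `counts_with_totals` are our own. The rows follow the alphabetical order that `count()` produced above.

```r
# Cell counts copied from the contingency table above (rows = year, columns = major category)
counts <- matrix(
  c( 8, 6, 39, 22, 26, 7, 5,
     7, 3, 12,  4,  1, 0, 0,
     2, 0,  5,  1,  1, 0, 0,
    23, 6, 42, 11,  8, 3, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    year = c("First-year", "Junior", "Senior", "Sophomore"),
    major_category = c("compsci only", "econ only", "other", "pubpol only",
                       "stat + other major", "stats only", "undecided")
  )
)

# Append row and column totals (labelled "Sum") to get the marginal counts
counts_with_totals <- addmargins(as.table(counts))
counts_with_totals
```

The "Sum" row and column hold the marginal counts. These are the denominators for the unconditional probabilities in Parts A and B.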

Try to answer each question below using the contingency table, then confirm your answer with code so the result is reproducible.

Part A: What is the probability a randomly selected STA 199 student is studying a subject in the “other” major category?

sta199 %>% 
  count(major_category) %>%
  mutate(prob = n / sum(n)) %>%
  filter(major_category == "other")

Part B: What is the probability a randomly selected STA 199 student is a first-year?

sta199 %>% 
  count(year) %>%
  mutate(prob = n / sum(n)) %>%
  filter(year == "First-year")

Part C: What is the probability a randomly selected STA 199 student is a first-year and is studying a subject in the “other” major category?

sta199 %>%
  mutate(firstyr_other = ifelse(year == "First-year" & major_category == "other", 1, 0)) %>%
  summarize(prob = mean(firstyr_other))

Part D: What is the probability a randomly selected STA 199 student is a first-year given they are studying a subject in the “other” major category?

sta199 %>%
  filter(major_category == "other") %>%   # condition on studying an "other" subject
  count(year) %>%
  mutate(prob = n / sum(n)) %>%
  filter(year == "First-year")
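The filter-then-count code above conditions by restricting the data to the "other" students. As a cross-check, here is a small sketch that applies the definition P(A | B) = P(A and B) / P(B) directly, using counts read from the contingency table. The variable names are our own. The overall total of 247 cancels in the ratio, which is why filtering first gives the same answer.

```r
# Counts read off the contingency table above
n_total       <- 113 + 27 + 9 + 98   # row totals: First-year, Junior, Senior, Sophomore
n_other       <- 39 + 12 + 5 + 42    # "other" column total
n_first_other <- 39                  # First-year row, "other" column

p_other       <- n_other / n_total        # P(other)
p_first_other <- n_first_other / n_total  # P(First-year and other)

# Definition of conditional probability: P(A | B) = P(A and B) / P(B)
p_first_given_other <- p_first_other / p_other
p_first_given_other  # equal to 39 / 98
```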

Part E: What is the probability a randomly selected STA 199 student is studying a subject in the “other” major category given they are a first-year?

sta199 %>%
  filter(year == "First-year") %>%   # condition on being a first-year
  count(major_category) %>%
  mutate(prob = n / sum(n)) %>%
  filter(major_category == "other")

Part F: Are being a first-year and studying a subject in the “other” category independent events? Briefly explain.
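Two events A and B are independent when P(A | B) = P(A), or equivalently when P(A and B) = P(A) P(B). The sketch below checks both forms with counts read from the contingency table; the variable names are our own. With sample data the two sides will almost never match exactly, so the question is whether they are reasonably close.

```r
# Counts read off the contingency table above
n_total <- 113 + 27 + 9 + 98              # 247 students
p_first <- 113 / n_total                  # P(First-year)
p_other <- (39 + 12 + 5 + 42) / n_total   # P(other)
p_both  <- 39 / n_total                   # P(First-year and other)

# Check 1: independence requires P(other | First-year) = P(other)
p_other_given_first <- p_both / p_first
c(p_other = p_other, p_other_given_first = p_other_given_first)

# Check 2 (equivalent): P(First-year and other) = P(First-year) * P(other)
c(joint = p_both, product = p_first * p_other)
```

Here P(other | First-year) is roughly 0.35 while P(other) is roughly 0.40. Use that gap when you write your explanation.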

Part 2: Bayes’ Theorem

Monty Hall Problem:

A Video: https://brilliant.org/wiki/monty-hall-problem/.

“Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, ‘Do you want to pick door No. 2?’ Is it to your advantage to switch your choice?”

We will use simulation to investigate whether it is better to switch doors or to stay with the original choice.

Assumptions:

The host will always open a door not picked by the contestant.

The host will always open a door which reveals a goat (i.e. not a car).

The host will always offer the contestant the chance to switch to another door.

The door behind which the car is placed is chosen at random.

The door initially chosen by the contestant is chosen at random.

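set.seed(199)  # fix the random seed so the simulation is reproducible
doors <- c(1, 2, 3)
# Simulate 10,000 games: the car's door and the contestant's door are each chosen at random
monty_hall <- tibble(
  car_door = sample(doors, size = 10000, replace = TRUE),
  my_door  = sample(doors, size = 10000, replace = TRUE)
)
monty_hall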
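# Monty opens a door that is neither the contestant's door nor the car's door.
# If the contestant picked the car, he chooses at random between the other two doors;
# otherwise only one door is left, and since 1 + 2 + 3 = 6 it is 6 - (car_door + my_door).
monty_hall <- monty_hall %>% 
  rowwise() %>%  # evaluate sample() separately for each game
  mutate(monty_door = if_else(car_door == my_door,
                              sample(doors[-my_door], size = 1),
                              6 - (car_door + my_door))) %>% 
  ungroup()
monty_hall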
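# Switching wins exactly when the first pick was wrong; staying wins when it was right
monty_hall <- monty_hall %>% 
  mutate(switch_win = car_door != my_door,
         stay_win   = car_door == my_door)
monty_hall
# Proportion of simulated games won under each strategy
monty_hall %>% 
  summarise(switch_win_prob = mean(switch_win),
            stay_win_prob   = mean(stay_win))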
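The simulation estimates the win probabilities. Bayes’ Theorem gives the exact answer under the assumptions listed above. The sketch below takes the quoted scenario: you pick door 1 and Monty opens door 3. It also assumes that when your door hides the car, Monty picks between the two goat doors with equal probability, which matches `sample(doors[-my_door], size = 1)` in the simulation. The names `prior`, `likelihood`, and `posterior` are our own.

```r
# Scenario: you picked door 1 and Monty opened door 3.
prior <- c(door1 = 1/3, door2 = 1/3, door3 = 1/3)   # car placed at random

# P(Monty opens door 3 | car behind each door), given you picked door 1
likelihood <- c(door1 = 1/2,  # car behind your door: he picks door 2 or 3 at random (assumed)
                door2 = 1,    # he must avoid your door and the car, so door 3
                door3 = 0)    # he never reveals the car

# Bayes' Theorem: prior * likelihood, divided by P(Monty opens door 3)
posterior <- prior * likelihood / sum(prior * likelihood)
posterior
```

The posterior puts probability 1/3 on door 1 and 2/3 on door 2. This agrees with the simulated `stay_win_prob` and `switch_win_prob`, up to simulation noise.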

Some Practice using the Hypothetical 10,000

The global coronavirus pandemic illustrates the need for accurate COVID-19 testing, since the virus’s high transmissibility poses a significant public health threat. Because the situation was time-sensitive, the FDA issued emergency use authorizations for a number of serological (antibody) tests for COVID-19 in 2020. Full details of these tests may be found on the FDA website.

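We will define the following events:

Covid: the person has COVID-19.

No Covid: the person does not have COVID-19.

Pos: the person’s test result is positive.

Neg: the person’s test result is negative.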

The Abbott Alinity test has an estimated sensitivity of 100%, P(Pos | Covid) = 1, and specificity of 99%, P(Neg | No Covid) = 0.99.

Suppose the prevalence of COVID-19 in the general population is about 2%, P(Covid) = 0.02.

Bayes’ Theorem and the Hypothetical 10,000

Part A: Use the Hypothetical 10,000 to calculate the probability a person has COVID given they get a positive test result, i.e. P(Covid | Pos).

|  | Covid | No Covid | Total |
|:---|---:|---:|---:|
| Pos | 200 | 98 | 298 |
| Neg | 0 | 9702 | 9702 |
| Total | 200 | 9800 | 10000 |
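Each cell of the Hypothetical 10,000 table comes from the prevalence, sensitivity, and specificity stated above. The sketch below rebuilds the cells from those three rates and reads off P(Covid | Pos) as the share of positive tests that are true positives. The variable names are our own.

```r
n          <- 10000
prevalence <- 0.02   # P(Covid)
sens       <- 1      # P(Pos | Covid)
spec       <- 0.99   # P(Neg | No Covid)

covid     <- n * prevalence        # people with COVID (200)
no_covid  <- n - covid             # people without COVID (9800)
true_pos  <- covid * sens          # Covid and Pos (200)
false_neg <- covid - true_pos      # Covid and Neg (0)
true_neg  <- no_covid * spec       # No Covid and Neg (9702)
false_pos <- no_covid - true_neg   # No Covid and Pos (98)

# Among everyone who tests positive, the fraction who actually have COVID
p_covid_given_pos <- true_pos / (true_pos + false_pos)   # 200 / 298
p_covid_given_pos
```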

Part B: Use Bayes’ Theorem to calculate P(Covid | Pos).
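Bayes’ Theorem gives P(Covid | Pos) = P(Pos | Covid) P(Covid) / [P(Pos | Covid) P(Covid) + P(Pos | No Covid) P(No Covid)]. The denominator is P(Pos), expanded with the law of total probability, and P(Pos | No Covid) = 1 - specificity. Below is a minimal sketch with a hypothetical helper, `posterior_pos()`, which is our own name and not part of any package.

```r
# Hypothetical helper (not from any package): P(Covid | Pos) via Bayes' Theorem
posterior_pos <- function(prevalence, sens, spec) {
  p_pos_given_covid    <- sens
  p_pos_given_no_covid <- 1 - spec   # false positive rate
  # Law of total probability for the denominator, P(Pos)
  p_pos <- p_pos_given_covid * prevalence + p_pos_given_no_covid * (1 - prevalence)
  p_pos_given_covid * prevalence / p_pos
}

posterior_pos(prevalence = 0.02, sens = 1, spec = 0.99)
```

With the stated rates this matches the Hypothetical 10,000 answer of 200 / 298. Changing `prevalence` shows how strongly P(Covid | Pos) depends on the prior.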