library(tidyverse)
library(knitr)
sta199 <- read_csv("sta199-fa21-year-major.csv")
Let A and B be events.
Marginal probability: The probability an event occurs regardless of the outcome of the other event
P(A) or P(B)
Joint probability: The probability two or more events occur simultaneously
P(A and B)
Conditional probability: The probability an event occurs given the other has occurred
P(A|B) or P(B|A)
Independent events: Knowing one event has occurred does not lead to any change in the probability we assign to another event.
P(A|B) = P(A) or P(B|A) = P(B)
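The definitions above are tied together by the standard formula for conditional probability, which also gives an equivalent way to state independence. The block below is a short reference in LaTeX (it assumes `amsmath` for `\text`); the condition P(B) > 0 is needed for the conditional probability to be defined.

```latex
\[
P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}, \quad P(B) > 0
\]
\[
A \text{ and } B \text{ independent} \iff P(A \text{ and } B) = P(A)\,P(B)
\]
```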
For this portion of the AE, we will continue using the data on the year in school and major of students taking STA 199 in Fall 2021, i.e., you! The data set includes the following variables:
section: STA 199 section
year: Year in school
major_category: Major / academic interest
Let’s start with the contingency table from the last class:
sta199 %>%
  count(year, major_category) %>%
  pivot_wider(id_cols = year, # how we identify unique obs (one row per year)
              names_from = major_category, # how we will name the columns
              values_from = n, # values used for each cell
              values_fill = 0) %>% # how to fill cells with 0 observations
  kable() # neatly display the results
year | compsci only | econ only | other | pubpol only | stat + other major | stats only | undecided |
---|---|---|---|---|---|---|---|
First-year | 8 | 6 | 39 | 22 | 26 | 7 | 5 |
Junior | 7 | 3 | 12 | 4 | 1 | 0 | 0 |
Senior | 2 | 0 | 5 | 1 | 1 | 0 | 0 |
Sophomore | 23 | 6 | 42 | 11 | 8 | 3 | 5 |
Try to answer the questions below using the contingency table, then use code to answer them in a reproducible way.
Part A: What is the probability a randomly selected STA 199 student is studying a subject in the “other” major category?
sta199 %>%
  count(major_category) %>%
  mutate(prob = n / sum(n)) %>%
  filter(major_category == "other")
Part B: What is the probability a randomly selected STA 199 student is a first-year?
sta199 %>%
  count(year) %>%
  mutate(prob = n / sum(n)) %>%
  filter(year == "First-year")
Part C: What is the probability a randomly selected STA 199 student is a first-year and is studying a subject in the “other” major category?
sta199 %>%
  # 1 if the student is both a first-year and in "other", 0 otherwise
  mutate(firstyr_other = ifelse(year == "First-year" & major_category == "other", 1, 0)) %>%
  summarize(prob = mean(firstyr_other))
Part D: What is the probability a randomly selected STA 199 student is a first-year given they are studying a subject in the “other” major category?
sta199 %>%
  filter(major_category == "other") %>% # condition on the "other" category
  count(year) %>%
  mutate(prob = n / sum(n)) %>%
  filter(year == "First-year")
Part E: What is the probability a randomly selected STA 199 student is studying a subject in the “other” major category given they are a first-year?
sta199 %>%
  filter(year == "First-year") %>%
  count(major_category) %>%
  mutate(prob = n / sum(n)) %>%
  filter(major_category == "other")
Part F: Are being a first-year and studying a subject in the “other” category independent events? Briefly explain.
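One way to approach Part F is to compare P(First-year) with P(First-year | other): under independence the two would be equal. The sketch below does this from the counts printed in the contingency table rather than from the CSV, so it assumes that table covers the full data set; the `counts` tibble and its `total` column (row sums of the table) are constructed here for illustration only.

```r
library(tidyverse)

# Counts copied from the contingency table above.
# `other` is the "other" column; `total` is the sum of each row (assumption:
# the printed table contains every student).
counts <- tribble(
  ~year,        ~other, ~total,
  "First-year",     39,    113,
  "Junior",         12,     27,
  "Senior",          5,      9,
  "Sophomore",      42,     98
)

n_total <- sum(counts$total) # all students in the table
n_other <- sum(counts$other) # students in the "other" category

first_year <- counts %>% filter(year == "First-year")

p_first_year <- first_year$total / n_total             # P(First-year)
p_first_year_given_other <- first_year$other / n_other # P(First-year | other)

tibble(p_first_year, p_first_year_given_other)
```

If the two probabilities differ noticeably, knowing that a student is in the “other” category changes the probability that they are a first-year, which is evidence against independence in these data.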
A Video: https://brilliant.org/wiki/monty-hall-problem/.
“Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, ‘Do you want to pick door No. 2?’ Is it to your advantage to switch your choice?”
We will investigate the above decision of whether to switch or not to switch.
Assumptions:
The host will always open a door not picked by the contestant.
The host will always open a door which reveals a goat (i.e. not a car).
The host will always offer the contestant the chance to switch to another door.
The door behind which the car is placed is chosen at random.
The door initially chosen by the contestant is chosen at random.
doors <- c(1, 2, 3)
monty_hall <- tibble(
  car_door = sample(doors, size = 10000, replace = TRUE),
  my_door = sample(doors, size = 10000, replace = TRUE)
)
monty_hall
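monty_hall <- monty_hall %>%
  rowwise() %>%
  # If my door hides the car, Monty opens one of the other two doors at random;
  # otherwise he must open the one remaining goat door (door numbers sum to 6).
  mutate(monty_door = if_else(car_door == my_door,
                              sample(doors[-my_door], size = 1),
                              6 - (car_door + my_door))) %>%
  ungroup()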
monty_hall
monty_hall <- monty_hall %>%
  mutate(switch_win = car_door != my_door,
         stay_win = car_door == my_door)
monty_hall
monty_hall %>%
  summarise(switch_win_prob = mean(switch_win),
            stay_win_prob = mean(stay_win))
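The simulated proportions can be compared against an exact calculation. Under the assumptions listed above, the car door and the contestant's door are each chosen uniformly at random, so the 9 (car_door, my_door) pairs are equally likely, and switching wins exactly when the first pick was not the car. The sketch below enumerates those pairs; the name `exact` is introduced here for illustration.

```r
library(tidyverse)

doors <- c(1, 2, 3)

# All 9 equally likely (car_door, my_door) pairs under the stated assumptions.
exact <- expand_grid(car_door = doors, my_door = doors) %>%
  mutate(switch_win = car_door != my_door, # host reveals the other goat, so switching finds the car
         stay_win = car_door == my_door) %>%
  summarise(switch_win_prob = mean(switch_win),
            stay_win_prob = mean(stay_win))

exact
```

Because the enumeration has no randomness, it returns the same values on every run, whereas the simulation varies slightly unless a seed is set with `set.seed()`.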
The global coronavirus pandemic illustrates the need for accurate testing of COVID-19, as its extreme infectivity poses a significant public health threat. Due to the time-sensitive nature of the situation, the FDA enacted emergency authorization of a number of serological tests for COVID-19 in 2020. Full details of these tests may be found on the FDA's website.
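We will define the following events:
Pos: The person tests positive
Neg: The person tests negative
Covid: The person has COVID-19
No Covid: The person does not have COVID-19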
The Abbott Alinity test has an estimated sensitivity of 100%, P(Pos | Covid) = 1, and specificity of 99%, P(Neg | No Covid) = 0.99.
Suppose the prevalence of COVID-19 in the general population is about 2%, P(Covid) = 0.02.
Bayes’ Theorem and the Hypothetical 10,000.
Part A: Use the Hypothetical 10,000 to calculate the probability a person has COVID given they get a positive test result, i.e. P(Covid | Pos).
Test result | Covid | No Covid | Total |
---|---|---|---|
Pos | 200 | 98 | 298 |
Neg | 0 | 9702 | 9702 |
Total | 200 | 9800 | 10000 |
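The cell counts in the Hypothetical 10,000 table follow directly from the prevalence, sensitivity, and specificity given above. The sketch below rebuilds them in R and reads off P(Covid | Pos) from the Pos row; the variable names are chosen here for illustration.

```r
n <- 10000          # size of the hypothetical population
prevalence <- 0.02  # P(Covid)
sensitivity <- 1.00 # P(Pos | Covid)
specificity <- 0.99 # P(Neg | No Covid)

covid <- n * prevalence                   # people with COVID-19
no_covid <- n - covid                     # people without COVID-19

true_pos <- covid * sensitivity           # Pos and Covid
false_pos <- no_covid * (1 - specificity) # Pos and No Covid

# Among everyone who tests positive, the share who actually have COVID-19
p_covid_given_pos <- true_pos / (true_pos + false_pos)
p_covid_given_pos
```

Even with a very accurate test, roughly a third of the positive results in this hypothetical population are false positives, because people without COVID-19 greatly outnumber those with it.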
Part B: Use Bayes’ Theorem to calculate P(Covid|Pos).
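For reference, a worked version of the calculation in LaTeX (assuming `amsmath` for `\text`), using P(Pos | No Covid) = 1 - 0.99 = 0.01 and P(No Covid) = 1 - 0.02 = 0.98:

```latex
\[
P(\text{Covid} \mid \text{Pos})
= \frac{P(\text{Pos} \mid \text{Covid})\,P(\text{Covid})}
       {P(\text{Pos} \mid \text{Covid})\,P(\text{Covid})
        + P(\text{Pos} \mid \text{No Covid})\,P(\text{No Covid})}
= \frac{(1)(0.02)}{(1)(0.02) + (0.01)(0.98)}
= \frac{0.02}{0.0298} \approx 0.671
\]
```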