library(tidyverse)
library(tidymodels)
yawn <- read_csv("yawn.csv")

Learning goals

Use and understand simulation-based methods to …

Different options inside generate

  • Discussion of bootstrap, draw, and permute here.

    • Bootstrap: resamples the data with replacement; used for confidence intervals, or for a hypothesis test about a single mean (with a point null).

    • Draw: simulates samples under a point null hypothesis; used only for hypothesis tests about a single proportion.

    • Permute: “shuffles” the data without replacement (see explanation here). You might use this for a difference in means or to test for independence (difference in proportions).

If you want to double-check, I would recommend checking the vignette here. This is not an exhaustive list, just some common ones we use in this course.
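To make these options concrete, here is a minimal sketch of what each type looks like in an infer pipeline. The data frame toy_df and the columns flip and coin are made up purely for illustration and are not part of the lab data.

library(tidymodels)  # loads infer, which provides specify(), hypothesize(), generate(), calculate()

# toy data: 20 made-up "coin flips" split across two hypothetical coins
toy_df <- tibble(
  flip = c(rep("heads", 12), rep("tails", 8)),
  coin = rep(c("gold", "silver"), times = 10)
)

# bootstrap: resample rows with replacement (confidence intervals)
toy_df %>%
  specify(response = flip, success = "heads") %>%
  generate(reps = 100, type = "bootstrap") %>%
  calculate(stat = "prop")

# draw: simulate samples from a point null (test about a single proportion)
toy_df %>%
  specify(response = flip, success = "heads") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 100, type = "draw") %>%
  calculate(stat = "prop")

# permute: shuffle the response across groups (test for independence)
toy_df %>%
  specify(response = flip, explanatory = coin, success = "heads") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  calculate(stat = "diff in props", order = c("gold", "silver"))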

Part 1: Hypothesis test for a single proportion1

A large university knows that about 70% of the full-time students are employed at least 5 hours per week. The members of the Statistics Department wonder if the same proportion of their students work at least 5 hours per week. They randomly sample 25 majors and find that 15 of the students work 5 or more hours each week. Use the code below to create a data frame of the results.

stats_work <- tibble(work_hours = c(rep("At least 5", 15), 
                                    rep("Less than 5", 10)))
set.seed(101821)
#null_dist <- ____ %>%
#  specify(response = ____, success = "_____") %>%
#  hypothesize(null = "point", p = ____) %>%
#  generate(reps = 1000, type = "draw") %>%
#  calculate(stat = "prop")
null_dist <- stats_work %>%
  specify(response = work_hours, success = "At least 5") %>%
  hypothesize(null = "point", p = 0.7) %>%
  generate(reps = 1000, type = "draw") %>%
  calculate(stat = "prop")
# add code 
# observed proportion of sampled majors working at least 5 hours per week
prop_over5 <- stats_work %>%
  count(work_hours) %>%
  mutate(p = n / sum(n)) %>%
  filter(work_hours == "At least 5") %>% 
  pull(p)
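# Optional aside: the observed statistic can also be computed with infer
# itself; calculate() applied to data that has not gone through generate()
# returns the sample statistic, which should match prop_over5 above.
# (obs_prop is a name introduced here for illustration.)
obs_prop <- stats_work %>%
  specify(response = work_hours, success = "At least 5") %>%
  calculate(stat = "prop")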
visualize(null_dist) +
  shade_p_value(obs_stat = prop_over5, direction = "two-sided")

Note: the odd asymmetry we see around 0.7 appears to be an artifact of the seed we are using.
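One quick, optional way to check that is to re-generate the null distribution with a different seed and/or more reps and compare the shapes. The sketch below is only a diagnostic; the seed value and reps = 5000 are arbitrary choices, not part of the exercise.

set.seed(2)
stats_work %>%
  specify(response = work_hours, success = "At least 5") %>%
  hypothesize(null = "point", p = 0.7) %>%
  generate(reps = 5000, type = "draw") %>%
  calculate(stat = "prop") %>%
  visualize()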

# add code
null_dist %>%
  get_p_value(obs_stat = prop_over5, direction = "two-sided") 
## # A tibble: 1 × 1
##   p_value
##     <dbl>
## 1   0.354

Part 2: Test for independence

  • First, let’s watch the experiment from Mythbusters here [https://www.youtube.com/watch?v=mrr_UjNLbhE].

  • Let \(t\) be the treatment group who saw a person yawn, \(c\) be the control group who did not see anyone yawn, and \(p\) be the proportion of people who yawned (so \(p_t\) and \(p_c\) denote the proportions of yawners in the treatment and control groups).

Exercise 1

We want to use simulation-based inference to assess whether or not yawning and seeing someone yawn are independent.

  • State the null and alternative hypotheses in words:

  • Select the appropriate null and alternative hypotheses written in mathematical notation:

  1. \(H_0: p_t = p_c \text{ vs. }H_a: p_t < p_c\)
  2. \(H_0: p_t = p_c \text{ vs. }H_a: p_t > p_c\)
  3. \(H_0: p_t = p_c \text{ vs. }H_a: p_t \neq p_c\)
  4. \(H_0: \hat{p}_t = \hat{p}_c \text{ vs. }H_a: \hat{p}_t < \hat{p}_c\)
  5. \(H_0: \hat{p}_t = \hat{p}_c \text{ vs. }H_a: \hat{p}_t > \hat{p}_c\)
  6. \(H_0: \hat{p}_t = \hat{p}_c \text{ vs. }H_a: \hat{p}_t \neq \hat{p}_c\)

Exercise 2

Let’s use the data from the Mythbusters episode and simulation-based inference in R to test this claim. Based on their experiment, do yawning and seeing someone yawn appear to be dependent?

Evaluate this question using a simulation-based approach. We will use the same null and alternative hypotheses as before. The data from the Mythbusters experiment are available in the yawn data frame.

  • Fill in the code below to generate the null distribution. Uncomment the code once it is complete.
set.seed(101821)
#null_dist <- yawn %>%
#  specify(response = ____, explanatory = _____, success = "yawn") %>%
#  hypothesize(null = "______") %>%
#  generate(reps = 100, type = "permute") %>%
#  calculate(stat = "______", 
#            order = c("trmt", "ctrl"))
null_dist <- yawn %>%
  specify(response = result, explanatory = group, success = "yawn") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  calculate(stat = "diff in props", 
            order = c("trmt", "ctrl"))
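The value 0.0441 used as obs_stat below is the observed difference in the proportion of yawners between the treatment and control groups. Here is a hedged sketch of computing it from the data instead of hard-coding it (obs_diff is a name introduced here for illustration):

obs_diff <- yawn %>%
  specify(response = result, explanatory = group, success = "yawn") %>%
  calculate(stat = "diff in props", order = c("trmt", "ctrl"))
obs_diff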
  • Visualize the null distribution and shade in the area used to calculate the p-value.
# add code 
visualize(null_dist) +
  shade_p_value(obs_stat = 0.0441, direction = "greater")

# manual p-value: proportion of permuted statistics at least as large as the observed difference
null_dist %>%
  filter(stat >= 0.0441) %>%
  summarise(p_value = n()/nrow(null_dist))
## # A tibble: 1 × 1
##   p_value
##     <dbl>
## 1    0.52
# same p-value using infer's helper function
get_p_value(null_dist, 0.0441, "greater")
## # A tibble: 1 × 1
##   p_value
##     <dbl>
## 1    0.52
  • Calculate the p-value. Then use the p-value to make your conclusion using a significance level of 0.1.
# add code
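A minimal sketch of that comparison, reusing the p-value computed above (p_val is a name introduced here for illustration):

p_val <- get_p_value(null_dist, obs_stat = 0.0441, direction = "greater")
p_val$p_value < 0.1  # FALSE here (p-value = 0.52), so we fail to reject the null hypothesis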

Exercise 3: Confidence interval

Construct a 90% confidence interval for the difference in proportion of yawners between those who see someone else yawn and those who don’t.

#boot_dist <- yawn %>%
#  specify(response = ______, explanatory = ______, success = "yawn") %>%
#  generate(reps = 100, type = "bootstrap") %>%
#  calculate(stat = "______", 
#            order = c("trmt", "ctrl"))

boot_dist <- yawn %>%
  specify(response = result, explanatory = group, success = "yawn") %>%
  generate(reps = 100, type = "bootstrap") %>%
  calculate(stat = "diff in props", 
            order = c("trmt", "ctrl"))
  • Why are we using “bootstrap” instead of “permute” here?
# calculate the lower and upper bounds for the 90% ci
#boot_dist %>%
#  summarise(lower = quantile(stat, ___),
#            upper = quantile(stat, ___))
boot_dist %>%
  summarise(lower = quantile(stat, 0.05),
            upper = quantile(stat, 0.95))
## # A tibble: 1 × 2
##    lower upper
##    <dbl> <dbl>
## 1 -0.148 0.257
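As a hedged alternative to the manual quantile() call, infer's get_confidence_interval() helper computes the same percentile interval from the bootstrap distribution:

boot_dist %>%
  get_confidence_interval(level = 0.90, type = "percentile")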
  • Interpret the interval in the context of the data.

  • Suppose you use the confidence interval to evaluate the hypotheses in Exercise 2. Is the conclusion drawn from the confidence interval consistent with the conclusion from the hypothesis test?

Exercise 4

Do you buy the conclusions from this experiment? Why or why not?


  1. This question was adapted from Introduction to Modern Statistics.↩︎