Here is the link for this homework assignment: https://classroom.github.com/a/uedP3saW.
Clone the repository and open a new project in RStudio.
Mercury is a naturally occurring element that can have toxic effects on the nervous, digestive, and immune systems of humans (see WHO for more details).
In local rivers (and other bodies of water), microbes transform mercury into the highly toxic methyl mercury. Fish accumulate methyl mercury (since they are unable to excrete it) in their tissue over the course of their life.
Bass from the Waccamaw and Lumber Rivers were caught, weighed, and measured. In addition, a filet from each fish caught was sent to the lab so that the tissue concentration of mercury could be determined for each fish. Each fish caught corresponds to a single row of the data frame.
A code book is provided below (copied from here).
river: 0=Lumber, 1=Waccamaw
station that the fish was collected at
length of the fish in centimeters
weight of the fish in grams
mercury: concentration of mercury in parts per million (ppm)
The data come from Craig Stowe, Nicholas School of the Environment, circa 1990s.
We will work with the tidyverse and tidymodels packages. Optionally, you might choose to use viridis color palettes.
library(tidyverse)
library(tidymodels)
library(viridis)
mercury_bass <- read_csv("mercury.csv")
The Environmental Protection Agency (EPA) recommends that children and pregnant/breastfeeding women avoid eating fish with a mercury concentration greater than 0.46 ppm, due to the adverse neuro-developmental effects that result from mercury exposure.
We are concerned that the average mercury level in local bass may be too high for this subset of the population to eat.
Let \(\mu\) be the mean mercury (ppm) found in a local bass fish. State the null and alternative hypotheses in words and in mathematical notation.
Next, we want to simulate data under the null hypothesis, i.e. assuming the null hypothesis is true.
Hint: the null is a statement about the true population mean of mercury ppm.
set.seed(2)
# null_dist <- mercury_bass %>%
# specify(response = ____) %>%
# hypothesize(null = "____", __ = ____) %>%
# generate(reps = __, type = "bootstrap") %>%
# calculate(stat = "____")
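One common way to simulate a null distribution for a mean is to shift the sample so that its mean equals the null value and then bootstrap from the shifted sample. The base R sketch below illustrates only that idea. It uses made-up toy data rather than the mercury data, and `x`, `mu0`, `x_null`, and `null_means` are hypothetical names chosen for the illustration. It is not a completed version of the template above.

```r
# Toy data, NOT the assignment data: 40 hypothetical measurements.
set.seed(1)
x <- rnorm(40, mean = 1.2, sd = 0.5)
mu0 <- 1  # hypothetical null value for the population mean

# Shift the sample so its mean equals mu0; the spread is unchanged.
x_null <- x - mean(x) + mu0

# Resample with replacement and record the mean of each resample.
null_means <- replicate(1000, mean(sample(x_null, replace = TRUE)))

# The simulated null distribution should be centered near mu0.
mean(null_means)
```

Because the resampling is done from the shifted sample, the simulated means describe what sample means would look like if the null value were the true mean.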
First, compute the observed statistic from your data.
Next, use the observed statistic to compute the p-value, then visualize the null distribution, shade the p-value, and add an appropriate title and axis labels.
Assuming \(\alpha = 0.05\), do you reject or fail to reject the null hypothesis? What does this mean in context?
Calculate a 95% bootstrap confidence interval for the mean mercury (ppm) in North Carolina bass. Use set.seed(4).
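As a rough illustration of a percentile bootstrap interval, the base R sketch below resamples a toy sample and takes the 2.5th and 97.5th percentiles of the resampled means. The data are simulated and the names `x`, `boot_means`, and `ci` are hypothetical. The percentile method is an assumption here, since the assignment does not say which type of bootstrap interval to report.

```r
# Toy data, NOT the assignment data.
set.seed(1)
x <- rnorm(40, mean = 1.2, sd = 0.5)

# Bootstrap: resample the original (unshifted) sample with replacement.
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))

# Percentile interval: the middle 95% of the bootstrap means.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```

Unlike a null distribution, the bootstrap distribution for a confidence interval is centered at the observed sample mean, so no shifting is involved.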
Does the average mercury content differ significantly between bass caught in the Waccamaw River and bass caught in the Lumber River?
Let \(\mu_W\) be the mean mercury in Waccamaw bass and \(\mu_L\) be the mean mercury in Lumber bass.
Again, set up a null and alternative hypothesis in words and mathematically.
Next, add a variable riverName to take values “Lumber” and “Waccamaw” instead of 0 and 1 respectively. Save your data frame (you will use this variable in the next part).
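One possible way to recode a 0/1 indicator into labels is `if_else()` inside `mutate()`. The sketch below applies it to a small made-up tibble named `toy` (hypothetical values, not the mercury data). Other approaches, such as `case_when()` or `factor()`, would work as well.

```r
library(dplyr)

# Toy data frame with a 0/1 indicator; the mercury values are made up.
toy <- tibble(
  river   = c(0, 1, 1, 0),
  mercury = c(0.8, 1.3, 1.1, 0.6)
)

# Recode 0 -> "Lumber" and 1 -> "Waccamaw", then save the result.
toy <- toy %>%
  mutate(riverName = if_else(river == 0, "Lumber", "Waccamaw"))

toy
```

Assigning the result back with `<-` is what saves the new column; without it, the recoded data frame is printed but not stored.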
Construct the null distribution for exercise 5 using set.seed(6).
Why is it important to specify the null and alternative hypotheses before looking at the data?
To explain, compute the p-value in exercise 6 under a different alternative hypothesis.
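To see how much the choice of alternative can matter, the base R sketch below builds a permutation null distribution for a difference in means and computes the p-value under three different alternatives. Everything in it is an illustration on simulated toy data: the groups "A" and "B", the helper `diff_means()`, and the variable names are hypothetical, and the two-sided p-value is computed from absolute differences, which may not match the convention used by other tools.

```r
# Toy data, NOT the assignment data: two groups of 30 hypothetical values.
set.seed(1)
group <- rep(c("A", "B"), each = 30)
y <- c(rnorm(30, mean = 1.0, sd = 0.5), rnorm(30, mean = 1.2, sd = 0.5))

# Difference in group means (B minus A).
diff_means <- function(values, labels) {
  mean(values[labels == "B"]) - mean(values[labels == "A"])
}
obs <- diff_means(y, group)

# Permutation null: shuffle the group labels and recompute the difference.
null_diffs <- replicate(2000, diff_means(y, sample(group)))

# The same null distribution gives different p-values per alternative.
p_two_sided <- mean(abs(null_diffs) >= abs(obs))
p_greater   <- mean(null_diffs >= obs)
p_less      <- mean(null_diffs <= obs)
c(two_sided = p_two_sided, greater = p_greater, less = p_less)
```

Since the three p-values come from the same data, choosing the alternative after seeing which direction the data point would make the evidence look stronger than it is.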
Your turn
Come up with one additional question you could answer with this data set via hypothesis testing.
Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo. Only upload your PDF document to Gradescope. Before you submit the uploaded document, mark where the answer to each exercise is located. If any answer spans multiple pages, then mark all corresponding pages. Associate the “Overall” section with the first page.