In this lab you will…
You may have already run into merge conflicts while collaborating over the past few weeks. When two collaborators change the same file and push their changes to the shared repository, git tries to merge the two versions when the second collaborator pulls.
If the two versions have conflicting content on the same line, git produces a merge conflict. Merge conflicts must be resolved manually because they require human judgment.
To resolve the merge conflict, decide whether you want to keep only your text, only the text on GitHub, or a combination of both. Delete the conflict markers <<<<<<<, =======, and >>>>>>>, and make the changes you want in the final merge.
Assign each team member a number: 1, 2, 3, and so on (if your team has only three members, number them 1 through 3). Work carefully through the following steps, which simulate a merge conflict. Completing this exercise is part of the lab grade.
Step 1 (Everyone): Clone this repo in the same way you would for a normal lab assignment: https://classroom.github.com/a/ez-X9QRN
Team Member 4 should look at the group’s repo on GitHub.com to ensure that the other members’ files are pushed to GitHub after every step.
Step 2 (Member 1): Change the team name to your team name. Knit, commit, and push.
Step 3 (Member 2): Change the team name to something different (i.e., not your team name). Knit, commit, and push.
You should get an error.
Read the error to your teammates, or show it to them by sharing your screen. Then pull and review the document with the merge conflict. The merge conflict occurred because you edited the same part of the document as Member 1. Resolve the conflict by keeping whichever name you want, then knit, commit, and push again.
Step 4 (Member 3): Write some narrative in the space provided, then knit, commit, and push. You should get an error.
This time, no merge conflict should occur, since you edited a different part of the document from Members 1 and 2. Read the error to your teammates, or show it to them by sharing your screen.
Pull the latest changes from GitHub. Then knit, commit, and push. All merge conflicts should now be resolved and all documents updated in the GitHub repo.
You do not need to submit anything on Gradescope for the merge conflict activity, but you need to have actually done it to get points. Please close this project before moving to the main lab.
You can find the repo for this lab here: https://classroom.github.com/a/0LKdGB9_.
Each person on the team should clone the repository and open a new project in RStudio. Do not make any changes to the .Rmd file until the instructions tell you to do so.
We will use the tidyverse and tidymodels packages in this lab.
library(tidyverse)
library(tidymodels)
Today's data comes from the Duke Lemur Center. We will examine a subset of the data and focus specifically on the following variables:
taxon: the specific lemur taxon
age_at_death_y: age of the lemur at death
birth_month: month the lemur was born
sex: whether the lemur is male or female
Consult the dataset's documentation for more information, including a codebook of variable names and taxonomic codes.
lemurs = read_csv("lemur_subset.csv")
For each exercise:
Show all relevant code and output used to obtain your response.
Write all narrative in complete sentences, and use clear axis labels and titles on visualizations.
Use a small number of reps (about 100) as you write and test out your code. Once you have finalized all of your code, increase the number of reps to 10,000 to produce your final results.
For each simulation exercise, use the seed specified in the exercise instructions.
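The two rules above (few reps while drafting, a fixed seed for every simulation) can be combined as in the minimal base R sketch below. The variable name n_reps and the seed 1234 are illustrative assumptions, not values from the lab; use the seed each exercise specifies.

```r
# Minimal sketch (hypothetical names): keep reps in one place and call
# set.seed() right before each simulation so results are reproducible.
n_reps = 100  # raise to 10000 once your code is final

set.seed(1234)  # example seed; use the one each exercise specifies
first_run = sample(1:10, size = n_reps, replace = TRUE)

set.seed(1234)
second_run = sample(1:10, size = n_reps, replace = TRUE)

# Same seed and same code give the same simulated values
identical(first_run, second_run)  # TRUE
```

Changing n_reps in a single place then updates every simulation that uses it.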
The idea is to test whether one variable affects another. For example: does lemur taxonomy affect life-span?
Hypothesis: Mongoose lemurs have a greater median lifespan than red-bellied lemurs.
Construct a hypothesis test to investigate the difference in median age at death between the two groups using age_at_death_y.
To begin, state the null and alternative hypotheses mathematically and in words.
Next, compute the sample statistic: what is the observed difference in median age at death between the two groups? Save this quantity as diff_med. Check the codebook to decode the taxon names. You can ignore NA observations.
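The observed statistic is simply a difference between two group medians. The sketch below shows one way to compute such a difference with dplyr, using the gss survey data that ships with the infer package as a stand-in for the lemur data. The object names med_by_sex and obs_diff_hours are hypothetical.

```r
library(tidyverse)
library(infer)  # provides the example `gss` survey data

# Stand-in example: observed difference in median weekly hours,
# female minus male, ignoring missing values
med_by_sex = gss %>%
  drop_na(hours) %>%
  group_by(sex) %>%
  summarize(med_hours = median(hours))

obs_diff_hours = med_by_sex$med_hours[med_by_sex$sex == "female"] -
  med_by_sex$med_hours[med_by_sex$sex == "male"]

obs_diff_hours
```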
Save your filtered data (the two taxa of interest, without NA ages) as lemurs2. Simulate under the null using the template code below. Hint: response is the dependent variable, while explanatory is the independent variable. Think about the prompt above: "does lemur taxonomy affect life-span?"
# null_diff_life = lemurs2 %>%
# specify(response = ___, explanatory = ___) %>%
# hypothesize(null = "___") %>%
# generate(reps = 100, type = "___") %>%
# calculate(stat = "___", order = c("EMON", "ERUB")) # specifies order
Hint: there are three types of generate():
bootstrap: A bootstrap sample will be drawn for each replicate, where a sample of size equal to the input sample size is drawn (with replacement) from the input sample data. Use this when you want to resample the data while changing one aspect of it, e.g., resampling data that has been shifted to have a different mean.
permute: For each replicate, each input value will be randomly reassigned (without replacement) to a new output value in the sample. This is a good option for randomizing categorical labels, e.g., if the null assumes that group membership does not affect another variable.
draw: A value will be sampled from a theoretical distribution with parameters specified in hypothesize() for each replicate. This option is currently only applicable for testing point hypotheses, such as a single proportion. It is a good option for (although not limited to) simulating coin flips with a specified probability p, e.g., if the null assumes a fixed value for a population proportion.
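To make the three generate() types concrete, here is a sketch that runs each one on infer's built-in gss data rather than the lemur data. The null values (mu = 40 hours, p = 0.5), the seed, and the object names are arbitrary choices for illustration only.

```r
library(tidyverse)
library(infer)  # provides the example `gss` survey data

set.seed(1234)

# permute: shuffle the sex labels, breaking any link between sex and hours
null_perm = gss %>%
  specify(hours ~ sex) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  calculate(stat = "diff in medians", order = c("female", "male"))

# bootstrap: with a point null, infer shifts hours so its mean equals mu,
# then resamples with replacement
null_boot = gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  generate(reps = 100, type = "bootstrap") %>%
  calculate(stat = "mean")

# draw: simulate "coin flips" for a categorical response with p = 0.5
null_draw = gss %>%
  specify(response = college, success = "degree") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 100, type = "draw") %>%
  calculate(stat = "prop")
```

Each result is a data frame with one row per replicate and the simulated statistic in the stat column.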
Compute the p-value and use \(\alpha = 0.05\) to make a conclusion. Be sure to state your conclusion in context.
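Once you have an observed statistic and a null distribution, get_p_value() compares them. The sketch below again uses gss as a stand-in, with a two-sided alternative chosen only for illustration. The direction argument should match your own alternative hypothesis: for a "greater than" alternative with order = c("EMON", "ERUB"), that would be direction = "greater".

```r
library(tidyverse)
library(infer)

# Observed difference in median weekly hours (female minus male)
obs_stat = gss %>%
  specify(hours ~ sex) %>%
  calculate(stat = "diff in medians", order = c("female", "male"))

# Null distribution built by permuting the sex labels
set.seed(1234)
null_dist = gss %>%
  specify(hours ~ sex) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  calculate(stat = "diff in medians", order = c("female", "male"))

# Proportion of null statistics at least as extreme as obs_stat
p_val = null_dist %>%
  get_p_value(obs_stat = obs_stat, direction = "two-sided")

p_val
```

With only 100 reps, infer may warn that a p-value of 0 should be reported with caution; increasing reps gives a finer-grained estimate.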
According to the Duke Lemur Center, 75% of breeding occurs during October and November. Since gestation lasts about 4.5 months, one might expect 75% of births to occur in March and April. Do you believe the proportion of births in these two months is significantly different from 75%?
As above, set up a hypothesis test to investigate, following each step below.
State the null and alternative hypotheses.
Compute the observed statistic. Hint: first mutate a new variable birth_march_april that is TRUE if birth_month is 3 or 4 and FALSE otherwise. Save the data frame containing your new variable as lemurs3.
Simulate the null distribution. Specify the response to be the variable you created in the previous step, with success = "TRUE" (infer expects success to be given as a string).
Compute the p-value and compare it to \(\alpha = 0.05\). Write your conclusion in context.
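The mutate-then-draw workflow for a single proportion might look like the sketch below, shown on gss with a hypothetical full_time flag (40 or more hours per week) and an arbitrary null value of p = 0.5. It assumes, as recent versions of infer do, that specify() treats a logical column as a factor with levels "FALSE" and "TRUE", so success is written as the string "TRUE".

```r
library(tidyverse)
library(infer)

# Hypothetical logical flag: TRUE if the respondent works 40+ hours
gss_flag = gss %>%
  drop_na(hours) %>%
  mutate(full_time = hours >= 40)

# Observed proportion of TRUE values
obs_prop = gss_flag %>%
  specify(response = full_time, success = "TRUE") %>%
  calculate(stat = "prop")

# Null distribution of proportions under p = 0.5
set.seed(1234)
null_prop = gss_flag %>%
  specify(response = full_time, success = "TRUE") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 100, type = "draw") %>%
  calculate(stat = "prop")
```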
Report the median lifespan of male and female lemurs separately, along with a 90% confidence interval for each estimate.
Do the confidence intervals overlap?
Does this support or dispute the notion that female lemurs live longer?
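A 90% bootstrap percentile interval for a median can be sketched as follows, again on gss (median weekly hours for female respondents) rather than lemur lifespans. The seed and object names are illustrative; for the exercise you would repeat this for each sex with your own variable and number of reps.

```r
library(tidyverse)
library(infer)

# Bootstrap distribution of the median for one group
set.seed(1234)
boot_med_female = gss %>%
  filter(sex == "female") %>%
  specify(response = hours) %>%
  generate(reps = 100, type = "bootstrap") %>%
  calculate(stat = "median")

# Middle 90% of the bootstrap medians
ci_90 = boot_med_female %>%
  get_confidence_interval(level = 0.90, type = "percentile")

ci_90
```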
Go back through your write-up to make sure you followed the coding style guidelines we discussed in class (e.g., no long lines of code).
There should only be one submission per team on Gradescope.
Component | Points |
---|---|
Ex 1 | 8 |
Ex 2 | 10 |
Ex 3 | 6 |
Ex 4 | 10 |
Ex 5 | 10 |
Workflow & formatting | 2 |
Merge Conflicts Activity | 4 |