Lab #06: Duke Lemurs and Inference

Learning Goals

In this lab you will…

Merge Conflicts (uh oh)

You may have already run into this while collaborating over the past few weeks. When two collaborators change the same file and push their changes to the shared repository, git tries to merge the two versions.

If the two versions have conflicting content on the same lines, git produces a merge conflict. Merge conflicts must be resolved manually because git cannot decide on its own which version to keep.

To resolve a merge conflict, decide whether you want to keep only your text, only the text on GitHub, or a combination of both. Then delete the conflict markers (<<<<<<<, =======, >>>>>>>) and edit the file so it contains what you want in the final merge.

Number your team members 1, 2, 3, and so on (if you have only three members, number them 1 through 3). Carefully work through the following steps, which simulate a merge conflict. Completing this exercise is part of the lab grade.

Resolving a merge conflict

Step 1 (everyone): Clone this repo as you would for a normal lab assignment: https://classroom.github.com/a/ez-X9QRN

Team Member 4 should look at the group’s repo on GitHub.com to ensure that the other members’ files are pushed to GitHub after every step.

Step 2 (Member 1): Change the team name to your team's name. Knit, commit, and push.

Step 3 (Member 2): Without pulling first, change the team name to something different (i.e., not your team name). Knit, commit, and push.

You should get an error.

Read the error to your teammates (you can also share your screen to show it). Then pull and review the document with the merge conflict. The conflict occurred because you edited the same part of the document as Member 1. Resolve it by keeping whichever name you want, then knit, commit, and push again.

Step 4 (Member 3): Without pulling first, write some narrative in the space provided, then knit, commit, and push. You should get an error.

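This time the error is not a merge conflict: git rejects the push because the repo on GitHub contains commits from Members 1 and 2 that you do not have locally. No conflict should occur when you pull, since you edited a different part of the document from Members 1 and 2. Read the error to your teammates; you can also share your screen to show it.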

Click Pull. Then knit, commit, and push. All merge conflicts should now be resolved, and the GitHub repo should contain everyone's changes.

You do not need to submit anything on Gradescope for the merge conflict activity, but you need to have actually done it to get points. Please close this project before moving to the main lab.

Getting started

Packages

We will use the tidyverse and tidymodels packages in this lab.

library(tidyverse)
library(tidymodels)

Data

Today’s data come from the Duke Lemur Center. We will examine a subset of the data, focusing specifically on the following variables

Click here for more info on the dataset including a codebook of variable names and taxonomic codes.

lemurs = read_csv("lemur_subset.csv")

Exercises

For each exercise:

Hypothesis testing for a difference between two groups (i.e., independence)

The idea is to test whether one variable affects another, e.g., does lemur taxonomy affect life-span?

Exercise 1

Hypothesis: mongoose lemurs have a greater median lifespan than red-bellied lemurs.

Construct a hypothesis test of the difference in median age at death between the two groups, using the variable age_at_death_y.

  • To begin, state the null and alternative hypotheses, both mathematically and in words.

  • Next, compute the sample statistic: the observed difference in median age at death between the two groups (mongoose minus red-bellied). Save this quantity as diff_med. Check the codebook to decode the taxon names. You can ignore NA observations.
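A minimal base-R sketch of computing an observed difference in medians. The data frame `toy`, its values, and the group labels "A" and "B" are made up for illustration (they are not from lemur_subset.csv). The sketch drops missing ages, takes the median within each group, and subtracts in a fixed order; with the lab data you would typically do the same with dplyr's filter(), group_by(), and summarise().

```r
# Hypothetical toy data (NOT the lemur data): group labels and ages at death
toy <- data.frame(
  taxon = c("A", "A", "A", "B", "B", "B", "B"),
  age_at_death_y = c(20, 25, NA, 15, 18, 22, 30)
)

# Drop rows with a missing age, as the exercise allows
toy <- toy[!is.na(toy$age_at_death_y), ]

# Median age at death within each group (a vector named by group)
group_medians <- tapply(toy$age_at_death_y, toy$taxon, median)

# Observed statistic: median of group "A" minus median of group "B"
diff_med_toy <- unname(group_medians["A"] - group_medians["B"])
diff_med_toy
```

The order of subtraction matters: it should match the direction of your alternative hypothesis and the `order` argument you later pass to calculate().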

Exercise 2

  • Filter your data frame to contain only the two lemur taxa of interest, and save this new data frame as lemurs2. Then simulate under the null using the template code below.

Hint: response is the dependent variable, while explanatory is the independent variable. Think about the prompt above: “does lemur taxonomy affect life-span?”

# null_diff_life = lemurs2 %>%
#   specify(response = ___, explanatory = ___) %>%
#   hypothesize(null = "___") %>%
#   generate(reps = 100, type = "___") %>%
#   calculate(stat = "___", order = c("EMON", "ERUB")) # EMON minus ERUB

Hint: generate() has three types.

  • bootstrap: For each replicate, a sample of the same size as the input data is drawn with replacement from the input data. Use it to estimate the sampling variability of a statistic (as for a confidence interval), or, combined with a point null in hypothesize(), to resample data while changing one aspect, e.g., data shifted to have the hypothesized mean.

  • permute: For each replicate, the values of the response variable are randomly shuffled (without replacement) across the values of the explanatory variable. This is a good option for randomizing categorical group labels, e.g., when the null assumes that group membership does not affect the other variable.

  • draw: For each replicate, a value is sampled from a theoretical distribution whose parameters are specified in hypothesize(). This option currently applies only to tests of a point null hypothesis. It is a good option for (though not limited to) simulating coin flips with a specified probability p, e.g., when the null assumes a fixed proportion of the population.

  • Compute the p-value and use \(\alpha = 0.05\) to draw a conclusion. Be sure to state your conclusion in context.
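To make the "permute" idea concrete, here is a minimal base-R permutation test for a difference in medians, written out by hand instead of with infer. The vectors `ages` and `group`, the helper `diff_in_medians()`, and the choice of 1000 shuffles are all hypothetical, for illustration only. Under the null, group labels are exchangeable, so shuffling them and recomputing the statistic builds the null distribution; the one-sided p-value is the share of shuffled statistics at least as large as the observed one.

```r
set.seed(2024)  # make the shuffles reproducible

# Hypothetical toy data (NOT the lemur data)
ages  <- c(20, 25, 27, 31, 15, 18, 22, 30)
group <- c("A", "A", "A", "A", "B", "B", "B", "B")

# Hypothetical helper: difference in medians, group A minus group B
diff_in_medians <- function(y, g) {
  median(y[g == "A"]) - median(y[g == "B"])
}

obs_diff <- diff_in_medians(ages, group)

# Null distribution: shuffle the labels (sample() permutes without
# replacement) and recompute the statistic for each replicate
reps <- 1000
null_diffs <- replicate(reps, diff_in_medians(ages, sample(group)))

# One-sided p-value for H_A: median(A) > median(B)
p_value <- mean(null_diffs >= obs_diff)
p_value
```

With the lab data, the infer template above with type = "permute" plays the same role, and infer's get_p_value() with direction = "greater" computes the corresponding p-value.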

Hypothesis testing about a proportion

Exercise 3

According to the Duke Lemur Center, 75% of breeding occurs during October and November. Since gestation lasts about 4.5 months, one might expect 75% of births to occur in March and April. Do you believe the proportion of births in these two months is significantly different from 75%?

As above, set up a hypothesis test to investigate, following each step below.

  • State the null and alternative hypotheses.

  • Compute the observed statistic. Hint: first use mutate() to create a new variable birth_march_april that is TRUE if the birth month is 3 or 4 and FALSE otherwise. Save the resulting data frame as lemurs3.
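A minimal base-R sketch of the indicator-and-proportion step. The column name `birth_month` and its values are hypothetical (the source does not say which column holds the birth month); substitute whatever your data use. The indicator is TRUE for months 3 and 4, and the observed statistic is simply the mean of that logical column.

```r
# Hypothetical toy data (NOT the lemur data); `birth_month` is an assumed
# column holding the month number (1-12) of each birth
toy <- data.frame(birth_month = c(3, 4, 4, 5, 3, 11, 6, 3))

# TRUE when the birth fell in March (3) or April (4), FALSE otherwise
toy$birth_march_april <- toy$birth_month %in% c(3, 4)

# Observed statistic: proportion of births in March or April
p_hat <- mean(toy$birth_march_april)
p_hat
```

In the tidyverse style of this lab, the same indicator would be built inside mutate(), e.g. mutate(birth_march_april = birth_month %in% c(3, 4)). Note that %in% returns FALSE (not NA) for a missing month, so drop missing months first if they should not count as "not March or April".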

Exercise 4 (continuation of 3)

  • Simulate the null distribution. In specify(), set the response to the variable you created in Exercise 3 and set success = "TRUE".

  • Compute the p-value and compare it to \(\alpha = 0.05\). Write your conclusion in context.
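As a sketch of what the "draw" type does for a single proportion, the base-R code below simulates n coin flips with P(TRUE) = 0.75 per replicate and computes a two-sided p-value, since the question asks whether the proportion is different from 75%. The vector `births_in_window` and the 1000 replicates are hypothetical, for illustration only.

```r
set.seed(2024)  # reproducible simulation

# Hypothetical toy sample (NOT the lemur data)
births_in_window <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
n     <- length(births_in_window)
p_hat <- mean(births_in_window)   # observed proportion
p0    <- 0.75                     # proportion under the null hypothesis

# "Draw" under the null: each replicate counts successes in n flips
# with success probability p0, converted to a proportion
reps <- 1000
null_props <- rbinom(reps, size = n, prob = p0) / n

# Two-sided p-value: how often is a simulated proportion at least as far
# from p0 as the observed proportion is?
p_value <- mean(abs(null_props - p0) >= abs(p_hat - p0))
p_value
```

In infer, the corresponding pipeline would pair hypothesize(null = "point", p = 0.75) with generate(type = "draw").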

Bootstrapping confidence intervals

Exercise 5

Report the median lifespan of male and female lemurs separately, each with an associated 90% bootstrap confidence interval.
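A minimal base-R sketch of a 90% percentile bootstrap interval for a median, for one group. The vector `lifespan` and the 2000 resamples are hypothetical; for the exercise you would repeat this once per sex. Each resample has the same size as the data and is drawn with replacement, and the interval runs from the 5th to the 95th percentile of the bootstrap medians.

```r
set.seed(2024)  # reproducible resampling

# Hypothetical toy lifespans for ONE group (NOT the lemur data)
lifespan <- c(12, 15, 18, 20, 22, 25, 27, 30, 33, 35)

# Point estimate: the sample median
med_hat <- median(lifespan)

# Bootstrap: resample with replacement (same size as the data)
# and recompute the median each time
reps <- 2000
boot_meds <- replicate(reps, median(sample(lifespan, replace = TRUE)))

# 90% percentile interval: 5th and 95th percentiles of the bootstrap medians
ci_90 <- quantile(boot_meds, probs = c(0.05, 0.95))
ci_90
```

With infer, the same steps would chain specify(), generate(type = "bootstrap"), calculate(stat = "median"), and get_confidence_interval(level = 0.90, type = "percentile").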

Wrapping up

Go back through your write-up to make sure you followed the coding style guidelines we discussed in class (e.g., no long lines of code).

Submission

There should only be one submission per team on Gradescope.

Grading

Component                   Points
Ex 1                        8
Ex 2                        10
Ex 3                        6
Ex 4                        10
Ex 5                        10
Workflow & formatting       2
Merge Conflicts Activity    4