Lab #07: Central Limit Theorem Intro

Due Friday March 18 at 11:59 PM

Learning Goals

In this lab you will…

Getting started

Packages

We will use the tidyverse and tidymodels packages in this lab.

library(tidyverse)
library(tidymodels)

Data

Today’s data is a subset of the PanTHERIA dataset on mammalian life history traits (Jones, Kate E., et al. “PanTHERIA: a species‐level database of life history, ecology, and geography of extant and recently extinct mammals: Ecological Archives E090‐184.” Ecology 90.9 (2009): 2648-2648).

pantheria <- read_csv("pantheria_subset.csv")

Exercises

Instructions

Exercise 1

To begin, let’s clean the data. Values of -999 should in fact be NA. To convert these to NA, use the code chunk below as a template, replacing the question mark with the appropriate value.

pantheria[pantheria == ?] = NA
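To see how this indexing pattern behaves, here is a minimal sketch on a small made-up data frame. The object `toy` and its columns are hypothetical and are not part of the PanTHERIA data.

```r
# Hypothetical toy data; -999 marks a missing measurement
toy <- data.frame(
  mass_g    = c(12.5, -999, 8.1),
  length_mm = c(-999, 61, 55)
)

# toy == -999 is a logical matrix; assigning NA overwrites every TRUE cell
toy[toy == -999] <- NA

# Count the missing values in each column after the replacement
n_missing <- colSums(is.na(toy))
print(n_missing)
```

The same assignment works on a tibble read in with read_csv, since the comparison is applied to every cell.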

Exercise 2

Exercise 3

The goal of this analysis is to use CLT-based inference to understand the distribution of body mass. The idea is that if the CLT holds, we can treat the distribution of the sample mean as approximately normal and thus easily generate a normal null distribution to test hypotheses.
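The idea can be illustrated with a short simulation. This is only a sketch under stated assumptions: the population is a made-up Exponential(rate = 1) distribution (mean 1, standard deviation 1), not the body mass data, and the names `n`, `reps`, and `sample_means` are chosen for illustration.

```r
set.seed(1)

n    <- 50     # size of each sample
reps <- 5000   # number of repeated samples

# Draw `reps` samples from a right-skewed Exponential(rate = 1) population
# and record the mean of each sample
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# The CLT predicts sample means centered near 1 with spread near 1 / sqrt(n)
center_sim <- mean(sample_means)
spread_sim <- sd(sample_means)
spread_clt <- 1 / sqrt(n)

print(c(center_sim, spread_sim, spread_clt))
```

Even though the population is strongly skewed, a histogram of `sample_means` (for example with `hist(sample_means)`) should look roughly bell-shaped.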

Before we use the CLT, let’s check whether the necessary conditions are satisfied. For each condition, indicate whether it is satisfied and provide a brief explanation supporting your response. Be sure to check for both families of interest.

Ex 3 Hint: We only observe each species in a family once. Search in your browser of choice: “how many species in vespertilionidae family?” and “how many species in soricidae family?”

Exercise 4

Is the mean adult body mass (abm) of Soricidae significantly greater than 10 g?

State the null and alternative hypothesis. Write your hypotheses in words and mathematical notation.

Exercise 5

Let \(\bar{x}_s\) be the sample mean of Soricidae.

Given the Central Limit Theorem and the hypotheses from the previous exercise,

Ex 5 Hint: Use \sim to create the mathematical tilde. This statement reads: “x bar is normally distributed”
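As a notation example only, the generic form of a CLT statement can be typeset as below. The symbols \(\mu\), \(\sigma\), and \(n\) are placeholders for a population mean, a population standard deviation, and a sample size; the second argument of \(N\) is written here as a standard deviation, which is an assumption about the convention in use.

```latex
\[
\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)
\]
```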

Exercise 6

Compute the p-value associated with our observed statistic (sample mean).

Ex 6 Hint: pnorm finds a left-tailed probability by default, and we are interested in a right-tailed probability.
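The two tails can be compared in a short sketch. All numbers here are hypothetical and chosen only for illustration; they are not taken from the lab data.

```r
# Hypothetical values for illustration only
x_bar <- 11.2   # observed sample mean
mu_0  <- 10     # mean under the null hypothesis
se    <- 0.8    # standard error of the sample mean

# Default behavior: P(X <= x_bar), the left tail
p_left <- pnorm(x_bar, mean = mu_0, sd = se)

# Right tail, computed two equivalent ways
p_right_a <- 1 - p_left
p_right_b <- pnorm(x_bar, mean = mu_0, sd = se, lower.tail = FALSE)

print(c(p_left, p_right_a, p_right_b))
```

The `lower.tail = FALSE` form is usually preferable for very small probabilities, because it avoids the rounding error from subtracting a number close to 1.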

Exercise 7

Let’s compute the p-value in a slightly different way.

To begin, use R as a calculator to compute a standardized score called a “Z-score”. Save this quantity as Z. The formula to compute Z is below:

\[ Z = \frac{\bar{x} - \mu_0}{SE} \]

Here, \(\bar{x}\) is the sample mean, \(\mu_0\) is the mean under the null hypothesis, and \(SE\) is the standard error.
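A minimal sketch of the formula with hypothetical inputs (the values of `x_bar`, `mu_0`, and `se` are made up, not computed from the lab data). It also shows that a right-tailed probability on the standardized scale matches the one computed on the original scale.

```r
# Hypothetical inputs for illustration only
x_bar <- 11.2   # sample mean
mu_0  <- 10     # mean under the null hypothesis
se    <- 0.8    # standard error

# Standardize: how many standard errors x_bar lies above mu_0
Z <- (x_bar - mu_0) / se

# Right-tailed probability under the standard normal N(0, 1)
p_from_z <- pnorm(Z, lower.tail = FALSE)

# Same probability computed directly on the original scale
p_direct <- pnorm(x_bar, mean = mu_0, sd = se, lower.tail = FALSE)

print(c(Z, p_from_z, p_direct))
```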

Exercise 8

\[ T = \frac{\bar{x} - \mu_0}{SE_{\hat{\sigma}}} \]

Here, \(SE_{\hat{\sigma}}\) denotes the standard error computed from the observed sample standard deviation \(\hat{\sigma}\).
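The calculation can be sketched on a small made-up sample. The vector `x` and the null value `mu_0` are hypothetical, and the name `T_stat` is used instead of `T` because `T` is an alias for `TRUE` in R. The sketch assumes the usual estimate \(\hat{\sigma}/\sqrt{n}\) for the standard error and checks the hand calculation against base R's `t.test`.

```r
# Hypothetical measurements in grams, for illustration only
x    <- c(8.2, 11.5, 9.7, 14.1, 10.3, 12.8, 9.1, 13.4)
mu_0 <- 10
n    <- length(x)

# Standard error estimated from the sample standard deviation
se_hat <- sd(x) / sqrt(n)

# Test statistic computed by hand
T_stat <- (mean(x) - mu_0) / se_hat

# Right-tailed p-value from a t distribution with n - 1 degrees of freedom
p_by_hand <- pt(T_stat, df = n - 1, lower.tail = FALSE)

# The built-in one-sample t-test reports the same statistic and p-value
tt <- t.test(x, mu = mu_0, alternative = "greater")
print(c(T_stat, unname(tt$statistic), p_by_hand, tt$p.value))
```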

How does the T statistic compare to the Z score? Why?

Submission

There should only be one submission per team on Gradescope.

Grading

Component               Points
Ex 1                    2
Ex 2                    10
Ex 3                    6
Ex 4                    4
Ex 5                    5
Ex 6                    6
Ex 7                    6
Ex 8                    5
Workflow & formatting   6