Spatial Data and Visualization

Main Ideas

Spatial data is important
- exploratory data analysis
- detecting spatial patterns and trends
- understanding spatial data relationships
- analysis of spatial data should reflect spatial structure

Coming Up

HW 1 is due tomorrow (Friday).
HW 2 goes out today.
Lab 3 is due on Friday.

Hot Keys

Task / function	Windows & Linux	macOS
Insert R chunk	Ctrl+Alt+I	Command+Option+I
Knit document	Ctrl+Shift+K	Command+Shift+K
Run current line	Ctrl+Enter	Command+Enter
Run current chunk	Ctrl+Shift+Enter	Command+Shift+Enter
Run all chunks above	Ctrl+Alt+P	Command+Option+P
`<-`	Alt + -	Option + -
`%>%`	Ctrl+Shift+M	Command+Shift+M

Lecture Notes and Exercises

library(tidyverse)

## Warning in system("timedatectl", intern = TRUE): running command 'timedatectl'
## had status 1

library(sf)

Spatial data is different.*

Our typical “tidy” dataframe.

mpg

A new simple feature object.

nc <- st_read("nc_regvoters.shp", quiet = TRUE)
nc

Question: What differences do you observe when comparing a typical tidy data frame to the new simple feature object?

Simple features

A simple feature is a standard, formal way to describe how real-world spatial objects (country, building, tree, road, etc) can be represented by a computer.

The package sf implements simple features and other spatial functionality using tidy principles. Simple features have a geometry type. Common choices are shown in the slides associated with today’s lecture.

Simple features are stored in a data frame, with the geographic information in a column called geometry. Simple features can contain both spatial and non-spatial data.

All functions in the sf package helpfully begin st_.

`sf` and `ggplot`

To read simple features from a file or database use the function st_read().

nc <- st_read("nc_regvoters.shp", quiet = TRUE)

Notice nc contains both spatial and nonspatial information.

We can build up a visualization layer-by-layer beginning with ggplot. Let’s start by making a basic plot of North Carolina counties.

ggplot(nc) +
  geom_sf() +
  labs(title = "North Carolina counties")

Now adjust the theme with theme_bw().

ggplot(nc) +
  geom_sf() +
  labs(title = "North Carolina counties with theme") + 
  theme_bw()

Now adjust color in geom_sf to change the color of the county borders.

ggplot(nc) +
  geom_sf(color = "darkgreen") +
  labs(title = "North Carolina counties with theme and aesthetics") + 
  theme_bw()

Then increase the width of the county borders using size.

ggplot(nc) +
  geom_sf(color = "darkgreen", size = 1.5) +
  labs(title = "North Carolina counties with theme and aesthetics") +
  theme_bw()

Fill the counties by specifying a fill argument.

ggplot(nc) +
  geom_sf(color = "darkgreen", size = 1.5, fill = "orange") +
  labs(title = "North Carolina counties with theme and aesthetics") +
  theme_bw()

Finally, adjust the transparency using alpha.

ggplot(nc) +
  geom_sf(color = "darkgreen", size = 1.5, fill = "orange", alpha = 0.50) +
  labs(title = "North Carolina counties with theme and aesthetics") +
  theme_bw()

Our current map is a bit much. Adjust color, size, fill, and alpha until you have a map that effectively displays the counties of North Carolina.

North Carolina Registered Voters

The nc data was obtained from the NC Board of Elections website and contains statistics on NC registered voters as of September 4, 2021.

The dataset contains the following variables on all North Carolina counties, categories provided by the NCSBE:

county: county name
dem: total number of voters who are registered Democrats
gop: total number of voters who are registered Republicans
lib: total number of voters who are registered Libertarians
unaf: total number of voters who are unaffiliated
white: total number of voters who are white
black: total number of voters who are Black
ntv_a: total number of voters who are Native American
ntv_h: total number of voters who are Native Hawaiian
other: total number of voters who are classified as “other” for race
hispanic: total number of voters who are Hispanic
male: total number of voters who identify as male
female: total number of voters who identify as female
- Please note- these are the only options given by the NCBSE, but male + female do not add up to total since some voters either decide not to disclose or have a different gender identity than these options.
total: total number of registered voters in that county
geometry: geographic coordinates of the county

Let’s use the NCBSE data to generate a choropleth map of the number of registered voters by county.

ggplot(nc) +
  geom_sf(aes(fill = total)) + 
  labs(title = "Number of Registered Voters by County",
       fill = "# voters") + 
  theme_bw()

It is sometimes helpful to pick diverging colors, colorbrewer2 can help.

One way to set fill colors is with scale_fill_gradient().

ggplot(nc) +
  geom_sf(aes(fill = total)) +
  scale_fill_gradient(low = "#fee8c8", high = "#7f0000") +
  labs(title = "The Triangle and Charlotte have the Most Voters",
       fill = "# cases") + 
  theme_bw()

Challenges

Different types of data exist (raster and vector).
The coordinate reference system (CRS) matters.
Manipulating spatial data objects is similar, but not identical to manipulating data frames.

`dplyr`

The sf package plays nicely with our earlier data wrangling functions from dplyr.

`select()`

Maybe you are interested in the partisan breakdown of a county.

nc %>% 
  select(county, dem, gop, total)

`mutate()`

Maybe you are interested in the percentage of registered Democrats in a county.

nc %>% 
  mutate(pct_dem = dem/total)

`filter()`

You could filter for the percentage of Dems being over 50% (a majority).

nc %>% 
  mutate(pct_dem = dem/total) %>%
  filter(pct_dem > 0.5)

`summarize()`

We can also calculate summary statistics for our new variable.

nc %>% 
  mutate(pct_dem = dem/total) %>%
  summarize(mean_pct_dem = mean(pct_dem),
            min_pct_dem = min(pct_dem),
            max_pct_dem = max(pct_dem))

Geometries are “sticky”. They are kept until deliberately dropped using str_drop_geometry.

nc %>% 
  select(county, total) %>% 
  st_drop_geometry()

Practice

Construct an effective visualization investigating the percentage of all voters in NC that are Native American. Use #f7fbff as “low” on the color gradient and #08306b as “high”. Which county has the highest percentage of Native American voters? (You might want to use Google here.)
Write a brief research question that you could answer with this dataset and then investigate it here.
What are limitations of your visualizations above?

Spatial Data and Visualization

STA199

1-27-2022

Main Ideas

Coming Up

Hot Keys

Lecture Notes and Exercises

Simple features

`sf` and `ggplot`

North Carolina Registered Voters

Challenges

`dplyr`

`select()`

`mutate()`

`filter()`

`summarize()`

Practice

Additional Resources

Spatial Data and Visualization

STA199

1-27-2022

Main Ideas

Coming Up

Hot Keys

Lecture Notes and Exercises

Simple features

sf and ggplot

North Carolina Registered Voters

Challenges

dplyr

select()

mutate()

filter()

summarize()

Practice

Additional Resources

`sf` and `ggplot`

`dplyr`

`select()`

`mutate()`

`filter()`

`summarize()`