This demo illustrates how to work with the SafeGraph weekly patterns data to obtain daily visit counts for points of interest (POIs), as well as movement between census block groups (CBGs) and POIs. I will use the North Fair Oaks (NFO) community as an example.

library(tidyverse)
library(plotly)
library(sf)
library(tigris)
library(leaflet)
library(censusapi)
library(mapview)

Loading in SafeGraph data

Before we start manipulating the SafeGraph data, we first need the census block groups (CBGs) located in North Fair Oaks. In this demo, we will only consider points of interest (POIs) that are within the NFO boundary. The following chunk derives the NFO block groups from the block groups of San Mateo County, CA (county FIPS code 081), keeping those whose centroid falls inside the NFO boundary.

#Gets the boundary of NFO
nfo_boundary <- 
  places("CA", cb = TRUE, progress_bar = FALSE) %>% 
  filter(NAME == "North Fair Oaks")

#All block groups in San Mateo County, CA (county FIPS code 081)
ca_block_groups <-
  block_groups("CA", county = "081", cb = TRUE, progress_bar = FALSE) %>% 
  st_set_crs(st_crs(nfo_boundary))

#Isolates NFO block groups
nfo_block_groups <-
  ca_block_groups %>% 
  st_centroid() %>% 
  .[nfo_boundary,] %>% 
  dplyr::select(GEOID,geometry) %>% 
  st_set_geometry(NULL) %>% 
  left_join(
    ca_block_groups %>% 
      dplyr::select(GEOID,geometry),
    by = "GEOID"
  ) %>%
  st_as_sf() %>% 
  st_set_crs(st_crs(nfo_boundary))
  
ggplot(nfo_block_groups) + geom_sf()
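
The centroid-based selection used above can be illustrated on toy geometry. The following is a minimal sketch, not part of the original analysis: the `square()` helper, the toy boundary, and the three toy "block groups" are all made up for illustration, and no coordinate reference system is set so that everything is treated as planar. It shows that a polygon which merely overlaps the boundary (B) is dropped, because only its centroid is tested.

```r
library(sf)

# Hypothetical helper: builds a square polygon from its corner coordinates
square <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Toy boundary (stand-in for nfo_boundary)
boundary <- st_sf(NAME = "toy place", geometry = st_sfc(square(0, 0, 2, 2)))

# Toy block groups: A lies inside, B overlaps the edge, C lies outside
groups <- st_sf(
  GEOID = c("A", "B", "C"),
  geometry = st_sfc(
    square(0, 0, 1, 1),
    square(1.5, 1.5, 3, 3),
    square(5, 5, 6, 6)
  )
)

# Keep the IDs whose centroid intersects the boundary,
# then recover the full polygons for those IDs
inside_ids <- suppressWarnings(st_centroid(groups))[boundary, ]$GEOID
selected <- groups[groups$GEOID %in% inside_ids, ]

selected$GEOID
```

Testing centroids rather than full polygons avoids picking up neighboring block groups that only touch or slightly overlap the place boundary.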

Now we will jump into the SafeGraph data. For important background information on SafeGraph weekly patterns data, please read the bullet titled “Safegraph Weekly Patterns” in the Background section of the Project Doc.

First, we will set some file path variables that we will use later in the analysis.

# Set your path here to the covid19analysis folder.
sg_path <- 'G:/Shared drives/SFBI-Restricted/Safegraph/covid19analysis/'

#This is the location of the Core POI data. SafeGraph releases new core_poi files each month.
poi_url <- paste0(sg_path,'core/2020/09/Core-USA-Sep-CORE_POI-2020_08-2020-09-08/core_poi-ca.rds')

Now set file paths for the weekly patterns files. I am only reading in data for the window of August 3 through August 30, 2020, but we have access to weekly patterns data going back to the beginning of 2019. Note that each file is named for the day its week starts on and contains data for all seven days of that week. Also note that we are reading a “home-panel-summary” file for each week, which is explained in the Background section of the Project Doc.

#August 03 2020
patterns_W200803 <- 
  paste0(sg_path,'weekly-patterns/v2/main-file/2020-08-03-weekly-patterns-ca.rds')

hps_W200803 <- 
  paste0(sg_path,'weekly-patterns/v2/home-summary-file/2020-08-03-home-panel-summary.rds')

#August 10 2020
patterns_W200810 <- 
  paste0(sg_path,'weekly-patterns/v2/main-file/2020-08-10-weekly-patterns-ca.rds')

hps_W200810 <- 
  paste0(sg_path,'weekly-patterns/v2/home-summary-file/2020-08-10-home-panel-summary.rds')


#August 17 2020
patterns_W200817 <- 
  paste0(sg_path,'weekly-patterns/v2/main-file/2020-08-17-weekly-patterns-ca.rds')

hps_W200817 <- 
  paste0(sg_path,'weekly-patterns/v2/home-summary-file/2020-08-17-home-panel-summary.rds')

#August 24 2020
patterns_W200824 <- 
  paste0(sg_path,'weekly-patterns/v2/main-file/2020-08-24-weekly-patterns-ca.rds')

hps_W200824 <- 
  paste0(sg_path,'weekly-patterns/v2/home-summary-file/2020-08-24-home-panel-summary.rds')
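
Because the file names differ only by the week start date, the paths can also be generated from a vector of dates. This is a sketch under the assumption that every week follows the naming pattern shown above; `sg_path` here is a placeholder, and `week_starts`, `patterns_paths`, and `hps_paths` are illustrative names that are not used elsewhere in this demo.

```r
# Placeholder root folder; substitute your own covid19analysis path
sg_path <- "path/to/covid19analysis/"

# Week start dates (Mondays) covering August 3 through August 30, 2020
week_starts <- seq(as.Date("2020-08-03"), as.Date("2020-08-24"), by = "week")
week_labels <- format(week_starts, "%Y-%m-%d")

# One patterns path and one home panel summary path per week
patterns_paths <- paste0(
  sg_path, "weekly-patterns/v2/main-file/", week_labels, "-weekly-patterns-ca.rds"
)
hps_paths <- paste0(
  sg_path, "weekly-patterns/v2/home-summary-file/", week_labels, "-home-panel-summary.rds"
)

# Name the elements by month and day so they can be looked up, e.g. patterns_paths[["0803"]]
names(patterns_paths) <- format(week_starts, "%m%d")
names(hps_paths) <- format(week_starts, "%m%d")

patterns_paths[["0803"]]
```

Extending the time window then only requires changing the two dates passed to `seq()`.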

We will now read in the file that contains the normalization functions. This file was written in advance to normalize the visits in the SafeGraph data to account for each block group’s population. This .R file is in a separate working directory, so adjust the file path below as necessary to point to the correct location on your machine. Note that you might not be able to run this source() command from the Source window (something odd about it being within a chunk), but you can run it by copying it to the Console.

setwd("~/GitHub/covid19/safegraph_processing")

source('safegraph_process_patterns_functions.R')

#set working directory back to your original location
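
As an alternative to changing the working directory by hand, the functions file can be sourced by its full path, or the previous working directory can be saved and restored. The sketch below demonstrates both options on a throwaway file; `toy_functions.R` and `add_one()` are made up for illustration and stand in for the real functions file.

```r
# Create a throwaway functions file in a temporary folder (illustration only)
functions_dir <- tempdir()
functions_file <- file.path(functions_dir, "toy_functions.R")
writeLines("add_one <- function(x) x + 1", functions_file)

# Option 1: source by full path, leaving the working directory untouched
source(functions_file)

# Option 2: change directory, source, then restore the original location.
# setwd() returns the previous working directory, so it can be saved directly.
old_wd <- setwd(functions_dir)
source("toy_functions.R")
setwd(old_wd)

add_one(1)
```

Option 1 is usually the simpler choice, since nothing has to be restored afterwards.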

Daily POI Visits

The following chunk reads and processes the weekly patterns file to obtain daily POI visits.

poi_ca <- readRDS(poi_url) #read core_poi file

#Function that reads in each patterns file, and filters/cleans the dataset
process_patterns <- function(patterns){
  
  #Load the SafeGraph patterns dataset.
  sg_date <- readRDS(patterns)
  
  #Subset to POIs in North Fair Oaks
  sg_nfo_businesses <- 
    sg_date %>% 
    filter(city == "North Fair Oaks") %>% 
    left_join(
      poi_ca %>% 
        select(
          safegraph_place_id,
          latitude,
          longitude,
          top_category,
          sub_category
        ),
      by = "safegraph_place_id"
    ) 
  
  return(sg_nfo_businesses)
}

We define a “get_businesses” function that calls a different function (process_patterns_daily) from the source file to explode the daily visits field.

#function to get daily visits to POIs
get_businesses <- function(patterns_input, home_panel_summary){
  
  hps_date <- readRDS(home_panel_summary)
  
  daily_visits <- process_patterns_daily(patterns_input,hps_date) %>%
    mutate(
      visit_counts_high = round(visit_counts_high, digits = 2),
      visit_counts_low = round(visit_counts_low, digits = 2)
    ) %>% 
    left_join(
      patterns_input %>% 
        select(safegraph_place_id,location_name,top_category), 
      by = "safegraph_place_id"
    )
  
  return(daily_visits)
}
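
To make the "explode" step more concrete, here is a minimal base R sketch of turning one weekly row into seven daily rows. It is not the `process_patterns_daily` function from the source file, and it does no normalization. It assumes the weekly file stores daily counts as a bracketed, comma-separated string in a column called `visits_by_day`, alongside a `date_range_start` column; both column names and the toy values are assumptions made for illustration.

```r
# Toy weekly table; column names and values are assumed, not taken from the real files
weekly <- data.frame(
  safegraph_place_id = c("sg:a", "sg:b"),
  date_range_start = as.Date(c("2020-08-03", "2020-08-03")),
  visits_by_day = c("[1,2,3,4,5,6,7]", "[0,0,1,0,2,0,0]"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse "[1,2,3]" into a numeric vector
parse_counts <- function(x) {
  as.numeric(strsplit(gsub("\\[|\\]", "", x), ",")[[1]])
}

# Build one row per POI per day, then stack the pieces
daily <- do.call(rbind, lapply(seq_len(nrow(weekly)), function(i) {
  counts <- parse_counts(weekly$visits_by_day[i])
  data.frame(
    safegraph_place_id = weekly$safegraph_place_id[i],
    date = weekly$date_range_start[i] + seq_along(counts) - 1,
    raw_visits = counts
  )
}))

head(daily)
```

Each weekly row becomes seven daily rows, dated from the week's start date onward.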

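Here we process the patterns, read the home panel summary files, and obtain the daily POI visits. Note that get_businesses() takes the file path of a home panel summary and reads it internally; the hps_* objects read here are used later for the visitor origins.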

#Processes patterns for August 2020
patterns_0803 <- process_patterns(patterns_W200803)
patterns_0810 <- process_patterns(patterns_W200810)
patterns_0817 <- process_patterns(patterns_W200817)
patterns_0824 <- process_patterns(patterns_W200824)

#Read home panel summary file
hps_0803 <- readRDS(hps_W200803)
hps_0810 <- readRDS(hps_W200810)
hps_0817 <- readRDS(hps_W200817)
hps_0824 <- readRDS(hps_W200824)


#Processes businesses to get daily visits to POIs
businesses_0803 <- get_businesses(patterns_0803,hps_W200803)
businesses_0810 <- get_businesses(patterns_0810,hps_W200810)
businesses_0817 <- get_businesses(patterns_0817,hps_W200817)
businesses_0824 <- get_businesses(patterns_0824,hps_W200824)


cumulative_businesses <-
  businesses_0803 %>% 
  rbind(businesses_0810) %>% 
  rbind(businesses_0817) %>% 
  rbind(businesses_0824) %>% 
  mutate(
    mean_visits = (visit_counts_high + visit_counts_low) / 2
  )
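
With more weeks, keeping one variable per week becomes unwieldy. A possible alternative, sketched here on toy files, is to apply one reader function over a vector of paths and stack the results. `read_week()` is a hypothetical stand-in for the processing functions above, and the toy `.rds` files are written to a temporary folder purely for illustration.

```r
# Write two toy weekly tables to temporary .rds files (stand-ins for the real weekly files)
week_ids <- c("0803", "0810")
toy_paths <- file.path(tempdir(), paste0("toy-week-", week_ids, ".rds"))
for (i in seq_along(toy_paths)) {
  saveRDS(
    data.frame(
      safegraph_place_id = c("sg:a", "sg:b"),
      week = week_ids[i],
      visits = c(i, i * 10)
    ),
    toy_paths[i]
  )
}

# Hypothetical per-week reader; in the demo this role is played by the functions above
read_week <- function(path) readRDS(path)

# Read every week, then stack all weekly tables into one
weekly_tables <- lapply(toy_paths, read_week)
combined <- do.call(rbind, weekly_tables)

combined
```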

At this point, we have daily visits for all POIs in NFO between 8/3 and 8/30. As an example, say I want to see the total number of visits to restaurants in North Fair Oaks during that time window. The following chunk groups and sums the restaurant data.

restaurant_total_visits <-
  cumulative_businesses %>% 
  filter(top_category == "Restaurants and Other Eating Places") %>% 
  group_by(safegraph_place_id,location_name) %>% 
  summarise(total_visits = round(sum(mean_visits)))

head(restaurant_total_visits)
## # A tibble: 6 x 3
## # Groups:   safegraph_place_id [6]
##   safegraph_place_id             location_name                      total_visits
##   <chr>                          <chr>                                     <dbl>
## 1 sg:019fe587271b42cdbc47e3d8f5~ Cuco's Burritos                            3014
## 2 sg:05159caa21384eb08637511710~ Taco's Jalisco                             3601
## 3 sg:0ab11e2923f84f71bbf1cffc1f~ Zipotes Restaurant                         1104
## 4 sg:0e0e373f192a48c287a48442cf~ Connoisseur Coffee                         1475
## 5 sg:0e22c8c2556941b88737c2aac9~ El Guanaco Mexican & Salvadorean ~         2689
## 6 sg:0fbd763215914bcb9e5612967e~ La Fuga Taqueria                           4161
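
The midpoint-then-sum logic above can be checked by hand on a small table. This sketch uses made-up POIs and counts; only the column names and the restaurant category string are taken from the chunks above.

```r
library(dplyr)

# Toy daily visits with low and high estimates (all values are made up)
daily <- tibble(
  safegraph_place_id = c("sg:a", "sg:a", "sg:b", "sg:b", "sg:c"),
  location_name = c("Cafe A", "Cafe A", "Taqueria B", "Taqueria B", "Hardware C"),
  top_category = c(
    rep("Restaurants and Other Eating Places", 4),
    "Building Material and Supplies Dealers"
  ),
  visit_counts_low = c(10, 20, 5, 7, 3),
  visit_counts_high = c(14, 24, 9, 9, 5)
)

totals <- daily %>%
  # Midpoint of the low and high estimates for each day
  mutate(mean_visits = (visit_counts_high + visit_counts_low) / 2) %>%
  # Keep restaurants only
  filter(top_category == "Restaurants and Other Eating Places") %>%
  # Sum the daily midpoints per POI
  group_by(safegraph_place_id, location_name) %>%
  summarise(total_visits = round(sum(mean_visits)), .groups = "drop")

totals
```

For Cafe A the daily midpoints are 12 and 22, so its total is 34; for Taqueria B they are 7 and 8, giving 15.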

POI Visitor Origins

You can also find the CBG origins of POI visitors. The following chunk derives the origin-destination (OD) visit pairs. See the in-line comments for more information.

#By using the normBG function from safegraph_process_patterns_functions.R, we can get the origins of visitors (at the CBG level). 
#There is also a function within this file that returns daily origin visit information, but for the purposes of this analysis
#we will use weekly visits from origins, and later take the average across the four weeks.
origins_0803 <- normBG(patterns_0803,hps_0803)
origins_0810 <- normBG(patterns_0810,hps_0810)
origins_0817 <- normBG(patterns_0817,hps_0817)
origins_0824 <- normBG(patterns_0824,hps_0824)

#Combine all of the origin/destination data for August
cumul_origins <-
  origins_0803 %>% 
  rbind(origins_0810) %>% 
  rbind(origins_0817) %>% 
  rbind(origins_0824) 

#At this point, some origin_census_block_group fields are NA. As explained in the background section of the project doc, this happens when there is only 1 visitor from a given CBG, in which case the CBG is not recorded.
#Here, we filter to only origin CBGs in NFO.
#Next, we have a lower and an upper estimate of visitor counts. To get one value, we simply take the average of these two numbers.
#We also omit the rows that have NA as an origin census block group.
#Finally, we group by origin/destination pair, and take the average across the four weeks.
cumul_origins_grouped <-
  cumul_origins %>% 
  filter(origin_census_block_group %in% nfo_block_groups$GEOID) %>% 
  mutate(
    origin_visitor_counts = (origin_visitor_counts_high + origin_visitor_counts_low) / 2
  ) %>% 
  select(
    safegraph_place_id,
    origin_census_block_group,
    location_name,
    origin_visitor_counts
  ) %>% 
  na.omit() %>% 
  group_by(
    safegraph_place_id,
    origin_census_block_group,
    location_name
  ) %>% 
  summarise(
    mean_origin_visitor_counts = mean(origin_visitor_counts)
  )

head(cumul_origins_grouped)
## # A tibble: 6 x 4
## # Groups:   safegraph_place_id, origin_census_block_group [6]
##   safegraph_place_id    origin_census_bloc~ location_name     mean_origin_visit~
##   <chr>                 <chr>               <chr>                          <dbl>
## 1 sg:019fe587271b42cdb~ 060816105002        Cuco's Burritos                119. 
## 2 sg:019fe587271b42cdb~ 060816106011        Cuco's Burritos                190. 
## 3 sg:019fe587271b42cdb~ 060816106023        Cuco's Burritos                 88.7
## 4 sg:05159caa21384eb08~ 060816106011        Taco's Jalisco                 211. 
## 5 sg:05159caa21384eb08~ 060816106023        Taco's Jalisco                  94.7
## 6 sg:05488421852f4c099~ 060816105004        Tosetti Institut~              140.
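
The origin averaging can likewise be traced on a toy table. In this sketch the two NFO GEOIDs are taken from the output above, while the POI, the counts, and the out-of-area GEOID `069999999999` are invented for illustration. It also shows that the `%in%` filter already drops rows with an NA origin, since `NA %in% x` is `FALSE` when `x` contains no NA.

```r
library(dplyr)

# Illustrative subset of NFO block group IDs
nfo_geoids <- c("060816105002", "060816106011")

# Toy weekly origin rows for one POI (counts are made up)
origins <- tibble(
  safegraph_place_id = "sg:a",
  location_name = "Cafe A",
  origin_census_block_group = c(
    "060816105002", "060816105002", "060816106011", NA, "069999999999"
  ),
  origin_visitor_counts_low = c(100, 120, 80, 4, 50),
  origin_visitor_counts_high = c(120, 140, 100, 8, 70)
)

origins_grouped <- origins %>%
  # Keep origins inside the study area; NA and out-of-area rows are dropped here
  filter(origin_census_block_group %in% nfo_geoids) %>%
  # Midpoint of the low and high estimates
  mutate(
    origin_visitor_counts = (origin_visitor_counts_high + origin_visitor_counts_low) / 2
  ) %>%
  # Average the weekly midpoints for each origin/destination pair
  group_by(safegraph_place_id, origin_census_block_group, location_name) %>%
  summarise(
    mean_origin_visitor_counts = mean(origin_visitor_counts),
    .groups = "drop"
  )

origins_grouped
```

Block group 060816105002 appears in two weeks with midpoints 110 and 130, so its mean is 120; block group 060816106011 appears once with a midpoint of 90.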