Exploration of SafeGraph Mobility Data and Wastewater COVID-19 Detection

          This analysis explored Neighborhood Patterns (NH Patterns) data from SafeGraph, which uses anonymized, opt-in cell-phone location data from ~50M devices. It also explored data on COVID-19 detection in wastewater sampled from the San Jose Wastewater Plant Tributary Area (SJ PTA), provided by the Boehm Lab at Stanford.
          The motivation for this analysis is to better understand how COVID-19 spreads by evaluating related datasets. Mobility data have been considered useful for building a more comprehensive picture of COVID-19 than testing data alone can provide. The new Neighborhood Patterns dataset can yield insight into foot traffic at the level of census block groups. Similarly, the detection of COVID-19 in wastewater yields data that could hypothetically capture the “ground truth” of COVID-19 spread. However, wastewater data have not yet been evaluated extensively. This analysis seeks to understand how these two datasets relate to each other and to COVID-19 case data.

The first step in this analysis was to filter the NH Patterns datasets to census block groups in the SJ PTA (See Appendix A: SafeGraph Analysis).

Behind the scenes, this analysis converted NH Patterns data into daily visit counts, aggregated across the SJ PTA for the chosen timeframe.

Then, we adjusted for the portion of each census block group that lies outside the SJ PTA (See Appendix B: Spatial Subset).

For the map below, average total visit and nonresident visit counts over 122 days were calculated for the portion of each census block group that lies within the SJ PTA.


SJ PTA in Blue, Census Block Groups Outlined in Black

November 15 was chosen as the start date for this analysis because the wastewater data become continuous after that date.

          The following graph shows total visits to census block groups in the SJ PTA. The next graph shows visits from devices whose origin was outside the census block group in which they were detected (See Appendix C: Nonresident Visits). This analysis converts all data to 7-day moving averages and takes the day-over-day percent change of these moving averages.
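
The smoothing and percent-change step can be illustrated with a minimal Python sketch. It is not the code used in this analysis: the function names and the visit counts are made up for illustration, and a trailing 7-day window is an assumption, since the text does not say whether the window is trailing or centered.

```python
from statistics import mean


def moving_average(values, window=7):
    """Trailing moving average; the first window-1 days have no value (None)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        None if i < window - 1 else mean(values[i - window + 1 : i + 1])
        for i in range(len(values))
    ]


def percent_change(series):
    """Day-over-day percent change; None where a day is missing or the prior day is zero."""
    changes = [None]  # the first day has no previous day to compare against
    for prev, curr in zip(series, series[1:]):
        if prev is None or curr is None or prev == 0:
            changes.append(None)
        else:
            changes.append((curr - prev) / prev * 100)
    return changes


# Made-up daily visit counts, for illustration only.
daily_visits = [100, 100, 100, 100, 100, 100, 100, 170, 100]
smoothed = moving_average(daily_visits)
daily_change = percent_change(smoothed)
```

In this toy series the one-day spike of 170 visits raises the moving average from 100 to 110, so the percent change is 10% on the day of the spike and 0% on the following day, which shows how smoothing dampens single-day swings.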

          The variations seen in the total visits plot for the SJ PTA and in the wastewater plot are dissimilar. The nonresident visits plot is even more dissimilar to the wastewater plot than the total visits plot is, suggesting that nonresident visits data from SafeGraph are not a clear indicator of COVID-19 prevalence in wastewater. However, this analysis could be refined and made more specific. For example, further analyses could examine individual census block groups for trends, and/or examine other data such as percent positivity rates in COVID-19 tests per census block group.
          Another source of data this analysis investigated is the “percent leaving home” measure derived from SafeGraph Social Distancing data. This dataset approximates device “home” areas per census block group. From it, the percent of the population leaving “home” in a given census block group was calculated, then aggregated for the SJ PTA and plotted below.
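
One way such a percentage could be computed is sketched below. This is an assumption-laden illustration rather than the method used here: it assumes the calculation relies on the `device_count` and `completely_home_device_count` fields of the Social Distancing data, and that aggregation to the SJ PTA pools devices across census block groups. The function names and the numbers are hypothetical.

```python
def percent_leaving_home(device_count, completely_home_device_count):
    """Percent of observed devices that left home; None if no devices were observed."""
    if device_count <= 0:
        return None
    return (device_count - completely_home_device_count) / device_count * 100


def aggregate_percent_leaving_home(rows):
    """Pool devices across block groups (assumed aggregation) and recompute the percent."""
    total_devices = sum(r["device_count"] for r in rows)
    total_home = sum(r["completely_home_device_count"] for r in rows)
    return percent_leaving_home(total_devices, total_home)


# Made-up rows for two hypothetical census block groups on a single day.
cbg_rows = [
    {"cbg": "A", "device_count": 200, "completely_home_device_count": 80},
    {"cbg": "B", "device_count": 100, "completely_home_device_count": 10},
]
per_cbg = {
    r["cbg"]: percent_leaving_home(r["device_count"], r["completely_home_device_count"])
    for r in cbg_rows
}
area_percent = aggregate_percent_leaving_home(cbg_rows)
```

Pooling devices weights each block group by its device count, so the aggregate (70% here) differs from a simple mean of the per-block-group values (60% and 90%, mean 75%).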

          The variation in the percent of those leaving home is orders of magnitude smaller than the variation in the wastewater COVID-19 data. Detection of COVID-19 in wastewater steadily increased from the start of the analysis time frame (Nov. 15). While there are uncertainties in those values (e.g., sampling errors), the overall trend climbs upward, while the other variables in this analysis (Neighborhood Visits, Social Distancing data) do not.

Exploration of New COVID-19 Cases in Santa Clara County and Wastewater COVID-19 Detection

          This analysis also examined new COVID-19 cases in Santa Clara County (SCC) to compare them with the detection of COVID-19 in wastewater for the SJ PTA. The timeframe for this comparison runs from Nov. 15 to Feb. 7, a 12-week window.

          At first glance, the difference between these graphs appears minimal. To understand how the wastewater COVID-19 detection data differ from the new COVID-19 case counts, this analysis plotted the difference between the two series of percent changes per day (See Appendix D: Percent Difference).

          This plot shows the difference between the percent changes for the moving averages of both wastewater and case data over 12 weeks (November 15 - February 7).
          There are several potential explanations for the variation in percent differences, although these differences are quite small. For example, testing frequency may have changed during the holiday season, which adds uncertainty to the case data. Examining data such as COVID-19 test positivity rates and testing volume could shed light on these variations.

Appendix

Methods:

A: SafeGraph Analysis

          Functions developed by the Stanford CEE 218 teaching team normalize SafeGraph device counts from an origin census block group (CBG) to full-population estimates. The scaling uses the ratio between the number of devices detected as residing in a CBG and that CBG's population count from Census data (American Community Survey).
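
The idea behind this normalization can be sketched as follows. This is not the teaching team's code: the function name, the parameter names, and the numbers are hypothetical, and the sketch assumes a simple proportional scaling in which each observed device stands in for population / devices residents.

```python
def scale_to_population(raw_visits, devices_residing, population):
    """Scale device-based visits from one origin CBG to a full-population estimate.

    Assumes proportional scaling: each device residing in the origin CBG
    represents population / devices_residing people.
    """
    if devices_residing <= 0:
        return None  # no devices observed, so no scaling factor can be formed
    return raw_visits * population / devices_residing


# Made-up example: 30 device visits from an origin CBG where 150 devices
# reside and the Census population is 1,500 (one device per 10 residents).
estimated_visits = scale_to_population(raw_visits=30, devices_residing=150, population=1500)
```

With one device per ten residents, the 30 observed device visits scale to an estimated 300 visits.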


B: Spatial subset of census block groups and the San Jose Plant Tributary Area

          Some census block groups overlap the SJ PTA but also extend beyond its boundaries. For each such census block group, this analysis yields the fraction of the block group that lies within the SJ PTA. The visit counts for that census block group were multiplied by this fraction to approximate the visit count for the SJ PTA alone, rather than for the whole census block group.
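
The weighting step can be sketched as below, assuming the overlap fractions have already been computed by a separate spatial intersection (for example, as area shares, which is an assumption). The function name, block group labels, and numbers are hypothetical.

```python
def visits_within_area(cbg_visits, inside_fraction):
    """Sum visit counts, weighting each CBG by the fraction of it inside the area.

    CBGs missing from inside_fraction are treated as fully outside (fraction 0).
    """
    total = 0.0
    for cbg, visits in cbg_visits.items():
        fraction = inside_fraction.get(cbg, 0.0)
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction for {cbg} must be between 0 and 1")
        total += visits * fraction
    return total


# Made-up example: cbg_1 lies entirely inside the area, cbg_2 only one quarter.
cbg_visits = {"cbg_1": 1000, "cbg_2": 400}
inside_fraction = {"cbg_1": 1.0, "cbg_2": 0.25}
area_visits = visits_within_area(cbg_visits, inside_fraction)
```

This weighting implicitly assumes visits are spread evenly across each block group, which is why the result is an approximation.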


C: Nonresident Visits

          The functions used to calculate visits data also calculate visits of devices whose origin census block group differs from their destination census block group.
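
As a rough illustration of the distinction, the sketch below separates total and nonresident visits from origin-destination records. The record layout, function name, and numbers are hypothetical and are not taken from the functions used in this analysis.

```python
from collections import defaultdict


def summarize_visits(records):
    """Return (total, nonresident) visit counts keyed by destination CBG.

    Each record is (origin_cbg, destination_cbg, visits). A visit counts as
    nonresident when the origin differs from the destination.
    """
    total = defaultdict(float)
    nonresident = defaultdict(float)
    for origin, destination, visits in records:
        total[destination] += visits
        if origin != destination:
            nonresident[destination] += visits
    return dict(total), dict(nonresident)


# Made-up origin-destination records.
records = [
    ("A", "A", 50),  # residents of A detected in A
    ("B", "A", 20),  # visitors from B detected in A
    ("C", "A", 5),   # visitors from C detected in A
    ("B", "B", 30),  # residents of B detected in B
]
total_visits, nonresident_visits = summarize_visits(records)
```

Destinations with no nonresident visits (B in this example) are simply absent from the second dictionary.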


D: Percent Difference

          This column was calculated as follows (in dataframe sj_pta_all_var_perc_change): COVID-19 Detection in Wastewater, 7-day moving average (% change per day) - New COVID-19 Cases in SCC (% change per day).
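
The subtraction can be sketched in plain Python as below. The function name and the values are made up for illustration; the actual calculation was done on columns of the dataframe named above.

```python
def percent_difference(wastewater_change, cases_change):
    """Element-wise wastewater minus cases, in percentage points; None where either is missing."""
    if len(wastewater_change) != len(cases_change):
        raise ValueError("both series must cover the same days")
    return [
        None if w is None or c is None else w - c
        for w, c in zip(wastewater_change, cases_change)
    ]


# Made-up percent changes per day for three consecutive days.
wastewater_change = [None, 5.0, -2.0]
cases_change = [None, 3.0, 1.5]
difference = percent_difference(wastewater_change, cases_change)
```

A positive value means the wastewater signal grew faster (or shrank more slowly) than new cases on that day, and a negative value means the opposite.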