2.5 Migration

By way of demonstrating more data wrangling with ACS, let’s consider another classic question in urban systems: To what degree has the increase in income in a place like the Bay Area been a result of true economic mobility, and to what degree has it been the result of higher-income newcomers displacing lower-income residents?

Two relevant datasets from the ACS are B07010, “GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY INDIVIDUAL INCOME IN THE PAST 12 MONTHS FOR CURRENT RESIDENCE IN THE UNITED STATES”, and B07410, “GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY INDIVIDUAL INCOME IN THE PAST 12 MONTHS FOR RESIDENCE 1 YEAR AGO IN THE UNITED STATES”. The difference is that B07010 counts the current population in the given year, a combination of “people who’ve remained” and “people who’ve moved in”, while B07410 counts people by where they lived a year ago, a combination of “people who’ve remained” and “people who were here a year ago but moved somewhere else”. From these two datasets, you can get “inflow” and “outflow”, and inflow minus outflow gives “external net flow”. You’ll also find that subtracting total population last year (obtained by grabbing B07010 from the previous year) from total population this year does not equal “external net flow”, so whatever’s left is “internal net flow”, a combination of births and deaths (there may also be untracked migration abroad, which we assume to be negligible compared to the other counts).

By the way, these datasets also provide breakdowns by income tier for every bucket, so “external net flow” in and out of each income tier can be captured, as well as some amount of internal economic mobility across income tiers.
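
The accounting described above can be sketched with a few lines of arithmetic. All of the counts below are made up for a hypothetical county, and the variable names are just for illustration; they are not real ACS estimates.

```r
# Made-up counts for a single hypothetical county (not real ACS estimates)
here_last_year <- 1000  # total population a year ago (B07010, previous year)
here_this_year <- 1030  # total population now (B07010, current year)
inflow         <- 80    # current residents who lived elsewhere a year ago (B07010)
outflow        <- 70    # residents a year ago who now live elsewhere (B07410)

# External net flow: movers in minus movers out
external_net <- inflow - outflow

# Internal net flow: the population change not explained by migration
internal_net <- (here_this_year - here_last_year) - external_net

external_net
internal_net
```

With these made-up numbers, only part of the population change is explained by migration, and the remainder is attributed to internal net flow.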

Let’s first try loading in B07010 for 2019. We’ll focus our analysis on San Mateo County, FIPS code 081, because the dataset only clearly delineates movement in and out at the county level.

library(tidyverse)
library(censusapi)

# Replace the key below with your own Census API key
Sys.setenv(CENSUS_KEY="c8aa67e4086b4b5ce3a8717f59faa9a28f611dab")

acs_vars_2019_1yr <-
  listCensusMetadata(
    name = "2019/acs/acs1",
    type = "variables"
  )

smc_mobility_current_19 <- 
  getCensus(
    name = "acs/acs1",
    vintage = 2019,
    region = "county:081",
    regionin = "state:06",
    vars = c("group(B07010)")
  ) %>% 
  select(!c(GEO_ID,state,NAME) & !ends_with(c("EA","MA","M"))) %>%
  pivot_longer(
    ends_with("E"),
    names_to = "variable",
    values_to = "estimate"
  ) %>%
  left_join(
    acs_vars_2019_1yr %>% 
      select(name, label), 
    by = c("variable" = "name")
  ) %>% 
  select(-variable)

For any new ACS dataset loaded using censusapi, you’ll want to check the label field (smc_mobility_current_19$label in Console) to understand the nested structure of the data. Remember, if you aren’t careful with how you work with this data, you could easily double-count, because subtotal rows sit alongside the rows they summarize. This step is always a kind of puzzle that you want to figure out how to solve in the simplest way possible. The following is a solution to this particular puzzle, based on an understanding of which rows are duplicative, and what categorization we ultimately want, where “Same house 1 year ago:” and “Moved within same county:” are considered “Here since last year”, while “Moved from different county within same state:”, “Moved from different state:”, and “Moved from abroad:” are considered “Inflow”.

smc_mobility_current_19 <- 
  getCensus(
    name = "acs/acs1",
    vintage = 2019,
    region = "county:081",
    regionin = "state:06",
    vars = c("group(B07010)")
  ) %>% 
  select(!c(GEO_ID,state,NAME) & !ends_with(c("EA","MA","M"))) %>%
  pivot_longer(
    ends_with("E"),
    names_to = "variable",
    values_to = "estimate"
  ) %>%
  left_join(
    acs_vars_2019_1yr %>% 
      select(name, label), 
    by = c("variable" = "name")
  ) %>% 
  select(-variable) %>% 
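  # Split each label on "!!"; the first two pieces ("Estimate", "Total:") are dropped
  separate(
    label,
    into = c(NA,NA,"mobility","temp","income"),
    sep = "!!"
  ) %>% 
  mutate(
    # "No income" sits one level above the income tiers, so copy it into income
    income = ifelse(
      temp == "No income",
      temp,
      income
    ),
    # Collapse the mobility categories into two groups
    mobility = ifelse(
      mobility %in% c("Same house 1 year ago:", "Moved within same county:"),
      "Here since last year",
      "Inflow"
    )
  ) %>% 
  # Drop subtotal rows, which have no income tier, to avoid double-counting
  filter(!is.na(income)) %>% 
  group_by(mobility, income) %>% 
  summarize(estimate = sum(estimate))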

The resultant tidy dataframe has counts of individuals (the universe, by the way, is people aged 15 and over) by income tier and by “Here since last year” or “Inflow”.
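
To see how the separate() and filter() steps in the pipeline above deal with labels of different depths, here is a minimal sketch on a handful of labels shaped like those in B07010. The estimates are made up, and fill = "right" is added only to silence the warning that separate() gives for short labels.

```r
library(tidyverse)

# Made-up estimates attached to labels shaped like those in B07010
toy_labels <- tibble(
  label = c(
    "Estimate!!Total:",
    "Estimate!!Total:!!No income",
    "Estimate!!Total:!!Same house 1 year ago:",
    "Estimate!!Total:!!Same house 1 year ago:!!No income",
    "Estimate!!Total:!!Same house 1 year ago:!!With income:",
    "Estimate!!Total:!!Same house 1 year ago:!!With income:!!$1 to $9,999 or loss"
  ),
  estimate = c(100, 10, 60, 5, 55, 20)
)

toy_tidy <- toy_labels %>%
  separate(
    label,
    into = c(NA, NA, "mobility", "temp", "income"),
    sep = "!!",
    fill = "right"  # pad short labels with NA instead of warning
  ) %>%
  # "No income" appears in the temp position, so copy it into income
  mutate(income = ifelse(temp == "No income", temp, income)) %>%
  # Subtotal rows never get an income value, so they drop out here
  filter(!is.na(income)) %>%
  select(-temp)

toy_tidy
```

Only the two most detailed rows survive, which is what prevents the subtotals from being counted a second time when the estimates are summed.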

Next, we repeat these two steps for B07410 in 2019:

smc_mobility_lastyear_19 <- 
  getCensus(
    name = "acs/acs1",
    vintage = 2019,
    region = "county:081",
    regionin = "state:06",
    vars = c("group(B07410)")
  ) %>% 
  select(!c(GEO_ID,state,NAME) & !ends_with(c("EA","MA","M"))) %>%
  pivot_longer(
    ends_with("E"),
    names_to = "variable",
    values_to = "estimate"
  ) %>%
  left_join(
    acs_vars_2019_1yr %>% 
      select(name, label), 
    by = c("variable" = "name")
  ) %>% 
  select(-variable)

You’ll notice there are fewer labels overall, because B07410 has no “abroad” category. People who moved abroad between 2018 and 2019 wouldn’t take an ACS survey in 2019, so they are “lost”.

smc_mobility_lastyear_19 <- 
  getCensus(
    name = "acs/acs1",
    vintage = 2019,
    region = "county:081",
    regionin = "state:06",
    vars = c("group(B07410)")
  ) %>% 
  select(!c(GEO_ID,state,NAME) & !ends_with(c("EA","MA","M"))) %>%
  pivot_longer(
    ends_with("E"),
    names_to = "variable",
    values_to = "estimate"
  ) %>%
  left_join(
    acs_vars_2019_1yr %>% 
      select(name, label), 
    by = c("variable" = "name")
  ) %>% 
  select(-variable) %>% 
  separate(
    label,
    into = c(NA,NA,"mobility","temp","income"),
    sep = "!!"
  ) %>% 
  mutate(
    income = ifelse(
      temp == "No income",
      temp,
      income
    ),
    mobility = ifelse(
      mobility %in% c("Same house:", "Moved within same county:"),
      "Here since last year",
      "Outflow"
    )
  ) %>% 
  filter(!is.na(income)) %>% 
  group_by(mobility, income) %>% 
  summarize(estimate = sum(estimate))

By the way, at this point you can compare the two dataframes and verify that “Here since last year” is the same between the two, which you would expect.
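
One way to make that comparison is to join the “Here since last year” rows from both dataframes and look at the differences. The sketch below uses two small made-up tables, toy_current and toy_lastyear, as stand-ins for smc_mobility_current_19 and smc_mobility_lastyear_19; the same steps should apply to the real dataframes, though that is an assumption you’d want to confirm yourself.

```r
library(tidyverse)

# Made-up stand-ins for smc_mobility_current_19 and smc_mobility_lastyear_19
toy_current <- tibble(
  mobility = c("Here since last year", "Here since last year", "Inflow"),
  income = c("No income", "$1 to $9,999 or loss", "No income"),
  estimate = c(50, 120, 8)
)

toy_lastyear <- tibble(
  mobility = c("Here since last year", "Here since last year", "Outflow"),
  income = c("No income", "$1 to $9,999 or loss", "No income"),
  estimate = c(50, 120, 11)
)

toy_check <- toy_current %>%
  filter(mobility == "Here since last year") %>%
  inner_join(
    toy_lastyear %>%
      filter(mobility == "Here since last year"),
    by = c("mobility", "income"),
    suffix = c("_current", "_lastyear")
  ) %>%
  # A difference of zero in every tier means the two tables agree
  mutate(difference = estimate_current - estimate_lastyear)

toy_check
all(toy_check$difference == 0)
```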

Finally, we need “total population” as recorded in 2018, which we could potentially get from other sources, but we’ll just use B07010 from the 2018 vintage, summarizing the information such that we end up with just total population by income tier.

smc_mobility_current_18 <- 
  getCensus(
    name = "acs/acs1",
    vintage = 2018,
    region = "county:081",
    regionin = "state:06",
    vars = c("group(B07010)")
  ) %>% 
  select(!c(GEO_ID,state,NAME) & !ends_with(c("EA","MA","M"))) %>%
  pivot_longer(
    ends_with("E"),
    names_to = "variable",
    values_to = "estimate"
  ) %>%
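  left_join(
    # The 2019 variable list is reused here, assuming B07010 variable names match across the 2018 and 2019 vintages
    acs_vars_2019_1yr %>% 
      select(name, label), 
    by = c("variable" = "name")
  ) %>% 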
  select(-variable) %>% 
  separate(
    label,
    into = c(NA,NA,"mobility","temp","income"),
    sep = "!!"
  ) %>% 
  mutate(
    income = ifelse(
      temp == "No income",
      temp,
      income
    ),
    mobility = "Here last year"
  ) %>% 
  filter(!is.na(income)) %>% 
  group_by(mobility, income) %>% 
  summarize(estimate = sum(estimate))

Now, we can bind these dataframes together in a specific way so that we only hold on to counts of “Here last year”, “Here this year”, “Outflow”, and “Inflow”. We’ll also pivot the table to wide format so we can mutate() some new fields for internal and external net flow.

smc_flows_19 <-
  rbind(
    smc_mobility_current_18,
    smc_mobility_lastyear_19 %>% 
      filter(mobility == "Outflow"),
    smc_mobility_current_19 %>% 
      filter(mobility == "Inflow"),
    smc_mobility_current_19 %>% 
      group_by(income) %>% 
      summarize(estimate = sum(estimate)) %>% 
      mutate(mobility = "Here this year")
  ) %>% 
  pivot_wider(
    names_from = mobility,
    values_from = estimate
  ) %>% 
  mutate(
    `External net` = Inflow - Outflow,
    `Internal net` = `Here this year` - `Here last year` - `External net`,
  ) %>% 
  select(
    `Income tier` = income, 
    `Internal net`,
    `External net`,
    `Here last year`, 
    `Here this year`, 
    Outflow, 
    Inflow
  )

smc_flows_19
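
If the reshaping step is hard to follow, it can help to run it on a small made-up table first. The sketch below uses two income tiers with invented counts (toy_long is a hypothetical stand-in for the bound dataframe) and applies the same pivot_wider() and mutate() logic.

```r
library(tidyverse)

# Made-up long-format counts for two income tiers (not real ACS estimates)
toy_long <- tibble(
  income = rep(c("No income", "$1 to $9,999 or loss"), times = 4),
  mobility = rep(
    c("Here last year", "Here this year", "Inflow", "Outflow"),
    each = 2
  ),
  estimate = c(100, 200, 104, 195, 10, 15, 8, 25)
)

toy_flows <- toy_long %>%
  # One row per income tier, one column per mobility category
  pivot_wider(
    names_from = mobility,
    values_from = estimate
  ) %>%
  mutate(
    `External net` = Inflow - Outflow,
    `Internal net` = `Here this year` - `Here last year` - `External net`
  )

toy_flows
```

In this toy table the second tier shrinks overall, yet its internal net is positive: more people left the county from that tier than arrived, while some people already in the county moved into it.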

I’ll leave you to reflect on this result, and what it suggests about the answer to the initial question posed in this section.