6.1 Housing data

One common indicator of housing affordability is “housing cost burden”, generally defined as a household spending more than 30% of its income on housing costs (50% is also used as a “severe” cost burden threshold). This measure can be used for both renters and owners. Using PUMS data, we can not only determine whether individual respondents are housing burdened (and extrapolate total population estimates using weights), but also quantify the total “amount of housing cost burden”. This can be thought of as the quantity of money that, if somehow cut from the appropriate households’ housing costs, would eliminate housing cost burden in the population. A similar measure, the “amount of housing income gap”, estimates the extra income that, holding housing costs constant, would likewise eliminate housing cost burden. Below, we’ll demonstrate how to perform these calculations using PUMS data for the whole Bay Area.
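
To make the two dollar measures concrete before working with real data, here is a minimal sketch for a single hypothetical household. The income and housing cost figures are invented for illustration, and the variable names are not from PUMS.

```r
# Hypothetical household: all figures are made up for illustration
income <- 60000        # annual household income
housingcost <- 24000   # annual housing costs
threshold <- 0.3       # standard cost burden threshold

# Share of income spent on housing
burden_share <- housingcost / income

# Dollars of housing cost above the threshold (zero if not burdened)
burden_amount <- max(housingcost - threshold * income, 0)

# Extra income needed to bring the share down to the threshold,
# holding housing costs constant (zero if not burdened)
income_gap <- max(housingcost / threshold - income, 0)

c(share = burden_share, burden = burden_amount, gap = income_gap)
```

In this toy case the household spends 40% of its income on housing, so it would need either $6,000 less in annual housing costs or $20,000 more in annual income to reach the 30% threshold.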

library(tidyverse)
library(tigris)
library(tidycensus)
library(sf)
library(leaflet)
library(mapboxapi)

census_api_key("YOUR_KEY_HERE")
mb_access_token("YOUR_TOKEN_HERE", install = TRUE)
readRenviron("~/.Renviron")

bay_county_names <-
  c(
    "Alameda",
    "Contra Costa",
    "Marin",
    "Napa",
    "San Francisco",
    "San Mateo",
    "Santa Clara",
    "Solano",
    "Sonoma"
  )

bay_counties <-
  counties("CA", cb = TRUE, progress_bar = FALSE) %>%
  filter(NAME %in% bay_county_names)

ca_pumas <-
  pumas("CA", cb = TRUE, progress_bar = FALSE)

bay_pumas <-
  ca_pumas %>% 
  st_centroid() %>% 
  .[bay_counties, ] %>% 
  st_set_geometry(NULL) %>% 
  left_join(ca_pumas %>% select(GEOID10)) %>% 
  st_as_sf()

pums_vars_2018 <- 
  pums_variables %>%
  filter(year == 2018, survey == "acs5")

There is an important refinement we haven’t made so far when working with dollar amounts in PUMS 5-year samples. Since individual responses can come from any of the 5 years (2014-2018), any dollar calculations should adjust for inflation so that, say, all values are expressed in 2018 dollars. The PUMS data provides two adjustment factors for each record, ADJHSG and ADJINC, whose values vary by the year of the response. You can use them to adjust housing-related and income-related fields, respectively, to 2018 dollars.
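
As a rough sketch of what this adjustment looks like, the toy records below share the same nominal income and rent but come from different response years. The adjustment factors are invented for illustration and are not the published ADJINC or ADJHSG values. The `response_year` column is hypothetical and only labels the rows. The factors are stored as text here on the assumption that they may arrive as character values, which is why they are converted with `as.numeric()`.

```r
library(dplyr)

# Toy records; the adjustment factors below are invented for illustration
# and are NOT the published ADJINC/ADJHSG values.
toy_pums <- data.frame(
  response_year = c(2014, 2016, 2018),  # hypothetical label, not a PUMS field
  HINCP = c(50000, 50000, 50000),       # nominal annual household income
  GRNTP = c(1500, 1500, 1500),          # nominal monthly gross rent
  ADJINC = c("1.08", "1.05", "1.01"),
  ADJHSG = c("1.07", "1.04", "1.00")
)

toy_adjusted <- toy_pums %>%
  mutate(
    income_2018 = HINCP * as.numeric(ADJINC),  # income in 2018 dollars
    rent_2018 = GRNTP * as.numeric(ADJHSG)     # rent in 2018 dollars
  )

toy_adjusted
```

Older responses get scaled up more, so identical nominal amounts end up as different values once everything is expressed in the same year's dollars.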

Also, to arrive at as holistic a measure of housing costs as possible, we need to factor in many PUMS variables, keeping in mind that different information is available for renters and owners. The following variables may constitute housing costs:

RNTP: Monthly rent for renters
MHP: Annual mobile home costs for mobile home residents
MRGP: Monthly first mortgage payment for owners
MRGT: A flag for whether MRGP includes annual property taxes for a given record. If not, then the following field should be added.
TAXAMT: Annual property tax for owners
MRGI: A flag for whether MRGP includes insurance for a given record. If not, then the following field should be added.
INSP: Annual fire/hazard/flood insurance for owners
SMP: Any other monthly mortgage payments for owners
CONP: Monthly condo (or HOA) fees for condo owners
ELEP: Monthly electricity costs for all households
GASP: Monthly gas costs for all households
FULP: Other annual fuel costs for all households
WATP: Annual water costs for all households
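
To illustrate how the flags and the mix of monthly and annual fields fit together, here is a minimal sketch that assembles a monthly owner cost from the components listed above. All dollar amounts are invented. The flag coding (1 means the item is already included in MRGP, 2 means it is not) is an assumption that should be checked against the PUMS data dictionary. MHP is left out for simplicity, and the result is not claimed to match any official aggregate.

```r
library(dplyr)

# Toy owner records with invented dollar amounts.
# ASSUMED flag coding: MRGT / MRGI == 1 means taxes / insurance are already
# included in MRGP; 2 means they are not and must be added separately.
toy_owners <- data.frame(
  MRGP = c(2000, 2500),    # monthly first mortgage payment
  MRGT = c(1, 2),          # does MRGP include property taxes?
  TAXAMT = c(6000, 6000),  # annual property tax
  MRGI = c(1, 2),          # does MRGP include insurance?
  INSP = c(1200, 1200),    # annual insurance
  SMP = c(0, 300),         # other monthly mortgage payments
  CONP = c(0, 250),        # monthly condo fee
  ELEP = c(100, 120),      # monthly electricity
  GASP = c(40, 50),        # monthly gas
  FULP = c(0, 240),        # annual other fuel
  WATP = c(600, 720)       # annual water
)

toy_owner_costs <- toy_owners %>%
  mutate(
    monthly_cost =
      MRGP + SMP + CONP + ELEP + GASP +    # fields already monthly
      (FULP + WATP) / 12 +                 # annual fields converted to monthly
      ifelse(MRGT == 1, 0, TAXAMT / 12) +  # add taxes only if not in MRGP
      ifelse(MRGI == 1, 0, INSP / 12),     # add insurance only if not in MRGP
    annual_cost = monthly_cost * 12
  )

toy_owner_costs %>% select(monthly_cost, annual_cost)
```

The first toy household already pays taxes and insurance through its mortgage, so only utilities are added. The second needs both added on top of the mortgage payment.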

Fortunately, there are pre-aggregated variables as well: GRNTP for all monthly renter housing costs, and SMOCP for all monthly owner housing costs.

ca_pums <- get_pums(
  variables = c(
    "PUMA",
    "GRNTP",
    "SMOCP",
    "ADJHSG",
    "HINCP",
    "ADJINC"
  ),
  state = "CA",
  year = 2018,
  survey = "acs5"
)

bay_pums <-
  ca_pums %>% 
  filter(PUMA %in% bay_pumas$PUMACE10)

Now, let’s compute housing cost burden as a percentage as well as an absolute dollar amount, given a specific burden threshold, for each record:

burden_threshold <- 0.3

bay_burden <-
  bay_pums %>% 
  filter(HINCP > 0) %>%
  filter(SPORDER == 1) %>% 
  transmute(
    PUMA = PUMA,
    weight = WGTP,
    housingcost = ifelse(
      SMOCP > 0,
      SMOCP*12*as.numeric(ADJHSG),
      GRNTP*12*as.numeric(ADJHSG)
    ),
    income = HINCP*as.numeric(ADJINC),
    burden_perc = housingcost/income,
    burden_30 = housingcost - burden_threshold*income,
    incomegap_30 = housingcost/burden_threshold - income
  )

There may be many reasons at this stage to refine the analysis further, given that housing cost burden may be more of a concern for households below a certain income, or with other characteristics. You might also be interested in multiple burden thresholds. For our purposes, let’s go ahead and summarize our results for each PUMA:

bay_burden_pumas <-
  bay_burden %>% 
  mutate(
    burdened_30 = ifelse(
      burden_perc >= burden_threshold,
      weight,
      0
    ),
    excess_30 = ifelse(
      burden_30 < 0,
      burden_30,
      0
    ),
    burden_30 = ifelse(
      burden_30 > 0,
      burden_30,
      0
    ),
    incomegap_30 = ifelse(
      incomegap_30 > 0,
      incomegap_30,
      0
    )
  ) %>% 
  group_by(PUMA) %>% 
  summarize(
    burdened_30 = sum(burdened_30),
    households = sum(weight),
    burden_30 = sum(burden_30*weight),
    incomegap_30 = sum(incomegap_30*weight),
    excess_30 = sum(excess_30*weight)
  ) %>% 
  mutate(
    burdened_30_perc = burdened_30/households
  ) %>% 
  left_join(bay_pumas %>% select(PUMA = PUMACE10)) %>% 
  st_as_sf()

sum(bay_burden_pumas$burdened_30)/sum(bay_burden_pumas$households)

## [1] 0.3623124

sum(bay_burden_pumas$burden_30)

## [1] 10989324791

Based on the quick summary statistics above, it appears that, according to the 2014-2018 PUMS data, a little over a third (about 36%) of households in the Bay Area were paying 30% or more of their income on housing. The total amount of that housing cost above the 30% threshold, for all of those households combined, was almost $11 billion per year, in 2018 dollars. In other words, federal funding in the form of housing vouchers at this scale would eliminate the “housing affordability problem” in the Bay Area, from a definitional point of view. Here are two maps to visualize these results geospatially:

burden_pal1 <- colorNumeric(
  palette = "Purples",
  domain = bay_burden_pumas$burdened_30_perc
)

bay_burden_pumas %>% 
  leaflet() %>% 
  addMapboxTiles(
    style_id = "streets-v11",
    username = "mapbox"
  ) %>% 
  addPolygons(
    fillColor = ~burden_pal1(burdened_30_perc),
    fillOpacity = 0.5,
    color = "white",
    weight = 0.5,
    label = ~paste0(round(burdened_30_perc*100), "% of households paying 30%+ of income on housing"),
    highlightOptions = highlightOptions(
      weight = 2
    )
  ) %>% 
  addLegend(
    pal = burden_pal1,
    values = ~burdened_30_perc,
    title = "% Cost-burdened<br>households"
  )

burden_pal2 <- colorNumeric(
  palette = "Reds",
  domain = bay_burden_pumas$burden_30/1e6
)

bay_burden_pumas %>% 
  leaflet() %>% 
  addMapboxTiles(
    style_id = "streets-v11",
    username = "mapbox"
  ) %>% 
  addPolygons(
    fillColor = ~burden_pal2(burden_30/1e6),
    fillOpacity = 0.5,
    color = "white",
    weight = 0.5,
    label = ~paste0("$", round(burden_30/1e6), "M total annual cost burden"),
    highlightOptions = highlightOptions(
      weight = 2
    )
  ) %>% 
  addLegend(
    pal = burden_pal2,
    values = ~burden_30/1e6,
    title = "Total housing cost<br>burden, in $ millions"
  )
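
One refinement mentioned earlier is comparing several burden thresholds. The sketch below shows one way to do that on a toy table of household records. The values are invented, `weight` stands in for WGTP, and the column names mirror the ones used above only for readability. It is an illustration of the weighted calculation, not a result for the Bay Area.

```r
library(dplyr)

# Toy household records (invented); weight plays the role of WGTP
toy_burden <- data.frame(
  weight = c(10, 20, 30, 40),
  housingcost = c(12000, 24000, 18000, 40000),  # annual, inflation-adjusted
  income = c(60000, 60000, 30000, 200000)       # annual, inflation-adjusted
)

thresholds <- c(0.3, 0.5)

# Summarize the same records once per threshold, then stack the results
toy_summary <- lapply(thresholds, function(thr) {
  toy_burden %>%
    summarize(
      threshold = thr,
      households = sum(weight),
      # weighted count of households at or above the threshold
      burdened = sum(weight[housingcost / income >= thr]),
      # weighted dollars of housing cost above the threshold
      burden_amount = sum(pmax(housingcost - thr * income, 0) * weight)
    )
}) %>%
  bind_rows() %>%
  mutate(burdened_perc = burdened / households)

toy_summary
```

The same pattern could be extended with a `filter()` on income or other household characteristics before summarizing, to isolate the households of most concern.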

As previously noted, further refinement is strongly recommended to isolate just those who are truly “vulnerable” in the sense of having low income, or having a certain kind of less stable employment/income, or having certain household characteristics, or having even higher than 30% cost-burden. You’ll be asked to consider some of these refinements in the assignment at the end of this chapter. Otherwise, a big factor that may contribute to high housing costs is the scarcity of housing, which may be affected by local zoning, which we’ll explore next.