6.4 Parcel and building geometries

As we’ve seen, parcel-level data can include certain kinds of building information, but it doesn’t include the geometry of the building footprint itself. We’d find actual building geometries useful for certain kinds of analysis, such as estimating available rooftop area for solar panels, or estimating buildable area in our urban landscapes, i.e. the space between parcel boundaries and existing building boundaries.

In certain leaflet base maps we’ve worked with, we’ve seen building footprints rendered, and we’re porbably most used to seeing this in Google Maps, so it would seem to be the case that we can find building footprints publicly available online. We recommend two sources.

First, OpenStreetMap is a Wikipedia-style crowdsourcing platform for geometries generally, and a German organization called Geofabrik makes many components of OSM available for easy download. After clicking a few levels down, you can find a NorCal page with a 700MB file for download with these attributes. It is worth it to personally download this zip file and explore the various polygons available. For our purposes, we’ll load the building shapefiles. You’re encouraged to use saveRDS() to save this is a more retrievable format, like you’ve done for the ACS data dictionary.

library(tidyverse)
library(sf)
library(leaflet)
library(mapboxapi)
osm_bldg <- st_read("G:/Shared drives/SFBI/Data Library/OSM/gis_osm_buildings_a_free_1.shp")

saveRDS(osm_bldg, "osm_bldg.rds")

Another option is Microsoft’s AI-generated building footprints, which may be more or less accurate than OSM in different places. Their GitHub repo includes links to download the files as GeoJSONs by state, but California is massive, so a nice option is to use county-level disaggregations done by Patty Frontiera at Berkeley. Let’s download ca_06081_footprints.zip, unzip on our local machine, and read the results. st_as_sf() can be pointed at the WKT field and will know how to convert that to geometries.

microsoft_bldg_smc <- 
  read_csv("ca_06081_footprints.csv") %>% 
  st_as_sf(wkt = "WKT") %>% 
  st_set_crs(4326) %>% 
  rename(geometry = WKT)

Now let’s filter both to just one CBG in East Palo Alto and compare the results. Here we use cb = F to get the most accurate boundary to the CBG as we can.

test_cbg <- 
  block_groups("CA","San Mateo", cb = F, progress_bar = F) %>% 
  filter(GEOID == "060816119003")

bldg_osm <-
  osm_bldg[test_cbg, ]

bldg_microsoft <-
  microsoft_bldg_smc %>% 
  st_transform(4269) %>% 
  .[test_cbg, ]
leaflet() %>% 
  addTiles(
    group = "OSM Base Map"
  ) %>% 
  addMapboxTiles(
    style_id = "satellite-streets-v11",
    username = "mapbox",
    group = "Satellite Base Map"
  ) %>%
  addPolygons(
    data = bldg_osm,
    fill = F,
    weight = 2,
    group = "OSM Building"
  ) %>% 
  addPolygons(
    data = bldg_microsoft,
    fill = F,
    color = "red",
    weight = 2,
    group = "Microsoft Building"
  ) %>% 
  addLayersControl(
    baseGroups = c("OSM Base Map","Satellite Base Map"),
    overlayGroups = c("OSM Building", "Microsoft Building"),
    options = layersControlOptions(collapsed = FALSE)
  )

Note that the blue boundaries match the underlying OpenStreetMap boundary (the default when using addTiles()) exactly. You can toggle to Satellite view and evaluate which building footprint is more accurate on a parcel-by-parcel basis. We’ll use OSM for the rest of this demonstration.

Now, let’s bring in parcel shapes for this CBG as well. Recall that every County will have different locations for finding parcel data, or you won’t be able to find it at all. For San Mateo County, the parcel data is available for download here:

temp <- tempfile()

download.file("https://gis.smcgov.org/GIS_data_download/SAN_MATEO_COUNTY_ACTIVE_PARCELS.zip", destfile = temp, mode = "wb")

smc_parcels <- st_read(unzip(temp, "SAN_MATEO_COUNTY_ACTIVE_PARCELS.shp"))

unlink(temp)

test_parcels <-
  smc_parcels %>% 
  st_set_crs(2227) %>% 
  st_transform(4269) %>% 
  .[test_cbg, ]
leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = test_parcels,
    fillColor = "blue",
    color = "white",
    weight = 1,
    fillOpacity = 0.5
  ) %>% 
  addPolygons(
    data = bldg_osm,
    stroke = F,
    fillColor = "white",
    fillOpacity = 1
  )

Note that if you check st_crs(smc_parcels), it appears that the CRS is WGS 84, but that is definitely not the case when you look at the geometry information and see coordinates that look like numbers around 6 million. This appears to be an error on SMC’s part, and we need to figure out what CRS the data actually is, so we can use st_set_crs() to establish that before transforming to something else. Unfortunately, if the coordinates are unrecognizable, this is a little bit like finding a needle in a haystack. Quite a bit of sleuthing on the internet let me to discover the California State Plane Coordinate System, and SMC’s being in Zone 3, which is identified as EPSG 2777.

As you can see in the map, building footprint and parcel shapes don’t perfectly match – sometimes they overlap.

Now, as a demonstration, let’s consider the viability of different parcels for building accessory structures, like accessory dwelling units (ADUs). In California, these are basically allowed on every residential parcel. Note that we have one nonresidential parcel in this CBG, by inspection, which we could choose to remove before proceeding; unlike with San Francisco, we can’t easily publicly access Assessor-Recorder data in San Mateo County to comprehensively make these determinations based on parcel use type. We can use st_difference() to cut away the building footprints from parcels. Unless a parcel is completely erased, we should be left with the same number of parcels, just with shapes that have holes in them.

projection <- "+proj=utm +zone=10 +ellps=GRS80 +datum=NAD83 +units=ft +no_defs"

bldg_osm <- bldg_osm %>% 
  st_transform(projection)

parcels <-
  test_parcels %>% 
  st_transform(projection)

yards <-
  parcels %>% 
  mutate(
    lot_area = st_area(.)
  ) %>% 
  arrange(desc(lot_area)) %>% 
  .[-1, ] %>% 
  st_difference(st_union(bldg_osm))

leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = yards %>% 
      st_transform(4326),
    fillColor = "blue",
    color = "white",
    weight = 1,
    fillOpacity = 0.5
  )

Notes:

  • We’ve transformed to our custom projection in feet so we can perform certain geospatial computations, like calculating area, in a projected coordinate system.
  • st_difference() receives a geometry you want to cut away from the object in the pipeline, and should receive just one merged geometry using st_union().

Now, since we’re imagining building ADUs, we would need to consider setbacks which may be mandated by local jurisdictions through zoning codes. These include setbacks from the street edge (usually for aesthetics, but can include safety sightlines at intersections), setbacks from side and rear (usually for preventing “nuisance” effects on neighbors, but also for firefighter access), and setbacks from existing structures (for firefighter access and physical accessibility). In East Palo Alto, generally development would be restricted from being closer to the street than the existing structure, 4ft from side/rear, and 6ft from existing structures.

Let’s first tackle the street edge condition. Parcel by parcel, it’s hard to infer what the “street edge” is, but if we take advantage of the property that parcels can be merged together to form a “block”, that can help us identify the street edge to individual parcels. In the chunk below, we merge “blocks” with st_union(), create a “crust” using a combination of st_buffer() and st_difference(), then cut that crust into individual “pizza slice crusts” for each parcel.

blocks <-
  parcels %>% 
  st_union()

street_edges <-
  blocks %>% 
  st_difference(
    blocks %>% 
      st_buffer(-1)
  ) %>% 
  st_intersection(parcels,.) %>% 
  filter(APN %in% yards$APN)

leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = yards %>% 
      st_transform(4326),
    fillColor = "yellow",
    color = "white",
    weight = 1,
    fillOpacity = 0.5
  ) %>% 
  addPolygons(
    data = street_edges %>% 
      st_transform(4326),
    fillColor = "brown",
    color = "brown",
    weight = 2
  )

Now, for each individual parcel, we can buffer the “crust” as far as needed to intersect with an existing structure. First, we’ll spatial join building centroids to parcels so we can easily identify which buildings are “within” an individual parcel. Then, in a first test, we pick a parcel, identify the buildings in that parcel, then use st_nearest_points() to create a line connecting the two closest points between the crust and the buildings, as visualized on the map:

bldg_join <-
  bldg_osm %>% 
  st_centroid() %>% 
  st_join(parcels) %>% 
  st_set_geometry(NULL) %>% 
  left_join(bldg_osm %>% select(osm_id)) %>% 
  st_as_sf()

temp_street_edge <- street_edges[2,]

temp_apn <- temp_street_edge %>% 
  pull(APN)
    
temp_bldg <- bldg_join %>% 
  filter(APN == temp_apn) %>% 
  st_union()

nearest_distance <- 
  st_nearest_points(temp_street_edge, temp_bldg) %>%
  st_sf()

leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = temp_street_edge %>% 
      st_transform(4326)
  ) %>% 
  addPolygons(
    data = temp_bldg %>% 
      st_transform(4326)
  ) %>% 
  addPolylines(
    data = nearest_distance %>% 
      st_transform(4326),
    color = "red"
  )

st_length() will then get the length of this red line, which can then be used in st_buffer(). We now do this series of steps within a loop:

street_edges_buffered <-
  1:nrow(street_edges) %>% 
  map_dfr(function(x){
    
    temp_apn <- street_edges[x,] %>% 
      pull(APN)
    
    temp_bldg <- bldg_join %>% 
      filter(APN == temp_apn) %>% 
      st_union()
    
    nearest_distance <- st_nearest_points(street_edges[x, ], temp_bldg) %>% 
      st_length()
    
    street_edges[x, ] %>% 
      st_buffer(nearest_distance)
    
  })
leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = yards %>% 
      st_transform(4326),
    fillColor = "yellow",
    color = "white",
    weight = 1,
    fillOpacity = 0.5
  ) %>% 
  addPolygons(
    data = street_edges_buffered %>% 
      st_transform(4326),
    fillColor = "brown",
    color = "brown",
    weight = 2
  )

street_edges_buffered can now be used within st_difference() to cut the yards, removing front yards from consideration (of course, this could be actually used to identify front yards instead, if one wanted to actually encourage some kind of development in the front yard, or tree planting).

Now, for side/rear property setbacks, since EPA has the same setback for side and rear, it would be relatively straightforward to just buffer inwards from the original parcel shape. If these setbacks were different, we’d have to come up with another creative strategy to distinguish rear from side, which is nontrivial.

Lastly, for building setbacks, we can just buffer outwards from the original building shapes. So let’s perform all three steps together to reach leftover yard area with all setbacks applied:

yards_setbacks <-
  parcels %>% 
  mutate(
    lot_area = st_area(.)
  ) %>% 
  arrange(desc(lot_area)) %>% 
  .[-1, ] %>% 
  st_buffer(-4) %>% 
  st_difference(st_union(bldg_osm %>% st_buffer(6))) %>% 
  st_difference(st_union(street_edges_buffered))

leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = yards_setbacks %>% 
      st_transform(4326),
    fillColor = "blue",
    color = "white",
    weight = 1,
    fillOpacity = 0.5
  )

This set of yard portions is much more limited than we were originally working with. One last thing we can do is consider a “buildable area” for a habitable structure like an ADU, which realistically would need to be at least 8ft wide and 20ft long. A neat trick to get only areas that can fit something 8ft wide is to buffer inwards just shy of half of 8ft, like 3.9ft, and then buffer back outwards the same amount. If a narrow nook or cranny is thinner than 8ft, it will entirely disappear in this process. After the buffering, we can then disaggregate our geometries into individual shapes, since the buffering process might have split one single geometry into two smaller geometries; this is done with st_cast(), first standardizing every row to a “MULTIPOLYGON” and then casting to “POLYGON” which will split genuine multipolygons. Lastly, we’ll calculate the remaining area and filter to only shapes that have at least 160 sqft of buildable area.

buildable_area <-
  yards_setbacks %>% 
  st_buffer(-3.9) %>% 
  st_buffer(3.9, joinStyle = "BEVEL") %>%
  filter(!st_is_empty(.)) %>% 
  st_cast("MULTIPOLYGON") %>% 
  st_cast("POLYGON") %>% 
  mutate(
    buildable_area = st_area(.) %>% as.numeric()
  ) %>% 
  filter(buildable_area >= 160)

leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = yards_setbacks %>% 
      st_transform(4326),
    fillColor = "red",
    color = "white",
    weight = 1,
    fillOpacity = 0.5
  ) %>% 
  addPolygons(
    data = buildable_area %>% 
      st_transform(4326),
    fillColor = "green",
    color = "white",
    weight = 1,
    fillOpacity = 1
  )

This “buildable area” could be useful to calculate a more realistic distribution of possible ADU sizes in a neighborhood, where larger ADUs may be able to house larger families in a naturally affordable way. It may also be useful for families to visualize on a dashboard as they consider building an ADU. Note that some parcels seem to already have small backyard structures identified by OpenStreetMap that are possibly already ADUs (and possibly many more that may have been missed by OSM).