2.4 Parcel and building geometries

As we’ve seen, parcel-level data can include certain kinds of building information, but it doesn’t include the geometry of the building footprint itself. We’d find actual building geometries useful for certain kinds of analysis, such as estimating available rooftop area for solar panels, or estimating buildable or plantable area in our urban landscapes, i.e. the space between parcel boundaries and existing building boundaries.

In certain leaflet base maps we’ve worked with, we’ve seen building footprints rendered, and we’re probably most used to seeing this in Google Maps, so it would seem to be the case that we can find building footprints publicly available online. We recommend two sources.

First, OpenStreetMap is a Wikipedia-style crowdsourcing platform for geometries generally, and a German organization called Geofabrik makes many components of OSM available for easy download. After clicking a few levels down, you can find a NorCal page with a 700MB file for download with these attributes. It is worth it to personally download this zip file and explore the various polygons available. For our purposes, we’ll load the building shapefiles. You’re encouraged to use saveRDS() to save this in a more retrievable format, like you’ve done for the ACS data dictionary.

library(tidyverse)
library(sf)
library(leaflet)

Use st_read() to load shapefiles, ensuring that the filepath you point to has .dbf, .prj, .shp, and .shx files, all with the same filename as the .shp file.

osm_bldg <- st_read("gis_osm_buildings_a_free_1.shp")

Another option is Microsoft’s AI-generated building footprints, which may be more or less accurate than OSM in different places. Their GitHub repo includes links to download the files as GeoJSONs by state, but California is massive, so a nice option is to use county-level disaggregations done by Patty Frontiera at Berkeley. Let’s download ca_06081_footprints.zip, unzip on our local machine, and read the results. st_as_sf() can be pointed at the WKT field and will know how to convert that to geometries.

microsoft_bldg_smc <- 
  read_csv("ca_06081_footprints.csv") %>% 
  st_as_sf(wkt = "WKT") %>% 
  st_set_crs(4326) %>% 
  rename(geometry = WKT)

Now let’s filter both to just one CBG in East Palo Alto and compare the results. Here we use cb = F to get the most accurate boundary to the CBG as we can.

test_cbg <- 
  block_groups("CA","San Mateo", cb = F, progress_bar = F) %>% 
  filter(GEOID == "060816119003") %>% 
  st_transform(st_crs(osm_bldg))

bldg_osm <-
  osm_bldg[test_cbg, ]

bldg_microsoft <-
  microsoft_bldg_smc[test_cbg, ]
leaflet() %>% 
  addTiles(
    group = "OSM Base Map"
  ) %>% 
  addProviderTiles(
    provider = providers$Esri.WorldImagery,
    group = "Satellite Base Map"
  ) %>% 
  addPolygons(
    data = bldg_osm,
    fill = F,
    weight = 2,
    group = "OSM Building"
  ) %>% 
  addPolygons(
    data = bldg_microsoft,
    fill = F,
    color = "red",
    weight = 2,
    group = "Microsoft Building"
  ) %>% 
  addLayersControl(
    baseGroups = c("OSM Base Map","Satellite Base Map"),
    overlayGroups = c("OSM Building", "Microsoft Building"),
    options = layersControlOptions(collapsed = FALSE)
  )

Note that the blue boundaries match the underlying OpenStreetMap boundary (the default when using addTiles()) exactly. You can toggle to Satellite view and evaluate which building footprint is more accurate on a parcel-by-parcel basis. We’ll use OSM for the rest of this demonstration.

Now, let’s bring in parcel shapes for this CBG as well. Recall that every County will have different locations for finding parcel data, or you won’t be able to find it at all. For San Mateo County, the parcel data is available for download here:

temp <- tempfile()

download.file("https://gis.smcgov.org/GIS_data_download/SAN_MATEO_COUNTY_ACTIVE_PARCELS_GEOJSON.zip", destfile = temp, mode = "wb")

smc_parcels <- st_read(unzip(temp, "SAN_MATEO_COUNTY_ACTIVE_PARCELS_GEOJSON.json"))

unlink(temp)

test_parcels <-
  smc_parcels %>% 
  st_transform(4326) %>% 
  .[test_cbg, ]
leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = test_parcels,
    fillColor = "blue",
    color = "white",
    weight = 1,
    fillOpacity = 0.5
  ) %>% 
  addPolygons(
    data = bldg_osm,
    stroke = F,
    fillColor = "white",
    fillOpacity = 1
  )

Note that if you check st_crs(smc_parcels), it appears that the CRS is an unusual EPSG 2777, part of the California State Plane Coordinate System. If the dataset did not already come knowing its own CRS, it would be difficult to figure out what the coordinates were, but is sometimes part of the job of data wrangling through systematic Google searching, like finding a needle in a haystack.

Also note that as you can see in the map, building footprint and parcel shapes don’t perfectly match – sometimes they overlap. This is to be expected given the resolution of this kind of data, and is a healthy reminder that any geometric operations we perform at this scale are at best for neighborhood-level estimates, and should not be taken to be accurate depictions of any one parcel.

Now, as a demonstration, let’s consider the viability of different parcels for building accessory structures, like accessory dwelling units (ADUs). In California, backyard ADUs are allowed on every residential parcel (if they fit). Note that we have one nonresidential parcel in this CBG, by inspection, which we could choose to remove before proceeding; unlike with San Francisco, we can’t easily publicly access Assessor-Recorder data in San Mateo County to comprehensively make these determinations based on parcel use type. We can use st_difference() to cut away the building footprints from parcels. Unless a parcel is completely erased, we should be left with the same number of parcels, just with shapes that have holes in them.

projection <- "+proj=utm +zone=10 +ellps=GRS80 +datum=NAD83 +units=ft +no_defs"

bldg_osm <- bldg_osm %>% 
  st_transform(projection)

parcels <-
  test_parcels %>% 
  st_transform(projection)

yards <-
  parcels %>% 
  mutate(
    lot_area = st_area(.)
  ) %>% 
  arrange(desc(lot_area)) %>% 
  .[-1, ] %>% 
  st_difference(st_union(bldg_osm))
leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = yards %>% 
      st_transform(4326),
    fillColor = "blue",
    color = "white",
    weight = 1,
    fillOpacity = 0.5
  )

Notes:

  • We’ve transformed to our custom projection in feet so we can perform certain geospatial computations, like calculating area, in a projected coordinate system.
  • st_difference() receives a geometry you want to cut away from the object in the pipeline, and should receive just one merged geometry using st_union().
  • Since we had identified through manual inspection that the largest parcel was non-residential, we arranged by descending order of area and then explicitly removed the first entry, as a quick way of getting rid of that parcel.

Now, since we’re imagining building ADUs, we would need to consider setbacks which may be mandated by local jurisdictions through zoning codes. These include setbacks from the street edge (usually for aesthetics, but can include safety sightlines at intersections), setbacks from side and rear (usually for preventing “nuisance” effects), and setbacks from existing structures (for mitigating fire spread and for physical accessibility). In East Palo Alto, generally development is restricted from being closer to the street than the existing structure, 4ft from side/rear, and 6ft from existing structures.

Let’s first tackle the street edge condition. Parcel by parcel, it’s hard to infer what the “street edge” is, but if we take advantage of the property that parcels can be merged together to form a “block”, that can help us identify the street edge to individual parcels. In the chunk below, we merge “blocks” with st_union(), create a “crust” using a combination of st_buffer() and st_difference(), then cut that crust into individual “pizza slice crusts” for each parcel.

blocks <-
  parcels %>% 
  st_union()

street_edges <-
  blocks %>% 
  st_difference(
    blocks %>% 
      st_buffer(-1)
  ) %>% 
  st_intersection(parcels,.) %>% 
  filter(APN %in% yards$APN)
leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = yards %>% 
      st_transform(4326),
    fillColor = "yellow",
    color = "white",
    weight = 1,
    fillOpacity = 0.5
  ) %>% 
  addPolygons(
    data = street_edges %>% 
      st_transform(4326),
    fillColor = "brown",
    color = "brown",
    weight = 2
  )

Now, for each individual parcel, we can buffer the “crust” as far as needed to intersect with an existing structure. First, we’ll spatial join building centroids to parcels so we can easily identify which buildings are “within” an individual parcel. Then, in a first test, we pick a parcel, identify the buildings in that parcel, then use st_nearest_points() to create a line connecting the two closest points between the crust and the buildings, as visualized on the map:

bldg_join <-
  bldg_osm %>% 
  st_centroid() %>% 
  st_join(parcels) %>% 
  st_drop_geometry() %>% 
  left_join(bldg_osm %>% select(osm_id)) %>% 
  st_as_sf()

temp_street_edge <- street_edges[2,]

temp_apn <- temp_street_edge %>% 
  pull(APN)
    
temp_bldg <- bldg_join %>% 
  filter(APN == temp_apn) %>% 
  st_union()

nearest_distance <- 
  st_nearest_points(temp_street_edge, temp_bldg) %>%
  st_sf()
leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = temp_street_edge %>% 
      st_transform(4326)
  ) %>% 
  addPolygons(
    data = temp_bldg %>% 
      st_transform(4326)
  ) %>% 
  addPolylines(
    data = nearest_distance %>% 
      st_transform(4326),
    color = "red"
  )

st_length() will then get the length of this red line, which can then be used in st_buffer(). We now do this series of steps within a loop:

street_edges_buffered <-
  1:nrow(street_edges) %>% 
  map_dfr(function(x){
    
    temp_apn <- street_edges[x,] %>% 
      pull(APN)
    
    temp_bldg <- bldg_join %>% 
      filter(APN == temp_apn) %>% 
      st_union()
    
    nearest_distance <- st_nearest_points(street_edges[x, ], temp_bldg) %>% 
      st_length()
    
    street_edges[x, ] %>% 
      st_buffer(nearest_distance)
    
  })
leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = yards %>% 
      st_transform(4326),
    fillColor = "yellow",
    color = "white",
    weight = 1,
    fillOpacity = 0.5
  ) %>% 
  addPolygons(
    data = street_edges_buffered %>% 
      st_transform(4326),
    fillColor = "brown",
    color = "brown",
    weight = 2
  )

street_edges_buffered can now be used within st_difference() to cut the yards, removing front yards from consideration (of course, this could be actually used to identify front yards instead, if one wanted to actually encourage some kind of activity in the front yard, like tree planting).

Now, for side/rear property setbacks, since EPA has the same setback for side and rear, it would be relatively straightforward to just buffer inwards from the original parcel shape. If these setbacks were different, we’d have to come up with another creative strategy to distinguish rear from side, which is nontrivial.

Lastly, for building setbacks, we can just buffer outwards from the original building shapes. So let’s perform all three steps together to reach leftover yard area with all setbacks applied:

yards_setbacks <-
  parcels %>% 
  mutate(
    lot_area = st_area(.)
  ) %>% 
  arrange(desc(lot_area)) %>% 
  .[-1, ] %>% 
  st_buffer(-4) %>% 
  st_difference(st_union(bldg_osm %>% st_buffer(6))) %>% 
  st_difference(st_union(street_edges_buffered))
leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = yards_setbacks %>% 
      st_transform(4326),
    fillColor = "blue",
    color = "white",
    weight = 1,
    fillOpacity = 0.5
  )

This set of yard portions is much more limited than we were originally working with. One last thing we can do is consider a “buildable area” for a habitable structure like an ADU, which realistically would need to be at least 8ft wide and 20ft long. A neat trick to get only areas that can fit something 8ft wide is to buffer inwards just shy of half of 8ft, like 3.9ft, and then buffer back outwards the same amount. If a narrow nook or cranny is thinner than 8ft, it will entirely disappear in this process. After the buffering, we can then disaggregate our geometries into individual shapes, since the buffering process might have split one single geometry into two smaller geometries; this is done with st_cast(), first standardizing every row to a “MULTIPOLYGON” and then casting to “POLYGON” which will split genuine multipolygons. Lastly, we’ll calculate the remaining area and filter to only shapes that have at least 160 sqft of buildable area.

buildable_area <-
  yards_setbacks %>% 
  st_buffer(-3.9) %>% 
  st_buffer(3.9, joinStyle = "BEVEL") %>%
  filter(!st_is_empty(.)) %>% 
  st_cast("MULTIPOLYGON") %>% 
  st_cast("POLYGON") %>% 
  mutate(
    buildable_area = st_area(.) %>% as.numeric()
  ) %>% 
  filter(buildable_area >= 160)
leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = yards_setbacks %>% 
      st_transform(4326),
    fillColor = "red",
    color = "white",
    weight = 1,
    fillOpacity = 0.5
  ) %>% 
  addPolygons(
    data = buildable_area %>% 
      st_transform(4326),
    fillColor = "green",
    color = "white",
    weight = 1,
    fillOpacity = 1
  )

This “buildable area” could be useful to calculate a more realistic distribution of possible ADU sizes in a neighborhood, where larger ADUs may be able to house larger families in a naturally affordable way. It may also be useful for families to visualize on a dashboard as they consider building an ADU. Note that some parcels seem to already have small backyard structures identified by OpenStreetMap that are possibly already ADUs (and possibly many more that may have been missed by OSM).