3.6 Assignment 3

Using the datasets we have encountered so far in the curriculum (American Community Survey, CalEnviroScreen, 500 Cities, PG&E), and any other publicly accessible datasets, conduct a series of regression analyses that shed light on an urban topic of interest to you. Given that this may be your first data-driven statistical analysis, treat this exercise as a “walk-before-you-run” experience; you will have opportunities to practice more complicated statistical analyses throughout the rest of this curriculum and in your urban work. Closely follow the guidelines below:

  1. Select one (or at most two) outcomes of interest that can be measured at the CBG, tract, or ZCTA level in the Bay Area. At this scale of analysis, they must be continuous variables (like % of households in poverty). At least one of the outcomes must also be available in the PUMS data at the individual or household level, so that one of your analyses can make use of PUMS data specifically. At the individual or household scale, the outcome can still be a continuous variable (like individual income) or a categorical variable converted to binary (like completing a college degree or not).
  2. Perform an individual or household level multiple regression analysis (with PUMS data for PUMAs in the Bay Area) using one of your outcomes of interest as the dependent variable and at least two independent variables (like language ability or commute time). Choose the independent variables based on a brief open-ended exploration of academic papers, news articles, or personal anecdotes that suggest a statistical relationship may be found between the independent and dependent variables. Compare your findings to those of your reviewed sources (keep in mind that if your geography and/or time period of analysis differs from that of your sources, this may explain differences in findings). Carefully describe your position on whether a causal claim can be made based on the results of your analysis, and feel free to comment on whether your reviewed sources distinguish correlation from causation in the same way. Be sure to apply the PUMS weights properly.
  3. Perform at least one additional multiple regression analysis at the CBG, tract, or ZCTA level in the Bay Area. You may use a similar outcome to the first analysis or a different one. Again, select at least two independent variables. Both independent and dependent variables can now come from datasets other than the ACS, as long as they match the geographic scale of analysis. You are welcome to match this analysis to the first one as closely as possible, so as to explicitly compare the results of an individual vs. ecological scale of analysis, or to make this analysis completely different (in which case you should prepare a second literature review).
  4. To deepen this analysis, you are encouraged to also run the same regressions for a different time period with available data, so as to investigate whether the statistical relationships change over time.
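
As a point of reference for the weighting requirement in guideline 2, here is a minimal sketch of a weighted multiple regression in Python. It runs on synthetic data, not real PUMS records: the column names (`commute`, `english`, `income`), the coefficients used to generate the data, and the helper `weighted_ols` are all hypothetical and chosen only for illustration. The `weight` array stands in for a PUMS person or household weight (such as `PWGTP` or `WGTP`). The sketch produces point estimates only; it does not compute standard errors, which require additional care with survey data.

```python
import numpy as np

def weighted_ols(X, y, w):
    """Weighted least squares (hypothetical helper).

    Finds beta minimizing sum_i w_i * (y_i - [1, x_i] . beta)^2,
    so each record counts in proportion to its survey weight.
    """
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    sw = np.sqrt(w)                            # scale rows by sqrt(weight)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Synthetic stand-in for individual-level records (NOT real PUMS data).
rng = np.random.default_rng(0)
n = 500
commute = rng.uniform(0, 90, n)                 # minutes, made up
english = rng.integers(0, 2, n).astype(float)   # 1 = speaks English well, made up
weight = rng.integers(1, 200, n).astype(float)  # stands in for a survey weight
income = 40000 + 150 * commute + 12000 * english + rng.normal(0, 5000, n)

predictors = np.column_stack([commute, english])
beta = weighted_ols(predictors, income, weight)
for name, value in zip(["intercept", "commute", "english"], beta):
    print(f"{name}: {value:.1f}")
```

Scaling each row by the square root of its weight turns the weighted problem into an ordinary least-squares problem, which is why a standard solver can be reused. With integer weights, the result matches what you would get by repeating each record as many times as its weight indicates.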