The mySidewalk data library includes a rich collection of data, including US Census data spanning more than thirty years. We used published Census data to produce projected values for over 100 indicators. These projected values are available for all mySidewalk geographies and can be apportioned to your custom boundaries.

Published Census Data

All projections are generated from data originating from the following five Census products:

  • Decennial Census 1990

  • Decennial Census 2000

  • Decennial Census 2010

  • American Community Survey (ACS) 2007-2011 5-Year Estimates (used as a proxy for concepts not available in the short-form Decennial Census 2010)

  • American Community Survey (ACS) 2015-2019 5-Year Estimates

Preparing the Data

Before projections could be produced, the historical Census data went through three supporting procedures.

  1. Geographic harmonization was used to 'walk' the data from Decennial Census 1990 and Decennial Census 2000 block group boundaries to the Decennial Census 2010 block group boundaries. Each Decennial Census draws a new set of boundaries, so the historical data had to be harmonized to the current 2010 boundaries.

  2. The data was then apportioned from block groups to the other mySidewalk boundaries using weighted block-to-block-group apportionment. In post-processing, the values for states and the nation were overwritten with the values taken directly from the respective Decennial Censuses (1990, 2000, and 2010).

  3. We also used the Amelia II software provided by Gary King et al. at Harvard University (see citation below) to impute missing values in the data. The missing values were largely a result of harmonizing the block group level data to the 2010 boundaries. Imputation uses the data available in neighboring boundaries to estimate and then replace the missing values.
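
To illustrate the apportionment in step 2, here is a minimal sketch for a count variable such as total population. It assumes each block carries a weight (for example, its population) and falls entirely inside one target boundary. The function name, the input layout, and the choice of weight are illustrative assumptions, not mySidewalk's actual implementation.

```python
from collections import defaultdict


def apportion_counts(block_group_values, blocks):
    """Split block group counts across target boundaries by block weight.

    block_group_values: dict of block_group_id -> count for that block group
    blocks: list of (block_group_id, target_id, weight) tuples, one per block
            (hypothetical layout; weight could be block population)
    """
    # Total block weight inside each block group.
    weight_totals = defaultdict(float)
    for block_group_id, _target_id, weight in blocks:
        weight_totals[block_group_id] += weight

    # Each block receives its share of the block group's count,
    # and that share is added to the target boundary containing the block.
    apportioned = defaultdict(float)
    for block_group_id, target_id, weight in blocks:
        total = weight_totals[block_group_id]
        if total > 0:
            share = weight / total
            apportioned[target_id] += block_group_values[block_group_id] * share
    return dict(apportioned)
```

This form only suits counts. Rates, medians, and percentages would need their numerators and denominators apportioned separately.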

Projection Methodology

  • Projections for over one hundred Census concepts were produced for 2019, 2021, 2023, 2025, and 2027 using a modified linear regression over four data points: 1990, 2000, 2010, and ACS 2015-2019 (assigned its midpoint year, 2017).

  • The slope is the standard regression slope (for efficiency and to prevent overfitting). The ACS 2015-2019 value is always used as the offset in place of the standard regression intercept, because it is the last known good value in the series.

  • As a post-processing step, values for the 2019 projection were overwritten with ACS 2019 1-Year data wherever values are available for a given boundary.
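
The projection steps above can be sketched as follows. This is a minimal illustration under stated assumptions: it reads "standard slope" as the ordinary least squares slope over the four unweighted points and anchors the line at the 2017 value. The function and parameter names are hypothetical, and the production method may differ in details not described here.

```python
def project_indicator(history, target_years, anchor_year=2017, acs_1yr_2019=None):
    """Project one indicator for one boundary.

    history: dict of year -> value; must contain anchor_year
             (e.g. 1990, 2000, 2010, and 2017 for ACS 2015-2019)
    target_years: years to project (e.g. 2019, 2021, 2023, 2025, 2027)
    acs_1yr_2019: optional published ACS 2019 1-Year value (assumed input)
    """
    years = sorted(history)
    n = len(years)
    mean_year = sum(years) / n
    mean_value = sum(history[y] for y in years) / n

    # Ordinary least squares slope over the historical points.
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (history[y] - mean_value) for y in years)
    slope = sxy / sxx

    # Use the anchor value as the offset instead of the fitted intercept,
    # so the line passes through the last known good value.
    anchor_value = history[anchor_year]
    projected = {
        year: anchor_value + slope * (year - anchor_year)
        for year in target_years
    }

    # Post-processing: prefer the published 2019 value when one exists.
    if acs_1yr_2019 is not None and 2019 in projected:
        projected[2019] = acs_1yr_2019
    return projected
```

Anchoring at the ACS 2015-2019 value means the projected line always passes through the most recent observation, even when the fitted regression line would not.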

Citations:
James Honaker, Gary King, and Matthew Blackwell. 2011. “Amelia II: A Program for Missing Data.” Journal of Statistical Software, 45, 7, Pp. 1-47. Copy at http://j.mp/2owkPrr
