mySidewalk’s Time Series data combines data from multiple source years with projections for future years. This dataset makes it easy to compare past data to current data, or to see how places may change in the future. The data covers 1990-2020, so you can make data-based comparisons between past and current values, or look at projections, anywhere within that time frame.
Time Series data can be useful to see:
- How past data compares to current data
- The future projection for data in a given location
- How a particular location has changed or is going to change over the years
To create the Time Series dataset from 1990 to 2020, data from three decennial censuses (1990, 2000, and 2010) and the ACS 2011-2015 were used. One of the challenges of using these datasets together is that the geographies change over time. For example, the smallest unit of geography used in this dataset is the block group. Block group boundaries change over time, so the boundary within which data was collected in 1990 is probably not the same boundary that was used in the 2010 data collection, and a new boundary will probably be used for the 2020 data collection. To make the data comparable, it must be processed so that it all pairs with one set of geographies, in this case the 2010 census geographies. Note: see below for a more technical description of the process used.
Want to try it out for yourself? There are two ways you can apply Time Series data to your analysis: Location Snapshot and Maps.
Get an instant overview of your community from 1990-2020
- From the Home page, select the category Time Series from the drop-down menu at the top of the page
- Click Time Series Snapshot
Visualize how your community has changed, and what it might look like in the next few years on a map
- From the top drop-down Menu, click New Map. Enter your location of interest, then click Start
- To see Time Series data, click Display map by on the top left-side drop-down menu. Click Browse data, then type Time Series to see the available time series datasets
- If you would like to see a bivariate map, click the Bivariate Map toggle to “On” at the bottom of the Display map by box
- Optional: Choose a second corresponding time-series dataset (for example, if your first selection is "Time Series: Total Population 2010," the next selection should be "Time Series: Total Population 2020" to see the change from 2010-2020)
- Optional: Normalize your data with a corresponding selection to see the change over time (for example, if you selected two "Time Series: Population" datasets, you should normalize by Total Population to compare the change over time as a percentage)
- To see the full range of data from 1990-2020, click Charts in the left-side toolbar. Click Add Charts, then select the Time Series category from the drop-down menu
- Select the box to the left for all the charts you would like to add, then click Add Selected Charts
- Optional: Share your map with others to show them how your community has changed, and what it might look like in the future
Let's Get Technical
Interested in learning how the talented mySidewalk team of data technicians collated data from multiple source years and made it comparable? Read below for an overview of the process, as well as a full listing of the data and its sources.
As stated earlier, the geographies upon which we record data change over the years, often via redistricting. The earlier years have been transformed (or "harmonized") to be rendered comparable with data from later years.
To see more about how mySidewalk harmonized the 1990 and 2000 data onto the 2010 geographies, please read about the approach from the NHGIS.
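As a rough illustration of what harmonization involves, the sketch below moves counts from older block groups onto 2010 block groups using a crosswalk of weights. It is not mySidewalk's or NHGIS's actual code. The function name `harmonize`, the block group identifiers, and the weights are all hypothetical, and the sketch assumes a crosswalk that lists, for each source block group, the share of its count that belongs to each 2010 block group.

```python
from collections import defaultdict


def harmonize(source_counts, crosswalk):
    """Reallocate counts from source geographies onto target geographies.

    source_counts: {source_id: count} on the old (e.g. 1990) block groups.
    crosswalk: iterable of (source_id, target_id, weight), where weight is
        the assumed share of the source count assigned to the target unit.
    """
    target_counts = defaultdict(float)
    for source_id, target_id, weight in crosswalk:
        target_counts[target_id] += source_counts.get(source_id, 0.0) * weight
    return dict(target_counts)


# Hypothetical data: 1990 block group "A" is split across 2010 block groups
# "X" and "Y", while "B" falls entirely inside "Y".
counts_1990 = {"A": 1000.0, "B": 400.0}
crosswalk = [("A", "X", 0.75), ("A", "Y", 0.25), ("B", "Y", 1.0)]
harmonized = harmonize(counts_1990, crosswalk)
print(harmonized)
```

Because the weights for each source block group sum to 1 in this example, the total count is the same before and after harmonization, even though the counts on individual geographies differ.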
There are four cases in which an imputation process is used to fill in missing data. Each case and the imputation process used for it are listed below.
- Case 1 - No data is present: A global mean is calculated from all source data. This value is used to fill in the middle year of block groups with no data.
- Case 2 - One data point is present: A mean slope is calculated from the regression slopes of all other block groups. This slope is used to impute all missing years in block groups with only a single value, including those whose single value was imputed in Case 1.
- Case 3 - Two points are present: A line is constructed from the two points.
- Case 4 - Three or more points are present: A line is constructed from linear regression.
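The four cases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production code: the names `fit_line` and `impute` are hypothetical, `global_mean` and `mean_slope` are assumed to have been computed beforehand from the other block groups, and the "middle year" is taken to be the entry at index `len(years) // 2`.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute(years, values, global_mean, mean_slope):
    """Fill the None entries of values for one block group.

    global_mean and mean_slope are assumed inputs, precomputed elsewhere.
    """
    known = [(x, y) for x, y in zip(years, values) if y is not None]
    if not known:
        # Case 1: no data, so seed the middle year with the global mean.
        known = [(years[len(years) // 2], global_mean)]
    if len(known) == 1:
        # Case 2: one point, so draw a line through it using the mean slope.
        (x0, y0), = known
        slope, intercept = mean_slope, y0 - mean_slope * x0
    else:
        # Cases 3 and 4: least squares. With exactly two points this is
        # simply the line through both of them.
        slope, intercept = fit_line(*zip(*known))
    return [y if y is not None else slope * x + intercept
            for x, y in zip(years, values)]


# Hypothetical block group with a missing 2000 value (Case 3).
filled = impute([1990, 2000, 2010], [100.0, None, 300.0],
                global_mean=80.0, mean_slope=2.0)
print(filled)
```

Known values are left untouched; only the missing years are replaced with values read off the fitted line.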
Projected count estimates were computed for 2016, 2018, and 2020 using a modified linear regression over the years: 1990, 2000, 2010, and ACS 2011-2015 (referred to as 2013).
The slope is the standard regression slope, but the line is anchored at the ACS 2011-2015 value rather than at the standard regression intercept, because that value is the last known good value in the series.
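A minimal sketch of this modified regression, assuming the description above is the whole method: the slope comes from ordinary least squares over the four source years, and the projected line is then forced through the last point (2013). The function name `project` and the sample values are hypothetical.

```python
def project(years, values, target_years):
    """Project future values using the least-squares slope of the series,
    anchored at the last known point instead of the regression intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    last_year, last_value = years[-1], values[-1]
    return {t: last_value + slope * (t - last_year) for t in target_years}


# Hypothetical block group counts for 1990, 2000, 2010, and ACS 2011-2015
# (treated as 2013).
source_years = [1990, 2000, 2010, 2013]
counts = [1000.0, 1200.0, 1400.0, 1500.0]
projections = project(source_years, counts, [2016, 2018, 2020])
print(projections)
```

Anchoring at the last point means a projection for 2013 itself would reproduce the ACS 2011-2015 value exactly, which a standard regression line generally would not.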
All source data comes from the block group level and is carried through this process at that level. The final step is to take the estimated block group data and summarize it to higher levels of geography, such as neighborhoods and states, via weighted block-point apportionment (WBPA). This is why our Time Series 2010 values may not exactly match the values from the 2010 Decennial Census (DC2010).
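The general idea behind block-point apportionment can be sketched as follows. This is an assumption about how such a method typically works, not a description of mySidewalk's implementation: each block group's value is split among the block points inside it in proportion to a weight (for example, block population), and each target geography then sums the shares of the points that fall inside it. The function name `apportion`, the identifiers, and the weights are hypothetical, and the assignment of points to target geographies is assumed to have been done already.

```python
from collections import defaultdict


def apportion(block_group_values, block_points):
    """Summarize block group values to target geographies.

    block_group_values: {block_group_id: value}
    block_points: iterable of (block_group_id, target_id, weight), one entry
        per block point, with target_id being the geography that contains it.
    """
    block_points = list(block_points)
    weight_totals = defaultdict(float)
    for block_group_id, _, weight in block_points:
        weight_totals[block_group_id] += weight

    target_values = defaultdict(float)
    for block_group_id, target_id, weight in block_points:
        total = weight_totals[block_group_id]
        if total > 0:  # skip block groups whose points carry no weight
            share = weight / total
            target_values[target_id] += block_group_values[block_group_id] * share
    return dict(target_values)


# Hypothetical data: block group "bg1" straddles two neighborhoods.
values = {"bg1": 900.0, "bg2": 500.0}
points = [("bg1", "north", 200.0), ("bg1", "south", 100.0), ("bg2", "south", 50.0)]
summary = apportion(values, points)
print(summary)
```

When a block group straddles a boundary, the split depends on the weights rather than on a direct count, which is one way a summarized value can differ slightly from a published census total.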
Now that we understand the processing steps that the data is put through, let's take a look at the data we're currently using.
Note: the particular source years and projection years will change over time (and may have changed already by the time you read this), but this article is written with a definite set of years for the clarity that examples provide.
Source: Decennial Census 1990 and 2000
The 1990 and 2000 data were harmonized to the 2010 geographies, so the counts will not match the original counts on the 1990 and 2000 geographies.
Source: Decennial Census 2010 & American Community Survey (ACS) 5-Year Estimates. ACS 2008-2012
The 2010 Decennial Census is the underlying source for the population and housing data, and the map displays the 2010 census geographies.
The 2010 estimates for median income and median home value come from the ACS 2008-2012, because the 2010 Decennial Census did not include those measures.
Source: American Community Survey (ACS) 5-Year Estimates. ACS 2011-2015
These values use the ACS 2011-2015 estimates because the midpoint of that survey period is 2013, and the estimates seem to represent that year best when compared to data from other years.
2016, 2018, & 2020
Source: Estimations based on the prior four source years.
Data for 2016, 2018, and 2020 are estimated from a modified linear regression over the years: 1990, 2000, 2010, and ACS 2011-2015 (referred to as 2013).
The slope of the projected line is from standard linear regression (mainly for efficiency and to prevent overfitting). The offset of the line is from the last point of the series.
Now that you understand the Time Series data, you may want to check out:
- Maps: Learn about how building a customized map on mySidewalk helps you tell a story about a place using data
- Location Snapshot/Comparison: Location Snapshots are customized data reports. You can tailor them to your role type and interests, giving you deeper insight into your community in an instant. You can also compare your location to any other location in two clicks
- Sharable Maps: Share the insights you have found using data by sending your map to others
- Charts: Charts allow you to visualize additional and multi-dimensional data alongside your map
- Annotations: Provide context to a specific area on your map that you want to draw attention to
- Layers: Layers offer an additional level of data visualization to the place you are analyzing