Data Availability

Why data may appear as NULL or No Data in mySidewalk.

Written by Sarah Byrd
Updated over 5 months ago

We at mySidewalk work hard to build a data library that contains data for your community. However, it is not always possible to build data values for every topic, at every time, for every geography. In these instances, you will see NULL or No Data in a mySidewalk product.

Contents

Common data availability questions

  1. Why does some data appear as No Data or NULL in mySidewalk?

  2. Why is this data available for other cities, but not my city?

  3. Why is data available for Census Tracts, but not neighborhoods?

  4. Why do I see No Data when Seek availability showed 100%?

Key Factors Affecting Data Availability

Data availability is influenced by a combination of 3 things: Geography, Topic, and Time.

  1. 🗺️ Geography. This is your location of interest, which could be a single ZIP Code, a city, or a state. With over 450,000 possible geographies, the availability of data can vary significantly.

  2. 📝 Topic. This refers to the specific data or group of data you are interested in.

  3. 🕰️ Time. When selecting data, you also choose a specific time. The default is the most recent data available for that topic.

Selecting an option for each of these three factors gets you to a data value. For the same reason, the right (or wrong, depending on your perspective) combination of the three results in NULL or No Data.
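As a rough illustration of how the three factors combine, the sketch below treats a data library as a lookup keyed by geography, topic, and time. The `DATA_LIBRARY` contents, the `lookup` and `display` helpers, and all names and values are hypothetical and only stand in for the idea; this is not how mySidewalk stores or serves data.

```python
from typing import Optional

# Hypothetical in-memory library keyed by (geography, topic, time).
# Geography names, topic names, and values are made up for illustration.
DATA_LIBRARY = {
    ("County A", "Median Household Income", 2021): 52000.0,
    ("County A", "Median Household Income", 2022): 54000.0,
    ("County B", "Median Household Income", 2022): 61000.0,
}


def lookup(geography: str, topic: str, time: int) -> Optional[float]:
    """Return the value for one combination, or None when no value exists."""
    return DATA_LIBRARY.get((geography, topic, time))


def display(value: Optional[float]) -> str:
    """Format a value for display, showing None as 'No Data'."""
    return "No Data" if value is None else f"{value:,.0f}"


# Same topic and geography type, but one combination has no value.
print(display(lookup("County A", "Median Household Income", 2022)))
print(display(lookup("County B", "Median Household Income", 2021)))
```

Changing any one of the three selections changes the key, which is why a topic can have a value for one geography or year and No Data for another.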

Reasons Data are NULL or No Data

The following are the most common reasons the combination of a topic + time + geography may be NULL or show as No Data.

Raw published data lacked granularity

A raw data source may publish data only for larger geographies, such as county, state, and nation. When the source does not publish values at the smaller geography you selected, and the values cannot be built from what was published, that geography will show NULL or No Data.

  • Example: If CDC raw data is published only for county, state, and nation, we will not manipulate the data to produce values for a smaller geography such as block group.

Raw data suppressed for specific features

Even when a raw data source publishes data for a geography level, such as county, it may not provide a data value for every county. Every data source has its own rules and protocols for sample size and privacy. Where the sample size falls below the source's minimum threshold, data values are suppressed or replaced with a placeholder. The threshold for what qualifies as “too low” varies from data source to data source. Suppression may be done to protect privacy, or because the small sample size gives little confidence in the resulting values.

In other instances, the values may be known but are so specific (for example, a combination of a particular age, sex, education, poverty status, and disability) that the data could be used to identify an individual. Each data source takes its own approach to obfuscating the data to prevent individual identification. One of the tools in their toolbox is suppression, in which they replace the known value with a placeholder or NULL.

  • Example: 3 of the 88 counties in Ohio had placeholder values in the original data. The mySidewalk team removes placeholder values. Across mySidewalk products, the values for those 3 counties will appear as NULL or No Data.
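The placeholder handling described above can be sketched as follows. The placeholder codes in `PLACEHOLDERS`, the `clean_value` helper, and the county names and values are all hypothetical; each real data source documents its own suppression markers, and this is only a minimal illustration of turning placeholders into NULL.

```python
from typing import Optional

# Hypothetical placeholder codes; real sources each define their own.
PLACEHOLDERS = {"*", "S", "-9999"}


def clean_value(raw: str) -> Optional[float]:
    """Convert a raw published string to a number, or None if it is suppressed."""
    if raw.strip() in PLACEHOLDERS:
        return None
    return float(raw)


# Made-up raw values for three counties.
raw_rows = {"County A": "12.5", "County B": "*", "County C": "8.1"}
cleaned = {county: clean_value(raw) for county, raw in raw_rows.items()}
print(cleaned)
```

Storing `None` instead of the placeholder keeps a code such as `-9999` from being mistaken for a real value in later calculations.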

Raw published data pre-calculated / otherwise manipulated

Data sources sometimes publish only the outputs of a calculation, in formats such as ratios or percentiles. Without access to the input data, it is sometimes impossible to recalculate the values for additional geographies.

  • Example: County-level death rate. The CDC does not provide the raw data inputs needed to calculate the rate for a smaller geography such as Census tract.
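To see why a published rate alone is not enough, consider the sketch below. It builds two made-up scenarios whose tract-level inputs differ but whose county-level rate is identical, so the county rate cannot tell you which tract values are correct. The `rate_per_100k` and `county_rate` helpers and all counts are hypothetical.

```python
def rate_per_100k(deaths: int, population: int) -> float:
    """Deaths per 100,000 residents."""
    return deaths * 100_000 / population


def county_rate(tracts: dict) -> float:
    """County rate computed from per-tract (deaths, population) inputs."""
    deaths = sum(d for d, _ in tracts.values())
    population = sum(p for _, p in tracts.values())
    return rate_per_100k(deaths, population)


# Two made-up scenarios: same county totals, different tract-level inputs.
scenario_a = {"Tract 1": (10, 5000), "Tract 2": (10, 5000)}
scenario_b = {"Tract 1": (18, 5000), "Tract 2": (2, 5000)}

# The published county rate is the same in both scenarios...
print(county_rate(scenario_a), county_rate(scenario_b))
# ...but the Tract 1 rate differs, and only the inputs reveal which is true.
print(rate_per_100k(*scenario_a["Tract 1"]), rate_per_100k(*scenario_b["Tract 1"]))
```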

Data calculated using apportionment

Apportionment is a powerful tool that we use to calculate data values for non-Census geographies (City Councils, Neighborhoods, MPOs) and for geographies that are not published by the raw data provider. It does have its limits.

Data will be NULL or No Data if the target geography (the geography to which you are apportioning data) fails to capture enough block points, or if the captured block points have a zero ratio for the selected apportionment type. There are four block-to-block group apportionment ratio types: people, households, housing units, and properties.
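The two failure conditions above can be sketched with a simplified, ratio-weighted sum for a count-type value. This is an illustration under stated assumptions, not mySidewalk's actual apportionment method: the `BlockPoint` structure, the `apportion` function, and the `min_points` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BlockPoint:
    block_group_value: float  # value published for the parent block group
    ratio: float              # block's share of the block group (e.g. by people)
    in_target: bool           # whether the point falls inside the target geography


def apportion(points: Sequence[BlockPoint], min_points: int = 1) -> Optional[float]:
    """Sum ratio-weighted block group values for block points inside the target."""
    captured = [p for p in points if p.in_target]
    if len(captured) < min_points:
        return None  # target geography captured too few block points
    if sum(p.ratio for p in captured) == 0:
        return None  # captured points carry no weight for this ratio type
    return sum(p.block_group_value * p.ratio for p in captured)


# Made-up example: two captured points from a block group with a value of 1000.
points = [
    BlockPoint(1000.0, 0.25, True),
    BlockPoint(1000.0, 0.5, True),
    BlockPoint(2000.0, 0.1, False),
]
print(apportion(points))
```

A small neighborhood that captures no block points, or only block points with no people when apportioning by people, returns `None` here, which corresponds to NULL or No Data.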

Calculated data failed mySidewalk QA testing

We take a great deal of pride at mySidewalk in providing data you can trust to make decisions for the betterment of your community. To that end, we have an extensive QA process, customized to every data source. Data that fails testing does not get added to the mySidewalk data library. Sometimes the data fails only for specific geographies, in which case only those geographies will be made NULL or No Data.

  • Examples of failed tests: the margin of error on input data is too high, a value in part of the calculation is missing or zero, or the result is an outlier that cannot be verified
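The kinds of checks listed above might look something like the sketch below. The `qa_failures` function, the relative margin-of-error limit, and the plausible range are hypothetical; the actual mySidewalk tests are customized per data source and are not described in this article.

```python
from typing import List, Optional, Sequence, Tuple


def qa_failures(
    estimate: float,
    margin_of_error: float,
    inputs: Sequence[Optional[float]],
    max_relative_moe: float = 0.5,                       # hypothetical threshold
    plausible_range: Tuple[float, float] = (0.0, 100.0),  # hypothetical bounds
) -> List[str]:
    """Return the names of failed checks; an empty list means the value passes."""
    failures = []
    if any(v is None or v == 0 for v in inputs):
        failures.append("missing or zero input")
    if estimate != 0 and margin_of_error / abs(estimate) > max_relative_moe:
        failures.append("margin of error too high")
    low, high = plausible_range
    if not low <= estimate <= high:
        failures.append("unverifiable outlier")
    return failures


# A value with any failure would be stored as NULL for that geography.
print(qa_failures(40.0, 5.0, [200.0, 500.0]))
print(qa_failures(40.0, 30.0, [200.0, 500.0]))
```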

Why do I see No Data when Seek availability showed 100%?

The availability percentage displayed in Seek is rounded, so a value of 99.51798% will appear as 100%. A small share of geographies can therefore still show No Data.
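The effect of rounding can be checked with a short calculation. The sketch assumes rounding to the nearest whole percent, which is consistent with the 99.51798% example but is otherwise an assumption about how Seek displays the figure; the `displayed_availability` helper and the counts are made up.

```python
def displayed_availability(with_data: int, total: int) -> str:
    """Availability as a whole-number percentage (assumed rounding rule)."""
    return f"{round(100 * with_data / total)}%"


# The article's example value rounds up to 100.
print(round(99.51798))

# Made-up counts: 48 of 10,000 geographies have no data, yet the display reads 100%.
print(displayed_availability(9952, 10000))
```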
