Correlations / Scatter Plots

Learn how to make and understand a scatter plot / correlation with both mySidewalk data and your data.

Written by Aliyah Hunter
Updated this week

What is correlation?

At the most basic level, a correlation is simply a relationship or connection between two things. In the statistical sense, correlation is a measure of the extent to which two variables relate to each other.

So how do we measure correlation?

TL;DR: The correlation coefficient, r, measures how strongly your two datasets are related. Below are the ranges of r that mySidewalk treats as strong, moderate, weak, and no correlation.

Strength          Positive r coefficient    Negative r coefficient
Strong            1 to 0.8                  -0.8 to -1
Moderate          0.8 to 0.5                -0.5 to -0.8
Weak              0.5 to 0.3                -0.3 to -0.5
No Correlation    0.3 to 0                  0 to -0.3

Correlation is about the relationship of values from two datasets; it does not refer to a cause and effect. To measure correlation, a scale from 1 to -1 is used. Values near either end of the scale, that is near 1 or near -1, are strongly correlated while values near zero are not correlated. You can find the correlation value for any two datasets.
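If it helps to see the strength bands as logic, here is a minimal Python sketch. The function names are hypothetical and not part of mySidewalk. The table's ranges share their endpoints (0.8 appears in both Strong and Moderate), so this sketch assumes a boundary value belongs to the stronger band; mySidewalk may handle boundaries differently.

```python
def classify_strength(r: float) -> str:
    """Label a correlation coefficient using the strength bands listed above.

    Assumption: a value exactly on a boundary (0.8, 0.5, 0.3) is assigned
    to the stronger band. The article does not say which band it belongs to.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)  # strength depends on distance from zero, not on the sign
    if size >= 0.8:
        return "Strong"
    if size >= 0.5:
        return "Moderate"
    if size >= 0.3:
        return "Weak"
    return "No Correlation"


def describe(r: float) -> str:
    """Combine the strength label with the direction of the relationship."""
    label = classify_strength(r)
    if label == "No Correlation":
        return label
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"


if __name__ == "__main__":
    for value in (0.92, -0.65, 0.35, -0.1):
        print(value, describe(value))
```

The sign of r only tells you the direction of the relationship; the strength label comes from how far r is from zero.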

Technical Explanation: “Pearson product-moment correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far away all these data points are to this line of best fit (i.e., how well the data points fit this new model/line of best fit).”
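For readers who want to see the arithmetic, the Pearson coefficient is the covariance of the two variables divided by the product of their standard deviations. Below is a minimal sketch in plain Python. The function name and the sample numbers are made up for illustration, and this is not a description of how mySidewalk computes r internally.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two data points are required")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Deviations of each point from its variable's mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Sum of products (numerator) and sums of squares (denominator)
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)

    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable is constant")

    return sum_xy / math.sqrt(sum_xx * sum_yy)


if __name__ == "__main__":
    # Made-up example values, one pair per geography
    income = [42, 55, 61, 70, 88]
    home_value = [110, 150, 160, 210, 260]
    print(round(pearson_r(income, home_value), 3))
```

Points that fall exactly on a rising straight line give r = 1, points on a falling straight line give r = -1, and a shapeless cloud of points gives a value near 0.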

Build a Correlation

To add a correlation to your report, click the Correlation icon and choose either mySidewalk Data or Your Data.

You will be asked to select a geography and a level of granularity for your correlation. Use the search option to find a geography, then select one of the options under Display Geography by.

mySidewalk Data

Video of how to make a correlation

The default variables that come up are Median Household Income and Median Home Value. You can change the X and Y variables separately by clicking Change Data in the menu on the right.

Use the search option to find variables. Click the drop down arrow and add data using the blue plus button.

You can also change the geography by clicking Selected Layer at the top of the menu.

In the Data tab, you can change the data, or edit the label and units under More Options. Click Pivot X & Y Axis to swap the axes.

The Style tab allows you to change the colors and add a title, footnote, or accessibility description. You can also set the minimum and maximum values for each axis of your scatter plot from this screen, which can make the pattern in your data easier to see.

A footnote will be automatically generated to state the strength of the correlation between your two variables and provide a brief explanation of what the correlation results mean. As a reminder (see above), correlation values near 1 or -1 are strongly correlated and values near zero are weakly correlated or not correlated. A small scale is also produced in the legend that shows the correlation value (r) and, underneath, the 95% confidence interval. That confidence interval can be a helpful way to understand the margin of error around the correlation value for this set of data.

I like to read it in my head as: I'm 95% confident that the true correlation between these two variables falls in this range.
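This article does not say how mySidewalk calculates the 95% confidence interval. One common textbook approach is the Fisher z-transformation, sketched below as an assumption rather than as mySidewalk's method. The function name is hypothetical, and 1.96 is the usual normal critical value for a 95% interval.

```python
import math


def fisher_confidence_interval(r: float, n: int, z_crit: float = 1.96):
    """Approximate confidence interval for a Pearson r from n data points.

    Assumption: uses the Fisher z-transformation, a standard approximation.
    The article does not state which method mySidewalk uses.
    """
    if n <= 3:
        raise ValueError("need more than 3 data points")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")

    z = math.atanh(r)                   # transform r to the z scale
    std_err = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    low = math.tanh(z - z_crit * std_err)   # transform the bounds back to r
    high = math.tanh(z + z_crit * std_err)
    return low, high


if __name__ == "__main__":
    low, high = fisher_confidence_interval(0.6, 50)
    print(f"r = 0.6, 95% CI roughly {low:.2f} to {high:.2f}")
```

With this approach the interval gets narrower as the number of data points grows, which is why a correlation built from many geographies tends to show a tighter range than one built from only a few.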

Your Data

You can use data that you have georeferenced in correlations in mySidewalk. This allows you to make a correlation using both mySidewalk data and your own data. First, you need to georeference your data during upload. You can learn how to do that here (or use a tutorial!).

When you georeference your data, you are agreeing to use one of our pre-loaded shapes to define your data area. Once we both agree on the shape of the data, we can start to compare data with confidence that we are talking about similar sets of things.
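To see why agreeing on shapes matters, here is a rough Python sketch of the idea: two datasets can only be compared point by point when their values are tied to the same geographies. The geography IDs, values, and function name below are all made up for illustration and do not reflect mySidewalk's internal data model.

```python
def pair_by_geography(x_by_geo: dict, y_by_geo: dict):
    """Keep only the geographies present in both datasets, in matching order."""
    shared = sorted(set(x_by_geo) & set(y_by_geo))
    xs = [x_by_geo[geo] for geo in shared]
    ys = [y_by_geo[geo] for geo in shared]
    return shared, xs, ys


if __name__ == "__main__":
    # Hypothetical values keyed by hypothetical geography IDs
    platform_data = {"tract_a": 52000, "tract_b": 61000, "tract_c": 48000}
    uploaded_data = {"tract_b": 14, "tract_c": 22, "tract_d": 9}

    geos, xs, ys = pair_by_geography(platform_data, uploaded_data)
    # Only tract_b and tract_c appear in both, so only they can be plotted
    print(geos, xs, ys)
```

Each shared geography becomes one point on the scatter plot, with one dataset supplying the X value and the other supplying the Y value.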

If you want to use your own data for either axis, you must click the Correlation icon and then choose Your Data.

Then select a variable to start with. Only georeferenced data will appear in this list.

The default variables that come up are Median Household Income and Median Home Value. You can change the X and Y variables separately by clicking Change Data in the menu on the right. When you have started with Your Data, you will have a choice to pick your own data or mySidewalk data.

In the Data tab, you can change the data, or edit the label and units under More Options. Click Pivot X & Y Axis to swap the axes.

The Style tab allows you to change the colors and add a title, subtitle, footnote, or accessibility description. You can also set the minimum and maximum values for each axis of your scatter plot from this screen, which can make the pattern in your data easier to see.

A footnote will be automatically generated to state the strength of the correlation between your two variables and provide a brief explanation of what the correlation results mean. A small scale is also produced in the legend that shows the correlation value (r) and, underneath, the 95% confidence interval. That confidence interval can be a helpful way to understand the margin of error around the correlation value for this set of data.

I like to read it in my head as: I'm 95% confident that the true correlation between these two variables falls in this range.
