Why do I need this?

Gaining insights from data is not as easy as just looking at it. Discovering something about a community requires some knowledge of how to look at and interpret data. Below, we are going to walk you through some examples where you can use Seek to gain insights.

What can I learn here?

Seek makes it possible to quickly and easily perform the following statistical calculations on community or place-based data. This guide goes over each of them.

For each statistical concept, we’ve provided:

A simple plain-language definition
Real-world examples for understanding communities
Where and how to use the concept in Seek
Links to more in-depth explanations

Mean

The mean (average) describes the center of a set of numbers. It is calculated by taking the sum of all the values and dividing by the total number of values. The mean is also used to find the standard deviation for the dataset.
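As a minimal sketch of the calculation described above, here is the mean worked out in Python. The ages are made-up values for illustration, and this is not how Seek itself is implemented.

```python
from statistics import mean

# Hypothetical ages of five residents (illustrative values only).
ages = [23, 35, 41, 29, 52]

# The mean is the sum of all values divided by the number of values.
mean_by_hand = sum(ages) / len(ages)

print(mean_by_hand)  # 36.0
print(mean(ages))    # the standard library gives the same result
```

Both lines report an average age of 36.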

Real-world examples:

Health: Insurance analysts often calculate the mean age of the individuals they insure so they know the average age of their customers.
Community Development: Planners can calculate the mean age of the individuals who reside in a community so they know the average age of the people they serve.

Finding Mean in Seek:

In the Table view, you can find the mean (average) on a variable you’ve searched by checking the box next to Show Descriptive Statistics. This will add statistics, including mean, to the bottom of the table.

Median

The median is the middle value in an ordered set of numbers. Half of the values in the set are greater than (>) the median, and half of the values are less than (<) the median. If a value is more than three standard deviations away from the median, it is considered an outlier in the dataset.
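To see why the middle value can be more informative than the average, here is a small Python sketch. The incomes are hypothetical, chosen so that one very large value pulls the mean upward.

```python
from statistics import mean, median

# Hypothetical household incomes in one region (illustrative values only).
incomes = [32000, 41000, 45000, 58000, 250000]

# The median is the middle value once the list is sorted.
median_income = median(incomes)

# The mean is pulled upward by the single very high income.
mean_income = mean(incomes)

print(median_income)  # 45000
print(mean_income)    # 85200
```

Two values fall below 45000 and two fall above it, so the median reflects the typical household better than the mean does here.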

Real-world examples:

Development analysts can view the median income in certain regions so that they can be informed of what the typical “middle” salary is.

Finding Median in Seek:

In the Table view, you can find the median on a variable you’ve searched by checking the box next to Show Descriptive Statistics. This will add statistics, including median, to the bottom of the table.

Mode

The mode is the value that appears most frequently in a set of data. Because it occurs most often, it is the value you are most likely to get when sampling from the dataset at random, and it can reveal what is most common across the whole dataset.
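Finding the most frequent value can be sketched in a few lines of Python. The household sizes below are made up for illustration. The sketch uses multimode because a dataset can have more than one most-frequent value.

```python
from collections import Counter
from statistics import multimode

# Hypothetical household sizes in a neighborhood (illustrative values only).
household_sizes = [2, 3, 2, 4, 1, 2, 3]

# Count how often each value appears.
counts = Counter(household_sizes)

# multimode returns every value tied for the highest count.
modes = multimode(household_sizes)

print(counts[2])  # 3
print(modes)      # [2]
```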

Finding Mode in Seek:

In the Table view, you can find the mode on a variable you’ve searched by checking the box next to Show Descriptive Statistics. This will add statistics, including mode, to the bottom of the table.

Standard Deviation

The standard deviation (SD) is the standard (normal) amount that values deviate (differ) from the mean (average). A low SD indicates low variability, so data is more clustered around the mean (center). A high SD indicates higher variability, so data is more spread out across the range. The variability of values can mean many things, so it is important to understand the context of the data to determine what the spread might mean.
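The sketch below compares two hypothetical datasets that share the same mean but have very different spreads. It uses the population standard deviation (pstdev). Whether Seek uses the population or the sample formula is not stated here, so treat that choice as an assumption.

```python
from statistics import mean, pstdev

# Two hypothetical datasets with the same mean (illustrative values only).
clustered = [48, 49, 50, 51, 52]
spread_out = [10, 30, 50, 70, 90]

# Both datasets are centered on the same value.
print(mean(clustered), mean(spread_out))

# Population standard deviation (assumed formula; Seek may use the sample version).
sd_clustered = pstdev(clustered)
sd_spread_out = pstdev(spread_out)

print(round(sd_clustered, 2))   # 1.41
print(round(sd_spread_out, 2))  # 28.28
```

The low SD of the first list shows values sitting close to the mean, while the high SD of the second shows values spread across the range.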

Finding Standard Deviation in Seek:

In the Table view, you can find the standard deviation on a variable you’ve searched by checking the box next to Show Descriptive Statistics. This will add statistics, including standard deviation, to the bottom of the table.

Outliers

Outliers are values in a dataset that differ significantly from the rest. They are found by calculating the differences between the data values and the median. If the difference is more than three standard deviations, it is considered an outlier. An outlier can mean many things, so it is important to understand the context of the data to determine if the difference is meaningful.
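The rule described above (more than three standard deviations from the median) can be sketched in Python as follows. The unemployment rates are made up, the function name find_outliers is hypothetical, and the use of the population standard deviation is an assumption rather than a description of Seek's internals.

```python
from statistics import median, pstdev

def find_outliers(values, threshold=3):
    """Return values more than `threshold` standard deviations from the median."""
    center = median(values)
    sd = pstdev(values)  # assumed formula; a sample SD would also be plausible
    return [v for v in values if abs(v - center) > threshold * sd]

# Hypothetical unemployment rates (percent) for 16 census tracts.
rates = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6, 5, 5, 4, 6, 5, 30]

outliers = find_outliers(rates)
print(outliers)  # [30]
```

Only the tract with a rate of 30 is flagged, which is the cue to look at that tract more closely.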

Real-world examples:

If something stands out as an outlier, it is usually an indication to investigate further. In the example of unemployment rate, if a particular census tract has a very high rate, there is likely something to be reviewed more closely. Perhaps the number of residents of working age is very small, or there might be a transportation issue keeping people from working.

Finding outliers in Seek:

In the Table view, you can find the outliers on a variable you’ve searched by checking the box next to Highlight outliers. This will highlight and identify the outliers on the table.

Regions in Order

Regions are ordered from highest to lowest value along the x-axis. The y-axis shows each region’s value, which highlights the relative difference between the regions.
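The ordering behind this chart can be sketched in Python by sorting regions from highest to lowest value. The zip code labels and counts are hypothetical, and the text bars are only a stand-in for the chart Seek draws.

```python
# Hypothetical counts of qualified census tracts per zip code.
tract_counts = {"ZIP A": 3, "ZIP B": 7, "ZIP C": 0, "ZIP D": 5}

# Sort regions from highest to lowest value, as on the chart's x-axis.
ordered = sorted(tract_counts.items(), key=lambda item: item[1], reverse=True)

# Print a simple text bar for each region; bar length stands in for the y-axis.
for region, count in ordered:
    print(f"{region}: {'#' * count} ({count})")
```

The first region printed is the one with the tallest bar, which corresponds to the left-hand side of the chart.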

Real-world examples:

Qualified Census Tracts: This Region chart is a good way to see geographies that have a high concentration of qualified census tracts. For example, choose Qualified Census Tracts as Data and a county with the zip code sub-geography as your region. Hovering over the tallest bars (on the left) shows which zip codes in your selection have the most qualified census tracts.

Finding Regions in Order in Seek:

In the Distribution view, you can find the Regions in Order on a variable you’ve selected. This chart will change if you click the “previous” or “next” buttons at the top to change the variable.

Histogram

A histogram divides the total range (the minimum value to maximum value) into 100 equal intervals along the x-axis. The number of regions that fit within each interval is represented on the y-axis. Histograms can help to visualize the variability within the dataset in a more compact way.
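The binning described above can be sketched in Python as follows. The function name histogram and the sample values are hypothetical, and placing the maximum value in the last interval is an assumption about edge handling rather than a description of Seek.

```python
def histogram(values, bins=100):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lowest, highest = min(values), max(values)
    counts = [0] * bins
    if highest == lowest:
        # All values are identical, so they share the first interval.
        counts[0] = len(values)
        return counts
    width = (highest - lowest) / bins
    for value in values:
        # Assumption: the maximum value is placed in the last interval.
        index = min(int((value - lowest) / width), bins - 1)
        counts[index] += 1
    return counts

# Hypothetical percentages of households with internet access, one per region.
internet_access = [0, 2, 3, 50, 98, 100]

counts = histogram(internet_access)
print(len(counts))  # 100
print(sum(counts))  # 6
```

Each entry in counts is the height of one bar, and the heights always add up to the number of regions.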

Real-world examples:

Digital Equity: When you choose an Internet Access variable, the Histogram shows how many regions have a high or low number of households with internet access. A tall bar (a high number of regions) near 0 means many regions have few connected households, so looking for internet access grants is a good idea for your community.

Finding Histogram in Seek:

In the Distribution view, you can find the Histogram of Regions on a variable you’ve selected by scrolling down. This histogram will change if you click the “previous” or “next” buttons at the top to change the variable.

Correlation Matrix

The correlation matrix helps to illustrate how datasets relate to each other by building a grid (matrix) of the r values for every pair of variables in the data. Correlations measure how closely two datasets relate to each other. The relationship can be positive (as X increases, Y tends to increase) or negative (as X increases, Y tends to decrease). The coefficient (r) is used to express the strength and direction of the relationship. The r value ranges from 1 (perfectly positive correlation) to -1 (perfectly negative correlation). A value of zero means the datasets have no linear correlation.
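As a rough sketch of how such a grid can be built, the Python below computes the Pearson r value for every pair of variables. The variable names and values are hypothetical, and Pearson's formula is assumed here because it is the standard r coefficient. Seek's own calculation may differ in detail.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)

# Hypothetical variables, with one value per region (illustrative values only).
variables = {
    "median_income": [30, 40, 50, 60, 70],
    "internet_access": [52, 58, 71, 79, 90],
    "unemployment": [12, 10, 9, 6, 4],
}

# Build the grid: one r value for every pair of variables.
matrix = {
    a: {b: pearson_r(variables[a], variables[b]) for b in variables}
    for a in variables
}

for name, row in matrix.items():
    print(name, [f"{r:+.2f}" for r in row.values()])
```

Every variable correlates perfectly with itself, so the diagonal of the grid is 1. In this made-up data, income and internet access rise together (positive r), while unemployment falls as income rises (negative r).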

Finding Correlation Matrix in Seek:

In the Relationships view, you will find the correlation matrix for all the variables you have selected across all regions you have selected.

Regression Line

When two datasets are visualized in a scatter plot, if there is a correlation, a regression line can be drawn that best fits the data and visualizes the direction and strength of the relationship.
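A best-fit line is commonly found with the least-squares method, sketched below in Python. The function name fit_line and the data points are hypothetical, and least squares is an assumption about the fitting method rather than a confirmed detail of Seek.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired values for five regions (illustrative values only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```

A positive slope means the line rises from left to right, matching a positive correlation. A negative slope would indicate a negative one.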

Finding Regression Line in Seek:

In the Relationships view, you can find the regression line by picking any colored box inside the correlation matrix.

Normalization

Normalization modifies data values to have a common scale that can make comparisons easier. For example, if you normalize (divide) by the total population you can see how regions relate to each other as percentages of the population instead of trying to compare absolute data values without the context.
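Here is a small Python sketch of dividing an indicator by total population. The tract names, the indicator, and the numbers are all made up for illustration.

```python
# Hypothetical counts per region (illustrative values only).
regions = {
    "Tract A": {"residents_with_degree": 500, "total_population": 2000},
    "Tract B": {"residents_with_degree": 450, "total_population": 900},
}

# Normalize: divide the indicator by the total population of each region.
shares = {
    name: data["residents_with_degree"] / data["total_population"]
    for name, data in regions.items()
}

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Tract A has more residents with a degree in absolute terms (500 versus 450), but after normalizing, Tract B has the higher share (50% versus 25%).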

Real-world examples:

Community Development: Median income per capita. This is median income normalized by (divided by) Total population, and it gives you an indication of how wealthy your constituency is.

Finding Normalization in Seek:

In the Data modal, you can click any ‘Selected’ indicator and add a normalization unit. This will divide the original indicator by that normalizer. You will also likely want to change your format to Percent.