# Why do I need this?

Gaining insights from data is not as easy as just looking at it. Discovering something about a community requires some knowledge of how to look at and interpret data. Below, we are going to walk you through some examples where you can use Seek to gain insights.

# What can I learn here?

Seek makes it possible to quickly and easily perform the following statistical calculations on community or place-based data.

For each statistical concept, we’ve provided:

- A simple plain-language definition
- Real-world examples for understanding communities
- Where and how to use the concept in Seek
- Links to more in-depth explanations

**Mean**

The mean (average) is a measure of the center of a set of numbers. It is calculated by taking the sum of all the values and dividing by the total number of values. The mean is also used to find the *standard deviation* for the dataset.
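As a rough illustration of the calculation described above, here is a minimal Python sketch. The `ages` list and the `mean` helper are made up for this example; they are not Seek data or Seek code.

```python
# Hypothetical ages of residents in a small community (illustrative data only).
ages = [23, 35, 41, 52, 29]


def mean(values):
    """Sum all the values, then divide by how many values there are."""
    return sum(values) / len(values)


average_age = mean(ages)
print(average_age)  # 180 / 5 = 36.0
```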

Real-world examples:

Health: Insurance analysts often calculate the mean age of the individuals they insure to understand the typical age of their customers.

Community Development: Planners can calculate the mean age of the individuals who reside in a community to understand the typical age of its residents.

Finding *Mean* in Seek:

In the **Table** view, you can find the mean or average on a variable you’ve searched by checking the box next to **Show Descriptive Statistics**. This will add statistics, including mean, to the bottom of the table.

**Median**

The median is the middle value in an ordered set of numbers. Half of the values in the set are greater than (>) the median, and half of the values are less than (<) the median. If a value is more than three *standard deviations* away from the median, it is considered an outlier in the dataset.
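The sketch below shows one way to see the "middle value" idea in practice, using Python's standard `statistics` module. The income figures are invented for illustration, and one very high value is included on purpose to show that the median moves much less than the mean.

```python
import statistics

# Hypothetical household incomes (illustrative data only).
# The last value is deliberately much larger than the others.
incomes = [32000, 41000, 45000, 52000, 250000]

# The median is the middle value once the numbers are sorted.
median_income = statistics.median(incomes)

# The mean is pulled upward by the single very high income.
mean_income = statistics.mean(incomes)

print("median:", median_income)
print("mean:", mean_income)
```

In this made-up dataset, two incomes fall below the median and two fall above it, while the mean lands well above most of the values.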

Real-world examples:

Community Development: Analysts can view the median income in a region to learn what the typical “middle” income is.

Finding *Median* in Seek:

In the **Table** view, you can find the median on a variable you’ve searched by checking the box next to **Show Descriptive Statistics**. This will add statistics, including median, to the bottom of the table.

**Mode**

The mode is the value that appears most frequently in a set of data. Because it occurs most often, it is the value most likely to be picked when sampling at random from the dataset, and it can reveal important information about the whole dataset.
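To make "appears most frequently" concrete, here is a small sketch that counts how often each value occurs. The household sizes are hypothetical, and this is only one simple way to find a mode. If two values are tied, this sketch reports whichever tied value appears first in the list.

```python
from collections import Counter

# Hypothetical household sizes in a neighborhood (illustrative data only).
household_sizes = [2, 3, 2, 4, 1, 2, 3]

# Count how many times each value appears.
counts = Counter(household_sizes)

# most_common(1) returns a list with the single most frequent (value, count) pair.
mode_value, mode_count = counts.most_common(1)[0]

print(f"mode: {mode_value} (appears {mode_count} times)")
```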

Finding *Mode* in Seek:

In the **Table** view, you can find the mode on a variable you’ve searched by checking the box next to **Show Descriptive Statistics**. This will add statistics, including mode, to the bottom of the table.

**Standard Deviation**

The standard deviation (SD) is the *standard* (normal) amount that values *deviate* (differ) from the *mean* (average). A low SD indicates low variability, so data is more clustered around the *mean* (center). A high SD indicates higher variability, so data is more spread out across the *range*. The variability of values can mean many things, so it is important to understand the context of the data to determine what the spread might mean.
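The following sketch compares two invented datasets that share the same mean but have very different spreads. It assumes the population form of the standard deviation (`statistics.pstdev`); this page does not say whether Seek uses the population or the sample form, so treat that choice as an assumption.

```python
import statistics

# Two hypothetical datasets with the same mean (50) but different spreads.
clustered = [48, 49, 50, 51, 52]
spread_out = [10, 30, 50, 70, 90]

# Assumption: population standard deviation (divides by the number of values).
sd_clustered = statistics.pstdev(clustered)
sd_spread_out = statistics.pstdev(spread_out)

print("clustered SD:", round(sd_clustered, 2))
print("spread-out SD:", round(sd_spread_out, 2))
```

The first dataset has a low SD because its values sit close to the mean. The second has a much higher SD because its values are spread across a wider range.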

Finding *Standard Deviation* in Seek:

In the **Table** view, you can find the standard deviation on a variable you’ve searched by checking the box next to **Show Descriptive Statistics**. This will add statistics, including standard deviation, to the bottom of the table.

**Outliers**

Outliers are values in a dataset that differ significantly from the rest. They are found by calculating the differences between the data values and the *median*. If the difference is more than three *standard deviations*, it is considered an outlier. An outlier can mean many things, so it is important to understand the context of the data to determine if the difference is meaningful.
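The rule described above (a difference from the median of more than three standard deviations) can be sketched as follows. The `find_outliers` helper, its `threshold` parameter, and the unemployment rates are all hypothetical, and the use of the population standard deviation is an assumption. This is not Seek's actual implementation.

```python
import statistics


def find_outliers(values, threshold=3.0):
    """Return values more than `threshold` standard deviations from the median."""
    center = statistics.median(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - center) > threshold * sd]


# Hypothetical unemployment rates (%) for 13 census tracts (illustrative only).
unemployment_rates = [4.1, 3.8, 5.0, 4.5, 4.2, 3.9, 4.7,
                      4.4, 4.0, 4.6, 4.3, 4.8, 28.0]

print(find_outliers(unemployment_rates))
```

In this made-up data, only the tract at 28.0 is flagged. Whether that difference is meaningful still depends on the context of the data.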

Real-world examples:

If something stands out as an outlier, it is usually an indication to investigate further. In the example of unemployment rate, if a particular census tract has a very high rate, there is likely something to review more closely. Perhaps the number of residents of working age is very small, or there might be a transportation issue keeping people from working.

Finding *outliers* in Seek:

In the **Table** view, you can find the outliers on a variable you’ve searched by checking the box next to **Highlight outliers**. This will highlight and identify the outliers on the table.

**Regions in Order**

Regions are ordered from highest to lowest value along the x-axis. The y-axis shows the value for each region, which highlights the relative difference between the regions.
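The ordering behind this kind of chart can be sketched in a few lines. The zip code labels and counts below are invented, and the text bars are only a stand-in for the chart Seek draws.

```python
# Hypothetical count of qualified census tracts per zip code (illustrative only).
qualified_tracts_by_zip = {"ZIP A": 3, "ZIP B": 7, "ZIP C": 1, "ZIP D": 5}

# Sort regions from the highest value to the lowest value.
ordered = sorted(
    qualified_tracts_by_zip.items(),
    key=lambda item: item[1],
    reverse=True,
)

# Print a simple text bar for each region, tallest first.
for region, value in ordered:
    print(f"{region:<6} {'#' * value} {value}")
```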

Real-world examples:

Qualified Census Tracts: This chart is a good way to see geographies that have a high concentration of qualified census tracts. For example, choose Qualified Census Tracts as Data and a county with the zip code sub-geography as your region. When you hover over the tallest bars (on the left), the chart will show which zip codes in your selection have the most qualified census tracts.

Finding *Regions in Order* in Seek:

In the **Distribution** view, you can find the Regions in Order on a variable you’ve selected. This chart will change if you click the “previous” or “next” buttons at the top to change the variable.

**Histogram**

A histogram divides the total *range* (the *minimum* value to *maximum* value) into 100 equal *intervals* along the x-axis. The number of regions that fit within each interval is represented on the y-axis. Histograms can help to visualize the variability within the dataset in a more compact way.
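Here is a minimal sketch of the binning idea: split the range from minimum to maximum into equal intervals and count how many regions fall in each one. The `histogram_counts` helper and the sample percentages are hypothetical. The default of 100 intervals follows the description above, and the example call uses 5 intervals only to keep the output short.

```python
def histogram_counts(values, bin_count=100):
    """Count how many values fall into each of `bin_count` equal intervals."""
    low, high = min(values), max(values)
    counts = [0] * bin_count
    if high == low:
        counts[0] = len(values)  # all values identical: one interval holds them all
        return counts
    width = (high - low) / bin_count
    for v in values:
        # The maximum value would land one past the end, so clamp it to the last interval.
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    return counts


# Hypothetical percent of households with internet access, per region.
internet_access_pct = [0, 2, 5, 40, 55, 60, 85, 90, 100]

print(histogram_counts(internet_access_pct, bin_count=5))
```

Each number in the result is the height of one bar: the count of regions whose value falls in that interval.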

Real-world examples:

Digital Equity: When you choose an Internet Access variable, the Histogram will show you how many regions have a high or low number of households with internet access. If you have a tall bar (a high number of regions) near 0, many regions have very little internet access, so looking for internet access grants is a good idea for your community.

Finding *Histogram* in Seek:

In the **Distribution** view, you can find the Histogram of Regions on a variable you’ve selected by scrolling down. This histogram will change if you click the “previous” or “next” buttons at the top to change the variable.

**Correlation Matrix**

The correlation matrix helps to illustrate how datasets relate to each other by building a grid (matrix) of the *r values* for every pair of variables in the data. *Correlations* measure how closely two datasets relate to each other. The relationship can be positive (if *X* increases, *Y* also tends to increase) or negative (if *X* increases, *Y* tends to decrease). The coefficient (*r*) is used to express the strength and direction of the relationship. The *r value* ranges from 1 (perfectly positive correlation) to -1 (perfectly negative correlation). A value of zero means the datasets do not correlate.
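The grid of *r values* described above can be sketched as follows. This example assumes *r* is the Pearson correlation coefficient, which is the usual meaning of *r* but is not confirmed on this page. The `pearson_r` helper, the variable names, and the numbers are all hypothetical, and the helper expects each variable to contain at least two different values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical values for three variables across the same five regions.
variables = {
    "income": [30, 40, 50, 60, 70],
    "internet_access": [55, 65, 75, 85, 95],  # rises with income
    "unemployment": [12, 10, 8, 6, 4],        # falls as income rises
}

# Build the grid: one r value for every pair of variables.
matrix = {
    a: {b: pearson_r(variables[a], variables[b]) for b in variables}
    for a in variables
}

for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

In this made-up data, income and internet access move together exactly (*r* close to 1), while income and unemployment move in opposite directions (*r* close to -1).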

Finding *Correlation Matrix* in Seek:

In the **Relationships** view, you will find the correlation matrix for all the variables you have selected across all regions you have selected.

**Regression Line**

When two datasets are visualized in a scatter plot, if there is a correlation, a regression line can be drawn that best fits the data and visualizes the direction and strength of the relationship.
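One common way to find a best-fit line is ordinary least squares, sketched below. The page does not state which fitting method Seek uses, so treat least squares as an assumption. The `fit_line` helper and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: median income (in $1,000s) and % of households with internet.
median_income = [30, 40, 50, 60, 70]
internet_access = [52, 61, 73, 78, 91]

slope, intercept = fit_line(median_income, internet_access)
print(f"internet_access is roughly {slope:.2f} * income + {intercept:.2f}")
```

A positive slope means the line rises from left to right (a positive relationship), and a negative slope means it falls.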

Finding *Regression Line* in Seek:

In the **Relationships** view, you can find the regression line by picking any colored box inside the correlation matrix.

**Normalization**

Normalization modifies data values to have a common scale that can make comparisons easier. For example, if you *normalize* (divide) by the *total population* you can see how regions relate to each other as percentages of the population instead of trying to compare absolute data values without the context.
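The division described above can be sketched as follows. The tract names, counts, and populations are invented for illustration. The point is that the region with the largest raw count is not necessarily the region with the largest share.

```python
# Hypothetical number of residents under 18 in each tract (illustrative only).
residents_under_18 = {"Tract A": 250, "Tract B": 600, "Tract C": 150}

# Hypothetical total population of each tract.
total_population = {"Tract A": 1000, "Tract B": 4000, "Tract C": 500}

# Normalize: divide each raw count by that tract's total population.
share_under_18 = {
    tract: residents_under_18[tract] / total_population[tract]
    for tract in residents_under_18
}

for tract, share in share_under_18.items():
    # The :.0% format shows the fraction as a whole-number percentage.
    print(f"{tract}: {residents_under_18[tract]} residents, {share:.0%} of population")
```

In this made-up data, Tract B has the most residents under 18 in absolute terms but the smallest share once the counts are divided by total population.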

Real-world examples:

Community Development: Median income per capita. This is median income normalized by (divided by) total population, and it will give you an indication of how wealthy your constituency is.

Finding *Normalization* in Seek:

In the **Data modal**, you can click any ‘Selected’ indicator and add a normalization unit. This will divide the original indicator by that normalizer. You will also likely want to change your **format** to Percent.