Gaining insights from data is not as easy as just looking at it. Discovering something about a community requires some knowledge of how to interpret data. This article provides an overview of statistical concepts you can explore using Seek.

Statistics in Seek's Table View

In the Table view, check the box next to "Show Statistics" to see most statistics at the bottom, or check "Highlight outliers" to view outliers within the table.

Mean

Definition: The mean (average) describes the center of a set of numbers. It is calculated by adding up all the values and dividing by the number of values.
Examples:
- Health: Insurance analysts often calculate the mean age of the individuals they provide insurance for so they can know the average age of their customers.
- Community Development: Planners can calculate the mean age of the individuals who reside in a community so they know the average age of the residents they serve.
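
As a minimal sketch of the calculation, the mean can be worked out in a few lines of Python. The ages below are made-up illustration values, not data from Seek.

```python
from statistics import mean

# Hypothetical ages of five residents (illustration values only).
ages = [22, 35, 41, 58, 64]

# The mean is the sum of all values divided by the number of values.
manual_mean = sum(ages) / len(ages)

print(manual_mean)  # 44.0

# The standard library gives the same result.
library_mean = mean(ages)
```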

Median

Definition: The median is the middle value in a set of numbers ordered low to high, with half of the values above and half below.
Examples:
- Community Development: Analysts can view the median income in a region to learn what the typical “middle” income is.
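
The sketch below shows why the median is often preferred for income: a single very high value pulls the mean up but leaves the median unchanged. The incomes are hypothetical illustration values.

```python
from statistics import mean, median

# Hypothetical household incomes; the last value is unusually high.
incomes = [28_000, 35_000, 41_000, 52_000, 250_000]

median_income = median(incomes)  # middle value of the ordered list
mean_income = mean(incomes)      # pulled upward by the 250,000 value

print(median_income)  # 41000
print(mean_income)    # 81200
```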

Mode

Definition: The mode is the value that appears most frequently in a dataset.
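
A minimal sketch of finding the mode in Python, using hypothetical household sizes. A dataset can have more than one mode, so this example uses `multimode`, which returns every most-frequent value as a list.

```python
from statistics import multimode

# Hypothetical household sizes (illustration values only).
household_sizes = [2, 3, 2, 4, 1, 2, 3]

# multimode returns all values tied for most frequent.
modes = multimode(household_sizes)

print(modes)  # [2]
```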

Standard Deviation

Definition: The standard deviation (SD) measures how much values deviate from the mean. A low SD indicates that values are clustered around the mean (the data values are all similar), while a high SD indicates more spread (the data values differ more).
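
To illustrate, the sketch below compares two hypothetical datasets that share the same mean (50) but differ in spread. It uses the population standard deviation (`pstdev`); that choice is an assumption, and Seek may use the sample version instead.

```python
from statistics import mean, pstdev

# Two hypothetical datasets with the same mean but different spread.
clustered = [48, 49, 50, 51, 52]
spread_out = [10, 30, 50, 70, 90]

low_sd = pstdev(clustered)    # values sit close to the mean
high_sd = pstdev(spread_out)  # values sit far from the mean

print(mean(clustered), round(low_sd, 2))    # 50 1.41
print(mean(spread_out), round(high_sd, 2))  # 50 28.28
```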

Outliers

Definition: Outliers are values that differ significantly from the rest, usually identified as values more than three standard deviations from the mean. An outlier is usually a signal to investigate further.
Examples:
- In the example of unemployment rate (pictured), one census tract has a very high rate. This is worth reviewing more closely; perhaps the number of residents of working age is very small, or residents have limited economic and educational opportunities.
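
A minimal sketch of a three-standard-deviation check, using hypothetical unemployment rates. It measures distance from the mean and uses the population standard deviation; both choices are assumptions, and Seek's exact calculation may differ. With only a handful of values no point can fall three standard deviations out, so the example uses 20.

```python
from statistics import mean, pstdev

# Hypothetical unemployment rates (%) for 20 census tracts.
# The final value is unusually high.
rates = [4.0, 5.0, 6.0] * 6 + [5.0, 40.0]

center = mean(rates)
sd = pstdev(rates)

# Flag any value more than three standard deviations from the center.
outliers = [r for r in rates if abs(r - center) > 3 * sd]

print(outliers)  # [40.0]
```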

Statistics in Seek's Distribution View

Select the data you'd like to explore further using the dropdown at the top of the Distribution view.

Regions in Order

Definition: Regions can be ordered from highest to lowest to show how they compare to each other.
Examples:
- Qualified Census Tracts (pictured): This Region chart is a good way to see geographies that have a high concentration of qualified census tracts.
  - Choose Qualified Census Tracts as Data.
  - Choose a county as your Region with ZIP Codes selected as your subregions.
  - When you hover over the tallest bars (on the left), Seek shows which ZIP Codes have the most qualified census tracts within your selected region.
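
The ordering itself is a simple sort from highest to lowest. A minimal sketch, using hypothetical subregion names and counts rather than real Seek data:

```python
# Hypothetical counts of qualified census tracts per ZIP Code.
qct_counts = {"ZIP A": 3, "ZIP B": 7, "ZIP C": 0, "ZIP D": 5}

# Sort subregions from highest count to lowest.
ordered = sorted(qct_counts.items(), key=lambda item: item[1], reverse=True)

for name, count in ordered:
    print(name, count)
```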

Histogram

Definition: A histogram divides a dataset's range into equal intervals and counts how many values fall into each one, showing how the data values are distributed.
Examples:
- Digital Equity (pictured): When you choose an Internet Access variable, the Histogram shows how many regions have a high or low share of households with internet access.
  - A tall bar (a high number of regions) near 0 means many regions have few households with internet access, so looking for internet access grants may be a good idea for your community.
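
As a rough sketch of what a histogram computes, the code below groups hypothetical internet-access percentages into five equal intervals and prints a text bar for each. The values and the interval width of 20 are illustration choices, not Seek's settings.

```python
from collections import Counter

# Hypothetical values: percent of households with internet access, one per region.
access_pct = [12, 18, 35, 47, 52, 58, 63, 71, 77, 84, 88, 93]

BIN_WIDTH = 20  # five equal intervals covering 0 to 100


def bin_start(value):
    """Return the lower edge of the interval that contains value."""
    # A value of exactly 100 is placed in the last interval (80 to 100).
    return min(int(value // BIN_WIDTH) * BIN_WIDTH, 100 - BIN_WIDTH)


# Count how many regions fall into each interval.
counts = Counter(bin_start(v) for v in access_pct)

for start in range(0, 100, BIN_WIDTH):
    label = f"{start}-{start + BIN_WIDTH}%"
    print(f"{label:<8} {'#' * counts[start]}")
```

Each `#` stands for one region, so the longest rows correspond to the tallest bars in the chart.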

Statistics in Seek's Relationships View

View the full correlation matrix or select the relationship you'd like to explore further by clicking on the square where the two indicators intersect.

Correlation Matrix

Definition: A correlation matrix shows how datasets relate to each other by measuring the strength and direction of their relationships with an "r" value.
Direction: A positive direction or correlation means as one variable's values go up, the other one does too. A negative direction means as one variable's values go up, the other goes down.
Strength: The "r" value can be between -1 and 1. Values near 0 mean there is little or no linear relationship between the two variables. The closer the value is to -1 or 1, the stronger the relationship.
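
A minimal sketch of how an "r" value (the Pearson correlation coefficient) can be computed for one pair of variables. The function name `pearson_r` and all data values are hypothetical, chosen only to show one positive and one negative relationship.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / sqrt(spread_x * spread_y)


# Hypothetical values for five regions.
income = [30, 40, 50, 60, 70]
internet = [55, 62, 70, 81, 90]  # rises as income rises
poverty = [28, 22, 17, 11, 6]    # falls as income rises

r_positive = pearson_r(income, internet)  # close to 1
r_negative = pearson_r(income, poverty)   # close to -1

print(round(r_positive, 2), round(r_negative, 2))
```

A correlation matrix repeats this calculation for every pair of selected indicators.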

Regression Line

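Definition: In a scatter plot, a regression line visualizes the relationship between two correlated datasets. The direction of its slope (upward or downward) matches the sign of the "r" value described above, and the more tightly the points cluster around the line, the stronger the relationship. The slope equals "r" only when both datasets are measured on the same standardized scale.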
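
A minimal sketch of fitting a least-squares regression line, with "r" computed alongside so the two can be compared. In general the slope equals r multiplied by (SD of y / SD of x). The function name `fit_line` and the data are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (slope, intercept, r) for a least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical paired values for five regions.
xs = [1, 2, 3, 4, 5]
ys = [12, 19, 33, 38, 52]

slope, intercept, r = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2), round(r, 2))  # 9.9 1.1 0.99
```

Here the slope is about 9.9 while r is about 0.99: both are positive, but they are different numbers because x and y are on different scales.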

Other

Normalization

Definition: Normalization modifies data to a common scale for easier comparisons, often shown as percentages. It allows you to compare across geographies and geographic scales to see how common something is despite differences in population sizes.
Finding Normalization in Seek: When you have the Data Selection screen open and have picked some data, click on a piece of data in the right panel. Then select what you want to divide the data by.
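
As a minimal sketch of the idea, the code below divides a count by a total to get a percentage for two hypothetical regions. The region names, field names, and numbers are illustration values only.

```python
# Hypothetical counts for two regions of very different size.
regions = {
    "Region A": {"households_without_internet": 500, "total_households": 10_000},
    "Region B": {"households_without_internet": 200, "total_households": 1_000},
}

# Normalize: divide the count by the total and express it as a percentage.
pct_without = {
    name: 100 * data["households_without_internet"] / data["total_households"]
    for name, data in regions.items()
}

for name, pct in pct_without.items():
    print(f"{name}: {pct:.1f}%")
```

Region A has more households without internet in raw numbers (500 versus 200), but after normalization Region B has the higher share (20% versus 5%).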