Skip to main content

Formatting Data for Upload

Learn how to format your files for upload, whether or not they contain geographic information.

Jennifer Funk avatar
Written by Jennifer Funk
Updated over a week ago

Format options

Before you upload your own data into mySidewalk, it’s important to make sure your file is clean, clear, and well-structured. Formatting your data correctly upfront will help the platform interpret it accurately, speed up your upload process, and unlock better insights.

To help you out, we’ve created a two-part series that walks through the most common formats:

  • Option 1: Formatting Files with Geographic Information

    • For data tied to multiple places like ZIP codes, counties, or cities.

    • This format is appropriate in most cases.

  • Option 2: Formatting Files without Geographic Information

    • Files might not include geographic information within the file itself if:

      • they're super simple, only containing a few rows, and/or

      • all the data within is for a single location.

Both guides include examples, tips for cleaning up your data, and even prompts you can copy/paste into AI (like ChatGPT or Claude) to help with formatting.

Want to skip some of the prep work?

Our Smart Upload Formatting Guide shows how you can upload your data using our newest AI-supported feature—which accepts more flexible formats and helps you clean and map your data during the upload process.

Accepted file types

In addition to format, you should also select a filetype to upload. CSV is what we recommend.

mySidewalk supports uploading CSV, GeoJSON, SHP, KML, GML, and GDB. But CSVs are what allow you to take full advantage of mySidewalk's features.

Why use CSV? Uploading as a CSV enables you to match the places in your file with the places in our library. That means you can:

  • Create bivariate maps with your data and mySidewalk data

  • Create correlations with your data and mySidewalk data

  • Normalize your data by mySidewalk data like total population, area, and more.


Option 1: Files with Geographic Information

Let’s walk through how to prepare your file so that each row represents a place (like a county), and each column contains a specific variable and time period. You’ll also learn how to ask an AI assistant to help clean things up.

Proper formatting leads to smoother uploads, accurate maps, and clearer insights—getting it right from the start saves time and ensures better results.

🧭 Step 1: One row = one place

Each row in your file should represent a single location (ZIP, city, county, etc.).

Example:

County Name

Jackson County

Clay County

Hint: For files that contain data that all represents one location, you can use this format as long as you keep to the one row = one place rule. But you might find it simpler to use the format for files without geographic information. Do what works best for you.

📅 Step 2: Time belongs in the column name

Don't use a separate “Year” column. Instead, include the year (or other time period) in the column name.

County Name

Grants_2020

Grants_2021

Payroll_2021

Jackson County

3,000,000

3,100,000

2,400,000

Remember to use consistent time formatting. Examples:

  • Grants_2020

  • Grants_Jan_2021

  • Grants_2021_q2

👥 Step 3: Include breakdowns as separate columns

If you have subcategories (e.g., executive vs. non-executive payroll), use clear, separate column names:

County Name

Payroll_Total_2021

Payroll_Exec_2021

Payroll_NonExec_2021

Jackson County

2400000

900000

1500000

🧹 Step 4: Clean before upload

✅ Do:

  • Only one header row

  • Fill empty cells with “0” or “null”

  • Remove symbols like $, %, and commas

🚫 Don’t:

  • Put “Year” in its own column

  • Include extra header rows or totals as separate rows

🤖 Step 5: Ask AI to help you reformat (optional)

If your file is messy or structured differently, you can ask an AI to reformat it.

Try something like:

  • “I have a spreadsheet where each row is a county, and columns are ‘2020 Grants’, ‘2021 Grants’, ‘2021 Payroll’. Can you help me rename the columns to use underscores, like ‘Grants_2020’, and remove % or $ symbols?”

Or:

  • “My file has rows for each combination of county and year, with time in a column. Can you pivot it so each column becomes a variable like ‘Grants_2020’?”

🧪 Bringing it All Together: A Final Example

County

Grants_2020

Payroll_Total_2021

Payroll_Exec_2021

Payroll_NonExec_2021

Jackson County

3000000

2400000

900000

1500000

Now you’re ready to upload!


Option 2: Files without Geographic Information

If your data isn’t tied to multiple geographic areas—but still includes multiple variables and time periods—you should use a wide format.

In this structure, each row represents one subject (like an organization or summary report), and each column represents a specific variable + time combination.

To help interpret your data more easily, include a “Variable Label” column as the first column. This column gives a short, human-friendly name or description of the dataset or subject.

🧭 Step 1: One row = one subject or dataset

Each row represents a single dataset summary or reporting entity. The “Variable Label” column gives a descriptive label for the row.

Example

Variable Label

Grants

Payroll

Summary Totals

2800000

2200000

📅 Step 2: Time belongs in the column name (optional)

If your data file contains a time element, each column header should include both the variable name and time period.

Example

Variable Label

Grants_2019

Grants_2020

Summary Totals

2800000

3000000

➕ Step 3: Add more variables and breakdowns

Each variable and time combo gets its own column. You can expand this structure by adding other variable types (like Payroll), and use consistent naming.

Example

Variable Label

Grants_2019

Grants_2020

Payroll_2019

Payroll_2020

Summary Totals

2800000

3000000

2200000

2400000

Example with Breakdowns

Variable Label

Grants_2020

Payroll_Total_2020

Payroll_Exec_2020

Payroll_NonExec_2020

Summary Totals

3000000

2400000

900000

1500000

🧹 Step 4: Clean before upload

✅ Do:

  • Use a Variable Label column to provide context while you upload

  • Use one row per subject/dataset

  • Fill blanks with 0, null, or none

  • Remove formatting like commas, $, %

🚫 Don’t:

  • Use separate rows for each variable or time period

  • Put the year in a separate column

  • Mix naming styles (e.g. Pay)Exec_2020 vs. Payroll_Exec_2020)

🤖 Step 5: Ask AI to Help You Reformat

If your file is in long format (e.g. one row per variable per year), you can ask the AI assistant to help restructure:

  • “Can you pivot this long-format table into a wide format with one row per dataset, one column per variable-year, and include a variable label at the beginning of each row?”

Now your file is fully prepped for upload into mySidewalk!

❓ Need Additional Help?

Did this answer your question?