Before you start
mySidewalk allows you to add your own data to be used in visualizations alongside mySidewalk data. We know that your data can compliment the data that mySidewalk curates so we want to make sure you can use it! This can allow you to build a data story that showcases your community in the best way possible.
However, data is particular! We need to clean and format the data so it can be used within mySidewalk. We accept several different file formats but it is important to remember that data in mySidewalk needs a TIME, PLACE and PURPOSE. Over the next five minutes, you will gain an in depth understanding of those three things and from then on, uploading and leveraging data in mySidewalk will be a breeze!
How do I prepare my data for upload
You have to clean and prepare your data before it can be uploaded into mySidewalk. First, clean out any blanks or formatting. We can accept known symbols like $$ and %%. Next, make sure that you only have one header for each column. Feel free to use names formatted like Covid_Total Cases and Covid_Total Vaccines if you have some subcategories you want to leverage. Finally, make sure that your first column represents the geography you want to upload (there are some exceptions, we will talk about those later). We will talk more about geography later as there are several ways to incorporate that!
If you are looking to use a custom set of data from a spreadsheet, .csv, or from your own spatial data files, you can add these to mySidewalk through the Add Layer option. If you have data available in a file that could easily be dropped into GIS, you should have no trouble dragging and dropping it into the upload layer section. Below are some more tips and tricks to make sure you are good to go!
We accept several file formats including .csv, .shp and .gdb among others. Look here for more information about the file type you want to use!
Note: uploads must be smaller than 50MB in size, contain a GIS layer with less than 5,000 features and be one of our supported file formats listed on the upload page.
Geometry / Geography is required to use your data in mySidewalk.
If you are doing a layer upload, make sure you geography reference is in the first column.
If you are uploading a .csv, you have two choices for upload. You may include your own geographic data or use ours! To use your own, you may include a column with latitude and longitude. Make sure to use EPSG: 3857 coordinates for latitude and longitude and name that column x, y or lat, long. To use ours, clean your data first, then go here.
Typically, each upload should contain only one shape/feature. For example, if you have the shape information for several council districts, and want to show mySidewalk data based on those boundaries, you will need to upload each one individually as its own file.
Clean Your Data (required)
Make sure you only have one row of column headers and it’s at the top of your file. Use headers formatted like Covid_Total Cases and Covid_Total Vaccines if you have some subcategories you want to leverage in the names. (read about combining headers here if you are using Excel).
Avoid commas and do not use the exact same name in more than one column.
Remove any unnecessary data. It’s essential to upload only the data you need. A mass upload including data points you don’t plan to use in mySidewalk will cause issues later.
Remove any blank columns or rows and put a value in blank cells (“0”, if applicable, otherwise “null” or “none” will do). Having extra blank rows and/or columns in your file will cause the upload to fail.
NOTE: Think critically about your data! Was there a zero measurement of your indicator, or was a measurement not collected / not available? For example, were there really zero opioid fatalities in your zip code, or was data not collected for your zip code? If there were zero fatalities, put the number zero in your cell. If not, leave that cell blank, or put “NULL” or “None” in that cell instead.
A “total” value needs to be represented as a separate column. Having a spreadsheet full of ZIP codes and an additional row for the overall county total, for example, will cause issues when you try to map the data within the platform. Adding your totals as a column instead of a row will allow you to normalize your data by that total. It’s okay if the cells in that column all contain the same value.
Leveraging Time (optional)
Specifying the time period (year, month, day) in the column name allows the mySidewalk data upload system to recognize it as time period. This will not only lead to a cleaner data selection experience but will also enable you to make a time series chart!
It works best if you put the time period at the beginning or end of a column name (i.e. ‘2010_population’ or ‘population_April_01_2020’)
Be sure to be consistent in how the time period is formatted. Changing between 'Jan 2010' and 'January 2010' will result in the system creating 2 different months
The data upload system currently only recognizes dates between January 1, 1900 and January 1, 2100.
Year - Use a 4 digit year
i.e. 2010, 2020, 2050, 1961
Month - include month and year
The following formats all work to specify the month January 2019.
January_2019 | 2019_January | January 2019 | Jan 2019 | 2019_01 | 01 2019
Day - include the day, month, and year
The following formats all work to specify the day June 30, 2019.
2019_06_30 | 2019_June_30 | 06_30_2019 | June 30 2019
The data upload can also recognize variables, the name for the contained data. Population, households, total 311 calls, total crime, people with health insurance, and people in poverty are all examples of variables.
Word use & spelling matters for the system to recognize the variable as the same thing (i.e. ‘2010_pop’ and ‘2020_population’ will be mapped as two different variables of 'pop' and 'population', respectively)
You can have multiple variables in the same layer (i.e. households, population, area, educational attainment, etc)
By repeating the name 'population' and 'people_in_poverty' with the 3 different years, the system will recognize 2 variables: 'population' and 'people_in_poverty' and then pair each of those with the 3 years: 2010, 2015, and 2020.
Why does this matter? Well, it means in the selection experience, you will be given the option to choose between 2 variables 'population' or 'people_in_poverty' (rather than all 6) and then be able to choose between 3 years. It also means you will be able to view each variable in a time series chart.
Using mySidewalk Geographies
You may assign your data to the geographic information that mySidewalk already maintains. We call this georeferencing and there are a couple of options to make this happen.
Manually assign a single geographic location. This is when an entire list of indicators only applies to a single location. Think about infectious cases over time, you can make your rows the times and the columns can be things like “cases” and “tests”. When you choose to georeference to a single geography, you can assign all of that to 1 place (like a county or metropolitan area). Keep in mind that the time will append itself to the column headers. Learn more here.
Manually assign multiple geographic locations. This is used when you have multiple locations (they should be your rows!) and the same set of indicators for each location. Make sure not to duplicate any rows when you do this, the indicators should be the columns, not multiple rows. Learn more here.
Automatically assign using SHID values. A SHID value is something that mySidewalk uses to uniquely identify a specific geographic location. For example, there will be one and only one SHID for each county in the US. You can export the SHID values from Seek and put them as the row headers in your data spreadsheet (put SHID in the top row for this column). mySidewalk will automatically identify and assign the geographies you include in this scenario. You can still change them though, if a mistake occurred! Learn more here.
Success! What Now?
A successful upload will result in a layer being created and show you the layer management page. Good for you! You made it happen!
On this page, you can: update the name of your file, add tags, add a description and change the label of your file. It will also reveal if you have uploaded time in a way the system can recognize.
Pro Tip: choose a field to assign as the label – geography often makes the most sense
Let’s talk a little bit about tags. They are a feature that can help you search for your data in the future. Some people choose to use tags to label categories, for others, it makes more sense to use time or even geography. However you choose to use tags, make sure you use a system that makes sense to you! After you make one, the tag will appear in the top left corner of every screen where you can choose your own data. This is a great way to get to exactly what you need, fast!
From here, you can click “New Report” to start building visualizations or choose “Reports” from the menu and choose an existing report. The layer you built will be accessible from components if you choose “Your Data”.
Bonus! You get a Gold Star!
There are a handful of SUPER COOL things that you can do when you start to get comfortable with uploading your own data. When you leverage the georeferencing tool (assigning mySidewalk geographies to your data) you can start to use our data and your own data in the same visualizations. Correlations and bivariate maps are two easy ways to see your own data with ours! Make sure you have georeferenced your data, then select a correlation for user data. Select a layer and then when you choose your second data set, choose mySidewalk to leverage all the data values mySidewalk provides! You can do the same thing with bivariate maps.
If you uploaded your own geography using a geoJSON or other spatial file format, you can use that geographic boundary for more than just showing your own data. That boundary is now available as a ‘Custom Boundary’ and can be used to apportion mySidewalk data to that shape. Make sure to only put one feature per upload but then it can be used anywhere you can apportion custom geographies!
Taking the time one step further, you can specify a group name for your years, months, or days using a double underscore "__" separator. The format would be time_period__group_name_for_time_periods__variable
The example above has the group of time periods 'months' specified for the variable 'population' and the group of time period 'years' specified for the variable 'employment_rate'