At Algorex Health, we make a lot of maps and get a lot of questions about the tools we use to make them. So, I thought I would briefly describe our process and the tools we use. I covered the basics of geographic charting in a past blog post and some of that terminology will be repeated here.
We use to two primary data visualization systems at Algorex Health both of which were chosen for their support withthin the Python/Jupyter stack, ease of use, and/or aesthetic flexibility:
- Altair: Altair is a python library that generates schemas that use Vega and Vegalite specification for visualization, those in turn rely on D3 for robust charting in a variety of formats. Altair becuase it relies on Vega-lite has a broad library of chart types including maps however Altair is all about the data. Every element of a map must be expressed in data and there is no base layer or tile layer. The tile layer is the part of a map you probably most recognize. It is the streets, borders, lakes, oceans that are automatically placed behind your map elements from a service like google maps. Altair maps don't support having a tile layer so when that is needed for interactivity we move on to Folium.
- Folium: Folium is a python api that generates maps using the Leaflet javascript library which is one of the defacto standards for web based interactive maps. In rare cases, we write leafletJS directly but generally Folium supports everything we need.
For this post, we will utilize Folium. I think the difference between most of our maps and the ones our clients are creating are in the rich use of layers. Tableau and other GUI applications make great maps but their geographies are predefined and it can be difficult to layer data from multiple types of shapes. Folium, which visualizes maps only provides the most flexibility and interactivity.
In the example here we will layer the Median Income by census tract for the Boston area and transit stations on the MBTA. This will layer two types of data: Median income by shape, and transit stations which are specific points.
Loading the data¶
There are three data files we need for this blog post all freely accessible:
- Locations and Shapes for the MBTA Mass Transit system which is available from the MassGIS agency.
- Census Tract shapes, available from the US Census.
- The median income data also available from the US Census.
We will load these data files using Pandas and GeoPandas to look at the data.
import pandas as pd
import geopandas as gpd
mbta_stations = gpd.read_file("data/mbta/MBTA_NODE.shp")
census_tracts = gpd.read_file("data/census_tracts_ma/tl_2018_25_tract.shp")
median_income_data = pd.read_csv("data/census_tracts_ma/ACS_17_5YR_S1903.csv", dtype={'GEO.id2':str}).rename(columns={
'HC03_EST_VC02':'median_income', 'GEO.id2':'GEOID'
})
The MBTA data has a point geometry for each MBTA station.
mbta_stations.head()
Those values don't look like latitude and longitudes. We need these geometries to be standardized if we are going to layer everything together. We fix this by changing the CRS of the file which will convert the geometry to a standard value.
crs = {'init' :'epsg:4326'}
mbta_stations = mbta_stations.to_crs(crs)
mbta_stations.head()
Much better now let's look at census tracts which we will convert as well just to be safe.
census_tracts = census_tracts.to_crs(crs)
census_tracts.head()
Lastly, here is the median income data. The median income for each census tract is on column 7 which we named median_income
.
median_income_data.head()
We need to merge the median income data with the tract shapes before we proceed which uses a little pandas magic. The GEOID is the common field between them. We will also select out only the columns we need from the median income data.
columns = ['median_income','GEOID']
census_tracts = census_tracts.join(median_income_data[columns],rsuffix='ic')
Make the Map¶
For the first map, we will use Altair. This map won't have base tiles but we will be able layer each element. The first layer will be a choropleth of median income followed by a layer of each MBTA station. Altair expects JSON data so there is some data wrangling before the map is drawn.
Filter Criteria¶
Because we do this all the time, it is helpful to filter out some of the data right now. This comes from having some hard lessons.
- Restrict to only the counties you need. The MBTA mass transit lines are only operating in three counties so we can restrict to just those.
- Remove census geographies with a lot of water. The field
AWATER
has the area in square meters of the census geography that is made of water whileALAND
is the area of land. Anything with more water than land we are going to filter out.
filtered_census_tracts = census_tracts[(census_tracts.COUNTYFP.isin(('021','025','017'))) &
(census_tracts.AWATER/census_tracts.ALAND < .55)]
To initiate a folium map, we will first set a few parameters.
import folium
import branca.colormap as cm
folium_parameters = { 'location':[42.3601, -71.0589],
'zoom_start':11,
'width':300,
'height':300}
boston_map = folium.Map(**folium_parameters)
boston_map
Now we will begin to add layers starting with the census tracts. Because these are colored shapes, we will use the folium GeoJson
class. This will draw one shape for every feature in the data. We will also style each feature using a linear color map so that the colors will be assigned based on the median income of the census tract. The colormap functions are supplied by the branca
companion package.
color_scale= cm.linear.PuRd_07.scale(filtered_census_tracts.median_income.min(), 175000)
def tract_styles(feature):
return {
'fillColor': color_scale(feature['properties']['median_income']),
'color': 'grey',
'weight': .5,
'dashArray': '5, 5'
,'fillOpacity':.8
}
choropleth = folium.GeoJson(filtered_census_tracts.dropna().to_json(), style_function=tract_styles)
choropleth.add_to(boston_map)
Adding the stations works a little differently. We could use the same GeoJson
class above but the result would be that the stations would be visualized as markers which looks a little funny.
marker_only_map = folium.Map(**folium_parameters)
folium.GeoJson(mbta_stations.to_json()).add_to(marker_only_map)
marker_only_map
Because we want to control the points drawn and make them circles not map markers, we will instead add each one to the map individually calling a style function that will assign it's color, based on its MBTA line.
def station_style(station):
def station_colors(line):
station_colors = {'ORANGE':'orange','BLUE':'blue','RED':'red','SILVER':'silver','GREEN':'green'}
return station_colors.get(line,'grey')
return folium.CircleMarker(
radius=2,
location=(station.geometry.y, station.geometry.x),
color=station_colors(station.LINE),
fill=True,
fill_opacity=.8)
stations = folium.FeatureGroup()
for station in mbta_stations.itertuples():
station_style(station).add_to(stations)
stations.add_to(boston_map)
The other note is that the base layer, which defaults to OpenStreetMaps, is a little crowded. We don't need to show how someone navigates in Boston, we just need the minimal geography to give the viewer their bearings. To do this, we will change the default map tiles. Folium and Leaflet support a web standard for fetching tiles but you can also use any of the freely available tile styles shown in the leaflet documentation.
For this chart a basic gray style works best.
boston_map.add_tile_layer(**{'tiles':'https://server.arcgisonline.com/ArcGIS/rest/services/Canvas/World_Light_Gray_Base/MapServer/tile/{z}/{y}/{x}',
'attr':'Tiles © Esri — Esri, DeLorme, NAVTEQ'})
boston_map.width = 500
boston_map
This blog bost originally appeared at Algorex Health