At Algorex Health, we make a lot of maps and get a lot of questions about the tools we use to make them. So, I thought I would briefly describe our process and the tools we use. I covered the basics of geographic charting in a past blog post and some of that terminology will be repeated here.

We use to two primary data visualization systems at Algorex Health both of which were chosen for their support withthin the Python/Jupyter stack, ease of use, and/or aesthetic flexibility:

Altair: Altair is a python library that generates schemas that use Vega and Vegalite specification for visualization, those in turn rely on D3 for robust charting in a variety of formats. Altair becuase it relies on Vega-lite has a broad library of chart types including maps however Altair is all about the data. Every element of a map must be expressed in data and there is no base layer or tile layer. The tile layer is the part of a map you probably most recognize. It is the streets, borders, lakes, oceans that are automatically placed behind your map elements from a service like google maps. Altair maps don't support having a tile layer so when that is needed for interactivity we move on to Folium.
Folium: Folium is a python api that generates maps using the Leaflet javascript library which is one of the defacto standards for web based interactive maps. In rare cases, we write leafletJS directly but generally Folium supports everything we need.

For this post, we will utilize Folium. I think the difference between most of our maps and the ones our clients are creating are in the rich use of layers. Tableau and other GUI applications make great maps but their geographies are predefined and it can be difficult to layer data from multiple types of shapes. Folium, which visualizes maps only provides the most flexibility and interactivity.

In the example here we will layer the Median Income by census tract for the Boston area and transit stations on the MBTA. This will layer two types of data: Median income by shape, and transit stations which are specific points.

Loading the data¶

There are three data files we need for this blog post all freely accessible:

Locations and Shapes for the MBTA Mass Transit system which is available from the MassGIS agency.
Census Tract shapes, available from the US Census.
The median income data also available from the US Census.

We will load these data files using Pandas and GeoPandas to look at the data.

In [1]:

import pandas as pd
import geopandas as gpd

mbta_stations = gpd.read_file("data/mbta/MBTA_NODE.shp")
census_tracts = gpd.read_file("data/census_tracts_ma/tl_2018_25_tract.shp")
median_income_data = pd.read_csv("data/census_tracts_ma/ACS_17_5YR_S1903.csv", dtype={'GEO.id2':str}).rename(columns={
    'HC03_EST_VC02':'median_income', 'GEO.id2':'GEOID'
})

The MBTA data has a point geometry for each MBTA station.

In [2]:

mbta_stations.head()

Out[2]:

	STATION	LINE	TERMINUS	ROUTE	geometry
0	Ashmont	RED	Y	A - Ashmont C - Alewife	POINT (236007.5383913815 892693.0225654766)
1	Harvard	RED	N	A - Ashmont B - Braintree C - Alewife	POINT (231387.2739390731 902684.0159827843)
2	Kendall/MIT	RED	N	A - Ashmont B - Braintree C - Alewife	POINT (234087.917269595 901406.5508935936)
3	Capen Street	RED	N	Mattapan Trolley	POINT (234055.4375255629 890869.3748197779)
4	Tufts Medical Center	ORANGE	N	Forest Hills to Oak Grove	POINT (235900.3235432804 899934.3133242205)

Those values don't look like latitude and longitudes. We need these geometries to be standardized if we are going to layer everything together. We fix this by changing the CRS of the file which will convert the geometry to a standard value.

In [3]:

crs = {'init' :'epsg:4326'}
mbta_stations = mbta_stations.to_crs(crs)
mbta_stations.head()

Out[3]:

	STATION	LINE	TERMINUS	ROUTE	geometry
0	Ashmont	RED	Y	A - Ashmont C - Alewife	POINT (-71.06343014439 42.28388354622533)
1	Harvard	RED	N	A - Ashmont B - Braintree C - Alewife	POINT (-71.11890607237824 42.37402923068517)
2	Kendall/MIT	RED	N	A - Ashmont B - Braintree C - Alewife	POINT (-71.08619149399846 42.36241534488428)
3	Capen Street	RED	N	Mattapan Trolley	POINT (-71.08720482389784 42.2675530566276)
4	Tufts Medical Center	ORANGE	N	Forest Hills to Oak Grove	POINT (-71.06428155077408 42.34908000707444)

Much better now let's look at census tracts which we will convert as well just to be safe.

In [4]:

census_tracts = census_tracts.to_crs(crs)
census_tracts.head()

Out[4]:

	STATEFP	COUNTYFP	TRACTCE	GEOID	NAME	NAMELSAD	MTFCC	FUNCSTAT	ALAND	AWATER	INTPTLAT	INTPTLON	geometry
0	25	027	724100	25027724100	7241	Census Tract 7241	G5020	S	53031300	1639215	+42.2566908	-072.1581690	POLYGON ((-72.21781799999999 42.27018, -72.217...
1	25	027	759100	25027759100	7591	Census Tract 7591	G5020	S	25570221	1427830	+42.2096822	-072.0401777	POLYGON ((-72.07887699999999 42.214751, -72.07...
2	25	023	510503	25023510503	5105.03	Census Tract 5105.03	G5020	S	2561883	3203	+42.0942619	-071.0655746	POLYGON ((-71.080483 42.095539, -71.0801759999...
3	25	005	652000	25005652000	6520	Census Tract 6520	G5020	S	439925	0	+41.6266977	-070.9278159	POLYGON ((-70.931105 41.63193, -70.929509 41.6...
4	25	023	510600	25023510600	5106	Census Tract 5106	G5020	S	5303844	49230	+42.0805604	-071.0602559	POLYGON ((-71.07875799999999 42.088274, -71.07...

Lastly, here is the median income data. The median income for each census tract is on column 7 which we named median_income.

In [5]:

median_income_data.head()

Out[5]:

	GEO.id	GEOID	GEO.display-label	HC01_EST_VC02	HC01_MOE_VC02	HC02_EST_VC02	HC02_MOE_VC02	median_income	HC03_MOE_VC02	HC01_EST_VC04	...	HC02_EST_VC52	HC02_MOE_VC52	HC03_EST_VC52	HC03_MOE_VC52	HC01_EST_VC53	HC01_MOE_VC53	HC02_EST_VC53	HC02_MOE_VC53	HC03_EST_VC53	HC03_MOE_VC53
0	1400000US25001010100	25001010100	Census Tract 101, Barnstable County, Massachus...	1772	135	1772	135	47500.0	9839.0	1635	...	41.3	8.6	31595.0	9777.0	113	50	10.3	4.5	NaN	NaN
1	1400000US25001010206	25001010206	Census Tract 102.06, Barnstable County, Massac...	1503	157	1503	157	59042.0	13567.0	1449	...	32.5	16.3	22177.0	17369.0	60	48	9.0	7.2	80882.0	12715.0
2	1400000US25001010208	25001010208	Census Tract 102.08, Barnstable County, Massac...	796	134	796	134	62844.0	12976.0	772	...	38.9	14.4	43676.0	25949.0	19	22	5.8	7.2	NaN	NaN
3	1400000US25001010304	25001010304	Census Tract 103.04, Barnstable County, Massac...	1145	142	1145	142	71250.0	15039.0	1113	...	27.4	12.4	22986.0	18810.0	0	12	0.0	7.4	NaN	NaN
4	1400000US25001010306	25001010306	Census Tract 103.06, Barnstable County, Massac...	1301	150	1301	150	55694.0	9955.0	1208	...	26.9	14.6	NaN	NaN	10	14	1.9	2.6	NaN	NaN

5 rows × 243 columns

We need to merge the median income data with the tract shapes before we proceed which uses a little pandas magic. The GEOID is the common field between them. We will also select out only the columns we need from the median income data.

In [6]:

columns = ['median_income','GEOID']
census_tracts = census_tracts.join(median_income_data[columns],rsuffix='ic')

Make the Map¶

For the first map, we will use Altair. This map won't have base tiles but we will be able layer each element. The first layer will be a choropleth of median income followed by a layer of each MBTA station. Altair expects JSON data so there is some data wrangling before the map is drawn.

Filter Criteria¶

Because we do this all the time, it is helpful to filter out some of the data right now. This comes from having some hard lessons.

Restrict to only the counties you need. The MBTA mass transit lines are only operating in three counties so we can restrict to just those.
Remove census geographies with a lot of water. The field AWATER has the area in square meters of the census geography that is made of water while ALAND is the area of land. Anything with more water than land we are going to filter out.

In [7]:

filtered_census_tracts = census_tracts[(census_tracts.COUNTYFP.isin(('021','025','017'))) &
                                      (census_tracts.AWATER/census_tracts.ALAND < .55)]

To initiate a folium map, we will first set a few parameters.

In [8]:

import folium 
import branca.colormap as cm

folium_parameters = { 'location':[42.3601, -71.0589],
    'zoom_start':11,
    'width':300,
    'height':300}

boston_map = folium.Map(**folium_parameters)
boston_map

Out[8]:

Now we will begin to add layers starting with the census tracts. Because these are colored shapes, we will use the folium GeoJson class. This will draw one shape for every feature in the data. We will also style each feature using a linear color map so that the colors will be assigned based on the median income of the census tract. The colormap functions are supplied by the branca companion package.

In [9]:

color_scale= cm.linear.PuRd_07.scale(filtered_census_tracts.median_income.min(), 175000)
def tract_styles(feature):
    return {
        'fillColor': color_scale(feature['properties']['median_income']),
        'color': 'grey',
        'weight': .5,
    'dashArray': '5, 5'
        ,'fillOpacity':.8
    }
    
choropleth = folium.GeoJson(filtered_census_tracts.dropna().to_json(), style_function=tract_styles)
choropleth.add_to(boston_map)

Out[9]:

<folium.features.GeoJson at 0x7f7a7c52c080>

Adding the stations works a little differently. We could use the same GeoJson class above but the result would be that the stations would be visualized as markers which looks a little funny.

In [10]:

marker_only_map = folium.Map(**folium_parameters)

folium.GeoJson(mbta_stations.to_json()).add_to(marker_only_map)
marker_only_map

Out[10]:

Because we want to control the points drawn and make them circles not map markers, we will instead add each one to the map individually calling a style function that will assign it's color, based on its MBTA line.

In [11]:

def station_style(station):
    def station_colors(line):
        station_colors = {'ORANGE':'orange','BLUE':'blue','RED':'red','SILVER':'silver','GREEN':'green'}
        return station_colors.get(line,'grey')
    return folium.CircleMarker(
        radius=2,
        location=(station.geometry.y, station.geometry.x),
        color=station_colors(station.LINE),
        fill=True,
        fill_opacity=.8)
stations = folium.FeatureGroup()
for station in mbta_stations.itertuples():
    station_style(station).add_to(stations)

stations.add_to(boston_map)

Out[11]:

<folium.map.FeatureGroup at 0x7f7a6b570f98>

The other note is that the base layer, which defaults to OpenStreetMaps, is a little crowded. We don't need to show how someone navigates in Boston, we just need the minimal geography to give the viewer their bearings. To do this, we will change the default map tiles. Folium and Leaflet support a web standard for fetching tiles but you can also use any of the freely available tile styles shown in the leaflet documentation.

For this chart a basic gray style works best.

In [12]:

boston_map.add_tile_layer(**{'tiles':'https://server.arcgisonline.com/ArcGIS/rest/services/Canvas/World_Light_Gray_Base/MapServer/tile/{z}/{y}/{x}',
    'attr':'Tiles &copy; Esri &mdash; Esri, DeLorme, NAVTEQ'})
boston_map.width = 500
boston_map

Out[12]:

This blog bost originally appeared at Algorex Health

In [ ]: