Case Study: SQL Queries & Geospatial Visualization

2 minute read

I recently completed a case study as a second round interview for a startup based in NYC. The Objective was to confirm my ability to effectively query a SQL database and also show my ability to create a dataframe in Python and plot pertinent population data geospatially.

If you’re interested in looking under the hood you can see the code I used to complete the challenge here.

First up, SQL queries

Querying a SQL database and utilizing basic aggregate functions are fairly straightforward tasks. One could easily learn how to complete these queries through any of a number of free online tutorials. Window Functions aren’t as straightforward. A more advanced feature of SQL, Window Functions allow you to perform aggregate functions on a subset of data and often allow you to do so in a way that optimizes computational performance. One might utilize a Window Function in SQL when looking to calculate a moving average over a specific time frame.

One query in this case study was employed to: “calculate the percentage size of the city relative to the city with the largest population in the state.”

SELECT states.name, cities.name, cities.population, MAX(population)
    OVER (PARTITION BY state_id) AS max_state_population,
    (CAST(population AS float) / (MAX(population)
    OVER (PARTITION BY state_id))*100)
FROM cities
JOIN states ON states.id = cities.state_id
ORDER BY states.name;

Python Dataframe and Geospatial Visualization

Next, I connected to my local SQL database in Python to create a dataframe in Python with the goal of:

Determining Longitude and Latitude of all cities and inserting these into dataframe

I did this by creating a basic function to look up the latitude and longitude for a list of values using geopy. I inserted the resulting lists back into the dataframe and then plotted populations of active cities.

from geopy.geocoders import Nominatim
lat_list = []
long_list = []

def LatLong(values):
    for x in values:
        geolocator = Nominatim()
        location = geolocator.geocode(x)
        lat_list.append(location.latitude)
        long_list.append(location.longitude)
        
cities = df['city'].tolist()
LatLong(cities)

Plotting population on maps via a point with size relatively to population.

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
 
m = Basemap(projection='mill', 
            llcrnrlat=20,
            llcrnrlon=-130,
            urcrnrlat=50,
            urcrnrlon=-60)
m.drawcountries()
m.drawcoastlines()

lon = df['longitude'].values
lat = df['latitude'].values
size = df['national_population'].values
    
xs, ys = m(lon, lat)
m.scatter(xs, ys, marker = 'o', color = 'red', s=size*1000)


m.shadedrelief()

plt.legend
plt.title('US Metros')
plt.show()

m = Basemap(projection='mill', 
            llcrnrlat=20,
            llcrnrlon=-130,
            urcrnrlat=45,
            urcrnrlon=-110)
m.drawcountries()
m.drawcoastlines()

lon = westcoast['longitude'].values
lat = westcoast['latitude'].values
size = westcoast['westcoast_pop'].values
    
xs, ys = m(lon, lat)
m.scatter(xs, ys, marker = 'o', color = 'red', s=size*500)


m.shadedrelief()

plt.legend
plt.title('West Coast Metros')
plt.show()

m = Basemap(projection='mill', 
            llcrnrlat=30,
            llcrnrlon=-90,
            urcrnrlat=50,
            urcrnrlon=-60)
m.drawcountries()
m.drawcoastlines()
# m.drawstates(color='black')

lon = northeast['longitude'].values
lat = northeast['latitude'].values
size = northeast['northeast_pop'].values
    
xs, ys = m(lon, lat)
m.scatter(xs, ys, marker = 'o', color = 'red', s=size*500)


m.shadedrelief()

plt.legend
plt.title('Northeast Metros')
plt.show()

If you’re interested in looking under the hood you can see the code I used to complete the challenge here.

Share on

Twitter Facebook Google+ LinkedIn

Daniel Demoray

If you’re interested in looking under the hood you can see the code I used to complete the challenge here.

First up, SQL queries

Python Dataframe and Geospatial Visualization

Determining Longitude and Latitude of all cities and inserting these into dataframe

Plotting population on maps via a point with size relatively to population.

If you’re interested in looking under the hood you can see the code I used to complete the challenge here.

Share on