Quantary

Michael Wimsatt's GitHub blog

Was March Really So Cold?

Wow, this winter seemed long, didn’t it? That really hit home in March, when the cold weather just seemed to go on and on. It seemed like every day the temperature “felt like” it was in the 20s (Fahrenheit). In NYC in particular, it seemed like even when the temperature rose to a balmy low 40s, the wind just would not give up.

Well, I just discovered that the makers of the excellent Dark Sky app have released a new weather app called forecast.io, and that it has an API offering not only the current forecast but also “current” and “forecast” data going back up to 60 years in some places.

So, rather than keep speculating about whether this March was really so cold, I can find out for sure - and learn some new data analysis tricks while I’m at it.

Get everything set up

I’m leveraging a code snippet I found to handle the forecast.io API, with some minor modifications. Also, I stored some basic data in a config file (initially to protect my forecast.io API key, but also to make it more modular).

from requests_forecast import Forecast
from config import *

forecast = Forecast(FORECAST_API_KEY)
forecast.timezone = LOC_TIMEZONE
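
The config file itself is not shown, so here is a minimal sketch of what it might contain. Only the four names come from the code in this post. The coordinates, the time zone string, and the environment variable name are placeholders made up for illustration, and whether the wrapper wants a string or a tz object for the time zone is an assumption too.

```python
# config.py: hypothetical values; only the four upper-case names appear in the post.
import os

# Read the key from the environment so it never lands in version control.
# The variable name FORECAST_IO_KEY is an assumption for this sketch.
FORECAST_API_KEY = os.environ.get("FORECAST_IO_KEY", "")

# Placeholder location (roughly midtown Manhattan) and its time zone name.
LOC_LAT = 40.75
LOC_LONG = -73.99
LOC_TIMEZONE = "America/New_York"
```

Keeping the key out of the file entirely means the config can be committed without leaking anything.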

Select the dates

Initially, I’m choosing to pull weather at noon each day of March for each of the years 2003 - 2013 (the Marches I have lived in New York). If I were really good, the parameters for this would be in the config file too.

from datetime import datetime
dates = []
years = range(2003, 2014)
for year in years:
    days = range(1,32)
    for day in days:
        dates.append(datetime(year=year, month=3, day=day, hour=12))
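
If the year range and month were parameters, the same loop might look like the sketch below. The function name `noon_dates` and its arguments are hypothetical. It uses `calendar.monthrange` so that it would also work for months with fewer than 31 days.

```python
import calendar
from datetime import datetime

def noon_dates(first_year, last_year, month=3, hour=12):
    """Return one datetime per day of `month` for each year, both ends inclusive."""
    dates = []
    for year in range(first_year, last_year + 1):
        # monthrange returns (weekday of first day, number of days in month).
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            dates.append(datetime(year, month, day, hour))
    return dates

dates = noon_dates(2003, 2013)
```

For March 2003 through 2013 this gives 11 years of 31 days each, so 341 timestamps and 341 API calls.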

Download the data

I’m downloading the data here and serializing it with pickle so I don’t need to keep using API calls after the first successful pull.

import pickle as pkl
forecasts = [forecast.get(LOC_LAT, LOC_LONG, time=date) for date in dates]
# Pickle is a binary format, so open the file in binary mode and close it when done.
with open('data/forecasts.pkl', 'wb') as f:
    pkl.dump(forecasts, f)
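
To actually skip the API on later runs, the download can be wrapped in a load-or-fetch helper. This is a sketch and not the code used for this post. The name `load_or_fetch` is hypothetical, and it assumes the objects being cached can be pickled.

```python
import os
import pickle

def load_or_fetch(path, fetch):
    """Return the object cached at `path`, calling `fetch()` only when no cache exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    data = fetch()
    # Create the target directory if needed before writing the cache.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data
```

With the names used in this post, the call would be something like `load_or_fetch('data/forecasts.pkl', lambda: [forecast.get(LOC_LAT, LOC_LONG, time=date) for date in dates])`.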

Collect temperatures and wind speeds in a Pandas DataFrame

I’m using pandas for data analysis, natch. For now, I’m focusing solely on temperature and wind speed. I still have the other data in that pickle file should I think up something else I want to see.

from pandas import DataFrame, DatetimeIndex
weather_df = DataFrame({'Temperature': [x['currently']['temperature'] for x in forecasts],
                        'Wind Speed':   [x['currently']['windSpeed'] for x in forecasts]},
                       index=DatetimeIndex([x['currently']['time'] for x in forecasts]))
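
To see the shape this produces without touching the API, here is the same construction on two made-up records. The values are invented, and the sketch assumes each record’s time is already a datetime, which is what the code above implies.

```python
from datetime import datetime
import pandas as pd

# Invented stand-ins for two API responses; real ones carry many more fields.
sample = [
    {"currently": {"time": datetime(2013, 3, 1, 12), "temperature": 38.2, "windSpeed": 9.1}},
    {"currently": {"time": datetime(2013, 3, 2, 12), "temperature": 41.0, "windSpeed": 12.4}},
]

sample_df = pd.DataFrame(
    {
        "Temperature": [x["currently"]["temperature"] for x in sample],
        "Wind Speed": [x["currently"]["windSpeed"] for x in sample],
    },
    # A DatetimeIndex is what makes the later grouping by year possible.
    index=pd.DatetimeIndex([x["currently"]["time"] for x in sample]),
)
```

If the times came back as Unix seconds instead, `pd.to_datetime(values, unit="s")` would be one way to build the index.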

Add a wind chill index

I’m using a wind chill formula I found on Wikipedia.

def windchill(temp, wind):
    if temp>50 or wind<=3:
        return temp
    else:
        return 35.74 + 0.6215*temp - 35.75*wind**0.16 + 0.4275*temp*wind**0.16

weather_df['Wind Chill'] = weather_df.apply(
        lambda row: windchill(row['Temperature'], row['Wind Speed']),
        axis = 1)
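
Row-wise `apply` is fine for a few hundred rows. For larger frames, the same rule can be written once over whole columns. The sketch below is an alternative and not what was run for this post. `windchill_vec` is a hypothetical name, and the inputs are assumed to be degrees Fahrenheit and miles per hour, as above.

```python
import numpy as np
import pandas as pd

def windchill_vec(temp, wind):
    """Vectorized wind chill (deg F, mph); returns temp where the formula does not apply."""
    temp = np.asarray(temp, dtype=float)
    wind = np.asarray(wind, dtype=float)
    w = wind ** 0.16
    chill = 35.74 + 0.6215 * temp - 35.75 * w + 0.4275 * temp * w
    # Same cutoffs as the scalar version: above 50 F or at 3 mph and below, keep temp.
    return np.where((temp > 50) | (wind <= 3), temp, chill)

# Invented readings covering the formula case and both pass-through cases.
demo = pd.DataFrame({"Temperature": [30.0, 55.0, 40.0], "Wind Speed": [10.0, 20.0, 2.0]})
demo["Wind Chill"] = windchill_vec(demo["Temperature"], demo["Wind Speed"])
```

As a sanity check, 30 degrees with a 10 mph wind comes out at a little over 21 degrees, while the other two rows pass through unchanged.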

Group by years and take a look at averages

import numpy as np
weather_df.groupby(lambda x: x.year).aggregate(np.mean).plot();

OK, so it looks like the trend in temps and wind chill is up if anything.
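
A note on the grouping: the lambda passed to `groupby` is called on each index label, which here is a timestamp, so `x.year` buckets the rows by year. Grouping on `index.year` directly should give the same result. Here is a toy check with invented numbers:

```python
import pandas as pd

idx = pd.DatetimeIndex(["2012-03-01 12:00", "2012-03-02 12:00",
                        "2013-03-01 12:00", "2013-03-02 12:00"])
toy = pd.DataFrame({"Temperature": [50.0, 60.0, 40.0, 44.0]}, index=idx)

# The function form: called once per index label.
by_lambda = toy.groupby(lambda ts: ts.year).mean()
# The attribute form: group on the precomputed year of each index entry.
by_attr = toy.groupby(toy.index.year).mean()
```

Both frames have one row per year holding that year’s mean.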

weather_df.boxplot(column='Temperature',by=(lambda x: x.year));

I love box plots. It looks like temps stayed in a tight range this March compared to most past Marches. Maybe it was the lack of a single 60+ degree day that drove me batty.

weather_df.boxplot(column='Wind Speed',by=(lambda x: x.year));

Winds were no higher, on average, than in recent years.

weather_df.boxplot(column='Wind Chill',by=(lambda x: x.year));

And, in fact, taking wind and temperature together suggests that, while last year was relatively warm, this year didn’t feel especially cold compared to prior years.

Initial conclusion

Based on noon-time measurements alone, it’s not obvious that this March should have been any more uncomfortable than past Marches. Possibly a daily average (pulled, say, hourly) would be more instructive.

Using more data

I decided to pull hourly data from 6AM to 6PM for each of the days to see if a full-day view might yield more info. I wrote a quick Python tool to download more data and store it in Pickle files. I won’t go into it here, but I’ll put the code on GitHub. Then I pulled the data into a DataFrame called bigdf.
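
The download tool is not reproduced here, so the following is only a sketch of the timestamps it would need to request. The function name and parameters are hypothetical. Counting both 6AM and 6PM gives 13 readings a day, and 11 years times 31 days times 13 hours is 4433, which matches the row count reported by describe() below.

```python
from datetime import datetime

def daytime_hours(first_year, last_year, month=3, days=31, start_hour=6, end_hour=18):
    """Return hourly datetimes from start_hour to end_hour inclusive, for every day and year."""
    return [datetime(year, month, day, hour)
            for year in range(first_year, last_year + 1)
            for day in range(1, days + 1)
            for hour in range(start_hour, end_hour + 1)]

hourly = daytime_hours(2003, 2013)
```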

bigdf.describe()
           Temperature   Wind Speed   Wind Chill
    count  4433.000000  4433.000000  4433.000000
    mean     43.541081    11.017018    38.487087
    std       9.928982     4.955918    13.199414
    min      13.320000     0.000000    -6.070190
    25%      37.190000     7.350000    29.845367
    50%      43.270000    10.600000    37.693510
    75%      49.550000    14.100000    46.533495
    max      74.180000    32.390000    74.180000

bigdf.groupby(lambda x: x.year).aggregate(np.mean).plot();

Nothing new here…

bigdf.boxplot(column='Temperature',by=(lambda x: x.year));
bigdf.boxplot(column='Wind Speed',by=(lambda x: x.year));
bigdf.boxplot(column='Wind Chill',by=(lambda x: x.year));

Rather than look at all the temperatures across the month, let’s look at each day’s average temperature.

avgdf = bigdf.groupby(['Year', 'Day']).agg(np.mean).reset_index()
avgdf.boxplot('Temperature', 'Year');
avgdf.boxplot('Wind Speed', 'Year');
avgdf.boxplot('Wind Chill', 'Year');
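
Grouping by 'Year' and 'Day' assumes bigdf has columns with those names, and the step that creates them is not shown above. One way to derive them from a DatetimeIndex, sketched on invented data:

```python
import pandas as pd

idx = pd.to_datetime(["2013-03-01 06:00", "2013-03-01 18:00",
                      "2013-03-02 06:00", "2013-03-02 18:00"])
toy = pd.DataFrame({"Temperature": [30.0, 40.0, 35.0, 45.0]}, index=idx)

# Pull the calendar parts out of the index so they can serve as group keys.
toy["Year"] = toy.index.year
toy["Day"] = toy.index.day

# One row per (Year, Day) holding that day's mean; reset_index turns the keys back into columns.
daily = toy.groupby(["Year", "Day"]).mean().reset_index()
```

After `reset_index`, 'Year' is an ordinary column again, which is what lets `boxplot` use it as the `by` argument.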

Well, I got nothin’. Except that, accounting for wind, this March didn’t seem to have a single daytime hour that felt warmer than 60 degrees, and that is legitimately rare for March. It’s also arguably in line with the notion that spring was slow in coming this year. In 2011 most temperatures were actually lower, but there were quite a few warm spikes. I don’t know if these were all on one day, since I didn’t aggregate the data daily, but it still looks like there was more relief then than this year. The other takeaway? Looking at some of this historical data, it has frequently been much worse.
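
The question of whether the warm hours fell on one day or several could be answered with a couple of group-bys. This is a sketch on invented data and not something I ran against bigdf. The 60 degree threshold comes from the paragraph above, and everything else is illustrative.

```python
import pandas as pd

idx = pd.to_datetime(["2012-03-10 12:00", "2012-03-10 13:00",
                      "2013-03-10 12:00", "2013-03-10 13:00"])
toy = pd.DataFrame({"Wind Chill": [61.5, 66.0, 48.0, 52.3]}, index=idx)

# True for every hour that felt warmer than 60 degrees.
warm = toy["Wind Chill"] > 60

# Warm hours per year: summing booleans counts the True values.
warm_hours = warm.groupby(toy.index.year).sum()

# Warm days per year: a day counts if any of its hours was warm.
daily_warm = warm.groupby([toy.index.year, toy.index.date]).any()
warm_days = daily_warm.groupby(level=0).sum()
```

Comparing the two counts per year shows whether the spikes were spread out or bunched into a day or two.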

Other explanations

So why did March just seem so damned cold? Maybe it was just that the cold lingered late in the month, or that a cold March came after a cold February and we were just sick of it. Maybe understanding the cold requires a more nuanced analysis. Maybe we just like to complain and recent memory is exaggerated in our minds. Obviously my ability to objectively compare successive Marches in my mind over 10 years is, well, suspect.

I did get to learn more than I wanted to know about Pandas copies versus views and grouping. Actually I needed to learn this stuff. Hopefully I’ll remember it. I also got my first Stack Overflow post out of the deal.
