People always ask if we’re the coldest spot in town. I can’t really answer that, but I can find out if we’re the coldest reporting weather station in the region.

Once again, we’ll use PostgreSQL window functions to investigate. The following query finds the station in zone 222 (the National Weather Service region that includes Fairbanks) reporting the coldest temperature every hour during the winter, counts up all the stations that “won,” and then ranks them. The outermost query gets the total number of hourly winners and uses this to calculate the percentage of hours that each station was the coldest reporting station.

Check it out:

SELECT station, count, round(count / sum(count) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) * 100, 1) AS percent FROM ( SELECT station, count(*) AS count FROM ( SELECT station, dt_local, temp_f, rank() OVER ( PARTITION BY dt_local ORDER BY temp_f ) FROM ( SELECT location || ' (' || station_id || ')' AS station, date_trunc('HOUR', dt_local) AS dt_local, temp_f FROM observations INNER JOIN stations USING (station_id) WHERE zone_id = 222 AND dt_local between '2010-10-01' and '2011-03-31' ) AS foo ) AS bar WHERE bar.rank = 1 GROUP BY station ORDER BY count desc ) AS foobar;

And the results:

station | count | percent ----------------------------------------+-------+--------- Goldstream Creek (DW1454) | 2156 | 51.0 Chena Hot Springs (CNRA2) | 484 | 11.5 Eielson Air Force Base (PAEI) | 463 | 11.0 Parks Highway, MP 325.4 (NHPA2) | 282 | 6.7 Small Arms Range (SRGA2) | 173 | 4.1 Ballaine Road (AS115) | 153 | 3.6 Fairbanks Airport (PAFA) | 125 | 3.0 Fort Wainwright (PAFB) | 107 | 2.5 Ester Dome (FBSA2) | 103 | 2.4 Eagle Ridge Road (C6333) | 81 | 1.9 Keystone Ridge (C5281) | 33 | 0.8 Skyflight Ave (D6992) | 21 | 0.5 14 Mile Chena Hot Springs Road (AP823) | 21 | 0.5 College Observatory (FAOA2) | 11 | 0.3 Geophysical Institute (CRC) | 10 | 0.2 DGGS College Road (C6400) | 1 | 0.0

Answer: Yep. We’re the coldest.

Update: Thinking about this a little bit more, the above analysis is biased against stations that don't report every hour. Another way to look at this is to calculate the hourly average temperature, subtract this from the data for each station during that hour, and then average those results for the whole winter. The query is made more complex because several stations report temperatures more than once an hour. If we simply averaged all these observations together with the stations that only reported once, these stations would bias the resulting hourly average. So we average each station's hourly data, then use that to calculate the zone average for the hour. Here's the query, and the results:

SELECT station, round(avg(diff), 1) AS avg_diff FROM ( SELECT station, dt_local, temp_f - avg(temp_f) OVER ( PARTITION BY dt_local ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS diff FROM ( SELECT location || ' (' || station_id || ')' AS station, date_trunc('HOUR', dt_local) AS dt_local, avg(temp_f) AS temp_f FROM observations INNER JOIN stations USING (station_id) WHERE zone_id = 222 AND dt_local between '2010-10-01' and '2011-03-31' GROUP BY station, date_trunc('HOUR', dt_local) ) AS foo ) AS bar GROUP BY station ORDER BY avg_diff;

station | avg_diff ----------------------------------------+---------- Goldstream Creek (DW1454) | -6.8 Eielson Air Force Base (PAEI) | -3.8 Fort Wainwright (PAFB) | -3.1 Fairbanks Airport (PAFA) | -2.9 Small Arms Range (SRGA2) | -2.8 Chena Hot Springs (CNRA2) | -2.3 DGGS College Road (C6400) | -0.7 Ballaine Road (AS115) | -0.6 College Observatory (FAOA2) | 1.0 North Bias Drive (RSQA2) | 1.3 14 Mile Chena Hot Springs Road (AP823) | 3.1 Skyflight Ave (D6992) | 3.3 Geophysical Institute (CRC) | 3.5 Eagle Ridge Road (C6333) | 3.8 Parks Highway, MP 325.4 (NHPA2) | 4.5 Keystone Ridge (C5281) | 5.1 Ester Dome (FBSA2) | 5.1 Birch Hill Recreation Area (BHS) | 6.8

Yesterday I looked at how wind might be affecting my bicycling to and from work. Today I’ll examine the idea that Miller Hill is confounding the effect of wind on average speed by excluding this portion of the trip from the analysis. To do this, I include a bounding box comparison in the SQL statement that extracts the wind factors for track points. The additional `WHERE` condition looks like this:

ST_Within(point_utm, ST_SetSRID(ST_MakeBox2D(ST_Point(454861,7193973), ST_Point(458232,7199159)), 32606))

The same `ST_Within` test is used in the calculation of average speed for each of the trips from work to home. After compiling the wind factors and average speeds, we compare the two using R. Here are the updated results:

lm(formula = mph ~ wind, data = data) Residuals: Min 1Q Median 3Q Max -1.87808 -0.55299 0.04038 0.62790 1.19076 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 16.8544 0.2176 77.442 <2e-16 *** wind 0.3896 0.2002 1.946 0.0683 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.9445 on 17 degrees of freedom Multiple R-squared: 0.1822, Adjusted R-squared: 0.1341 F-statistic: 3.788 on 1 and 17 DF, p-value: 0.06834

This time around the model and both coefficients are statistically significant (finally!), and “wind factor” is positively correlated with my average speed over the part of the route that doesn’t include Miller Hill and Railroad drive. It’s not a major contributor, but it does explain approximately 18% of the variation in average speed.

Cycling with the wind

Photo by ~BostonBill~

I decided to look at wind a little more deeply after yesterday’s bike ride home. It seemed clear to me that the wind was strongly at my back for much of the route. It wasn’t my fastest ride home, but it was close, and it didn’t feel like I was working all that hard.

Here’s the process. First, examine all my bicycling tracks individually, using PostGIS’s `ST_Azimuth` function to calculate the direction I was traveling at each point. The query uses another of the new window functions (`lead`) in PostgreSQL 8.4.

SELECT point_id, dt_local, ST_Azimuth( point_utm, lead(point_utm) OVER (PARTITION BY tid ORDER BY dt_local) ) / (2 * pi()) *360 FROM points WHERE tid = TID ORDER BY dt_local;

Then, for each point, find the direction the wind was blowing. This is a pretty slow query, but I haven’t found a better way to compare timestamps in the database to find the closest record. This technique, based on converting both timestamps to “epoch,” which is the number of seconds since January 1st, 1970, is faster than using an `interval` type of operation (like: `WHERE obs_dt - POINT_DT BETWEEN interval '-3 minutes' AND interval '3 minutes'`).

SELECT obs_dt, wdir, wspd FROM observations WHERE abs(extract(epoch from obs_dt) - extract(epoch from POINT_DT)) < 5 * 60 AND wspd IS NOT NULL AND wdir IS NOT NULL ORDER BY abs(extract(epoch from obs_dt) - extract(epoch from POINT_DT)) LIMIT 1;

Now I’ve got the direction I was traveling and the direction the wind is coming from. I wrote a Python function that returns a value from –1 (wind is in my face) to 1 (wind is at my back). The procedure is to convert the wind directions to unit *u* and *v* vectors and get the distance between the endpoints of each vector. The distances are then scaled such that wind behind the direction traveled range from 0 – 1, and from –1 – 0 for wind blowing against the direction traveled.

def wind_effect(mydir, winddir): """ Returns a number from 1 (wind at my back) to -1 (wind in my face) based on the directions passed in. Remember that wind direction is where the wind is *from*, so a wind direction of 0 and a mydir of 0 means the wind is in my face. """ try: mydir = float(mydir) winddir = float(winddir) except: return(None) my_spd = 1.0 wind_spd = 1.0 u_mydir = -1 * my_spd * math.sin(math.radians(mydir)) v_mydir = -1 * my_spd * math.cos(math.radians(mydir)) u_winddir = -1 * wind_spd * math.sin(math.radians(winddir)) v_winddir = -1 * wind_spd * math.cos(math.radians(winddir)) distance = math.sqrt((u_mydir - u_winddir)**2 + (v_mydir - v_winddir)**2) factor = (1.41421356 - distance) if factor < 0.0: factor = factor / -0.58578644 else: factor = factor / -1.41421356 return(factor)

Finally, multiply this value by the wind speed at that time, and sum all these values for an entire bicycling track. The result is a “wind factor.” A positive wind factor means the wind was generally at my back during the ride, negative means it was blowing in my face. Yesterday’s ride home had the highest wind factor (1.07) among trips since June. So the wind really was at my back!

Can “wind factor” help predict average speed? Here’s the R and results:

$ R --save < wind_from_abr.R > data<-read.table('wind_factor_from_abr',header=TRUE) > model<-lm(speed ~ wind, data) > summary(model) Call: lm(formula = speed ~ wind, data = data) Residuals: Min 1Q Median 3Q Max -0.90395 -0.46782 -0.04334 0.40286 0.85918 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 14.7796 0.1471 100.48 <2e-16 *** wind 0.4369 0.2875 1.52 0.147 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.5522 on 17 degrees of freedom Multiple R-squared: 0.1196, Adjusted R-squared: 0.06784 F-statistic: 2.31 on 1 and 17 DF, p-value: 0.1469

Hmm. Not a whole lot of help here. The model is close to being statistically significant (although it’s not…), and it’s not very predictive (only 12% of the variation in average speed is explained by wind factor). However, the directionality of the (not quite statistically significant) `wind` coefficient is correct. A positive wind factor is (weakly) correlated with a higher average speed.

Thinking more about my route from work, I suspect that the route is actually two trips: the trip from ABR to the bottom of Miller Hill (4.8 miles) and the two mile trip over Miller Hill to our house. I’ll bet that wind becomes statistically significant if I only consider the first part of the trip: wind doesn’t have as much effect on a hill climb, and after making it over the top, the rest is a bumpy, gravel road where speed is determined more by safety than wind or how hard I’m pedalling. I think this might also resolve the question of why the ride home is so much easier than to work. It’s not because I’m glad to be out of work or because I’m carrying a lunchbox full of food to work, it’s because it’s downhill from ABR to the bottom of Miller Hill.

West weather sensors

Fairbanks had some very hot weather earlier this week, including breaking a high temperature record at the airport on Wednesday afternoon. We got up to 94°F here on the Creek, but what was worse was that the low temperature on Wednesday night was 60°F, too warm to cool the house much. We were tempted to sleep out in the back cabin because the house was so warm.

It’s been cooler in the last couple days. But how much cooler?

The new version of PostgreSQL (8.4) has support for window functions, which make it easier to answer questions like this. Window functions allow you to calculate aggregates (average, sum, etc.) at one level, while display the data from another level. In this case, I want to compare the hourly average temperatures over the last 24 hours with the overall hourly average temperature over the past week. Without window functions, I’d need to combine a query that yields the hourly average temperature over the last 24 hours with a query that calculates overall seven-day hourly average temperatures. And if I want the difference between the two, the first two queries become a subquery of a third query.

Here’s the query:

SELECT dt, t_avg, seven_day_avg::numeric(4,1), (t_avg - seven_day_avg)::numeric(4,1) AS anomaly FROM ( SELECT dt, t_avg::numeric(4,1), avg(t_avg::numeric(4,1)) OVER (PARTITION BY extract(hour from dt)) AS seven_day_avg FROM hourly WHERE dt > current_timestamp - interval ’7 days’ ) AS sevenday WHERE dt > current_timestamp - interval ’24 hours’ ORDER BY dt;

And the result for the last 24 hours of temperature data:

dt | t_avg | seven_day_avg | anomaly ------------------+-------+---------------+--------- 2009-07-10 11:00 | 67.1 | 72.3 | -5.2 2009-07-10 12:00 | 70.8 | 75.8 | -5.0 2009-07-10 13:00 | 72.9 | 77.4 | -4.5 2009-07-10 14:00 | 74.1 | 78.5 | -4.4 2009-07-10 15:00 | 74.6 | 80.2 | -5.6 2009-07-10 16:00 | 75.9 | 80.4 | -4.5 2009-07-10 17:00 | 76.1 | 81.0 | -4.9 2009-07-10 18:00 | 76.9 | 80.4 | -3.5 2009-07-10 19:00 | 76.5 | 79.3 | -2.8 2009-07-10 20:00 | 73.1 | 77.0 | -3.9 2009-07-10 21:00 | 69.1 | 73.5 | -4.4 2009-07-10 22:00 | 63.7 | 68.2 | -4.5 2009-07-10 23:00 | 57.6 | 62.3 | -4.7 2009-07-11 00:00 | 52.1 | 56.5 | -4.4 2009-07-11 01:00 | 48.5 | 52.6 | -4.1 2009-07-11 02:00 | 45.5 | 49.3 | -3.8 2009-07-11 03:00 | 43.4 | 47.9 | -4.5 2009-07-11 04:00 | 42.2 | 47.1 | -4.9 2009-07-11 05:00 | 44.8 | 47.6 | -2.8 2009-07-11 06:00 | 47.5 | 49.7 | -2.2 2009-07-11 07:00 | 51.2 | 53.8 | -2.6 2009-07-11 08:00 | 55.3 | 59.2 | -3.9 2009-07-11 09:00 | 60.4 | 64.0 | -3.6 2009-07-11 10:00 | 65.5 | 68.3 | -2.8

Conclusion? It’s been several degrees cooler in the last 24 hours compared with the last seven days.

And window functions are groovy.