sun, 30-apr-2006, 20:53

A couple months ago I got my first Apple since the Mac Classic I had in college. It's a MacBook Pro and so far I really like it. I've managed to get it to do almost everything my Linux laptop could do, but now I've got access to iTunes and Adobe's Creative Suite (although it's slow under Rosetta). If Apple would allow me to change the focus behavior, and implement the X11 cut and paste, it'd be the perfect system for a laptop.

On campus I have access to the iTunes playlists of all the people on the wireless network that are sharing their music library. And I have mine shared so other people can check out the artists I enjoy. Unfortunately, iTunes doesn't tell you what songs connected users are listening to or who is actually connected.

Since OS X is Unix, it's easy enough to examine the process tree and discover what network and filesystem connections iTunes is making. Running:

ps -axo 'pid command' | grep -v grep | grep 'iTunes ' | awk '{print $1}'

will show the process ID for iTunes. Once you have this number, you can use lsof -p [pid] to show all the files (and network connections, which are treated like files in Unix) that iTunes is using. Filtering the results by your iTunes library (grep "/Users/$USER/Music/iTunes/iTunes Music/", quoted because the path contains a space) yields the songs that are being played, both locally and over the network. Searching for ESTABLISHED shows the network connections. The last part of each of these lines shows the IP address of a computer connected to you, and if there are two lines with the same destination IP address, that computer is actually playing from your music library.
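The steps above can be sketched in Python. This is not watch_itunes.py itself, just a minimal illustration: the function names (lsof_output, summarize) are made up for this example, the regular expression assumes IPv4 addresses and the usual "local->remote (ESTABLISHED)" ending of an lsof line, and the "two connections means listening" rule is simply the observation described above.

```python
import re
import subprocess
from collections import Counter

# Assumed shape of the end of an lsof TCP line:
#   ...TCP 192.168.1.100:3689->192.168.1.101:49231 (ESTABLISHED)
CONNECTION = re.compile(r"->(\d+\.\d+\.\d+\.\d+):\S+ \(ESTABLISHED\)")


def lsof_output(pid):
    """Run lsof for one process; -n and -P keep addresses and ports numeric."""
    result = subprocess.run(["lsof", "-n", "-P", "-p", str(pid)],
                            capture_output=True, text=True)
    return result.stdout


def summarize(lsof_text, library_dir):
    """Return ({remote_ip: status}, [song paths relative to library_dir])."""
    connections = Counter()
    songs = []
    for line in lsof_text.splitlines():
        match = CONNECTION.search(line)
        if match:
            connections[match.group(1)] += 1
        elif library_dir in line:
            # Keep only the Artist/Album/Song part of the path.
            start = line.index(library_dir) + len(library_dir)
            songs.append(line[start:])
    # Two connections from the same address is treated as "listening".
    status = {ip: "listening" if count >= 2 else "connected"
              for ip, count in connections.items()}
    return status, songs
```

Calling summarize(lsof_output(pid), "/Users/yourname/Music/iTunes/iTunes Music/") with the process ID found earlier would give one snapshot; a loop with time.sleep would turn it into a watcher.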

To automate this, I wrote a Python script, watch_itunes.py. Note that this is a command-line tool that runs in a terminal window. There are Dashboard widgets that are supposed to do this, but the one I tried didn't work, perhaps because I have an Intel Mac.

To use the script: ./watch_itunes.py

By default, it will examine the process tree every 15 seconds, showing what's playing and who is connected or playing from your music library. Run it with -h to see a list of command line options.

Here's what it shows right now:

192.168.1.101 is connected but not listening to music
Portastatic                Bright Ideas               05 Little Fern.m4a

192.168.1.101 is listening to music
Arcade Fire                Funeral                    09 Rebellion (Lies).m4a
Portastatic                Bright Ideas               05 Little Fern.m4a

In the first two lines, I'm listening to Little Fern, and another computer is connected to my library but isn't playing anything. In the second set of lines, they've started listening to Rebellion (Lies). The program will keep printing lines like these until you exit with Control-C.

tags: music  OS X  sysadmin 
tue, 25-apr-2006, 06:30

For many years I've used the Unix calendar program to send me an email reminder of upcoming events and holidays. Unix calendar files are very simple text files with one event per line like this:

Apr 22  We bring Koidern home, 2006

Google recently added a calendar to their set of web programs, and like most things Google does, it offers a clean and elegant implementation. Best of all, it's on the web, so you can access the same calendar information from anywhere there's an Internet connection.

These days, calendar files are typically in iCal format. I wanted to convert my Unix calendar file over to iCal so I could import the data into Google Calendar. Python to the rescue!

Download the script: calendar_to_ics.py

To use it: cat ~/.calendar/calendar | ./calendar_to_ics.py > /tmp/calendar.ics

Import the file you created into Google Calendar by clicking on the Manage Calendars link and going to the Import Calendar tab. The script is only designed to handle simple events that take place once a year, on the same day, and it only accepts dates in MMM DD format. But Python is easy to read and hack, so if you have improvements, please email them to me and I'll incorporate them into the script.
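For anyone curious what such a conversion involves, here's a minimal sketch. It is not the calendar_to_ics.py script, and several things in it are assumptions made for illustration: the function names, the UID scheme, the PRODID string, and the choice of a starting year for the yearly recurrence. It handles only "MMM DD" dates followed by whitespace and a description.

```python
from datetime import datetime, timezone

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def escape_text(value):
    """Escape backslashes, semicolons and commas for an iCalendar TEXT value."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,"))


def to_vevent(line, index, year, stamp):
    """Turn one 'MMM DD<whitespace>description' line into VEVENT lines."""
    month, day, summary = line.split(None, 2)
    date = "%04d%02d%02d" % (year, MONTHS.index(month) + 1, int(day))
    return [
        "BEGIN:VEVENT",
        # Hypothetical UID scheme: date plus position in the file.
        "UID:%s-%d@calendar-to-ics.invalid" % (date, index),
        "DTSTAMP:" + stamp,
        "DTSTART;VALUE=DATE:" + date,   # all-day event
        "RRULE:FREQ=YEARLY",            # repeat every year on the same day
        "SUMMARY:" + escape_text(summary.strip()),
        "END:VEVENT",
    ]


def convert(lines, year=2006, stamp=None):
    """Build a complete VCALENDAR string from Unix calendar lines."""
    if stamp is None:
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out = ["BEGIN:VCALENDAR", "VERSION:2.0",
           "PRODID:-//example//calendar sketch//EN"]
    events = [line for line in lines if line.strip()]
    for index, line in enumerate(events):
        out.extend(to_vevent(line, index, year, stamp))
    out.append("END:VCALENDAR")
    # iCalendar content lines are separated by CRLF.
    return "\r\n".join(out) + "\r\n"
```

Passing the lines of a calendar file to convert() returns the text to write to an .ics file. Long SUMMARY lines are not folded here, which a stricter importer might object to.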

tags: linux  sysadmin 
sat, 22-apr-2006, 18:37
six dogs

We got a new dog named Koidern today. She had problems with other dogs in her previous kennel, so we're hoping that she does better here. So far so good, but we're still in the early phases of the introduction. From left to right in the photo, there's Kiva, Nika, Buddy, Deuce, Koidern and Piper. Andrea is on the couch petting as many as she can get her hands on.

Koidern will be four in June and she's one-quarter saluki.

tags: Buddy  dogs  house  Kiva  Koidern  Nika  Piper 
sun, 26-feb-2006, 09:57

A couple days ago, in an article about prospect analysis in baseball (subscription required), Nate Silver produced a cool table showing the year-to-year correlations of the six major batting events. This morning, while waiting for my dough to rise, I decided to replicate this analysis with my newfound baseball hack ability.

You can download the R program code for the analysis by clicking on the link.

Here's the result, showing the 2004 to 2005 correlations for rate-adjusted batting statistics for all players with more than 250 at-bats in both seasons:

Hits / PA           0.422
Singles / PA        0.663
Doubles / PA        0.369
Triples / PA        0.501
Home Runs / PA      0.702
Walks / PA          0.718
Strikeouts / PA     0.813
Plate appearances   0.405

What Silver was trying to show by presenting his table (which included all year to year correlations since World War II) is that "there's really no such thing as a doubles hitter."

You can see from the table that there's only a weak relationship between a hitter's doubles rate in 2004 and in 2005. But a home run hitter in one season is likely to hit them at a similar rate the next season. Also note that strikeout and walk rates are each very highly correlated from one year to the next. So, 2004 and 2005 strikeout leader Adam Dunn is likely to strike out more than 150 times in 2006. Thankfully for Reds fans, he'll probably also hit more than 40 home runs.

The last number is also interesting. Plate appearances aren't strongly correlated from one season to the next. This is probably a combination of older players breaking down between 2004 and 2005, and younger players stepping in to take their place at the plate.
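My analysis was done in R, but the calculation behind each number in the table is simple enough to sketch in Python. This is an illustration only: the function names and the dictionary layout (player mapped to a dict with "AB", "PA" and a stat key) are assumptions for this example, not the structure of the database or of my R code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_to_year(season_a, season_b, stat, min_ab=250):
    """Correlate stat per plate appearance across two seasons.

    Only players with more than min_ab at-bats in both seasons count.
    """
    rates_a, rates_b = [], []
    for player, a in season_a.items():
        b = season_b.get(player)
        if b is not None and a["AB"] > min_ab and b["AB"] > min_ab:
            rates_a.append(a[stat] / a["PA"])
            rates_b.append(b[stat] / b["PA"])
    return pearson(rates_a, rates_b)
```

Something like year_to_year(batting_2004, batting_2005, "SO") would then give the strikeout row, with the two dictionaries filled from whatever database you're using.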

tags: baseball  books 
sat, 25-feb-2006, 20:34

Yesterday I discovered (and ordered!) a new book from O'Reilly called Baseball Hacks by Joseph Adler. I've got a bookshelf full of O'Reilly books on other computer subjects, so I'm very excited to see this. The web site for the book has a couple of example hacks.

Last year I spent some time getting the Lahman database into MySQL so I could fool around with some advanced baseball statistics. The Lahman database is a Microsoft Access database, and doesn't allow re-distribution, so for an open-source advocate like me, this isn't exactly the best source for baseball information. It took me a few days to get it all into MySQL successfully, and none of my improvements could be distributed.

Well, from reading the sample hacks, I discovered there's a less restrictive database that's also available for MySQL (a free database server). In addition, the author of Baseball Hacks shows how to connect a MySQL database with the fantastic statistical package R. R is also free, and is incredibly powerful. I also found a previous article by the same author. Some of what appears below is based on that article.

Anyway, I can't wait to get the book to see what's in it, but in the meantime I did a very simple analysis comparing payroll to wins for the 2005 season. Team payrolls that year ranged from a low of $29.7 million for the Tampa Bay Devil Rays to the Yankees' astronomical $208.3 million. The team with the second-highest payroll, the Red Sox, spent only $123.5 million on player salaries in 2005. What does all that money buy? I'm sure the owners hope it'll buy them enough wins to make it to the playoffs, and hopefully win the World Series. The White Sox, World Series winners in 2005, were 13th in payroll at $75.2 million.

It turns out that payroll doesn't account for much of whether a team wins or loses. It explained only 24% of the variation in wins in 2005. For comparison, a team's hits and earned run average together explain 72% of the variation in wins. Obviously, getting lots of hits and keeping your opponent from scoring runs will contribute to winning a lot of games.

But what I want to see is whether a team did better than expected based on their player spending. The Yankees didn't wind up with the best record in baseball, despite spending more than twice as much as every other team in baseball except the Red Sox. How badly did they under-perform?

Not that badly, actually. The plot below shows the relationship between payroll and wins for 2005. The straight line is the regression line showing the best linear fit to the data. The team letters on the plot show how each team actually performed. Teams that show up above the line played better than their salaries would have predicted. Those below did worse.

Payroll v. Wins, 2005

For example, look how far the Chicago White Sox (CHA) are from the regression line. The Cardinals also wind up well above what we would expect based solely on their salaries (and that's with Scott Rolen on the DL the whole season!). Also check out the Cleveland Indians. They're a team that has a lot of very good younger players who aren't eligible for arbitration yet, but have loads of talent.

You can see the Yankees over on the right, far from all the other teams. Based on their payroll, they should have won 102 games in 2005, but only managed 95. The Kansas City Royals were much worse, only managing to win 56 games when their player salaries predicted 75 wins. It's easy to explain why teams like the Dodgers or Giants didn't do well in 2005 (their highly paid players were injured for most of the year), but something else must be going on with Seattle and Kansas City.
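The "expected wins" numbers come from a simple linear regression of wins on payroll, and the gap between actual and expected wins is the residual. My code for this is in R, but here is a rough Python sketch of the same idea. The function names and the input layout (team code mapped to a payroll and wins pair) are assumptions for this example, and no real team data is included.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2
                   for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_resid / ss_total


def wins_over_expected(teams):
    """teams maps a team code to (payroll in $ millions, wins).

    Returns ({team: actual wins minus predicted wins}, r_squared).
    Positive values are teams above the regression line.
    """
    codes = sorted(teams)
    payrolls = [teams[code][0] for code in codes]
    wins = [teams[code][1] for code in codes]
    slope, intercept, r_squared = fit_line(payrolls, wins)
    residuals = {code: teams[code][1] - (intercept + slope * teams[code][0])
                 for code in codes}
    return residuals, r_squared
```

With the real 2005 numbers loaded, the r_squared value would correspond to the 24% figure above, and the Yankees' entry would be roughly 95 minus 102, or about -7.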

What does all this tell us about baseball? Well, I'd argue that this metric (payroll vs. wins) tells us something about how effective the front office of a team is. Smart general managers will pick up talent that is undervalued by the market, buying more wins than they're paying for. Also, teams with a good farm system can "grow their own" talent, rather than having to buy it on the free market. Teams like Cleveland and Oakland are good examples of this. The excesses of George Steinbrenner should have been enough to buy a World Series championship, but the Yankee front office overpaid for all their veteran talent, and in 2005, they didn't live up to their high salaries.

If you want to see the R code I used to generate the plot, you can download it from the link.

tags: baseball  books 
