Rodger is using computers in the intro labs to run a giant phylogenetic analysis for his dissertation. He was planning to walk around with a flash drive to run the analyses, but I persuaded him to let me build some resources to do it over the network. For a long time, I've wanted to have a mechanism that would facilitate people distributing jobs out to all the workstations in our computer labs. We built the first part on Friday.
I started by looking at Apple's "Folder Actions" -- I thought maybe we could just create a folder that we could put stuff in and have an analysis run. But it turns out that Folder Actions are bound to a particular user and only work when that user is logged in. Fail. The mechanism that makes them work is also spread out all over the freaking file system: parts of it live in ~/Library/Preferences and in /private/var/db/, and it relies on an undocumented faceless binary that is located in some hidden folder someplace. Ugh. So much of what Apple does looks shiny on the outside, but is implemented in a way that's seriously difficult to build on.
After wasting an hour building an AppleScript that would implement a folder action, I started over and made a launchd file that would start a PHP script to run the analysis. It works great: it grabs a file, makes a directory with the same name (minus the extension), puts the file inside, and launches Beast. The same strategy could be used to launch any other kind of analysis or process.
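For anyone who wants to try the same trick, here is a rough sketch of the staging step. It's in Python rather than the PHP we actually used, and it's not our script: the function names and the bare `beast` command are placeholders I'm assuming for illustration, so substitute whatever actually launches your analysis.

```python
import shutil
import subprocess
from pathlib import Path


def stage_job(input_file):
    """Move input_file into a new sibling directory named after it, minus the extension."""
    src = Path(input_file)
    job_dir = src.with_suffix("")  # e.g. jobs/run1.xml -> jobs/run1
    job_dir.mkdir()  # raises FileExistsError if a job with this name was already staged
    staged = job_dir / src.name
    shutil.move(str(src), str(staged))
    return staged


def launch(staged, command=("beast",)):
    """Start the analysis inside the job directory without waiting for it to finish.

    `command` is an assumed placeholder; use your real launcher and flags.
    """
    return subprocess.Popen([*command, staged.name], cwd=staged.parent)
```

Calling `launch(stage_job("jobs/run1.xml"))` would leave the input at `jobs/run1/run1.xml` and start the process there, so any output lands next to the input. A file with no extension can't be staged this way, because the directory name would collide with the file itself.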
We ran into a few bugs getting the whole thing working: first, we found that Beast wasn't getting launched with enough memory, so we had to modify the flags it was getting launched with. Then we had to get it out to all the machines in the intro labs, and we found a few machines that hadn't gotten updated properly and had to fix those. Then we ran into a really baffling problem: we could only ssh into one of the four intro labs.
At first, when Rodger reported it, I didn't believe that the problem was happening. Then I tried it and found that it was in fact true: I could ssh into the one new lab, but not into the other three older labs. I alerted the technical staff and George and I walked through everything: Yes, the machines were configured properly. Yes, you could ssh into them from themselves. Yes, you could ssh into them from other servers. We narrowed down the problem to just the one server: it could ssh into one lab but not the others. We looked at the arp tables, checked the netmask, nothing. At the end of the day, we rebooted the server, but the problem persisted. Then George suggested checking the name resolution. "Right! Maybe it's the hosts file," I said. "It doesn't have a hosts file," George said. "Oh, yes it does," I said. And, in fact, the hosts file was ancient and had bogus entries for the intro lab machines that weren't working. Once we cleaned those out, it worked fine. Whew.
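In hindsight, a quick check of the hosts file would have saved us most of the afternoon. Something like the following sketch would do it: it parses hosts-file text and reports any entries that pin the names you care about. The function names are my own invention and the hostnames in the usage note are made up.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the list of addresses given for it."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            entries.setdefault(name.lower(), []).append(address)
    return entries


def overrides_for(text, hostnames):
    """Return only the hosts-file entries that pin one of the given hostnames."""
    entries = parse_hosts(text)
    return {h: entries[h.lower()] for h in hostnames if h.lower() in entries}
```

On the server you would feed it the contents of `/etc/hosts` and the names of the lab machines, for example `overrides_for(open("/etc/hosts").read(), ["lab2-01.example.edu"])`. Anything it returns is an address that takes the place of whatever DNS would have said, which is exactly what bit us.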
The next step in the system is to set up shared-key authentication for moving the files and an automated system to put the files across, poll for completion of the job, and then pull the results back. But even without that, it's already a lot more convenient than walking around with a flash drive.
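None of that next step exists yet, but here is roughly the shape I have in mind, sketched in Python. It assumes key-based ssh is already working, and it assumes the job drops a marker file named `DONE` into its directory when it finishes, which is a convention I'm making up here. The function name, the paths, and the marker are all hypothetical.

```python
import subprocess
import time

SSH_OPTS = ["-o", "BatchMode=yes"]  # fail instead of prompting for a password


def push_poll_pull(host, local_file, remote_inbox, remote_job_dir, local_results,
                   interval=60, run=subprocess.run, sleep=time.sleep):
    """Copy a job to a workstation, wait for its DONE marker, then fetch the results."""
    # Put the input file into the folder the workstation is watching.
    run(["scp", *SSH_OPTS, local_file, f"{host}:{remote_inbox}/"], check=True)
    # Poll until the (assumed) marker file appears in the job directory.
    marker = f"{remote_job_dir}/DONE"
    while run(["ssh", *SSH_OPTS, host, "test", "-e", marker]).returncode != 0:
        sleep(interval)
    # Pull the whole job directory back.
    run(["scp", "-r", *SSH_OPTS, f"{host}:{remote_job_dir}", local_results], check=True)
```

One caveat with this sketch: the polling loop can't tell "not finished yet" from "can't reach the machine", so a real version would want a timeout or a limit on the number of tries.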
- Steven D. Brewer's blog