Unix Programming

For a couple of years, we've had a web-based system that I wrote that lets people authenticate against their University account and then set a password on the local authentication systems we have in Biology, Microbiology, Geosciences, Chemistry, and Biochemistry. It's been a godsend for end users, who can reset multiple passwords at the same time without having to know which of the 10 or 12 password stores is the one that matters right now. I used to spend hours helping people figure out which particular password wasn't what they thought it was and then resetting all of the others to match. Now they can do it themselves.

The system has been based on Pubcookie, but the University has been migrating away from Pubcookie to Shibboleth. I'd been trying to get Shibboleth built for months, but updating MySQL, Apache, and PHP was a higher priority, so we worked on that first. Finally, a couple of weeks ago, I was able to dive in and start working on Shibboleth.

They warn you that compiling Shibboleth on Solaris can be rough. There are a bunch of prerequisites (curl, openssl, boost, xerces, opensaml, xml-security, xmltooling), some of which have prerequisites of their own. First, I discovered that curl had been compiled against LDAP, which was compiled against libsasl, and that was causing some incompatibilities. I recompiled curl without LDAP. Then, while compiling xerces, I had problems linking with libiconv, so I had to munge LDFLAGS to add -liconv later in the search path. Eventually I got Shibboleth installed, configured it, started the daemon, and tried to connect. I was redirected and authenticated, but when the daemon tried to handle the redirect back, it failed: it had dumped core.

I checked and double-checked everything. We weren't getting any error messages in the logs. I loaded the core file into a debugger and tried to interpret what it was saying. Eventually, I posted to the shibboleth-users mailing list and got a curt reply from the main developer saying that the backtrace I'd provided wasn't enough to determine where the problem was and that the application would need to be "stepped".

While investigating, without much success, how to step the application, I found that I could start the daemon without dissociating it from the controlling terminal. When I did that and tried to connect, I got an error message saying the problem was in the xerces transcoding library, which brought me back to our old friend libiconv. I'd already had indications that there were problems with it, so I looked into it some more. I found that xerces could use either GNU or Solaris iconv and was choosing GNU. Our GNU package was a third-party build, and I suspected it might be linked against different, incompatible libraries. So I selected the Solaris iconv, recompiled xerces and all of the libraries that depend on it, tried again, and it worked!
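If you want to check which iconv a library actually picked up, one quick hint is its list of dynamic dependencies. Below is a minimal sketch of a hypothetical helper (Python 3.9 or later; the script name and the library path you pass are assumptions, and nothing here comes from the Shibboleth or xerces tooling). It runs `ldd` on each shared library given and reports any separate iconv library it links against. It assumes GNU libiconv shows up as its own `libiconv.so.*` entry, while the native Solaris iconv lives in libc and so won't appear separately. A statically linked iconv won't show up either way, so treat an empty result as a hint, not proof.

```python
import subprocess
import sys


def iconv_deps(ldd_output: str) -> list[tuple[str, str]]:
    """Return (soname, resolved path) pairs for iconv-related entries in ldd output."""
    deps = []
    for line in ldd_output.splitlines():
        # Dependency lines look like "libfoo.so.1 => /path/to/libfoo.so.1".
        if "=>" not in line:
            continue
        name, _, target = line.partition("=>")
        name = name.strip()
        if "iconv" not in name:
            continue
        # ldd marks unresolved libraries with "(file not found)" or "not found",
        # so only accept an absolute path as the resolved location.
        words = target.split()
        path = words[0] if words and words[0].startswith("/") else "not found"
        deps.append((name, path))
    return deps


def main(libraries: list[str]) -> None:
    if not libraries:
        print("usage: check_iconv.py LIBRARY...", file=sys.stderr)
        return
    for lib in libraries:
        result = subprocess.run(["ldd", lib], capture_output=True, text=True, check=True)
        deps = iconv_deps(result.stdout)
        if not deps:
            # Assumption: no separate libiconv means iconv comes from libc
            # (as the native Solaris iconv does) or was linked statically.
            print(f"{lib}: no separate iconv library")
        for name, path in deps:
            print(f"{lib}: {name} -> {path}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

You would run it as `python3 check_iconv.py /path/to/libxerces-c.so` (the path depends on your install). The parsing lives in its own function so it can be checked against saved `ldd` output without touching the system.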

I drafted a brief report for shibboleth-users summarizing what I'd done. I explained that I had given up trying to step the daemon because "I only play a unix programmer on TV." One of the other folks on the list replied to my write-up with "Could have fooled me!" The lead developer also thanked me and asked for a bit more detail so that he could update the documentation and help people avoid the problem in the future. I'm always happy when I can find ways to pay my dues to the Free Software community.