I thought it would be worth writing this up quickly as a blog post, just so it’s documented, though I’m guessing the bug is common knowledge by now. The process of finding out the issue was (eventually) enlightening for me though, especially how far the initial problem was from the bug.
I was having problems last week deploying changes to one of our projects hosted on Heroku. I’d done a full run-down of dependencies trying to bring in security and bug fixes, making everything Python 2.7 and Ubuntu Trusty compatible (or, for Heroku, cedar-14 stack compatible). Everything worked fine locally, even using
foreman (which is the Heroku tool that runs your code as if it was deployed on Heroku—in this case running through gunicorn with gevent). However, on deploying to a clean app and database on Heroku, the Persona Single-Sign-On authentication wasn’t working. The project’s settings are slightly involved, but the fact the admin site was working and I was getting a login page at all indicated that things were probably okay on the Django side of things. Persona itself worked fine locally, as well as on stage and production deployments on Heroku. I suspected DNS issues, but this turned out not to be the case—site domains and urls were being resolved correctly.
The only difference I could see between my new app and the stage deployment (that was working) was the database version (I’d deliberately matched the PostgreSQL version on my new app to that used in production, while staging was a point release ahead, for some reason) and the build stack that Heroku was using (part of what I was doing was testing the cedar-14 stack, which is based on trusty and supports Python 2.7.9). The customer was keen on having a test instance, so I decided to deploy on staging instead, but with an updated stack. I provisioned a new database to match production (and with production data), and used the cedar-14 stack, and everything worked fine (well apart from the bugs in my pre-Christmas development changes, but that was why I needed a test deployment for them to look at).
So there was something iffy in the test Heroku deployment I had. Off I set trying to debug a live Heroku deployment. Now after this experience, I must say I am in the market for decent logging/debugging tools for Heroku, so any suggestions are welcome. Papertrail, I found to be almost useless in this scenario, so was reduced to using django logging and running “
heroku logs --app <app_name>” to see the output. So basically,
Being unable to directly inject debug logging into a 3rd party library on Heroku, I decided to call the deeper API directly from my own code, and immediately I got a 500 error, and a stack trace – the definition of some internal SSL calls had changed in Python 2.7.9, and gevent was relying on them: reported here http://bugs.python.org/issue22438. So technically not a Python bug, but the app blew up querying SSL URLs when running in Gunicorn + gevent. This was the cause of my authentication issue, as even though this project doesn’t currently run over SSL, Persona verification does. This was not initially causing 500 errors, as although it raised a
TypeError in an authentication backend, Django was then just falling back to the default authentication, failing and returning no valid user.
So, bug found, and my current work-around is to stick to a Python 2.7.8 runtime (technically unsupported by Heroku), until either gevent or Python is updated. But why did I not see the issue on staging or locally using
foreman? The Heroku stack on staging turned out to be using Python 2.7.4, and locally my virtualenv was running Python 2.7.3.
A few lessons to take away, I suppose:
- Make sure you include the Python runtime in your list of dependencies to check and versions to match on Development vs. Staging vs. Production.
- Sometimes underlying components silently riding over errors can mask the true source of odd behaviour (I really should have twigged about the authentication backends, but also custom backends should deal with
TypeErrorsfrom internal calls properly).
- Logging is super-useful, I should probably use it more, and in a smarter way.
- I would love a decent debugging tool for Heroku.
- Don’t rely on underscore methods in Python internal libraries (stares daggers at the gevent developers).
If you weren’t aware of this bug, watch out for it!
Update: According to comments on the gevent issue on github, this might be an issue on Amazon even with Python 2.7.8 as they have backported the SSL code from 2.7.9 to their 2.7 runtimes on AWS. This emphasises the need to be aware of what Python runtimes are being deployed on cloud services, where debugging may be more difficult.
About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.