It looks like the Obama AMA has really caused some serious fires in reddit’s backend infrastructure. The site’s been down for the last 10 minutes.
I would’ve thought that they’d have brought in some additional computing power for such an event; it should’ve been easy for them considering they have a cloud deployment. Maybe this gives them greater reason to hire more engineers. I found it impressive that they served billions of impressions with just 2 engineers a short while ago.
Also, it says a lot about the “Come Cloud with us, we’ll help you scale” marketing bandwagon. We’ve seen issues with EC2’s infrastructure time and again, and if EC2 doesn’t have issues right now (http://status.aws.amazon.com/), then it’s just sad that they can’t order a gazillion instances for this event and have it scale easily.
Definitely makes me think that we still have a long way to go to compute in a truly ‘elastic’ way.
reddit definitely does have some crazy infrastructure in place, but this would’ve been one of the most important moments in reddit history (so far), and I’m sad to see that their engineers are probably going to get blamed for this.
EDIT: Okay, they’re back in read-only mode. I wonder how they’ll hack in some write access for the AMA while keeping everything else read-only. Time for some app server redeployments! Funsies!
EDIT2: And they’re gone again. Sigh.
No, it’s probably due to the massive number of reads/writes pumping through their system. I’m pretty sure that their database isn’t able to handle what’s going on at the moment.
Also, if you were to use Varnish as a cache, that wouldn’t help you much with write access because every time there is a write (a new post..) you’d have to invalidate the cache. In a high volume scenario such as this (tons of posts coming in..), the page is as good as dynamic, even if you DO cache it for a bit.
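To put a rough number on the invalidation point, here's a toy sketch in Python. It is not how Varnish actually works internally, and the names (`PageCache`, `hit_ratio`) are made up for illustration. It assumes a single cached page where every write throws away the cached copy, and just counts how many reads the cache manages to absorb.

```python
class PageCache:
    """Toy single-page cache (hypothetical, not Varnish): any write invalidates the copy."""

    def __init__(self):
        self.version = 0    # stands in for the thread's current contents
        self.cached = None  # cached copy of the page, or None if invalidated
        self.hits = 0
        self.misses = 0

    def read(self):
        if self.cached is None:
            # Cache miss: this request would fall through to the app server / database.
            self.misses += 1
            self.cached = self.version
        else:
            self.hits += 1
        return self.cached

    def write(self):
        # A new post changes the page, so the cached copy has to go.
        self.version += 1
        self.cached = None


def hit_ratio(reads_per_write, total_writes=1000):
    """Fraction of reads served from cache when each write is followed by N reads."""
    cache = PageCache()
    for _ in range(total_writes):
        cache.write()
        for _ in range(reads_per_write):
            cache.read()
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, "reads per write ->", hit_ratio(n))
```

In this model the hit ratio works out to (N - 1) / N for N reads per write, so the cache only earns its keep when reads heavily outnumber writes. Either way, every single write still forces at least one trip to the backend, and that's the part that hurts when posts are pouring in.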
reddit’s backend could be written in assembly hand-tuned by God Himself, and having half the world hit it at once would still cause bottlenecks at the database.