Phil Windley writes:
Dave Sifry gives some details about the Technorati outage this past weekend. Seems an electrical fire in their co-lo data center was the culprit. Running a 24/7 Web application reliably isn’t easy and it isn’t cheap. It took us several years of problems and study to hit on a solution at iMALL. We finally did figure it out and that was a real lightening of my load. One of the answers is a product engineer: an engineer on the operations side whose job it is to make the product (not just the server) work. Properly incented, a product engineer will drive all of the emergency and contingency planning, along with ensuring that engineering delivers a system that can be reliably operated.
This just serves as a reminder that it’s still hard (and costly) to run a web service reliably.