October 1, 2004

Running Reliable Services Is Hard

Service Oriented Architectures By: ams

Phil Windley writes:

Dave Sifry gives some details about the Technorati outage this past weekend. Seems an electrical fire in the data center where their co-lo is located was the culprit. Running a 24/7 Web application reliably isn’t easy and it isn’t cheap. It took us several years of problems and study to hit on a solution at iMALL. We finally did figure it out and that was a real lightening of my load. One of the answers is product engineers, an engineer on the operations side whose job it is to make the product (not just the server) work. Properly incented, a product engineer will drive all of the emergency and contingency planning, along with ensuring that engineering delivers a system that can be reliably operated.

This just serves as a reminder that it’s still hard (and costly) to run a web service reliably.
