Recently at work, i had to remind some folks of something important: In order to provide the best service possible, you should build things that can fail.
At work, we're in the process of moving a lot of things from big iron systems to a more distributed architecture. What's generally counter-intuitive when doing something like that is that things break a lot more. Machines and services go up and down all the time. The absolute worst thing one can do in that situation is to take a boolean "All or Nothing" approach. If you do that, your customers may wind up a good deal like the passengers aboard a certain spaceship awaiting packets of lemon soaked paper napkins. In spite of what those of us with OCD insist on telling ourselves, ours is an imperfect world, and the sooner we adjust to that fact, the better.
As an example, right now i'm prevented from fixing a bug because a system relied upon by a system relied upon by a system i rely upon is down for unfathomable reasons. Said system provides a single element of data that, while useful, isn't really critical. The data could be zeroed (so that folks that are looking for it don't break) or faked (since there are no dependency issues) with few the wiser.
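A minimal sketch of that "zero it out" idea, in Python. The names here (fetch_or_default, flaky_extra_field) are hypothetical and stand in for whatever call reaches the non-critical system; the assumption is that any failure of that call is survivable and a zeroed value is acceptable to downstream readers.

```python
def fetch_or_default(fetch, default):
    """Call fetch(); if it raises, hand back the default instead of failing."""
    try:
        return fetch()
    except Exception:
        # Assumption: this data is non-critical, so any failure is survivable.
        return default


def flaky_extra_field():
    # Hypothetical stand-in for the downstream system that is currently down.
    raise ConnectionError("upstream unavailable")


# The caller still gets a value of the expected type, so nothing breaks.
extra = fetch_or_default(flaky_extra_field, default=0)
```

The caller never sees the outage. In a real system you'd probably also want to log the failure so somebody knows the value was faked.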
Chances are, if you really reduce the set of data you absolutely have to send (and i don't mean "You absolutely need to send the user's background graphic, otherwise the world will end!"; i'm talking about the absolute smallest data set you can send and have the site be tolerably functional), you'll be amazed by how little data you really need. Heck, look at Twitter as an example. While they can provide a huge pile o' data, ultimately the smallest set is the message, who sent it, and when.
Why pare down to the base essentials? Because it's easier to focus on three or four items and make sure that set of data is critically available. The nice thing is that once you've done that, it's fairly easy to tackle the next set of less critical data, knowing that should that set fail, you're not off the air. When you build things to fail, you're ensuring that your stuff will work as well as possible. That makes your system more robust and reliable, which means that services that rely on you are more robust and reliable too.
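Here's a rough sketch of that layering, using the message / sender / time set from the Twitter example as the critical core. Everything else in it is an assumption for illustration: the function name build_message, the field names, the "degraded" list, and the two optional sources are all made up, not anybody's real API.

```python
REQUIRED_FIELDS = ("text", "sender", "sent_at")


def build_message(core, optional_sources):
    """Build a response from a critical core plus best-effort extras."""
    missing = [name for name in REQUIRED_FIELDS if name not in core]
    if missing:
        # The core set is the only thing allowed to take the response down.
        raise ValueError("missing critical fields: " + ", ".join(missing))

    response = {name: core[name] for name in REQUIRED_FIELDS}
    degraded = []
    for name, source in optional_sources.items():
        try:
            response[name] = source()
        except Exception:
            # An extra failed: note it and carry on without it.
            degraded.append(name)
    response["degraded"] = degraded
    return response


def broken_background():
    # Hypothetical optional source that happens to be down.
    raise TimeoutError("background service timed out")


message = build_message(
    {"text": "hello", "sender": "alice", "sent_at": "2009-01-01T00:00:00Z"},
    {"avatar": lambda: "alice.png", "background": broken_background},
)
```

Here the message still goes out with its avatar, the background graphic is simply absent, and the "degraded" list records what was skipped.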
And thus, this is how you get to Yahoo/Google levels of reliable services, because both of those folks absolutely build things to fail.

