Two failure modes here: the Request of Death, and cascading failures. If a request kills a particular server, you should let the error flow upstream; otherwise it will just bounce from server to server until it's killed all of them.
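To make the Request of Death point concrete, here is a minimal sketch of a dispatcher that caps how many servers a single request may take down before the error is passed upstream. The names (`Server`, `ServerCrashed`, `dispatch`, `max_attempts`) and the "poison" request are all made up for illustration; a real load balancer would detect crashes very differently.

```python
class ServerCrashed(Exception):
    """Hypothetical error: a backend died while handling a request."""


class Server:
    """Toy backend that dies whenever it sees the request "poison"."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if request == "poison":
            self.alive = False
            raise ServerCrashed(self.name)
        return f"{self.name} handled {request}"


def dispatch(request, servers, max_attempts=1):
    """Try the request on at most max_attempts servers, then give up.

    With max_attempts=1 a crashing request is never retried elsewhere,
    so it can kill one server at most before the caller sees the error.
    """
    crashes = 0
    for server in servers[:max_attempts]:
        try:
            return server.handle(request)
        except ServerCrashed:
            crashes += 1
    raise ServerCrashed(f"request crashed {crashes} server(s); giving up")
```

Calling `dispatch("poison", servers)` with the default costs you one server. Setting `max_attempts=len(servers)` reproduces the bounce-until-everything-is-dead behaviour.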

For the latter, someone related a real-world example of this to me the other day. Say you have a bunch of people managing customers. Every employee has 4 customers, and those take up all of their time.

You get a new customer. Instead of hiring a new rep, you give someone a 5th customer to manage. They struggle, and eventually they quit. Now, all of your employees have 5 customers. Sooner or later one of those will also quit, and then it's a race to see who can get out the door fastest.
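The same story as a toy simulation, in case the arithmetic helps. This is only a sketch under simplifying assumptions of my own: customers are split evenly, exactly one overloaded rep quits per round, and nobody is ever hired. The function name and parameters are invented for the example.

```python
import math


def simulate_attrition(reps, customers, capacity):
    """Return a list of (reps_remaining, busiest_rep_load), one per round.

    Each round the customers are spread evenly over the remaining reps.
    If the busiest rep is over capacity, one rep quits and the customers
    are redistributed among whoever is left.
    """
    history = []
    while reps > 0:
        load = math.ceil(customers / reps)  # busiest rep under an even split
        history.append((reps, load))
        if load <= capacity:
            break  # everyone can cope, so the system is stable
        reps -= 1  # an overloaded rep quits
    return history
```

With 10 reps, a capacity of 4 each, and 40 customers, the simulation stops after one round. Add a 41st customer and it runs until a single rep is left holding all 41.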

The moral of that story is that all the load balancing in the world is for naught if you haven't done your capacity planning properly. And once the system starts to buckle, it may be too late to bring new capacity online, since a server that is starting up usually consumes more resources than one that is already running.