normal accidents: early excerpt

``for some systems that have this kind of complexity, such as universities or research and development labs, the accident will not spread and be serious because there is a lot of slack available, and time to spare, and other ways to get things done.

[kc: the above was written in the mid 80s and doesn't describe many universities today. nothing on the Internet can safely assume `time to spare'.]

but suppose the system is also 'tightly coupled', that is, processes happen very fast and can't be turned off, the failed parts cannot be isolated from other parts, or there is no other way to keep the production going safely. then recovery from the initial disturbance is not possible; it will spread quickly and irretrievably for at least some time. indeed, operator action or the safety systems may make it worse, since for a time it is not known what the problem really is.''
-- p.5 normal accidents

if that doesn't remind you of debugging BGP configs, it's because you haven't debugged any.

most BGP admins i know are more "used to" configuration than "understanding" it.

too many people are configuring BGP for us to accept any less transparency than we expect when driving a car.
ok, maybe a truck. but definitely not a space shuttle.
we abjectly lack a houston.