normal accidents: early excerpt 

    ``for some systems that have this kind of complexity, such as 
      universities or research and development labs, the accident 
      will not spread and be serious because there is a lot of slack 
      available, and time to spare, and other ways to get things done.  
    [kc: the above was written in the mid-80s and doesn't describe many universities today.
     nothing on the Internet can safely assume `time to spare'.]
      but suppose the system is also `tightly coupled', that is, 
      processes happen very fast and can't be turned off, 
      the failed parts cannot be isolated from other parts, or
      there is no other way to keep the production going safely.
      then recovery from the initial disturbance is not possible;
      it will spread quickly and irretrievably for at least some time.  
      indeed, operator action or the safety systems may make it worse, 
      since for a time it is not known what the problem really is.'' 
                                                        -- p.5 normal accidents

if that doesn't remind you of debugging BGP configs, it's because you haven't.
most BGP admins i know are more "used to" their configurations than "understanding" of them.
too many people are configuring BGP for us to accept any less transparency
than we expect when driving a car.
ok maybe a truck.  but definitely not a space shuttle.  we abjectly lack a houston.