There are two common approaches to thinking about safety risks from ML, which I'll call the Engineering approach and the Philosophy approach:
- The Engineering approach tends to be empirically driven, drawing on experience with existing or past ML systems and looking at issues that either (1) are already major problems, or (2) are minor problems but can be expected to get worse in the future. Engineering tends to be bottom-up, and to be both in touch with and anchored to current state-of-the-art systems.
- The Philosophy approach tends to think more about the limit of very advanced systems. It is willing to entertain thought experiments that would be implausible with current state-of-the-art systems and is open to considering abstractions without knowing many details.
While Engineering and Philosophy agree on some things, for the most part, they make wildly different predictions about what the key safety risks from ML will be and how we should address them:
- Engineering and Philosophy agree on some high-level points: both hold that misaligned objectives are an important problem with ML systems, and one that is likely to get worse. Philosophy, however, is more confident that misaligned objectives are a big problem and thinks they could pose an existential threat to humanity if not addressed.
- Engineering and Philosophy would both agree that out-of-distribution robustness is an important issue. Philosophy is more worried about whether systems can generalize from settings where humans can provide data to settings where they cannot do so even in principle.
- Engineering tends to focus on tasks where current ML systems don't work well, weighted by their impact and representativeness. Philosophy focuses on tasks that have a certain abstract property that seems important, such as imitative deception.
In my experience, people who strongly subscribe to the Engineering worldview tend to think of Philosophy as fundamentally confused and ungrounded, while those who strongly subscribe to Philosophy think of most Engineering work as misguided, or at best orthogonal, to the long-term safety of ML.