Intro to AI Safety

“We will, sooner or later, build an artificial agent with general intelligence”.

Artificial Agent → Agents have goals and choose actions to further those goals.

Intelligence → Is the thing that lets an agent choose effective actions.

General Intelligence → Generality is the ability to behave intelligently in a wide range of domains.

What this text says is that in a system as large as real life, if an objective function is designed with a subset of the total objectives, the intelligence will try to maximize its subset regardless of how much it harms other objectives.

It’s also important to consider that an agent that wants to satisfy its objective function will know that it has a turn-off button so it will fight or deceive the person so it can complete its goal.

Actions that intelligent agents will do by default:

Self Preservation → Don’t turn me off
Goal Preservation → They want to achieve their goal
Resource Acquisition → More energy, more compute power
Self Improvement → More efficient code

We might get one shot to make an AGI that will help us and won't make matters worse because of the expected actions it will take on its own.