AI alignment is a field that attempts to solve the problem of “how do you stop something with the ability to deceive, plan ahead, seek and maintain power, and parallelize itself from just doing that to everything?”
AI alignment is “the problem of building machines which faithfully try to do what we want them to do”. An AI is aligned if its actual goals (what it’s “trying to do”) are close enough to the goals intended by its programmers, its users, or humanity in general. Otherwise, it’s misaligned. The concept matters because many goals are easy to state in human language but difficult to specify in terms a computer can act on. As a current example, a self-driving car might have the human-language goal “travel from point A to point B without crashing”. “Crashing” makes sense to a human, but has to be spelled out in significant detail for a computer. “Touching an object” won’t work, because the ground and any passengers are objects. “Damaging the vehicle” won’t work, because ordinary driving causes a small amount of wear and tear. All of these things must be carefully defined for the AI, and the closer those definitions come to the human understanding of “crash”, the better the AI is “aligned” with the goal “don’t crash”. Even if you do all of that successfully, the resulting AI may still be misaligned, because nothing in the human-language goal mentions roads or traffic laws. Pushing this analogy to the extreme case of an artificial general intelligence (AGI), asking a powerful unaligned AGI to, for example, “eradicate cancer” could result in the solution “kill all humans”. With a self-driving car, if the first iteration makes mistakes, we can correct it; with an AGI, the first unaligned deployment might itself be an existential risk.
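To make the “crash” example concrete, here is a minimal Python sketch of the three definitions discussed above. Everything in it is hypothetical and invented for illustration: the `Contact` record, the function names, the damage units, and the `wear_tolerance` threshold do not come from any real self-driving system. It only shows how each attempt at a definition misclassifies some ordinary situation.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """A hypothetical contact event reported by the car's sensors."""
    other: str     # what the car is touching, e.g. "road", "passenger", "wall"
    damage: float  # estimated damage to the vehicle, in made-up units


def crash_if_touching(event: Contact) -> bool:
    # Naive definition 1: any contact with any object counts as a crash.
    return True


def crash_if_damaged(event: Contact) -> bool:
    # Naive definition 2: any damage to the vehicle counts as a crash.
    return event.damage > 0


def crash_refined(event: Contact, wear_tolerance: float = 0.01) -> bool:
    # Refined attempt (still an assumption): ignore expected contacts
    # and damage below an arbitrary wear-and-tear threshold.
    expected = {"road", "passenger"}
    return event.other not in expected and event.damage > wear_tolerance


normal_driving = Contact(other="road", damage=0.001)
hit_wall = Contact(other="wall", damage=5.0)
hit_pedestrian = Contact(other="pedestrian", damage=0.0)

# Both naive definitions flag ordinary driving as a crash.
print(crash_if_touching(normal_driving), crash_if_damaged(normal_driving))

# The refined definition handles driving and walls correctly...
print(crash_refined(normal_driving), crash_refined(hit_wall))

# ...but misses a collision that does not damage the vehicle.
print(crash_refined(hit_pedestrian))
```

The refined version fixes the two failures named above and still gets the pedestrian case wrong, which is the point of the example: each patch brings the definition closer to what a human means by “crash” without guaranteeing it matches.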
Yes, because AI assistants are going to get too good not to use. And they are going to become far more powerful once they can see and hear everything around you.