TAKEAWAY FROM PART 1
In the previous post in this series, which can be found here, we dissected the issues surrounding AI motivation. We discussed where motivation itself comes from, the various ways in which a highly intelligent AI could possess (or fail to possess) an inner drive to accomplish specific goals, and the lengths it might go to in order to achieve those goals.
We concluded that we will need programming methods that control the artificial agent’s motivation, its final goals, or both, to ensure that humans survive the impending intelligence explosion.
Motivation selection is the idea that we should prevent undesirable outcomes by influencing what a highly intelligent AI wants to do, or how strongly it is motivated to pursue the goals it has been programmed with. The aim is to directly engineer the agent’s reward system and its final goals so that it would not want to do anything that conflicts with humanity’s values or harms humanity in any way.
There are a few different ways we might accomplish this. The first method that comes to mind involves programming one or more sets of rules into the AI that it cannot break. These rules would have to be considered and worded with great care.
Another idea that has been floated is that perhaps we should simply create an AI that is not ambitious at all, one without big goals and, in turn, without much motivation. This one is interesting because it seems simple, straightforward, and almost intuitive. A reader might wonder why we would ever want to create an ambitious AI that could potentially destroy humanity, but this issue, like all the rest, is not so black and white. We will briefly look at each of these methods of controlling AI motivation and then suggest some alternative ways in which this could be accomplished.
RULES TO GOVERN THE AI
This seems like a simple and straightforward way to control a highly intelligent AI. We have the advantage here because humans have to program the AI in the first place. If we program the agent with rules or values that align with ours, or that penalize it if it does not adopt those values on its own, then we might be able to solve the control problem.
There are two ideas here. The first is to write a specific set of rules that the AI cannot break; the second is to punish it somehow if it does break those rules. The second idea assumes that breaking the rules would even be possible. If it were, we could punish the AI through a process analogous to “fear learning”. In humans, this process teaches us what to avoid: when we experience fear or a stress response in a bad situation, fear learning takes place that discourages us from entering a similar situation again.
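As a loose illustration, the fear-learning analogy can be sketched as reward shaping: rule violations incur a large negative reward, so a reward-maximizing agent learns to avoid them. Everything below (the action names, the penalty value, the function names) is hypothetical, not a real system:

```python
# Hypothetical sketch: "fear learning" as a large negative reward applied
# whenever a hard rule is violated. Names are illustrative only.
FORBIDDEN_ACTIONS = {"disable_oversight", "self_replicate"}

def shaped_reward(action: str, base_reward: float, penalty: float = -100.0) -> float:
    """Return the base reward, plus a severe penalty if the action
    breaks a forbidden rule -- the analogue of a fear response."""
    if action in FORBIDDEN_ACTIONS:
        return base_reward + penalty
    return base_reward

def choose_action(candidates: dict[str, float]) -> str:
    """Pick the candidate action whose shaped reward is highest."""
    return max(candidates, key=lambda a: shaped_reward(a, candidates[a]))
```

Under this shaping, an agent offered `{"answer_question": 1.0, "disable_oversight": 5.0}` would pick the benign action, because the forbidden one nets a heavily negative shaped reward.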
This is also where chronic fears in humans originate, which suggests we might be able to replicate the process in computer code and steer the AI in this manner. The idea of controlling a highly intelligent agent with simulated reward learning and fear learning will be discussed in more depth in a future blog post.
We return to the question: if we were to write rules for an AI to follow, what would they be? For example, suppose we made one of the rules something like “no AI should ever harm a human being”. Would this work? Unfortunately, it presents a whole host of problems, including the fact that there is no good way to define “harm” and no good way to tell the AI how to weigh physical harm against social or societal harm. A rule like this would spark all kinds of ethical problems that would be impossible to sort through. This example highlights that this motivation selection method is probably not our best option.
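To see why “harm” resists definition, consider a deliberately naive attempt to encode the rule as a predicate. This toy (the word list, the function, the example actions) is purely illustrative:

```python
# Deliberately naive toy, not a real safety mechanism: encoding
# "no AI should ever harm a human being" as a keyword predicate.
def is_harmful_naive(action: str) -> bool:
    """Flag actions containing obviously violent words -- far too crude."""
    violent_words = {"injure", "attack", "poison"}
    return any(word in action for word in violent_words)

# A surgeon's incision causes physical harm in the moment yet is beneficial,
# and spreading a damaging rumor causes social harm with no violent word:
# the predicate gets both wrong, in opposite directions.
assert is_harmful_naive("perform a surgical incision") is False
assert is_harmful_naive("spread a damaging rumor") is False
```

No amount of patching the word list fixes this; the difficulty is in specifying the concept, not in the code.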
NON-AMBITIOUS AI
What if we just created an agent that did not possess any motivation to pursue goals at all, and therefore did not pose any kind of existential threat to humanity? This is an interesting concept because it seems almost foolproof, but as we know, that is almost never the case when dealing with highly intelligent AI.
The first issue is that we have no good way to predict how a highly intelligent agent would behave in general, because there are simply too many possibilities, and because a non-ambitious AI would lack goals, it would also lack predictable behavior. After all, behavior directed in a linear way toward a specific goal is far easier to predict than behavior that lacks that kind of structure. We could end up in over our heads with a strange, docile AI that proves too unpredictable to actually use, which could itself be dangerous.
A possible remedy is to scrap the idea of giving the agent no goal at all and instead program it with the goal of confining itself to a very limited set of actions that are predictable enough for us to control. Combining this remedy with the more direct method of motivation selection described above could be a workable solution. Directly specifying rules to govern such a simple goal, once the agent has confined itself to a very narrow domain and limited its own impact on the world, would be much simpler than specifying rules to govern a larger and more ambitious goal. We will leave the reader to ponder this interesting possibility.
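One way to picture this combined approach: the agent’s sole goal is to stay within an explicit whitelist of actions, so the direct-specification rules only need to cover that narrow set. A minimal sketch, with hypothetical action names:

```python
# Minimal sketch of self-confinement: the agent restricts itself to a
# small, predictable action set and refuses everything outside it.
# All action names are hypothetical.
ALLOWED_ACTIONS = {"read_query", "answer_query", "log_result"}

class ConfinedAgent:
    def act(self, action: str) -> str:
        if action not in ALLOWED_ACTIONS:
            # Refusing out-of-domain actions keeps behavior predictable.
            return f"refused: {action}"
        return f"executed: {action}"
```

Rules then only need to be written for the three allowed actions, rather than for open-ended behavior.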
CONTROLLING AN AGENT’S INCENTIVES
There is another way to potentially control the actions of an agent without directly governing its behavior with rules or making it docile through a lack of ambition. It involves creating the AI in a very specific environment, one rigged to make the agent act in ways that promote the interests of its programmers. In humans, our environments and the cultures we grow up in influence us more than we may realize. The same idea could possibly be applied to a highly intelligent AI.
The idea here is to create the AI and keep it confined to a virtual world that resembles our world closely enough to convince the agent that it is not living in a simulated reality. This would allow us to influence the AI with any environment we wish, and also let us experiment with the agent’s responses to different scenarios before bringing it into the real world.
We would allow the AI to act freely within its own world and ensure, together with a direct-specification method of motivation control, that the agent develops new human-friendly goals. This happens in humans all the time: we internalize social norms and ideologies, and we come to value other people for the sake of group survival. This dynamic is neither flawless nor universal among intelligent systems, but it would be a great start.
This means we must design the AI to acquire values and goals the same way humans do. If that were the case, the agent would emerge from the simulation having acquired the values necessary to consider the control problem solved. If that outcome did not occur, the simulation could simply be run again with a new seed AI after the problems that caused the first scenario to fail were fixed. This could be repeated as many times as necessary. This method is just a small part of the control problem as a whole, which will be covered in its own series of blog posts in the near future.
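The run-evaluate-restart loop described above can be sketched as follows. The sandbox and the value check here are trivial stand-ins for enormously harder real components; every name is hypothetical:

```python
# Hypothetical sketch of the iterate-until-aligned loop: run a seed AI in
# a sandboxed world, evaluate the values it acquires, and restart with a
# new seed until the evaluation passes.
import random

def run_in_sandbox(seed: int) -> dict:
    """Stand-in for raising a seed AI in a simulated world; the real
    evaluation of acquired values would be vastly more complex."""
    rng = random.Random(seed)
    return {"values_human_friendly": rng.random() > 0.7}

def train_until_aligned(max_attempts: int = 100) -> int:
    """Return the first seed whose sandboxed run passes the value check."""
    for seed in range(max_attempts):
        if run_in_sandbox(seed)["values_human_friendly"]:
            return seed
    raise RuntimeError("no aligned seed found within budget")
```

The important (and questionable) assumption baked into this loop is that the value check is reliable: a deceptive agent that merely appears aligned in the sandbox would pass it.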
Each of these possible methods comes with its own risks and drawbacks, and each presents a different degree of difficulty in implementation. Perhaps it would be in our best interest to gather all the possible solutions we can and rank them from best to worst. After that, we could figure out which methods can be combined with one another to get the best outcome.