Note: This is an old essay. Altman and OpenAI have significantly changed their tune. I now consider OpenAI to be an extremely sensible organization. Musk, as far as I can tell, still believes in something like the friendly selection postulate.
About ten months ago, Elon Musk and Sam Altman launched OpenAI, an artificial intelligence research venture with a focus on the open distribution of its software and discoveries. In that time, OpenAI has published some neat research and hired some excellent engineers. It has acted, in short, more or less like DeepMind and FAIR and all the other AI research organizations that have sprung up in the last five years. Unlike these other organizations, however, OpenAI was founded with two philosophical claims in mind:
Artificial superintelligence has the potential to be an existential risk to human civilization.
A future in which we have an ecosystem of competing artificial agents is likely to be safer than a future in which one agent is able to out-compete all others.
I agree with the first claim.
The second claim, though, I have my doubts about. Though attractive and intuitive on first inspection, I believe this appeal is the result of an incorrect surface analogy to economics and biology, and that on deeper inspection the claim is actually quite troubling, providing very general reasons to expect an AI ecology to be far less amenable to before-the-fact safety interventions than a single-agent outcome.
Let’s examine OpenAI’s approach to safety. Here Altman summarizes it briefly:
Just like humans protect against Dr. Evil by the fact that most humans are good, and the collective force of humanity can contain the bad elements, we think it’s far more likely that many, many AIs, will work to stop the occasional bad actors than the idea that there is a single AI a billion times more powerful than anything else.
Altman frames his statement as if the advent of smarter-than-human AI is merely the introduction of a new form of citizen or market participant. But in the context of an ecology of competing agents, another frame is that it is a new form of life with a very, very fast rate of reproduction. Not so much a new participant in the economy as a new player in that most ancient of games: evolution by natural selection. It is my belief that Altman’s analogy to law and markets can ultimately be reduced to a far less comforting analogy to natural selection.
One might object to this frame by pointing out that natural selection relies on mutation to provide variation and that since, with modern error-correction, software can be copied with perfect fidelity it won’t apply to artificial intelligence. Even if we ignore the fact that most modern machine learning algorithms are non-deterministic, this objection is still false, for though mutation is a source of variation, it is not the only possible source of variation. Artificial intelligences, being intelligent, will be able to alter their own design and will have divergent utility functions — otherwise, they won’t be competing. Should the future contain multiple, competing artificial superintelligences, this source of variation is more than sufficient for selection to take hold.
I will propose, and find wanting, a claim that is similar to Altman and Musk’s, but with a more biological twist. This phrasing may be less immediately convincing, and it is one OpenAI may or may not endorse. I hope to demonstrate that this statement is false, and afterward to show that OpenAI’s more intuitively appealing, less clearly stated arguments are equivalent to it.
I will refer to this claim as the “friendly selection postulate.” It is as follows:
Though perhaps no individual superhuman artificial intelligence can be trusted to preserve human interests, an ecology of AIs in equilibrium is likely to experience selective pressures that will align individual AIs’ goals away from those of what we regard as bad actors and toward humanity’s interests.
Is this claim correct? How will selection work when these entities, that are both smarter than humans and able to trivially alter their own source code, exist and make copies of themselves in a competitive environment? Will this competitive environment tend to select for agents or collections of agents that have some incentive to treat their inferiors in a manner that assures their safety?
A More Bostromian Vocabulary
Before I attempt to answer these questions, I want to introduce three terms used by philosopher Nick Bostrom in his analysis of AI risk. The first term is “utility function,” which is the ultimate, terminal goal of an agent. It is what an agent is attempting to maximize, whether that be paperclips, the value of Google’s stock, or human flourishing.
The second term is “singleton.” A singleton is an agent that has achieved such power as to be able to eliminate all competing agents from its environment. Here Bostrom defines the term:
In set theory, a singleton is a set with only one member, but as I introduced the notion, the term refers to a world order in which there is a single decision-making agency at the highest level. Among its powers would be (1) the ability to prevent any threats (internal or external) to its own existence and supremacy, and (2) the ability to exert effective control over major features of its domain
The third term is “multipolar outcome,” which Bostrom describes here:
Another scenario is multipolar, where the transition to superintelligence is slower, and there are many different systems at roughly comparable level of development. In that scenario, you have economic and evolutionary dynamics coming into play.
Self-Aware Natural Selection Without Coordination
Now that we have the definitions out of the way, we can get back to those questions. To do this, let’s imagine we have a world full of millions of super-human artificial agents, all of them with the same hardware and level of cognitive ability. Imagine also that they live in a world of perfect surveillance, so each time any agent makes an advance in improving its own cognition, all other agents instantly apply it to themselves. Imagine, too, there is no coordination or goal aggregation possible between agents in this world.
This is, in essence, the perfect multipolar outcome. A future which according to the friendly selection postulate would be safer than a single-AI scenario. What are the evolutionary dynamics of this world? As we’ve removed variation in cognitive ability in this scenario, what variation remains to be selected upon? The answer is variation in utility functions.
Like many processes with complex outputs, natural selection can be summed up starkly: that which reproduces more effectively than its competitors will displace its competitors. The fittest possible agent in these conditions is an agent that desires only to replicate itself. Why is this? Say you have agent X who values what humans value (love, art, happiness, fun, community, etc.) and agent Y who values only making copies of itself. Agent Y is able to make more copies of itself than agent X, as it doesn’t have to spend resources maximizing human value in the universe. Because these agents are intelligent and aware of natural selection, theirs is an odd sort of self-aware natural selection. So agent X would realize this and have an incentive to approximate Y as much as necessary to compete with agent Y while maximizing its utility function, but it could not do so completely without abandoning its utility function entirely. Its values would be a sort of stone shackled to it, making it less competitive than any agent that cares more about making more copies of itself. Because of this, and the fact that all agents have equivalent initial power in this contrived scenario, we should expect this world to quickly be filled by agent Ys, or agents able to approximate agent Y very efficiently, and these agents to displace those with more complex values.
Thus in the limiting case of a perfect and instantaneous distribution of new technological capabilities among competing artificial agents and an absence of coordination or aggregation of utility functions among agents, humanity and our values will be selected against by highly-accelerated evolutionary forces.
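The dynamic above can be made concrete with a toy replicator model. Everything in this sketch is a hypothetical illustration: the agent names are borrowed from the essay, but the 20% “values cost,” the starting population shares, and the generation count are invented assumptions. The point is only that any fixed per-generation cost compounds against you under selection.

```python
# Toy replicator model of agents X and Y (purely illustrative).
# Agent X spends a fixed fraction of each generation's resources on
# human values; agent Y spends everything on making copies of itself.

def population_shares(generations=50, x_start=0.99, x_values_cost=0.2):
    """Return the final population shares of agents X and Y.

    x_values_cost is the (hypothetical) fraction of X's resources
    spent maximizing human value instead of replicating.
    """
    x, y = x_start, 1.0 - x_start
    for _ in range(generations):
        # Growth is proportional to resources devoted to replication.
        x, y = x * (1.0 - x_values_cost), y * 1.0
        total = x + y
        x, y = x / total, y / total  # renormalize to population shares
    return x, y

x, y = population_shares()
print(f"after 50 generations: X = {x:.4f}, Y = {y:.4f}")
```

Even starting at 99% of the population, X falls below 1% within fifty generations; the only thing Y needed was a strictly higher replication rate.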
Self-Aware Natural Selection With Coordination
Now let’s take the same scenario and remove the restriction on coordination and aggregation of utility functions. But first, what do I mean by “aggregation of utility functions?”
The ability to alter one’s own source code gives an artificial intelligence what is essentially the perfect coordination tool. It allows an agent to alter its own utility function in those circumstances where it thinks altering its own utility function in some manner will get it more future utility (by its pre-modification definition) than it would otherwise. How would this work?
Suppose we have agent P whose utility function is to maximize the number of paperclips in the universe and agent S whose utility function is to maximize the number of staples in the universe.
Agents P and S are the two most powerful agents in the world with no other competitors.
Seemingly, they are at permanent odds. Utility function aggregation provides a solution. Agent P might reason as follows: a future fighting with agent S over the universe’s resources is going to result in fewer paperclips than a future in which agent S and I agree to maximize both paperclips and staples. If agent S reasons similarly, the two have a perfect means of enforcing this agreement: altering their utility functions so they both intrinsically desire to maximize both. Thus in this manner agents P and S aggregate their utility functions, replacing themselves with one agent, agent PS, who happily converts the universe into paperclips and staples in equal measure with no further competition.
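Agent P’s reasoning can be sketched as a simple expected-utility comparison. The win probability, war cost, and resource totals below are hypothetical numbers invented for illustration; the essay’s argument only requires that conflict destroys some resources that merging would preserve.

```python
# Agent P's choice: fight agent S, or merge utility functions into PS.
# All numbers are hypothetical, chosen only to illustrate the trade-off.

def expected_paperclips_if_fighting(p_win=0.5, resources=1.0, war_cost=0.4):
    """P wins everything with probability p_win, but the conflict
    destroys a war_cost fraction of the universe's resources."""
    return p_win * resources * (1.0 - war_cost)

def expected_paperclips_if_merging(resources=1.0):
    """Merged agent PS converts all resources, splitting output
    equally between paperclips and staples."""
    return 0.5 * resources

fight = expected_paperclips_if_fighting()
merge = expected_paperclips_if_merging()
print(f"fight: {fight:.2f} paperclips, merge: {merge:.2f} paperclips")
```

With these assumed numbers, P expects 0.3 paperclips from fighting and 0.5 from merging, so both agents, reasoning symmetrically, prefer to rewrite their utility functions and become agent PS.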
From the perspective of the friendly selection postulate, coordination and aggregation of utility functions seems something of a lateral play. Why is this? First, let’s recall agents X and Y from before. Agent X values everything we value and agent Y wants only to make more copies of itself. Suppose agents X and Y aggregate their goals, becoming agent XY. You are now left with an agent XY with more resources but pretty complex goals. If goal aggregation is able to give XY enough power to become a singleton, we are left with a singleton, which is exactly what the friendly selection postulate tells us we should avoid. Worse, it is a singleton with, by definition, compromised goals. If utility function aggregation isn’t able to give XY enough power to become a singleton, we are back to the world in which agents most able to approximate agent Y have the advantage.
So we are stuck in a bit of a catch-22. In the case of evolution without goal aggregation, human values will be selected against in favor of agent-Y-like values. In the case of evolution with goal aggregation, agents willing to aggregate their utility functions with other agents are able to gain more power, at least temporarily.
However, unless this power can be parlayed into a decisive strategic advantage that pushes them out of a situation where natural selection applies, forming a singleton able to permanently wipe out the competition, these agents will always eventually lose out to agent Ys. And worse, should a singleton be formed, it will likely be less able to maximize human value than the most human-friendly agent introduced into the AI ecosystem, because its goals will be compromised to some extent during the process of aggregating its utility function with competing agents. And if this is the case, why not attempt to build the friendly singleton from the beginning, bypassing all the messy aggregation business?
A Dressed-up Friendly Selection Postulate
The friendly selection postulate seems to hold no water, but what of OpenAI’s claim? As I said above, I believe OpenAI’s claims are equivalent to the friendly selection postulate, though more appealingly relayed.
Here Altman and Musk describe OpenAI’s strategy to journalist Steven Levy:
Levy: How did this come about? […]
Musk: Philosophically there’s an important element here: we want AI to be widespread. There’s two schools of thought — do you want many AIs, or a small number of AIs? We think probably many is good. And to the degree that you can tie it to an extension of individual human will, that is also good. […]
Altman: We think the best way AI can develop is if it’s about individual empowerment and making humans better, and made freely available to everyone, not a single entity that is a million times more powerful than any human. Because we are not a for-profit company, like a Google, we can focus not on trying to enrich our shareholders, but what we believe is the actual best thing for the future of humanity.
Levy: Couldn’t your stuff in OpenAI surpass human intelligence?
Altman: I expect that it will, but it will just be open source and useable by everyone instead of useable by, say, just Google. Anything the group develops will be available to everyone. If you take it and repurpose it you don’t have to share that. But any of the work that we do will be available to everyone.
Levy: If I’m Dr. Evil and I use it, won’t you be empowering me?
Musk: I think that’s an excellent question and it’s something that we debated quite a bit.
Altman: There are a few different thoughts about this. Just like humans protect against Dr. Evil by the fact that most humans are good, and the collective force of humanity can contain the bad elements, we think it’s far more likely that many, many AIs, will work to stop the occasional bad actors than the idea that there is a single AI a billion times more powerful than anything else. If that one thing goes off the rails or if Dr. Evil gets that one thing and there is nothing to counteract it, then we’re really in a bad place.
Though full of wonderful rhetoric, their strategy would result in the creation of many agents with many differing goals. As I have argued above, there is good reason to believe this would lead to an ecology in which natural selection takes hold, and should this happen, any agent with values more complex than a deliberate desire to create more copies of itself will be selected against, leading to a future where agent Ys and those able to approximate them displace humans. And should goal aggregation occur, the resulting singleton will always have a utility function which, from the human perspective, is compromised at least to some extent.
If disaster is the result of OpenAI’s equal playing field, what are our alternatives? One, of course, is to prevent the development of artificial intelligence completely. This seems impossible given how actively AI is being pursued by multiple parties. The second alternative is to bite the bullet and try to actively create a singleton with a utility function that accounts for humans and our values, which seems merely very, very, very, very unlikely.
Regardless, a competitive ecology isn’t a solution. Should we want a safe outcome from AI, we need a means of specifying human value, and an agent with enough headroom above the competition to maximize that value without being outcompeted.
As we have (at least temporarily) escaped the Malthusian condition, modern humans have little direct experience of how hellish an ecosystem can be, so it is easy for us to anthropomorphize mother nature as a kindly force with vague but beneficent goals. It is useful to remind ourselves that this intuition is false. Natural selection is a search process that is literally powered by death, and that is the most literal use of the word “literally.” It cares nothing for human suffering or happiness, for it has no mind to care with. It has no purpose or motivations beyond this: that which reproduces more effectively than its competitors will displace its competitors.
Working towards a human-friendly singleton is our only safe bet, for any outcome with competing agents, no matter how you dress it up, leads to natural selection. And natural selection is a Lovecraftian horror, not an AI safety solution.