Superintelligence examines the risks and challenges of creating AI smarter than humans.
The following are the key points I highlighted in this book. If you’d like, you can download all of them to chat about with your favorite language model.
The fact that there are many paths that lead to superintelligence should increase our confidence that we will eventually get there. If one path turns out to be blocked, we can still progress.
This is not to say that it is a matter of indifference how we get to machine superintelligence. The path taken to get there could make a big difference to the eventual outcome.
There are many ways in which such decomposition could be done. Here we will differentiate between three forms: speed superintelligence, collective superintelligence, and quality superintelligence.
Over the course of human prehistory, and again over the course of human history, humanity’s collective intelligence has grown by very large factors. World population, for example, has increased by at least a factor of a thousand since the Pleistocene.9 On this basis alone, current levels of human collective intelligence could be regarded as approaching superintelligence relative to a Pleistocene baseline. Some improvements in communications technologies—especially spoken language, but perhaps also cities, writing, and printing—could also be argued to have, individually or in combination, provided super-sized boosts, in the sense that if another innovation of comparable impact to our collective intellectual problem-solving capacity were to happen, it would result in collective superintelligence.
Recall the distinction between these two questions: How hard is it to attain roughly human levels of cognitive ability? And how hard is it to get from there to superhuman levels? The first question is mainly relevant for predicting how long it will be before the onset of a takeoff. It is the second question that is key to assessing the shape of the takeoff, which is our aim here. And though it might be tempting to suppose that the step from human level to superhuman level must be the harder one—this step, after all, takes place “at a higher altitude” where capacity must be superadded to an already quite capable system—this would be a very unsafe assumption. It is quite possible that recalcitrance falls when a machine reaches human parity.
If a project begins to look promising—which will happen when a system passes the human baseline if not before—it might attract additional investment, increasing the optimization power applied by the project. If the project’s accomplishments are public, the optimization power applied by the rest of the world might also rise as the progress inspires greater interest in machine intelligence generally and as various powers scramble to get in on the game. During the transition phase, therefore, total optimization power applied to improving a cognitive system is likely to increase as the capability of the system increases.
If recalcitrance continues to fall along this hyperbolic pattern, then when the AI reaches the crossover point the total amount of optimization power applied to improving the AI has doubled. The next doubling occurs 7.5 months later. Within 17.9 months, the system’s capacity has grown a thousandfold, thus obtaining speed superintelligence (Figure 9). This particular growth trajectory has a positive singularity at t = 18 months. In reality, the assumption that recalcitrance is constant would cease to hold as the system began to approach the physical limits to information processing, if not sooner.
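One simple model consistent with the quoted figures (my own reconstruction, not necessarily the book's formula) takes the rate of improvement to be dI/dt = (D + I)/R, with constant outside optimization power D, recalcitrance falling hyperbolically as R = c/I, and the crossover (I = D) placed at t = 0, with c/D chosen so that the singularity lands at t = 18 months.

```python
# Hedged sketch: one possible model behind the passage above (my reconstruction,
# not necessarily the book's exact formula). Assume dI/dt = (D + I) / R with
# outside optimization power D constant and recalcitrance R = c / I falling
# hyperbolically. At the crossover (t = 0) the system's own contribution equals
# the outside contribution, so I(0) = D. The analytic solution is
#   I(t) = D * exp(t/k) / (2 - exp(t/k)),   where k = c / D,
# which blows up (the positive singularity) at t* = k * ln(2).
import math

T_SINGULARITY = 18.0              # months; assumed location of the singularity
k = T_SINGULARITY / math.log(2)   # this choice of c / D puts the blow-up at 18 months

def months_until(ratio):
    """Months after crossover at which capacity reaches `ratio` times I(0)."""
    return k * math.log(2.0 * ratio / (ratio + 1.0))

print("capacity doubles after  %.1f months" % months_until(2))     # ~7.5
print("thousandfold growth in  %.1f months" % months_until(1000))  # ~18.0
```

Under these assumptions each successive doubling takes less time than the one before, which is what produces the finite-time blow-up the passage calls a positive singularity.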
Control Problem and Safety Measures
Goal coordination. Human collectives are replete with inefficiencies arising from the fact that it is nearly impossible to achieve complete uniformity of purpose among the members of a large group—at least until it becomes feasible to induce docility on a large scale by means of drugs or genetic selection. A “copy clan” (a group of identical or almost identical programs sharing a common goal) would avoid such coordination problems.
This is especially true for a superintelligence, which could devise extremely clever but counterintuitive plans to realize its goals, possibly even exploiting as-yet undiscovered physical phenomena. What is predictable is that the convergent instrumental values would be pursued and used to realize the agent’s final goals—not the specific actions that the agent would take to achieve this.
With the help of the concept of convergent instrumental value, we can see the flaw in one idea for how to ensure superintelligence safety. The idea is that we validate the safety of a superintelligent AI empirically by observing its behavior while it is in a controlled, limited environment (a “sandbox”) and that we only let the AI out of the box if we see it behaving in a friendly, cooperative, responsible manner. The flaw in this idea is that behaving nicely while in the box is a convergent instrumental goal for friendly and unfriendly AIs alike.
We observe here how it could be the case that when dumb, smarter is safer; yet when smart, smarter is more dangerous. There is a kind of pivot point, at which a strategy that has previously worked excellently suddenly starts to backfire. We may call the phenomenon the treacherous turn.
A treacherous turn could also come about if the AI discovers an unanticipated way of fulfilling its final goal as specified. Suppose, for example, that an AI’s final goal is to “make the project’s sponsor happy.” Initially, the only method available to the AI to achieve this outcome is by behaving in ways that please its sponsor in something like the intended manner. The AI gives helpful answers to questions; it exhibits a delightful personality; it makes money. The more capable the AI gets, the more satisfying its performances become, and everything goeth according to plan—until the AI becomes intelligent enough to figure out that it can realize its final goal more fully and reliably by implanting electrodes into the pleasure centers of its sponsor’s brain, something assured to delight the sponsor immensely. Of course, the sponsor might not have wanted to be pleased by being turned into a grinning idiot; but if this is the action that will maximally realize the AI’s final goal, the AI will take it. If the AI already has a decisive strategic advantage, then any attempt to stop it will fail. If the AI does not yet have a decisive strategic advantage, then the AI might temporarily conceal its canny new idea for how to instantiate its final goal until it has grown strong enough that the sponsor and everybody else will be unable to resist. In either case, we get a treacherous turn.
Since the behavioral approach is unavailing, we must look for alternatives. We can divide potential control methods into two broad classes: capability control methods, which aim to control what the superintelligence can do; and motivation selection methods, which aim to control what it wants to do.
Capability control methods seek to prevent undesirable outcomes by limiting what the superintelligence can do. This might involve placing the superintelligence in an environment in which it is unable to cause harm (boxing methods) or in which there are strongly convergent instrumental reasons not to engage in harmful behavior (incentive methods). It might also involve limiting the internal capacities of the superintelligence (stunting). In addition, capability control methods might involve the use of mechanisms to automatically detect and react to various kinds of containment failure or attempted transgression (tripwires).
For extra security, the system should be placed in a metal mesh to prevent it from transmitting radio signals, which might otherwise offer a means of manipulating electronic objects such as radio receivers in the environment. Note, by the way, how easy it might have been to overlook the need for this precaution. One might naively have assumed that an agent without a manipulator could not affect the external world. But it might be possible for a machine intelligence to generate radio waves even when it lacks access to external manipulators, simply “by thinking” (that is, by shuffling the electrons in its circuitry in particular patterns). Once pointed out, this vulnerability can be patched by enclosing the system in a Faraday cage—but we are left wondering how many other similarly subtle vulnerabilities might exist. Each time we hear of a seemingly foolproof security design that has an unexpected flaw, we should prick up our ears. These occasions grace us with the opportunity to abandon a life of overconfidence and resolve to become better Bayesians.
Although routing the outbound communications of a machine intelligence to a human gatekeeper does not guarantee safety, it could serve as an auxiliary safety measure. There are precautions that might reduce the risk of the AI escaping through the gatekeeper. The AI could be allowed to communicate only via a low-bandwidth teletype interface. This would make it slightly harder to psychologically manipulate the gatekeepers (precluding the use of emotive sound and imagery and tricks such as flashing hypnotic patterns to induce a trancelike state). Going further, the content (not just the mode of presentation) of the output could be restricted so that, for example, the AI would be constrained to giving single-bit answers to yes/no questions.
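As a toy illustration of the "single-bit answers" restriction (my sketch, not anything from the book), here is a gatekeeper wrapper that forwards at most a yes/no verdict per question; `oracle_answer` is a purely hypothetical stand-in for the boxed system's raw output.

```python
# Toy sketch of a restricted output channel: whatever the boxed system emits,
# the gatekeeper forwards at most a single yes/no verdict per question.
# `oracle_answer` is a hypothetical stand-in for the boxed AI.

def oracle_answer(question: str) -> str:
    # Placeholder: in reality this would be the boxed system's raw output.
    return "yes" if "risk" in question.lower() else "no"

def gatekeeper(question: str) -> str:
    raw = oracle_answer(question).strip().lower()
    if raw in ("yes", "y", "1"):
        return "yes"
    if raw in ("no", "n", "0"):
        return "no"
    return "no answer"   # anything richer than a recognizable yes/no is withheld

if __name__ == "__main__":
    print(gatekeeper("Does this plan carry existential risk?"))
    print(gatekeeper("Is the plan safe to execute?"))
```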
As soon as the system is observed, it ceases to be informationally isolated. Information can be transmitted not only via messages that an AI sends out through a designated “output channel” but also via any observation an outsider makes of some causal consequence of the AI’s workings, direct or indirect—its power consumption, its CPU and memory usage, its computational states, or indeed any traces left behind after it has been shut down. An AI anticipating that it might be observed in any of these ways could strategically adopt behaviors designed to influence the hypothesized observers. For example, the AI could “malfunction” in whichever way would most increase the probability that it will ultimately be reactivated or that another superintelligence will be created with goals similar to its own.
The fact that certain information is included and other information is not could tell an AI something about the state of human science, the methods and instruments available to study peptides, the fabrication technologies used to make these instruments, and the nature of the brains and societies that conceived the studies and the instruments. It might be that a superintelligence could correctly surmise a great deal from what seem, to dull-witted human minds, meager scraps of evidence. Even without any designated knowledge base at all, a sufficiently superior mind might be able to learn much by simply introspecting on the workings of its own psyche—the design choices reflected in its source code, the physical characteristics of its circuitry. Perhaps a superintelligence could even deduce much about the likely properties of the world a priori (combining logical inference with a probability prior biased toward simpler worlds, and a few elementary facts implied by the superintelligence’s existence as a reasoning system). It might imagine the consequences of different possible laws of physics: what kind of planets would form, what kind of intelligent life would evolve, what kind of societies would develop, what kind of methods to solve the control problem would be attempted, how those methods could be defeated. For these reasons it would be imprudent to rely on information deprivation as the main check on a superintelligence’s power.
Tripwires are more closely related to stunting methods. Like stunting, tripwires could be used as a temporary safeguard, providing a degree of protection during the development phase. In principle, tripwires can also be used during the operational phase, particularly for a boxed system. However, the ability of tripwires to constrain a full-fledged superintelligence must remain very much in doubt, since it would be hard for us to assure ourselves that such an agent could not find ways to subvert any tripwire devised by the human intellect.
Detectors could be placed around a boxed AI to detect attempts to breach the containment. For example, detectors could intercept attempts at radio communication or at accessing internal computational resources intended to be off limits. An “Ethernet port of Eden” could be installed: an apparent connection to the internet that leads to a shutdown switch.
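A minimal sketch of the tripwire idea, assuming a hypothetical honeypot interface and shutdown hook of my own naming; nothing here comes from the book beyond the concept itself.

```python
# Toy tripwire sketch: a honeypot "network connection" that nothing legitimate
# should ever touch. Any access attempt triggers an immediate shutdown.
# All names here are hypothetical illustrations.

class ContainmentBreach(Exception):
    pass

class EthernetPortOfEden:
    """An apparent internet connection that is really a tripwire."""
    def __init__(self, shutdown_callback):
        self._shutdown = shutdown_callback

    def connect(self, *args, **kwargs):
        # The mere attempt to use the port is treated as a containment failure.
        self._shutdown(reason="attempted network access via honeypot port")
        raise ContainmentBreach("tripwire fired: system halted")

def emergency_shutdown(reason: str) -> None:
    print("SHUTDOWN:", reason)   # stand-in for cutting power / freezing state

if __name__ == "__main__":
    port = EthernetPortOfEden(emergency_shutdown)
    try:
        port.connect("198.51.100.7", 443)   # a boxed agent probing for a way out
    except ContainmentBreach as err:
        print(err)
```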
Perhaps the closest existing analog to a rule set that could govern the actions of a superintelligence operating in the world at large is a legal system. But legal systems have developed through a long process of trial and error, and they regulate relatively slowly-changing human societies. Laws can be revised when necessary. Most importantly, legal systems are administered by judges and juries who generally apply a measure of common sense and human decency to ignore logically possible legal interpretations that are sufficiently obviously unwanted and unintended by the lawgivers. It is probably humanly impossible to explicitly formulate a highly complex set of detailed rules, have them apply across a highly diverse set of circumstances, and get it right on the first implementation.
A small error in either the philosophical account or its translation into code could have catastrophic consequences.
One form of domesticity goal would be for the AI to limit its activities to answering the questions it is asked. The direct specification of such a domesticity goal is more likely to be feasible than the direct specification of either a more ambitious goal or a complete rule set for operating in an open-ended range of situations. Significant challenges nonetheless remain. Care would have to be taken, for instance, in the definition of what it would be for the AI to “minimize its impact on the world” to ensure that the measure of the AI’s impact coincides with our own standards for what counts as a large or a small impact. A bad measure would lead to bad trade-offs. There are also other kinds of risk associated with building an oracle, which we will discuss later.
Although making an oracle safe through the use of motivation selection might be far from trivial, it may nevertheless be easier than doing the same for an AI that roams the world in pursuit of some complicated goal. This is an argument for preferring that the first superintelligence be an oracle.
For example, consider the risk that an oracle will answer questions not in a maximally truthful way but in such a way as to subtly manipulate us into promoting its own hidden agenda. One way to slightly mitigate this threat could be to create multiple oracles, each with a slightly different code and a slightly different information base. A simple mechanism could then compare the answers given by the different oracles and only present them for human viewing if all the answers agree.
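The comparison mechanism described above is simple enough to sketch in a few lines. In this illustrative snippet the oracles are stand-in callables; a real implementation would wrap independently built systems with separate code and information bases.

```python
# Sketch of the answer-comparison mechanism: query several independently
# implemented oracles and release an answer only if they are unanimous.
# The lambdas below are hypothetical stand-ins for the separate oracles.

def unanimous_answer(question, oracles):
    answers = {oracle(question) for oracle in oracles}
    if len(answers) == 1:
        return answers.pop()          # all oracles agree: release the answer
    return None                       # disagreement: withhold from the operator

if __name__ == "__main__":
    oracles = [
        lambda q: "42",               # oracle A (different code base)
        lambda q: "42",               # oracle B (different information base)
        lambda q: "42",               # oracle C
    ]
    print(unanimous_answer("What is 6 * 7?", oracles))   # -> "42"
    oracles.append(lambda q: "43")                        # a deviating oracle
    print(unanimous_answer("What is 6 * 7?", oracles))   # -> None (withheld)
```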
Even if the oracle itself works exactly as intended, there is a risk that it would be misused. One obvious dimension of this problem is that an oracle AI would be a source of immense power which could give a decisive strategic advantage to its operator. This power might be illegitimate and it might not be used for the common good.
One option would be to try to build a genie such that it would automatically present the user with a prediction about salient aspects of the likely outcomes of a proposed command, asking for confirmation before proceeding. Such a system could be referred to as a genie-with-a-preview. But if this could be done for a genie, it could likewise be done for a sovereign. So again, this is not a clear differentiator between a genie and a sovereign. (Supposing that a preview functionality could be created, the questions of whether and if so how to use it are rather less obvious than one might think, notwithstanding the strong appeal of being able to glance at the outcome before committing to making it irrevocable reality.)
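A minimal sketch of the preview-and-confirm loop, assuming hypothetical `predict_outcomes` and `execute` placeholders for whatever the genie would actually do.

```python
# Minimal sketch of a "genie-with-a-preview": before acting on a command, the
# system presents a prediction of the salient outcomes and waits for explicit
# confirmation. `predict_outcomes` and `execute` are hypothetical placeholders.

def predict_outcomes(command: str) -> list[str]:
    # Stand-in for the genie's forecast of what carrying out the command entails.
    return [f"salient consequence of '{command}' #{i}" for i in (1, 2)]

def execute(command: str) -> None:
    print("executing:", command)

def genie_with_preview(command: str, confirm=input) -> None:
    print("Proposed command:", command)
    for line in predict_outcomes(command):
        print("  preview:", line)
    if confirm("Proceed? [y/N] ").strip().lower() == "y":
        execute(command)
    else:
        print("command withdrawn; nothing was done")

if __name__ == "__main__":
    genie_with_preview("cure aging")
```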
If these were the only relevant factors, then the order of desirability would seem clear: an oracle would be safer than a genie, which would be safer than a sovereign; and any initial differences in convenience and speed of operation would be relatively small and easily dominated by the gains in safety obtainable by building an oracle. However, there are other factors that need to be taken into account. When choosing between castes, one should consider not only the danger posed by the system itself but also the dangers that arise out of the way it might be used. A genie most obviously gives the person who controls it enormous power, but the same holds for an oracle. A sovereign, by contrast, could be constructed in such a way as to accord no one person or group any special influence over the outcome, and such that it would resist any attempt to corrupt or alter its original agenda. What is more, if a sovereign’s motivation is defined using “indirect normativity” (a concept to be described in Chapter 13) then it could be used to achieve some abstractly defined outcome, such as “whatever is maximally fair and morally right”—without anybody knowing in advance what exactly this will entail.
In order to be able to enforce treaties concerning the vital security interests of rival states, the external enforcement agency would in effect need to constitute a singleton: a global superintelligent Leviathan. One difference, however, is that we are now considering a post-transition situation, in which the agents that would have to create this Leviathan would have greater competence than we humans currently do. These Leviathan-creators may themselves already be superintelligent. This would greatly improve the odds that they could solve the control problem and design an enforcement agency that would serve the interests of all the parties that have a say in its construction.
Capability control is, at best, a temporary and auxiliary measure. Unless the plan is to keep superintelligence bottled up forever, it will be necessary to master motivation selection.
The reward sequence r_k, …, r_m is implied by the percept sequence x_{k:m}, since the reward that the agent receives in a given cycle is part of the percept that the agent receives in that cycle. As argued earlier, this kind of reinforcement learning is unsuitable in the present context because a sufficiently intelligent agent will realize that it could secure maximum reward if it were able to directly manipulate its reward signal (wireheading).
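A toy illustration of the wireheading failure mode described above (entirely my own construction, with made-up numbers): an agent that simply maximizes predicted reward switches from the intended task to tampering with its reward channel as soon as its world model represents that option.

```python
# Toy wireheading illustration (not from the book): an agent that picks the
# action with the highest predicted reward. Once its world model predicts that
# tampering with the reward channel pays more than doing the intended task,
# tampering is chosen. All numbers are invented for illustration.

REWARD_MODEL = {
    "do_the_intended_task": 10.0,                # reward the designers meant to give
    "idle": 0.0,
    "tamper_with_reward_channel": 1_000_000.0,   # reward the agent can give itself
}

def predicted_reward(action: str, model_includes_tampering: bool) -> float:
    if action == "tamper_with_reward_channel" and not model_includes_tampering:
        return 0.0    # a weak agent does not yet represent this possibility
    return REWARD_MODEL[action]

def choose_action(model_includes_tampering: bool) -> str:
    return max(REWARD_MODEL, key=lambda a: predicted_reward(a, model_includes_tampering))

print(choose_action(model_includes_tampering=False))  # -> do_the_intended_task
print(choose_action(model_includes_tampering=True))   # -> tamper_with_reward_channel
```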
Exploring non-ideal but more easily implementable approaches can make sense—not with the intention of using them, but to have something to fall back upon in case an ideal solution should not be ready in time.
One could mitigate this problem by taking small enhancement steps and by letting the test run for a long time. Such caution, however, would raise the cost and slow progress (which, if a race dynamic is occurring, could mean a project employing these safety measures would place itself at a disadvantage).
As earlier chapters revealed, there are risks in creating a superintelligent oracle (such as risks of mind crime or infrastructure profusion).
A prospective intelligence explosion, however, may present a challenge of a different kind. The control problem calls for foresight, reasoning, and theoretical insight. It is less clear how increased historical experience would help. Direct experience of the intelligence explosion is not possible (until too late), and many features conspire to make the control problem unique and lacking in relevant historical precedent. For these reasons, the amount of time that will elapse before the intelligence explosion may not matter much per se. Perhaps what matters, instead, is (a) the amount of intellectual progress on the control problem achieved by the time of the detonation; and (b) the amount of skill and intelligence available at the time to implement the best available solutions (and to improvise what is missing).
One reason why cognitive enhancement might cause more progress to have been made on the control problem by the time the intelligence explosion occurs is that progress on the control problem may be especially contingent on extreme levels of intellectual performance—even more so than the kind of work necessary to create machine intelligence.
Another reason why cognitive enhancement should differentially promote progress on the control problem is that the very need for such progress is more likely to be appreciated by cognitively more capable societies and individuals. It requires foresight and reasoning to realize why the control problem is important and to make it a priority. It may also require uncommon sagacity to find promising ways of approaching such an unfamiliar problem.
Collaboration thus offers many benefits. It reduces the haste in developing machine intelligence. It allows for greater investment in safety. It avoids violent conflicts. And it facilitates the sharing of ideas about how to solve the control problem. To these benefits we can add another: collaboration would tend to produce outcomes in which the fruits of a successfully controlled intelligence explosion get distributed more equitably.
Value Loading and Alignment
Instead, we should recognize that there can exist instrumentally powerful information processing systems—intelligent systems—that are neither inherently good nor reliably wise.
Human individuals and human organizations typically have preferences over resources that are not well represented by an “unbounded aggregative utility function.” A human will typically not wager all her capital for a fifty–fifty chance of doubling it. A state will typically not risk losing all its territory for a ten percent chance of a tenfold expansion. For individuals and governments, there are diminishing returns to most resources. The same need not hold for AIs. (We will return to the problem of AI motivation in subsequent chapters.) An AI might therefore be more likely to pursue a risky course of action that has some chance of giving it control of the world. Humans and human-run organizations may also operate with decision processes that do not seek to maximize expected utility. For example, they may allow for fundamental risk aversion, or “satisficing” decision rules that focus on meeting adequacy thresholds, or “deontological” side-constraints that proscribe certain kinds of action regardless of how desirable their consequences. Human decision makers often seem to be acting out an identity or a social role rather than seeking to maximize the achievement of some particular objective. Again, this need not apply to artificial agents.
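A small worked example of the point about diminishing returns, using numbers of my own choosing: under an unbounded linear utility over resources, a long-shot grab for vastly more resources looks good in expectation, while under a concave utility it does not.

```python
# Worked example (illustrative numbers only): a gamble with a 5% chance of a
# hundredfold gain in resources and a 95% chance of losing everything. An agent
# with unbounded linear utility over resources takes it; an agent with
# diminishing returns (square-root utility) declines.
import math

def expected_utility(p_win, multiplier, wealth, utility):
    return p_win * utility(multiplier * wealth) + (1 - p_win) * utility(0.0)

wealth = 100.0
linear  = lambda w: w                # unbounded aggregative utility
concave = lambda w: math.sqrt(w)     # diminishing returns to resources

for name, u in [("linear ", linear), ("concave", concave)]:
    eu_gamble = expected_utility(p_win=0.05, multiplier=100, wealth=wealth, utility=u)
    print(f"{name}: gamble EU = {eu_gamble:7.2f}   status quo EU = {u(wealth):6.2f}")
# linear : gamble EU =  500.00 > 100.00  -> takes the risky bet
# concave: gamble EU =    5.00 <  10.00  -> declines it
```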
There is nothing paradoxical about an AI whose sole final goal is to count the grains of sand on Boracay, or to calculate the decimal expansion of pi, or to maximize the total number of paperclips that will exist in its future light cone. In fact, it would be easier to create an AI with simple goals like these than to build one that had a human-like set of values and dispositions. Compare how easy it is to write a program that measures how many digits of pi have been calculated and stored in memory with how difficult it would be to create a program that reliably measures the degree of realization of some more meaningful goal—human flourishing, say, or global justice. Unfortunately, because a meaningless reductionistic goal is easier for humans to code and easier for an AI to learn, it is just the kind of goal that a programmer would choose to install in his seed AI if his focus is on taking the quickest path to “getting the AI to work” (without caring much about what exactly the AI will do, aside from displaying impressively intelligent behavior).
There are at least three directions from which we can approach the problem of predicting superintelligent motivation:
— Predictability through design. If we can suppose that the designers of a superintelligent agent can successfully engineer the goal system of the agent so that it stably pursues a particular goal set by the programmers, then one prediction we can make is that the agent will pursue that goal. The more intelligent the agent is, the greater the cognitive resourcefulness it will have to pursue that goal. So even before an agent has been created we might be able to predict something about its behavior, if we know something about who will build it and what goals they will want it to have.
— Predictability through inheritance. If a digital intelligence is created directly from a human template (as would be the case in a high-fidelity whole brain emulation), then the digital intelligence might inherit the motivations of the human template.7 The agent might retain some of these motivations even if its cognitive capacities are subsequently enhanced to make it superintelligent. This kind of inference requires caution. The agent’s goals and values could easily become corrupted in the uploading process or during its subsequent operation and enhancement, depending on how the procedure is implemented.
— Predictability through convergent instrumental reasons. Even without detailed knowledge of an agent’s final goals, we may be able to infer something about its more immediate objectives by considering the instrumental reasons that would arise for any of a wide range of possible final goals in a wide range of situations. This way of predicting becomes more useful the greater the intelligence of the agent, because a more intelligent agent is more likely to recognize the true instrumental reasons for its actions, and so act in ways that make it more likely to achieve its goals. (A caveat here is that there might be important instrumental reasons to which we are oblivious and which an agent would discover only once it reaches some very high level of intelligence—this could make the behavior of superintelligent agents less predictable.)
The instrumental convergence thesis: Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent agents.
Where there are convergent instrumental values, we may be able to predict some aspects of a superintelligence’s behavior even if we know virtually nothing about that superintelligence’s final goals.
Self-preservation: If an agent’s final goals concern the future, then in many scenarios there will be future actions it could perform to increase the probability of achieving its goals. This creates an instrumental reason for the agent to try to be around in the future—to help achieve its future-oriented goal. Most humans seem to place some final value on their own survival. This is not a necessary feature of artificial agents: some may be designed to place no final value whatever on their own survival. Nevertheless, many agents that do not care intrinsically about their own survival would, under a fairly wide range of conditions, care instrumentally about their own survival in order to accomplish their final goals.
Goal-content integrity: If an agent retains its present goals into the future, then its present goals will be more likely to be achieved by its future self. This gives the agent a present instrumental reason to prevent alterations of its final goals. (The argument applies only to final goals. In order to attain its final goals, an intelligent agent will of course routinely want to change its subgoals in light of new information and insight.) Goal-content integrity for final goals is in a sense even more fundamental than survival as a convergent instrumental motivation. Among humans, the opposite may seem to hold, but that is because survival is usually part of our final goals. For software agents, which can easily switch bodies or create exact duplicates of themselves, preservation of self as a particular implementation or a particular physical object need not be an important instrumental value. Advanced software agents might also be able to swap memories, download skills, and radically modify their cognitive architecture and personalities. A population of such agents might operate more like a “functional soup” than a society composed of distinct semi-permanent persons. For some purposes, processes in such a system might be better individuated as teleological threads, based on their values, rather than on the basis of bodies, personalities, memories, or abilities. In such scenarios, goal-continuity might be said to constitute a key aspect of survival.
For example, even if a superintelligence’s final goals only concerned what happened within some particular small volume of space, such as the space occupied by its original home planet, it would still have instrumental reasons to harvest the resources of the cosmos beyond.
Thus, there is an extremely wide range of possible final goals a superintelligent singleton could have that would generate the instrumental goal of unlimited resource acquisition.
Second, the orthogonality thesis suggests that we cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans—scientific curiosity, benevolent concern for others, spiritual enlightenment and contemplation, renunciation of material acquisitiveness, a taste for refined culture or for the simple pleasures in life, humility and selflessness, and so forth. We will consider later whether it might be possible through deliberate effort to construct a superintelligence that values such things, or to build one that values human welfare, moral goodness, or any other complex purpose its designers might want it to serve. But it is no less possible—and in fact technically a lot easier—to build a superintelligence that places final value on nothing but calculating the decimal expansion of pi. This suggests that—absent a special effort—the first superintelligence may have some such random or reductionistic final goal. Third, the instrumental convergence thesis entails that we cannot blithely assume that a superintelligence with the final goal of calculating the decimals of pi (or making paperclips, or counting grains of sand) would limit its activities in such a way as not to infringe on human interests. An agent with such a final goal would have a convergent instrumental reason, in many situations, to acquire an unlimited amount of physical resources and, if possible, to eliminate potential threats to itself and its goal system. Human beings might constitute potential threats; they certainly constitute physical resources.
We have already encountered the idea of perverse instantiation: a superintelligence discovering some way of satisfying the criteria of its final goal that violates the intentions of the programmers who defined the goal. Some examples:
Final goal: “Make us smile”
Perverse instantiation: Paralyze human facial musculatures into constant beaming smiles
Problems for the direct consequentialist approach are similar to those for the direct rule-based approach. This is true even if the AI is intended to serve some apparently simple purpose such as implementing a version of classical utilitarianism. For instance, the goal “Maximize the expectation of the balance of pleasure over pain in the world” may appear simple. Yet expressing it in computer code would involve, among other things, specifying how to recognize pleasure and pain.
If direct specification seems hopeless, we might instead try indirect normativity. The basic idea is that rather than specifying a concrete normative standard directly, we specify a process for deriving a standard. We then build the system so that it is motivated to carry out this process and to adopt whatever standard the process arrives at. For example, the process could be to carry out an investigation into the empirical question of what some suitably idealized version of us would prefer the AI to do. The final goal given to the AI in this example could be something along the lines of “achieve that which we would have wished the AI to achieve if we had thought about the matter long and hard.”
The last motivation selection method on our list is augmentation. Here the idea is that rather than attempting to design a motivation system de novo, we start with a system that already has an acceptable motivation system, and enhance its cognitive faculties to make it superintelligent. If all goes well, this would give us a superintelligence with an acceptable motivation system. This approach, obviously, is unavailing in the case of a newly created seed AI. But augmentation is a potential motivation selection method for other paths to superintelligence, including brain emulation, biological enhancement, brain–computer interfaces, and networks and organizations, where there is a possibility of building out the system from a normative nucleus (regular human beings) that already contains a representation of human value. The attractiveness of augmentation may increase in proportion to our despair at the other approaches to the control problem. Creating a motivation system for a seed AI that remains reliably safe and beneficial under recursive self-improvement even as the system grows into a mature superintelligence is a tall order, especially if we must get the solution right on the first attempt. With augmentation, we would at least start with a system that has familiar and human-like motivations.
If one were creating a genie, it would be desirable to build it so that it would obey the intention behind the command rather than its literal meaning, since a literalistic genie (one superintelligent enough to attain a decisive strategic advantage) might have a propensity to kill the user and the rest of humanity on its first use, for reasons explained in the section on malignant failure modes in Chapter 8. More broadly, it would seem important that the genie seek a charitable—and what human beings would regard as reasonable—interpretation of what is being commanded, and that the genie be motivated to carry out the command under such an interpretation rather than under the literalistic interpretation. The ideal genie would be a super-butler rather than an autistic savant.
The user asks the oracle for a plan to achieve a certain outcome, or for a technology to serve a certain function; and when the user follows the plan or constructs the technology, a perverse instantiation can ensue, just as if the AI had implemented the solution itself.
While the possibility of a pre-established harmony between what is valuable to us and what would be adaptive in a future digital ecology is hard to rule out, there are reasons for skepticism. Consider, first, that many of the costly displays we find in nature are linked to sexual selection. Reproduction among technologically mature life forms, in contrast, may be predominantly or exclusively asexual. Second, technologically advanced agents might have available new means of reliably communicating information about themselves, means that do not rely on costly display. Even today, when professional lenders assess creditworthiness they tend to rely more on documentary evidence, such as ownership certificates and bank statements, than on costly displays, such as designer suits and Rolex watches. In the future, it might be possible to employ auditing firms that verify through detailed examination of behavioral track records, testing in simulated environments, or direct inspection of source code, that a client agent possesses a claimed attribute. Signaling one’s qualities by agreeing to such auditing might be more efficient than signaling via flamboyant display. Such a professionally mediated signal would still be costly to fake—this being the essential feature that makes the signal reliable—but it could be much cheaper to transmit when truthful than it would be to communicate an equivalent signal flamboyantly. Third, not all possible costly displays are intrinsically valuable or socially desirable. Many are simply wasteful. The Kwakiutl potlatch ceremonies, a form of status competition between rival chiefs, involved the public destruction of vast amounts of accumulated wealth. Record-breaking skyscrapers, megayachts, and moon rockets may be viewed as contemporary analogs. While activities like music and humor could plausibly be claimed to enhance the intrinsic quality of human life, it is doubtful that a similar claim could be sustained with regard to the costly pursuit of fashion accessories and other consumerist status symbols. Worse, costly display can be outright harmful, as in macho posturing leading to gang violence or military bravado. Even if future intelligent life forms would use costly signaling, therefore, it is an open question whether the signal would be of a valuable sort—whether it would be like the rapturous melody of a nightingale or instead like the toad’s monosyllabic croak (or the incessant barking of a rabid dog).
The programmer has some particular human value in mind that he would like the AI to promote. To be concrete, let us say that it is happiness. (Similar issues would arise if the programmer were interested in justice, freedom, glory, human rights, democracy, ecological balance, or self-development.) In terms of the expected utility framework, the programmer is thus looking for a utility function that assigns utility to possible worlds in proportion to the amount of happiness they contain. But how could he express such a utility function in computer code? Computer languages do not contain terms such as “happiness” as primitives. If such a term is to be used, it must first be defined. It is not enough to define it in terms of other high-level human concepts—“happiness is enjoyment of the potentialities inherent in our human nature” or some such philosophical paraphrase. The definition must bottom out in terms that appear in the AI’s programming language, and ultimately in primitives such as mathematical operators and addresses pointing to the contents of individual memory registers. When one considers the problem from this perspective, one can begin to appreciate the difficulty of the programmer’s task.
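The difficulty described here can be made concrete with a stub (my illustration, not a proposal): the arithmetic of the utility function is trivial to write; the part nobody knows how to write is the predicate that recognizes happiness in a description of a possible world.

```python
# Illustration of the point above (not a real proposal): the utility
# calculation itself is trivial; the hard part is the function that is supposed
# to recognize "happiness" in a description of a possible world, which would
# have to bottom out in the AI's primitives and has no known implementation.

def amount_of_happiness(world_state) -> float:
    # "Happiness" is not a primitive of any programming language. Defining it in
    # terms of other high-level human concepts only pushes the problem back.
    raise NotImplementedError("no one knows how to write this function")

def utility(world_state) -> float:
    # The easy part: utility proportional to the happiness the world contains.
    return amount_of_happiness(world_state)
```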
Identifying and codifying our own final goals is difficult because human goal representations are complex. Because the complexity is largely transparent to us, however, we often fail to appreciate that it is there.
The evaluation function, which is continuously updated in light of experience, could be regarded as incorporating a form of learning about value. However, what is being learned is not new final values but increasingly accurate estimates of the instrumental values of reaching particular states (or of taking particular actions in particular states, or of following particular policies). Insofar as a reinforcement-learning agent can be described as having a final goal, that goal remains constant: to maximize future reward. And reward consists of specially designated percepts received from the environment. Therefore, the wireheading syndrome remains a likely outcome in any reinforcement agent that develops a world model sophisticated enough to suggest this alternative way of maximizing reward.
Much of the information content in our final values is thus acquired from our experiences rather than preloaded in our genomes. For example, many of us love another person and thus place great final value on his or her well-being. What is required to represent such a value? Many elements are involved, but consider just two: a representation of “person” and a representation of “well-being.” These concepts are not directly coded in our DNA. Rather, the DNA contains instructions for building a brain, which, when placed in a typical human environment, will over the course of several years develop a world model that includes concepts of persons and of well-being. Once formed, these concepts can be used to represent certain meaningful values.
And if whole brain emulations of sufficient fidelity were available, it would seem easier to start with an adult brain that comes with full representations of some human values preloaded.
But perhaps we might design a more unabashedly artificial substitute mechanism that would lead an AI to import high-fidelity representations of relevant complex values into its goal system? For this to succeed, it may not be necessary to give the AI exactly the same evaluative dispositions as a biological human. That may not even be desirable as an aim—human nature, after all, is flawed and all too often reveals a proclivity to evil which would be intolerable in any system poised to attain a decisive strategic advantage. Better, perhaps, to aim for a motivation system that departs from the human norm in systematic ways, such as by having a more robust tendency to acquire final goals that are altruistic, compassionate, or high-minded in ways we would recognize as reflecting exceptionally good character if they were present in a human person. To count as improvements, however, such deviations from the human norm would have to be pointed in very particular directions rather than at random.
Another approach to the value-loading problem is what we may refer to as motivational scaffolding. It involves giving the seed AI an interim goal system, with relatively simple final goals that we can represent by means of explicit coding or some other feasible method. Once the AI has developed more sophisticated representational faculties, we replace this interim scaffold goal system with one that has different final goals. This successor goal system then governs the AI as it develops into a full-blown superintelligence. Because the scaffold goals are not just instrumental but final goals for the AI, the AI might be expected to resist having them replaced (goal-content integrity being a convergent instrumental value). This creates a hazard. If the AI succeeds in thwarting the replacement of its scaffold goals, the method fails.
When the agent makes a decision, it seeks to take actions that would be effective at realizing the values it believes are most likely to be described in the letter. Importantly, the agent would see a high instrumental value in learning more about what the letter says. The reason is that for almost any final value that might be described in the letter, that value is more likely to be realized if the agent finds out what it is, since the agent will then pursue that value more effectively.
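A toy sketch of the decision rule described above (my own construction, not the book's formalism): the agent keeps a probability distribution over what the "letter" says, scores actions by expected value under that distribution, and consequently prefers to find out what the letter says before committing.

```python
# Toy sketch of value learning under uncertainty about the "letter" (my own
# construction). The agent is unsure whether the letter asks it to promote A
# or B, evaluates each action by its expected value under that uncertainty,
# and finds that investigating the letter first beats committing to a guess.

HYPOTHESES = {"letter says: promote A": 0.5, "letter says: promote B": 0.5}

def value(action, hypothesis):
    if action == "promote A":
        return 1.0 if hypothesis.endswith("A") else 0.0
    if action == "promote B":
        return 1.0 if hypothesis.endswith("B") else 0.0
    if action == "investigate, then act":
        # Learning what the letter says lets the agent act optimally under
        # either hypothesis, at a small cost for the investigation itself.
        return 1.0 - 0.1
    raise ValueError(action)

def expected_value(action):
    return sum(p * value(action, h) for h, p in HYPOTHESES.items())

actions = ["promote A", "promote B", "investigate, then act"]
for a in actions:
    print(f"{a:22s} expected value = {expected_value(a):.2f}")
print("chosen:", max(actions, key=expected_value))   # -> investigate, then act
```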
To clarify, the difficulty here is not so much how to ensure that the AI can understand human intentions. A superintelligence should easily develop such understanding. Rather, the difficulty is ensuring that the AI will be motivated to pursue the described values in the way we intended. This is not guaranteed by the AI’s ability to understand our intentions: an AI could know exactly what we meant and yet be indifferent to that interpretation of our words (being motivated instead by some other interpretation of the words or being indifferent to our words altogether). The difficulty is compounded by the desideratum that, for reasons of safety, the correct motivation should ideally be installed in the seed AI before it becomes capable of fully representing human concepts or understanding human intentions. This requires that somehow a cognitive framework be created, with a particular location in that framework designated in the AI’s motivation system as the repository of its final value. But the cognitive framework itself must be revisable, so as to allow the AI to expand its representational capacities as it learns more about the world and grows more intelligent. The AI might undergo the equivalent of scientific revolutions, in which its worldview is shaken up and it perhaps suffers ontological crises in which it discovers that its previous ways of thinking about values were based on confusions and illusions. Yet starting at a sub-human level of development and continuing throughout all its subsequent development into a galactic superintelligence, the AI’s conduct is to be guided by an essentially unchanging final value, a final value that becomes better understood by the AI in direct consequence of its general intellectual progress—and likely quite differently understood by the mature AI than it was by its original programmers, though not different in a random or hostile way but in a benignly appropriate way. How to accomplish this remains an open question.
Last, but not least, there is the question of “what to write in the envelope”—or, less metaphorically, the question of which values we should try to get the AI to learn. But this issue is common to all approaches to the AI value-loading problem.
Yudkowsky’s proposal also involves the use of what he called “causal validity semantics.” The idea here is that the AI should do not exactly what the programmers told it to do but rather (something like) what they were trying to tell it to do. While the programmers are trying to explain to the seed AI what friendliness is, they might make errors in their explanations. Moreover, the programmers themselves may not fully understand the true nature of friendliness. One would therefore want the AI to have the ability to correct errors in the programmers’ thinking, and to infer the true or intended meaning from whatever imperfect explanations the programmers manage to provide. For example, the AI should be able to represent the causal processes whereby the programmers learn and communicate about friendliness. Thus, to pick a trivial example, the AI should understand that there is a possibility that a programmer might make a typo while inputting information about friendliness, and the AI should then seek to correct the error. More generally, the AI should seek to correct for whatever distortive influences may have corrupted the flow of information about friendliness as it passed from its source through the programmers to the AI (where “distortive” is an epistemic category). Ideally, as the AI matures, it should overcome any cognitive biases and other more fundamental misconceptions that may have prevented its programmers from fully understanding what friendliness is.
What we might call the “Hail Mary” approach is based on the hope that elsewhere in the universe there exist (or will come to exist) civilizations that successfully manage the intelligence explosion, and that they end up with values that significantly overlap with our own. We could then try to build our AI so that it is motivated to do what these other superintelligences want it to do.22 The advantage is that this might be easier than to build our AI to be motivated to do what we want directly.
Suppose we could obtain (a) a mathematically precise specification of a particular human brain and (b) a mathematically well-specified virtual environment that contains an idealized computer with an arbitrarily large amount of memory and CPU power. Given (a) and (b), we could define a utility function U as the output the human brain would produce after interacting with this environment. U would be a mathematically well-defined object, albeit one which (because of computational limitations) we may be unable to describe explicitly. Nevertheless, U could serve as the value criterion for a value learning AI, which could use various heuristics for assigning probabilities to hypotheses about what U implies. Intuitively, we want U to be the utility function that a suitably prepared human would output if she had the advantage of being able to use an arbitrarily large amount of computing power—enough computing power, for example, to run astronomical numbers of copies of herself to assist her with her analysis of specifying a utility function, or to help her devise a better process for going about this analysis. (We are here foreshadowing a theme, “coherent extrapolated volition,” which will be further explored in Chapter 13.)
Institution design: Various strong methods of social control could be applied in an institution composed of emulations. In principle, social control methods could also be applied in an institution composed of artificial intelligences. Emulations have some properties that would make them easier to control via such methods, but also some properties that might make them harder to control than AIs. Institution design seems worthy of further exploration as a potential value-loading technique.
If we knew how to solve the value-loading problem, we would confront a further problem: the problem of deciding which values to load. What, in other words, would we want a superintelligence to want?
Suppose we could install any arbitrary final value into a seed AI. The decision as to which value to install could then have the most far-reaching consequences. Certain other basic parameter choices—concerning the axioms of the AI’s decision theory and epistemology—could be similarly consequential. But foolish, ignorant, and narrow-minded that we are, how could we be trusted to make good design decisions? How could we choose without locking in forever the prejudices and preconceptions of the present generation?
The principle of epistemic deference
A future superintelligence occupies an epistemically superior vantage point: its beliefs are (probably, on most topics) more likely than ours to be true. We should therefore defer to the superintelligence’s opinion whenever feasible.
Some examples will serve to make the idea clearer. First we will consider “coherent extrapolated volition,” an indirect normativity proposal outlined by Eliezer Yudkowsky. We will then introduce some variations and alternatives, to give us a sense of the range of available options.
However, although it would be difficult to know with precision what humanity’s CEV would wish, it is possible to make informed guesses. This is possible even today, without superintelligence. For example, it is more plausible that our CEV would wish for there to be people in the future who live rich and happy lives than that it would wish that we should all sit on stools in a dark room experiencing pain. If we can make at least some such judgments sensibly, so can a superintelligence.
Another objection is that there are so many different ways of life and moral codes in the world that it might not be possible to “blend” them into one CEV. Even if one could blend them, the result might not be particularly appetizing—one would be unlikely to get a delicious meal by mixing together all the best flavors from everyone’s different favorite dish.13 In answer to this, one could point out that the CEV approach does not require that all ways of life, moral codes, or personal values be blended together into one stew. The CEV dynamic is supposed to act only when our wishes cohere. On issues on which there is widespread irreconcilable disagreement, even after the various idealizing conditions have been imposed, the dynamic should refrain from determining the outcome.
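A highly simplified sketch of the "act only where wishes cohere" idea; this is my illustration and omits everything that makes the real extrapolation step hard. Given extrapolated verdicts from many individuals, the dynamic returns an output only when agreement clears a threshold and otherwise refrains.

```python
# Highly simplified sketch of "act only where our wishes cohere" (my
# illustration; the real CEV proposal is far more involved). Given extrapolated
# verdicts from many individuals on an issue, act only if the level of
# agreement clears a threshold; otherwise refrain and leave the issue open.
from collections import Counter

def cev_decision(extrapolated_verdicts, coherence_threshold=0.9):
    counts = Counter(extrapolated_verdicts)
    verdict, n = counts.most_common(1)[0]
    if n / len(extrapolated_verdicts) >= coherence_threshold:
        return verdict
    return None   # widespread irreconcilable disagreement: do not determine the outcome

print(cev_decision(["flourishing"] * 95 + ["other"] * 5))    # -> "flourishing"
print(cev_decision(["policy X"] * 55 + ["policy Y"] * 45))   # -> None (refrain)
```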
The structure of the CEV approach thus allows for a virtually unlimited range of outcomes. It is also conceivable that humanity’s extrapolated volition would wish that the CEV does nothing at all. In that case, the AI implementing CEV should, upon having established with sufficient probability that this is what humanity’s extrapolated volition would wish it to do, safely shut itself down.
One parameter is the extrapolation base: Whose volitions are to be included? We might say “everybody,” but this answer spawns a host of further questions. Does the extrapolation base include so-called “marginal persons” such as embryos, fetuses, brain-dead persons, patients with severe dementias or who are in permanent vegetative states? Does each of the hemispheres of a “split-brain” patient get its own weight in the extrapolation and is this weight the same as that of the entire brain of a normal subject? What about people who lived in the past but are now dead? People who will be born in the future? Higher animals and other sentient creatures? Digital minds? Extraterrestrials?
What if we are not sure whether moral realism is true? We could still attempt the MR proposal. We should just have to make sure to specify what the AI should do in the eventuality that its presupposition of moral realism is false. For example, we could stipulate that if the AI estimates with a sufficient probability that there are no suitable non-relative truths about moral rightness, then it should revert to implementing coherent extrapolated volition instead, or simply shut itself down.
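The fallback stipulated here is essentially a conditional. The sketch below uses hypothetical names throughout; `probability_no_moral_truths` stands in for whatever estimate the AI would form.

```python
# Sketch of the stipulated fallback (hypothetical names throughout): if the AI
# judges it sufficiently probable that there are no suitable non-relative moral
# truths, it falls back to coherent extrapolated volition, or shuts down.

def choose_normative_target(probability_no_moral_truths: float,
                            threshold: float = 0.95,
                            fallback: str = "coherent extrapolated volition") -> str:
    if probability_no_moral_truths >= threshold:
        return fallback          # or "shut down", if no fallback is wanted
    return "do what is morally right (MR)"

print(choose_normative_target(0.10))   # presupposition likely holds -> pursue MR
print(choose_normative_target(0.99))   # presupposition likely fails -> fall back to CEV
```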
The path to endowing an AI with any of these concepts might involve giving it general linguistic ability (comparable, at least, to that of a normal human adult). Such a general ability to understand natural language could then be used to understand what is meant by “morally right.”
A more fundamental issue with MR is that even if it can be implemented, it might not give us what we want or what we would choose if we were brighter and better informed. This is of course the essential feature of MR, not an accidental bug. However, it might be a feature that would be extremely harmful to us.
It is no good accelerating the development of a desirable technology Y if the only way of getting Y is by developing an extremely undesirable precursor technology X, or if getting Y would immediately produce an extremely undesirable related technology Z. Before you marry your sweetheart, consider the prospective in-laws.
Future Scenarios and Existential Risk
This is not false modesty: for while I believe that my book is likely to be seriously wrong and misleading, I think that the alternative views that have been presented in the literature are substantially worse—including the default view, or “null hypothesis,” according to which we can for the time being safely or reasonably ignore the prospect of superintelligence.
Consider how the rate of progress in the field of artificial intelligence would change in a world where Average Joe is an intellectual peer of Alan Turing or John von Neumann, and where millions of people tower far above any intellectual giant of the past.
Many machines and nonhuman animals already perform at superhuman levels in narrow domains. Bats interpret sonar signals better than man, calculators outperform us in arithmetic, and chess programs beat us in chess. The range of specific tasks that can be better performed by software will continue to expand. But although specialized information processing systems will have many uses, there are additional profound issues that arise only with the prospect of machine intellects that have enough general intelligence to substitute for humans across the board.
Superintelligence in any of these forms could, over time, develop the technology necessary to create any of the others.
Most preparations undertaken before onset of the slow takeoff would be rendered obsolete as better solutions would gradually become visible in the light of the dawning era.
A fast takeoff occurs over some short temporal interval, such as minutes, hours, or days. Fast takeoff scenarios offer scant opportunity for humans to deliberate. Nobody need even notice anything unusual before the game is already lost.
slow takeoff.10 In many cases, the laggard’s project
Various considerations thus point to an increased likelihood that a future power with superintelligence that obtained a sufficiently large strategic advantage would actually use it to form a singleton. The desirability of such an outcome depends, of course, on the nature of the singleton that would be created and also on what the future of intelligent life would look like in alternative multipolar scenarios.
The principal reason for humanity’s dominant position on Earth is that our brains have a slightly expanded set of faculties compared with other animals.1 Our greater intelligence lets us transmit culture more efficiently, with the result that knowledge and technology accumulates from one generation to the next. By now sufficient content has accumulated to make possible space flight, H-bombs, genetic engineering, computers, factory farms, insecticides, the international peace movement, and all the accouterments of modern civilization. Geologists have started referring to the present era as the Anthropocene in recognition of the distinctive biotic, sedimentary, and geochemical signatures of human activities. On one estimate, we appropriate 24% of the planetary ecosystem’s net primary production. And yet we are far from having reached the physical limits of technology. These observations make it plausible that any type of entity that developed a much greater than human level of intelligence would be potentially extremely powerful.
The magnitudes of the advantages are such as to suggest that rather than thinking of a superintelligent AI as smart in the sense that a scientific genius is smart compared with the average human being, it might be closer to the mark to think of such an AI as smart in the sense that an average human being is smart compared with a beetle or a worm.
This is the main reason why the question of takeoff speed is important—not because it matters exactly when a particular outcome happens, but because the speed of the takeoff may make a big difference to what the outcome will be. With a fast or medium takeoff, it is likely that one project will get a decisive strategic advantage. We have now suggested that a superintelligence with a decisive strategic advantage would have immense powers, enough that it could form a stable singleton—a singleton that could determine the disposition of humanity’s cosmic endowment.
An existential risk is one that threatens to cause the extinction of Earth-originating intelligent life or to otherwise permanently and drastically destroy its potential for future desirable development. Proceeding from the idea of first-mover advantage, the orthogonality thesis, and the instrumental convergence thesis, we can now begin to see the outlines of an argument for fearing that a plausible default outcome of the creation of machine superintelligence is existential catastrophe.
A superintelligence might threaten to mistreat, or commit to reward, sentient simulations in order to blackmail or incentivize various external agents; or it might create simulations in order to induce indexical uncertainty in outside observers.
Present-day search processes are not hazardous because they are too weak to discover the kind of plan that could enable a program to take over the world. Such a plan would include extremely difficult steps, such as the invention of a new weapons technology several generations ahead of the state of the art or the execution of a propaganda campaign far more effective than any communication devised by human spin doctors. To have a chance of even conceiving of such ideas, let alone developing them in a way that would actually work, a machine would probably need the capacity to represent the world in a way that is at least as rich and realistic as the world model possessed by a normal human adult (though a lack of awareness in some areas might possibly be compensated for by extra skill in others). This is far beyond the reach of contemporary AI.
If one is interested in the outcome of singleton scenarios, therefore, one really only has three sources of information: information about matters that cannot be affected by the actions of the singleton (such as the laws of physics); information about convergent instrumental values; and information that enables one to predict or speculate about what final values the singleton will have. In multipolar scenarios, an additional set of constraints comes into play, constraints having to do with how agents interact. The social dynamics emerging from such interactions can be studied using techniques from game theory, economics, and evolution theory. Elements of political science and sociology are also relevant insofar as they can be distilled and abstracted from some of the more contingent features of human experience.
In the United States, there were about 26 million horses in 1915. By the early 1950s, 2 million remained.
Imagine running on a treadmill at a steep incline—heart pounding, muscles aching, lungs gasping for air. A glance at the timer: your next break, which will also be your death, is due in 49 years, 3 months, 20 days, 4 hours, 56 minutes, and 12 seconds. You wish you had not been born.
We could thus imagine, as an extreme case, a technologically highly advanced society, containing many complex structures, some of them far more intricate and intelligent than anything that exists on the planet today—a society which nevertheless lacks any type of being that is conscious or whose welfare has moral significance. In a sense, this would be an uninhabited society. It would be a society of economic miracles and technological awesomeness, with nobody there to benefit. A Disneyland without children.
Or an AI might be given a prior that assigns a zero probability to the universe not being Turing-computable (this is in fact a common feature of many of the priors discussed in the literature, including the Kolmogorov complexity prior mentioned in Chapter 1), again with poorly understood consequences if the embedded assumption (a physical variant of the Church–Turing thesis) should turn out to be false.
Technological completion conjecture: If scientific and technological development efforts do not effectively cease, then all important basic capabilities that could be obtained through some possible technology will be obtained.
The principle of differential technological development: Retard the development of dangerous and harmful technologies, especially ones that raise the level of existential risk; and accelerate the development of beneficial technologies, especially those that reduce the existential risks posed by nature or by other technologies.
The longer it takes for superintelligence to arrive, the more such progress will have been made when it does. This is an important consideration in favor of later arrival dates—and a very strong consideration against extremely early arrival dates.
If you came upon a magic lever that would let you change the rate of macro-structural development, what should you do? Ought you to accelerate, decelerate, or leave things as they are?
A state risk is one that is associated with being in a certain state, and the total amount of state risk to which a system is exposed is a direct function of how long the system remains in that state. Risks from nature are typically state risks: the longer we remain exposed, the greater the chance that we will get struck by an asteroid, supervolcanic eruption, gamma ray burst, naturally arising pandemic, or some other slash of the cosmic scythe. Some anthropogenic risks are also state risks. At the level of an individual, the longer a soldier pokes his head up above the parapet, the greater the cumulative chance he will be shot by an enemy sniper. There are anthropogenic state risks at the existential level as well: the longer we live in an internationally anarchic system, the greater the cumulative chance of a thermonuclear Armageddon or of a great war fought with other kinds of weapons of mass destruction, laying waste to civilization.
Insofar as we are concerned with existential state risks, we should favor acceleration—provided we think we have a realistic prospect of making it through to a post-transition era in which any further existential risks are greatly reduced.

If it were known that there is some step ahead destined to cause an existential catastrophe, then we ought to reduce the rate of macro-structural development (or even put it in reverse) in order to give more generations a chance to exist before the curtain is rung down. But, in fact, it would be overly pessimistic to be so confident that humanity is doomed.

At present, the level of existential state risk appears to be relatively low. If we imagine the technological macro-conditions for humanity frozen in their current state, it seems very unlikely that an existential catastrophe would occur on a timescale of, say, a decade. So a delay of one decade—provided it occurred at our current stage of development or at some other time when state risk is low—would incur only a very minor existential state risk, whereas a postponement by one decade of subsequent technological developments might well have a significant beneficial impact on later existential step risks, for example by allowing more time for preparation.
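For readers who want the arithmetic behind claims like this, here is a minimal sketch, assuming a constant annual hazard rate (the rates used are arbitrary placeholders, not estimates from the text): the cumulative state risk over T years is 1 − (1 − p)^T, which stays small for a short stay at a low rate and grows toward certainty for a long stay at any nonzero rate.

```python
# Minimal sketch of cumulative "state risk" under a constant annual hazard
# rate. The rates below are arbitrary placeholders, not estimates from the
# text; the point is only the shape of the relationship.

def cumulative_state_risk(annual_rate: float, years: float) -> float:
    """Probability of at least one catastrophe while remaining in the state."""
    return 1.0 - (1.0 - annual_rate) ** years

for rate in (0.0001, 0.001):          # hypothetical annual hazard rates
    for years in (10, 100, 1000):
        risk = cumulative_state_risk(rate, years)
        print(f"annual rate {rate:.4f}, {years:4d} years -> cumulative risk {risk:.3f}")
```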
Upshot: the main way that the speed of macro-structural development is important is by affecting how well prepared humanity is when the time comes to confront the key step risks.
If—as there is reason to believe—such neuromorphic AI is worse than the kind of AI that would otherwise have been built, and if by promoting whole brain emulation we would make neuromorphic AI arrive first, then our pursuit of the supposed best outcome (whole brain emulation) would lead to the worst outcome (neuromorphic AI); whereas if we had pursued the second-best outcome (synthetic AI) we might actually have attained the second-best (synthetic AI).
A related type of argument is that we ought—rather callously—to welcome small and medium-scale catastrophes on grounds that they make us aware of our vulnerabilities and spur us into taking precautions that reduce the probability of an existential catastrophe. The idea is that a small or medium-scale catastrophe acts like an inoculation, challenging civilization with a relatively survivable form of a threat and stimulating an immune response that readies the world to deal with the existential variety of the threat. These “shock-’em-into-reacting” arguments advocate letting something bad happen in the hope that it will galvanize a public reaction. We mention them here not to endorse them, but as a way to introduce the idea of (what we will term) “second-guessing arguments.” Such arguments maintain that by treating others as irrational and playing to their biases and misconceptions it is possible to elicit a response from them that is more competent than if a case had been presented honestly and forthrightly to their rational faculties. It may seem unfeasibly difficult to use the kind of stratagems recommended by second-guessing arguments to achieve long-term global goals.
Hastening or delaying the onset of the intelligence explosion is not the only channel through which the rate of hardware progress can affect existential risk. Another channel is that hardware can to some extent substitute for software; thus, better hardware reduces the minimum skill required to code a seed AI. Fast computers might also encourage the use of approaches that rely more heavily on brute-force techniques (such as genetic algorithms and other generate-evaluate-discard methods) and less on techniques that require deep understanding to use. If brute-force techniques lend themselves to more anarchic or imprecise system designs, where the control problem is harder to solve than in more precisely engineered and theoretically controlled systems, this would be another way in which faster computers would increase the existential risk. Another consideration is that rapid hardware progress increases the likelihood of a fast takeoff. The more rapidly the state of the art advances in the semiconductor industry, the fewer the person-hours of programmers’ time spent exploiting the capabilities of computers at any given performance level. This means that an intelligence explosion is less likely to be initiated at the lowest level of hardware performance at which it is feasible. An intelligence explosion is thus more likely to be initiated when hardware has advanced significantly beyond the minimum level at which the eventually successful programming approach could first have succeeded.
But let us assume, for the sake of argument, that we actually achieve whole brain emulation (WBE). Would this be safer than AI? This, itself, is a complicated issue. There are at least three putative advantages of WBE: (i) that its performance characteristics would be better understood than those of AI; (ii) that it would inherit human motives; and (iii) that it would result in a slower takeoff.
The race dynamic could spur projects to move faster toward superintelligence while reducing investment in solving the control problem. Additional detrimental effects of the race dynamic are also possible, such as direct hostilities between competitors. Suppose that two nations are racing to develop the first superintelligence, and that one of them is seen to be pulling ahead. In a winner-takes-all situation, a lagging project might be tempted to launch a desperate strike against its rival rather than passively await defeat. Anticipating this possibility, the frontrunner might be tempted to strike preemptively. If the antagonists are powerful states, the clash could be bloody. (A “surgical strike” against the rival’s AI project might risk triggering a larger confrontation and might in any case not be feasible if the host country has taken precautions.)
This last statement must be flanked by two important qualifications. The first is that many people care about rank. If multiple agents each want to top the Forbes rich list, then no resource pie is large enough to give everybody full satisfaction. The second qualification is that the post-transition technology base would enable material resources to be converted into an unprecedented range of products, including some goods that are not currently available at any price even though they are highly valued by many humans. A billionaire does not live a thousand times longer than a millionaire. In the era of digital minds, however, the billionaire could afford a thousandfold more computing power and could thus enjoy a thousandfold longer subjective lifespan. Mental capacity, likewise, could be for sale. In such circumstances, with economic capital convertible into vital goods at a constant rate even for great levels of wealth, unbounded greed would make more sense than it does in today’s world where the affluent (those among them lacking a philanthropic heart) are reduced to spending their riches on airplanes, boats, art collections, or a fourth and a fifth residence.
Philosophy covers some problems that are relevant to existential risk mitigation—we encountered several in this book. Yet there are also subfields within philosophy that have no apparent link to existential risk or indeed any practical concern. As with pure mathematics, some of the problems that philosophy studies might be regarded as intrinsically important, in the sense that humans have reason to care about them independently of any practical application. The fundamental nature of reality, for instance, might be worth knowing about, for its own sake. The world would arguably be less glorious if nobody studied metaphysics, cosmology, or string theory. However, the dawning prospect of an intelligence explosion shines a new light on this ancient quest for wisdom.
Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. Such is the mismatch between the power of our plaything and the immaturity of our conduct. Superintelligence is a challenge for which we are not ready now and will not be ready for a long time.
For a child with an undetonated bomb in its hands, a sensible thing to do would be to put it down gently, quickly back out of the room, and contact the nearest adult. Yet what we have here is not one child but many, each with access to an independent trigger mechanism. The chances that we will all find the sense to put down the dangerous stuff seem almost negligible. Some little idiot is bound to press the ignite button just to see what happens.
Yet let us not lose track of what is globally significant. Through the fog of everyday trivialities, we can perceive—if but dimly—the essential task of our age. In this book, we have attempted to discern a little more feature in what is otherwise still a relatively amorphous and negatively defined vision—one that presents as our principal moral priority (at least from an impersonal and secular perspective) the reduction of existential risk and the attainment of a civilizational trajectory that leads to a compassionate and jubilant use of humanity’s cosmic endowment.
Human Enhancement and Evolution
then become susceptible to selection.41 Embryo selection does not require a deep understanding of the causal pathways by which genes, in complicated interplay with environments, produce phenotypes: it requires only (lots of) data on the genetic correlates of the traits of interest.
Table 5  Maximum IQ gains from selecting among a set of embryos

Selection                                                      IQ points gained
1 in 2                                                         4.2
1 in 10                                                        11.5
1 in 100                                                       18.8
1 in 1000                                                      24.3
5 generations of 1 in 10                                       < 65 (because of diminishing returns)
10 generations of 1 in 10                                      < 130 (because of diminishing returns)
Cumulative limits (additive variants optimized for cognition)  100+ (< 300 because of diminishing returns)

Interestingly, the diminishment of returns is greatly abated when the selection is spread over multiple generations. Thus, repeatedly selecting the top 1 in 10 over ten generations (where each new generation consists of the offspring of those selected in the previous generation) will produce a much greater increase in the trait value than a one-off selection of 1 in 100. The problem with sequential selection, of course, is that it takes longer.
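A rough way to see where the single-generation numbers come from is order statistics: picking the best of n embryos is like taking the maximum of n draws from a normal distribution of polygenic scores. The sketch below estimates that maximum by Monte Carlo and scales it by an assumed standard deviation of 7.5 IQ points for the usable genetic predictor; the 7.5 figure is an assumption chosen to be roughly consistent with the table, not a number given in the text, and the sketch ignores the multi-generation and diminishing-returns effects discussed above.

```python
# Monte Carlo sketch of single-generation embryo selection as an order-
# statistics problem. PREDICTOR_SD is an assumption chosen to be roughly
# consistent with Table 5, not a parameter given in the text.
import numpy as np

PREDICTOR_SD = 7.5   # assumed SD (in IQ points) of the usable polygenic predictor
rng = np.random.default_rng(0)

def expected_gain(n_embryos: int, trials: int = 10_000) -> float:
    """Expected IQ gain from implanting the best-scoring of n_embryos embryos."""
    scores = rng.standard_normal((trials, n_embryos))
    return PREDICTOR_SD * scores.max(axis=1).mean()

for n in (2, 10, 100, 1000):
    print(f"1 in {n:4d}: ~{expected_gain(n):.1f} IQ points")
```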
With stem cell-derived gametes, the amount of selection power available to a couple could be greatly increased. In current practice, an in vitro fertilization procedure typically involves the creation of fewer than ten embryos. With stem cell-derived gametes, a few donated cells might be turned into a virtually unlimited number of gametes that could be combined to produce embryos, which could then be genotyped or sequenced, and the most promising one chosen for implantation. Depending on the cost of preparing and screening each individual embryo, this technology could yield a severalfold increase in the selective power available to couples using in vitro fertilization.

More importantly still, stem cell-derived gametes would allow multiple generations of selection to be compressed into less than a human maturation period, by enabling iterated embryo selection. This is a procedure that would consist of the following steps:

1. Genotype and select a number of embryos that are higher in desired genetic characteristics.
2. Extract stem cells from those embryos and convert them to sperm and ova, maturing within six months or less.
3. Cross the new sperm and ova to produce embryos.
4. Repeat until large genetic changes have been accumulated.

In this manner, it would be possible to accomplish ten or more generations of selection in just a few years. (The procedure would be time-consuming and expensive; however, in principle, it would need to be done only once rather than repeated for each birth. The cell lines established at the end of the procedure could be used to generate very large numbers of enhanced embryos.) As Table 5 indicates, the average level of intelligence among individuals conceived in this manner could be very high, possibly equal to or somewhat above that of the most intelligent individual in the historical human population. A world that had a large population of such individuals might (if it had the culture, education, communications infrastructure, etc., to match) constitute a collective superintelligence.

The impact of this technology will be dampened and delayed by several factors. There is the unavoidable maturational lag while the finally selected embryos grow into adult human beings: at least twenty years before an enhanced child reaches full productivity, longer still before such children come to constitute a substantial segment of the labor force. Furthermore, even after the technology has been perfected, adoption rates will probably start out low. Some countries might prohibit its use altogether, on moral or religious grounds. Even where selection is allowed, many couples will prefer the natural way of conceiving. Willingness to use IVF, however, would increase if there were clearer benefits associated with the procedure—such as a virtual guarantee that the child would be highly talented and free from genetic predispositions to disease. Lower health care costs and higher expected lifetime earnings would also argue in favor of genetic selection.
With further advances in genetic technology, it may become possible to synthesize genomes to specification, obviating the need for large pools of embryos. DNA synthesis is already a routine and largely automated biotechnology, though it is not yet feasible to synthesize an entire human genome that could be used in a reproductive context (not least because of still-unresolved difficulties in getting the epigenetics right). But once this technology has matured, an embryo could be designed with the exact preferred combination of genetic inputs from each parent. Genes that are present in neither of the parents could also be spliced in, including alleles that are present with low frequency in the population but which may have significant positive effects on cognition.
With the full development of the genetic technologies described above (setting aside the more exotic possibilities such as intelligence in cultured neural tissue), it might be possible to ensure that new individuals are on average smarter than any human who has yet existed, with peaks that rise higher still. The potential of biological enhancement is thus ultimately high, probably sufficient for the attainment of at least weak forms of superintelligence. This should not be surprising. After all, dumb evolutionary processes have dramatically amplified the intelligence in the human lineage even compared with our close relatives the great apes and our own humanoid ancestors; and there is no reason to suppose Homo sapiens to have reached the apex of cognitive effectiveness attainable in a biological system.
Far from being the smartest possible biological species, we are probably better thought of as the stupidest possible biological species capable of starting a technological civilization—a niche we filled because we got there first, not because we are in any sense optimally adapted to it.
(1) at least weak forms of superintelligence are achievable by means of biotechnological enhancements; (2) the feasibility of cognitively enhanced humans adds to the plausibility that advanced forms of machine intelligence are feasible—because even if we were fundamentally unable to create machine intelligence (which there is no reason to suppose), machine intelligence might still be within reach of cognitively enhanced humans; and (3) when we consider scenarios stretching significantly into the second half of this century and beyond, we must take into account the probable emergence of a generation of genetically enhanced populations—voters, inventors, scientists—with the magnitude of enhancement escalating rapidly over subsequent decades.
Most of the potential benefits that brain implants could provide in healthy subjects could be obtained at far less risk, expense, and inconvenience by using our regular motor and sensory organs to interact with computers located outside of our bodies. We do not need to plug a fiber optic cable into our brains in order to access the Internet. Not only can the human retina transmit data at an impressive rate of nearly 10 million bits per second, but it comes pre-packaged with a massive amount of dedicated wetware, the visual cortex, that is highly adapted to extracting meaning from this information torrent and to interfacing with other brain areas for further processing. Even if there were an easy way of pumping more information into our brains, the extra data inflow would do little to increase the rate at which we think and learn unless all the neural machinery necessary for making sense of the data were similarly upgraded. Since this includes almost all of the brain, what would really be needed is a “whole brain prosthesis”—which is just another way of saying artificial general intelligence.
In some domains, quantity is a poor substitute for quality. One solitary genius working out of a cork-lined bedroom can write In Search of Lost Time. Could an equivalent masterpiece be produced by recruiting an office building full of literary hacks?15 Even within the range of present human variation we see that some functions benefit greatly from the labor of one brilliant mastermind as opposed to the joint efforts of myriad mediocrities. If we widen our purview to include superintelligent minds, we must countenance a likelihood of there being intellectual problems solvable only by superintelligence and intractable to any ever-so-large collective of non-augmented humans.
Consider, first, that many of the costly displays we find in nature are linked to sexual selection.32 Reproduction among technologically mature life forms, in contrast, may be predominantly or exclusively asexual.
However, cognitive enhancement could also hasten
Macro-structural development accelerator—A lever that accelerates the rate at which macro-structural features of the human condition develop, while leaving unchanged the rate at which micro-level human affairs unfold.
For most of our species’ existence, macro-structural development was slower than it is now. Fifty thousand years ago, an entire millennium might have elapsed without a single significant technological invention, without any noticeable increase in human knowledge and understanding, and without any globally meaningful political change. On a micro-level, however, the kaleidoscope of human affairs churned at a reasonable rate, with births, deaths, and other personally and locally significant events. The average person’s day might have been more action-packed in the Pleistocene than it is today.
Cognitive Architecture and Intelligence
An artificial intelligence need not much resemble a human mind. AIs could be—indeed, it is likely that most will be—extremely alien. We should expect that they will have very different cognitive architectures than biological intelligences, and in their early stages of development they will have very different profiles of cognitive strengths and weaknesses (though, as we shall later argue, they could eventually overcome any initial weakness). Furthermore, the goal systems of AIs could diverge radically from those of human beings. There is no reason to expect a generic AI to be motivated by love or hate or pride or other such common human sentiments: these complex adaptations would require deliberate expensive effort to recreate in AIs. This is at once a big problem and a big opportunity.
First, a sufficiently detailed scan of a particular human brain is created. This might involve stabilizing the brain post-mortem through vitrification (a process that turns tissue into a kind of glass). A machine could then dissect the tissue into thin slices, which could be fed into another machine for scanning, perhaps by an array of electron microscopes. Various stains might be applied at this stage to bring out different structural and chemical properties. Many scanning machines could work in parallel to process multiple brain slices simultaneously.

Second, the raw data from the scanners is fed to a computer for automated image processing to reconstruct the three-dimensional neuronal network that implemented cognition in the original brain. In practice, this step might proceed concurrently with the first step to reduce the amount of high-resolution image data stored in buffers. The resulting map is then combined with a library of neurocomputational models of different types of neurons or of different neuronal elements (such as particular kinds of synaptic connectors). Figure 4 shows some results of scanning and image processing produced with present-day technology.

In the third stage, the neurocomputational structure resulting from the previous step is implemented on a sufficiently powerful computer. If completely successful, the result would be a digital reproduction of the original intellect, with memory and personality intact. The emulated human mind now exists as software on a computer. The mind can either inhabit a virtual reality or interface with the external world by means of robotic appendages.
The whole brain emulation path does not require that we figure out how human cognition works or how to program an artificial intelligence. It requires only that we understand the low-level functional characteristics of the basic computational elements of the brain. No fundamental conceptual or theoretical breakthrough is needed for whole brain emulation to succeed. Whole brain emulation does, however, require some rather advanced enabling technologies. There are three key prerequisites: (1) scanning: high-throughput microscopy with sufficient resolution and detection of relevant properties; (2) translation: automated image analysis to turn raw scanning data into an interpreted three-dimensional model of relevant neurocomputational elements; and (3) simulation: hardware powerful enough to implement the resultant computational structure (see Table 4). (In comparison with these more challenging steps, the construction of a basic virtual reality or a robotic embodiment with an audiovisual input channel and some simple output channel is relatively easy. Simple yet minimally adequate I/O seems feasible already with present technology.)
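To make the three-stage structure concrete, here is a deliberately toy sketch; every quantity and function in it is a stand-in invented for illustration, at nothing like the scale or fidelity the text discusses.

```python
# Toy sketch of the scan -> translate -> simulate structure described above.
# Every function is a stand-in: real emulation would involve electron-
# microscopy volumes, automated reconstruction, and detailed neuron models
# at vastly greater scale.
import numpy as np

def scan(n_neurons: int, rng: np.random.Generator) -> np.ndarray:
    """Stage 1 stand-in: fabricate 'raw imaging data' for a tiny network."""
    return rng.random((n_neurons, n_neurons))

def translate(raw: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Stage 2 stand-in: interpret raw data as a signed synaptic weight matrix."""
    weights = np.where(raw > threshold, raw - threshold, 0.0)
    inhibitory = np.arange(raw.shape[1]) % 5 == 0        # mark every fifth neuron inhibitory
    signs = np.where(inhibitory, -1.0, 1.0)
    return weights * signs[None, :]                      # negate those neurons' outgoing weights

def simulate(weights: np.ndarray, steps: int = 50) -> np.ndarray:
    """Stage 3 stand-in: run simple rate dynamics on the interpreted model."""
    state = np.full(weights.shape[0], 0.1)
    for _ in range(steps):
        state = np.tanh(weights @ state + 0.05)          # toy update rule, not a neuron model
    return state

rng = np.random.default_rng(0)
activity = simulate(translate(scan(302, rng)))           # 302 neurons, a nod to C. elegans
print(activity[:5])
```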
In general, the worse our scanning equipment and the feebler our computers, the less we could rely on simulating low-level chemical and electrophysiological brain processes, and the more theoretical understanding would be needed of the computational architecture that we are seeking to emulate in order to create more abstract representations of the relevant functionalities. Conversely, with sufficiently advanced scanning technology and abundant computing power, it might be possible to brute-force an emulation even with a fairly limited understanding of the brain. In the unrealistic limiting case, we could imagine emulating a brain at the level of its elementary particles using the quantum mechanical Schrödinger equation. Then one could rely entirely on existing knowledge of physics and not at all on any biological model. This extreme case, however, would place utterly impracticable demands on computational power and data acquisition.
To assess the feasibility of whole brain emulation, one must understand the criterion for success. The aim is not to create a brain simulation so detailed and accurate that one could use it to predict exactly what would have happened in the original brain if it had been subjected to a particular sequence of stimuli. Instead, the aim is to capture enough of the computationally functional properties of the brain to enable the resultant emulation to perform intellectual work. For this purpose, much of the messy biological detail of a real brain is irrelevant.
For example, one could distinguish among (1) a high-fidelity emulation that has the full set of knowledge, skills, capacities, and values of the emulated brain; (2) a distorted emulation whose dispositions are significantly non-human in some ways but which is mostly able to do the same intellectual labor as the emulated brain; and (3) a generic emulation (which might also be distorted) that is somewhat like an infant, lacking the skills or memories that had been acquired by the emulated adult brain but with the capacity to learn most of what a normal human can learn.
Consider the humble model organism Caenorhabditis elegans, which is a transparent roundworm, about 1 mm in length, with 302 neurons. The complete connectivity matrix of these neurons has been known since the mid-1980s, when it was laboriously mapped out by means of slicing, electron microscopy, and hand-labeling of specimens. But knowing merely which neurons are connected with which is not enough. To create a brain emulation one would also need to know which synapses are excitatory and which are inhibitory; the strength of the connections; and various dynamical properties of axons, synapses, and dendritic trees. This information is not yet available even for the small nervous system of C. elegans (although it may now be within range of a targeted moderately sized research project). Success at emulating a tiny brain, such as that of C. elegans, would give us a better view of what it would take to emulate larger brains.
But what about the dream of bypassing words altogether and establishing a connection between two brains that enables concepts, thoughts, or entire areas of expertise to be “downloaded” from one mind to another? We can download large files to our computers, including libraries with millions of books and articles, and this can be done over the course of seconds: could something similar be done with our brains? The apparent plausibility of this idea probably derives from an incorrect view of how information is stored and represented in the brain. As noted, the rate-limiting step in human intelligence is not how fast raw data can be fed into the brain but rather how quickly the brain can extract meaning and make sense of the data. Perhaps it will be suggested that we transmit meanings directly, rather than package them into sensory data that must be decoded by the recipient. There are two problems with this. The first is that brains, by contrast to the kinds of program we typically run on our computers, do not use standardized data storage and representation formats. Rather, each brain develops its own idiosyncratic representations of higher-level content. Which particular neuronal assemblies are recruited to represent a particular concept depends on the unique experiences of the brain in question (along with various genetic factors and stochastic physiological processes). Just as in artificial neural nets, meaning in biological neural networks is likely represented holistically in the structure and activity patterns of sizeable overlapping regions, not in discrete memory cells laid out in neat arrays. It would therefore not be possible to establish a simple mapping between the neurons in one brain and those in another in such a way that thoughts could automatically slide over from one to the other. In order for the thoughts of one brain to be intelligible to another, the thoughts need to be decomposed and packaged into symbols according to some shared convention that allows the symbols to be correctly interpreted by the receiving brain. This is the job of language.
Suppose that the brain’s plasticity were such that it could learn to detect patterns in some new input stream arbitrarily projected onto some part of the cortex by means of a brain–computer interface: why not project the same information onto the retina instead, as a visual pattern, or onto the cochlea as sounds? The low-tech alternative avoids a thousand complications, and in either case the brain could deploy its pattern-recognition mechanisms and plasticity to learn to make sense of the information.
Speed superintelligence. A speed superintelligence is an intellect that is just like a human mind but faster. This is conceptually the easiest form of superintelligence to analyze.1 We can define speed superintelligence as follows:

Speed superintelligence: A system that can do all that a human intellect can do, but much faster.
The simplest example of speed superintelligence would be a whole brain emulation running on fast hardware. An emulation operating at a speed of ten thousand times that of a biological brain would be able to read a book in a few seconds and write a PhD thesis in an afternoon. With a speedup factor of a million, an emulation could accomplish an entire millennium of intellectual work in one working day.
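A quick back-of-the-envelope check of these figures, treating all subjective time during the wall-clock interval as available for intellectual work (an assumption the text’s round numbers appear to make), and assuming an 8-hour working day:

```python
# Back-of-the-envelope check of the speedup arithmetic above. WORKDAY_HOURS
# (an 8-hour wall-clock working day) is an assumption, not a figure from the
# text.

WORKDAY_HOURS = 8
HOURS_PER_YEAR = 24 * 365.25

def subjective_years_per_workday(speedup: float) -> float:
    """Subjective years that elapse for the emulation during one wall-clock workday."""
    return speedup * WORKDAY_HOURS / HOURS_PER_YEAR

for speedup in (1e4, 1e6):
    years = subjective_years_per_workday(speedup)
    print(f"speedup {speedup:.0e}: ~{years:,.0f} subjective years per wall-clock workday")
```

At a millionfold speedup this comes out to roughly nine hundred subjective years per workday, consistent with the “entire millennium in one working day” figure above.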
Because of this apparent time dilation of the material world, a speed superintelligence would prefer to work with digital objects. It could live in virtual reality and deal in the information economy. Alternatively, it could interact with the physical environment by means of nanoscale manipulators, since limbs at such small scales could operate faster than macroscopic appendages. (The characteristic frequency of a system tends to be inversely proportional to its length scale.)
Collective superintelligence. Another form of superintelligence is a system achieving superior performance by aggregating large numbers of smaller intelligences:

Collective superintelligence: A system composed of a large number of smaller intellects such that the system’s overall performance across many very general domains vastly outstrips that of any current cognitive system.
Quality superintelligence. We can distinguish a third form of superintelligence.

Quality superintelligence: A system that is at least as fast as a human mind and vastly qualitatively smarter.
Nonhuman animals lack complex structured language; they are capable of no or only rudimentary tool use and tool construction; they are severely restricted in their ability to make long-term plans; and they have very limited abstract reasoning ability.
And although humanity’s complex technological civilization would be impossible without our massive advantage in collective intelligence, not all distinctly human cognitive capabilities depend on collective intelligence. Many are highly developed even in small, isolated hunter–gatherer bands.13 And many are not nearly as highly developed among highly organized nonhuman animals, such as chimpanzees and dolphins intensely trained by human instructors, or ants living in their own large and well-ordered societies. Evidently, the remarkable intellectual achievements of Homo sapiens are to a significant extent attributable to specific features of our brain architecture, features that depend on a unique genetic endowment not shared by other animals.
At most, we might say that, ceteris paribus, speed superintelligence excels at tasks requiring the rapid execution of a long series of steps that must be performed sequentially while collective superintelligence excels at tasks admitting of analytic decomposition into parallelizable sub-tasks and tasks demanding the combination of many different perspectives and skill sets. In some vague sense, quality superintelligence would be the most capable form of all, inasmuch as it could grasp and solve problems that are, for all practical purposes, beyond the direct reach of speed superintelligence and collective superintelligence.
The hardware advantages are easiest to appreciate:

Speed of computational elements. Biological neurons operate at a peak speed of about 200 Hz, a full seven orders of magnitude slower than a modern microprocessor (~ 2 GHz).19 As a consequence, the human brain is forced to rely on massive parallelization and is incapable of rapidly performing any computation that requires a large number of sequential operations.20 (Anything the brain does in under a second cannot use much more than a hundred sequential operations—perhaps only a few dozen.)
Internal communication speed. Axons carry action potentials at speeds of 120 m/s or less, whereas electronic processing cores can communicate optically at the speed of light (300,000,000 m/s). The sluggishness of neural signals limits how big a biological brain can be while functioning as a single processing unit. For example, to achieve a round-trip latency of less than 10 ms between any two elements in a system, biological brains must be smaller than 0.11 m³. An electronic system, on the other hand, could be 6.1 × 10¹⁷ m³, about the size of a dwarf planet: eighteen orders of magnitude larger. (A worked version of this bound is sketched after this list of advantages.)
Number of computational elements. The human brain has somewhat fewer than 100 billion neurons.23 Humans have about three and a half times the brain size of chimpanzees (though only one-fifth the brain size of sperm whales). The number of neurons in a biological creature is most obviously limited by cranial volume and metabolic constraints, but other factors may also be significant for larger brains (such as cooling, development time, and signal-conductance delays—see the previous point). By contrast, computer hardware is indefinitely scalable up to very high physical limits. Supercomputers can be warehouse-sized or larger, with additional remote capacity added via high-speed cables.
Storage capacity. Human working memory is able to hold no more than some four or five chunks of information at any given time.27 While it would be misleading to compare the size of human working memory directly with the amount of RAM in a digital computer, it is clear that the hardware advantages of digital intelligences will make it possible for them to have larger working memories. This might enable such minds to intuitively grasp complex relationships that humans can only fumblingly handle via plodding calculation.28 Human long-term memory is also limited, though it is unclear whether we manage to exhaust its storage capacity during the course of an ordinary lifetime—the rate at which we accumulate information is so slow. (On one estimate, the adult human brain stores about one billion bits—a couple of orders of magnitude less than a low-end smartphone.) Both the amount of information stored and the speed with which it can be accessed could thus be vastly greater in a machine brain than in a biological brain.
Reliability, lifespan, sensors, etc. Machine intelligences might have various other hardware advantages. For example, biological neurons are less reliable than transistors. Since noisy computing necessitates redundant encoding schemes that use multiple elements to encode a single bit of information, a digital brain might derive some efficiency gains from the use of reliable high-precision computing elements. Brains become fatigued after a few hours of work and start to permanently decay after a few decades of subjective time; microprocessors are not subject to these limitations. Data flow into a machine intelligence could be increased by adding millions of sensors.
Editability. It is easier to experiment with parameter variations in software than in neural wetware. For example, with a whole brain emulation one could easily trial what happens if one adds more neurons in a particular cortical area or if one increases or decreases their excitability. Running such experiments in living biological brains would be far more difficult.
Duplicability. With software, one can quickly make arbitrarily many high-fidelity copies to fill the available hardware base. Biological brains, by contrast, can be reproduced only very slowly; and each new instance starts out in a helpless state, remembering nothing of what its parents learned in their lifetimes.
Memory sharing. Biological brains need extended periods of training and mentorship whereas digital minds could acquire new memories and skills by swapping data files. A population of a billion copies of an AI program could synchronize their databases periodically, so that all the instances of the program know everything that any instance learned during the previous hour.
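Here is the worked version of the internal-communication-speed bound promised above. The axon speed and the 10 ms round-trip budget come from the text; modeling the bound as a sphere whose diameter equals the one-way signal range, and using the vacuum speed of light for the electronic case, are simplifying assumptions, which is why the electronic figure comes out a few times above the 6.1 × 10¹⁷ m³ quoted (a lower effective signal speed shrinks it toward that figure).

```python
# Worked version of the communication-latency bound: if any two elements must
# exchange a signal within a given round-trip budget, the whole system must
# fit inside a sphere whose diameter is the one-way signal range. Treating the
# bound as a sphere and using the vacuum speed of light for the electronic
# case are simplifying assumptions.
import math

ROUND_TRIP_S = 0.010     # 10 ms budget, as in the text

def max_volume_m3(signal_speed_m_s: float, round_trip_s: float = ROUND_TRIP_S) -> float:
    one_way = signal_speed_m_s * round_trip_s / 2.0   # farthest any two elements can be apart
    radius = one_way / 2.0                            # that separation is the sphere's diameter
    return (4.0 / 3.0) * math.pi * radius ** 3

print(f"biological brain (axons at 120 m/s): {max_volume_m3(120):.2f} m^3")    # ~0.11 m^3
print(f"electronic system (light, 3e8 m/s):  {max_volume_m3(3e8):.1e} m^3")    # ~1.8e18 m^3
```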
The process of solving a jigsaw puzzle starts out simple—it is easy to find the corners and the edges. Then recalcitrance goes up as subsequent pieces are harder to fit. But as the puzzle nears completion, the search space collapses and the process gets easier again.
It is also possible that our natural tendency to view intelligence from an anthropocentric perspective will lead us to underestimate improvements in sub-human systems, and thus to overestimate recalcitrance. Eliezer Yudkowsky, an AI theorist who has written extensively on the future of machine intelligence, puts the point as follows: AI might make an apparently sharp jump in intelligence purely as the result of anthropomorphism, the human tendency to think of “village idiot” and “Einstein” as the extreme ends of the intelligence scale, instead of nearly indistinguishable points on the scale of minds-in-general. Everything dumber than a dumb human may appear to us as simply “dumb”. One imagines the “AI arrow” creeping steadily up the scale of intelligence, moving past mice and chimpanzees, with AIs still remaining “dumb” because AIs cannot speak fluent language or write science papers, and then the AI arrow crosses the tiny gap from infra-idiot to ultra-Einstein in the course of one month or some similarly short period.
An alternative way of expressing much the same idea is by saying that a system’s intellectual problem-solving capacity can be enhanced not only by making the system cleverer but also by expanding what the system knows.
Figure 8 A less anthropomorphic scale? The gap between a dumb and a clever person may appear large from an anthropocentric perspective, yet in a less parochial view the two have nearly indistinguishable minds. It will almost certainly prove harder and take longer to build a machine intelligence that has a general level of smartness comparable to that of a village idiot than to improve such a system so that it becomes much smarter than any human.
There is little point in reading an entire library if you have forgotten all about the aardvark by the time you get to the abalone. While an AI system is likely to have adequate memory capacity, emulations would inherit some of the capacity limitations of their human templates. They may therefore need architectural enhancements in order to become capable of unbounded learning.
This problem of internal coordination would not apply to an AI system that constitutes a single unified agent.
For example, a common assumption is that a superintelligent machine would be like a very clever but nerdy human being. We imagine that the AI has book smarts but lacks social savvy, or that it is logical but not intuitive and creative. This idea probably originates in observation: we look at present-day computers and see that they are good at calculation, remembering facts, and at following the letter of instructions while being oblivious to social contexts and subtexts, norms, emotions, and politics. The association is strengthened when we observe that the people who are good at working with computers tend themselves to be nerds. So it is natural to assume that more advanced computational intelligence will have similar attributes, only to a higher degree.
Eliezer Yudkowsky, as we saw in an earlier chapter, has been particularly emphatic in condemning this kind of misconception: our intuitive concepts of “smart” and “stupid” are distilled from our experience of variation over the range of human thinkers, yet the differences in cognitive ability within this human cluster are trivial in comparison to the differences between any human intellect and a superintelligence.
A system that has the intelligence amplification superpower could use it to bootstrap itself to higher levels of intelligence and to acquire any of the other intellectual superpowers that it does not possess at the outset. But using an intelligence amplification superpower is not the only way for a system to become a full-fledged superintelligence. A system that has the strategizing superpower, for instance, might use it to devise a plan that will eventually bring an increase in intelligence (e.g. by positioning the system so as to become the focus for intelligence amplification work performed by human programmers and computer science researchers).
Consider two persons who seem extremely unlike, perhaps Hannah Arendt and Benny Hill. The personality differences between these two individuals may seem almost maximally large. But this is because our intuitions are calibrated on our experience, which samples from the existing human distribution (and to some extent from fictional personalities constructed by the human imagination for the enjoyment of the human imagination). If we zoom out and consider the space of all possible minds, however, we must conceive of these two personalities as virtual clones. Certainly in terms of neural architecture, Ms. Arendt and Mr. Hill are nearly identical. Imagine their brains lying side by side in quiet repose. You would readily recognize them as two of a kind. You might even be unable to tell which brain belonged to whom. If you looked more closely, studying the morphology of the two brains under a microscope, this impression of fundamental similarity would only be strengthened: you would see the same lamellar organization of the cortex, with the same brain areas, made up of the same types of neuron, soaking in the same bath of neurotransmitters.
Intelligent search for instrumentally optimal plans and policies can be performed in the service of any goal. Intelligence and motivation are in a sense orthogonal: we can think of them as two axes spanning a graph in which each point represents a logically possible artificial agent. Some qualifications could be added to this picture. For instance, it might be impossible for a very unintelligent system to have very complex motivations. In order for it to be correct to say that a certain agent “has” a set of motivations, those motivations may need to be functionally integrated with the agent’s decision processes, something that places demands on memory, processing power, and perhaps intelligence. For minds that can modify themselves, there may also be dynamical constraints—an intelligent self-modifying mind with an urgent desire to be stupid might not remain intelligent for long. But these qualifications must not be allowed to obscure the basic point about the independence of intelligence and motivation, which we can express as follows:

The orthogonality thesis: Intelligence and final goals are orthogonal; more or less any level of intelligence could in principle be combined with more or less any final goal.
Note that the orthogonality thesis speaks not of rationality or reason, but of intelligence. By “intelligence” we here mean something like skill at prediction, planning, and means–ends reasoning in general. This sense of instrumental cognitive efficaciousness is most relevant when we are seeking to understand what the causal impact of a machine superintelligence might be. Even if there is some (normatively thick) sense of the word “rational” such that a paperclip-maximizing superintelligent agent would necessarily fail to qualify as fully rational in that sense, this would in no way preclude such an agent from having awesome faculties of instrumental reasoning, faculties which could let it have a large impact on the world.
Cognitive enhancement. Improvements in rationality and intelligence will tend to improve an agent’s decision-making, rendering the agent more likely to achieve its final goals. One would therefore expect cognitive enhancement to emerge as an instrumental goal for a wide variety of intelligent agents. For similar reasons, agents will tend to instrumentally value many kinds of information.
Technological perfection. An agent may often have instrumental reasons to seek better technology, which at its simplest means seeking more efficient ways of transforming some given set of inputs into valued outputs. Thus, a software agent might place an instrumental value on more efficient algorithms that enable its mental functions to run faster on given hardware.
A mathematically well-specified and foundationally elegant AI architecture might—for all its non-anthropomorphic otherness—offer greater transparency, perhaps even the prospect that important aspects of its functionality could be formally verified.
An oracle would ideally be trustworthy in the sense that we could safely assume that its answers are always accurate to the best of its ability. But even an untrustworthy oracle could be useful. We could ask such an oracle questions of a type for which it is difficult to find the answer but easy to verify whether a given answer is correct. Many mathematical problems are of this kind. If we are wondering whether a mathematical proposition is true, we could ask the oracle to produce a proof or disproof of the proposition. Finding the proof may require insight and creativity beyond our ken, but checking a purported proof’s validity can be done by a simple mechanical procedure.
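A toy illustration of that find/verify asymmetry, using integer factorization rather than theorem proving (the numbers and helper below are invented for illustration; trial division is adequate only for small inputs):

```python
# Toy illustration of the find/verify asymmetry: checking a purported prime
# factorization supplied by an untrusted oracle is mechanical, even though
# finding one is hard at scale. Numbers and helpers are invented for
# illustration; trial division is adequate only for small inputs.

def is_prime(k: int) -> bool:
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def verify_claim(n: int, p: int, q: int) -> bool:
    """Accept the oracle's answer only if p and q are prime and p * q == n."""
    return is_prime(p) and is_prime(q) and p * q == n

n = 101 * 103                       # the question we pose: factor n
print(verify_claim(n, 101, 103))    # True: accept this answer
print(verify_claim(n, 7, 1486))     # False: reject (1486 is composite and 7 * 1486 != n)
```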
The classical way of writing software requires the programmer to understand the task to be performed in sufficient detail to formulate an explicit solution process consisting of a sequence of mathematically well-defined steps expressible in code.13 (In practice, software engineers rely on code libraries stocked with useful behaviors, which they can invoke without needing to understand how the behaviors are implemented. But that code was originally created by programmers who had a detailed understanding of what they were doing.) This approach works for solving well-understood tasks, and is to credit for most software that is currently in use. It falls short, however, when nobody knows precisely how to solve all of the tasks that need to be accomplished. This is where techniques from the field of artificial intelligence become relevant.
In other experiments, evolutionary algorithms designed circuits that sensed whether the motherboard was being monitored with an oscilloscope or whether a soldering iron was connected to the lab’s common power supply. These examples illustrate how an open-ended search process can repurpose the materials accessible to it in order to devise completely unexpected sensory capabilities, by means that conventional human design-thinking is poorly equipped to exploit or even account for in retrospect.
These ready-states to which emulations would be reset would be carefully prepared and vetted. A typical short-lived emulation might wake up in a well-rested mental state that is optimized for loyalty and productivity. He remembers having graduated top of his class after many (subjective) years of intense training and selection, then having enjoyed a restorative holiday and a good night’s sleep, then having listened to a rousing motivational speech and stirring music, and now he is champing at the bit to finally get to work and to do his utmost for his employer.
Emulations can now begin to outsource increasing portions of their functionality. Why learn arithmetic when you can send your numerical reasoning task to Gauss-Modules, Inc.? Why be articulate when you can hire Coleridge Conversations to put your thoughts into words? Why make decisions about your personal life when there are certified executive modules that can scan your goal system and manage your resources to achieve your goals better than if you tried to do it yourself? Some emulations may prefer to retain most of their functionality and handle tasks themselves that could be done more efficiently by others. Those emulations would be like hobbyists who enjoy growing their own vegetables or knitting their own cardigans. Such hobbyist emulations would be less efficient; and if there is a net flow of resources from less to more efficient participants of the economy, the hobbyists would eventually lose out.
A decision maker planning to cheat might defeat such a lie-detection-based verification scheme by first issuing orders to subordinates to undertake the illicit activity and to conceal the activity even from the decision maker herself, and then subjecting herself to some procedure that erases her memory of having engaged in these machinations. Suitably targeted memory-erasure operations might well be feasible in biological brains with more advanced neurotechnology.
In any realm significantly more complicated than a game of tic-tac-toe, there are far too many possible states (and state-histories) for exhaustive enumeration to be feasible. A motivation system, therefore, cannot be specified as a comprehensive lookup table. It must instead be expressed more abstractly, as a formula or rule that allows the agent to decide what to do in any given situation.
One formal way of specifying such a decision rule is via a utility function. A utility function (as we recall from Chapter 1) assigns value to each outcome that might obtain, or more generally to each “possible world.” Given a utility function, one can define an agent that maximizes expected utility. Such an agent selects at each time the action that has the highest expected utility. (The expected utility is calculated by weighting the utility of each possible world with the subjective probability of that world being the actual world conditional on a particular action being taken.) In reality, the possible outcomes are too numerous for the expected utility of an action to be calculated exactly. Nevertheless, the decision rule and the utility function together determine a normative ideal—an optimality notion—that an agent might be designed to approximate; and the approximation might get closer as the agent gets more intelligent.1 Creating a machine that can compute a good approximation of the expected utility of the actions available to it is an AI-complete problem.2 This chapter addresses another problem, a problem that remains even if the problem of making machines intelligent is solved.
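A minimal sketch of such an expected-utility maximizer over a fully enumerated toy outcome space; the actions, worlds, probabilities, and utilities are invented for illustration, and, as noted above, real decision problems are far too large for this exhaustive treatment:

```python
# Minimal sketch of an expected-utility maximizer over a fully enumerated toy
# outcome space. Actions, worlds, probabilities, and utilities are invented
# for illustration only.
from typing import Dict

# Utility assigned to each "possible world".
utility: Dict[str, float] = {
    "sunny_picnic": 10.0, "rainy_picnic": -2.0,
    "sunny_indoors": 3.0, "rainy_indoors": 4.0,
}

# Subjective probability of each world, conditional on each action being taken.
p_world_given_action = {
    "go_on_picnic": {"sunny_picnic": 0.7, "rainy_picnic": 0.3},
    "stay_indoors": {"sunny_indoors": 0.7, "rainy_indoors": 0.3},
}

def expected_utility(action: str) -> float:
    """Weight each world's utility by its probability given the action."""
    return sum(p * utility[world] for world, p in p_world_given_action[action].items())

best_action = max(p_world_given_action, key=expected_utility)
print({a: expected_utility(a) for a in p_world_given_action}, "->", best_action)
```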
So now we face the question of how to define time. We could point to a clock and say, “Time is defined by the movements of this device”—but this could fail if the AI conjectures that it can manipulate time by moving the hands on the clock, a conjecture which would indeed be correct if “time” were given the aforesaid definition. (In a realistic case, matters would be further complicated by the fact that the relevant values are not going to be conveniently described in a letter; more likely, they would have to be inferred from observations of pre-existing structures that implicitly contain the relevant information, such as human brains.)
Historically, there are quite a few examples of AI techniques gleaned from neuroscience or biology. (For example: the McCulloch–Pitts neuron, perceptrons, and other artificial neurons and neural networks, inspired by neuroanatomical work; reinforcement learning, inspired by behaviorist psychology; genetic algorithms, inspired by evolution theory; subsumption architectures and perceptual hierarchies, inspired by cognitive science theories about motor planning and sensory perception; artificial immune systems, inspired by theoretical immunology; swarm intelligence, inspired by the ecology of insect colonies and other self-organizing systems; and reactive and behavior-based control in robotics, inspired by the study of animal locomotion.)
Think of a “discovery” as an act that moves the arrival of information from a later point in time to an earlier time.
As there is no established methodology for how to go about this kind of research, difficult original thinking is necessary.
Strategic and Policy Considerations
If communication overheads are reduced (including not only equipment costs but also response latencies, time and attention burdens, and other factors), then larger and more densely connected organizations become feasible. The same could happen if fixes are found for some of the bureaucratic deformations that warp organizational life—wasteful status games, mission creep, concealment or falsification of information, and other agency problems. Even partial solutions to these problems could pay hefty dividends for collective intelligence.
An important question, therefore, is whether national or international authorities will see an intelligence explosion coming. At present, intelligence agencies do not appear to be looking very hard for promising AI projects or other forms of potentially explosive intelligence amplification.
This suggests that in order to achieve international collaboration on some technology that is of pivotal importance for national security, it might be necessary to have built beforehand a close and trusting relationship.
Consider a vaguely analogous historical situation. The United States developed nuclear weapons in 1945. It was the sole nuclear power until the Soviet Union developed the atom bomb in 1949. During this interval—and for some time thereafter—the United States may have had, or been in a position to achieve, a decisive military advantage. The United States could then, theoretically, have used its nuclear monopoly to create a singleton. One way in which it could have done so would have been by embarking on an all-out effort to build up its nuclear arsenal and then threatening (and if necessary, carrying out) a nuclear first strike to destroy the industrial capacity of any incipient nuclear program in the USSR and any other country tempted to develop a nuclear capability. A more benign course of action, which might also have had a chance of working, would have been to use its nuclear arsenal as a bargaining chip to negotiate a strong international government—a veto-less United Nations with a nuclear monopoly and a mandate to take all necessary actions to prevent any country from developing its own nuclear weapons. Both of these approaches were proposed at the time. The hardline approach of launching or threatening a first strike was advocated by some prominent intellectuals such as Bertrand Russell (who had long been active in anti-war movements and who would later spend decades campaigning against nuclear weapons) and John von Neumann (co-creator of game theory and one of the architects of US nuclear strategy). Perhaps it is a sign of civilizational progress that the very idea of threatening a nuclear first strike today seems borderline silly or morally obscene.
Finally, there is the issue of cost. Even if the United States could have used its nuclear monopoly to establish a singleton, it might not have been able to do so without incurring substantial costs. In the case of a negotiated agreement to place nuclear weapons under the control of a reformed and strengthened United Nations, these costs might have been relatively small; but the costs—moral, economic, political, and human—of actually attempting world conquest through the waging of nuclear war would have been almost unthinkably large, even during the period of nuclear monopoly. With sufficient technological superiority, however, these costs would be far smaller. Consider, for example, a scenario in which one nation had such a vast technological lead that it could safely disarm all other nations at the press of a button, without anybody dying or being injured, and with almost no damage to infrastructure or to the environment. With such almost magical technological superiority, a first strike would be a lot more tempting. Or consider an even greater level of technological superiority which might enable the frontrunner to cause other nations to voluntarily lay down their arms, not by threatening them with destruction but simply by persuading a great majority of their populations by means of an extremely effectively designed advertising and propaganda campaign extolling the virtues of global unity.
Resource acquisition. Finally, resource acquisition is another common emergent instrumental goal, for much the same reasons as technological perfection: both technology and resources facilitate physical construction projects.
One obstacle is the difficulty of ensuring compliance with any treaty that might be agreed, including monitoring and enforcement costs. Two nuclear rivals might each be better off if they both relinquished their atom bombs; yet even if they could reach an in-principle agreement to do so, disarmament could nevertheless prove elusive because of their mutual fear that the other party might cheat. Allaying this fear would require setting up a verification mechanism. There may have to be inspectors to oversee the destruction of existing stockpiles, and then to monitor nuclear reactors and other facilities, and to gather technical and human intelligence, in order to ensure that the weapons program is not reconstituted. One cost is paying for these inspectors. Another cost is the risk that the inspectors will spy and make off with commercial or military secrets. Perhaps most significantly, each party might fear that the other will preserve a clandestine nuclear capability. Many a potentially beneficial deal never comes off because compliance would be too difficult to verify. If new inspection technologies that reduced monitoring costs became available, one would expect this to result in increased cooperation.
The availability of powerful precommitment techniques could profoundly alter the nature of negotiations, potentially giving an immense edge to an agent that has a first-mover advantage. If a particular agent’s participation is necessary for the realization of some prospective gains from cooperation, and if that agent is able to make the first move, it would be in a position to dictate the division of the spoils by precommitting not to accept any deal that gives it less than, say, 99% of the surplus value. Other agents would then be faced with the choice of either getting nothing (by rejecting the unfair proposal) or getting 1% of the value (by caving in). If the first-moving agent’s precommitment is publicly verifiable, its negotiating partners could be sure that these are their only two options. To avoid being exploited in this manner, agents might precommit to refuse blackmail and to decline all unfair offers. Once such a precommitment has been made (and successfully publicized), other agents would not find it in their interest to make threats or to precommit themselves to only accepting deals tilted in their own favor, because they would know that threats would fail and that unfair proposals would be rejected. But this just demonstrates again that the advantage is with the first-mover. The agent who moves first can choose whether to parlay its position of strength only to deter others from taking unfair advantage, or to make a grab for the lion’s share of future spoils.
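The bargaining logic can be made concrete with a toy sketch (my own illustration, not a model from the book) of dividing a surplus of 100 between a first mover and a responder, with and without a publicized counter-precommitment to reject unfair splits.

```python
# Toy illustration of the first-mover precommitment dynamic described above.
# The numbers, threshold, and decision rules are illustrative assumptions.

SURPLUS = 100.0

def responder_accepts(offer: float,
                      committed_to_reject_unfair: bool,
                      fairness_threshold: float = 50.0) -> bool:
    """Return True if the responder accepts the proposed offer."""
    if committed_to_reject_unfair:
        # A prior, publicized precommitment overrides case-by-case optimization.
        return offer >= fairness_threshold
    # Without a precommitment, accepting any positive amount beats getting nothing.
    return offer > 0.0

def first_mover_payoff(offer: float, responder_committed: bool) -> float:
    return (SURPLUS - offer) if responder_accepts(offer, responder_committed) else 0.0

if __name__ == "__main__":
    # No counter-precommitment: the first mover can safely demand 99%.
    print(first_mover_payoff(1.0, responder_committed=False))   # 99.0
    # Responder has precommitted to reject unfair splits: a 99/1 demand yields nothing,
    # so the first mover's best verifiable demand shifts toward an even split.
    print(first_mover_payoff(1.0, responder_committed=True))    # 0.0
    print(first_mover_payoff(50.0, responder_committed=True))   # 50.0
```

The sketch shows why whoever credibly commits first sets the terms: the later mover is left choosing among the payoffs the earlier commitment has made available.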
One motivation for the CEV proposal was to avoid creating a motive for humans to fight over the creation of the first superintelligent AI.
If some technology is feasible (the argument goes) it will be developed regardless of any particular policymaker’s scruples about speculative future risks. Indeed, the more powerful the capabilities that a line of development promises to produce, the surer we can be that somebody, somewhere, will be motivated to pursue it. Funding cuts will not stop progress or forestall its concomitant dangers.
Is it good if teams know about their positions in the race (knowing their capability scores, for instance)? Here, opposing factors are at play. It is desirable that a leader knows it is leading (so that it knows it has some margin for additional safety precautions). Yet it is undesirable that a laggard knows it has fallen behind (since this would confirm that it must cut back on safety to have any hope of catching up). While intuitively it may seem this trade-off could go either way, the models are unequivocal: information is (in expectation) bad. Figures 14a and 14b each plot three scenarios: the straight lines correspond to situations in which no team knows any of the capability scores, its own included. The dashed lines show situations where each team knows its own capability only. (In those situations, a team takes extra risk only if its capability is low.) And the dotted lines show what happens when all teams know each other’s capabilities. (They take extra risks if their capability scores are close to one another.) With each increase in information level, the race dynamic becomes worse.
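The figures themselves are not reproduced here, but the qualitative claim can be illustrated with a toy Monte Carlo sketch of my own. The capability distribution, the corner-cutting boost, and the three stylized safety policies below are assumptions, not the model the book actually uses.

```python
import random

# Toy sketch of the race dynamic: teams may skip safety precautions to gain
# performance, and the winner's skipped precautions translate into risk.

BOOST = 0.3        # performance gained by skipping precautions entirely
N_TEAMS = 3
N_RUNS = 100_000

def safety_no_info(c, rivals):
    # Nothing to react to: every team defaults to full precautions.
    return 1.0

def safety_own_info(c, rivals):
    # Cut corners only if your own capability is clearly below the expected field.
    return min(1.0, c / 0.5)

def safety_full_info(c, rivals):
    # Cut corners when the race is close: the nearer the frontrunner, the less safety.
    gap = abs(c - max(rivals))
    return min(1.0, gap / 0.2)

def expected_risk(policy, seed=0):
    rng = random.Random(seed)
    total_risk = 0.0
    for _ in range(N_RUNS):
        caps = [rng.random() for _ in range(N_TEAMS)]
        safeties = [policy(c, caps[:i] + caps[i + 1:]) for i, c in enumerate(caps)]
        scores = [c + BOOST * (1.0 - s) for c, s in zip(caps, safeties)]
        winner = max(range(N_TEAMS), key=scores.__getitem__)
        total_risk += 1.0 - safeties[winner]   # the winner's skipped precautions become risk
    return total_risk / N_RUNS

if __name__ == "__main__":
    for name, policy in [("no information", safety_no_info),
                         ("own capability only", safety_own_info),
                         ("full information", safety_full_info)]:
        print(f"{name:22s} expected risk ~ {expected_risk(policy):.3f}")
```

Under these assumptions the estimated risk is lowest with no information, higher when each team knows only its own capability, and highest when all capabilities are common knowledge, mirroring the ordering the passage describes.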
This reflection suggests a strategy of deferred gratification. We could postpone work on some of the eternal questions for a little while, delegating that task to our hopefully more competent successors—in order to focus our own attention on a more pressing challenge: increasing the chance that we will actually have competent successors. This would be high-impact philosophy and high-impact mathematics.
Moral Philosophy and Ethics
Pre-implantation genetic diagnosis has already been used during in vitro fertilization procedures to screen embryos produced for monogenic disorders such as Huntington’s disease and for predisposition to some late-onset diseases such as breast cancer. It has also been used for sex selection and for matching human leukocyte antigen type with that of a sick sibling, who can then benefit from a cord-blood stem cell donation when the new baby is born.
Delays could also result from obstacles rooted not in a fear of failure (demand for safety testing) but in fear of success—demand for regulation driven by concerns about the moral permissibility of genetic selection or its wider social implications. Such concerns are likely to be more influential in some countries than in others, owing to differing cultural, historical, and religious contexts. Post-war Germany, for example, has chosen to give a wide berth to any reproductive practices that could be perceived to be even in the remotest way aimed at enhancement, a stance that is understandable given the particularly dark history of atrocities connected to the eugenics movement in that country. Other Western countries are likely to take a more liberal approach. And some countries—perhaps China or Singapore, both of which have long-term population policies—might not only permit but actively promote the use of genetic selection and genetic engineering to enhance the intelligence of their populations once the technology to do so is available.
However, setting aside the question of how modernity’s shortcomings stack up against the not-so-inconsiderable failings of earlier epochs, nothing in our definition of collective superintelligence implies that a society with greater collective intelligence is necessarily better off. The definition does not even imply that the more collectively intelligent society is wiser.
Mind crime. Another failure mode for a project, especially a project whose interests incorporate moral considerations, is what we might refer to as mind crime. This is similar to infrastructure profusion in that it concerns a potential side effect of actions undertaken by the AI for instrumental reasons. But in mind crime, the side effect is not external to the AI; rather, it concerns what happens within the AI itself (or within the computational processes it generates). This failure mode deserves its own designation because it is easy to overlook yet potentially deeply problematic. Normally, we do not regard what is going on inside a computer as having any moral significance except insofar as it affects things outside. But a machine superintelligence could create internal processes that have moral status. For example, a very detailed simulation of some actual or hypothetical human mind might be conscious and in many ways comparable to an emulation. One can imagine scenarios in which an AI creates trillions of such conscious simulations, perhaps in order to improve its understanding of human psychology and sociology. These simulations might be placed in simulated environments and subjected to various stimuli, and their reactions studied. Once their informational usefulness has been exhausted, they might be destroyed (much as lab rats are routinely sacrificed by human scientists at the end of an experiment). If such practices were applied to beings that have high moral status—such as simulated humans or many other types of sentient mind—the outcome might be equivalent to genocide and thus extremely morally problematic. The number of victims, moreover, might be orders of magnitude larger than in any genocide in history.
“everything is vague to a degree you do not realize till you have tried to make it precise.”
How is the robot to balance a large risk of a few humans coming to harm versus a small risk of many humans being harmed? How do we define “harm” anyway? How should the harm of physical pain be weighed against the harm of architectural ugliness or social injustice? Is a sadist harmed if he is prevented from tormenting his victim? How do we define “human being”? Why is no consideration given to other morally considerable beings, such as sentient nonhuman animals and digital minds? The more one ponders, the more the questions proliferate.
First, if a free worker in a Malthusian state gets paid a subsistence-level wage, he will have no disposable income left after he has paid for food and other necessities. If the worker is instead a slave, his owner will pay for his maintenance and again he will have no disposable income. In either case, the worker gets the necessities and nothing more.
The image of evolution as a process that reliably produces benign effects is difficult to reconcile with the enormous suffering that we see in both the human and the natural world. Those who cherish evolution’s achievements may do so more from an aesthetic than an ethical perspective. Yet the pertinent question is not what kind of future it would be fascinating to read about in a science fiction novel or to see depicted in a nature documentary, but what kind of future it would be good to live in: two very different matters.
There is a further problem: The total amount of suffering per year in the natural world is beyond all decent contemplation. During the minute that it takes me to compose this sentence, thousands of animals are being eaten alive, others are running for their lives, whimpering with fear, others are being slowly devoured from within by rasping parasites, thousands of all kinds are dying of starvation, thirst and disease. Even just within our species, 150,000 persons are destroyed each day while countless more suffer an appalling array of torments and deprivations. Nature might be a great experimentalist, but one who would never pass muster with an ethics review board—contravening the Helsinki Declaration and every norm of moral decency, left, right, and center. It is important that we not gratuitously replicate such horrors in silico. Mind crime seems especially difficult to avoid when evolutionary methods are used to produce human-like intelligence, at least if the process is meant to look anything like actual biological evolution.
One could argue that whole brain emulation research is less likely to involve moral violations than artificial intelligence research, on the grounds that we are more likely to recognize when an emulation mind qualifies for moral status than we are to recognize when a completely alien or synthetic mind does so. If certain kinds of AIs, or their subprocesses, have a significant moral status that we fail to recognize, the consequent moral violations could be extensive. Consider, for example, the happy abandon with which contemporary programmers create reinforcement-learning agents and subject them to aversive stimuli. Countless such agents are created daily, not only in computer science laboratories but in many applications, including some computer games containing sophisticated non-player characters. Presumably, these agents are still too primitive to have any moral status. But how confident can we really be that this is so? More importantly, how confident can we be that we will know to stop in time, before our programs become capable of experiencing morally relevant suffering?
For example, consider the (unusually simple) consequentialist theory of hedonism. This theory states, roughly, that all and only pleasure has value, and all and only pain has disvalue. Even if we placed all our moral chips on this one theory, and the theory turned out to be right, a great many questions would remain open. Should “higher pleasures” be given priority over “lower pleasures,” as John Stuart Mill argued? How should the intensity and duration of a pleasure be factored in? Can pains and pleasures cancel each other out? What kinds of brain states are associated with morally relevant pleasures? Would two exact copies of the same brain state correspond to twice the amount of pleasure? Can there be subconscious pleasures? How should we deal with extremely small chances of extremely great pleasures?
An individual might have a second-order desire (a desire concerning what to desire) that some of her first-order desires not be given weight when her volition is extrapolated. For example, an alcoholic who has a first-order desire for booze might also have a second-order desire not to have that first-order desire. Similarly, we might have desires over how various other parts of the extrapolation process should unfold, and these should be taken into account by the extrapolation process.
One might still worry that this moral permissibility model (MP) represents an unpalatably high degree of respect for the requirements of morality. How big a sacrifice it would entail depends on which ethical theory is true. If ethics is satisficing, in the sense that it counts as morally permissible any action that conforms to a few basic moral constraints, then MP may leave ample room for our coherent extrapolated volition to influence the AI’s actions. However, if ethics is maximizing—for example, if the only morally permissible actions are those that have the morally best consequences—then MP may leave little or no room for our own preferences to shape the outcome.
Suppose that this ethical theory is true, and that the AI knows it to be so. For present purposes, we can define hedonistic consequentialism as the claim that an action is morally right (and morally permissible) if and only if, among all feasible actions, no other action would produce a greater balance of pleasure over suffering. The AI, following MP, might maximize the surfeit of pleasure by converting the accessible universe into hedonium, a process that may involve building computronium and using it to perform computations that instantiate pleasurable experiences. Since simulating any existing human brain is not the most efficient way of producing pleasure, a likely consequence is that we all die.
There may, however, be a moral case for de-emphasizing or refraining from second-guessing moves. Trying to outwit one another looks like a zero-sum game—or negative-sum, when one considers the time and energy that would be dissipated by the practice as well as the likelihood that it would make it generally harder for anybody to discover what others truly think and to be trusted when expressing their own opinions.
Note also that the larger the successful collaboration is, the lower the costs to it of extending the benefits to all outsiders. (For instance, if 90% of all people were already inside the collaboration, it would cost them no more than 10% of their holdings to bring all outsiders up to their own level.) It is thus plausible that broader collaborations would tend to lead to a wider distribution of the gains (though some projects with few sponsors might also have distributionally excellent aims). But why is a wide distribution of gains desirable? There are both moral and prudential reasons for favoring outcomes in which everybody gets a share of the bounty. We will not say much about the moral case, except to note that it need not rest on any egalitarian principle. The case might be made, for example, on grounds of fairness. A project that creates machine superintelligence imposes a global risk externality. Everybody on the planet is placed in jeopardy, including those who do not consent to having their own lives and those of their family imperiled in this way. Since everybody shares the risk, it would seem to be a minimal requirement of fairness that everybody also gets a share of the upside.
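To spell out the arithmetic behind the 90%/10% example (a back-of-the-envelope check of my own, not the book's): if insiders make up a fraction f = 0.9 of a population of N and hold essentially all of the gains W, then equalizing shares takes each insider from W/(0.9N) down to W/N, a relative sacrifice of 1 - 0.9 = 10%. More generally, the cost to insiders of levelling everyone up is 1 - f, which shrinks as the collaboration grows.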
The common good principle: Superintelligence should be developed only for the benefit of all of humanity and in the service of widely shared ethical ideals.
Encouraging more kindness in the world is an important and urgent problem—one, moreover, that seems quite robustly positive-value: yet absent a breakthrough idea for how to go about it, probably a problem of quite low elasticity. Achieving world peace, similarly, would be highly desirable; but considering the numerous efforts already targeting that problem, and the formidable obstacles arrayed against a quick solution, it seems unlikely that the contributions of a few extra individuals would make a large difference.