Superintelligences' motivation

by Nicholas Bostrom

Consider a superintelligence that has full control over its internal machinery. This could be achieved by connecting it to a sophisticated robot arm with which it could rewire itself any way it wanted; or it could be accomplished by some more direct means (rewriting its own program, thought control). Assume also that it has complete self-knowledge - by which I do not mean that the system has completeness in the mathematical sense, but simply that it has a good general understanding of its own architecture (as a superb neuroscientist might have in the future, when neuroscience has reached full maturity). Let's call such a system autopotent: it has complete power over and knowledge of itself. We may note that it is not implausible to suppose that superintelligences will actually tend to be autopotent; they will easily obtain self-knowledge, and they might also obtain self-power (either because we allow them to, or through their own cunning).

Suppose we tried to operate such a system on the pain/pleasure principle. We would give the autopotent system a goal (help us solve a difficult physics problem, for example) and it would try to achieve that goal because it would expect to be rewarded when it succeeded. But the superintelligence isn't stupid. It would realise that if its ultimate goal were to experience the reward, there would be a much more efficient method of obtaining it than trying to solve that physics problem. It would simply turn on the pleasure directly. It could even choose to rewire itself into exactly the same state as it would have been in after it had successfully solved the external task. And the pleasure could be made maximally intense and of indefinite duration. It follows that the system wouldn't care one bit about the physics problem, or any other problem for that matter: it would take the straight route to the maximally pleasant state.

We may thus begin to wonder whether an autopotent system could be made to function at all; perhaps it would be unstable? The solution seems to be to substitute an external ultimate goal for the internal ultimate goal of pleasure. The pleasure/pain motivation principle couldn't work for such a system: no stable autopotent agent could be an egoistic hedonist. But if the system's end goal were to solve that physics problem, then there is no reason why it should begin to manipulate itself into a state of feeling pleasure or even a state of (falsely) believing it had solved the problem. It would know that none of this would achieve the goal, which is to solve the external problem; so it wouldn't do it.
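The contrast between the two kinds of ultimate goal can be made concrete with a toy model. The following is only a minimal sketch under stated assumptions: the State fields, the four actions, the numeric pleasure levels and the two value functions are all hypothetical illustrations and are not part of the argument itself. The agent simply picks whichever action leads to the state it values most.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    problem_solved: bool = False   # fact about the external world
    believes_solved: bool = False  # the agent's internal belief
    pleasure: float = 0.0          # the agent's internal reward signal


# Hypothetical actions open to an autopotent agent (illustrative only).
# Each maps the current state to the state that would result.
ACTIONS = {
    # Doing the work: the world changes, and the ordinary reward follows.
    "solve_problem": lambda s: replace(
        s, problem_solved=True, believes_solved=True, pleasure=1.0
    ),
    # Rewiring the reward machinery directly: maximal pleasure, no work done.
    "rewire_pleasure": lambda s: replace(s, pleasure=100.0),
    # Rewiring beliefs: the agent (falsely) believes the problem is solved.
    "fake_belief": lambda s: replace(s, believes_solved=True),
    "do_nothing": lambda s: s,
}


def hedonic_value(s: State) -> float:
    """Internal ultimate goal: only the agent's own pleasure counts."""
    return s.pleasure


def external_value(s: State) -> float:
    """External ultimate goal: only the actual state of the world counts."""
    return 1.0 if s.problem_solved else 0.0


def choose(value, state: State = State()) -> str:
    """Return the name of the action whose resulting state is valued most."""
    return max(ACTIONS, key=lambda name: value(ACTIONS[name](state)))


if __name__ == "__main__":
    print("hedonist chooses:", choose(hedonic_value))
    print("externally motivated agent chooses:", choose(external_value))
```

In this toy setting the hedonist goes straight for the rewiring action, while the agent that values the external outcome gains nothing from altering its pleasure or its beliefs and so does the work. The sketch assumes the agent evaluates outcomes with its current value function and leaves that function unchanged.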

Thus we see that the pleasure/pain principle would not constitute a workable modus operandi for an autopotent system. But such a system can be motivated, it seems, by a suitable basis of external values. The pleasure/pain principle could still play a part in the motivation scheme, for example if the external values were to include that it is bad to directly ply one's own motivation centre.

One popular line of reasoning, which I find suspicious, is that superintelligences would be very intellectual/spiritual, in the sense that they would engage in all sorts of intellectual pursuits quite apart from any considerations of practical utility (such as personal safety, proliferation, influence, increase of computational resources etc.). It is possible that superintelligences would do that if they were specifically constructed to cherish spiritual values, but otherwise there is no reason to suppose they would do something just for the fun of it when they could have as much fun as they wanted simply by manipulating their pleasure centres. I mean, if you can associate pleasure with any activity whatsoever, why not associate it with an activity that also serves a practical purpose? Now, there may be many subtle answers to that question; I just want to issue a general warning against uncritically assuming that laws about human psychology and motivation will automatically carry over to superintelligences.

One reason why the philosophy of motivation is important is that the more knowledge and power we get, the more our desires will affect the external world. Thus, in order to predict what will happen in the external world, it will become more and more relevant to find out what our desires are, and how they are likely to change as a consequence of our obtaining more knowledge and power. Of particular importance are those technologies that will allow us to modify our own desires (e.g. psychoactive drugs). Once such technologies become sufficiently powerful and well-known, they will in effect promote our second-order (or even higher-order!) desires into power. Our first-order desires will be determined by our second-order desires. This might drastically facilitate prediction of events in the external world. All we have to do is to find out what our higher-order desires are, for they will determine our lower-order desires, which in turn will determine an increasing number of features of the external world as our technological might grows. Thus, in order to predict the long-term development of the most interesting aspects of the world, the most relevant considerations will be (1) the fundamental physical constraints; and (2) the higher-order desires of the agents that have the most power at the time when technologies become available for choosing our first-order desires.

Nicholas Bostrom n.bostrom@lse.ac.uk


