Friendly Superintelligence

Anders Sandberg, Eudoxa AB


To discuss approaches to friendly superintelligence, some assumptions have to be made. Different assumptions lead to noticeably different ways of handling the problem.

Do we aim for no risk or acceptable risk?

As risks become smaller, the cost of removing what remains grows without limit. For example, the number of traffic accidents can be reduced by relatively simple and cheap means (e.g. better lighting, pavement and safety education). Achieving zero traffic accidents, by contrast, would require preventing not just likely accidents (children and animals running onto the road) but also very unlikely ones (meteor impacts). The same holds true for friendly AI: we seek to reduce the risks to a reasonable level, but can never reach zero risk and could not practically afford it anyway. Instead we have to aim for the equilibrium where further increases in safety become more costly than their benefits.
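The equilibrium described above can be made concrete with a toy cost model. The sketch below assumes, purely for illustration, that pushing the residual risk r down from 1 costs k(1/r - 1), which is zero when nothing is done and grows without bound as r approaches zero, while the expected loss from the remaining risk is loss·r. The function names and numbers are hypothetical; the point is only that the total cost has its minimum at r* = sqrt(k/loss) rather than at zero risk.

```python
import math


def total_cost(r: float, loss: float, k: float) -> float:
    """Expected loss from residual risk r plus the (hypothetical) cost of
    pushing risk down to r. The safety cost is zero at r = 1 and grows
    without bound as r approaches 0."""
    return loss * r + k * (1.0 / r - 1.0)


def optimal_residual_risk(loss: float, k: float) -> float:
    # d/dr [loss*r + k/r] = loss - k/r**2, which is zero at r* = sqrt(k/loss).
    # Below r*, each further reduction costs more than the loss it avoids.
    # If sqrt(k/loss) > 1, doing nothing (r = 1) is already the cheapest option.
    return min(1.0, math.sqrt(k / loss))


if __name__ == "__main__":
    loss, k = 10_000.0, 1.0  # illustrative numbers, not estimates
    r_star = optimal_residual_risk(loss, k)
    print(f"optimal residual risk: {r_star:.4f}")
    for r in (0.5, 0.1, r_star, 0.001, 1e-6):
        print(f"r={r:<8g} total cost={total_cost(r, loss, k):,.1f}")
```

With these illustrative values the minimum lies at r* = 0.01; both stopping early (r = 0.1) and pushing further (r = 0.001) give a higher total cost, and driving r towards zero makes the cost explode.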

The hard take-off scenario assumes that there will be one gamble with a single large risk, while a soft take-off implies many interactions, each carrying a medium risk.

Suggested approaches to friendly AI

The approaches to friendly AI suggested so far can be roughly divided into four categories:

Internal constraints – Preprogrammed limitations, guidelines or fixed behavior patterns that ensure a certain code of behavior. The classic example is the three laws of robotics in Isaac Asimov’s stories.

Built-in values or goals – Goals and values that have been made part of the core system of the AI, making them hard or impossible to remove without a total change of the AI itself. A simple example would be an artificial emotional system that loves humans, while the friendliness supergoal of Eliezer Yudkowsky’s AI design is a more complex example.

Learned values – The AI is not itself programmed to value humans or human values, but has the capacity to learn them. During the upbringing of the system, values are deliberately taught, integrating the AI within the larger human ethical community. David Brin suggested this approach in his short story Lungfish. Given that AIs are cultural artefacts rather than evolved beings (disregarding possible uses of genetic algorithms to create AI), it seems likely that learning experiences and built-in prior information will exert a stronger influence on them than on evolved intelligences such as humans, who have been shaped by non-cultural selection pressures.

External constraints – Laws, economic pressure, the potential for retaliation or reward and other external factors influence the behaviour of rational agents. These factors come into play whenever the AI is neither isolated from human society nor powerful enough to act without fear of the consequences.

Each of these approaches has its problems:

As Asimov repeatedly showed in his fiction, his laws do allow accidental unfriendly behaviour. The full consequences of a complex formal system are unknowable, and contact with the messy real world makes things worse. This makes internal constraints hard to design: general constraints might show unwanted emergent behaviour, while specific constraints will leave loopholes.

Internal constraints and built-in values are design solutions, but there are many designers, and some might be malevolent, misguided or simply make mistakes. Designs also compete with each other on the market: a risky architecture may show greater economic potential, and hence become more widespread, than a safer one.

If values are learned, they can be mislearned. Moreover, benign values often conflict with each other, even within the same culture.

External approaches can seldom be proven to work because of their complexity. This will not reassure sceptics who fear that there might be subtle loopholes in legal systems, markets or inter-agent behaviour.

At the same time, there appear to be fairly strong game-theoretic reasons to believe that, for example, economic constraints (which apply to any resource-constrained system regardless of its intelligence) can provide surprisingly strong and flexible ways of ensuring, or at least promoting, friendliness.
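One standard game-theoretic result illustrates the point. In a repeated prisoner's dilemma with payoffs T > R > P (temptation, reward for mutual cooperation, punishment for mutual defection), an agent facing a partner who never cooperates again after a defection (the "grim trigger" strategy) gains nothing by defecting once future rounds are weighted heavily enough, specifically when the discount factor δ satisfies δ ≥ (T - R)/(T - P). The sketch below, with hypothetical function names and illustrative payoffs, simply compares the two discounted payoff streams; reading δ as the weight an AI places on continued interaction with humans is an assumption of this illustration, not a model taken from the text.

```python
def cooperate_forever(R: float, delta: float) -> float:
    # Mutual cooperation every round: R + delta*R + delta**2*R + ... = R / (1 - delta)
    return R / (1.0 - delta)


def defect_once(T: float, P: float, delta: float) -> float:
    # Take the temptation payoff T now, then receive the punishment payoff P
    # in every later round because the partner never cooperates again:
    # T + delta*P + delta**2*P + ... = T + delta*P / (1 - delta)
    return T + delta * P / (1.0 - delta)


def critical_discount(T: float, R: float, P: float) -> float:
    # Cooperation pays at least as well as defection iff delta >= (T - R) / (T - P)
    return (T - R) / (T - P)


if __name__ == "__main__":
    T, R, P = 5.0, 3.0, 1.0  # illustrative payoffs with T > R > P
    print("critical discount factor:", critical_discount(T, R, P))
    for delta in (0.2, 0.5, 0.9):
        c, d = cooperate_forever(R, delta), defect_once(T, P, delta)
        print(f"delta={delta}: cooperate={c:.2f} defect={d:.2f}")
```

With these payoffs the threshold is 0.5: at δ = 0.2 defecting pays more, at δ = 0.5 the two are equal, and at δ = 0.9 cooperation clearly wins. In this toy model, sanctions or retaliation correspond to lowering P, which lowers the threshold and makes cooperation stable over a wider range of δ.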

The law of comparative advantage is one such example: trade between agents is mutually profitable even when one party is more productive than the other in every commodity being exchanged. The reason is that specialisation lets the more productive agent concentrate on the commodity in which its relative advantage is greatest, while leaving the less productive agent to produce a different commodity. In the current context this suggests that AIs and humans can profit from specialisation even when their capabilities are vastly different. If AI emerges gradually within the human framework, rather than nearly instantaneously as in the hard take-off scenario, it will become enmeshed in the human economy, and the benefits of specialisation and division of labour will promote a symbiotic relationship. AIs that have "grown up" within a human culture are more likely to embrace its ethics and values and to have tight economic connections with it. Defection is profitable only as long as there are no interactions that can make it unprofitable, and such interactions can be provided by legal systems (allowing sanctions of different kinds), by the presence of many other entities of roughly the same level of power, or by groups that can exert commensurate power.
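The arithmetic behind comparative advantage can be shown with a small example. The sketch below assumes two hypothetical producers, an "AI" that needs 1 hour per unit of either good and a "human" that needs 10 hours per unit of good X and 2 hours per unit of good Y, each with 100 hours available. The AI is better at both goods, yet its opportunity cost of X (units of Y forgone per unit of X) is lower, so the human's comparative advantage lies in Y. The names and numbers are illustrative; the sketch compares joint output only and does not model the terms of trade that decide how the gain is shared.

```python
from dataclasses import dataclass


@dataclass
class Producer:
    name: str
    hours: float        # time budget
    hours_per_x: float  # labour cost of one unit of good X
    hours_per_y: float  # labour cost of one unit of good Y

    def opportunity_cost_of_x(self) -> float:
        # Units of Y forgone for each unit of X produced
        return self.hours_per_x / self.hours_per_y


def autarky(p: Producer) -> tuple[float, float]:
    # No trade: the producer splits its time evenly and makes both goods itself
    half = p.hours / 2
    return half / p.hours_per_x, half / p.hours_per_y


def specialised(x_maker: Producer, y_maker: Producer, y_target: float) -> tuple[float, float]:
    # y_maker spends all its time on Y; x_maker tops Y up to y_target
    # and spends whatever time is left on X.
    y_from_y_maker = y_maker.hours / y_maker.hours_per_y
    y_gap = max(0.0, y_target - y_from_y_maker)
    hours_left = x_maker.hours - y_gap * x_maker.hours_per_y
    if hours_left < 0:
        raise ValueError("y_target cannot be reached with the available hours")
    return hours_left / x_maker.hours_per_x, y_from_y_maker + y_gap


if __name__ == "__main__":
    ai = Producer("AI", hours=100, hours_per_x=1, hours_per_y=1)
    human = Producer("human", hours=100, hours_per_x=10, hours_per_y=2)
    # Whoever gives up less Y per unit of X should make X
    x_maker, y_maker = sorted([ai, human], key=Producer.opportunity_cost_of_x)
    ax, ay = autarky(ai)
    hx, hy = autarky(human)
    print("autarky:", ax + hx, ay + hy)
    print("specialised:", specialised(x_maker, y_maker, y_target=ay + hy))
```

With these numbers, each producer splitting its time evenly yields 55 X and 75 Y in total, while letting the human make only Y and the AI make the rest yields 75 X and the same 75 Y: twenty extra units of X from specialisation alone.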

A Combination Approach for Friendly AI


Given the uncertainties about the nature of AI, how it will be developed and the admitted limits of any proposed constraint scheme, a combination approach is more robust than any single scheme. The idea is to employ a layered strategy, where gaps in one layer of constraint can be dealt with by at least one other layer. This allows a reliable and general system to be constructed out of unreliable and limited schemes.
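The claim that unreliable layers can combine into a reliable system can be quantified under a strong simplifying assumption: if the layers fail independently, a breach requires all of them to fail together, so the breach probability is the product of the individual failure probabilities. The sketch below also adds a hypothetical common-mode term c, the chance that a single shared flaw defeats every layer at once. The function names and per-layer probabilities are illustrative placeholders, not estimates for any real scheme.

```python
import math
from typing import Sequence


def breach_probability(layer_failure_probs: Sequence[float]) -> float:
    # Assumes independent layers: a breach needs every layer to fail.
    return math.prod(layer_failure_probs)


def breach_with_common_mode(layer_failure_probs: Sequence[float], c: float) -> float:
    # Hypothetical model: with probability c a shared flaw defeats all layers;
    # otherwise the layers fail independently as above.
    return c + (1.0 - c) * breach_probability(layer_failure_probs)


if __name__ == "__main__":
    # Illustrative failure probabilities for internal constraints, built-in values,
    # learned values and external constraints (placeholders, not estimates).
    layers = [0.3, 0.2, 0.2, 0.1]
    print("independent layers:", breach_probability(layers))
    print("with 1% common-mode failure:", breach_with_common_mode(layers, 0.01))
```

Here the independent estimate is 0.0012, far below the 0.1 of the best single layer, but a 1% common-mode failure raises the breach probability to roughly 0.011, so the layers help most when they rest on genuinely different mechanisms.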

This will not guarantee friendliness, any more than current systems of instincts, emotions, upbringing, education and law guarantee human friendliness. But, like them, it will be a strong force promoting friendly behaviour among our mind children.