Friendly Superintelligence

Anders Sandberg, Eudoxa AB

Assumptions

In order to discuss approaches to friendly superintelligence some assumptions have to be made. Different assumptions will provide noticeably different ways of handling the problem.

We need to make friendliness work in general, not just for particular AI designs. At present there is no theory of AI that can tell us which kind of AI systems would be able to achieve human-level intelligence; hence a design for friendliness based on a particular architecture would likely be irrelevant. It might turn out that there will exist many different AI designs with extremely different internals. Friendliness has to be guaranteed regardless of this.

The "hard takeoff" scenario is assumed to be unlikely. In this scenario, artificial intelligences improve themselves at an ever faster rate, reaching high levels of capability in a relatively short time with little interaction with human society. Instead it will be assumed that AI will develop over time in interaction with society. This might be the most contentious assumption, as several other panel members strongly support the hard takeoff scenario.

The context in which systems are developed must be taken into account; we cannot base our analysis on simple a priori arguments when dealing with complex systems of this kind. Friendliness is fundamentally about inter-entity relationships, and in general relationships between different beings show strong path dependency.

AI will be created for economic reasons, and will be involved in economic transactions with humans from the start. Ideas can of course be developed for free, but actual development requires investment (unlike in the hard takeoff scenario). How AI is developed will be determined only to a minor extent by deliberate global choices, and more by which technologies provide payoffs during their development. It should be noted that attention to safety issues is not just important per se, but also useful for selling AI. Hence there are economic and political benefits to promoting such concerns.

AI development will not be done by a single individual; from the start the endeavour will involve many different developers with slightly different goals.

Do we aim for no risk or acceptable risk?

As risks become smaller, the cost of removing them increases without limit. For example, the goal of reducing the number of traffic accidents can be achieved with relatively simple and cheap means (e.g. better lighting, pavements and safety education). The goal of zero traffic accidents would require preventing not just likely accidents (children and animals running onto the road) but also very unlikely accidents (meteor impacts). The same situation holds true for friendly AI: we seek to reduce the risks to a reasonable level, but can never reach zero risk and would in any case not be able to afford it in practice. Instead we have to aim for the equilibrium beyond which further increases in safety cost more than the benefits they bring.
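The equilibrium can be illustrated with a toy model. This is only a sketch under assumptions that are not part of the argument above: the expected harm is taken to be proportional to the residual risk, and the cost of safety measures is taken to grow as the inverse of the residual risk, so that it is unbounded as the risk approaches zero. The function names and the numbers are hypothetical.

```python
from math import sqrt

def total_cost(risk: float, harm: float, mitigation_scale: float) -> float:
    """Expected harm plus the cost of pushing the risk down to `risk`.

    Assumed toy model: expected harm = harm * risk, and
    mitigation cost = mitigation_scale / risk (unbounded as risk -> 0).
    """
    return harm * risk + mitigation_scale / risk

def optimal_risk(harm: float, mitigation_scale: float) -> float:
    """Risk level that minimises total_cost in the toy model.

    Setting the derivative harm - mitigation_scale / risk**2 to zero
    gives risk = sqrt(mitigation_scale / harm). A probability cannot
    exceed 1, so the result is capped.
    """
    return min(1.0, sqrt(mitigation_scale / harm))

# Hypothetical figures, in arbitrary cost units.
HARM = 1_000_000.0
MITIGATION_SCALE = 1.0

best = optimal_risk(HARM, MITIGATION_SCALE)
for risk in (best * 10, best, best / 10):
    print(f"risk={risk:.0e}  total cost={total_cost(risk, HARM, MITIGATION_SCALE):,.0f}")
```

In this model both accepting ten times more risk and insisting on ten times less risk give a higher total cost than the equilibrium level. The optimum is never zero, whatever the size of the potential harm.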

The hard takeoff scenario implies a single gamble with one large risk, while the soft takeoff implies many interactions with medium risks.

Suggested approaches to friendly AI

So far, suggested approaches to friendly AI can be roughly divided into four categories:

Internal constraints – Preprogrammed limitations, guidelines or fixed behavior patterns that ensure a certain code of behavior. The classic example is the three laws of robotics in Isaac Asimov’s stories.

Built in values or goals – Goals and values that have been made part of the core system of the AI, making them hard or impossible to remove without a total change of the AI itself. A simple example would be an artificial emotional system that loves humans, while the friendliness supergoal of Eliezer’s AI design is a more complex example.

Learned values – The AI is not in itself programmed to value humans or human values, but has the capacity to learn them. During the upbringing of the system values are deliberately taught, integrating the AI within the larger human ethical community. David Brin suggested this approach in his short story Lungfish. Given that AIs are cultural artefacts rather than evolved beings (disregarding possible uses of genetic algorithms to create AI) it seems likely that learning experiences and built in prior information will exert a stronger influence on them than on evolved intelligences such as humans, who have evolved under non-cultural selection pressures.

External constraints – Laws, economic pressure, the potential for retaliation or reward and other external factors influence the behaviour of rational agents. In a setting where the AI is neither isolated from human society nor able to act without fear of the consequences, these factors will come into play.

Each of these approaches has its problems:

As Asimov repeatedly showed in his fiction, his laws do allow accidental unfriendly behaviour. The full consequences of a complex formal system are unknowable, and contact with the messy real world makes things worse. This makes designing internal constraints hard, as general constraints might show unwanted emergent behaviour and specific constraints will leave loopholes open.

Internal constraints and values are design solutions, but there are many designers and some might be malevolent, misguided or make mistakes. Designs compete with each other on the market: a risky architecture may show greater economic potential and hence become more widespread than a safer one.

If values are learned, then they can be mis-learned. Also, even benign values often come into conflict with each other, even within the same culture.

External approaches can seldom be proven to work due to their complexity. This is not going to calm sceptics who fear that there might exist subtle loopholes in legal systems, markets or inter-agent behaviour.

At the same time there appear to be fairly strong game-theoretical reasons to believe that, for example, economic constraints (which are relevant for any resource-constrained system regardless of its intelligence) can provide surprisingly strong and flexible ways of ensuring or at least promoting friendliness.

The law of comparative advantage is one such example: trade between agents is mutually profitable even when one party is more productive than the other in every commodity that is being exchanged. The reason is that specialisation enables the more productive agent to produce more of the commodity most profitable to it, while leaving the less productive agent to produce a different commodity. In the current context this suggests that AI and humans can profit from specialisation, even when their capabilities are vastly different. If the AI emerges gradually within the human framework rather than nearly instantaneously as in the hard takeoff scenario, then it is going to become enmeshed within the human economy, and the benefits of specialisation and division of labour will promote a symbiotic relationship. AIs that have "grown up" within a human culture are more likely to encompass its ethics and values, and to have tight economic connections with it. Defection is profitable only as long as there are no interactions that can make it unprofitable, and such interactions can be provided by legal systems (allowing sanctions of different kinds), by the presence of many other entities of roughly the same level of power, or by groups that can exert commensurable power.
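A small numerical sketch may make the comparative advantage argument concrete. The two commodities ("designs" and "repairs"), the productivity figures and the function names below are hypothetical and chosen only for illustration; the AI is assumed to be more productive than the human at both tasks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Agent:
    name: str
    designs_per_hour: float   # hypothetical productivity figures
    repairs_per_hour: float

    def repair_cost_in_designs(self) -> float:
        """Opportunity cost: designs given up to produce one repair."""
        return self.designs_per_hour / self.repairs_per_hour

def autarky_output(agents, hours, share_on_designs=0.5):
    """Total (designs, repairs) when every agent splits its own time."""
    designs = sum(a.designs_per_hour * hours * share_on_designs for a in agents)
    repairs = sum(a.repairs_per_hour * hours * (1 - share_on_designs) for a in agents)
    return designs, repairs

def specialised_output(specialist, generalist, hours, repairs_needed):
    """Total (designs, repairs) when `specialist` only does repairs and
    `generalist` covers the remaining repairs before turning to designs."""
    specialist_repairs = specialist.repairs_per_hour * hours
    remaining = max(repairs_needed - specialist_repairs, 0.0)
    repair_hours = remaining / generalist.repairs_per_hour
    if repair_hours > hours:
        raise ValueError("not enough hours to meet the repair target")
    designs = generalist.designs_per_hour * (hours - repair_hours)
    return designs, specialist_repairs + remaining

# The AI has an absolute advantage in both commodities.
ai = Agent("AI", designs_per_hour=10, repairs_per_hour=4)
human = Agent("human", designs_per_hour=1, repairs_per_hour=2)

# The human gives up fewer designs per repair, so the human's
# comparative advantage lies in repairs.
print(ai.repair_cost_in_designs(), human.repair_cost_in_designs())

before = autarky_output([ai, human], hours=8)
after = specialised_output(human, ai, hours=8, repairs_needed=before[1])
print("without trade:", before)
print("with specialisation:", after)
```

With these figures the same number of repairs is produced in both cases, while specialisation yields more designs in total. That surplus is what both parties can share through trade, even though the AI outperforms the human at every task.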

A Combination Approach for Friendly AI

Given the uncertainties about the nature of AI and how it will be developed, and the admitted limits of any proposed constraint scheme, a combination approach is more robust than any single scheme. The idea is to employ a layered strategy, where openings in one layer of constraint can be dealt with by at least one other layer. This allows the construction of a reliable and general system out of unreliable and limited schemes.
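The arithmetic behind building a reliable whole out of unreliable layers can be sketched as follows. The failure probabilities are hypothetical, and the key assumption, that layers fail independently of each other, is a strong one. If failures are correlated the benefit shrinks, and in the worst case the combination is no better than its single most reliable layer.

```python
from math import prod

def breach_probability_independent(layer_failure_probs):
    """Probability that every layer fails at once, ASSUMING the layers
    fail independently: the product of the individual probabilities."""
    return prod(layer_failure_probs)

def breach_probability_upper_bound(layer_failure_probs):
    """Upper bound on the joint failure probability with no independence
    assumption: all layers failing together can never be more likely
    than the most reliable layer failing on its own."""
    return min(layer_failure_probs)

# Hypothetical per-layer failure probabilities: four individually
# unreliable layers of constraint.
layers = [0.5, 0.3, 0.3, 0.2]

print(f"independent layers: {breach_probability_independent(layers):.3f}")
print(f"worst case (fully correlated): {breach_probability_upper_bound(layers):.3f}")
```

The gap between the two figures shows why the layers should differ in kind: schemes that share the same weakness fail together, while schemes that fail for unrelated reasons cover each other's openings.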

Level 0: General guidelines for AI development, promoting concern for friendly and safe AI of significant power as well as good engineering practice. A continuous and vigorous scientific debate is needed to keep this up to date.

Level 1: Internal constraints. Different architectures will have different forms and amounts of constraints, but it is likely that most designs will include at least some friendliness-promoting subsystems for simple economic reasons.

Level 2: Good rearing practices, seeking to instil human-compatible values, identify risks and integrate the AI within the human sphere.

Level 3: Setting up a legal and economic framework where friendly AIs prosper and unfriendly ones are inhibited.

This will not be a guarantee of friendliness, any more than current systems of instincts, emotions, upbringing, education and law guarantee human friendliness. But just like them it will be a strong force to promote friendly behaviour among our mind children.