July 15, 2004

Too Simple to be Safe

The Singularity Institute has started a site exploring the problems with Isaac Asimov's three laws of robotics, 3 Laws Unsafe.

That the three laws are insufficient to guarantee robot behavior should be obvious to anybody who has read Asimov's stories. Usually the main plot is about misbehaving robots and the mystery is why - rather than being "whodunnits" they are "howthinkits". But how complex do the rules of robot behavior have to be before we can consider them safe?

Is the 3 Laws Unsafe site necessary? To some extent it is just well-timed advertising, with the film based on I, Robot arriving. The real goal is not to push the thesis that the three laws are bad, but to draw a wider public into discussions of AI ethics. That is very laudable in itself. But I think one should not underestimate the misconceptions people hold about AI programming, and pointing out the complex problems hiding behind simple solutions may well be necessary.

To many people "computers just do what they are told" is an article of faith. As personal computers become more widespread the fallacy of this statement becomes apparent: computers crash, refuse to print documents, update their software and occasionally exhibit remarkably strange behavior. Often these actions are the result of complex interactions between pre-existing software modules, where no human ingenuity could have predicted the outcome of a particular combination on a particular computer at a given time. But despite this, people often seem to think that creating artificial intelligence would produce something that could be given a set of rules and would follow them slavishly.

In Asimov's stories this is the case, and the result is chaos anyway. In reality things will be even worse: besides ambiguities in the rules and in how they are to be applied, there will be errors in cognition, perception and the execution of actions. And of course low-level crosstalk and software bugs too. And this is just in the case of an ordinarily intelligent machine. The self-enhancing AIs envisioned by the Singularity Institute will have far more degrees of freedom to train against reality (and hence far more ways to get things wrong), and the number of potential non-obvious interactions increases exponentially.

So what to do about it? The idea I really like about friendly AI is the attempt to formulate a goal architecture that is robust: if something goes wrong, the system tries to adjust itself to make things better. This is no guarantee that it will work, but experience shows that some systems are far less brittle than others.
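
As a toy illustration of what "adjusting itself" might mean, here is a minimal Python sketch; the goal, the feedback rule and the numbers are all made up for the example, not anything from the friendly AI proposal. The point is only that the system compares its outcomes against its goal and nudges its own behaviour to shrink the mismatch, instead of executing a fixed rule list.

```python
# Hypothetical sketch (not the friendly AI proposal itself): an agent that
# compares outcomes against its goal and adjusts its own behaviour to shrink
# the mismatch, instead of executing a fixed rule list.

import random

class SelfAdjustingAgent:
    def __init__(self, goal_value):
        self.goal_value = goal_value   # what the agent is trying to achieve
        self.policy = 0.0              # a single tunable "behaviour" parameter

    def act(self):
        # Acting in a noisy world: the outcome only partly reflects the policy.
        return self.policy + random.gauss(0, 0.1)

    def step(self):
        outcome = self.act()
        error = self.goal_value - outcome
        self.policy += 0.3 * error     # self-adjustment towards the goal
        return outcome

agent = SelfAdjustingAgent(goal_value=1.0)
for _ in range(20):
    agent.step()
print(f"behaviour parameter after adaptation: {agent.policy:.2f}")
```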

Real intelligence exhibits several important traits: it interacts with its world, it is able to learn new behaviors (or unlearn old ones) and it can solve new problems using earlier information, heuristics and strategies. The learning aspect enables us to speak about the ethics of an AI program: how does it live up to its own goals, the goals of others and perhaps universal virtues? Asimovian AI was limited to interaction and problem-solving in most situations involving the three laws. It was in a very strong sense amoral: it could not act "immorally" and was hence no better than the protagonist of A Clockwork Orange after his treatment. A learning agent, on the other hand, might have weaker barriers against dangerous behavior, but would be able to learn to act well (under the right circumstances) and to generalize from these experiences to new situations.

If the AIs communicate with each other they might even transfer these moral experiences, enabling AIs not exposed to the critical situations to handle them when they arrive. We humans do it all the time through our books, films and stories: I may never have encountered a situation where I discover that my government is acting immorally and have to choose between remaining comfortably silent or taking possibly illegal action to change things, but I have read numerous fictional and real versions of the scenario that have given me at least some crude moral training.

Learning is also the key to robustness. Software that adapts to an uncertain outer and inner environment is more likely to keep functioning when an error actually occurs (as witnessed by the resilience of neural network architectures) than software built on fixed rules. To some extent this is again the difference between laws (fixed rules) and moral principles (goals).
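
A toy contrast, not a claim about any real architecture: in the sketch below a fixed rule table simply has no answer for an input it was never given, while a learned, similarity-based mapping (a crude nearest-neighbour lookup standing in for a neural network) still degrades gracefully on a perturbed input.

```python
# Toy contrast between a fixed rule table and a learned, similarity-based
# mapping. The "learned" part is just nearest-neighbour matching, standing
# in for the graceful degradation of neural-network-style systems.

rules = {(1.0, 0.0): "approach", (0.0, 1.0): "avoid"}   # fixed rules
examples = list(rules.items())                          # the same data as training examples

def fixed_rule(observation):
    # Fails on anything not exactly covered by a rule.
    return rules.get(observation, "NO RULE - UNDEFINED BEHAVIOUR")

def learned_response(observation):
    # Answers by similarity to past experience, so noisy or novel inputs
    # still map to the closest known situation.
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(examples, key=lambda ex: distance(ex[0], observation))
    return nearest[1]

noisy_input = (0.9, 0.05)   # a perturbed version of a known situation
print(fixed_rule(noisy_input))        # -> NO RULE - UNDEFINED BEHAVIOUR
print(learned_response(noisy_input))  # -> approach
```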

But learning never occurs in a vacuum. The bias-variance dilemma shows that any learning system faces a tradeoff between being general (no bias) and requiring as little training as possible (low variance). A "pure AI" that has no preconceptions about anything would require a tremendous number of training examples (upbringing) to become able to think usefully. A heavily biased AI with many built-in assumptions (reality is 3+1 dimensional, gravity exists, it is bad to bump into things and especially humans...) would need far less upbringing but would likely exhibit many strange or inflexible behaviors when the biases interact. In many ways Asimovian AI is a pure AI with heavy "moral" biases (which is why learning/adaptation is so irrelevant to the intended use of the three laws).
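
The textbook toy version of the dilemma, sketched in Python (the sine curve, the noise level and the sample sizes are arbitrary choices for illustration): with only a handful of training points, a heavily biased straight-line model is wrong in a bounded, predictable way, while a flexible high-degree polynomial chases the noise in each sample and tends to generalize worse on average.

```python
# Bias-variance toy example: repeatedly fit a handful of noisy samples of a
# sine curve with a straight line (high bias, low variance) and with a
# degree-7 polynomial (low bias, high variance), then compare the average
# error on fresh test points.

import numpy as np

rng = np.random.default_rng(0)
x_test = np.linspace(0.1, np.pi - 0.1, 200)
y_true = np.sin(x_test)

errors = {1: [], 7: []}
for trial in range(200):
    x_train = rng.uniform(0, np.pi, 8)
    y_train = np.sin(x_train) + rng.normal(0, 0.1, 8)
    for degree in errors:
        coeffs = np.polyfit(x_train, y_train, degree)
        errors[degree].append(np.mean((np.polyval(coeffs, x_test) - y_true) ** 2))

for degree, errs in errors.items():
    print(f"degree {degree}: mean test error {np.mean(errs):.3f}")
# Averaged over many small training sets, the flexible degree-7 fit tends to
# have the larger test error: it tracks the noise in each sample (variance),
# while the straight line is consistently but boundedly wrong (bias).
```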

Living beings have solved the bias-variance dilemma by cheating: we get a lot of pre-packaged biases that are the result of evolutionary learning. When a baby cries because it is hungry, it automatically signals its mother to come rather than having to learn which actions would produce relief from hunger. When the baby wrinkles its face at bitter tastes and enjoys sweetness, it uses a bias laid down by countless generations encountering often-poisonous bitter alkaloids and energy-rich (and hence fitness-enhancing) sweet fruits. We benefit from the price paid by trillions of creatures that were selected away by evolution's ruthless hand.

A robot will likely benefit a bit from this too, as we humans try to act as its evolutionary past and throw in useful biases. But balancing this prior information against the ability to re-learn as conditions change is a challenge. It requires different levels of flexibility in different environments, and meta-flexibility to detect what kind of new environment one has entered and how to change the flexibility. It seems likely that an optimal level of flexibility cannot be found in general (as a proof sketch, consider that the environment might contain undecidable aspects that determine how fast it will change).

We humans have a range of flexibility both as individuals and as a species; we benefit from having at least some people better adapted than others when things change. It might be the same among AIs: rather than seeking an optimal design and then copying it endlessly, we create a good design and then a large number of slightly different variants. The next generation of AIs would be based on the most successful variants, as well as on the knowledge gained from the experiences of the AIs themselves. This approach enables AIs to develop along divergent tracks in different circumstances - the kind of intelligence and personality useful for programming is different from what is useful in an entertainer or a diplomat.
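
Schematically (and only schematically; the "fitness" below is a made-up stand-in for real-world success in some niche), the loop is ordinary variation and selection:

```python
# Schematic variation-and-selection loop: start from one good design,
# spawn slightly different variants, keep the most successful ones and
# base the next generation on them. "Fitness" here is a made-up stand-in
# for real-world performance in some niche.

import random

def fitness(design):
    # Hypothetical score: how well this parameter vector suits its niche.
    return -sum((p - 0.7) ** 2 for p in design)

def mutate(design, spread=0.05):
    return [p + random.gauss(0, spread) for p in design]

population = [[0.5, 0.5, 0.5]]           # the initial "good design"
for generation in range(30):
    variants = [mutate(d) for d in population for _ in range(10)]
    variants.sort(key=fitness, reverse=True)
    population = variants[:5]            # the most successful variants

print("best design:", [round(p, 2) for p in population[0]])
```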

But what about the guarantees of keeping these devices moral? The three laws promise guarantees but at most produce safety railings (which is nothing to sneeze at; even the flexible AIs above will likely have a few built-in limiters and biases of a similar kind - the fact that most humans are emotionally unable to kill other humans has not prevented some from doing it or from training others to do it, but the overall effect is quite positive). Setting up a single master goal that is strongly linked to the core value system of the AI might be more robust to experience, reprogramming and accidents. But it would still be subject to the bias-variance dilemma, and the complexities of interpreting that goal might make it rather unstable in individual AIs. Having a surrounding "AI community" and a shared AI-human society moderates these instabilities: moral experiences and values are shared, webs of trust and trade integrate different kinds of agents, and a multitude of goals and styles of thinking co-exist. Rogue agents can be inhibited by their interactions with the society and, in extreme cases, by the combined resources and coercive power of the others. While morality in the end resides in the individual actions of agents, it can be sustained by collective interaction.

This is the multi-layered approach to creating "safe" AI (and humans). At the bottom level there are built-in biases and inhibitions. Above them sit goals and motivational structures that are basically "good" (it is an interesting subject for another essay to analyse how different motivation architectures affect ethics; cf. Aristotle's ethics, the effect of temporal-difference learning in dopamine signals and naturalistic decision-making for some ideas). Above these goals are the experiences and schemas built by the agent, as well as what it has learned from others. Surrounding the agent is a social situation, further affecting its behavior even when it is rationally selfish by giving incentives and disincentives to certain actions. And finally there are society-level precautions against misbehavior.
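
If one wanted to caricature the layering in code, it might look like a chain of filters, each able to veto or reshape what the layer below proposed; every check in this sketch is a hypothetical placeholder rather than a design proposal:

```python
# Purely illustrative layering: each level can veto or modify a proposed
# action. No single layer is a guarantee; safety comes from the stack.

def built_in_inhibitions(action):
    # Bottom level: hard-wired biases, e.g. refuse obviously harmful acts.
    return None if action.get("harm", 0) > 0.9 else action

def motivational_goals(action):
    # Goal level: prefer actions that score well against basically "good" goals.
    return action if action.get("goal_score", 0) > 0 else None

def learned_schemas(action):
    # Experience level: apply lessons learned from past situations.
    action["confidence"] = min(1.0, action.get("goal_score", 0))
    return action

def social_incentives(action):
    # Social level: even a selfish agent weighs reputation and sanctions.
    return None if action.get("reputation_cost", 0) > action["confidence"] else action

def evaluate(action):
    for layer in (built_in_inhibitions, motivational_goals,
                  learned_schemas, social_incentives):
        action = layer(action)
        if action is None:
            return "rejected"
    return "permitted"

print(evaluate({"harm": 0.1, "goal_score": 0.8, "reputation_cost": 0.2}))
```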

This is far from the neatness of the three laws. It is a complex mess, with no guarantees on any level. But it is also a very resilient yet flexible mess: it won't break down if there is a problem on one level, and multi-level problems are less likely. If the situation changes, the participants can change with it.

But to most people this complexity is unappealing: give us the apparent certainty of the three laws! There is a strong tendency to distrust complex spontaneous orders (despite our own bodies and minds being examples!) and to prefer apparent simplicity. This is where I think the 3 Laws Unsafe site is necessary: to remind people that simplicity isn't to be trusted unconditionally, and to show the fascinating array of possibilities AI ethics can offer.

Posted by Anders at July 15, 2004 03:51 PM
Comments

When speculating about technological advances there is an almost unavoidable tendency to extrapolate too narrowly, with insufficient weight given to the uncertain but considerable context within which a specific technology will interact.

In the case of AI, it seems likely that non-sentient artificially intelligent tools will be developed and gain widespread popularity as means to augment human intelligence and human collaborative efforts – before, and on the path to, the development of sentient or recursively self-improving AI.

As the augmented collective intelligence of human organizations increases, and local interests are increasingly superseded by more global interests, a metaethics of cooperation can be expected to evolve and be applied at many scales of human endeavor.

This evolution of human collective intelligence and morality will throw a much different light on currently intractable and paradoxical problems such as the tragedy of the commons and prisoners’ dilemma situations.

The challenge is to raise ourselves to that level of awareness and wisdom before our bad-ass ancestral tendencies do too much damage.

- Jef

Posted by: Jef Allbright at July 15, 2004 10:21 PM

I agree that the thing to look for is more the collective augmented intelligence than individual superintelligences. They might be around, but unless extreme hard takeoff scenarios hold, they will be surrounded by vast numbers of nearly-as-smart entities with a combined power far larger than theirs. A swell rather than a spike, so to speak.

I wonder about the collective ethical growth. Miller and Drexler point out in "Comparative Ecology: A Computational Perspective" (http://www.agorics.com/Library/agoricpapers/ce/ce4.html#section4.5) that ecosystems are far more coercive and violent than markets. To a large extent this seems to be due to the high number of win-win interactions in markets compared to zero-sum interactions. This helps establish and sustain a high degree of cooperation. Given that markets represent a sizeable share of human interactions, and that the goods traded are anything humans can conceive of trading, it seems reasonable to suspect that they are indeed representative of complex cultural systems. As AI enables us to add culture and smarts to nearly any object or interaction, it seems plausible that these win-win interactions will multiply and that we will tend towards greater (on average) cooperation. In many ways this change does not require individual awareness or wisdom, just as the stable cooperative strategies in the iterated prisoners' dilemma do not require altruistic agents. The wisdom is a collective phenomenon rather than an individual one.
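
To make the "no individual altruism required" point concrete, here is a minimal iterated prisoners' dilemma sketch with the standard payoffs (nothing AI-specific about it): purely self-interested tit-for-tat players end up in sustained mutual cooperation, while an unconditional defector earns less against them over many rounds.

```python
# Minimal iterated prisoners' dilemma: self-interested tit-for-tat players
# end up cooperating, without any of them being "altruistic".

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_moves):
    return "C" if not opponent_moves else opponent_moves[-1]  # copy the last move seen

def always_defect(opponent_moves):
    return "D"

def play(strategy_a, strategy_b, rounds=100):
    score_a = score_b = 0
    seen_by_a, seen_by_b = [], []    # each player's record of the opponent's moves
    for _ in range(rounds):
        move_a = strategy_a(seen_by_a)
        move_b = strategy_b(seen_by_b)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        seen_by_a.append(move_b)
        seen_by_b.append(move_a)
    return score_a, score_b

print("TFT vs TFT:", play(tit_for_tat, tit_for_tat))        # sustained mutual cooperation
print("TFT vs defector:", play(tit_for_tat, always_defect)) # defection earns less than cooperating would
```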

What can (and likely will) upset this cozy drift towards niceness is that complex systems have plenty of degrees of freedom and hence many ways of going wrong and many arbitrary reasons for conflict (say, over different ethical systems or aesthetics). The defectors are probably a smaller problem than the occasional nutcases with enhanced destructive abilities and the statists seeking to impose a 'rational' (i.e. simplistic) order on parts of the system. While I believe large and diverse systems will handle such problems well (likely in some kind of self-organized criticality state with power-law distributed disasters followed by restorative transients), it is not nice to be near one of the disasters, and we had better make sure the system is large enough to handle even the biggest conceivable ones.

Posted by: Anders at July 15, 2004 11:14 PM