From: derek@cs.wisc.edu (Derek Zahn) X-Original-Message-Id: <9303171612.AA13884@lynx.cs.wisc.edu> Subject: SELF-HELP/GA: Life is an N-Armed Bandit X-Original-To: extropians@gnu.ai.mit.edu Date: Wed, 17 Mar 93 10:12:34 CST X-Extropian-Date: Remailed on March 17, 373 P.N.O. [16:12:45 UTC] Reply-To: Extropians@gnu.ai.mit.edu Errors-To: Extropians-Request@gnu.ai.mit.edu Status: RO The following brief glurg may be freely distributed, without legal ramifications or karmic debt, provided that it is not altered and contains this notice. It is intended for personal enjoyment only. If you take it too seriously, you're probably an idiot, and there's little I can do for you. LIFE IS AN N-ARMED BANDIT or BETTER LIVING THROUGH GENETIC ALGORITHMS by Derek Zahn 1. INTRODUCTION Have you ever wondered how much of your precious time should be spent on current interests/projects/friends/etc and how much spent exploring new possibilities? If you're like most people, you probably base such decisions on whims, vague mushy feelings like boredom and dissatisfaction, influences of infomercials and other media-transported memetic viruses, and temporary psychoses induced by alterations of brain chemistry. No more! The answer comes, not surprisingly, from Genetic Algorithms[1] (see Extropy #9); specifically, from its founder, John Holland, as expressed in his book, _Adaptation in Natural and Artificial Systems_ (theorems and corrolaries mentioned in this essay refer to that book). 2. LIMITING LOSSES IN LAS VEGAS Holland began by analyzing strategies for an N-armed bandit: suppose you have a slot machine with N arms to choose from. Each arm has a different rate of paying off. The optimal strategy would obviously be to always pull the arm with the highest payoff -- BUT, and this is the key question, how do you know which arm that is? Information must be gathered about the different arms and used to guide decisions about which arm to pull next -- and this gathering of information can never stop because there's always some chance that the arm that appears to be best really is not. For the special case of a 2-armed bandit, Holland demonstrated (theorem 5.1) a great strategy. Specifically, let m1 and m2 be the observed mean payoffs of the two arms, and let s1 and s2 be the variances of those payoffs. For convenience, say m1 > m2. Then, if we're going to make N pulls, we should allocate / 2 \ 2 | N | b * ln |--------------------| | 4 2 | \ 8 * PI * b * ln(N )/ s1 trials to the second arm (the worse arm), where b = ------- m1 - m2 Now, this "flute music" is largely irrelevant to our needs (especially since it technically requires that we know the means and variances that will be observed, *before* we do the experiments), but some things can be determined: First, we can use the previously-observed means and variances as estimates. Second, the important point (see corollary 5.2) is that the better arm should be allocated exponentially more pulls in the future to maximize returns. The same result holds for bandits with N arms. 3. THE PROPER WAY TO USE A DIARY To apply this to self-improvement, we need to formalize our lives (note that nearly every Extropian-compatible self-help technique involves such life formalization to one extent or another). First, we need to get organized. Our primary data structure will be a set of different ways we can spend our time. Make a list of these. Define M as the number of entries in this list. Now, let's define some time unit that is to be the granularity of our record- keeping. In the rest of this paper, I will assume that the time unit is half an hour, though arguments could be made for other time units[2]. Each time unit of your waking life[3], record which activity you were involved in, and an evaluation measure (a real number between 0.0 and 1.0). Revisionism at the end of a day or week is probably allowable, since your memories of your effectiveness are probably as important to maximize as your transitory perceptions. By altering the amount of time spent on each of your projects according to the N-Armed Bandit return optimization strategy, you should be able to maximize the average evaluation in the future, right? Alas, if only it were so simple. Unfortunately, this first approximation would rapidly result in days filled with nothing but sex[4] (assuming the required experimental setup is available). The problem is that, just as biological organisms do not have a single gene with a large number of alleles, our lives have multiple components that interact. Refinement is in order. 4. THE SCHEMATA OF EVERYDAY LIFE To capture the interaction of activities, we really want to evaluate all possible combinations in some time period. For the purposes of illustration, let's choose the time period to be one waking day, which will be defined as 32 half-hour slots, each of which is filled with one of the M activities from our list. Letting letters represent activities, a typical day might then look something like: XAAXBFYFXEEEEEEXIUNGGXXXWWWEEQXX Where 'X' represents reading Extropians email. This kind of coding, where each gene can take the same values, is unusual but not theoretically much different from the general case. Let's define a SCHEMA to be some possible combination of activities during a day -- for example, if '*' represents ANY arbitrary activity for a timeslot, XAA******EE*****I***G**********X and ****BFY***E**********XXX******** and **AX**************************** are all schemata that appear in the day given above. There are a LOT of schemata possible! In fact, 32 32 (M + 1) of them. And each day contains 2 of those! So if we evaluate the "goodness" of the day (either computed from the half-hour assessments by some formula, or just given at the end of the day), we are collecting information on 32 those 2 schemata, which is pretty good! Unfortunately, it's impractical to store all that information. BUT, if we apply Holland's major theoretical result -- the Fundamental Theorem of Genetic Algorithms -- we can IMPLICITLY store that information, and proceed to glory! 5. TIME MANAGEMENT LIKE YOU'VE NEVER SEEN IT BEFORE Let's define a population of days (for convenience, call it a "month"). Start with a random month (or, for pragmatic reasons, copy some recorded month). Now, "evaluate" each member of the population by living that day -- following the time schedule as closely as posssible. Once the month is over, we plan out the next month by performing fitness-proportional reproduction on the previous month's days, subject to genetic operators. We'll use mutation (where a single half-hour gets changed to another one with some probability mu) and crossover (where activities between two days in the population are shuffled with some probability rho). Then, charge into next month! Here's the point: Holland showed that this strategy IMPLICITLY keeps track of the fitness values for the schemata and forms a good approximation to the optimal N-Armed-Bandit strategy! This absolutely stunning result gives us, finally, a rational, justifiable approach to improving the quality of our lives. So go for it! You have nothing to lose but your Franklin Planner! 6. CONCLUSION Have patience! Theoretical calculations are not yet complete (and are heavily dependent on M), but it appears that approaching the "optimal" day (within some reasonable epsilon) could be possible within a hundred thousand years! A drop in the bucket for an immortalist Extropian! NOTES ----- [1] As what, after all, doesn't -- one way or another? [2] Specifically, in the future, when tighter integration with computers will become normal, it will become feasible to drastically shrink the time unit. [3] As the "missing third" of our lives is reclaimed by lucid dream invocation and other sleep-exploration research, we will probably want to monitor time units spent asleep as well. [4] Of course, there's something to be said for that...