From: derek@cs.wisc.edu (Derek Zahn)
X-Original-Message-Id: <9303171612.AA13884@lynx.cs.wisc.edu>
Subject: SELF-HELP/GA: Life is an N-Armed Bandit
X-Original-To: extropians@gnu.ai.mit.edu
Date: Wed, 17 Mar 93 10:12:34 CST
X-Extropian-Date: Remailed on March 17, 373 P.N.O. [16:12:45 UTC]
Reply-To: Extropians@gnu.ai.mit.edu
Errors-To: Extropians-Request@gnu.ai.mit.edu
Status: RO
The following brief glurg may be freely distributed,
without legal ramifications or karmic debt, provided that
it is not altered and contains this notice. It is intended
for personal enjoyment only. If you take it too seriously,
you're probably an idiot, and there's little I can do
for you.
LIFE IS AN N-ARMED BANDIT
or
BETTER LIVING THROUGH GENETIC ALGORITHMS
by
Derek Zahn
1. INTRODUCTION
Have you ever wondered how much of your precious time should
be spent on current interests/projects/friends/etc and how
much spent exploring new possibilities? If you're like most
people, you probably base such decisions on whims, vague
mushy feelings like boredom and dissatisfaction, influences
of infomercials and other media-transported memetic viruses,
and temporary psychoses induced by alterations of brain
chemistry.
No more! The answer comes, not surprisingly, from Genetic
Algorithms[1] (see Extropy #9); specifically, from its founder,
John Holland, as expressed in his book, _Adaptation in
Natural and Artificial Systems_ (theorems and corrolaries
mentioned in this essay refer to that book).
2. LIMITING LOSSES IN LAS VEGAS
Holland began by analyzing strategies for an N-armed bandit:
suppose you have a slot machine with N arms to choose from.
Each arm has a different rate of paying off. The optimal
strategy would obviously be to always pull the arm with the
highest payoff -- BUT, and this is the key question, how
do you know which arm that is? Information must be gathered
about the different arms and used to guide decisions about
which arm to pull next -- and this gathering of information
can never stop because there's always some chance that the
arm that appears to be best really is not.
For the special case of a 2-armed bandit, Holland demonstrated
(theorem 5.1) a great strategy. Specifically, let m1 and m2
be the observed mean payoffs of the two arms, and let s1 and s2
be the variances of those payoffs. For convenience, say m1 > m2.
Then, if we're going to make N pulls, we should allocate
/ 2 \
2 | N |
b * ln |--------------------|
| 4 2 |
\ 8 * PI * b * ln(N )/
s1
trials to the second arm (the worse arm), where b = -------
m1 - m2
Now, this "flute music" is largely irrelevant to our needs (especially
since it technically requires that we know the means and variances that
will be observed, *before* we do the experiments), but some things can
be determined: First, we can use the previously-observed means and
variances as estimates. Second, the important point (see corollary
5.2) is that the better arm should be allocated exponentially more
pulls in the future to maximize returns. The same result holds for
bandits with N arms.
3. THE PROPER WAY TO USE A DIARY
To apply this to self-improvement, we need to formalize our lives
(note that nearly every Extropian-compatible self-help technique
involves such life formalization to one extent or another). First,
we need to get organized. Our primary data structure will be a
set of different ways we can spend our time. Make a list of
these. Define M as the number of entries in this list. Now, let's
define some time unit that is to be the granularity of our record-
keeping. In the rest of this paper, I will assume that the time unit
is half an hour, though arguments could be made for other time units[2].
Each time unit of your waking life[3], record which activity you were
involved in, and an evaluation measure (a real number between
0.0 and 1.0). Revisionism at the end of a day or week is probably
allowable, since your memories of your effectiveness are probably
as important to maximize as your transitory perceptions.
By altering the amount of time spent on each of your projects
according to the N-Armed Bandit return optimization strategy,
you should be able to maximize the average evaluation in the
future, right?
Alas, if only it were so simple. Unfortunately, this first
approximation would rapidly result in days filled with nothing
but sex[4] (assuming the required experimental setup is available).
The problem is that, just as biological organisms do not have
a single gene with a large number of alleles, our lives have
multiple components that interact. Refinement is in order.
4. THE SCHEMATA OF EVERYDAY LIFE
To capture the interaction of activities, we really want
to evaluate all possible combinations in some time period.
For the purposes of illustration, let's choose the time
period to be one waking day, which will be defined as
32 half-hour slots, each of which is filled with one of
the M activities from our list. Letting letters represent
activities, a typical day might then look something like:
XAAXBFYFXEEEEEEXIUNGGXXXWWWEEQXX
Where 'X' represents reading Extropians email. This kind
of coding, where each gene can take the same values, is
unusual but not theoretically much different from the
general case.
Let's define a SCHEMA to be some possible combination of
activities during a day -- for example, if '*' represents
ANY arbitrary activity for a timeslot,
XAA******EE*****I***G**********X and
****BFY***E**********XXX******** and
**AX****************************
are all schemata that appear in the day given above.
There are a LOT of schemata possible! In fact,
32 32
(M + 1) of them. And each day contains 2 of
those!
So if we evaluate the "goodness" of the day (either computed
from the half-hour assessments by some formula, or just given
at the end of the day), we are collecting information on
32
those 2 schemata, which is pretty good! Unfortunately,
it's impractical to store all that information. BUT, if
we apply Holland's major theoretical result -- the Fundamental
Theorem of Genetic Algorithms -- we can IMPLICITLY store
that information, and proceed to glory!
5. TIME MANAGEMENT LIKE YOU'VE NEVER SEEN IT BEFORE
Let's define a population of days (for convenience, call
it a "month"). Start with a random month (or, for
pragmatic reasons, copy some recorded month). Now,
"evaluate" each member of the population by living that
day -- following the time schedule as closely as posssible.
Once the month is over, we plan out the next month by
performing fitness-proportional reproduction on the previous
month's days, subject to genetic operators. We'll use
mutation (where a single half-hour gets changed to another
one with some probability mu) and crossover (where activities
between two days in the population are shuffled with some
probability rho). Then, charge into next month!
Here's the point: Holland showed that this strategy IMPLICITLY
keeps track of the fitness values for the schemata and forms
a good approximation to the optimal N-Armed-Bandit strategy!
This absolutely stunning result gives us, finally, a rational,
justifiable approach to improving the quality of our lives.
So go for it! You have nothing to lose but your Franklin Planner!
6. CONCLUSION
Have patience! Theoretical calculations are not yet complete
(and are heavily dependent on M), but it appears that approaching
the "optimal" day (within some reasonable epsilon) could be
possible within a hundred thousand years! A drop in the bucket
for an immortalist Extropian!
NOTES
-----
[1] As what, after all, doesn't -- one way or another?
[2] Specifically, in the future, when tighter integration with
computers will become normal, it will become feasible to
drastically shrink the time unit.
[3] As the "missing third" of our lives is reclaimed by lucid
dream invocation and other sleep-exploration research, we
will probably want to monitor time units spent asleep as
well.
[4] Of course, there's something to be said for that...