Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning Techniques

Do not Match, Inherit: Fitness Surrogates for
Genetics-Based Machine Learning Techniques

Xavier Llorà1,2, Kumara Sastry2, Tian-Li Yu3, David E. Goldberg2

1 National Center for Supercomputing Applications. University of Illinois at Urbana-Champaign
2 Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign
3 Department of Electrical Engineering, National Taiwan University

Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199
GECCO 2007 HUMIES 1

Motivation
• Competent GBML
– Use competent GAs to approach GBML problems
– Take advantage of competent GA scalability
– Provide insight about problem structure
– χeCCS by Llorà, Sastry, Goldberg & de la Ossa (2006)

• Rule matching may thread practical applications
– Even for small dimensional problems (MUX 20), rule matching
may take more than 85% of the execution time in XCS
– As dimensionality or cardinality of training sets increase, rule
matching rules the overall execution time
– Efficient implementation (Llora & Sastry, 2006) still require
matching rules

GECCO 2007 Llorà, Sastry, Yu & Goldberg 2

Motivation
• Competent GAs
– Byproduct: Models and problem structure insight
– Revision of the fitness relaxation for expensive fitness
evaluations
– Idea: Build a cheap surrogate fitness accurate enough
– Successfully applied to GA (Sastry, Lima & Goldberg, 2006)
– Help cut down the number of fitness evaluations

• GBML
– Can we transfer the same ideas to GBML approaches?
– What are the requirements needed for competent GBML to
benefit from fitness relaxation?


Outline
• Overview χeCCS
• Evaluation relaxation
• Fitness inheritance using least squares fitting
• Fitness inheritance and χeCCS
• Results
• Conclusions


χ-ari Extended Compact Classifier System

• No reinforcement learning is used
• A competent GA is in charge of the learning
• The idea:
– A population of single rules
– For each rule we compute its fitness
– The χ-ari extended compact genetic algorithm
– Niching to maintain different accurate rules (restricted
tournament replacement)


Maximally Accurate and General Rules
• Accuracy and generality can be compute as

n t + (r) + n t# (r) n t + (r)
quot;(r) = quot;(r) =
nt nm
• Fitness should combine accuracy and generality
f (r) = quot;(r) # $(r)%
! !
• Such measure can be either applied to rules or rule sets

!


Maximally Accurate and General Rules


Extended Compact Genetic Algorithm
• A Probabilistic model building GA (Harik, 1999)
– Builds models of good solutions as linkage groups
• Key idea:
– Good probability distribution → Linkage learning

• Key components:
– Representation: Marginal product model (MPM)
• Marginal distribution of a gene partition

– Quality: Minimum description length (MDL)
• Occam’s razor principle
• All things being equal, simpler models are better

– Search Method: Greedy heuristic search


Marginal Product Model (MPM)
• Partition variables into disjoint sets
• Product of marginal distributions on a partition of
genes
• Gene partition maps to linkage groups
MPM: [1, 2, 3], [4, 5, 6], … [l-2, l -1, l]

... xl-2 xl-1 xl
x1 x2 x3 x4 x5 x6

{p000, p001, p00#, p010, p011, p01#, p100, p101,
p10#, p110, p111, p11# … }(27 probabilities)

Minimum Description Length Metric
• Hypothesis: For an optimal model
– Model size and error is minimum

• Model complexity, Cm
– # of bits required to store all marginal probabilities

• Compressed population complexity, Cp
– Entropy of the marginal distribution over all partitions

• MDL metric, Cc = Cm + Cp

Building an Optimal MPM
• Assume independent genes ([1],[2],…,[l])

• Compute MDL metric, Cc

• All combinations of two subset merges
Eg., {([1,2],[3],…,[l]), ([1,3],[2],…,[l]), ([1],[2],…,[l-1,l])}
•

• Compute MDL metric for all model candidates

• Select the set with minimum MDL,

• If , accept the model and go to step 2.

• Else, the current model is optimal


χeCCS Models for Different Multiplexers
Building Block Size Increases

Fitness Inheritance using Least Squares
• Proposed by Sastry, Lima & Goldberg (2006)
• Surrogate is a regression using basis identified by BBs
• A simple example: [1,3] [2] [4]
• The schemas represented are
– {0*0*, 0*1*, 0*#*, 1*0*, 1*1*, 1*#*, #*0*, #*1*, #*#*,
*0**, *1**, *#**, ***0 , ***1 , ***#}
• Recode the individuals by


• Recoding defines matrix A

• Normalize the fitness


• Solve using least squares

• Once solved, the fitness surrogate take the following form


Fitness Inheritance and χeCCS
• Two different problems

Hidden XOR 6-input multiplexer


Hidden XOR
• Evolved rules and model

• Surrogate accuracy


6-input Multiplexer
• The evolved solution and model

• The surrogate is totally off


6-input Multiplexer
• The key = missing basis

• χeCCS is able to solve the problem quickly, reliably,
and accurately
• However, the model basis are not accurate enough
to build a proper surrogate

Overlapping BBs using DSMGA
• Proposed by Yu, Yassine, Goldberg and Chen (2003)
• Based on organizational theory
• Main property = DSMGA model builder (DSMcluster)
deals with overlapping building blocks
• The main issue = translate a populations or rules
into a dependency structure matrix (DSM)
• The intuition = specific bits are the ones responsible
for the kind of linkage we seek


Jumping to the Results
• DSMcluster model for the hidden XOR
– [i0 i1 i2] [i3] [i4] [i5]

• DSMcluster model for the 6-input multiplexer
– [i0 i1] <i2 i3 i4 i5>
– It identifies a BB [i0 i1] of variables interacting with a
bus <i2 i3 i4 i5>
– Translated into χeCCS language:
[i0 i1 i2] [i0 i1 i3] [i0 i1 i4] [i0 i1 i5]
– The right model which provides the right set of basis


Conclusions
• The matching process is crucial and expensive
• Efficient implementations can take us far to a point
• Relaxation can get rid of the need of matching
• For some types of problems overlapping BBs are
required
• DSMGA provides the proper machinery to identify
the proper basis for such a surrogate


Do not Match, Inherit: Fitness Surrogates for
Genetics-Based Machine Learning Techniques

Xavier Llorà1,2, Kumara Sastry2, Tian-Li Yu3, David E. Goldberg2

1 National Center for Supercomputing Applications. University of Illinois at Urbana-Champaign
2 Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign
3 Department of Electrical Engineering, National Taiwan University

Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199
GECCO 2007 HUMIES 23

Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning Techniques

More Related Content

Similar to Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning Techniques

Similar to Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning Techniques (20)

More from Xavier Llorà

More from Xavier Llorà (20)

Recently uploaded

Recently uploaded (20)

Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning Techniques