The document discusses learning classifier systems (LCS) for addressing class imbalance problems in datasets. It aims to enhance the applicability of LCS to knowledge discovery from real-world datasets that often exhibit class imbalance, where one class is represented by significantly fewer examples than other classes. The author proposes adapting parameters of the XCS learning classifier system, such as learning rate and genetic algorithm threshold, based on estimated class imbalance ratios within classifiers' niches in order to minimize bias towards majority classes and better handle small disjuncts representing minority classes.
Learning Classifier Systems for Class Imbalance Problems
1. Learning Classifier Systems for Class Imbalance Problems
Ester Bernadó-Mansilla
Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
Barcelona, Spain
2. Aim
Enhance the applicability of LCSs to knowledge discovery from datasets:
Classification problems
Real-world domains
Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
3. Framework
[Diagram: Dataset → LCS → model + estimated performance]
• Representativity of the target concept
• Evolutionary pressures
• Interpretability
• Geometrical complexity
• Domain of applicability
• Class imbalance
• Noise
4. Class Imbalance
When one class is represented by a small number of examples, compared to the other class/es.
Usually the class that describes the circumscribed concept (the positive class) is the minority class.
Where?
Rare medical diagnoses
Fraud detection
Oil spills in satellite images
5. Class Imbalance and Classifiers
Is there a bias towards the majority class?
Probably, because most classifier schemes are trained to minimize the global error.
As a result:
They classify accurately the examples of the majority class.
They tend to misclassify the examples of the minority class, which are often those representing the target concept.
6. Measures of Performance
Confusion matrix

             Predicted A           Predicted B
Actual A     true positive (TP)    false negative (FN)
Actual B     false positive (FP)   true negative (TN)

Accuracy = (TP+TN)/(TP+FN+FP+TN)
TN rate = TN/(TN+FP)
TP rate = TP/(TP+FN)
ROC curves
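For concreteness, these measures can be computed directly from the four confusion-matrix counts; a small sketch (the counts below are illustrative, echoing the 15-vs-150 split of Dataset 1 on a later slide):

```python
# Accuracy, TP rate and TN rate from confusion-matrix counts.
def rates(tp, fn, fp, tn):
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    tp_rate = tp / (tp + fn)   # sensitivity: recall on the positive class
    tn_rate = tn / (tn + fp)   # specificity: recall on the negative class
    return accuracy, tp_rate, tn_rate

# A degenerate classifier that labels everything as the majority class
# (15 positive vs. 150 negative examples):
acc, tpr, tnr = rates(tp=0, fn=15, fp=0, tn=150)
print(acc, tpr, tnr)  # ~0.909, 0.0, 1.0
```

This is exactly the bias the following slides discuss: global accuracy stays high while the TP rate collapses to zero.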
7. The Higher the Class Imbalance, the Higher the Bias?
Dataset 1: concept 15, counterpart 150, ratio 10:1
Dataset 2: concept 15, counterpart 45, ratio 3:1
8. XCS
[Diagram: XCS interacts with the Environment (the Dataset): it receives an
input, its Set of Rules predicts a class, Reinforcement Learning updates the
rules from the reward, and Genetic Algorithms search for new rules.]
9. Our Approach with XCS
Bounding XCS’s parameters for unbalanced datasets
Online identification of small disjuncts
Adaptation of parameters for the discovery of small
disjuncts
10. XCS’s Behavior in Unbalanced Datasets
Unbalanced 11-multiplexer problem
[Plots: performance at ir=16:1, ir=32:1 and ir=64:1]
11. XCS’s Population
Most numerous rules, ir=128:1:

Classifier       P          Error    F     Num
###########:0    1000       0.12     0.98  385
###########:1    1.2·10⁻⁴   0.074    0.98  366

Both rules are overgeneral, yet their prediction and error are poorly
estimated (the expected values are P = 992.24 and 7.75, with error 15.38),
their fitness is too high, and their numerosity is high.
Test examples are classified as belonging to the majority class
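The values 992.24, 15.38 and 7.75 quoted on this slide can be reproduced from the prediction and error formulas of the next slides, assuming Rmax = 1000, Rmin = 0 and Pc = ir/(ir+1) at ir = 128:1 (a sketch, not the talk's code; rounding differs slightly from the slide):

```python
# Expected prediction and error of the fully general classifiers at ir = 128:1.
Rmax, Rmin, ir = 1000.0, 0.0, 128
pc = ir / (ir + 1)                      # fraction of majority-class examples

P0 = pc * Rmax + (1 - pc) * Rmin        # prediction of ###########:0
eps = abs(P0 - Rmax) * pc + abs(P0 - Rmin) * (1 - pc)   # its error
P1 = (1 - pc) * Rmax + pc * Rmin        # prediction of ###########:1

print(round(P0, 2), round(eps, 2), round(P1, 2))  # 992.25 15.38 7.75
```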
12. How Imbalance Affects XCS
Classifier’s error
Stability of prediction and error estimates
Occurrence-based reproduction
13. Classifier’s Error in Unbalanced
Datasets
Will an overgeneral classifier be detected as inaccurate if the
imbalance ratio is high?
Bound for an inaccurate classifier: ε > ε0
Given the estimated prediction and error:
P = Pc(cl)·Rmax + (1 − Pc(cl))·Rmin
ε = |P − Rmax|·Pc(cl) + |P − Rmin|·(1 − Pc(cl))
with Pc(cl) = p/(p+1) and Rmin = 0 for the overgeneral majority
classifier, the condition ε ≥ ε0 becomes:
−ε0·p² + 2p·(Rmax − ε0) − ε0 ≥ 0
where p is the imbalance ratio.
For Rmax = 1000 and ε0 = 1 we get the maximum imbalance ratio:
irmax = 1998
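The bound irmax = 1998 is the larger root of that quadratic in p; a quick numerical check (a sketch assuming Rmax = 1000, ε0 = 1 and Rmin = 0, as on the slide):

```python
import math

# Larger root of e0*p**2 - 2*(Rmax - e0)*p + e0 = 0, i.e. the highest
# imbalance ratio at which the overgeneral classifier's error exceeds e0.
Rmax, e0 = 1000.0, 1.0
irmax = ((Rmax - e0) + math.sqrt((Rmax - e0) ** 2 - e0 ** 2)) / e0
print(round(irmax))  # 1998
```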
14. Prediction and Error Estimates and Learning Rate
[Plots: prediction and error estimates of classifier ###########:0 at
ir=128:1, for learning rates β=0.2 and β=0.002]
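The effect of β can be illustrated with a tiny simulation of the Widrow-Hoff update p ← p + β(R − p) for the overgeneral ###########:0 (my own sketch, not the talk's code; the reward model follows the assumptions of the previous slide):

```python
import random

def final_estimate(beta, ir=128, steps=50000, seed=0):
    """Widrow-Hoff prediction estimate of the overgeneral ###########:0:
    reward is 1000 with probability ir/(ir+1), 0 otherwise."""
    rng = random.Random(seed)
    p = 10.0  # arbitrary initial prediction
    for _ in range(steps):
        reward = 1000.0 if rng.random() < ir / (ir + 1) else 0.0
        p += beta * (reward - p)
    return p

# beta = 0.2 averages over a ~5-example window, so the estimate sits near
# 1000 most of the time; beta = 0.002 stays close to the true value 992.2.
print(final_estimate(0.2), final_estimate(0.002))
```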
15. Occurrence-based Reproduction
Probability of occurrence (pocc)
Given ir = maj/min:

Classifier       pocc,B   pocc,I
###########:0    1/2      1/2
###########:1    1/2      1/2
0000#######:0    1/32     (see plot)
0001#######:1    1/32     (see plot)

[Plot: probability of occurrence vs. imbalance ratio (1 to 256) for
00001######:1, 00000######:0, ###########:0 and ###########:1]
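These occurrence probabilities are driven by the class frequencies, which follow directly from the ratio ir = maj/min; a tiny illustrative helper:

```python
# Class frequencies as a function of the imbalance ratio ir = maj/min:
# f_maj = ir/(ir+1), f_min = 1/(ir+1).
def class_frequencies(ir):
    return ir / (ir + 1), 1 / (ir + 1)

for ir in [1, 2, 4, 8, 16, 32, 64, 128, 256]:  # the plot's x-axis values
    f_maj, f_min = class_frequencies(ir)
    print(ir, round(f_maj, 3), round(f_min, 3))
```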
16. Occurrence-based Reproduction
Probability of reproduction (pGA):
pGA = 1/TGA
where TGA ≈ θGA if Tocc < θGA, and TGA ≈ Tocc otherwise
With θGA = 20:
TGA(###########:0) ≈ θGA, since the overgeneral classifier occurs
almost every time step (Tocc < θGA)
TGA(0000#######:0) ≈ Tocc ¹, since the specific niche occurs rarely
(Tocc > θGA)
¹ Assuming non-overlapping classifiers
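The case analysis above can be sketched as a small function (the function name and the example Tocc values are illustrative, not from the talk):

```python
# Effective GA period of a niche, given its mean time between
# occurrences (t_occ) and the GA threshold theta_GA.
def ga_period(t_occ, theta_ga=20):
    # Frequent niches (Tocc < theta_GA) reproduce roughly every theta_GA
    # steps; rare niches reproduce at most once per occurrence.
    return theta_ga if t_occ < theta_ga else t_occ

# The overgeneral ###########:0 matches almost every step; a specific
# minority niche occurs only rarely (illustrative values):
print(ga_period(1))     # 20   -> reproduces every ~20 steps
print(ga_period(2000))  # 2000 -> reproduces ~100x less often
```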
17. Guidelines for Parameter Tuning
Rmax and ε0 determine the threshold between negligible noise and the
imbalance ratio.
β determines the size of the moving window. The window should be large
enough to include examples from both classes:
β = k · f_min / f_maj
θGA can counterbalance the reproduction opportunities of the most
frequent (majority) and least frequent (minority) niches:
θGA = k' / f_min
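Both tuning rules can be written down directly; in the sketch below, f_min and f_maj are the empirical class frequencies, and k, k' are free constants whose values here are illustrative assumptions:

```python
# Tuning beta and theta_GA from the class frequencies: beta shrinks and
# theta_GA grows as the imbalance ratio increases.
def tuned_parameters(f_min, f_maj, k=0.2, k_prime=1.0):
    beta = k * f_min / f_maj        # window ~1/beta spans both classes
    theta_ga = k_prime / f_min      # rare niches keep GA opportunities
    return beta, theta_ga

# ir = 64:1  ->  f_maj = 64/65, f_min = 1/65
beta, theta_ga = tuned_parameters(f_min=1 / 65, f_maj=64 / 65)
print(round(beta, 6), round(theta_ga, 1))  # 0.003125 65.0
```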
18. XCS with Parameters Tuning
[Plots comparing XCS with parameter tuning against XCS with standard
settings, at ir=16:1, 32:1, 64:1, 64:1 and 256:1]
19. XCS Tuning for Real-world Datasets
How can we estimate the niche frequency?
Estimate it from the ratio of majority-class to minority-class instances
Problem:
• This may not be related to the distribution of niches in the feature
space
Instead, approach it as a small-disjuncts problem
20. Online Identification of Small Disjuncts
We search for regions that promote overgeneral classifiers.
Estimate ircl based on the classifier’s experience on each class:
ircl = exp_max / exp_min
Adapt β and θGA according to ircl.
Example: ircl = 20/4 = 5
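A sketch of this per-classifier estimate (the class name, its fields and the guard against zero experience are my own assumptions):

```python
# Per-classifier imbalance estimate: the ratio of the classifier's
# experience on each class, updated online as examples are matched.
class NicheClassifier:
    def __init__(self):
        self.exp = {0: 0, 1: 0}   # matched examples seen per class

    def update(self, label):
        self.exp[label] += 1

    def ir_estimate(self):
        lo, hi = sorted(self.exp.values())
        return hi / max(lo, 1)    # exp_max / exp_min

cl = NicheClassifier()
for label in [0] * 20 + [1] * 4:  # the slide's example: 20 vs. 4 matches
    cl.update(label)
print(cl.ir_estimate())  # 5.0
```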
21. Online Parameter Adaptation
[Plot: performance with online parameter adaptation at ir=256:1]
22. What about UCS?
Supervised XCS:
Needs less exploration
Avoids XCS’s fitness dilemma
More robust to parameter settings
Overgeneral classifiers also tend to take over the population
Their probability of occurrence depends on the imbalance ratio
This is partially mitigated by fitness sharing
23. What about UCS?
[Plots: UCS at ir=256:1 and ir=512:1]
25. How can we Minimize the Effects of
Small Disjuncts?
Resampling the dataset:
Classical methods:
• Random oversampling
• Random undersampling
Heuristic methods:
• Tomek links
• CNN
• One-sided selection
• SMOTE
Cluster-based oversampling:
• Addresses small disjuncts
• Assumes that clusterization will find the small disjuncts and match
the classifier’s approximation
Cost-sensitive classifiers
Could XCS benefit from the online identification of small disjuncts?
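As an illustration of the simplest of the classical methods above, a naive random-oversampling sketch (my own, not from the talk):

```python
import random

def random_oversample(X, y, seed=0):
    """Naive random oversampling: duplicate randomly chosen minority
    examples until all classes reach the majority-class count."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(v) for v in by_class.values())
    X2, y2 = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        X2 += items + extra
        y2 += [label] * target
    return X2, y2

# Toy 10-vs-2 dataset: after oversampling, both classes have 10 examples.
X, y = [[i] for i in range(12)], [0] * 10 + [1] * 2
X2, y2 = random_oversample(X, y)
print(y2.count(0), y2.count(1))  # 10 10
```

Note that duplicating minority points does not create new information; this is why the slide also lists heuristic and cluster-based alternatives.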
26. Domains of Applicability
Should we use some counterbalancing scheme?
Which learning scheme should we use?
Is there a combination of counterbalancing
scheme+learner that beats all others?
How can we know the presence of small
disjuncts?
Are there other complexity factors mixed up with
the small disjuncts problem?
27. Domains of Applicability
[Diagram: a Dataset undergoes dataset characterization, which yields a
prediction of the suggested approach: learn it directly, resampling, a
particular classifier, or resampling + classifier. Where are LCSs placed?]
Type of dataset:
Geometrical distribution of classes
Possible presence of small disjuncts
Other complexity factors
28. Future Directions
Potential benefit of XCS to discover small disjuncts
…and learn from them online
Further analyze UCS
How do LCSs perform w.r.t. other classifiers on unbalanced datasets?
Measures for small-disjunct identification
…and for other possible complexity factors
What is noise and what is a small disjunct?
In which cases is an LCS applicable?
29. Learning Classifier Systems
for Class Imbalance
Problems
Ester Bernadó-Mansilla
Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Barcelona, Spain