AING
 
This classifier was designed by Mohamed Rafik Bouguélia during his thesis, between September 2011 and August 2014. It is an online learning method based on the "growing neural gas" algorithm (GNG). At each step, a single data point is read from a continuous data stream and used to incrementally update the model. The method maintains the model as a dynamically evolving graph topology of data representatives called neurons. It also performs a merging process which uses a distance-based probabilistic criterion to eventually merge neurons. In the following, we present the different evolutions of this approach.
The initial version: GNG
The main characteristic of GNG is its use of a fixed period for the creation of new nodes. Different variants of GNG have been proposed to relax this periodical evolution: IGNG, I2GNG and SOINN. The figure below illustrates the functioning of each of them.
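For reference, here is a minimal sketch of one GNG learning step, after Fritzke's original algorithm; the parameter names (lam, eps_b, eps_n, alpha) and the omission of edge ageing are choices of this sketch, not taken from the page above:

```python
import numpy as np

def gng_step(x, nodes, errors, edges, step,
             lam=100, eps_b=0.05, eps_n=0.006, alpha=0.5):
    """One learning step of standard GNG (edge ageing omitted for brevity)."""
    # 1. Find the two neurons nearest to the input x.
    d = np.linalg.norm(nodes - x, axis=1)
    s1, s2 = (int(i) for i in np.argsort(d)[:2])

    # 2. Accumulate error on the winner, move it and its neighbours towards x.
    errors[s1] += d[s1] ** 2
    nodes[s1] += eps_b * (x - nodes[s1])
    for n in edges.get(s1, set()):
        nodes[n] += eps_n * (x - nodes[n])

    # 3. Link the two winners.
    edges.setdefault(s1, set()).add(s2)
    edges.setdefault(s2, set()).add(s1)

    # 4. The periodical evolution discussed above: a neuron is created only
    #    every `lam` steps, halfway between the neuron q with the largest
    #    accumulated error and its highest-error neighbour f.
    if step % lam == 0:
        q = int(np.argmax(errors))
        if edges.get(q):
            f = max(edges[q], key=lambda n: errors[n])
            r = len(nodes)
            nodes = np.vstack([nodes, 0.5 * (nodes[q] + nodes[f])])
            edges[q].discard(f)
            edges[f].discard(q)
            edges[r] = {q, f}
            edges[q].add(r)
            edges[f].add(r)
            errors[q] *= alpha
            errors[f] *= alpha
            errors = np.append(errors, errors[q])
    return nodes, errors, edges
```

It is exactly this `step % lam == 0` test, insensitive to the actual data distribution, that IGNG, I2GNG, SOINN and AING try to relax.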
AING: Adaptive Incremental Neural Gas
The algorithm
  • Incremental learning from a data stream
  • Adaptive, parameter-free distance threshold (see the sketch after this list)
  • Merging process to preserve time and memory resources
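A minimal sketch of the adaptive threshold, assuming (as in the AING paper) that the threshold of a neuron y is the mean of its distances to the data points it represents and to its neighbouring neurons; the fallback for an isolated neuron is an assumption of this sketch:

```python
import numpy as np

def aing_threshold(y, assigned, neighbours, all_neurons):
    """Adaptive distance threshold T_y of a neuron y (sketch).

    Assumes the AING definition: T_y is the mean of the distances from y to
    the data points it already represents (`assigned`) and to its
    neighbouring neurons (`neighbours`).  The fallback for a brand-new
    isolated neuron (half the distance to the nearest other neuron) is an
    assumption of this sketch.
    """
    dists = [np.linalg.norm(y - x) for x in assigned]
    dists += [np.linalg.norm(y - n) for n in neighbours]
    if dists:
        return sum(dists) / len(dists)
    others = [np.linalg.norm(y - z) for z in all_neurons
              if not np.array_equal(z, y)]
    return min(others) / 2.0 if others else np.inf
```

A new point then gives rise to a new neuron only when it falls farther than this threshold from its nearest neurons, so no global threshold needs to be tuned by hand.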
The merging process
  • Uses a rule based on a probabilistic criterion, more convenient than a very rigid distance threshold. The probability is proportional to the distance between the two neurons, dist(y, ỹ), and to the number of data-points assigned to y (|Xy|); see the sketch after this list
  • Not affected by initialization parameters, except up_bound: if the number of neurons exceeds this upper bound (up_bound), the merging process is triggered; up_bound depends directly on system requirements (e.g. a memory budget)
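As a rough sketch of this criterion: the proportionality to dist(y, ỹ) and |Xy| is stated above, but the normalising constant `scale` and the cap at 1 below are assumptions, not the thesis' exact formula.

```python
import random

def accept_merge(dist_y_ytilde, n_points_y, scale):
    """Probabilistic merge criterion (illustrative sketch).

    The page above states the probability is proportional to dist(y, y~)
    and to |Xy|; the min(1, .)/scale form here is an assumption.
    """
    p = min(1.0, (dist_y_ytilde * n_points_y) / scale)
    return random.random() < p

# The process is triggered only when the graph outgrows the resource budget,
# e.g.:  if len(neurons) > up_bound: run a merging pass using accept_merge().
```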

Results
A2ING: semi-supervised active learning
The principle

In usual supervised learning methods, data labelling is performed beforehand on a static dataset. This is not possible in the case of data streams, where data arrives continuously. Furthermore, manual labelling is costly and time-consuming. The objective of active learning is to adapt labelling to the continuous data stream, querying a human labeller only for the class labels of the data points that are informative.

The main actions
  • Semi-supervised active extension of AING
  • Can learn from both labelled and unlabelled patterns by actively querying the labels of the most informative patterns according to an uncertainty measure
  • Decides by itself, at each new pattern arrival, whether or not its class label should be queried from a human annotator
  • Uses a small number of labelled patterns to initialise the model with some neurons

How to label a new class?

For each new pattern x, we use K-NN to derive the probabilities that x belongs to its two most probable classes. Let KNN(x) = {(y1, cy1), ..., (yK, cyK)} be the K nearest neurons selected from the graph G, together with their class labels, and let P(c|x) be the probability that the pattern x belongs to the class c. Let c1 and c2 be the two most probable classes of x, and let Δ(c1, c2|x) = P(c1|x) − P(c2|x) be the gap between their probabilities.

If, for a pattern x, Δ(c1, c2|x) is close to 0, the probabilities that x belongs to the two classes are similar and x is informative, because knowing the label of x can help to discriminate between the two classes.
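As a concrete sketch of this query rule, assuming (beyond the text above) a distance-weighted K-NN vote for P(c|x) and a fixed ambiguity threshold:

```python
import numpy as np
from collections import defaultdict

def query_decision(x, neurons, labels, K=5, threshold=0.1):
    """Decide whether to query a human for the label of pattern x (sketch).

    neurons : (n, p) array of neuron positions; labels : their class labels.
    P(c|x) is estimated by a distance-weighted vote of the K nearest
    neurons; the 1/(1+d) weighting and `threshold` are assumptions of this
    sketch, not the thesis' exact formulas.
    """
    d = np.linalg.norm(neurons - x, axis=1)
    knn = np.argsort(d)[:K]
    votes = defaultdict(float)
    for i in knn:
        votes[labels[i]] += 1.0 / (1.0 + d[i])
    total = sum(votes.values())
    probs = sorted((v / total for v in votes.values()), reverse=True)
    p1 = probs[0]
    p2 = probs[1] if len(probs) > 1 else 0.0
    delta = p1 - p2                  # Δ(c1, c2 | x)
    return delta < threshold         # query the human only when x is ambiguous
```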


Datasets
  • DocSet dataset: 519 documents for learning, 260 documents for testing, p = 413, 13 classes.
  • CAF dataset: 772 documents for learning, 386 documents for testing, p = 271, 141 classes.
  • LIRMM dataset: 1301 documents for learning, 650 documents for testing, p = 277, 24 classes.
  • MMA dataset: 1728 documents for learning, 863 documents for testing, p = 292, 25 classes.
  • APP dataset: 13161 documents for learning, 6581 documents for testing, p = 328, 139 classes.

Experiments
Active learning in the presence of label noise

Mislabelling is a serious issue for active learning methods because it degrades classification accuracy. M. Rafik Bouguelia proposes a mislabelling likelihood measure to characterize potentially mislabelled instances. Then, he derives a measure of informativeness that expresses how much the label of an instance needs to be corrected by an expert labeller.


Uncertainty measure

The instances selected for manual labelling are typically those for which the model h is uncertain about the class. Given a sample x, knowing the true class label of x would improve h and reduce its overall uncertainty; x is then said to be informative.

An uncertainty measure that selects instances with a low prediction probability can be defined as 1 − maxy∈Y Ph(y|x), where Ph(y|x) is the probability that x belongs to class y according to the current model h. The selection algorithm is given below.
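The algorithm itself was shown as a figure; the loop below is a minimal sketch of stream-based uncertainty sampling under stated assumptions: h is any incremental probabilistic classifier with hypothetical predict_proba and learn methods, oracle stands for the human labeller, and δ is assumed to act as the uncertainty threshold.

```python
def active_learning_loop(stream, h, oracle, delta=0.1):
    """Stream-based uncertainty sampling (sketch of the selection algorithm).

    h exposes hypothetical methods predict_proba(x) -> {class: Ph(y|x)}
    and learn(x, y); `oracle` is the human labeller.  The 1 - max Ph(y|x)
    measure follows the text above; the role of `delta` is an assumption.
    """
    for x in stream:
        probs = h.predict_proba(x)
        uncertainty = 1.0 - max(probs.values())  # low top probability => informative
        if uncertainty > delta:
            y = oracle(x)        # query the human labeller
            h.learn(x, y)        # supervised update
        else:
            h.learn(x, None)     # unsupervised / semi-supervised update
    return h
```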

Impact of label noise

M. R. Bouguelia measures below the different impacts of δ on the accuracy (A), the number of labelled instances (B), and (C).


Characterizing mislabelled instances

We define a disagreement coefficient that reflects the mislabelling likelihood of instances.

Let x be a data point whose label is queried. The class label given by the labeller is denoted yg. Let yp = argmaxy∈Y Ph(y|x) be the class label of x predicted by the classifier.


First measure of disagreement: likelihood disagreement

If yg ≠ yp, a mislabelling error may have occurred.

Let pp = Ph(yp|x) and pg = Ph(yg|x). We estimate a degree of disagreement as D1(x) = 1 − pg⁄pp.

The higher the value of D1, the more likely it is that x has been incorrectly labelled with yg, because the probability that x belongs to yg is small relative to that of yp.
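In code, D1 follows directly from the classifier's output probabilities; the dictionary interface below is an assumption of the sketch, not the thesis' API:

```python
def likelihood_disagreement(probs, y_given):
    """D1(x) = 1 - P(yg|x) / P(yp|x) for a queried point x (sketch).

    probs : dict mapping each class y to Ph(y|x); y_given : the label yg
    supplied by the human labeller.  D1 is 0 when the labeller agrees with
    the classifier's prediction and approaches 1 when P(yg|x) is negligible
    next to P(yp|x).
    """
    y_pred = max(probs, key=probs.get)          # yp = argmax_y Ph(y|x)
    pp = probs[y_pred]
    pg = probs.get(y_given, 0.0)
    return 1.0 - pg / pp if pp > 0 else 0.0
```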


Second measure of disagreement: multiview disagreement

Many problems in machine learning involve datasets with multiple views, e.g. an object can be categorized from either its color or its shape. When the sample seen in each view does not lead to the same class, due to data corruption or noise, this causes a view disagreement.

Consider the features describing the object in each view, say f1 for the first view and f2 for the second. Let qyp^fi (resp. qyg^fi) be the rate that expresses how much the feature fi is likely to contribute to classifying x into yp (resp. into yg).

Let Fp be the set of features that contribute to classifying x into the predicted class yp more than into the given class yg, and inversely for Fg: Fp = {fi : qyp^fi > qyg^fi} and Fg = {fi : qyg^fi > qyp^fi}.

The amounts of information measuring the membership of x in yp and in yg are compared by a second degree of disagreement, D'2, normalized between [0,1] as D2 = (D'2 + 1)⁄2.
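Since the formula of D'2 is not reproduced on this page, the sketch below assumes a natural instantiation, comparing the total contribution of Fp against that of Fg with a (Qp − Qg)⁄(Qp + Qg) ratio; only the normalization D2 = (D'2 + 1)⁄2 comes from the text.

```python
def multiview_disagreement(q_p, q_g):
    """Second disagreement measure D2 (hedged sketch).

    q_p[f] (resp. q_g[f]) is the rate at which feature f supports classifying
    x into the predicted class yp (resp. the given class yg).  The form of
    D'2 used here, (Qp - Qg) / (Qp + Qg) over the partisan feature sets Fp
    and Fg, is an assumption; only D2 = (D'2 + 1)/2 is from the text.
    """
    Fp = [f for f in q_p if q_p[f] > q_g.get(f, 0.0)]   # features favouring yp
    Fg = [f for f in q_g if q_g[f] > q_p.get(f, 0.0)]   # features favouring yg
    Qp = sum(q_p[f] for f in Fp)
    Qg = sum(q_g[f] for f in Fg)
    d2_prime = (Qp - Qg) / (Qp + Qg) if (Qp + Qg) > 0 else 0.0
    return (d2_prime + 1.0) / 2.0                       # D2 in [0, 1]
```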