AING
 
This classifier was designed by Mohamed Rafik Bouguélia during his thesis, between September 2011 and August 2014. It is an online learning method based on the "growing neural gas" algorithm (GNG). At each step, a single data point is read from a continuous data stream and used to incrementally update the model. The method maintains the model as a dynamically evolving graph topology of data representatives called neurons. It also performs a merging process which uses a distance-based probabilistic criterion to eventually merge neurons. In the following, we present the different evolutions of this approach.
The initial version: GNG
The main characteristic of GNG is its use of a fixed period for the creation of new nodes. Different variants of GNG have been proposed to relax this periodical evolution: IGNG, I2GNG and SOINN. The figure below illustrates the functioning of each of them.
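For reference, here is a minimal sketch of one GNG learning step, after Fritzke's original algorithm; the parameter names (lam, eps_b, eps_n, alpha) and the omission of edge ageing are choices of this sketch, not taken from the page above:

```python
import numpy as np

def gng_step(x, nodes, errors, edges, step,
             lam=100, eps_b=0.05, eps_n=0.006, alpha=0.5):
    """One learning step of standard GNG (edge ageing omitted for brevity)."""
    # 1. Find the two neurons nearest to the input x.
    d = np.linalg.norm(nodes - x, axis=1)
    s1, s2 = (int(i) for i in np.argsort(d)[:2])

    # 2. Accumulate error on the winner, move it and its neighbours towards x.
    errors[s1] += d[s1] ** 2
    nodes[s1] += eps_b * (x - nodes[s1])
    for n in edges.get(s1, set()):
        nodes[n] += eps_n * (x - nodes[n])

    # 3. Link the two winners.
    edges.setdefault(s1, set()).add(s2)
    edges.setdefault(s2, set()).add(s1)

    # 4. The periodical evolution discussed above: a neuron is created only
    #    every `lam` steps, halfway between the neuron q with the largest
    #    accumulated error and its highest-error neighbour f.
    if step % lam == 0:
        q = int(np.argmax(errors))
        if edges.get(q):
            f = max(edges[q], key=lambda n: errors[n])
            r = len(nodes)
            nodes = np.vstack([nodes, 0.5 * (nodes[q] + nodes[f])])
            edges[q].discard(f)
            edges[f].discard(q)
            edges[r] = {q, f}
            edges[q].add(r)
            edges[f].add(r)
            errors[q] *= alpha
            errors[f] *= alpha
            errors = np.append(errors, errors[q])
    return nodes, errors, edges
```

It is exactly this `step % lam == 0` test, insensitive to the actual data distribution, that IGNG, I2GNG, SOINN and AING try to relax.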
AING: Adaptive Incremental Neural Gas
The algorithm
  • Incremental learning from a data stream
  • Adaptive, parameter-free distance threshold (see the sketch after this list)
  • Merging process to preserve time and memory resources
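A minimal sketch of the adaptive threshold, assuming (as in the AING paper) that the threshold of a neuron y is the mean of its distances to the data points it represents and to its neighbouring neurons; the fallback for an isolated neuron is an assumption of this sketch:

```python
import numpy as np

def aing_threshold(y, assigned, neighbours, all_neurons):
    """Adaptive distance threshold T_y of a neuron y (sketch).

    Assumes the AING definition: T_y is the mean of the distances from y to
    the data points it already represents (`assigned`) and to its
    neighbouring neurons (`neighbours`).  The fallback for a brand-new
    isolated neuron (half the distance to the nearest other neuron) is an
    assumption of this sketch.
    """
    dists = [np.linalg.norm(y - x) for x in assigned]
    dists += [np.linalg.norm(y - n) for n in neighbours]
    if dists:
        return sum(dists) / len(dists)
    others = [np.linalg.norm(y - z) for z in all_neurons
              if not np.array_equal(z, y)]
    return min(others) / 2.0 if others else np.inf
```

A new point then gives rise to a new neuron only when it falls farther than this threshold from its nearest neurons, so no global threshold needs to be tuned by hand.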
The merging process
  • Uses a rule based on a probabilistic criterion, more convenient than a very rigid distance threshold. The probability is proportional to the distance between the two neurons, dist(y, ỹ), and to the number of data-points assigned to y (|Xy|); see the sketch after this list
  • Not affected by initialization parameters, except up_bound: if the number of neurons exceeds this upper bound (up_bound), the merging process is triggered; up_bound depends directly on system requirements (e.g. a memory budget)
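As a rough sketch of this criterion: the proportionality to dist(y, ỹ) and |Xy| is stated above, but the normalising constant `scale` and the cap at 1 below are assumptions, not the thesis' exact formula.

```python
import random

def accept_merge(dist_y_ytilde, n_points_y, scale):
    """Probabilistic merge criterion (illustrative sketch).

    The page above states the probability is proportional to dist(y, y~)
    and to |Xy|; the min(1, .)/scale form here is an assumption.
    """
    p = min(1.0, (dist_y_ytilde * n_points_y) / scale)
    return random.random() < p

# The process is triggered only when the graph outgrows the resource budget,
# e.g.:  if len(neurons) > up_bound: run a merging pass using accept_merge().
```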

Results
A2ING: semi-supervised active learning
The principle

In usual supervised learning methods, data labelling is performed beforehand on a static dataset. This is not possible in the case of data streams, where data arrives continuously. Furthermore, manual labelling is costly and time-consuming. The objective of active learning is to adapt labelling to the continuous data stream, querying a human labeller only for the class labels of the data points that are informative.

The main actions
  • Semi-supervised active extension of AING
  • Can learn from both labelled and unlabelled patterns by actively querying the labels of the most informative patterns according to an uncertainty measure
  • Decides by itself, at each new pattern arrival, whether or not its class label should be queried from a human annotator
  • Uses a small number of labelled patterns to initialise the model with some neurons

How to label a new class?

For each new pattern x, we use K-NN to derive the probabilities that x belongs to its two most probable classes. Let KNN(x) = {(y1, cy1), ..., (yK, cyK)} be the K nearest neurons selected from the graph G, together with their class labels, and let P(c|x) be the probability that the pattern x belongs to the class c. Let c1 and c2 be the two most probable classes of x, and let Δ(c1, c2|x) = P(c1|x) − P(c2|x) be the gap between their probabilities.

If, for a pattern x, Δ(c1, c2|x) is close to 0, the probabilities that x belongs to the two classes are similar and x is informative, because knowing the label of x can help to discriminate between the two classes.
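As a concrete sketch of this query rule, assuming (beyond the text above) a distance-weighted K-NN vote for P(c|x) and a fixed ambiguity threshold:

```python
import numpy as np
from collections import defaultdict

def query_decision(x, neurons, labels, K=5, threshold=0.1):
    """Decide whether to query a human for the label of pattern x (sketch).

    neurons : (n, p) array of neuron positions; labels : their class labels.
    P(c|x) is estimated by a distance-weighted vote of the K nearest
    neurons; the 1/(1+d) weighting and `threshold` are assumptions of this
    sketch, not the thesis' exact formulas.
    """
    d = np.linalg.norm(neurons - x, axis=1)
    knn = np.argsort(d)[:K]
    votes = defaultdict(float)
    for i in knn:
        votes[labels[i]] += 1.0 / (1.0 + d[i])
    total = sum(votes.values())
    probs = sorted((v / total for v in votes.values()), reverse=True)
    p1 = probs[0]
    p2 = probs[1] if len(probs) > 1 else 0.0
    delta = p1 - p2                  # Δ(c1, c2 | x)
    return delta < threshold         # query the human only when x is ambiguous
```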


Datasets
  • DocSet dataset: 519 documents for learning, 260 documents for testing, p = 413, 13 classes.
  • CAF dataset: 772 documents for learning, 386 documents for testing, p = 271, 141 classes.
  • LIRMM dataset: 1301 documents for learning, 650 documents for testing, p = 277, 24 classes.
  • MMA dataset: 1728 documents for learning, 863 documents for testing, p = 292, 25 classes.
  • APP dataset: 13161 documents for learning, 6581 documents for testing, p = 328, 139 classes.

Experiments
Active learning in the presence of label noise

Mislabelling is a serious issue for active learning methods because it degrades classification accuracy. M. Rafik Bouguelia proposes a mislabelling likelihood measure to characterize potentially mislabelled instances. Then, he derives a measure of informativeness that expresses how much the label of an instance needs to be corrected by an expert labeller.


Uncertainty measure

The instances selected for manual labelling are typically those for which the model h is uncertain about the class. Given a sample x, knowing the true class label of x would improve h and reduce its overall uncertainty; x is then said to be informative.

An uncertainty measure that selects instances with a low prediction probability can be defined as 1 − maxy∈Y Ph(y|x), where Ph(y|x) is the probability that x belongs to class y according to the current model h. The selection algorithm is given below.
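The algorithm itself was shown as a figure; the loop below is a minimal sketch of stream-based uncertainty sampling under stated assumptions: h is any incremental probabilistic classifier with hypothetical predict_proba and learn methods, oracle stands for the human labeller, and δ is assumed to act as the uncertainty threshold.

```python
def active_learning_loop(stream, h, oracle, delta=0.1):
    """Stream-based uncertainty sampling (sketch of the selection algorithm).

    h exposes hypothetical methods predict_proba(x) -> {class: Ph(y|x)}
    and learn(x, y); `oracle` is the human labeller.  The 1 - max Ph(y|x)
    measure follows the text above; the role of `delta` is an assumption.
    """
    for x in stream:
        probs = h.predict_proba(x)
        uncertainty = 1.0 - max(probs.values())  # low top probability => informative
        if uncertainty > delta:
            y = oracle(x)        # query the human labeller
            h.learn(x, y)        # supervised update
        else:
            h.learn(x, None)     # unsupervised / semi-supervised update
    return h
```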

Impact of label noise

M. R. Bouguelia measures below the different impacts of δ on the accuracy (A), the number of labelled instances (B), and (C).


Characterizing mislabelled instances

We define a disagreement coefficient that reflects the mislabelling likelihood of instances.

Let x be a data point whose label is queried. The class label given by the labeller is denoted yg. Let yp = argmaxy∈Y Ph(y|x) be the class label of x predicted by the classifier.


First measure of disagreement: likelihood disagreement

If yg ≠ yp, a mislabelling error may have occurred.

Let pp = Ph(yp|x) and pg = Ph(yg|x). We estimate a degree of disagreement as D1(x) = 1 − pg⁄pp.

The higher the value of D1, the more likely it is that x has been incorrectly labelled with yg, because the probability that x belongs to yg is small relative to that of yp.
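In code, D1 follows directly from the classifier's output probabilities; the dictionary interface below is an assumption of the sketch, not the thesis' API:

```python
def likelihood_disagreement(probs, y_given):
    """D1(x) = 1 - P(yg|x) / P(yp|x) for a queried point x (sketch).

    probs : dict mapping each class y to Ph(y|x); y_given : the label yg
    supplied by the human labeller.  D1 is 0 when the labeller agrees with
    the classifier's prediction and approaches 1 when P(yg|x) is negligible
    next to P(yp|x).
    """
    y_pred = max(probs, key=probs.get)          # yp = argmax_y Ph(y|x)
    pp = probs[y_pred]
    pg = probs.get(y_given, 0.0)
    return 1.0 - pg / pp if pp > 0 else 0.0
```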


Second measure of disagreement: multiview disagreement

Many problems in machine learning involve datasets with multiple views, e.g. an object can be categorized from either its color or its shape. When the sample seen in each view does not lead to the same class, due to data corruption or noise, this causes a view disagreement.

Consider the features describing the object in each view, say f1 for the first view and f2 for the second. Let qyp^fi (resp. qyg^fi) be the rate that expresses how much the feature fi is likely to contribute to classifying x into yp (resp. into yg).

Let Fp be the set of features that contribute to classifying x into the predicted class yp more than into the given class yg, and inversely for Fg: Fp = {fi : qyp^fi > qyg^fi} and Fg = {fi : qyg^fi > qyp^fi}.

The amounts of information measuring the membership of x in yp and in yg are compared by a second degree of disagreement, D'2, normalized between [0,1] as D2 = (D'2 + 1)⁄2.
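Since the formula of D'2 is not reproduced on this page, the sketch below assumes a natural instantiation, comparing the total contribution of Fp against that of Fg with a (Qp − Qg)⁄(Qp + Qg) ratio; only the normalization D2 = (D'2 + 1)⁄2 comes from the text.

```python
def multiview_disagreement(q_p, q_g):
    """Second disagreement measure D2 (hedged sketch).

    q_p[f] (resp. q_g[f]) is the rate at which feature f supports classifying
    x into the predicted class yp (resp. the given class yg).  The form of
    D'2 used here, (Qp - Qg) / (Qp + Qg) over the partisan feature sets Fp
    and Fg, is an assumption; only D2 = (D'2 + 1)/2 is from the text.
    """
    Fp = [f for f in q_p if q_p[f] > q_g.get(f, 0.0)]   # features favouring yp
    Fg = [f for f in q_g if q_g[f] > q_p.get(f, 0.0)]   # features favouring yg
    Qp = sum(q_p[f] for f in Fp)
    Qg = sum(q_g[f] for f in Fg)
    d2_prime = (Qp - Qg) / (Qp + Qg) if (Qp + Qg) > 0 else 0.0
    return (d2_prime + 1.0) / 2.0                       # D2 in [0, 1]
```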