BigDataFr recommends: Scalable and Accurate Online Feature Selection for Big Data
Feature selection is important in many big data applications. There are at least two critical challenges. Firstly, in many applications, the dimensionality is extremely high, in millions, and keeps growing. Secondly, feature selection has to be highly scalable, preferably in an online manner such that each feature can be processed in a sequential scan.
In this paper, we develop SAOLA, a Scalable and Accurate OnLine Approach for feature selection. With a theoretical analysis on bounds of the pairwise correlations between features, SAOLA employs novel online pairwise comparison techniques to address the two challenges and maintain a parsimonious model over time in an online manner.
Furthermore, to tackle the dimensionality that arrives by groups, we extend our SAOLA algorithm, and then propose a novel group-SAOLA algorithm for online group feature selection. […]
Read article
By Kui Yu, Xindong Wu, Wei Ding, Jian Pei
Source: arxiv.org