[HAL] BigDataFr recommends: Random forests and big data #datascientist

Abstract

Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involves massive data but it also often includes data streams and data heterogeneity. Recently some statistical methods have been adapted to process Big Data, like linear regression models, clustering methods and bootstrapping schemes. […]

This paper reviews available proposals about random forests in parallel environments as well as about online random forests. Then, we formulate various remarks and sketch some alternative directions for random forests in the Big Data context. […]

Read paper
By Robin Genuer 1,2, Jean-Michel Poggi 3, Christine Tuleau-Malot 4, Nathalie Villa-Vialeneix 5

1 SISTM – Statistics In System biology and Translational Medicine
INRIA Bordeaux – Sud-Ouest, Epidémiologie et Biostatistique
2 ISPED – Institut de Santé Publique, d’Epidémiologie et de Développement
3 LM-Orsay – Laboratoire de Mathématiques d’Orsay
4 JAD – Laboratoire Jean Alexandre Dieudonné
5 MIAT INRA – Unité de Mathématiques et Informatique Appliquées de Toulouse

Source: hal.archives-ouvertes.fr
