BigDataFr recommends: FP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data
Abstract
‘Big data parallel frameworks, such as MapReduce or Spark, have been praised for their high scalability and performance, but show poor performance in the case of data skew. There are important cases where a high percentage of processing in the reduce side ends up being done by only one node. In this demonstration, we illustrate the use of FP-Hadoop, a system that efficiently deals with data skew in MapReduce jobs. In FP-Hadoop, there is a new phase, called intermediate reduce (IR), in which blocks of intermediate values, constructed dynamically, are processed by intermediate reduce workers in parallel, by using a scheduling strategy. […]’
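To make the idea of an intermediate reduce phase more concrete, here is a minimal Python sketch. It is not FP-Hadoop's code or API: the function names (`split_into_blocks`, `intermediate_reduce`, `final_reduce`, `skew_aware_reduce`), the fixed block size, and the thread pool standing in for IR workers are all assumptions made for illustration, and it assumes a sum-like reduce function that can safely be applied to partial blocks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(values, block_size):
    """Cut the value list of one key into fixed-size blocks (hypothetical policy)."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def intermediate_reduce(block):
    """Partial reduce over one block; here a simple sum."""
    return sum(block)


def final_reduce(partials):
    """Combine the partial results of all blocks belonging to one key."""
    return sum(partials)


def skew_aware_reduce(pairs, block_size=4, workers=4):
    # Group intermediate (key, value) pairs by key, as a shuffle step would.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)

    results = {}
    # The thread pool is a stand-in for intermediate reduce workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for key, values in grouped.items():
            blocks = split_into_blocks(values, block_size)
            # Blocks of a single hot key are reduced in parallel,
            # instead of one worker handling every value for that key.
            partials = list(pool.map(intermediate_reduce, blocks))
            results[key] = final_reduce(partials)
    return results


# A skewed input: one key carries most of the values.
pairs = [("hot", 1)] * 10 + [("cold", 1)] * 2
counts = skew_aware_reduce(pairs)
```

In this toy run the ten values of the "hot" key are split into three blocks that can be reduced independently, which is the intuition behind spreading a skewed key's work over several workers. How FP-Hadoop actually builds blocks and schedules workers is described in the paper itself.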
Read paper
Miguel Liroz-Gistau, Reza Akbarinia, Patrick Valduriez. FP-Hadoop: Efficient Execution of
Parallel Jobs Over Skewed Data. VLDB’2015: 41st International Conference on Very Large
Databases, Aug 2015, Hawaii, United States. <lirmm-01162362>
Source: hal.archives-ouvertes.fr