BigDataFr recommends: Large Interactive Visualization of Density Functions on Big Data Infrastructure
Abstract
Recently, hybrid multi-site big data analytics (which combines on-premise and off-premise resources) has gained popularity as a way to process large amounts of data on demand, without additional capital investment to increase the size of a single datacenter. However, making the most of hybrid setups for big data analytics is challenging because on-premise resources can communicate with off-premise resources only at significantly lower throughput and higher latency. […]
[…] This paper contributes a work-in-progress study that aims to identify and explain this impact in relation to the known behavior on a single cloud. To this end, it analyses a representative big data workload on a hybrid Spark setup. Unlike previous experience that emphasized the low end-impact of network communications in Spark, we found significant overhead in the shuffle phase when the bandwidth between the on-premise and off-premise resources is sufficiently small. […]
Read paper
By Roxana-Ioana Roman1, Bogdan Nicolae2, Alexandru Costan3, Gabriel Antoniu3, David Auber1
Source: hal.archives-ouvertes.fr
1 University of Rennes 1
2 IBM Research – Ireland
3 KerData – Scalable Storage for Clouds and Beyond
Inria Rennes – Bretagne Atlantique, IRISA-D1 – SYSTÈMES LARGE ÉCHELLE