[HAL] BigDataFr recommends: Spark Scalability Analysis in a Scientific Workflow

Abstract

[…] Spark is being successfully used for big data parallel processing in many business domains (social media, finance, retail). Spark’s scalability, usability, and large user community have motivated developers from scientific domains (bioinformatics, oil and gas, astronomy) to try it. However, the profile of scientific applications, e.g., black-box programs and intense file writes, differs from that of traditional business workflows, which may affect Spark’s scalability. We present a scalability analysis of Spark in a real case study in the Oil and Gas domain. We explore workloads on a 936-core HPC cluster processing 330 GB of scientific data. […]

Read paper
By Renan Souza 1, Vitor Silva 1, Pedro Miranda 1, Alexandre Lima 1, Patrick Valduriez 2,3, Marta Mattoso 1
Source: hal-archives-ouvertes.fr

1 COPPE/UFRJ – Universidade Federal do Rio de Janeiro
2 ZENITH – Scientific Data Management – LIRMM – Laboratoire d’Informatique de Robotique et de Microélectronique de Montpellier, CRISAM – Inria Sophia Antipolis – Méditerranée
3 IBC – Institut de Biologie Computationnelle
