[O’R] BigDataFr recommends: More tools for managing and reproducing complex data projects

<p><strong>BigDataFr recommends: <a title="@radaroreilly.com - Ben Lorica - More tools for managing and reproducing complex data projects" href="http://radar.oreilly.com/2015/04/more-tools-for-managing-and-reproducing-complex-data-projects.html#more-76209" target="_blank">More tools for managing and reproducing complex data projects</a></strong></p>

« As data projects become complex and as data teams grow in size, individuals and organizations need tools to efficiently manage data projects. A while back, I wrote a post on common options, and I closed that piece by asking:

– Are there completely different ways of thinking about reproducibility, lineage, sharing, and collaboration in the data science and engineering context?

At the time, I listed categories that seemed to capture much of what I was seeing in practice: (proprietary) workbooks aimed at business analysts, sophisticated IDEs, notebooks (for mixing text, code, and graphics), and workflow tools. At a high level, these tools aspire to enable data teams to do the following:

– Reproduce their work — so they can rerun and/or audit when needed
– Collaborate
– Facilitate storytelling — because in many cases, it’s important to explain to others how results were derived
– Operationalize successful and well-tested pipelines — particularly when deploying to production is a long-term objective »

Ben Lorica, Chief Data Scientist & Director of Content Strategy for Data at O’Reilly Media, Inc.
Source: radar.oreilly.com
