[O’R] BigDataFr recommends: More tools for managing and reproducing complex data projects

« As data projects become complex and as data teams grow in size, individuals and organizations need tools to efficiently manage data projects. A while back, I wrote a post on common options, and I closed that piece by asking:

– Are there completely different ways of thinking about reproducibility, lineage, sharing, and collaboration in the data science and engineering context?

At the time, I listed categories that seemed to capture much of what I was seeing in practice: (proprietary) workbooks aimed at business analysts, sophisticated IDEs, notebooks (for mixing text, code, and graphics), and workflow tools. At a high level, these tools aspire to enable data teams to do the following:

– Reproduce their work — so they can rerun and/or audit when needed
– Collaborate
– Facilitate storytelling — because in many cases, it’s important to explain to others how results were derived
– Operationalize successful and well-tested pipelines — particularly when deploying to production is a long-term objective »

Ben Lorica, Chief Data Scientist & Director of Content Strategy for Data at O’Reilly Media, Inc.
Source: radaroreilly.com
