Common Probability Distributions: The Data Scientist’s Crib Sheet

BigDataFr recommends: Common Probability Distributions: The Data Scientist’s Crib Sheet

Data scientists have hundreds of probability distributions from which to choose. Where to start?

[…] Data science, whatever it may be, remains a big deal. “A data scientist is better at statistics than any software engineer,” you may overhear a pundit say, at your local tech get-togethers and hackathons. The applied mathematicians have their revenge, because statistics hasn’t been this talked-about since the roaring 20s. They have their own legitimizing Venn diagram of which people don’t make fun. Suddenly it’s you, the engineer, left out of the chat about confidence intervals instead of tutting at the analysts who have never heard of the Apache Bikeshed project for distributed comment formatting. To fit in, to be the life and soul of that party again, you need a crash course in stats. Not enough to get it right, but enough to sound like you could, by making basic observations.

Probability distributions are fundamental to statistics, just like data structures are to computer science. They’re the place to start studying if you mean to talk like a data scientist. You can sometimes get away with simple analysis using R or scikit-learn without quite understanding distributions, just like you can manage a Java program without understanding hash functions. But it would soon end in tears, bugs, bogus results, or worse: sighs and eye-rolling from stats majors.

There are hundreds of probability distributions, some with names that sound like monsters from medieval legend, such as the Muth or Lomax. Only about 15 turn up consistently in practice, though. What are they, and what clever insights about each of them should you memorize? […]

By Sean Owen
Source: blog.cloudera.com

