John Rauser keynote: "Statistics Without the Agonizing Pain" -- Strata + Hadoop 2014

523 ratings | 28565 views
From the 2014 Strata Conference + Hadoop World in New York City.

There are two essential skills for the data scientist: engineering and statistics. A great many data scientists are very strong engineers but feel like impostors when it comes to statistics. In this talk John will argue that the ability to program a computer gives you special access to the deepest and most fundamental ideas in statistics. John’s goal is to convince the non-statistician engineers in the audience that the road to statistical fluency is much, much shorter than they think.

About John Rauser: John has been extracting value from large datasets for over 20 years at hedge funds, small data-driven startups, Amazon, and now Pinterest. He has deep experience in machine learning, data visualization, on-line experimentation, website performance and real-time fault analysis. An empiricist at heart, “Just do the experiment!” is his favorite call to arms.

Watch more from Strata + Hadoop World 2014: http://goo.gl/UUfrR7
Find out more about the conference: http://strataconf.com/stratany2014
Don't miss an upload! Subscribe! http://goo.gl/szEauh
Stay Connected to O'Reilly Media by Email - http://goo.gl/YZSWbO
Follow O'Reilly Media: http://plus.google.com/+oreillymedia https://www.facebook.com/OReilly https://twitter.com/OReillyMedia
Text Comments (18)
Chris Stehlik (1 month ago)
Instead of consulting wikipedia, you could just spend an hour learning about t-tests at Khan Academy.
Troy Bradley (7 months ago)
How do you get the mosquitoes to drink the beer?
Kely R (2 years ago)
Hi all. Can anyone tell me the name of the electronics tinkering toy the girl has at 11:17?
Pradeep Samuel (2 years ago)
Robert Lucente (4 years ago)
I just watched your "Statistics Without the Agonizing Pain." Simply awesome. You are probably aware of them, but just in case: the books Think Stats and Think Bayes by Allen B. Downey use Python programming to get stats ideas across. They are available for free at GreenTeaPress.Com.
Daniel Torrido Verde (4 years ago)
There is something that is not clearly said here. You don't need to remember the formula for the t-test; that is a use for a computer. What is really important is the concept of the density function and the distribution function. In ten minutes or so one can explain those concepts, and they will be applicable to many types of tests. Understanding the gist of what a statistical test is and how to interpret the outcome is not difficult, and you don't have to remember difficult formulas. That is, what you see here is a straw man argument. If you only want to know the essence of a statistical test, you can grasp it easily, and one way to understand that is that formulas are like simulations. You should ask any statistician whether you really want to use the formula that he shows here; the answer is no way, you just use your computer. But when you have the knowledge of the fundamental concepts of density function, distribution function and sample mean, everything is clearer. Anyway, I understand that simulation is a good way of understanding problems and an easy way of testing difficult problems, but the essential concepts involved in a test for the mean are in no way difficult.
Robert Mckeown (4 years ago)
I love this talk - spot on. Indeed, a solid understanding of the real stats behind things is highly desirable, but in terms of getting a better sense of the problem under consideration, this is a great approach for those with decent programming skills.
Robert Mckeown (4 years ago)
Here is a simple example of me following the approach recently - http://bit.ly/1wCcAnJ
Dave C (4 years ago)
I would add that permutation testing (resampling without replacement) is more suited to smaller samples, and bootstrapping (resampling with replacement) is suitable for larger samples. Ronald Fisher used permutation testing logic to support his argument for the t-test. He urged people to use the t-test because there were no devices available back then that could create the resamples needed to run a test. (Collingridge, D.S. (2012). A primer on quantitized data analysis and permutation testing. Journal of Mixed Methods Research)
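The two resampling schemes named in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two small samples are made up (they are not the data from the talk), and the helper names `permutation_p_value` and `bootstrap_ci` are hypothetical, chosen for this sketch.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(a, b, n_resamples=5000, seed=0):
    """One-sided p-value for mean(a) - mean(b), found by shuffling group labels."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # resampling WITHOUT replacement
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff >= observed:
            hits += 1
    return hits / n_resamples


def bootstrap_ci(a, b, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        resampled_a = rng.choices(a, k=len(a))  # resampling WITH replacement
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(mean(resampled_a) - mean(resampled_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up illustration data, not taken from the talk.
treated = [27, 20, 21, 26, 27, 31, 24, 21, 20, 19]
control = [21, 22, 15, 12, 21, 16, 19, 15, 22, 24]

print("permutation p-value:", permutation_p_value(treated, control))
print("bootstrap interval:", bootstrap_ci(treated, control))
```

The permutation version answers "how often does label shuffling alone produce a difference this large?", while the bootstrap version estimates how much the difference itself would vary from sample to sample.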
yoplaitless (4 years ago)
Good presentation, but it also shows that simulation without statistical understanding may be dangerous. Contrary to what is suggested in the video, the null hypotheses of the t-test and a random permutation test are not equivalent. In the former test, the null hypothesis is "the average treatment effect is zero". In the latter case, the null hypothesis is "the effect is zero for every individual".
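One way to see the distinction drawn in the comment above is a small simulation. It is a sketch under assumptions of my own, not anything shown in the video: the treatment leaves the average unchanged but widens the spread, so the average effect is zero while individual effects are not. The group sizes, spreads, and function names are all hypothetical. In a setup like this, a permutation test on the difference in means should reject noticeably more often than the nominal 5%, because the hypothesis it tests (no effect for anyone) is false.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_rejects(a, b, rng, n_resamples=200, alpha=0.05):
    """Two-sided permutation test on the difference in means."""
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_resamples <= alpha


def rejection_rate(n_trials=200, seed=1):
    """Fraction of simulated experiments in which the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Assumed setup: both groups have true mean 0 (average effect is zero),
        # but the small treated group has a much larger spread than the controls.
        treated = [rng.gauss(0, 5) for _ in range(10)]
        control = [rng.gauss(0, 1) for _ in range(40)]
        if permutation_rejects(treated, control, rng):
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    print("rejection rate at nominal alpha = 0.05:", rejection_rate())
```

The rejections here are not false alarms about the individual-level hypothesis; they only look like errors if the result is read as a statement about the average effect.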
John Rauser (4 years ago)
+Bill Venables To clarify, I think mathematical statistics is beautiful and useful, it's just a terrible way to introduce statistical thinking (given modern computational options).  I had only 10 minutes, and so the talk had to stay totally on the rails, otherwise I'd have expanded on the origins of the analytical approach, why it was invented, and why it is still useful.  For more on this topic see George Cobb's lovely paper, The Introductory Statistics Course: A Ptolemaic Curriculum: https://escholarship.org/uc/item/6hb3k0nz.
Moi (2 years ago)
Cobb's paper is excellent. It goes well with your keynote. It should be noted, though, that many of the problems we have in statistics education are not unique to statistics, and can be found in many disciplines, courses and textbooks across academia. I believe this will change with time, but as we all know, change happens very slowly in academia.
Bill Venables (4 years ago)
You got their attention and raised the profile of statistics with an audience that probably needs to know much more about it than they currently do. That's an achievement I applaud. Many, if not most, modern approaches to Statistics would use your route, actually. You do see my point, though, I hope. You can't really let it stop there. You need to understand what is going on as well as to appreciate it from a computational demonstration. And let's face it, it really isn't all that tough to do so once you know where you are going. My point about Bayesian approaches is that even to get to first base you need the concept of likelihood, which, unless you have a theoretical understanding, really remains inaccessible.
Dave Jacoby (4 years ago)
I've been saying for years that, when I was in CS, they changed the curriculum to make Statistics an elective, and I felt at the time I was dodging a bullet, but now I feel I shot myself in the foot. This video makes me think I'm more in the dodge category again.
Dave Jacoby (4 years ago)
I should try it.
Thomas Melville (4 years ago)
I had stats my senior year in undergrad. And it was awful. I cried- literally. But you are a coder. I think you would do well in it.
Dave Jacoby (4 years ago)
Yeah, and his point is that there's a bunch of complexity in traditional statistics that we can sidestep and work with in ways we as programmers can more easily understand. 
Mark Senn (4 years ago)
From your tagline at https://plus.google.com/+DaveJacoby/about: "If you can not measure it, you can not improve it." Statistics is a tool to, among other things, interpret what measurements really mean.
