Monday, August 8, 2016

Setting the stage


The first thing I want to cover is what we test on the stats team.  A good starting point is that we validate the behavior of statistical models we implement.

For non-statisticians, the first question I get is "What is a statistical model?"  My definition to get started is pretty simple: a statistical model is a set of equations, based on existing data, that helps make predictions about new data coming in.

A very familiar example is a model to compute an average.  We probably all saw this in school.  You take 5 tests, add up all your scores and divide by 5.  That sets your grade for the course.  But my definition includes what may seem like a strange phrase: a model helps us make predictions about new data coming in.  Let's walk through that viewpoint since it is key to the definition.

In this case, suppose there is a 6th test we have to take.  And suppose my grades had been 88, 91, 83, 88 and 85.  They add up to 435.  I divide 435 by 5 and get an average of 87.  Not only is this my grade, it also is a prediction of what score I would make on the 6th test. 
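For anyone who would rather see that arithmetic as code, here is a minimal sketch in Python.  The function name average and the variable names are my own choices for illustration, not anything from our actual test code.

```python
# Scores from the five tests in the example above.
scores = [88, 91, 83, 88, 85]


def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


total = sum(scores)          # 88 + 91 + 83 + 88 + 85 = 435
predicted = average(scores)  # 435 / 5 = 87.0

print(total, predicted)      # prints: 435 87.0
```

The value in predicted is both the course grade and the model's best guess for the 6th test.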

Now, it obviously doesn't mean I would make an 87 exactly.  If that were the case, I wouldn't even need to take the test.  But if I made an 86, or a 92, or even an 81, I would not be surprised.  It gets interesting to investigate what happens if I make a 100 or a 53, and that will come up later.

So we have a statistical model at this point.  It computes an average given a set of numbers as an input.  It's our job as the test team to validate that the average is computed correctly in every possible case.  That seems simple, maybe tedious, but once we bring computers into the equation (pun intended) things start to get more complicated very, very quickly.
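To give a rough feel for what "validate" could mean here, below is a small sketch of a few checks.  This is only an illustration under my own assumptions: the average function, the check_average helper and the specific cases are hypothetical, and they are nowhere near "every possible case."

```python
import math


def average(values):
    """Return the arithmetic mean; reject empty input (an assumed design choice)."""
    if not values:
        raise ValueError("average of an empty list is undefined")
    return sum(values) / len(values)


def check_average():
    """Run a handful of illustrative checks; return True if all pass."""
    # Known-answer check using the grades from this post.
    assert average([88, 91, 83, 88, 85]) == 87

    # A single value should be its own average.
    assert average([42]) == 42

    # Decimal inputs: compare with a tolerance, since computers store
    # numbers like 0.1 only approximately.
    assert math.isclose(average([0.1, 0.2, 0.3]), 0.2)

    # Empty input should be rejected rather than silently dividing by zero.
    try:
        average([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")

    return True
```

Even in this tiny sketch, the decimal case needs a tolerance instead of an exact comparison, which is a first hint of how computers complicate things.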

I'll cover this simple model next.

Questions, comments, concerns and criticisms always welcome,
John
