The first thing I want to cover is what we test on the stats team. A good starting point: we validate the behavior of the statistical models we implement.

For non-statisticians, the first question I get is "What is a statistical model?" My definition to get started is pretty simple: a statistical model is a set of equations, based on existing data, that helps make predictions about new data coming in.

A very familiar example is a model that computes an average. We probably all saw this in school: you take 5 tests, add up all your scores and divide by 5, and that sets your grade for the course. But I added what may seem like a strange phrase to my definition: a model helps us make predictions about new data coming in. Let's walk through that viewpoint, since it is key to the definition.

In this case, suppose there is a 6th test we have to take, and suppose my grades had been 88, 91, 83, 88 and 85. They add up to 435. Dividing 435 by 5 gives an average of 87. Not only is this my grade, it is also a prediction of what score I would make on the 6th test.

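To make the arithmetic concrete, here is a minimal Python sketch of the averaging model, using the five scores above as the existing data. The function name `average` is a hypothetical name chosen for illustration, not code from our team. The same number serves both as the course grade and as the prediction for the 6th test.

```python
def average(scores):
    """Hypothetical averaging model: sum the scores and divide by how many there are."""
    return sum(scores) / len(scores)

# Existing data: the five test scores from the example.
scores = [88, 91, 83, 88, 85]

total = sum(scores)        # 88 + 91 + 83 + 88 + 85 = 435
grade = average(scores)    # 435 / 5 = 87.0

# The model's prediction for the 6th test is simply the average of the first five.
prediction_for_test_6 = grade
print(total, grade, prediction_for_test_6)  # prints: 435 87.0 87.0
```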
Now, it obviously doesn't mean I would make exactly 87. If that were the case, I wouldn't even need to take the test. But if I made an 86, a 92 or even an 81, I would not be surprised. It gets interesting to investigate what it would mean if I made a 100 or a 53, and that will come up later.

So we have a statistical model at this point. It computes an average given a set of numbers as input. It's our job as the test team to validate that the average is computed correctly in every possible case. Seems simple, maybe tedious, but once we bring computers into the equation (pun intended), things start to get more complicated very, very quickly.

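As a rough illustration of what validating the average might look like, here is a small Python sketch of a few test cases against a hypothetical `average` function. The cases are illustrative, not a complete test suite. The last check shows one example of the kind of complication computers can introduce: floating-point numbers like 0.1 and 0.2 are not stored exactly, so a computed average may not equal the decimal value a person would write down.

```python
import math

def average(scores):
    """Hypothetical implementation under test: arithmetic mean of a list of numbers."""
    return sum(scores) / len(scores)

def test_average():
    # The worked example: 88, 91, 83, 88 and 85 should average to 87.
    assert average([88, 91, 83, 88, 85]) == 87

    # A single score is its own average.
    assert average([100]) == 100

    # Identical scores average to that same score.
    assert average([75, 75, 75]) == 75

    # An empty list has no average; in this sketch, 0 / 0 raises ZeroDivisionError.
    try:
        average([])
    except ZeroDivisionError:
        pass
    else:
        raise AssertionError("expected an error for an empty list")

    # Floating point: the computed mean of 0.1 and 0.2 is not bit-for-bit
    # equal to 0.15, so the comparison needs a tolerance.
    assert average([0.1, 0.2]) != 0.15
    assert math.isclose(average([0.1, 0.2]), 0.15)

test_average()
print("all average checks passed")
```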
I'll cover this simple model next.

Questions, comments, concerns and criticisms always welcome,
John