Friday, August 19, 2016

Some background for understanding testing the models


I received a comment from a reader last week (I have readers?  Wow.  I did very little "advertising").  Anyway, this person was a bit confused about modeling averages.  I can understand that - we all learn how to compute averages in junior high or so, and unless we took an advanced math class, we never were exposed to why the algorithm works or whether there are alternatives to the way we learned (add 'em all up and divide by the number of things you added).

I figured a little background reading might help.  Nothing too deep - I want to keep this blog away from research math and more accessible to everyone.

Averages are taken for granted nowadays but that has certainly not been the case always.  In fact, in some ways, they were controversial when first introduced.  And even "when they were first introduced" is a tough question to answer.  http://www.york.ac.uk/depts/maths/histstat/eisenhart.pdf is a good starting point for digging into that aspect of averages.

The controversial part is pretty easy to understand and we even joke about it a bit today.  "How many children on average does a family have?" is the standard question which leads to answers like 2.5.  Obviously, there are no "half kids" running around anywhere, and we tend to laugh off these silly results.  Coincidentally, the US believes the ideal number of kids is 2.9.  The controversy came in initially - what value is an average if there is no actual, real world instance of a result having this value?  In other words, what use would it to be to know that the average family has 2.5 children, yet no families have 2.5 children? 

The controversy here was directed at the people that computed averages.  They came up with a number - 2.5 in this example - that is impossible to have in the real world.  And if you try to tell me your algorithm is a good algorithm yet it gives me impossible results, then I have to doubt it is "good."

(We will come back to the standard phrase "All models are wrong.  Some are useful." later).

Floating point math is a more difficult concept to cover. 
I don't want to get into the details of why this is happening in this blog since there is a huge amount of writing on this already.  If you want details I found a couple of good starting points.
A reasonable introduction to this is on wikipedia:
A more hands on, technical overview:


Questions, comments, concerns and criticisms always welcome,
John

No comments:

Post a Comment