Extraordinary Squares: So what can we do about this?

Here is the challenge the test team has. Given that even a simple algorithm can have unexpected results, and given that a computer cannot, by the nature of floating-point arithmetic, always give exact results, how can we validate the output of a given model?

The answer is both simple and difficult.

On the simple side, we do exactly what you would expect. Knowing that an expected output value should be (exactly) 50, we can set up an error factor of some small amount and validate that the output is within that range of the expected value.

In other words, we can set the error value to 0.00001, for instance. Then if the output of the algorithm is within 0.00001 of 50, we can log a PASS result for the test. If it falls outside that range (50 +/- 0.00001), we log a failure and investigate what happened.
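To make that concrete, here is a minimal sketch of such a check in Python. The function names (within_tolerance, run_test) are made up for illustration and are not our actual test harness; the 0.00001 error value and the expected value of 50 come from the example above.

```python
def within_tolerance(actual, expected, error=0.00001):
    """Return True when actual is within +/- error of expected."""
    return abs(actual - expected) <= error


def run_test(actual, expected=50.0, error=0.00001):
    """Hypothetical logging helper: report PASS or FAIL as a string."""
    return "PASS" if within_tolerance(actual, expected, error) else "FAIL"


# Average of the integers 1 through 99.
average = sum(range(1, 100)) / 99

print(run_test(average))    # the computed average
print(run_test(50.000001))  # off by about one millionth, inside the range
print(run_test(50.0001))    # off by about one ten-thousandth, outside the range
```

The comparison uses the absolute difference, so it does not matter whether the output lands above or below 50.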

That's the simple part. In fact, having a small error factor is pretty standard practice. If you want to read a bit more about this, just look up how computers determine the square root of a number. This technique is usually taught during the first week or so of a computer science course. (And then it is hardly mentioned again, since it is such a rabbit hole. Then it becomes a real consideration in jobs like this one.)
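For the curious, here is a rough sketch of the classic textbook approach (Newton's method, sometimes called the Babylonian method). This is an assumption about which technique a course would show, not a description of what any particular processor or library actually does. The name newton_sqrt and the iteration cap are made up for this example. The point is that the loop stops when the answer is "close enough," using the same kind of error factor.

```python
def newton_sqrt(n, error=0.00001, max_iterations=100):
    """Approximate the square root of n by repeatedly refining a guess.

    Stops once guess * guess is within +/- error of n, or after
    max_iterations refinements (a safety cap, since very large inputs
    may never get that close in floating point).
    """
    if n < 0:
        raise ValueError("cannot take the square root of a negative number")
    if n == 0:
        return 0.0
    guess = n if n >= 1 else 1.0
    for _ in range(max_iterations):
        if abs(guess * guess - n) <= error:
            break
        # Average the guess with n / guess to move closer to the root.
        guess = (guess + n / guess) / 2
    return guess


print(newton_sqrt(2500))
print(newton_sqrt(2))
```

The result is never claimed to be exact, only within the chosen error of the true answer.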

The hard part is knowing how to set a reasonable range. Obviously, a very large range will allow bogus results to be treated as passing results. If we are trying to compute the average of 1-99 and allow a "correct" answer to be +/- 5 from 50 (45 to 55), the test will pass even when the answer is wrong. But everyone will notice that 54 or 47 or whatever else is not correct.

And if we make it too small - like .0000000000000001 (that is a 1 at the 16th decimal place) - then the test will likely fail as we change the range of numbers being computed, simply because of expected rounding errors.
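Both failure modes are easy to see in a small sketch. The check helper is hypothetical, and adding 0.1 ten times is just a stand-in I picked because its exact answer (1) is obvious while ordinary double-precision addition lands very slightly off.

```python
def check(actual, expected, error):
    """Return PASS when actual is within +/- error of expected."""
    return "PASS" if abs(actual - expected) <= error else "FAIL"


# Too loose: a clearly wrong average of 1-99 is still logged as passing.
print(check(54.0, 50.0, error=5.0))

# Add 0.1 ten times with ordinary floating-point addition.
total = 0.0
for _ in range(10):
    total += 0.1

# Too tight: a 1 at the 16th decimal place rejects normal rounding error.
print(check(total, 1.0, error=1e-16))

# A more moderate error value accepts the same result.
print(check(total, 1.0, error=1e-5))
```

Nothing is wrong with the arithmetic in the second case; the range is simply smaller than the rounding error the hardware is expected to produce.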

This is a large challenge for us and I'll outline what we do to handle this next.

Questions, comments, concerns and criticisms always welcome,

John

Thursday, August 25, 2016
