Here is the challenge the test team has. Given that even a simple algorithm
can have unexpected results, and given that a computer cannot - by its nature -
always give exact results, how can we validate the output of a given model?
The answer is both
simple and difficult.
On the simple side, we do exactly what you would expect. Knowing that an
expected output value should be (exactly) 50, we can set up an error factor of
some small amount and validate that the value of the output is within that
range of the expected output.
In other words, we can set an error value equal to .00001, for instance. Then
if the output of the algorithm is within .00001 of 50, we can log a PASS result
for the test. If it falls outside that range (50 +/- .00001), then we log a
failure and investigate what happened.
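The check described above can be sketched in a few lines of Python. This is only a minimal illustration, not our actual test code: the names within_tolerance and check are made up for this example, and the default tolerance is simply the .00001 value used above.

```python
def within_tolerance(actual, expected, tolerance=0.00001):
    """Return True when actual is within +/- tolerance of expected."""
    return abs(actual - expected) <= tolerance


def check(actual, expected, tolerance=0.00001):
    """Return "PASS" or "FAIL" for a single output value (hypothetical helper)."""
    return "PASS" if within_tolerance(actual, expected, tolerance) else "FAIL"


if __name__ == "__main__":
    # 50.000001 is inside 50 +/- .00001; 50.1 is well outside it.
    print(check(50.000001, 50))
    print(check(50.1, 50))
```

Python's standard library also offers math.isclose, which supports both an absolute and a relative tolerance, for cases where a hand-rolled comparison is not needed.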
That's the simple part. In fact, having a small error factor is pretty standard
practice. If you want to read a bit more about this, just look up how computers
determine the square root of a number. This technique is usually taught during
the first week or so of a computer science course. (And then it is hardly
mentioned again, since it is such a rabbit hole. Then it becomes a real
consideration in jobs like this one.)
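For readers who would rather see the square root example than look it up, here is a rough sketch using Newton's method, one common textbook approach (not necessarily what any given processor or library does). The function name approximate_sqrt, the default tolerance, and the iteration cap are all assumptions made for this illustration. Note that the loop stops when the guess is "close enough" rather than exact.

```python
def approximate_sqrt(n, tolerance=0.00001, max_iterations=100):
    """Approximate sqrt(n) with Newton's method, stopping at an error factor."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0.0
    guess = n if n >= 1 else 1.0
    for _ in range(max_iterations):
        # Stop as soon as guess squared is within the tolerance of n.
        if abs(guess * guess - n) <= tolerance:
            break
        # Newton's update: average the guess with n / guess.
        guess = (guess + n / guess) / 2.0
    return guess


if __name__ == "__main__":
    print(approximate_sqrt(2))
    print(approximate_sqrt(25))
```

The iteration cap is there only as a safety net, since for very large inputs the chosen tolerance may be finer than floating point can resolve.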
The hard part is knowing how to set a reasonable range. Obviously, a very large
range will allow bogus results to be treated as passing results. If we are
trying to compute the average of 1-99 and allow a "correct" answer to be +/- 5
from 50 (45 to 55), the test will nearly always pass. But everyone will notice
that 54 or 47 or whatever else is not correct.
And if we make it too small - like .0000000000000001 (that is a 1 at the 16th
decimal place) - then the test will likely fail due to expected rounding errors
as we change the range of numbers being averaged.
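To make both failure modes concrete, here is a small Python sketch. It assumes ordinary 64-bit floating point, where the gap between 50 and the next representable value is 2**-47 (about 7.1e-15). The helper within_tolerance and the variable names are invented for this illustration, and the "one step off" value is constructed by hand rather than taken from a real model run.

```python
import sys


def within_tolerance(actual, expected, tolerance):
    """Return True when actual is within +/- tolerance of expected."""
    return abs(actual - expected) <= tolerance


EXPECTED = 50.0

# Spacing between adjacent 64-bit floats in [32, 64): 2**-52 * 32 == 2**-47.
SPACING_NEAR_50 = sys.float_info.epsilon * 32

# A result that is off by the smallest amount the hardware can represent.
one_step_off = EXPECTED + SPACING_NEAR_50

# Too loose: an obviously wrong answer of 54 still passes.
too_loose = within_tolerance(54.0, EXPECTED, 5.0)

# Too tight: a single rounding step near 50 already exceeds 1e-16.
too_tight = within_tolerance(one_step_off, EXPECTED, 1e-16)

# A middle ground such as .00001 absorbs the rounding step.
reasonable = within_tolerance(one_step_off, EXPECTED, 1e-5)

if __name__ == "__main__":
    print(too_loose, too_tight, reasonable)
```

In other words, near 50 a tolerance of 1e-16 is effectively a demand for exact equality, which is why it tends to fail as soon as any rounding creeps in.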
This is a large
challenge for us and I'll outline what we do to handle this next.
Questions, comments, concerns and criticisms always welcome,