One of the tests we have to complete is validating our implementations for accuracy. As I have mentioned, this can be tricky simply because we are using computers and have to contend with rounding errors. Another reason this is hard is simply the nature of statistics.

Consider a meteorologist. Given a set of weather statistics - temperature, barometric pressure, wind speed, etc… - the meteorologist can state "There is a 70% chance of rain tomorrow." Tomorrow comes, and it rains. Was the forecast correct? Of course - there was a 70% chance of rain. Now suppose tomorrow arrives, and it does NOT rain. Again, was the forecast correct? Of course - there was a 30% chance that it would not rain. Such is the nature of statistics, and that also hits us for some of our testing.

Recently, we added a clustering algorithm to Tableau. The basic idea can be viewed as "Given a set of data points like this, divide them into equal groups." In this case, I tried to draw three obvious clusters, with about the same number of dots in each.

But what about this set? Same number of dots. Most people would probably want 2 groups, but that would probably look like this:

The group on the right would have more dots, but visually this seems to make sense. Using three groups would give this:

Now they all have the same number of dots, but the groups on the right are very close together.
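
To make that concrete, here is a minimal sketch of the kind of clustering being discussed. This is my own illustration, not Tableau's implementation: I am assuming a k-means style algorithm (a common choice), with scikit-learn's KMeans standing in for whatever the product actually ships.

import numpy as np
from sklearn.cluster import KMeans

# Nine 2-D points: one cluster on the left, two clusters on the right
# that sit very close to each other, as in the dots described above
points = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],   # cluster on the left
    [8.0, 0.0], [8.2, 0.1], [7.9, 0.2],   # first cluster on the right
    [9.5, 0.0], [9.6, 0.2], [9.4, 0.1],   # second cluster, right next to it
])

# Ask for 3 groups; asking for 2 would likely merge the two right-hand
# clusters, which is exactly the "what is correct?" ambiguity above
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(points)
print(labels)  # three groups of three points each, e.g. [1 1 1 0 0 0 2 2 2]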

The bigger question to answer, just like the one the meteorologist faces, is "What is correct?" And when predicting situations like this, that is a very difficult question to answer. One of the challenges for the test team is creating data sets like the dots above that let us validate our algorithm in cases which may not be as straightforward as others. In the test world, these are "boundary cases," and getting this data for our clustering testing was a task we took seriously.

Questions, comments, concerns and criticisms always welcome,

Let's take a break from diving into rounding errors and take a larger-scoped view of the testing role. After all, computing a basic statistic - such as the average - of a set of numbers is a well understood problem in the statistical world. How much value can testers add to this operation? Fair question. Let me try to answer it with an analogy.

I used to repeat the mantra I heard from many engineers in the software world that "testing is insurance." This led to the question, "How much insurance do you want or need for your project?" and that led to discussions about the relative amount of funding a test organization should have. The logic was that if your project was a million dollar project, you would want to devote some percentage of that million to testing as insurance that the project would succeed.

The first analogy I want to draw is that insurance is not a guarantee - or even an influencer - of success. Just because I have car insurance does not mean I won't get into a wreck. Likewise, buying vacation insurance does not guarantee the weather will allow my flight to reach its destination. Insurance only helps when things go wrong. The software engineering world has a process for that circumstance called "Root Cause Analysis." I'll go over that later, but for now, think of it as the inspection team looking at the wreck, trying to figure out what happened to cause the crash.

That leads me to my second analogy: Testing is like defensive driving. Defensive driving does not prevent the possibility of a crash. Instead, it lessens the chance that you will get into an accident. Hard stats are difficult to find, but most insurance companies in the USA will give you a 10% reduction in your premiums if you take such a class. Other estimates range up to a 50% decrease in the likelihood of a wreck, so I will use any number you want between 10% and 50%.

Those results are achieved by teaching drivers to focus on the entire transportation system around them. It is pretty easy to get locked into only looking in one direction and then being surprised by events that happen in another area (see where this analogy is going?). If I am concentrating on the road directly in front of me, I may not notice a group of high speed drivers coming up behind me until they are very close. Likewise, if I am only concentrating on accurately computing an average, I may not notice that my code is not performant and may not work on a server that is in use by many people. In both cases, having a wider view will make my driving - or code development - much smoother.

More on this coming soon.
Questions, comments, concerns and criticisms always welcome,

The general rule I want to follow for validation is that for big numbers, a big epsilon is OK. For small numbers, a small epsilon is desirable. In other words, if we are looking at interstellar distances, an error of a few kilometers is probably acceptable, but for microscopic measurements, an epsilon of a few micrometers may be more appropriate.

So my rule - open to any interpretation - is "8 bits past the most precise point in the test data." Let's look at a case where we want a small epsilon - for example, we are dealing with precise decimal values.

Suppose we have these data points for our test case:

1.1
2.7
8.003

The most precise data point is that last one - 8.003. The epsilon factor will be based off that.

8 bits of precision means 1/(2^8) = 1/256, which is approximately 0.0039. Let's call that 0.004. Append this to the precision of the last digit of 8.003, which is the one-thousandths place: 0.004 × 0.001 = 0.000004. This means anything that is within 0.000004 of the exact answer will be considered correct.
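
As a sketch, that epsilon derivation could be coded like this. The helper name epsilon_for and the idea of passing in the decimal-place count are my own, for illustration only:

def epsilon_for(decimal_places, bits=8):
    # "8 bits past the most precise point in the test data"
    # 1 / 2**8 = 1/256, which is approximately 0.0039; call it 0.004
    base = round(1 / 2**bits, 3)
    # Shift it past the last significant digit of the test data:
    # 0.004 * 10**-3 = 0.000004 for a thousandths-place value like 8.003
    return base * 10**-decimal_places

print(f"{epsilon_for(3):.6f}")  # 0.000004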

So if we need to average those three numbers:

1.1 + 2.7 + 8.003 = 11.803

The exact answer for the average, 11.803 / 3 = 3.934333..., is impossible to compute exactly in this case - the 3 repeats forever. I still need to verify we get close to the answer, so my routine to validate the result will look for the average to be in the range:

3.934333 - 0.000004 = 3.934329
3.934333 + 0.000004 = 3.934337

So if the average we compute is between 3.934329 and 3.934337, we will consider it correct.
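
In code, that check might look like this sketch (validate_average is a hypothetical helper, not the actual validation routine):

def validate_average(values, expected, epsilon):
    # Pass if the computed average falls within expected +/- epsilon
    computed = sum(values) / len(values)
    return abs(computed - expected) <= epsilon

# 11.803 / 3 = 3.934333..., so anything between 3.934329 and 3.934337 passes
print(validate_average([1.1, 2.7, 8.003], 3.934333, 0.000004))  # True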

More on this, and how it can be implemented to enforce even greater accuracy, will come up later.

Questions, comments, concerns and criticisms always welcome,