One other note for today. Thinking about models is a skill in itself. I took this class online a few years ago and highly recommend it: https://www.coursera.org/learn/model-thinking
It's free if you want, and there is also a paid option. It starts Aug 8, 2016, so you may want to hop over there and check it out. It is not math or computer heavy. Instead, it is an introduction to why we have models, how to use them, and how to think about problems from a modeling point of view.
Questions, comments, concerns and criticisms always welcome,
John
Thursday, August 11, 2016
Creating the model
Last time I gave an
example of a model to compute an average.
The method I used to compute the average was probably the most familiar
method there is - basically, add up all the numbers and divide by the count of
the numbers that were added.
This works well on
paper but can lead to problems on computers.
One error that can creep in is adding up to a number bigger than the
computer can handle. Humans with pencil
and paper can always make numbers bigger (just add more paper) but a computer
has a maximum number it can handle.
32-bit computers are still popular, and for this next bit let's assume the largest integer one can handle is 4,294,967,295. This is the number 2 raised to the 32nd power, minus 1 (2 because the machine is binary, 32 from the 32-bit processor, and minus 1 because counting starts at zero). Four billion is a large number, but if we are trying to figure out average sales for a really large business, it won't work. When we try to add numbers that total more than about 4.29 billion, the computer will not know what to do - it can't count that high.
This isn't new: this
is a very old problem with computers. No
matter how you do it, there is a number that will be bigger than the computer
can handle. Just fill up all the computer
memory with whatever number you want (all 1s, since it is binary) and then add
1 more to it. The computer doesn't have
enough memory to hold that new number.
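Here is a minimal sketch of that "add 1 more" experiment. Python's own integers grow as needed, so the 32-bit limit is simulated by keeping only the low 32 bits of each result. The names MAX_UINT32 and add_uint32 are made up for this illustration, and wrapping around to zero is just one common behavior for unsigned integers; a real system might raise an error or do something else.

```python
# Simulated 32-bit unsigned arithmetic (an assumption for illustration;
# Python integers do not overflow on their own).
MAX_UINT32 = 2**32 - 1  # thirty-two 1 bits


def add_uint32(a, b):
    """Add two numbers, keeping only the low 32 bits of the result."""
    return (a + b) & MAX_UINT32


print(MAX_UINT32)                 # 4294967295
print(add_uint32(MAX_UINT32, 1))  # 0: the extra bit has nowhere to go
```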
So how can we deal
with this potential problem?
One way is to change
our algorithm. Our original algorithm is
this:
- Add all the numbers in the original list
- Divide by the count of the numbers in the original list
- The answer I get is the result I want
But we can crash at
step 1.
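As a rough sketch, the original algorithm could look like this in Python. The function name average_sum_first and the explicit 32-bit check are assumptions added for illustration: Python would happily keep counting, so the check stands in for a machine that cannot.

```python
MAX_UINT32 = 2**32 - 1  # assumed largest value the machine can hold


def average_sum_first(numbers):
    """Add everything up first, then divide by the count."""
    total = 0
    for x in numbers:
        total += x
        # Stand-in for a 32-bit machine: stop once the total no longer fits.
        if total > MAX_UINT32:
            raise OverflowError("running total no longer fits in 32 bits")
    return total / len(numbers)


print(average_sum_first([88, 91, 83, 88, 85]))  # 87.0

# Two made-up sales figures that each fit, but whose total does not.
try:
    average_sum_first([3_000_000_000, 2_000_000_000])
except OverflowError as err:
    print("failed at step 1:", err)
```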
One workaround is to
look really hard at what the algorithm does.
Our example had 5 grades, so we add them up and divide the total by
5. What if we divided each number by 5
to begin with, and then added up those results?
Example:
88/5=17.6
91/5=18.2
83/5=16.6
88/5=17.6
85/5=17
And 17.6+18.2+16.6+17.6+17=87. This was the output of the original algorithm, so this new algorithm looks like it might be useful.
So the new algorithm
would be:
- Count the number of items in the list
- Divide the first number by that count and add the result to a running total
- Divide the next number in the list by that count and add the result to the running total
- Repeat step 3 until all the items in the list have been processed
Seems reasonable and
next up we will give this a try.
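A first sketch of the new algorithm in Python might look like the following. The function name average_divide_first is made up here. Because each number is divided before it is added, the running total never grows past the largest value in the list. One caveat: the divisions produce decimal values, so the result can differ from the pencil-and-paper answer by a tiny rounding error.

```python
def average_divide_first(numbers):
    """Divide each number by the count, then add the pieces to a running total."""
    count = len(numbers)
    running_total = 0.0
    for x in numbers:
        running_total += x / count
    return running_total


grades = [88, 91, 83, 88, 85]
print(average_divide_first(grades))  # very close to 87

# The same made-up sales figures that would overflow a 32-bit total.
print(average_divide_first([3_000_000_000, 2_000_000_000]))
```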
Questions, comments,
concerns and criticisms always welcome,
John
Monday, August 8, 2016
Setting the stage
The first thing I
want to cover is what we test on the stats team. A good starting point is that we validate the
behavior of statistical models we implement.
For non-statisticians, the first question I get is "What is a statistical model?" My definition to get started is pretty simple: a statistical model is a set of equations based on existing data that help make predictions about new data coming in.
A very familiar
example is a model to compute an average.
We probably all saw this in school.
You take 5 tests, add up all your scores and divide by 5. That set our grade for the course. But I added what may seem like a strange
phrase to my definition that a model helps us make predictions about new data
coming in. Let's walk through that
viewpoint since it is key to the definition.
In this case,
suppose there is a 6th test we have to take.
And suppose my grades had been 88, 91, 83, 88 and 85. They add up to 435. I divide 435 by 5 and get an average of
87. Not only is this my grade, it also
is a prediction of what score I would make on the 6th test.
Now, it obviously doesn't mean I would make an 87 exactly. If that were the case, I wouldn't even need to take the test. But if I made an 86, or 92 or even 81 I would not be surprised. It can get interesting to investigate what it would mean if I made a 100 or a 53, and that will come up later.
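To make the "prediction" idea concrete, here is a small Python sketch using the grades above. The helper name surprise is made up for this example; it just measures how far a 6th test score lands from the predicted 87.

```python
grades = [88, 91, 83, 88, 85]
predicted = sum(grades) / len(grades)  # 435 / 5
print(predicted)  # 87.0


def surprise(score, prediction):
    """Distance between an actual score and the predicted score."""
    return abs(score - prediction)


# The possible 6th test scores mentioned above.
for score in [86, 92, 81, 100, 53]:
    print(score, surprise(score, predicted))
```

The first three scores land within 6 points of the prediction, while 100 and 53 are 13 and 34 points away.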
So we have a statistical model at this point. It computes an average given a set of numbers as an input. It's our job as the test team to validate that the average is computed correctly in every possible case. Seems simple, maybe tedious, but once we bring computers into the equation (pun intended) things will start to get more complicated very, very quickly.
I'll cover this
simple model next.
Questions, comments,
concerns and criticisms always welcome,
John
Monday, August 1, 2016
Welcome to Extraordinary Squares!
Welcome to my new
blog about testing at Tableau!
Tableau is a company
that helps you see and understand your data.
You can read all about us at www.tableau.com,
get a free version of Tableau and create an account there.
I work on the Statistics team here. Right off the bat, I was a little confused about this team. Internally, I was thinking that "All of Tableau is about statistics, so how can there be just one team for that?" Now that I have been here a while, I see all that we do, from connecting to databases (to get data for statistical analysis), to loading the data (not at all easy), to drawing maps, creating a web client, hosting servers, etc., etc., etc. There is a lot to Tableau!
The goal of this blog is to roughly mimic my former OneNote Testing blog and talk about the challenges the test team here faces, how we address them and how we work to ensure Tableau is the best-in-class application available. I may (probably will) have an occasional tip or two as well.
One regret I had with my former blog was the name: specifically, the URL I used. I wish I had not put my name in the URL. So this time, I went for what I think is a cool and slightly punny name: Extraordinary Squares. Ordinary least squares is a statistical method for fitting a line to data (more on that later) and Extra-ordinary because, well, we are extraordinary!
Squares is also a
pun. I was at a meeting with some of the
people on my team and someone mentioned we were changing a ratio from 13:8 to
14:8. Someone else said, "That messes
up everything."
I said, "Yeah, that even messes up our Fibonacci Golden Ratio."
Everyone in the room laughed.
So yeah, squares.
Let's get started
testing Tableau!
Questions, comments,
concerns and criticisms always welcome,
John