Thursday, August 11, 2016

A good modeling class

One other note for today.  Thinking about models is a skill in itself.  I took this class online a few years ago and highly recommend it: https://www.coursera.org/learn/model-thinking  It's free, and there is a paid option if you want it.  It started Aug 8, 2016, so you may want to hop over there and check it out.  It is not math or computer heavy - instead, it is an introduction to why we have models, how to use them, and how to think about problems from a modeling point of view.

Questions, comments, concerns and criticisms always welcome,

John

Creating the model


Last time I gave an example of a model to compute an average.  The method I used to compute the average was probably the most familiar method there is - basically, add up all the numbers and divide by the count of the numbers that were added.

This works well on paper but can lead to problems on computers.  One error that can creep in is adding up to a number bigger than the computer can handle.  Humans with pencil and paper can always make numbers bigger (just add more paper) but a computer will have a maximum number it can handle.

32 bit computers are still popular and for this next bit let's assume the largest integer one can handle is 4,294,967,295.  This is the number 2 raised to the 32nd power, minus 1 (2 because the machine is binary, 32 from the 32 bit processor, and minus 1 because counting starts at 0).  Four billion is a large number, but if we are trying to figure out average sales for a really large business, it won't work.  When we try to add numbers that total to more than about 4.29 billion, the computer will not know what to do - it can't count that high.
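
To make the limit concrete, here is a minimal Python sketch.  Python's own integers do not overflow, so the sketch imitates a 32 bit unsigned integer by masking.  It assumes the machine wraps around on overflow; real systems may instead raise an error or do something else entirely.  The name add_uint32 and the sales figures are made up for illustration.

```python
# Assumption: unsigned 32 bit arithmetic that wraps around on overflow.
MAX_UINT32 = 2**32 - 1  # 4,294,967,295, the largest unsigned 32 bit value


def add_uint32(a, b):
    """Add two numbers the way a wrapping 32 bit unsigned register would."""
    return (a + b) & MAX_UINT32


# Hypothetical sales figures; each one fits in 32 bits on its own.
sales = [3_000_000_000, 2_000_000_000]

true_total = sum(sales)  # Python integers can grow as large as needed

wrapped_total = 0
for amount in sales:
    wrapped_total = add_uint32(wrapped_total, amount)

print(true_total)     # 5000000000
print(wrapped_total)  # 705032704, the true total minus 2**32
```

The wrapped total is not just wrong, it looks like a perfectly plausible number, which is part of what makes this kind of bug hard to spot.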

This isn't new: this is a very old problem with computers.  No matter how you do it, there is a number that will be bigger than the computer can handle.  Just fill up all the computer memory with whatever number you want (all 1s, since it is binary) and then add 1 more to it.  The computer doesn't have enough memory to hold that new number. 

So how can we deal with this potential problem?

One way is to change our algorithm.  Our original algorithm is this:
  1. Add all the numbers in the original list
  2. Divide by the count of the numbers in the original list
  3. The answer I get is the result I want
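
As a rough sketch, the three steps above could look like this in Python.  The function name average is my own choice for illustration.  Note that Python will not actually overflow at step 1 because its integers grow as needed; in a language with fixed-size integers, the running total is the spot where the trouble starts.

```python
def average(numbers):
    """Add everything up first, then divide by the count."""
    total = 0
    for n in numbers:
        total += n  # step 1: this sum is what can grow too large
    return total / len(numbers)  # step 2: divide by the count


grades = [88, 91, 83, 88, 85]
print(average(grades))  # 87.0
```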

But we can crash at step 1.

One workaround is to look really hard at what the algorithm does.  Our example had 5 grades, so we add them up and divide the total by 5.  What if we divided each number by 5 to begin with, and then added up those results?

Example:
88/5=17.6
91/5=18.2
83/5=16.6
88/5=17.6 
85/5=17

And 17.6+18.2+16.6+17.6+17=87.  This was the output of the original algorithm, so this new algorithm looks like it might be useful.

So the new algorithm would be:
  1. Count the number of items in the list
  2. Divide the first number by that count and add the result to a running total
  3. Divide the next number in the list by that count and add the result to the running total
  4. Repeat step 3 until all the items in the list have been processed
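
Here is a minimal Python sketch of the steps above, with a function name (average_divide_first) that is just for illustration.  One assumption to call out: dividing first means working with decimal (floating point) numbers, so the result can differ from the add-then-divide answer by a tiny rounding amount.  The sketch compares the two with a small tolerance instead of expecting an exact match.

```python
import math


def average_divide_first(numbers):
    """Divide each number by the count, then add up the pieces."""
    count = len(numbers)  # step 1: count the items
    running_total = 0.0
    for n in numbers:
        # steps 2-4: each piece is small, so the running total
        # never grows much beyond the largest number in the list
        running_total += n / count
    return running_total


grades = [88, 91, 83, 88, 85]
result = average_divide_first(grades)
print(result)
print(math.isclose(result, 87.0))  # True
```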

Seems reasonable and next up we will give this a try.

Questions, comments, concerns and criticisms always welcome,
John

Monday, August 8, 2016

Setting the stage


The first thing I want to cover is what we test on the stats team.  A good starting point is that we validate the behavior of statistical models we implement.

For non-statisticians, the first question I get is "What is a statistical model?"  My definition to get started is pretty simple: a statistical model is a set of equations based on existing data that helps make predictions about new data coming in.

A very familiar example is a model to compute an average.  We probably all saw this in school.  You take 5 tests, add up all your scores and divide by 5.  That set our grade for the course.  But I added what may seem like a strange phrase to my definition that a model helps us make predictions about new data coming in.  Let's walk through that viewpoint since it is key to the definition.

In this case, suppose there is a 6th test we have to take.  And suppose my grades had been 88, 91, 83, 88 and 85.  They add up to 435.  I divide 435 by 5 and get an average of 87.  Not only is this my grade, it also is a prediction of what score I would make on the 6th test. 

Now, it obviously doesn't mean I would make an 87 exactly.  If that were the case, I wouldn't even need to take the test.  But if I made an 86, or 92 or even 81, I would not be surprised.  It can get interesting to investigate what it would mean if I made a 100 or a 53, and that will come up later.

So we have a statistical model at this point.  It computes an average given a set of numbers as an input.  It's our job as the test team to validate that the average is computed correctly in every possible case.  Seems simple, maybe tedious, but once we bring computers into the equation (pun intended) things will start to get more complicated very, very quickly.

I'll cover this simple model next.

Questions, comments, concerns and criticisms always welcome,
John

Monday, August 1, 2016

Welcome to Extraordinary Squares!

Welcome to my new blog about testing at Tableau!

Tableau is a company that helps you see and understand your data.  You can read all about us at www.tableau.com, get a free version of Tableau and create an account there.

I work on the Statistics team here.  Right off the bat, I was a little confused about this team.  Internally, I was thinking that "All of Tableau is about statistics, so how can there be just one team for that?"  Now that I have been here a while, I see all that we do, from connecting to databases (to get data for statistical analysis), to loading the data (not at all easy), to drawing maps, creating a web client, hosting servers, etc., etc., etc.  There is a lot to Tableau!

The goal of this blog is to roughly mimic my former OneNote Testing blog and talk about the challenges the test team here faces, how we address them, and how we work to ensure Tableau is the best-in-class application available.  I may (probably will) have an occasional tip or two as well.

One regret I had with my former blog was the name: specifically, the URL I used.  I wish I had not had my name in the URL.  So this time, I went for what I think is a cool and slightly punny name: Extraordinary Squares.  Ordinary least squares is a statistical method for fitting a line to data (more on that later) and Extra-ordinary because, well, we are extraordinary!

Squares is also a pun.  I was at a meeting with some of the people on my team and someone mentioned we were changing a ratio from 13:8 to 14:8.  Someone else said, "That messes up everything."
I said, "Yeah, that even messes up our Fibonacci Golden Ratio."
Everyone in the room laughed.

So yeah, squares.

Let's get started testing Tableau!

Questions, comments, concerns and criticisms always welcome,

John