Friday, July 21, 2017

Paying down test debt, continued

Last week I mentioned that some old test automation breaks while it is disabled.

As an example, suppose I added a test to check for the 22 Franch regions being labelled properly back in 2014.  It works for a year, but then France announces it will consolidate its regions in January 2016.  While working on that change, I disable my test since I know it won't provide any value while the rest of the changes are in progess.

Then I forget to turn the test back on and don't notice that until after the change.

In this case, the fix is straightforward.  I change my test to account for the real world changes that happened while it was disabled.  In this case, I take out the list of the 22 regions and replace that list with the 14 new ones. 

This pattern - the code being tested changes while the test is disabled - is common.  In almost all cases, simply changing the test to account for the new expected behavior is all that needs to be done to enable the test.  So I typically make that change, enable the test, run it  a few thousand times and if it passes, leave it enabled as part of the build system moving forward.

Sometimes the tests are more complicated that I know how to fix.  In that case, I contact the team that owns the test and hand off the work to enable it to them.

All in all, this is a simple case to handle. 

There is also the case that the test is no longer valid.  Think of a test that validated Tableau worked on Windows Vista.  Vista is no longer around, so that test can simply be deleted.

Other factors can change as well, and I'll wrap this up next week.

Questions, comments, concerns and criticisms always welcome,

Wednesday, July 12, 2017

Paying down test debt

Another aspect of my work recently has been paying down technical debt we built over the years.  An example of technical debt would be this:
  1. Imagine we are building an application that can compute miles per gallon your car gets
  2. We  create the algorithm to compute miles per gallon
    1. We add tests to make sure it works
    2. We ship it
  3. Then we are a hit in the USA!  Yay!
  4. But the rest of the world wants liters per 100 kilometers. 
  5. We add that feature
    1. As we add it, we realize we need to change our existing code that only knows about miles
    2. We figure it will take a week to do this
    3. During this week, we disable the tests that test the code for "miles"
    4. We finish the liters per 100km code
    5. We check in
  6. We ship and the whole world is happy

But look back at step 5c.  The tests for miles (whatever they were) were disabled and we never turned them back on.  We call this "technical debt" or, in this case since we know it is test related, "test debt."  It happens when we take shortcuts like 5c - disabling a test.  I'll just point out a better practice would have been to ensure every bit of new code we wrote for the metric values should never have broken the MPG code, and the test should never have been disabled.  In the real world, the most likely reason to do this would be for speed - I simply want to test my new code quickly and don't want to run all the tests over the old code I am not changing, so I disable the old tests for right now and will re-enable them when I am done.  (Or so I say...)

So one other task I have taken on is identifying these tests that are in this state.  Fortunately, there are not many of them but every so often they slip through the process and wind up being disabled for far longer than what we anticipated.  Turning them back on is usually easy.  Every so often, an older test won't pass nowadays because so much code has changed while it was disabled.

What to do in those cases is a little trickier and I will cover that next.

Questions, comments, concerns and criticisms always welcome,

Thursday, July 6, 2017

Using the tool I wrote last week to start making changes

I finished my tool to look through a large set of our test code to classify our tests with respect to who owns them, when they run and other attributes like that.  My first use of this was to find "dead" tests - tests that never run, provide no validation or otherwise are left in the system for some reason.  I want to give a sense of scale for how big this type of challenge is.

After looking through just over 1000 tests, I identified 15 that appeared they may be dead.  Closer examination of those tests took about 1/2 a day and determined that 8 of them are actually in use.  This revealed a hole in my tool - there was an attribute I forgot to check.

One of the tests was actually valid and had simply been mis-tagged.  I reenabled that test and it is now running again and providing validation that nothing has broken.

The other 6 tests were a bit more challenging.  I had to look at each test then look at lab results to see if anyone was actually still running them, dig through each test to see what was the expected result and so on.  In most cases, I had to go to the person that wrote the test - in 2 instances, almost 10 years ago - to see if the tests could be removed.  It might seem trivial to track 6 files out of 1000+ but this will save us build time for every build and maintenance costs over the years as well as leaving a slightly cleaner test code base.

In 4 of the cases, the tests can be removed and I have removed them.  In the USA, this is a holiday week for us so I am waiting on some folks to get back in the office next week to follow up on the last 2 tests. 

This is all incremental steps to squaring away our test code.

Questions, comments, criticisms and complaints always welcome,

Tuesday, June 27, 2017

Working on a tool for a hackathon

We have a culture of regular time devoted to hackathons.  We can work on what we know is important or fun or challenging - we have free reign to take on the projects that we are motivated to complete.

For my project, I am working on classifying some of our test code.  What I have to do specifically is parse through each test file looking for attributes in the code.  The goal here is to make a tool that does this so I never have to do it again.

I've been working on this for a day now, and I am reminded why I want this tool.  I have written TONS of code that opens a text (type) file and goes through it line by line.  It is always tedious, slow and I always have to look up how to do these basic tasks on the internet.  Since I only do this once every year or so, I forget the exact syntax to use and need a continual refresher.

But I got my tool done yesterday and am making sure it works today.  Then I want to move it from a command line to a nice visualization that I can monitor for changes…

Questions, comments, criticisms and complaints always welcome,

Friday, June 16, 2017

Test names that make sense

One of the tasks developers have is adding a test when making code change.  That is just good engineering practice - you always want to make sure your code works, and then when I make a change, I want to test that I did not break your code.  It's pretty self-explanatory, really.

The trick here is that when someone fixes a bug report.  Bug reports are tracked by number, so I may be working on bug 1234 today.  When I get a fix in place, I need to add a test.  Now, when I add the test, I need to give the test a name.

One easy way to name the test is naming it after the bug number being fixed, like this:

That makes it possible for anyone else that needs to look at this code to know to check the bug database for details around bug 1234.  I chose the word "possible" there specifically because while it is possible to do this, it is time consuming.  I have to switch from my IDE (I use Visual Studio) to the bug tool and dig up the bug report. 

Now imagine if I had named that test this instead:

Now if I am reading that code, or investigating a failure, I have a MUCH better starting point.  I know that the test I potentially broke had to with French regions and maps.  If I am changing map code, I am very interested in what I might have broken and know where to start my investigation much more quickly.  I don't have to switch out of my IDE to get this data and it saves me a little bit of time overall.

So while I am going through tests, I am not renaming the old format with a bit of descriptive text.  The next challenge I might take on is trying to quantify how much time I am saving overall.

Questions, comments, concerns and criticisms always welcome,

Tuesday, June 6, 2017

Hardware at Tableau

I just noticed that I was working in a remote office today and logged into my primary desktop from that office.  I also realized I never documented the great hardware we use at Tableau.

It may not seem special, but all the developers here get at least 2 machines: one Windows and one Mac.  We need this since our 2 primary desktop clients both need coverage.

I chose a Windows desktop and that is what I use for email and such as well as writing code for Tableau.  It's a state of the art 16 core (or 4 depending on how you count hyperthreads) 32GB desktop.  I also have 2 monitors on my desk - a 24" HD monitor and a 22" 4K monitor.  I have learned to rely on multiple monitors since way back in 1998 and can't imagine working with only one.  Brrrr.

Since I run Windows 10 on my desktop, I got a Mac laptop for portable usage.  Nothing special here - 16GB Ram and whatever processor they were using last year (I have never checked).  I use it for note taking in meetings and general office type usage.  If I need to write code or debug or whatever, I will remote into my desktop.

And finally, the docking station I have in the remote office is even better.  It has 2 monitors and I can use the laptop as a third monitor.  In effect, I get a three monitor setup when I work remotely and that is tremendously handy.  I put Tableau on one monitor, my debugger/Visual Studio/Pycharm on the second and email/chat clients/reference notes/OneNote on the third.  It really speeds me up and is a nice perk when I can't get into my main office.

Questions, comments, concerns and criticisms always welcome,

Thursday, June 1, 2017

An upcoming side project for the test team

We voted this week to dedicate an upcoming sprint to focus on becoming more efficient as a team rather than focus on any given new functionality for Tableau.  The thinking here is that if we become 10% more efficient, we can deliver 10% more features in a given release over time, so this small investment now will pay large dividends in the future.

The test team chose to work on analyzing automation results.  For instance, if a given test is known to fail some large percentage of the time - let's say 99.99% for sake of argument - then if it fails tonight I might not need to make investigating it the highest priority task on my plate tomorrow.  Similarly, a test that has never failed and fails tonight might very well become my most important task tomorrow.

So what we are doing in our first steps is determining the failure rate of every single test we have.  Just tying together all that data - years worth, times several thousand tests, times multiple runs per day, et… - is a large challenge.  Then we have to mine the data for the reason for each failure.  If the failure was due to a product bug, then we need to factor out that failure from computing how often each test intermittently failed.

The data mining and computation for all of this seems like a good, achievable goal for one sprint.  Using that data in a meaningful way will be the (obvious) follow on project.

Wish us luck!

Questions, comments, concerns and criticisms always welcome,