Monday, December 11, 2017

Integration Testing, Part 2


The overall goal of software testing is to validate that the software we ship works as described.  Integration testing is a key part of that, and I want to continue with a very simple "test."



I want to continue with the muffler and engine analogy from my last post.  If the engine is designed correctly, it has a specification document.  That document would list various data and specifications about the engine, such as horsepower, the type of fuel needed and so on.  It will also list how much fuel the engine burns at each speed and, based on that, how many gallons/liters of exhaust it generates per second at each engine speed.



Let's say it generates 1 unit of exhaust per second at idle, 3 units at half speed and 12 (wow!  You are really flooring it!) at full speed.



It is our job to test that the muffler we use can process that much exhaust.  The "test" here is simple.  We look at the data sheet for the muffler and see if it can process up to 12 units of exhaust per second.  If the answer is no, we don't need to set up an engine and muffler and measure the exhaust.  We can simply say this will not work and we need to make a change (to either the engine or the muffler), or select a different engine and muffler combination.  Easy enough, but this simple test of reading the documentation is missed often enough that I wanted to call it out as a simple first step to take.
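This "paper test" is easy to express in code, too.  A tiny sketch using the made-up numbers from the example (`spec_compatible` and the constants are just illustrative names, not a real tool):

```python
# Hypothetical spec-sheet values, straight from the analogy above
ENGINE_EXHAUST_BY_SPEED = {"idle": 1, "half": 3, "full": 12}  # units/second
MUFFLER_MAX_EXHAUST = 10  # units/second this muffler can process

def spec_compatible(engine_output, muffler_capacity):
    """Paper test: can the muffler handle the engine's worst case?"""
    return max(engine_output.values()) <= muffler_capacity

# With a 10 unit/second muffler, the full-speed 12 units/second is too much
print(spec_compatible(ENGINE_EXHAUST_BY_SPEED, MUFFLER_MAX_EXHAUST))  # False
```

No hardware needed - the worst-case demand versus rated capacity comparison settles it before any real integration test runs.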



This also supposes that the documentation is accurate.  That is not always the case in software since the underlying code can change at any point and updating documentation sometimes lags.  In the mechanical engineering world, though, that does not happen as often.



More on this muffler and engine next time.



Questions, comments, concerns and criticisms always welcome,

John

Tuesday, November 28, 2017

Integration Testing, Part 1


Integration tests are the tests we use to validate that 2 or more software modules work together. 

Let me give an example by analogy.  Suppose you have a car engine and you know it works (for whatever definition of "work" you want to use).  I have a muffler, and it also works, again, using whatever definition of "works" you want to use.

Now suppose you are asked "Will the engine you make work with my muffler?"

Each component works, but how can we tell if they will work together?

Integration testing is the key here.  We know that each component works by itself, but there are no guarantees that they will work together.

For instance, one test case will be that the size of the hole for the exhaust from the engine is the same size as the muffler pipe (to speak broadly).  If the engine has a 5 inch exhaust and the muffler is only 3 inches wide, we have a mismatch and they won't work together.

A second case, assuming the first passes, is connecting the 2 components.  Even if the size of the exhaust is correct, if you use metric bolts and I don't, we are in a failing state again.

In fact, there will be many more test cases for this.  Materials construction (some metals don't interact well with others), weight considerations, stress cases (handling backfiring, for instance) and many, many more. 
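In software terms, each of those checks becomes one small integration test.  A toy sketch of the first two cases (all the field names and values here are made up for the analogy):

```python
def check_fit(engine, muffler):
    """Collect integration failures between two otherwise-working parts."""
    problems = []
    # Case 1: the exhaust opening must match the muffler pipe
    if engine["exhaust_diameter_in"] != muffler["pipe_diameter_in"]:
        problems.append("pipe diameter mismatch")
    # Case 2: the fasteners must use the same standard
    if engine["bolt_standard"] != muffler["bolt_standard"]:
        problems.append("bolt standard mismatch")
    return problems

engine = {"exhaust_diameter_in": 5, "bolt_standard": "metric"}
muffler = {"pipe_diameter_in": 3, "bolt_standard": "SAE"}
print(check_fit(engine, muffler))  # both checks fail for this pairing
```

Each component passes its own unit tests; only a check that looks at both together can catch these mismatches.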

The same mentality applies to software testing and I will go deeper into that next time.

Until then, questions, comments, concerns and criticisms always welcome,
John

Monday, November 20, 2017

Updating some data about our data for our team


One of the techniques we use to develop new features (which I cannot talk about) is wrapping the new code behind what we call a Feature Flag.  A feature flag is just a variable that we set OFF while we are working on features to keep that code from running until we are ready to turn it ON.  This is relatively basic engineering and there really is not anything special about it.

As a related note, many Windows applications use a registry key to turn features on or off.  We use a text file here with other data about the flag stored in it. For example, we not only store the name of the flag, but the name of the team that owns it, a short description of what it is for and when we expect to be done.

In some cases, those dates can be wrong or the name of the flag needs to be changed to make its purpose clearer.  For instance, imagine this contrived example: a flag named "tooltip" is not all that useful, but a flag named "ShowAdvancedAnalyticsTooltipsForTheWebClients" is a bit more explanatory.  And if work finishes early or lags behind estimated dates, those dates can change as well.
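To give a flavor of the idea, here is a sketch of flag metadata in a text file.  The CSV layout and field names below are my invention for this post - our real file format is different:

```python
import csv
import io

# Hypothetical flag file: name, owning team, description, expected done date
FLAG_FILE = """name,owner,description,expected_done
ShowAdvancedAnalyticsTooltipsForTheWebClients,WebTeam,Advanced analytics tooltips,2018-01-15
"""

def load_flags(text):
    """Parse the flag metadata file into a list of dicts, one per flag."""
    return list(csv.DictReader(io.StringIO(text)))

flags = load_flags(FLAG_FILE)
print(flags[0]["owner"])  # WebTeam
```

Updating the metadata then becomes editing a row and checking the file back in, which is exactly the kind of low-risk cleanup that fits a holiday week.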

So this week I expect to update these values (metadata) for our flags.  This is very low priority work but with US holidays approaching, now seems like the best time to get this knocked off the to do list.

Questions, comments, concerns and criticisms always welcome,
John

Wednesday, October 25, 2017

Don't bother porting dead code


I have a task on me to move (we call it "port") some test code from one location to another.  The details are not interesting, but it did involve moving a test "Shape" object and a "Text" object.

The text object and the shape object both inherited from the same parent class, included the same set of header files, were co-located in the same directory within the project I wanted to modify and were otherwise similar in structure.  For a variety of reasons, though, the text object could be moved without much effort at all.  The shape object proved far more difficult.

At first, the compiler complained it could not find shape.h.  That was a little tricky, but it boiled down to having 2 files named shape.h in the project, and the path to the file I wanted was not specified correctly.  Fixing that caused the shape object to no longer be able to inherit from its parent class.

And thus began about 2 weeks of trying to get the darn code to build.  I would find and fix one problem only to get to the next error.  This is not unusual - we call it peeling the onion - but it is time consuming. 

For my needs, this is a medium priority task at best, so I wasn't working on it full time - just when I could fit it into my schedule for an hour or so.  I started with 27 build errors, watched that grow to about 100, then whittled it all down to 2.

But at this point I was 2 weeks into build files, linking errors, etc… and decided to try a new approach since I felt I was treating symptoms and not the underlying problem.

I put everything back where it was (reverted my changes, so to speak) and rebuilt.  I then stepped through the code to see how the Shape object was being allocated and used.

It wasn't.

Although it was referenced in the tests, it was never used.  It was dead code.

Sigh.

I was able to delete the dead code, move everything else I needed and got unblocked.

Lesson learned - do your investigation early in the process to determine what actually needs to be ported!

Now, off to a 2 week hiatus.  See you when I am back!

Questions, comments, concerns and criticisms always welcome,
John

Friday, October 20, 2017

Back from Tableau Conference 2017


What a whirlwind that was.  I started the week helping folks get their schedules straight, then did a little docent work helping people find rooms.  I was also crowd control for seating during the keynotes (which were a blast! Adam Savage was pretty terrific) and even got to do a little security work in there.

Overall, this was a fantastic conference.  I got to meet several of our customers 1 on 1 and gained some tremendous insights into what everyone wants from us.  That alone made the conference worthwhile for me - now I have a much better idea where I need to focus my time and test efforts moving forward.

If you come next year to New Orleans, be sure to let me know!  I'd love to spend some time chatting with any Tableau users that happen to be reading this blog!

Questions, comments, concerns and criticisms always welcome,
John

PS: Yes, someone mentioned that it seemed like everywhere we walked we always walked through the casino.  I just chuckled and mentioned that means that whoever designed the walkways did the job right...

Tuesday, October 3, 2017

I so badly want to rewrite this little bit of test code, but won't


I saw something similar to this in a test script:

#ifdef Capybara
#include "A.h"
#include "B.h"
#else
#include "B.h"
#endif

Fair enough.  I can imagine that this was NOT the way this code snippet was checked in - it probably changed over time to what it is now.  I haven't yet dug into the history of it.

That seems a bit inefficient to me and I would prefer to change it to:

#ifdef Capybara
#include "A.h"
#endif
#include "B.h"

Fewer lines, and it should perform the exact same includes. 

But I won't make this change and here is why:
  1. It may break.  The odds are small, but why add risk where the system is currently working?
  2. It's a small benefit overall.  Maybe 1/1000 of a second faster build times.
  3. It takes some amount of time to make the change, build, test it, get a code review, check it in, etc…  I can't see any time savings for this change in the long run.
  4. A better fix than this one change would be a tool looking for this pattern in ALL our files.  Then use the tool to make the change everywhere, and make the tool part of the regular code checks we perform constantly.

But considering #2 overall, even a tool would not pay for itself.  So I am leaving this as is for now and moving on to my next task.

(But if I have the fortune of needing to edit that file, I will be sorely tempted to make this fix at the same time :) )

Questions, comments, concerns and criticisms always welcome,
John

Monday, September 25, 2017

Tableau Conference is 2 weeks away and I am getting ready


I got my schedule set for TC which is now just around the corner.  I will be working as a Schedule Scout, helping folks that attend get their schedules created.  Seems like a great way to get a conversation started face to face with our customers.

For TC last year, I had the same goal of talking directly with customers.  I looked around at all the jobs we can do - at TC, ALL the jobs are performed by Tableau employees - and figured that working at the logo store would be ideal.  My thought was that I would meet a large number of customers, and I did!  The only downside was that the store was very busy and I did not have much time to interact with everyone.

This year it looks like I will get a chance to work 1:1 with people and I am looking forward to it.  I hope to meet you in Vegas!

Questions, comments, concerns and criticisms always welcome,
John

Tuesday, September 19, 2017

Working with code coverage this week


I've been working on code coverage a bit this week.  Our team uses Bullseye for gathering information and while it has its good and bad points, overall it is fairly easy to use manually. 

More specifically, I have been trying to identify files that have no automated coverage at all.  That would mean the file could, in theory, not even exist and we would not know.  In reality, Tableau would fail to build, but having no automated coverage is still a poor state to be in.

And this is where I hit my first snag.  We have several different types of automation we run each day.  Unit tests is an obvious starting point, and there are also end to end (integration) tests.  Code coverage numbers for those are easy enough to gather and to merge together.
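Conceptually, merging coverage from different test types is just a set union of covered lines, and the files left untouched by the union are the ones with no automated coverage.  A toy model (made-up file names, nothing Bullseye-specific):

```python
# Coverage per test type modeled as sets of (file, line) pairs
unit = {("shape.cpp", 10), ("shape.cpp", 11)}
integration = {("shape.cpp", 11), ("text.cpp", 5)}

# Merging the runs is a set union
merged = unit | integration

# Any known source file not present in the merged set has zero coverage
covered_files = {f for f, _ in merged}
all_files = {"shape.cpp", "text.cpp", "legacy.cpp"}
uncovered = all_files - covered_files
print(uncovered)  # {'legacy.cpp'}
```

The real tooling works on instrumented build output rather than Python sets, of course, but the merge-then-diff logic is the same.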

We also run other types of tests like security tests and performance tests.  Getting numbers from something like a performance test is tricky.  Since we are trying to measure accurate performance, we don't want to also slow down the system by trying to monitor which lines of code are being hit or missed.  On the other hand, the code being hit should be the exact same code the other tests already cover - in other words, we should not have special code that only runs when we try to measure performance.  It's hard to validate that assumption when we specifically don't want to measure code coverage for half of the equation.

In any event, we do have a few files with low coverage, and we are working to ensure that code has adequate automated tests on it moving forward.  More on this - what adequate means in this context - coming up.

Questions, comments, concerns and criticisms always welcome,
John

Friday, September 8, 2017

How I learned Tableau


Way back when I came to Tableau, I needed to ramp up on how Tableau works.  While I like to think I understand much of the math behind the scenes, I still needed to figure out the UI, connecting to data sources, mapping and other functionality that Tableau provides.

One of my buddies from way back was talking with me earlier this week and had the same dilemma I faced: how to learn Tableau relatively quickly.

When I started, I bought the Official Tableau 9 book by George Peck.  Great book - it starts with the basics and builds from there.  It's about 2 years out of date at this point (there is a 10.0 book available now), but I still use it from time to time to refresh my memory.

But books only go so far, and I really learn best with hands on work.  I found a "20 days to learn Tableau" chart that I also used.  It really resonated with me - it had videos to watch, whitepapers to read (I actually found a typo in one of the papers, reported it to the author here at Tableau, and it got fixed) and activities to complete.  I recommended it to my friend and I hope he gets as much out of it as I have.

Questions, comments, concerns and criticisms always welcome,
John

Tuesday, August 29, 2017

Stochastic Processes


I'm a big believer in online classes to brush up on old skills or develop new ones.  I've taken several over the past few years and while some classes don't live up to expectations, I found one that has been a pretty fun course so far.

It's called Stochastic Processes: Data Analysis and Computer Simulation from Kyoto University in Japan.  Here's a link to the class on Edx.  It is self paced and wraps up Aug 2, 2018.  It is divided into 6 weeks of classes and each week is expected to take 2-3 hours to complete.

I have spent MUCH more than that simply wiping the rust off of my physics classes from college.  Don't get me wrong - I love the experience of going back to Albert Einstein's PhD thesis on Brownian Motion to help with that chapter.  I understood just enough of his paper to make the simulation more understandable, and having an excuse to read anything by Einstein is just icing on the cake.

And there has been a lot of that.  I had to keep looking up mathematical concepts I haven't used in a long time (the Dirac delta function, for instance).  Again, this was worthwhile.

If you get a chance and know Python and meet the other requirements, this may be an interesting class to take.  Plus, to audit the class is free, and you can't beat that price!

Comments, questions, concerns and criticisms always welcome,
John

Tuesday, August 15, 2017

I'm waiting for this book about using Tableau with Matlab


Last week I wrote a bit about the Matlab support with Tableau.  Our team also owns R support (it is all built on a REST API) and many people have been using R for years with Tableau.  Talks about R integration are a very popular topic at Tableau Conference each year as well, so there has been a tremendous amount of interest in this area.



So much interest, in fact, that Jen Stirrup has written a new book that is due out pretty soon, Advanced Analytics with R and Tableau. It will be available in paperback and ebook formats and is due out on September 6.  That is less than a month away as I write this and I am looking forward to getting my copy.



Good luck, Jen.  I hope you sell many copies of this book!



Questions, comments, concerns and criticisms always welcome,

John

Monday, August 7, 2017

Matlab and Tableau!


Kudos to the folks at Mathworks for their efforts to bring Matlab into the Tableau world!  Details about this are here.  Nice job, gang!

As for testing, this is one of my team's areas.  We also own R and Python integration, so we had test plans for this area well established.  Mathworks was so good at their implementation that there was frankly not much for us to do - a couple of string change requests and that was really about it.  We added some automated tests to validate the behavior is correct - to tell us if we change something that would break using Matlab server - and have that up and running now.  The side benefit to the automation is that we have no manual testing left from this effort.  This means that we are not slowed down at all in the long term even though we have added new functionality.  From a test point of view, this is the ideal case.

We never want to build up manual test cases over time.  That growth, if there is any, will always eventually add up to more time than the test team has to complete the tasking.  Obviously, this doesn't work in the long term so we have made a concerted effort to hit 100% of our test cases being automated.

So, yay us!

And thanks again to Mathworks. FWIW, I truly like Matlab.  It is very easy to look at some mathematical equation and simply type it into Matlab - it almost always works the very first time I try it. 

Questions, comments, concerns and criticisms always welcome,
John

Thursday, August 3, 2017

Cartographies of Time - a mini-review


We have an internal library here at Tableau and like any library, we can check out books to read or study.  We had the same setup at Microsoft as well, with a heavy emphasis on technical books.  Any computer company will have books on programming habits, design patterns, Agile and other fields like this.

Tableau also has a large section on data visualizations.  The whole spectrum is covered here from books on how to efficiently write a graphics routine to how to best present data on screen in human readable form. 

A new book arrived this last week called Cartographies of Time and it is a history of the timeline.  I saw it on the shelf and grabbed it since I am a fan of medieval maps and the cover has a map in that style on it.  It is a fascinating book that covers the very first attempts at timelines and brings us up to the modern day.

The most striking aspect of this so far - I've not gotten too far into the book - is the sheer artistic skill of the early timelines.  The people that created those timelines worked very hard to get a vibrant image, a workable color scheme and a tremendous amount of data all put into one chart.  It is simply amazing to see this and if you have the opportunity I recommend picking up a copy of this book for yourself.

Questions, comments, concerns and criticisms always welcome,
John

Thursday, July 27, 2017

A brief aside about data at the Tour de France


I'm a bike race fan and I really enjoy watching stage races like the Tour de France.  The colors, speed and racing make for a great spectacle.

One of the teams that was there this year is Dimension Data.  They use Tableau to analyze the TONS of data they get on the riders and I read, re-read and read again this article on how they do it: https://www.dcrainmaker.com/2017/07/tour-de-france-behind-the-scenes-how-dimension-data-rider-live-tracking-works.html

Now, if I can just get myself invited along on a race to help them with Tableau…

And congratulations to Edvald Boasson Hagen!

Questions, comments, concerns and criticisms always welcome,
John

Friday, July 21, 2017

Paying down test debt, continued


Last week I mentioned that some old test automation breaks while it is disabled.

As an example, suppose I added a test back in 2014 to check that the 22 French regions were labeled properly.  It works for a year, but then France announces it will consolidate its regions in January 2016.  While working on that change, I disable my test since I know it won't provide any value while the rest of the changes are in progress.

Then I forget to turn the test back on and don't notice that until after the change.

The fix here is straightforward: I change my test to account for the real world changes that happened while it was disabled.  In this case, I take out the list of 22 regions and replace it with the 13 new ones. 

This pattern - the code being tested changes while the test is disabled - is common.  In almost all cases, simply updating the test to account for the new expected behavior is all that needs to be done to re-enable it.  So I typically make that change, enable the test, run it a few thousand times and if it passes, leave it enabled as part of the build system moving forward.
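Using the French regions example, the fix usually amounts to updating the expected value and turning the test back on.  A sketch, with a hypothetical `get_map_regions` standing in for the real product call:

```python
# Hypothetical stand-in for the product call that returns map region labels
def get_map_regions():
    regions = ["Île-de-France", "Normandie", "Bretagne"]
    return regions + ["region"] * 10  # padded to the real count for this sketch

def test_french_region_count():
    # Updated expectation: 22 regions before, 13 after the January 2016 change
    assert len(get_map_regions()) == 13

test_french_region_count()
print("test passes with the updated expectation")
```

The test logic itself is unchanged; only the expected data moved to match the world as it is now.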

Sometimes the tests are more complicated than I know how to fix.  In that case, I contact the team that owns the test and hand off the work to enable it to them.

All in all, this is a simple case to handle. 

There is also the case that the test is no longer valid.  Think of a test that validated Tableau worked on Windows Vista.  Vista is no longer around, so that test can simply be deleted.

Other factors can change as well, and I'll wrap this up next week.

Questions, comments, concerns and criticisms always welcome,
John

Wednesday, July 12, 2017

Paying down test debt


Another aspect of my work recently has been paying down technical debt we built over the years.  An example of technical debt would be this:
  1. Imagine we are building an application that can compute miles per gallon your car gets
  2. We create the algorithm to compute miles per gallon
    1. We add tests to make sure it works
    2. We ship it
  3. Then we are a hit in the USA!  Yay!
  4. But the rest of the world wants liters per 100 kilometers. 
  5. We add that feature
    1. As we add it, we realize we need to change our existing code that only knows about miles
    2. We figure it will take a week to do this
    3. During this week, we disable the tests that test the code for "miles"
    4. We finish the liters per 100km code
    5. We check in
  6. We ship and the whole world is happy

But look back at step 5c.  The tests for miles (whatever they were) were disabled and we never turned them back on.  We call this "technical debt" or, since we know it is test related, "test debt."  It happens when we take shortcuts like 5c - disabling a test.  A better practice would have been to ensure that none of the new metric code could break the MPG code, so the tests never needed to be disabled at all.  In the real world, the most likely reason to disable them is speed - I simply want to test my new code quickly and don't want to run all the tests over the old code I am not changing, so I disable the old tests for now and will re-enable them when I am done.  (Or so I say...)
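For the curious, the two units are related by a single constant (about 235.21, from liters per US gallon and kilometers per mile), so keeping both code paths under test is cheap.  A sketch of what "never disable the MPG tests" could look like:

```python
# 100 * (liters per US gallon) / (km per mile) = 100 * 3.785411784 / 1.609344
MPG_TO_L100KM = 235.2145833

def mpg_to_l_per_100km(mpg):
    return MPG_TO_L100KM / mpg

def l_per_100km_to_mpg(l_per_100km):
    return MPG_TO_L100KM / l_per_100km

# Both unit systems stay under test, so neither suite needs disabling
assert abs(mpg_to_l_per_100km(23.52145833) - 10.0) < 1e-9
assert abs(l_per_100km_to_mpg(mpg_to_l_per_100km(30.0)) - 30.0) < 1e-9
print("both unit tests pass")
```

Because the metric path is pure new code layered on the same constant, the old "miles" tests keep running untouched the whole time.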

So one other task I have taken on is identifying tests that are in this state.  Fortunately, there are not many of them, but every so often one slips through the process and winds up disabled for far longer than we anticipated.  Turning them back on is usually easy.  Every so often, though, an older test won't pass anymore because so much code has changed while it was disabled.

What to do in those cases is a little trickier and I will cover that next.

Questions, comments, concerns and criticisms always welcome,
John

Thursday, July 6, 2017

Using the tool I wrote last week to start making changes


I finished my tool to look through a large set of our test code to classify our tests with respect to who owns them, when they run and other attributes like that.  My first use of this was to find "dead" tests - tests that never run, provide no validation or otherwise are left in the system for some reason.  I want to give a sense of scale for how big this type of challenge is.

After looking through just over 1000 tests, I identified 15 that appeared to be dead.  Closer examination of those tests took about half a day and determined that 8 of them are actually in use.  This revealed a hole in my tool - there was an attribute I forgot to check.

One of the tests was actually valid and had simply been mis-tagged.  I reenabled that test and it is now running again and providing validation that nothing has broken.

The other 6 tests were a bit more challenging.  I had to look at each test, then look at lab results to see if anyone was actually still running them, dig through each test to see what the expected result was, and so on.  In most cases, I had to go to the person that wrote the test - in 2 instances, almost 10 years ago - to see if the tests could be removed.  It might seem trivial to track down 6 files out of 1000+ but this will save us build time for every build and maintenance costs over the years, as well as leaving a slightly cleaner test code base.

In 4 of the cases, the tests can be removed and I have removed them.  In the USA, this is a holiday week for us so I am waiting on some folks to get back in the office next week to follow up on the last 2 tests. 

This is all incremental steps to squaring away our test code.

Questions, comments, criticisms and complaints always welcome,
John

Tuesday, June 27, 2017

Working on a tool for a hackathon

We have a culture of regular time devoted to hackathons.  We can work on what we know is important or fun or challenging - we have free rein to take on the projects that we are motivated to complete.

For my project, I am working on classifying some of our test code.  What I have to do specifically is parse through each test file looking for attributes in the code.  The goal here is to make a tool that does this so I never have to do it again.

I've been working on this for a day now, and I am reminded why I want this tool.  I have written TONS of code that opens a text (type) file and goes through it line by line.  It is always tedious, slow and I always have to look up how to do these basic tasks on the internet.  Since I only do this once every year or so, I forget the exact syntax to use and need a continual refresher.
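The core of the tool is exactly that tedious loop: read each file line by line and pull out attributes.  Here is the shape of it in Python - the `@owner=...` attribute syntax below is hypothetical, since our real test files use a different convention:

```python
import re

# Hypothetical attribute syntax for this sketch
ATTR_RE = re.compile(r'@(owner|schedule|priority)\s*=\s*(\S+)')

def classify(lines):
    """Scan a test file line by line and collect the attributes it declares."""
    attrs = {}
    for line in lines:
        for key, value in ATTR_RE.findall(line):
            attrs[key] = value
    return attrs

sample = ["// @owner=MapsTeam", "// @schedule=nightly", "TEST(Shape, Draw) {"]
print(classify(sample))  # {'owner': 'MapsTeam', 'schedule': 'nightly'}
```

Writing it down once means never having to look up the file-reading boilerplate on the internet again - which was the whole point of the tool.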

But I got my tool done yesterday and am making sure it works today.  Then I want to move it from a command line to a nice visualization that I can monitor for changes…

Questions, comments, criticisms and complaints always welcome,
John

Friday, June 16, 2017

Test names that make sense


One of the tasks developers have is adding a test when making a code change.  That is just good engineering practice - you always want to make sure your code works, and then when I make a change, I want to test that I did not break your code.  It's pretty self-explanatory, really.

The trick comes when someone fixes a bug report.  Bug reports are tracked by number, so I may be working on bug 1234 today.  When I get a fix in place, I need to add a test, and when I add the test, I need to give it a name.

One easy way to name the test is naming it after the bug number being fixed, like this:
Test_Bug1234()

That makes it possible for anyone else that needs to look at this code to know to check the bug database for details around bug 1234.  I chose the word "possible" there specifically because while it is possible to do this, it is time consuming.  I have to switch from my IDE (I use Visual Studio) to the bug tool and dig up the bug report. 

Now imagine if I had named that test this instead:
Test_AddNewFrenchRegionsToMaps()

Now if I am reading that code, or investigating a failure, I have a MUCH better starting point.  I know that the test I potentially broke had to do with French regions and maps.  If I am changing map code, I am very interested in what I might have broken and know where to start my investigation much more quickly.  I don't have to switch out of my IDE to get this data and it saves me a little bit of time overall.

So while I am going through tests, I am now renaming tests in the old format to include a bit of descriptive text.  The next challenge I might take on is trying to quantify how much time I am saving overall.

Questions, comments, concerns and criticisms always welcome,
John

Tuesday, June 6, 2017

Hardware at Tableau


I was working in a remote office today and logged into my primary desktop from that office.  It made me realize I have never documented the great hardware we use at Tableau.

It may not seem special, but all the developers here get at least 2 machines: one Windows and one Mac.  We need this since our 2 primary desktop clients both need coverage.

I chose a Windows desktop and that is what I use for email and the like, as well as for writing code for Tableau.  It's a state of the art 16 core (or 4, depending on how you count hyperthreads) 32GB desktop.  I also have 2 monitors on my desk - a 24" HD monitor and a 22" 4K monitor.  I have relied on multiple monitors since way back in 1998 and can't imagine working with only one.  Brrrr.


Since I run Windows 10 on my desktop, I got a Mac laptop for portable usage.  Nothing special here - 16GB Ram and whatever processor they were using last year (I have never checked).  I use it for note taking in meetings and general office type usage.  If I need to write code or debug or whatever, I will remote into my desktop.

And finally, the docking station I have in the remote office is even better.  It has 2 monitors and I can use the laptop as a third monitor.  In effect, I get a three monitor setup when I work remotely and that is tremendously handy.  I put Tableau on one monitor, my debugger/Visual Studio/Pycharm on the second and email/chat clients/reference notes/OneNote on the third.  It really speeds me up and is a nice perk when I can't get into my main office.

Questions, comments, concerns and criticisms always welcome,
John

Thursday, June 1, 2017

An upcoming side project for the test team


We voted this week to dedicate an upcoming sprint to focus on becoming more efficient as a team rather than focus on any given new functionality for Tableau.  The thinking here is that if we become 10% more efficient, we can deliver 10% more features in a given release over time, so this small investment now will pay large dividends in the future.

The test team chose to work on analyzing automation results.  For instance, if a given test is known to fail some large percentage of the time - let's say 99.99% for sake of argument - then if it fails tonight I might not need to make investigating it the highest priority task on my plate tomorrow.  Similarly, a test that has never failed and fails tonight might very well become my most important task tomorrow.

So what we are doing in our first steps is determining the failure rate of every single test we have.  Just tying together all that data - years' worth, times several thousand tests, times multiple runs per day, etc… - is a large challenge.  Then we have to mine the data for the reason for each failure.  If a failure was due to a product bug, we need to factor it out when computing how often each test intermittently failed.
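The per-test computation itself is simple once the data is tied together.  A sketch with made-up run records (the real data lives in our lab results, not Python tuples):

```python
# Each record: (test_name, passed, failure_was_product_bug)
runs = [
    ("test_kmeans", True, False),
    ("test_kmeans", False, True),   # product bug: not the test's fault
    ("test_kmeans", False, False),  # intermittent test failure
    ("test_excel_import", True, False),
]

def intermittent_failure_rate(runs, name):
    """Failure rate for one test, ignoring runs that failed on real product bugs."""
    relevant = [(passed, bug) for t, passed, bug in runs if t == name and not bug]
    failures = sum(1 for passed, _ in relevant if not passed)
    return failures / len(relevant)

print(intermittent_failure_rate(runs, "test_kmeans"))  # 0.5
```

A test near 1.0 here fails all the time on its own and a tonight's failure is low urgency; a test at 0.0 that suddenly fails jumps to the top of tomorrow's list.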

The data mining and computation for all of this seems like a good, achievable goal for one sprint.  Using that data in a meaningful way will be the (obvious) follow on project.

Wish us luck!

Questions, comments, concerns and criticisms always welcome,
John 

Tuesday, May 23, 2017

Sharing lessons from moving test code around


I mentioned 2 weeks ago that I was moving some tests around within our codebase.  That work is still happening and will almost certainly continue for quite some time.

One other task I am taking on simultaneously is quantifying the cost of moving these tests.  This ranges from a simple hourly tracking of my time to including the time others need to review the code and validate that the tests achieve the same coverage once they have been moved.

I'm also taking a stab at quantifying how difficult moving a test can potentially be.  For instance, a traditional unit test that happens to be in a less than ideal location is a good candidate for almost a pure "copy and paste" type of move.  Since the test is sharply focused and doesn't have many dependencies, it is very simple to move around.

Other tests that start with loading a workbook in order to validate a column is being drawn correctly (I am making up an example) have many dependencies that have to be untangled before the test can be moved.  This is at best a medium difficulty task and can easily take a large amount of time depending on how tightly both the test and product code are woven together.

For now, I am making notes on how to untie those knots and moving the tests that are easy to move.  Once I am done with my notes, I intend to look through them for common patterns and good starting points, and use that data to develop a plan to start untangling the next round of tests.   And of course I will share this with others since I doubt I will have enough time - or energy :) - to do all this myself.
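Here is a rough Python sketch of the kind of difficulty scoring I have in mind - counting a C++ test file's #include lines as a stand-in for its dependency count.  The heuristic and names are mine, not a real tool we use:

```python
import re

def move_difficulty(source: str) -> int:
    """Rough proxy for how hard a C++ test file is to move: count its
    #include lines.  A sharply focused unit test pulls in few headers;
    a workbook-loading test drags in many."""
    return len(re.findall(r'^\s*#include\b', source, flags=re.MULTILINE))

focused = "#include <cppunit/TestCase.h>\nvoid TestSort() {}\n"
tangled = ("#include <workbook.h>\n#include <renderer.h>\n"
           "#include <excel_import.h>\nvoid TestColumnDrawn() {}\n")

print(move_difficulty(focused), move_difficulty(tangled))  # 1 3
```

Sorting the candidate files by a score like this is one way to pick off the easy moves first.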

Questions, comments, concerns and criticisms always welcome,
John

Monday, May 15, 2017

All Hands Week


This is a bit of an unusual week.  We have booked the Washington State Convention Center in downtown Seattle for our annual company meeting.  "All Hands" is the Navy phrase we use to indicate that the entire company attends - we go over business strategy, technical planning, development-specific tasking, Tableau Conference planning and so on.

I can't write much about any of this (perhaps obviously).  This will be my second such event and I learned a lot last year.  Now that I know where to focus, I expect this year to be even better!

Otherwise, I am still moving unit tests to better locations.  The easy tests to move will likely fill this week for me and then next week the work gets more challenging.  Stay tuned!

Questions, comments, concerns and criticisms always welcome,
John

Wednesday, May 10, 2017

Moving unit tests to better locations


I spent last week identifying and removing dead code.  For what it is worth, the biggest challenge there is proving the code is not actually used.  If you know of a way to tell whether an operator overload is actually called, let me know…

This week I am focused on moving some of our unit tests to a more proper location.  Some of our older tests are part of a large module that runs tests all across the product.  For instance, suppose I want to test Kmeans clustering.  As it stands right now, I either have to work some command line magic to get just those tests to run, or I run that entire module which tests areas in which I am not interested (like importing from Excel).

A better place for the Kmeans test would be in the same module that holds the Kmeans code.  That way, when I run the tests, I focus only on testing the code in which I am interested and don't need to worry about Excel importing.  There are also some speed benefits when building the test code.  Right now, the old project has references all over the product.  It has to have those wide ranging references since it has such a wide variety of tests in it.  That means a lot of file copying and such during compile time.
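To illustrate the idea in miniature - a hypothetical Python analogy, since our product code is C++ - when a test lives in the same module as the code it covers, running that module's tests exercises exactly what you care about and nothing else:

```python
# kmeans.py - hypothetical layout with a test colocated next to the code
# it covers (a Python analogy to the C++ move described above)
import unittest

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (1-D for brevity)."""
    return min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))

class NearestCentroidTest(unittest.TestCase):
    def test_picks_closest(self):
        self.assertEqual(nearest_centroid(5.0, [0.0, 4.0, 10.0]), 1)

# Running only this module's tests needs no command line magic and pulls
# in nothing unrelated (no Excel import machinery):
#   python -m unittest kmeans
```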

One of the other benefits I expect to see when done is that the time to build that older test project will shrink because I am removing code from it.  As I move the code to the correct module, I am updating all the references it uses to minimize the number of references needed to build.  So my module build time will go up, but not by as much as the time saved on the older test project.

There is one final benefit to all of this.  When I build my module now, I build both that module and the older test code.  This is necessary since I need the testing provided there in order to test any changes being made in my module.  Once I am done with this task, I will only need to build my module in order to test it since all the tests will be part of the module.  I will no longer have to "pay the price" of  building that older test project.

Questions, comments, concerns and criticisms always welcome,
John

Monday, May 1, 2017

Tabpy on a Pi Tablet !


I built a Raspberry Pi powered tablet last week and brought it in to work.  Naturally, I couldn't resist the near alliteration of "tabpy on a pi-tab" so I pip installed tabpy:

Running tabpy on a Raspberry Pi Tablet


A Pi is pretty low powered so it won't run fast, but it should be fun to play with.

Questions, comments, criticisms and complaints always welcome,
John

Wednesday, April 26, 2017

Removing Dead Code


Last week I was working on code coverage.  One of the results I saw is that some of the source code we own is not used by any of our testing - it is 0% covered.  Digging into this, I found out that this code is not used at all by Tableau so I intend to start removing it from our codebase.

Code that is not used by the product you are working on is often referred to as  "dead code" and there are a few ways this can happen.  One obvious way is that existing code functionality simply gets provided by something else.  Let's say you had a very slow Bubble Sort routine to sort a list.  Once you learn a faster algorithm to sort, like Quick Sort, you start using the Quick Sort instead.  If you are not diligent when making the code change to use Quick Sort, the Bubble Sort code can get left behind.  It is not used at this point and becomes "dead code."
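A tiny Python sketch of that first case, with made-up names:

```python
def bubble_sort(items):
    """Dead code: nothing calls this any more after we switched sorts,
    but it was left behind in the file."""
    items = list(items)
    for i in range(len(items)):
        for j in range(len(items) - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

def sort_list(items):
    # The live path: the faster built-in sort replaced the old routine.
    return sorted(items)

print(sort_list([3, 1, 2]))  # [1, 2, 3]
```

Nothing in the program calls bubble_sort any more, yet it still has to be read, maintained and stored by everyone.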

Similarly, if you need to implement sorting, you could try Quick Sort, Insertion Sort and Merge Sort (for instance).  Once you generate the test data and profile the time spent by each routine, you can make a decision about which routine to use.  Again, if you don't remove the routines you don't use, they become "dead code." 

After digging into the code coverage numbers, I found a few instances of the second case.  Since this code is not used at all, it doesn't get compiled, so that helps mitigate having it around.  But it still results in a good amount of unneeded code stored on everyone's hard drive, maintained in our repository and so on.  The best practice is simply to get rid of it, and that is what I am working on now.

Questions, comments, concerns and criticisms always welcome,
John

Wednesday, April 19, 2017

Working on a code coverage task this week


For this week I am focused on getting code coverage numbers for our team.

Challenge #1 is pretty simple to state - get a list of all the source files that our team owns.  And while the problem is easy to understand, the real-world implications are a bit trickier, especially with our older files.  As the years go by, ownership of unchanging source files gets a little fuzzy.  The team (or developer) who created the original source file may be long gone.  Even teams that own a file might have been reorganized - several times over - since the file was checked in. 

So if Carol created the original "stats.cpp" file, she may be the only person ever to have edited it.  If she moves to another team, and her old team gets reorganized, tracking ownership can drop to the bottom of the list of problems to address.  After all, if the code is stable, why spend resources on tracking who should be associated with it? 

But after a while, every company faces this challenge.  That is what I am sorting out this week.

Fortunately, Tableau has been pretty good with naming conventions for source files.  For example, all Cluster related files have the text "cluster" in them.  For most of the features my team owns, I can simply search by file name to get a good starting point for the files we own.  Getting that list together, parsed and cleaned up is my goal for the week.
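In Python terms, that first pass might look something like this (the file names here are hypothetical, of course):

```python
def files_owned_by_feature(filenames, keywords):
    """Hypothetical first pass at ownership: keep source file names that
    contain one of our feature keywords (e.g. 'cluster')."""
    return sorted(f for f in filenames
                  if any(k in f.lower() for k in keywords))

repo = ["KmeansCluster.cpp", "ExcelImport.cpp", "cluster_utils.cpp", "stats.cpp"]
print(files_owned_by_feature(repo, ["cluster"]))
# ['KmeansCluster.cpp', 'cluster_utils.cpp']
```

The result is only a starting point - it still needs to be parsed and cleaned up by hand, which is the bulk of the week's work.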

After that, I think I may need to move to class-based ownership.  More on that next time.


Questions, comments, concerns and criticisms always welcome,
John

Tuesday, April 11, 2017

A nifty site to see what a compiler does to C++ code


While investigating a very intermittent unit test failure this week, I noticed an anomaly in our code.  This test is written in C++ and had an extra semicolon in it:

We had a test that did something like this:
Worksheet wb = wbc->GetSheet();
;
CPPUNIT_ASSERT(blah blah);

Notice that extra ; in the middle?  Since the test intermittently fails, I was looking for anything unexpected in the code.  This is unexpected, but I also needed to know if it was important.

Matt Godbolt created a terrific site that lets you put in C++ code and see what output various compilers produce.  The site is here: https://gcc.godbolt.org/

You can choose different compilers and I just took a look at gcc 6.3 to see if it would ignore an extra ;. 

Here's my test code:
void test()
{
    int x = 1;
    ;
}

And here is the output:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 1
        nop
        pop     rbp
        ret

I get the same output with or without the extra semicolon.  This is great since I would expect the compiler to discard empty statements like this.  Since the compiler does indeed ignore this typo in the code, I can move on to other avenues of investigation.

Give this site a whirl.  You can choose several different compilers and chip options, pass parameters in and so on.  Thanks Matt!

Questions, comments, concerns and criticisms always welcome,
John

Friday, April 7, 2017

Always learning


One of the most fun aspects (to me) of being in the world of software development is the focus on continual learning.  The systems on which we work are constantly evolving.

For instance, when I started way back in 1995, Windows 3.1 was the dominant OS.  Smart phones did not exist, tablets were only on Star Trek and so on.  The challenge is how to adapt to a constantly changing world.

The strategy I use is one of constant learning.  MOOCs have made this much easier in the last few years - I tend to prefer a combination of lectures and hands-on work to learn a new skill.  To me, which new skill to learn is not as important as always learning something.  Sometimes there is a skill I need to learn that is very relevant to an upcoming project - learning Python because our automation system here at Tableau uses Python is an obvious example.

But other times I take on tasks that are more in a hobby space.  To this end, for the past few months I have been enrolled in a Programming the Internet of Things program at Coursera.  It has been a blast!  It made me brush up on electronics I learned years ago in the Navy as well as some programming in C and Python.  I got to work with the Raspberry Pi and Arduino, install some useful libraries, and my final capstone was a device that would email me if the garage door at my house was left open.  This was a beginner level class so the introduction was very gentle and definitely left me wanting more.

I received my final certificate this last week and am thinking about going deeper into another class.  Either that or finish the growing list of project ideas I developed while taking this class.  Either way, it should be fun!

Questions, comments, concerns and criticisms always welcome,
John

Thursday, March 30, 2017

Some simple data munging work this week


Here is a task I had to deal with this week.  I was given a data set we want to test, like this:
2 3 4 4 5 5 6 6 7 9
We have an algorithm to compute some basic statistics on any given data set (mean, median, variance, etc.).  Nothing special about that.  And I had two data sets - the small one above, used mostly to make sure the testing code would work, and another data set of 50,000+ floating point numbers:
-8322.458 -6199.002 -6002.999 and so on.

What I needed to do was compare the results of those types of calculations across a variety of different tools which also compute those basic stats.  I chose Excel, R, some Python code I wrote myself, Numpy/Scipy and Octave.
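For a taste of what the cross-check looks like, here is a minimal version using Python's standard statistics module on the small data set.  One gotcha worth calling out: tools disagree on whether "variance" means sample or population variance, so be explicit about which you are comparing:

```python
import statistics

# The small data set from above
data = [2, 3, 4, 4, 5, 5, 6, 6, 7, 9]

mean = statistics.mean(data)            # 5.1
median = statistics.median(data)        # 5.0
variance = statistics.pvariance(data)   # population variance, 3.69

print(mean, median, variance)
```

Run the same numbers through Excel, R, Octave and the rest, and any disagreement beyond floating point noise is worth investigating.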

And that is where the problems came in.

My original data sets were simply a list of numbers, without commas, separated by spaces, all on one row.  For the small data set, for all the tools, I could just copy/paste or even retype to get the data into the format the tool wanted.  This is not a hard problem to solve, just tedious.  The industry calls this "data munging" (getting the data from the format you have into the format your tool needs) and it is almost always the most time consuming part of any analysis.  Hit me up for links to prove this if you want.

For instance, Excel prefers a single column to make entering the calculations easy, but can use a row.  Python's CSV reader wants commas to separate values along a row (though you can specify spaces), but once the data is imported, it is easiest to have one column of data.  So I had to create a file of the 50,000+ values with each value on its own line.

R was able to use the same file as Python.  Nice!

Octave wanted all the values on one row, so I had to lay the numbers out again with a comma between each.  Since this was a one-off task, I simply used Word to edit the file.  It took a little under a minute to make the 50,000+ replacements.
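The reformatting itself is trivial once you see it - here is a sketch, with the small data set standing in for reading the real 50,000+ value file:

```python
# Stand-in for reading the original one-row, space-separated file
row = "2 3 4 4 5 5 6 6 7 9"

values = row.split()

column = "\n".join(values)     # one value per line, for Excel/R/Python
comma_row = ",".join(values)   # one comma-separated row, for Octave

print(comma_row)  # 2,3,4,4,5,5,6,6,7,9
```

Two lines of joining covers every layout the tools asked for; the tedium is in noticing that each tool wants something different in the first place.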

Now I have the data files in the format that all the tools want, and can use their results to help ensure Tableau is getting expected answers for these basic statistics.

Questions, comments, concerns and criticisms always welcome,
John