Monday, November 28, 2016

Cleaning up ownership of some legacy tests

The Statistics team used to be part of a combined team here at Tableau called Statistics and Calculations.  Before I started, a small change was made to make Statistics its own standalone team.  All has been good since, with one small problem - all of the tests the combined team owned are labelled as being owned by "Statistics and Calculations."  When a test fails, the person who notices usually just assigns the test to Statistics (since our name comes first in that string, I guess) even if the functionality being tested is not owned by our team.

An example of a feature we own is Clustering.  We wrote that and own all the tests for it.  An example of a feature we do not own now that we are a standalone team would be table calculations.

Anyway, we have hundreds of tests that need to be retagged.  I decided to take on this work in order to properly tag ownership of the tests.  This way, if a test fails, it can be routed to the right owner immediately.  The first challenge was just getting a list of all the files that I needed to edit.  The lowly DOS command (DOS?  Isn't that going on 40+ years old?) "findstr" was incredibly useful.  I just searched every file in our repository for the old string "Statistics and Calculations" that I needed to edit.
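For anyone without findstr handy, the same kind of search can be sketched in a few lines of Python.  This is only an illustration of the idea, not what I actually ran: the function name find_files_containing is made up, and it assumes the files can be read as text.

```python
import os


def find_files_containing(root, needle, encoding="utf-8"):
    """Return sorted paths (relative to root) of files whose text contains needle."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets the search skip over undecodable bytes.
                with open(full_path, encoding=encoding, errors="ignore") as handle:
                    if needle in handle.read():
                        matches.append(os.path.relpath(full_path, root))
            except OSError:
                # Skip files that cannot be opened (locked, permissions, ...).
                continue
    return sorted(matches)
```

Unlike findstr, this returns each matching file only once, since it stops caring about a file after the first hit.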

Now I had a list of all the files in a weird DOS syntax.  Example:
integration_tests\legacytest\main\db\RegexpMatchFunctionTest.cpp:    CPPUNIT_TEST_SUITE_EX( RegexpMatchFunctionRequiredUnitTest, PRIMARY_TEAM( STATISTICS_AND_CALCULATIONS_TEAM ), SECONDARY_TEAM( VIZQL_TEAM ) );

Also notice that the path is incomplete - it just starts with the \integration_tests folder and goes from there.

The actual list of files is well over a hundred and my next task was to clean up this list.  I thought about hand editing the file using Find and Replace and manually cutting out stuff I did not need, but that would have taken me an hour or two.  Plus, if I missed a file, I would potentially have to start over, or at least figure out how to restart the process with changes.  Instead, I decided to write a little Python utility to read through the file, find the (partial) path and filename, and remove everything else in each line.  It then corrects the path and adds the command I need to actually make the file editable.  Our team uses Perforce, so this was just adding "p4 edit " to the start of each file.  And fixing the path was pretty simple - just prepend the folder name I was in when I ran findstr.
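Here is a minimal sketch of that per-line cleanup.  It is not the original utility: the function name and the root folder in the usage example are placeholders, and it assumes each findstr line has the form "relative path, colon, matched text" with no colon inside the path itself.

```python
import ntpath  # Windows path rules, regardless of where the script runs


def findstr_line_to_p4_edit(line, root):
    """Turn one findstr output line (path:matched text) into a 'p4 edit' command."""
    # Everything before the first colon is the partial path; the rest is discarded.
    relative_path, separator, _matched_text = line.partition(":")
    if not separator or not relative_path.strip():
        return None  # not a 'path:text' line, so skip it
    # Prepend the folder findstr was run from to complete the path.
    full_path = ntpath.join(root, relative_path.strip())
    # Quote the path so folders with spaces survive in a batch file.
    return 'p4 edit "{}"'.format(full_path)
```

Called with the example line above and a made-up root such as C:\src, this gives back p4 edit "C:\src\integration_tests\legacytest\main\db\RegexpMatchFunctionTest.cpp".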

Finally, I cleaned out duplicate file names and ran my code.  It created a 13K batch file ready for me to get to work, and if I need to update, I can just run my code again.  Kind of like reproducible research - at least that is how I think of it.
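The de-duplication and batch file steps could look something like the sketch below.  Again, this is an illustration rather than my actual code: the function names are made up, and it treats paths that differ only by case as the same file, which is an assumption based on how Windows file names behave.

```python
def unique_in_order(items):
    """Drop repeats, keeping the first occurrence of each item."""
    seen = set()
    unique = []
    for item in items:
        key = item.lower()  # assumption: Windows paths compare case-insensitively
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def write_batch_file(commands, batch_path):
    """Write one command per line, with the CRLF line endings batch files expect."""
    with open(batch_path, "w", newline="\r\n") as handle:
        for command in unique_in_order(commands):
            handle.write(command + "\n")
```

Keeping the original order means the batch file checks out files in the same order findstr reported them, which makes it easier to compare against the raw list.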

I can post the code if anyone is interested, but it is pretty basic stuff.

Questions, comments, concerns and criticisms always welcome,

Monday, November 14, 2016

Obligatory Tableau Conference follow up

Tableau Conference 16 (TC16 as we call it) was successfully held last week in Austin, Texas.  Hundreds of talks and trainings, keynotes from Bill Nye and Shankar Vedantam, a look ahead into our plans and thousands of customers made for a very busy week.  I was there working from Saturday to Thursday and it seems like the time just flew by.  For what it is worth, I took a hiatus from technical work and kept the logo store stocked as well as I could.  While I was able to meet many, many customers in the store, I did not have the time to talk much about Tableau and their usage.  The conference is both a blur and fresh in my mind, if such things are possible, and this is a quick summary of what I remember.  I wanted to focus on users while I was there, so that is what is on my mind today.

On the way home, though, a buddy and I met a couple of Tableau users in the airport.  We started talking about their usage and I asked them if we could do one thing for them, what would it be.  One answer was "add more edit commands to the web UI."  Fair enough.  The other was to help with the classification (or bucketing) of many levels of hierarchical data.  They classify educational institutions broadly - I have in my mind "Liberal Arts School", "Engineering", "Fine Arts" and so on.  Then, they classify each school further into many different levels and they need a tool to help with that.  I also imagine this visually as sort of like a process flow diagram, but that may be off base.

If you attended, feel free to let me know what you thought of the conference.  And if you did not, I would encourage you to go next year.  Biased though I may be, I thought it was informative and fascinating. 

The videos from the sessions are available to attendees if you missed anything (and I missed a whole lot!).

Questions, comments, concerns and criticisms always welcome,