Monday, November 28, 2016

Cleaning up ownership of some legacy tests

The Statistics team used to be a combined team here at Tableau and was called Statistics and Calculations.  Before I started, there was a small change made to make Statistics its own standalone team.  All has been good since, with one small problem: all of the tests the combined team owned are still labelled as being owned by "Statistics and Calculations".  When a test fails, the person who notices usually just assigns the test to Statistics (since our name comes first in that string, I guess), even if the functionality being tested is not owned by our team.

An example of a feature we own is Clustering.  We wrote that and own all the tests for it.  An example of a feature we do not own now that we are a standalone team would be table calculations.

Anyway, we have hundreds of tests that need to be retagged.  I decided to take on this work in order to properly tag ownership of the tests.  This way, if a test fails, it can be routed to the best owner immediately.  The first challenge was just getting a list of all the files that I needed to edit.  The lowly command-line tool "findstr" (it feels like DOS, and isn't DOS going on 40 years old?) was incredibly useful.  I just used it to search every file in our repository for the old string "Statistics and Calculations" that I needed to edit.

Now I had a list of all the files in a weird DOS syntax.  Example:
integration_tests\legacytest\main\db\RegexpMatchFunctionTest.cpp:    CPPUNIT_TEST_SUITE_EX( RegexpMatchFunctionRequiredUnitTest, PRIMARY_TEAM( STATISTICS_AND_CALCULATIONS_TEAM ), SECONDARY_TEAM( VIZQL_TEAM ) );

Also notice that the path is incomplete - it just starts with the \integration_tests folder and goes from there.

The actual list of files is well over a hundred, and my next task was to clean up this list.  I thought about hand editing the file using Find and Replace and manually cutting out stuff I did not need, but that would have taken me an hour or two at least.  Plus, if I missed a file, I would potentially have to start over, or at least figure out how to restart the process with changes.  Instead, I decided to write a little python utility to read through the file, find the (partial) path and filename, and remove everything else in each line.  The utility then corrects the path and adds the command I need to actually make the file editable.  Our team uses perforce, so this was just adding "p4 edit " to the start of each file.  And fixing the path was pretty simple - just prepend the folder name I was in when I ran findstr.
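A rough sketch of what that per-line cleanup could look like in Python.  This is not the actual utility; the function name and the workspace root in the usage comment are made up for illustration, and it assumes findstr printed relative paths (no drive letter), as in the example line above.

```python
from pathlib import PureWindowsPath


def findstr_line_to_p4_edit(line, root):
    """Turn one line of findstr output into a "p4 edit" command.

    Assumes the line looks like "relative\\path\\file.cpp:matched text",
    so everything before the first colon is the (partial) path.
    Returns None for lines that do not have that shape.
    """
    rel_path, sep, _matched_text = line.partition(":")
    rel_path = rel_path.strip()
    if not sep or not rel_path:
        return None
    # Prepend the folder findstr was run from to get the full path.
    full_path = PureWindowsPath(root) / rel_path
    return "p4 edit " + str(full_path)


# Usage (hypothetical workspace root):
#   findstr_line_to_p4_edit(one_line_of_output, r"C:\src\workspace")
```

Using PureWindowsPath keeps the backslash handling correct even if the script is run on a non-Windows machine.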

Finally, I cleaned out duplicate file names and ran my code.  It created a 13K batch file ready for me to get to work, and if I need to update, I can just run my code again.  Kind of like reproducible research - at least that is how I think of it.
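Duplicates show up because findstr prints one line per match, so a file with several matching lines is listed several times.  A minimal sketch of the de-duplication step, again with a made-up function name and not the code I actually ran:

```python
def build_batch_text(commands):
    """Drop duplicate commands, keep first-seen order, and join them
    with Windows line endings so the result can be saved as a .bat file."""
    # dict.fromkeys keeps insertion order (Python 3.7+), which gives an
    # order-preserving de-duplication without sorting.
    unique_commands = list(dict.fromkeys(commands))
    return "".join(command + "\r\n" for command in unique_commands)
```

The returned text can be written out with open(path, "w", newline="") so Python does not translate the line endings a second time.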

I can post the code if anyone is interested, but it is pretty basic stuff.

Questions, comments, concerns and criticisms always welcome.
