Last week I left off with a test that mimics the user action of re-ordering the criteria you used to create clusters. The clusters themselves should not change when this happens, and the test verifies that they do not change. I got that failure fixed and it passed 10 times when I ran my test locally.
Why 10 times? I have learned that any test which manipulates the UI can be flaky. Although my test avoids the UI as much as possible, it still has elements drawn on screen. It might hit intermittent delays while the OS draws something, or some random window might pop up and steal focus, etc… So I run my test many times in an attempt to root out sources of instability like these.
I would love to do more than 10 runs, but the challenge becomes the time involved in running one of these end to end scenarios. There is a lot of work for the computer to do to run this test. The test framework has to be started (I'm assuming everything is installed already, but that is not always the case), Tableau has to be started, a workbook loaded, etc… Then once done, cleanup needs to run, the OS needs to verify Tableau has actually exited, all logs need to be checked for failures and so on. It's not unusual for tests like this to take several minutes, and for the sake of argument, let's call it 10 minutes.
Running my test 10 times on my local machine means 100 minutes of running - just over an hour and a half. That is a lot of time. Running 100 times would mean almost 17 hours of running. This is actually doable - just kick off the 100x run before leaving to go home and it should be done the next morning.
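To make the arithmetic easy to redo with a different per-run time, here is a minimal Python sketch. The 10 minute figure is the same for-the-sake-of-argument number used above, and the function names are made up for illustration; nothing here is part of any real test framework.

```python
# Rough run-time estimate for repeating one end to end test.
# ASSUMPTION: every run takes the same time (10 minutes by default)
# and the runs happen one after another on a single machine.

def total_runtime_minutes(runs: int, minutes_per_run: float = 10.0) -> float:
    """Total wall-clock minutes for `runs` sequential runs."""
    return runs * minutes_per_run


def describe(runs: int, minutes_per_run: float = 10.0) -> str:
    """Format the total run time in minutes, hours and days."""
    minutes = total_runtime_minutes(runs, minutes_per_run)
    hours = minutes / 60
    days = hours / 24
    return f"{runs} runs: {minutes:.0f} min = {hours:.1f} h = {days:.1f} days"


for runs in (10, 100, 1000):
    print(describe(runs))
```

Changing `minutes_per_run` shows how quickly the total grows (or shrinks) if a single run is slower or faster than my 10 minute guess.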
Running more than that would be ideal. When I say these tests can be flaky, a 0.1% failure rate is what I am thinking. On average, a 1000x run would hit a failure like that once, although even then it is not guaranteed to show up. But that now takes almost a week of run time. There are some things we can do to help out here, like running in virtual machines and such, but there is also a point of diminishing returns.
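How likely is a run of a given size to actually catch a flaky failure? Here is a small Python sketch of that calculation. It assumes every run fails independently with the same probability, which is a simplification (a focus-stealing window may well cluster in time), and the function names are my own for illustration.

```python
import math

# ASSUMPTION: each run fails independently with the same probability.

def chance_of_at_least_one_failure(failure_rate: float, runs: int) -> float:
    """Probability that at least one of `runs` runs fails."""
    return 1.0 - (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of runs that sees a failure with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


rate = 0.001  # the 0.1% failure rate mentioned above
for runs in (10, 100, 1000):
    chance = chance_of_at_least_one_failure(rate, runs)
    print(f"{runs} runs: {chance:.1%} chance of seeing the failure")

print(f"Runs needed for 95% confidence: {runs_needed(rate, 0.95)}")
```

Under these assumptions, even a 1000x run sees the failure only about 63% of the time, and getting to 95% confidence takes roughly three times as many runs. That is the diminishing returns problem in numbers.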
Plus, consider the random window popping open that steals focus and can cause my test to fail. This doesn't have anything to do with clustering. That works fine, and my test can verify that. This is a broader problem that affects all tests. There are a couple of things we can do about that, which I will cover next.
Questions, comments, concerns and criticisms always welcome,