CCM timing issues
Activity
Andy Tolbert October 9, 2015 at 6:48 PM
We are refactoring the tests with the 'FIXME' tags, and the utility code that has 'FIXME' as part of it, so I am marking this as DUPLICATE and assigning to . The timing problem will continue to exist, but we'll have fewer tests that depend on it, and it doesn't seem to be that visible in Jenkins anymore.
Joaquin Casares July 18, 2014 at 6:54 PM
This is now a placeholder until CASSANDRA-7012 is resolved. In the meantime, these commits to ccm are our workarounds:
https://github.com/pcmanus/ccm/commit/ac69fb49b4189437c0f1d3444f5aa4dd29f53533
https://github.com/pcmanus/ccm/commit/caf3d1ac71e75d6db37ebcf5a7d6485cce711a0e
We can probably remove the FIXME comments today through the ccm workaround, but I'm uncertain how stable Jenkins will become. I'd advise waiting until either CASSANDRA-7012 is resolved or until we have additional resources to keep up with Jenkins stability.
Joaquin Casares May 6, 2013 at 6:51 PM
Okay, cool. I'll move the Thread.sleep() into the waitFor code and have it switchable by a boolean. This way when ccm has that integration, it will make for a quick fix.
+1 on not submitting BUG comments. I'll move them over to FIXMEs and create a new, fresh branch before I submit a pull request.
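For reference, a rough sketch of what "Thread.sleep() inside waitFor, switchable by a boolean" could look like. This is not the actual test utility: the class name, the useFixedSleep switch, the condition parameter, and the poll interval are all made up for illustration.

```java
import java.util.function.BooleanSupplier;

final class WaitForSketch {
    // Hypothetical switch: true while we still rely on a fixed sleep,
    // false once ccm can block until the other nodes have noticed.
    static boolean useFixedSleep = true;

    /**
     * Polls the condition until it holds or maxTrySeconds elapse.
     * If the condition held and the switch is on, sleeps extraSleepMillis
     * more to give the C* nodes time to mark the node down themselves.
     */
    static boolean waitFor(BooleanSupplier condition, int maxTrySeconds, long extraSleepMillis) {
        long deadline = System.nanoTime() + maxTrySeconds * 1_000_000_000L;
        try {
            while (!condition.getAsBoolean()) {
                if (System.nanoTime() - deadline >= 0) {
                    return false; // timed out before the condition held
                }
                Thread.sleep(50); // assumed poll interval
            }
            if (useFixedSleep) {
                Thread.sleep(extraSleepMillis); // FIXME: drop once ccm waits for us
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }
}
```

With this shape, removing the workaround later is a one-line change (set the switch to false) rather than a hunt through every test.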
Sylvain Lebresne May 6, 2013 at 2:55 PM
Can I get this automated in the waitFor() code?
Not really, no. The problem is that the C* nodes themselves take time to discover that others are dead, and that is somewhat independent of the view the driver has (but it will obviously impact tests that care about the consistency level). This is especially true when nodes are started and stopped quickly: the C* failure detector can then get particularly slow. We have the same problem with the C* dtests, and there we use a CCM flag that waits until a dead node has been detected as such by the other members of the cluster. The implementation of said flag is ugly as hell (it watches the system log files of the other nodes), but that's vaguely better than sleeps. So we can probably do the same here, though the flag I mention is not yet exposed through the command line 'stop' command, so we'd need to add that to ccm first. I'll do that when I have a minute.
In the meantime, sleeps will have to do. But please let's not commit a "// BUG:" comment for such things. We can put a FIXME so we don't forget to change it later, but these are not bugs, and marking them as such makes it annoying to locate real problems with the driver.
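To illustrate the log-watching idea on the Java side, here is a minimal sketch that polls a node's log file until a line matching a pattern shows up. It is not ccm's implementation; the class and method names are hypothetical, and the exact text C* logs when it marks a peer down is left to the caller as a pattern, since it may differ between versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

final class LogWatchSketch {
    /**
     * Re-reads the log file until some line matches the pattern or the
     * timeout elapses. Returns true if a matching line was seen.
     */
    static boolean waitForLogLine(Path log, Pattern pattern, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try {
                if (Files.exists(log)) {
                    for (String line : Files.readAllLines(log)) {
                        if (pattern.matcher(line).find()) {
                            return true;
                        }
                    }
                }
            } catch (IOException e) {
                // The file may be mid-write or rotating; retry on the next poll.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // no matching line before the timeout
            }
            try {
                Thread.sleep(100); // assumed poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would call this once per surviving node after stopping a node, with a pattern for the stopped node's address. Re-reading the whole file on each poll is wasteful for large logs; remembering the last read offset would be the obvious refinement.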
https://github.com/joaquincasares/java-driver/compare/master...retrypolicies_dev_bugs
Searching for "// BUG:" shows situations where an additional Thread.sleep() is needed in order to ensure the node has been marked as down. Can I get this automated in the waitFor() code? That way it's less racy?
Also, do note that in one case I must use:
instead of leaving the "40" out of the call (so it defaults to 20). Not sure why this is needed in this one situation. Any thoughts?