40 second connect() time on cluster with invalid keyspace configuration
The performance degradation was caused by the token map computation degenerating to O(n^2) when a keyspace reference a non-existing datacenter. The code will now log the following warning when it detects this situation:
As discussed on the mailing list, I'm having trouble with the java driver - both version 2.1.5 and 188.8.131.52, taking a very long time to connect to my 56 node cluster - somewhere around 40 seconds.
It's all in the debug file I sent you privately
But yes, it's tokenToReplicaMap. Log excerpt:
Did you track down the exact method that caused the large pause?
Looking at the code, I suspect it's NetworkTopologyStrategy#computeTokenToReplicaMap: it will never find hosts for the bad DC, so it keeps traversing the whole ring for each token. That yields O(numTokens^2) complexity instead of an expected O(numTokens). In your case, numTokens is 256 * 56.
If we can't find a host for a given token in a given DC (with a non-zero replication factor), that's probably a sign that something is wrong. So we should log a warning the first time that happens.
Trying out the patched version sent by Olivier, I found out that I had a keyspace defined as follows:
Having a keyspace with a non-existing keyspace made the java driver pause for some 38 seconds. Changing this (test) keyspace into using correct settings removed that pause.
So, configuration error on server side, but perhaps that java driver can protect itself against it? The python driver does not pause in this case.
num_tokens is 256.
I'll happily test a patched version.
This is baffling, there is a 40-second pause when the driver does not log anything.
From the logs I can see that this happens during initialization of the control connection to the contact point. (refreshSchema executes normally, then at the end there is an extra call to refreshNodeListAndTokenMap since we just loaded some new keyspaces; then I see more rpc_address logs coming from the body of this method; then the 40-second pause; then Successfully connected to xxx which indicates that we have returned from the method and completed the reconnectInternal call).
The only significant code that could get executed during that pause is the initialization of the token map. There were changes to that in JAVA-312, but you reproduce with 184.108.40.206, which does not have them yet.
Did you customize num_tokens in your cassandra.yaml file? A high value could have an effect on the computations for the token map (even though I don't see that taking 40 seconds of CPU).
If I send you a patched version with more logs, would you be willing to try it in your environment?