40 second connect() time on cluster with invalid keyspace configuration

Description

Final fix:
The performance degradation was caused by the token map computation degenerating to O(n^2) when a keyspace reference a non-existing datacenter. The code will now log the following warning when it detects this situation:


Initial report:
As discussed on the mailing list, I'm having trouble with the java driver - both version 2.1.5 and 2.0.9.2, taking a very long time to connect to my 56 node cluster - somewhere around 40 seconds.

Environment

None

Pull Requests

None

Activity

Show:
Erik Forsberg
March 31, 2015, 5:44 AM

It's all in the debug file I sent you privately

But yes, it's tokenToReplicaMap. Log excerpt:

Olivier Michallat
March 30, 2015, 5:58 PM
Edited

Did you track down the exact method that caused the large pause?

Looking at the code, I suspect it's NetworkTopologyStrategy#computeTokenToReplicaMap: it will never find hosts for the bad DC, so it keeps traversing the whole ring for each token. That yields O(numTokens^2) complexity instead of an expected O(numTokens). In your case, numTokens is 256 * 56.

If we can't find a host for a given token in a given DC (with a non-zero replication factor), that's probably a sign that something is wrong. So we should log a warning the first time that happens.

Erik Forsberg
March 30, 2015, 7:38 AM

Trying out the patched version sent by Olivier, I found out that I had a keyspace defined as follows:

Having a keyspace with a non-existing keyspace made the java driver pause for some 38 seconds. Changing this (test) keyspace into using correct settings removed that pause.

So, configuration error on server side, but perhaps that java driver can protect itself against it? The python driver does not pause in this case.

Erik Forsberg
March 27, 2015, 2:57 PM

num_tokens is 256.

I'll happily test a patched version.

Olivier Michallat
March 27, 2015, 2:53 PM

This is baffling, there is a 40-second pause when the driver does not log anything.

From the logs I can see that this happens during initialization of the control connection to the contact point. (refreshSchema executes normally, then at the end there is an extra call to refreshNodeListAndTokenMap since we just loaded some new keyspaces; then I see more rpc_address logs coming from the body of this method; then the 40-second pause; then Successfully connected to xxx which indicates that we have returned from the method and completed the reconnectInternal call).

The only significant code that could get executed during that pause is the initialization of the token map. There were changes to that in JAVA-312, but you reproduce with 2.0.9.2, which does not have them yet.

Did you customize num_tokens in your cassandra.yaml file? A high value could have an effect on the computations for the token map (even though I don't see that taking 40 seconds of CPU).

If I send you a patched version with more logs, would you be willing to try it in your environment?

Fixed

Assignee

Olivier Michallat

Reporter

Erik Forsberg

Labels

None

PM Priority

None

Reproduced in

None

Affects versions

Fix versions

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

External issue ID

None

Priority

Major