We recently added 5 nodes to an existing 6 node cluster in a new datacenter. From this time, the initial connection to the cluster takes upwards of 2 minutes. Once initialized, everything is fine. As we dug into this delay we have narrowed it down to the initialization of the TokenMap in `TokenMap.Build`
We see 3 loops nested:
for each keyspace (30) - ComputeTokenToReplicaNetwork:
for 0..numTokens (256*11 = 2816)
for 0..numTokens (2816)
This set of nested loops causes 30*2816*2816 = 237,895,680 iterations at the inner loop in `ComputeTokenToReplicaNetwork` (minus any early continues or break conditions)
Profiling shows this is essentially a tight processing loop consuming a single core the entire time. We are concerned with this because while 2 minutes may be acceptable, we will very likely be adding more datacenters and more nodes and more keyspaces. With the exponential complexity of this method, I'm not sure how this will work.
From looking at the Java driver, it appears they have an optimization where they compute this once for each distinct value of Replication Strategy (combination of DC:<rf>) Here is the code. In our case, this reduces the complexity by a factor of 30 (all our keyspaces use the same RF):
Is this optimization something we could add to the csharp driver? It would help a lot in our case, though there is still the exponential hit of adding more nodes. Is this not an issue for others? I find it hard to believe my cluster is among the most complicated out there...
Thanks in advance!
11 nodes in two DCs
30 keyspaces all with same RFs (NetworkTopologyStrategy)