TokenMap.Build takes a long time (1min+)

Description

We recently added 5 nodes to an existing 6 node cluster in a new datacenter. From this time, the initial connection to the cluster takes upwards of 2 minutes. Once initialized, everything is fine. As we dug into this delay we have narrowed it down to the initialization of the TokenMap in `TokenMap.Build`

We see 3 loops nested:

  • for each keyspace (30) - ComputeTokenToReplicaNetwork:

    • for 0..numTokens (256*11 = 2816)

      • for 0..numTokens (2816)

This set of nested loops causes 30*2816*2816 = 237,895,680 iterations at the inner loop in `ComputeTokenToReplicaNetwork` (minus any early continues or break conditions)

Profiling shows this is essentially a tight processing loop consuming a single core the entire time. We are concerned with this because while 2 minutes may be acceptable, we will very likely be adding more datacenters and more nodes and more keyspaces. With the exponential complexity of this method, I'm not sure how this will work.

From looking at the Java driver, it appears they have an optimization where they compute this once for each distinct value of Replication Strategy (combination of DC:<rf>) Here is the code. In our case, this reduces the complexity by a factor of 30 (all our keyspaces use the same RF):

1 2 3 4 5 6 7 8 9 10 11 12 Map<ReplicationStrategy, Map<Token, Set<Host>>> replStrategyToHosts = new ...(); Map<String, Map<Host, Set<TokenRange>>> hostsToRanges = new ...(); for (KeyspaceMetadata keyspace : keyspaces) { ReplicationStrategy strategy = keyspace.replicationStrategy(); Map<Token, Set<Host>> ksTokens = replStrategyToHosts.get(strategy); // only if hasn't been calculated for this unique replication strategy // if (ksTokens == null) { ksTokens = (strategy == null) ? makeNonReplicatedMap(tokenToPrimary) : strategy.computeTokenToReplicaMap(keyspace.getName(), tokenToPrimary, ring); replStrategyToHosts.put(strategy, ksTokens); }

Is this optimization something we could add to the csharp driver? It would help a lot in our case, though there is still the exponential hit of adding more nodes. Is this not an issue for others? I find it hard to believe my cluster is among the most complicated out there...

Thanks in advance!

Environment

C* 2.1.13.

  • 11 nodes in two DCs

  • 30 keyspaces all with same RFs (NetworkTopologyStrategy)

csharp-driver 2.7.3

Pull Requests

None

Status

Assignee

Unassigned

Reporter

Thunder Stumpges

Labels

None

Reproduced in

None

PM Priority

None

Fix versions

None

External issue ID

None

External issue ID

None

External issue ID

None

External issue ID

None

External issue ID

None

External issue ID

None

Doc Impact

None

Reviewer

None

Epic Link

None

Sprint

None

Size

None

Affects versions

2.7.3

Priority

Major