Optimize the number of connections opened by the Java Driver to the Astra service

Description

None

Environment

Currently, the number of connections to each “LOCAL” node defaults to the number of cores - 1. This is suboptimal for Astra, where Spark executors are not colocated with DB nodes. In that case the default LoadBalancingPolicy is used, so all the “local DC” nodes are treated as LOCAL. On beefier machines this results in hundreds of connections opened from the Spark cluster to the Astra cluster: every Spark executor opens (number of cores - 1) connections to every Astra node.

Furthermore, since all Astra connections go through a load balancer, all the connections pile up at a single host.

SCC should use a custom LoadBalancingPolicy that limits the number of connections to Astra. Only a single pool of (number of cores - 1) connections should be opened per executor. Partitioning and data locality should be revisited.
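
To make the scale of the problem concrete, here is a rough back-of-the-envelope sketch comparing the current behaviour with the proposed one. The class name, method names, and sample cluster sizes are hypothetical and not taken from SCC or the driver. The sketch assumes the pool size is (cores - 1) with a floor of 1, and that "a single pool" means one pool per executor.

```java
// Rough estimate only; names and sample numbers are hypothetical, not SCC code.
final class ConnectionEstimate {

    // Default pool size as described in the ticket: cores - 1.
    // The floor of 1 is an assumption so that single-core executors still connect.
    static int poolSize(int coresPerExecutor) {
        return Math.max(1, coresPerExecutor - 1);
    }

    // Current behaviour: every executor opens one pool per Astra node.
    static long currentTotal(int executors, int astraNodes, int coresPerExecutor) {
        return (long) executors * astraNodes * poolSize(coresPerExecutor);
    }

    // Proposed behaviour: every executor opens a single pool, independent of node count.
    static long proposedTotal(int executors, int coresPerExecutor) {
        return (long) executors * poolSize(coresPerExecutor);
    }

    public static void main(String[] args) {
        int executors = 10, astraNodes = 6, cores = 16; // sample values
        System.out.println("current:  " + currentTotal(executors, astraNodes, cores));
        System.out.println("proposed: " + proposedTotal(executors, cores));
    }
}
```

With these sample values the current scheme yields 10 x 6 x 15 = 900 connections, all arriving at the load balancer, while a single pool per executor would yield 10 x 15 = 150.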

The default number of connections may be overridden with spark.cassandra.connection.localConnectionsPerExecutor
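
As an illustration, the override could be set when building the Spark configuration. This is a minimal sketch that assumes spark-core is on the classpath. The class name, application name, and the value "2" are arbitrary placeholders, not recommendations.

```java
import org.apache.spark.SparkConf;

// Hypothetical helper; only the property name comes from the ticket.
final class AstraConnectionConfig {

    static SparkConf build() {
        return new SparkConf()
            .setAppName("astra-connection-example") // placeholder application name
            // Override the default pool size; "2" is an arbitrary example value.
            .set("spark.cassandra.connection.localConnectionsPerExecutor", "2");
    }
}
```

The same property can also be passed at submit time, for example through a --conf argument to spark-submit.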

Pull Requests

None

Details

Created April 22, 2022 at 2:30 PM
Updated April 22, 2022 at 2:32 PM