SCC repartitionByCassandraReplica fails when Spark and Cassandra run in different containers on Kubernetes
Description
I am running Spark 3.0.1 on Kubernetes and fetching data from Cassandra using the Spark Cassandra Connector.
The use case: a Spark JavaRDD<Key> is repartitioned to replica-local partitions by calling repartitionByCassandraReplica against table X, and then joined with table X via joinWithCassandraTable. This works in Spark standalone mode, where Spark and Cassandra run on the same server: the Spark partitions are localized by repartitionByCassandraReplica before joinWithCassandraTable is called. But when the same job runs on Kubernetes, with Spark and Cassandra in separate pods, repartitionByCassandraReplica appears to fail, as no data locality is obtained in the Spark container.
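For reference, this is roughly the pattern in use, sketched with the connector's Java API. The keyspace ("my_keyspace"), table ("x"), column ("id"), the partitionsPerHost value, and the Key/Row beans are placeholders assumed for illustration, not the actual schema:

import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;

import java.io.Serializable;
import org.apache.spark.api.java.JavaRDD;
import com.datastax.spark.connector.japi.rdd.CassandraJavaPairRDD;

public class ReplicaLocalJoin {

    // Placeholder bean matching table x's partition key column.
    public static class Key implements Serializable {
        private Integer id;
        public Key() {}
        public Key(Integer id) { this.id = id; }
        public Integer getId() { return id; }
        public void setId(Integer id) { this.id = id; }
    }

    // Placeholder bean mapping a full row of table x.
    public static class Row implements Serializable {
        private Integer id;
        private String value;
        public Row() {}
        public Integer getId() { return id; }
        public void setId(Integer id) { this.id = id; }
        public String getValue() { return value; }
        public void setValue(String value) { this.value = value; }
    }

    public static CassandraJavaPairRDD<Key, Row> replicaLocalJoin(JavaRDD<Key> keys) {
        // Step 1: move each key into a Spark partition pinned to a Cassandra
        // replica owning that key. This only yields locality when executors
        // are co-located with Cassandra nodes, which is exactly what breaks
        // when the two run in separate Kubernetes pods.
        JavaRDD<Key> localKeys = javaFunctions(keys)
            .repartitionByCassandraReplica(
                "my_keyspace",       // placeholder keyspace
                "x",                 // placeholder table
                10,                  // partitionsPerHost (placeholder)
                someColumns("id"),   // partition key column(s)
                mapToRow(Key.class));

        // Step 2: join the (now hopefully localized) RDD against the same table.
        return javaFunctions(localKeys)
            .joinWithCassandraTable(
                "my_keyspace",
                "x",
                allColumns,          // select all columns of table x
                someColumns("id"),   // join on the partition key
                mapRowTo(Row.class),
                mapToRow(Key.class));
    }
}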
What am I missing here to make this work?
Directly joining the RDD on its partition key with joinWithCassandraTable, without repartitioning first, is a severe performance hit. Is there any way to get good Spark-Cassandra performance when joining an RDD with a table on the same partition key in a containerized environment?
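For comparison, the direct join mentioned above is just the second step without the repartition, under the same placeholder names as the sketch above:

// Without repartitionByCassandraReplica, the connector fetches each key's
// rows over the network from whichever replica owns them; there is no
// locality to exploit, which is the performance hit described above.
CassandraJavaPairRDD<Key, Row> joined = javaFunctions(keys)
    .joinWithCassandraTable(
        "my_keyspace",
        "x",
        allColumns,
        someColumns("id"),
        mapRowTo(Row.class),
        mapToRow(Key.class));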