SCC repartitionByCassandraReplica fails when Spark and Cassandra run in different containers on Kubernetes
Description
I am running Spark 3.0.1 on Kubernetes and fetching data from Cassandra using the Spark Cassandra Connector.
The use case: a Spark JavaRDD<Key> is repartitioned to replica-local partitions by calling repartitionByCassandraReplica against table X, and then joined with table X via joinWithCassandraTable. This works in Spark standalone mode, where Spark and Cassandra run on the same server: the Spark partitions are localized by repartitionByCassandraReplica before joinWithCassandraTable is called. But when the same job runs on Kubernetes, with Spark and Cassandra in separate pods, repartitionByCassandraReplica appears to fail, as no data locality is obtained in the Spark container.
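For reference, this is roughly the pattern in use, sketched with the connector's Java API. The keyspace ("my_keyspace"), table ("x"), column ("id"), the partitionsPerHost value, and the Key/Row beans are placeholders assumed for illustration, not the actual schema:

import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;

import java.io.Serializable;
import org.apache.spark.api.java.JavaRDD;
import com.datastax.spark.connector.japi.rdd.CassandraJavaPairRDD;

public class ReplicaLocalJoin {

    // Placeholder bean matching table x's partition key column.
    public static class Key implements Serializable {
        private Integer id;
        public Key() {}
        public Key(Integer id) { this.id = id; }
        public Integer getId() { return id; }
        public void setId(Integer id) { this.id = id; }
    }

    // Placeholder bean mapping a full row of table x.
    public static class Row implements Serializable {
        private Integer id;
        private String value;
        public Row() {}
        public Integer getId() { return id; }
        public void setId(Integer id) { this.id = id; }
        public String getValue() { return value; }
        public void setValue(String value) { this.value = value; }
    }

    public static CassandraJavaPairRDD<Key, Row> replicaLocalJoin(JavaRDD<Key> keys) {
        // Step 1: move each key into a Spark partition pinned to a Cassandra
        // replica owning that key. This only yields locality when executors
        // are co-located with Cassandra nodes, which is exactly what breaks
        // when the two run in separate Kubernetes pods.
        JavaRDD<Key> localKeys = javaFunctions(keys)
            .repartitionByCassandraReplica(
                "my_keyspace",       // placeholder keyspace
                "x",                 // placeholder table
                10,                  // partitionsPerHost (placeholder)
                someColumns("id"),   // partition key column(s)
                mapToRow(Key.class));

        // Step 2: join the (now hopefully localized) RDD against the same table.
        return javaFunctions(localKeys)
            .joinWithCassandraTable(
                "my_keyspace",
                "x",
                allColumns,          // select all columns of table x
                someColumns("id"),   // join on the partition key
                mapRowTo(Row.class),
                mapToRow(Key.class));
    }
}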
What am I missing here to make this work?
Directly joining the RDD on its partition key with joinWithCassandraTable, without repartitioning first, is a severe performance hit. Is there any way to get good Spark-Cassandra performance when joining an RDD with a table on the same partition key in a containerized environment?
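For comparison, the direct join mentioned above is just the second step without the repartition, under the same placeholder names as the sketch above:

// Without repartitionByCassandraReplica, the connector fetches each key's
// rows over the network from whichever replica owns them; there is no
// locality to exploit, which is the performance hit described above.
CassandraJavaPairRDD<Key, Row> joined = javaFunctions(keys)
    .joinWithCassandraTable(
        "my_keyspace",
        "x",
        allColumns,
        someColumns("id"),
        mapRowTo(Row.class),
        mapToRow(Key.class));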