repartitionByCassandraReplica relocates data to the local node only
Description
`repartitionByCassandraReplica` relocates data to the local (localhost) node only, whereas it should distribute the data across all nodes of the local DC.
When running `repartitionByCassandraReplica` on a machine where:
- a local node exists, it relocates all of the data to that single node.
- no local node exists (e.g. the nodes are on remote machines, or a containerized node has an isolated IP on the local machine), it always returns an empty RDD without throwing any exception.
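To make the two symptoms concrete, here is a minimal, self-contained model of the reported behavior next to the expected one. It is not connector code: the function names, the IP addresses, and the modulo rule standing in for Cassandra's token ownership are all assumptions made for illustration only.

```python
from collections import defaultdict


def reported_assignment(keys, local_host, dc_nodes):
    """Model of the symptom described above (hypothetical, not connector code).

    If the machine running the job hosts a node, every key lands on it;
    otherwise nothing is returned at all.
    """
    if local_host in dc_nodes:
        return {local_host: list(keys)}
    return {}


def expected_assignment(keys, dc_nodes):
    """Model of the expected behavior: keys spread over all local-DC nodes.

    The modulo rule is a stand-in for real token ownership (assumption).
    """
    nodes = sorted(dc_nodes)
    groups = defaultdict(list)
    for key in keys:
        groups[nodes[key % len(nodes)]].append(key)
    return dict(groups)


# Hypothetical three-node local DC and six integer partition keys.
dc_nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
keys = range(6)

on_node = reported_assignment(keys, "10.0.0.1", dc_nodes)      # job runs on a node
off_node = reported_assignment(keys, "192.168.1.5", dc_nodes)  # no local node
expected = expected_assignment(keys, dc_nodes)
```

In this model, `on_node` holds all six keys under the single local host, `off_node` is empty, and `expected` spreads the keys over the three nodes, which mirrors the difference between the reported and the desired result.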
Environment
None
Pull Requests
None
Activity
Jaroslaw Grabowski
April 7, 2021, 12:51 PM
thank you for finding and fixing this!