Duplicate entries in a list when using saveToCassandra()

Description

Hi,

Schema:

    CREATE TYPE orderbooks.address (
        name text
    );

    CREATE TABLE orderbooks.person (
        name text PRIMARY KEY,
        addresses list<frozen<address>>
    );

Test program:

    import org.apache.spark.sql.SparkSession
    import com.datastax.spark.connector._ // provides saveToCassandra on RDDs

    object MyTest3 {
      case class Address(name: String)
      case class Person(name: String, addresses: List[Address])

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .appName("My Test")
          .master("local[*]")
          .config("spark.driver.bindAddress", "127.0.0.1")
          .getOrCreate()

        val p = Person("Peter", List(Address("Wall Street")))
        // Write the same row twice from a single partition.
        spark.sparkContext.parallelize(List.fill(2)(p), 1).saveToCassandra("orderbooks", "person")
        spark.close()
      }
    }

Result:

    cqlsh:orderbooks> select * from person ;

    @ Row 1
    -----------+------------------------------------------------
     name      | Peter
     addresses | [{name: 'Wall Street'}, {name: 'Wall Street'}]

    (1 rows)

The address is duplicated.

Please note that I set the Spark parallelism to 1, so both copies of the row are written from a single partition. Is this expected behavior? I expected the second write to overwrite the row completely, not to duplicate the list entry.

If I set the parallelism to 2, the problem does not reproduce: the row contains a single address, as expected.
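A possible workaround, offered only as a sketch: since the duplication appears only when both copies of the row are written from the same partition, collapsing records that share a primary key before saving might sidestep it. This rests on the assumption, not confirmed in this report, that the duplicate list entry comes from the same row being written more than once by one task. The `DedupByKey` object and `dedupByKey` helper are hypothetical names. The sketch uses plain Scala collections so it runs without Spark or Cassandra.

```scala
object DedupByKey {
  case class Address(name: String)
  case class Person(name: String, addresses: List[Address])

  // Keep only the last record seen for each primary key (here: name).
  // Hypothetical helper; the order of the resulting keys is not preserved.
  def dedupByKey(people: Seq[Person]): Seq[Person] =
    people.groupBy(_.name).values.map(_.last).toSeq

  def main(args: Array[String]): Unit = {
    val p = Person("Peter", List(Address("Wall Street")))
    val unique = dedupByKey(List.fill(2)(p))
    println(unique)

    // Untested sketch of the RDD equivalent (assumes the connector import
    // com.datastax.spark.connector._ is in scope):
    //   rdd.keyBy(_.name)
    //      .reduceByKey((_, latest) => latest)
    //      .values
    //      .saveToCassandra("orderbooks", "person")
  }
}
```

With the two identical `Person` records from the test program, `dedupByKey` returns a single record, so only one write per primary key would reach Cassandra.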

Environment

Spark Cassandra Connector version 2.0.6
Cassandra version 3.0.15

Pull Requests

None

Assignee

Piotr Kołaczkowski

Reporter

Oleksiy Dyagilev

Affects versions

2.0.6

Priority

Major