Consider extra steps to shield the Driver against system.peers errors

Description

In a 6-node cluster, we observed that one of the nodes had a system.peers table containing 6 rows (instead of 5), all of them with a fully populated tokens column. One host was wrongly listed in the system.peers table of just one node in this cluster. That host was running Cassandra as part of a different cluster and was therefore seen as UP by the driver. This most likely happened after the host was replaced and reassigned to a different cluster by our automation.

While the driver currently filters out system.peers rows whose tokens column is null, some extra steps could be considered to detect this kind of unfortunate error:

  • Verify the cluster name in the cluster_name column of system.local on every host the driver connects to.

  • Query system.peers from several randomly picked nodes and keep the topology of the cluster that's reported by the majority of them.
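The two checks above could look roughly like the following minimal sketch. Java is an assumption here (the ticket does not show driver code), and TopologyGuard, hostsInCluster and majorityTopology are hypothetical names, not part of any driver API. The sketch assumes the query results (system.local.cluster_name for each host, and the system.peers rows read from each sampled node) have already been fetched and reduced to host address strings. Because system.peers never lists the node it was read from, each node's view of the cluster is taken to be its peers plus itself. Views are compared on host membership only; a real implementation might also compare tokens.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical helper illustrating the two proposed safeguards.
 * Hosts are plain address strings; no driver API is used.
 */
final class TopologyGuard {

    private TopologyGuard() {}

    /**
     * Safeguard 1: keep only the hosts whose system.local cluster_name
     * equals the expected cluster name.
     */
    static Set<String> hostsInCluster(String expectedClusterName,
                                      Map<String, String> clusterNameByHost) {
        return clusterNameByHost.entrySet().stream()
                .filter(e -> expectedClusterName.equals(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    /**
     * Safeguard 2: each sampled node's view is its system.peers hosts plus
     * the node itself. Returns the view shared by a strict majority of the
     * sampled nodes, or an empty Optional if no view reaches a majority.
     */
    static Optional<Set<String>> majorityTopology(Map<String, Set<String>> peersByQueriedNode) {
        Map<Set<String>, Integer> votes = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : peersByQueriedNode.entrySet()) {
            Set<String> view = new HashSet<>(e.getValue());
            view.add(e.getKey()); // system.peers excludes the node it was read from
            votes.merge(view, 1, Integer::sum);
        }
        int quorum = peersByQueriedNode.size() / 2 + 1; // strict majority
        // At most one view can reach a strict majority, so findFirst is unambiguous.
        return votes.entrySet().stream()
                .filter(e -> e.getValue() >= quorum)
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```

For example, if three sampled nodes a, b and c report peers {b, c}, {a, c} and {a, b, x} respectively, the views {a, b, c} (two votes) and {a, b, c, x} (one vote) are formed, and {a, b, c} is returned, discarding the stray host x. When no view reaches a strict majority, the result is empty; what the driver should do then (keep its previous topology, sample more nodes, etc.) is left open.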

Environment

Cassandra 3.0.15

Pull Requests

None

Status

Assignee

Unassigned

Reporter

Michaël Figuière

Labels

None

PM Priority

None

Reproduced in

None

Affects versions

Fix versions

None

Doc Impact

None

Size

None

External issue ID

None

Priority

Major