Driver does not ignore invalid entries with a null schema_version in system.peers

Description

We are seeing cases where a Cassandra node is assigned a new IP address, but since the node was already bootstrapped, no replace_address is used.
This can leave stale, invalid entries in the system.peers table of other nodes that are up and running.

select * from system.peers where peer='192.168.225.89';

 peer           | data_center | host_id                              | preferred_ip | rack       | release_version | rpc_address    | schema_version | tokens
----------------+-------------+--------------------------------------+--------------+------------+-----------------+----------------+----------------+--------
 192.168.225.89 | null        | 336d64b9-40fd-4c75-b1eb-0df3f01a9f09 | null         | us-west-2c | null            | 192.168.225.89 | null           | null

(1 rows)

The driver currently only checks host_id and rpc_address for null values, and does not check the other columns.
https://github.com/datastax/java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/internal/core/metadata/DefaultTopologyMonitor.java#L541
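For illustration, here is a stand-alone sketch of what an extended validity check could look like, assuming a plain java.util.Map in place of the driver's Row type; the helper name and exact column set are hypothetical, not the driver's actual code:

```java
import java.util.Map;

public class PeerValidation {
  /**
   * Hypothetical extended check: in addition to host_id and rpc_address,
   * also require schema_version, data_center, rack and tokens to be
   * non-null, so a stale system.peers row like the one above is skipped.
   */
  public static boolean isPeerValid(Map<String, Object> row) {
    for (String column :
        new String[] {
          "host_id", "rpc_address", "data_center", "rack", "tokens", "schema_version"
        }) {
      if (row.get(column) == null) {
        return false; // incomplete/stale system.peers row: ignore it
      }
    }
    return true;
  }
}
```

With this check, the 192.168.225.89 row above (null data_center, schema_version, tokens) would be rejected before the node map is built.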

The stale entry also means the same host_id appears twice (once under the old IP and once under the new one):

select peer, host_id, rpc_address, schema_version from system.peers;

 peer            | host_id                              | rpc_address     | schema_version
-----------------+--------------------------------------+-----------------+--------------------------------------
 192.168.113.147 | d9cca34d-7c41-4461-8d54-cae635e13f07 | 192.168.113.147 | ddcb5a89-b6d7-3978-8d4d-bfa99dfe6eb6
 192.168.225.89  | 336d64b9-40fd-4c75-b1eb-0df3f01a9f09 | 192.168.225.89  | null
 192.168.186.91  | 8de849fa-4285-47ad-8a04-3f810b6da782 | 192.168.186.91  | ddcb5a89-b6d7-3978-8d4d-bfa99dfe6eb6
 192.168.58.34   | 551bfaf3-9ce5-49f5-b518-4d0192739209 | 192.168.58.34   | ddcb5a89-b6d7-3978-8d4d-bfa99dfe6eb6
 192.168.58.33   | c8fc9d3a-74e0-4783-ab88-3446c76f49bd | 192.168.58.33   | ddcb5a89-b6d7-3978-8d4d-bfa99dfe6eb6
 192.168.113.146 | 61354657-a4b8-4600-a533-b83a9ebbf0b9 | 192.168.113.146 | ddcb5a89-b6d7-3978-8d4d-bfa99dfe6eb6

(6 rows)

We see the following error from the driver:

java.lang.IllegalArgumentException: Multiple entries with same key: 336d64b9-40fd-4c75-b1eb-0df3f01a9f09=Node(endPoint=/192.168.225.89:9042, hostId=336d64b9-40fd-4c75-b1eb-0df3f01a9f09, hashCode=15700022) and 336d64b9-40fd-4c75-b1eb-0df3f01a9f09=Node(endPoint=cassandra-instance-ring1-node-1.cassandra-instance-svc.default.svc.cluster.local/192.168.225.90:9042, hostId=336d64b9-40fd-4c75-b1eb-0df3f01a9f09, hashCode=5b4bd)
	at com.datastax.oss.driver.shaded.guava.common.collect.ImmutableMap.conflictException(ImmutableMap.java:215)
	at com.datastax.oss.driver.shaded.guava.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:209)
	at com.datastax.oss.driver.shaded.guava.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:147)
	at com.datastax.oss.driver.shaded.guava.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:110)
	at com.datastax.oss.driver.shaded.guava.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:393)
	at com.datastax.oss.driver.internal.core.metadata.InitialNodeListRefresh.compute(InitialNodeListRefresh.java:81)
	at com.datastax.oss.driver.internal.core.metadata.MetadataManager.apply(MetadataManager.java:508)
	at com.datastax.oss.driver.internal.core.metadata.MetadataManager$SingleThreaded.refreshNodes(MetadataManager.java:328)
	at com.datastax.oss.driver.internal.core.metadata.MetadataManager$SingleThreaded.access$1700(MetadataManager.java:293)
	at com.datastax.oss.driver.internal.core.metadata.MetadataManager.lambda$refreshNodes$1(MetadataManager.java:166)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
	at io.netty.channel.DefaultEventLoop.run(DefaultEventLoop.java:54)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
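The failure happens because the node map is keyed by host id and is built with a map builder that rejects duplicate keys. A stdlib analogue can illustrate the failure mode and a first-wins alternative; this is a sketch with illustrative names, not driver code, and java.util.Map.of stands in for the shaded Guava ImmutableMap.Builder:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DuplicateHostIdDemo {
  /** First-wins construction: keeps the first row seen per host id. */
  public static Map<String, String> buildNodeMap(String[][] rows) {
    Map<String, String> nodes = new LinkedHashMap<>();
    for (String[] row : rows) {
      // In the driver this would also log a warning on a duplicate host id.
      nodes.putIfAbsent(row[0], row[1]);
    }
    return nodes;
  }

  public static void main(String[] args) {
    String hostId = "336d64b9-40fd-4c75-b1eb-0df3f01a9f09";

    // Like the ImmutableMap.Builder in the stack trace, Map.of rejects
    // duplicate keys with an IllegalArgumentException:
    try {
      Map.of(hostId, "/192.168.225.89:9042", hostId, "/192.168.225.90:9042");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }

    // The tolerant alternative keeps one row instead of failing the refresh:
    Map<String, String> nodes =
        buildNodeMap(
            new String[][] {
              {hostId, "/192.168.225.89:9042"},
              {hostId, "/192.168.225.90:9042"},
            });
    System.out.println("retained: " + nodes.get(hostId));
  }
}
```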

Adding a check for schema_version solved the issue for us.
I'm opening this JIRA to discuss whether we should also include a check for schema_version here, since in SchemaAgreementChecker the driver already ignores entries that are missing the schema_version.

If this is ok, I can open a PR to the driver repo.

Environment

None

Pull Requests

None

Activity

Olivier Michallat 
April 23, 2020 at 12:00 AM

The "extended check" in driver 3 also inspected rack and tokens. I think we can include them for consistency; they should be present in a healthy deployment anyway. I'll amend the PR.

Olivier Michallat 
April 22, 2020 at 6:52 PM

Agreed, it is better to check data_center.

And I agree that we should do something about that map issue: logging a warning and ignoring the second entry sounds reasonable enough (hopefully isPeerValid will have caught the issue before that point). I'll amend the PR.

Zain Malik 
April 20, 2020 at 8:14 AM

Thanks, Alexandre, for having a look into this.

This is not required; the driver reports it in {@link Node#getSchemaVersion()}, but for informational purposes only

I think this is only true for NodeInfo.getSchemaVersion(); the driver still uses the schema_version from the system.peers row, and it's not optional there.

I guess datacenter might be a better option too. I only went with schema_version because it is already used to ignore entries when checking schema agreement:
https://github.com/datastax/java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/internal/core/metadata/SchemaAgreementChecker.java#L126
https://github.com/datastax/java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/internal/core/metadata/SchemaAgreementChecker.java#L153
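For context, a simplified sketch of that behavior (not the actual SchemaAgreementChecker code, just the idea it implements): rows reporting a null schema_version are skipped before testing whether all remaining nodes agree on a single version.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

public class SchemaAgreementSketch {
  /**
   * Agreement holds when exactly one distinct schema version remains
   * after discarding null entries (simplified from the linked code).
   */
  public static boolean agreement(List<UUID> reportedVersions) {
    Set<UUID> distinct = new LinkedHashSet<>();
    for (UUID v : reportedVersions) {
      if (v != null) { // rows with a null schema_version are ignored
        distinct.add(v);
      }
    }
    return distinct.size() == 1;
  }
}
```

Under this logic, the cluster above would still reach schema agreement even with the 192.168.225.89 row present, because its null schema_version is simply skipped.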

Alex Dutra 
April 19, 2020 at 6:48 PM

Thanks for reporting this.

I think we have two distinct problems to solve:

  1. We use an ImmutableMap.Builder to create the nodes map, and it doesn’t accept duplicate keys. This is a problem per se, since that map is keyed by host id, but host id is not guaranteed to be unique in system.peers. This can be easily fixed by changing the way we build the map.

  2. What to do with duplicate host ids, when one row is evidently spurious.

For #2, my initial thought was that we shouldn’t filter out peer rows when schema_version is null, since, as the javadocs of NodeInfo say,

This is not required; the driver reports it in Node#getSchemaVersion(), but for informational purposes only

But then we are left with a bigger problem: the driver might end up reporting the old IP for that node (and nulls for most node properties), depending on whether it will retain the old, spurious row, or the new, correct one.

That’s why, after some thought, I think we should indeed filter out corrupt rows with empty columns.

However I’m not sure we should check schema_version; I think the best column to check is datacenter. The datacenter is required for most load balancing policies, and as such it is crucial information: a node without a datacenter is pretty much useless for most policies and will likely never be selected in query plans. Filtering out rows without a datacenter would also fix the present issue.

So I would go with 2 changes:

  1. Replace ImmutableMap.Builder with ImmutableMap.copyOf();

  2. Filter out peers rows without datacenter in DefaultTopologyMonitor.isPeerValid().
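The two changes could be sketched together as follows. This is a hypothetical stand-alone model, not driver code: the Peer record stands in for the driver's NodeInfo, and putIfAbsent stands in for whatever duplicate-tolerant map construction replaces the builder.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProposedFix {
  /** Minimal model of a system.peers row (hypothetical). */
  public record Peer(String hostId, String rpcAddress, String dataCenter) {}

  /**
   * Sketch of both proposed changes: build the node map so that a
   * duplicate host id cannot blow up the refresh (change 1, first row
   * wins), and skip rows without a datacenter as an extended
   * isPeerValid() would (change 2).
   */
  public static Map<String, Peer> refreshNodes(List<Peer> peers) {
    Map<String, Peer> nodes = new LinkedHashMap<>();
    for (Peer p : peers) {
      if (p.hostId() == null || p.rpcAddress() == null || p.dataCenter() == null) {
        continue; // invalid peer row: ignore it (change 2)
      }
      nodes.putIfAbsent(p.hostId(), p); // tolerate duplicate host ids (change 1)
    }
    return nodes;
  }
}
```

Either change alone would have avoided the IllegalArgumentException above; together they also keep the spurious row out of the driver's metadata entirely.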

What do you think?

Fixed

Details

Created April 17, 2020 at 7:51 AM
Updated June 21, 2020 at 7:28 PM
Resolved April 23, 2020 at 12:48 AM