Driver should handle system.peers inconsistencies better
Activity

Olivier Michallat May 12, 2015 at 12:50 PM
The driver's only sources of information are system.peers, actually trying to connect to the nodes, and UP/DOWN notifications received from other nodes. So unfortunately I don't think it can do any better.
CASSANDRA-9180 will prevent these "phantom" rows from being created when bootstrap fails, which should solve your issue.

Former user May 11, 2015 at 10:44 PM
The behavior is exactly what you described.
The driver tries to connect to the nodes and fails.
I thought the driver could get more information about node status (the same way nodetool does). It could then see the mismatch between the two (nodetool says the nodes are nonexistent, system.peers says they exist) and issue a warning, or avoid connecting to those nodes.

Alex Popescu May 11, 2015 at 10:39 PM
Can you please provide more details about what behavior you are seeing?
From the original description, I understand that the driver attempts to connect to these nodes and fails. The client driver cannot determine the real status of those nodes (e.g. maybe it is the client that's in a split network and the nodes are healthy).

Former user May 11, 2015 at 10:22 PM
I understand that the driver tries to connect to the nodes in system.peers. I was advised to file this bug, similar to the one filed against the C# driver.
Mailing list thread: http://qnalist.com/questions/6031649/nodes-failed-to-bootstrap-no-nodetool-info-but-system-peer-populated
The question is whether the driver should handle more gracefully the case where nodes present in system.peers do not actually exist in Cassandra (CASSANDRA-9180).
Otherwise, mark it as invalid.

Andy Tolbert May 11, 2015 at 10:09 PM
Just noticed this:
While trying an application (Datastax Java Driver 2.1) the debug log reports that it tries to connect to Node 5 and 6 and fails.
What exceptions are you seeing? It may be perfectly natural for the driver to try to connect to the nodes if they are present in system.peers.
Description

I have a 3 node cluster (2.0.14). Decided to add 3 new ones. 2 failed because of hardware failure (virtualized environment).
The process was automated, so what was supposed to happen was:
Node 4 joins
wait until status is UN and then 2min more
Node 5 joins
wait until status is UN and then 2min more
Node 6 joins
wait until status is UN and then 2min more
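The gating step in the plan above ("wait until status is UN and then 2min more") could be sketched roughly as follows. This is only an illustration, not the script that was actually used: `wait_until_up_normal`, `get_states`, and the timeout values are hypothetical names and defaults, and `get_states` stands in for whatever reads node states (for example, by calling nodetool). The point of the sketch is that the gate checks the state of the joining node itself and gives up after a timeout, so a node that never comes up stops the rollout instead of letting the next node start.

```python
import time
from typing import Callable, Mapping


def wait_until_up_normal(
    address: str,
    get_states: Callable[[], Mapping[str, str]],
    timeout_s: float = 600.0,   # hypothetical limit for the join to complete
    poll_s: float = 5.0,        # hypothetical polling interval
    settle_s: float = 120.0,    # the "2min more" from the plan
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Wait for `address` itself to report UN, then wait `settle_s` more.

    `get_states` is a caller-supplied function returning a mapping of
    node address -> two-letter state (e.g. "UN", "UJ"). Returns True if
    the node reached UN, False if `timeout_s` elapsed first.
    """
    deadline = clock() + timeout_s
    while True:
        # Only the joining node's own state counts; other nodes being UN
        # says nothing about whether this join succeeded.
        if get_states().get(address) == "UN":
            sleep(settle_s)
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A caller would start the next node only when this returns True, and stop for manual inspection otherwise.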
What happened:
Node 4 joins
Wait...
Node 5 joins
VM fails while node is starting.
VM 6 starts, no node with UN, waits 2min
Node 6 joins
VM fails while node is starting.
After this, nodetool reports 4 nodes all UN
While trying an application (Datastax Java Driver 2.1) the debug log reports that it tries to connect to Node 5 and 6 and fails.
Checking system.peers table, I see both nodes there. So I tried "nodetool removenode <ID>" with the IDs in the table.
It blows up with the following exception:
Exception in thread "main" java.lang.UnsupportedOperationException: Host ID not found.
Then I decided to do the following:
DELETE from peers where ID in (ID1, ID2);
All good, cluster still happy and driver not complaining anymore.
The driver should handle inconsistencies in the peers table better. Maybe throw a warning?
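For anyone hitting the same situation, the mismatch described above (rows in system.peers for nodes that nodetool does not know about) can be checked from the operator side with a simple comparison. The sketch below is not a driver API: `PeerRow` and `find_phantom_peers` are hypothetical names, and it assumes you have already collected the `peer` and `host_id` values from system.peers and the host IDs that nodetool status lists.

```python
from typing import Iterable, List, NamedTuple, Set


class PeerRow(NamedTuple):
    """One row read from system.peers (only the two columns needed here)."""
    peer: str      # node address, as stored in the `peer` column
    host_id: str   # value of the `host_id` column


def find_phantom_peers(
    peer_rows: Iterable[PeerRow], ring_host_ids: Set[str]
) -> List[PeerRow]:
    """Return the system.peers rows whose host ID is not part of the ring.

    `ring_host_ids` is the set of host IDs the cluster itself reports
    (for example, the Host ID column of nodetool status).
    """
    return [row for row in peer_rows if row.host_id not in ring_host_ids]
```

Any rows returned are candidates for the kind of manual cleanup described above. They should be double-checked first, since a node that is merely slow to join would also show up here.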