Driver hangs when it encounters dead nodes in the peers table

Description

This is similar to CSHARP-296.

The driver hangs when it encounters a dead node entry in system.peers.

Environment

None

Pull Requests

None

Activity

Minh Do
March 15, 2016, 11:24 PM

peer | rpc_address | schema_version | tokens | workload
------------------------------------------------------------------------------------------------------------------------------
54.217.2.68 | 10.241.5.19 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'372748112'} | Cassandra
54.170.8.129 | 10.37.13.231 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'113427455640312821154458202477628818595'} | Cassandra
46.51.18.106 | 10.106.15.242 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'56713727820156410577229101239000783353'} | Cassandra
54.146.21.231 | 10.61.24.2 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'113427455640312821154458202479064646083'} | Cassandra
54.92.16.11 | 10.13.3.19 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'141784319550391026443072753098378663704'} | Cassandra
54.184.18.2 | null | null | null | null
54.145.4.93 | 10.84.3.163 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'56713727820156410577229101240436610841'} | Cassandra
54.188.79.99 | 10.232.61.244 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'28356863910078205288614550621281390513'} | Cassandra
54.218.35.251 | 10.251.1.143 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'1967372893'} | Cassandra
54.155.58.36 | 10.72.10.89 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'85070591730234615865843651858314800974'} | Cassandra
54.82.9.3 | 10.157.49.3 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'85070591730234615865843651859750628462'} | Cassandra
54.196.20.39 | 10.47.18.17 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'28356863910078205288614550621122593220'} | Cassandra
54.220.5.128 | 10.106.13.185 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'141784319550391026443072753096942836216'} | Cassandra
54.214.134.121 | 10.237.30.159 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'113427455640312821154458202479223443376'} | Cassandra
54.203.95.16 | 10.217.218.42 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'56713727820156410577229101240595408134'} | Cassandra
54.160.18.190 | null | null | null | Cassandra
54.203.6.65 | 10.221.11.222 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'141784319550391026443072753098537460997'} | Cassandra
54.212.16.80 | 10.226.114.72 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'85070591730234615865843651859909425755'} | Cassandra
54.217.111.109 | 10.37.141.61 | 6bf198c0-228c-35b4-9e5a-c49d10a44963 | {'28356863910078205288614550619686765732'} | Cassandra

Minh Do
March 15, 2016, 11:28 PM

I also attached a patch that works for us (Netflix). Note, however, that we use a single token per node, so returning tokens is not a performance concern for us.

Andy Tolbert
March 16, 2016, 2:18 AM
Edited

I think (coming in driver 2.1.10 and 3.0.1), should cause the driver to ignore these peers if there are null / missing columns that are expected.
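To illustrate the kind of check described above, here is a minimal Java sketch that skips peer rows with null or missing columns. It is not the driver's actual code: `PeerRowFilter`, `PeerRow`, `isValid`, and `usablePeers` are hypothetical names, and the set of columns treated as required (`rpc_address`, `schema_version`, `tokens`) is an assumption based on the table posted in this ticket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class PeerRowFilter {

    /** Hypothetical, simplified view of one system.peers row (not a driver class). */
    static final class PeerRow {
        final String peer;
        final String rpcAddress;
        final String schemaVersion;
        final Set<String> tokens;

        PeerRow(String peer, String rpcAddress, String schemaVersion, Set<String> tokens) {
            this.peer = peer;
            this.rpcAddress = rpcAddress;
            this.schemaVersion = schemaVersion;
            this.tokens = tokens;
        }
    }

    /** Assumption: a row is usable only if all of these columns are present. */
    static boolean isValid(PeerRow row) {
        return row.peer != null
                && row.rpcAddress != null
                && row.schemaVersion != null
                && row.tokens != null
                && !row.tokens.isEmpty();
    }

    /** Returns only the rows that pass isValid, preserving their order. */
    static List<PeerRow> usablePeers(List<PeerRow> rows) {
        List<PeerRow> usable = new ArrayList<>();
        for (PeerRow row : rows) {
            if (isValid(row)) {
                usable.add(row);
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        // Three rows taken from the table in this ticket: one healthy, two with nulls.
        List<PeerRow> rows = Arrays.asList(
                new PeerRow("54.217.2.68", "10.241.5.19",
                        "6bf198c0-228c-35b4-9e5a-c49d10a44963",
                        Collections.singleton("372748112")),
                new PeerRow("54.184.18.2", null, null, null),
                new PeerRow("54.160.18.190", null, null, null));

        for (PeerRow row : usablePeers(rows)) {
            System.out.println(row.peer); // prints only 54.217.2.68
        }
    }
}
```

With a filter like this, the two rows that contain nulls would never be added to the driver's host list, so no connection attempt would be made against them.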

That said, I am curious what kind of behavior you are seeing. Does the driver completely stall, or does it time out the node? Any thread dumps or stack traces would help us understand the problem better. 2.2.0-rc3 is an old release candidate, so it's possible that whatever issue you are encountering has already been fixed in 3.0.0.

The dead nodes you are referring to are these two, right?

Minh Do
March 16, 2016, 7:04 AM

Hi Andy,

I think the patch in would cover this case. Sorry that I missed this and yeah, we are on an older 2.2 version.

Alex Popescu
June 14, 2016, 6:42 AM

Fixed by

Fixed

Assignee

Unassigned

Reporter

Minh Do

Labels

PM Priority

None

Reproduced in

2.2.0-rc3

Affects versions

Fix versions

None

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Components

Priority

Major