Request leak when write attempted on defunct connection

Description

The client is running a stress test against an 18-node Cassandra cluster. All the Cassandra nodes are shut down one by one, with a gap of 30 seconds between each. The client logs the open connections and the in-flight queries on each connection every 10 seconds.

The client did not receive an onDown event for one specific node, so it continued to send requests to that coordinator. All requests to that node failed with "Write attempt on defunct connection", and the load balancing policy (LBP) routed each request to another available Cassandra node. However, the inFlight request count on the connection for that node kept going up. Once it reached 128, it stayed there forever. It looks like there is a request count leak on "Write attempt on defunct connection".
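
To illustrate the suspected mechanism, here is a minimal sketch of how such a counter could leak. It is not the driver's code: the class InFlightLeakSketch, its nested PooledConnection, and the methods sendLeaky, sendSafe and inFlightAfter are all hypothetical names, and the cap of 128 is taken from the value observed above. The assumption is that a slot is reserved before the write and is never released when the write fails on a defunct connection.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model for illustration only; these are not the driver's classes.
class InFlightLeakSketch {
    // Per-connection cap, taken from the value observed in the report.
    static final int MAX_IN_FLIGHT = 128;

    static class PooledConnection {
        final AtomicInteger inFlight = new AtomicInteger();
        volatile boolean defunct;

        void write(String query) {
            if (defunct) {
                throw new IllegalStateException("Write attempt on defunct connection");
            }
            // A real connection would send the request here.
        }
    }

    // Leaky variant: the slot is reserved, but not released when the write fails.
    static boolean sendLeaky(PooledConnection c, String query) {
        if (c.inFlight.get() >= MAX_IN_FLIGHT) {
            return false; // connection looks saturated
        }
        c.inFlight.incrementAndGet();
        try {
            c.write(query);
            return true; // on success the slot stays held until the response arrives
        } catch (IllegalStateException e) {
            return false; // caller retries on another host; the counter is never decremented
        }
    }

    // Safe variant: the slot is released on the failure path.
    static boolean sendSafe(PooledConnection c, String query) {
        if (c.inFlight.get() >= MAX_IN_FLIGHT) {
            return false;
        }
        c.inFlight.incrementAndGet();
        try {
            c.write(query);
            return true;
        } catch (IllegalStateException e) {
            c.inFlight.decrementAndGet(); // give the slot back before failing over
            return false;
        }
    }

    // Sends `attempts` requests to a defunct connection and returns the final counter.
    static int inFlightAfter(int attempts, boolean leaky) {
        PooledConnection c = new PooledConnection();
        c.defunct = true;
        for (int i = 0; i < attempts; i++) {
            if (leaky) {
                sendLeaky(c, "SELECT ...");
            } else {
                sendSafe(c, "SELECT ...");
            }
        }
        return c.inFlight.get();
    }
}
```

In this model, calling InFlightLeakSketch.inFlightAfter(500, true) leaves the counter pinned at the cap, which matches the behaviour seen in the log, while the variant that decrements on failure returns to zero.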

See the attached log: request-leak.txt. The request leak happens on host1.domain.com


Resolution:

  • fixed a bug where the connection was not marked as "defunct" immediately when the USE query to set the keyspace failed

  • hardened the code for a corner case where a defunct connection was not removed from the connection pool
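
The two points above could look roughly like the following sketch. This is one plausible shape under stated assumptions, not the actual patch: DefunctRemovalSketch, Connection, Pool, setKeyspace, markDefunct and borrow are hypothetical names, and the failing USE query is simulated with a boolean flag.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical model for illustration only; not the driver's actual classes or patch.
class DefunctRemovalSketch {

    static class Connection {
        volatile boolean defunct;
        final boolean useQueryFails; // simulates a failing USE <keyspace> query

        Connection(boolean useQueryFails) {
            this.useQueryFails = useQueryFails;
        }

        // First point: mark the connection defunct immediately when USE fails.
        void setKeyspace(String keyspace, Pool owner) {
            if (useQueryFails) {
                markDefunct(owner);
                throw new IllegalStateException("USE " + keyspace + " failed");
            }
        }

        void markDefunct(Pool owner) {
            defunct = true;
            owner.remove(this);
        }
    }

    static class Pool {
        // CopyOnWriteArrayList iterates over a snapshot, so removing while iterating is safe.
        final List<Connection> connections = new CopyOnWriteArrayList<>();

        void remove(Connection c) {
            connections.remove(c);
        }

        // Second point: never hand out a defunct connection, even if it was
        // somehow left in the pool; drop it and keep looking.
        Connection borrow() {
            for (Connection c : connections) {
                if (c.defunct) {
                    connections.remove(c);
                    continue;
                }
                return c;
            }
            return null; // no usable connection in this pool
        }
    }
}
```

The check in borrow() is deliberately redundant with markDefunct(): if some other code path sets the defunct flag without removing the connection, the pool still discards it on the next borrow instead of routing requests to it.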

Environment

None

Pull Requests

None

Status

Assignee

Olivier Michallat

Reporter

Vishy Kasar

Labels

None

PM Priority

None

Reproduced in

2.0.12

Affects versions

Fix versions

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Components

Priority

Major