Host may not be marked down if its last connection is closed during initialization

Description

If a host's last remaining connection fails during initialization, it is possible that the host will not be marked down.

This is very unlikely to happen in practice, and its impact is limited to very specific cases. If it did happen, the host would remain up and would still be included in query plans. When the host was next chosen from a query plan and borrowConnection() was called, the driver would detect that the pool has fewer than its core number of connections and create new ones. However, a user who expects to be notified as soon as a host goes down would not find out until one of those newly spawned connections fails, and potentially never if that host is never chosen in a query plan.
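A minimal sketch of that consequence, assuming a heavily simplified pool. HostPool, borrowConnection(boolean) and downEvents are hypothetical names, not the driver's classes. The host stays UP with zero open connections, and only when it is picked and the pool tries to grow back to its core size does a failed connection mark it down and notify listeners.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the behavior described above; not the driver's real pool or listener API.
final class HostPool {
    enum State { UP, DOWN }

    State state = State.UP;                            // the host was never marked down
    final int coreConnections;
    int openConnections;
    final List<String> downEvents = new ArrayList<>(); // what a registered listener would observe

    HostPool(int coreConnections, int openConnections) {
        this.coreConnections = coreConnections;
        this.openConnections = openConnections;
    }

    /**
     * Called only when a query plan picks this host. If the pool is below its
     * core size it spawns replacement connections; a failed spawn is what
     * finally marks the host down and notifies listeners.
     */
    boolean borrowConnection(boolean nodeReachable) {
        if (state == State.DOWN) {
            return false; // a down host would not be offered by the query plan
        }
        while (openConnections < coreConnections) {
            if (!nodeReachable) {
                state = State.DOWN;
                downEvents.add("onDown");
                return false;
            }
            openConnections++;
        }
        return true;
    }
}
```

Until borrowConnection is called, downEvents stays empty, which mirrors the case where the host is never chosen in a query plan and the user is never notified.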

Here is an example of how it can manifest:

1. hostA, the host the control connection is currently established to, is marked down.
2. The control connection immediately tries reconnecting and chooses hostB.
3. The control connection is opened to hostB and begins initializing (it sends STARTUP and a request to validate clusterName).
4. hostB goes down and its pooled connections are closed. This leaves 1 remaining initializing connection (the control connection on hostB).
5. The control connection to hostB fails to initialize because its connection is reset while writing the clusterName validation request. ChannelCloseListener#operationComplete is called when we force the closeFuture. Since isInitialized is false, the connection is closed but not defuncted. Because of this, signalConnectionClosed is called in closeAsync instead of signalConnectionFailure.

This may not always happen, because the connection is still defuncted in writeHandler; however, there is no guaranteed ordering between that and the closeFuture notification. I can reproduce it occasionally with the following steps:
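The ordering dependency in step 5 can be sketched as a deterministic model, assuming that whichever path closes the connection first decides which signal the pool receives. Connection, onWriteFailure, onCloseFutureNotified and hostMarkedDown are hypothetical names; they mirror the roles of writeHandler, ChannelCloseListener#operationComplete and closeAsync described above but are not the driver's implementation.

```java
// Hypothetical, simplified model of the race; not the driver's actual Connection class.
final class CloseRaceModel {
    enum Signal { CONNECTION_CLOSED, CONNECTION_FAILURE }

    static final class Connection {
        boolean isInitialized; // false while STARTUP / clusterName validation is in flight
        boolean isDefunct;
        boolean isClosed;
        Signal signal;         // what the pool was told when the connection closed

        // Models the writeHandler path: a failed write always defuncts the connection.
        void onWriteFailure() {
            isDefunct = true;
            close();
        }

        // Models ChannelCloseListener#operationComplete: only initialized
        // connections are defuncted here; others are simply closed.
        void onCloseFutureNotified() {
            if (isInitialized) {
                isDefunct = true;
            }
            close();
        }

        // Models closeAsync. Assumption: the first close wins and picks the signal.
        void close() {
            if (isClosed) {
                return;
            }
            isClosed = true;
            signal = isDefunct ? Signal.CONNECTION_FAILURE : Signal.CONNECTION_CLOSED;
        }
    }

    /** Runs both events in the given order and reports whether the host ends up marked down. */
    static boolean hostMarkedDown(boolean writeHandlerFirst) {
        Connection last = new Connection(); // the host's only remaining connection, still initializing
        if (writeHandlerFirst) {
            last.onWriteFailure();
            last.onCloseFutureNotified();
        } else {
            last.onCloseFutureNotified();
            last.onWriteFailure();
        }
        // Only a failure signal on the last connection marks the host down.
        return last.signal == Signal.CONNECTION_FAILURE;
    }
}
```

In this model, hostMarkedDown(true) returns true and hostMarkedDown(false) returns false: when the closeFuture listener runs first, the later defunct in writeHandler comes too late to change the signal.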

1. Create a Cluster with a single contact point (HostA) against a two-node Cassandra cluster (HostA and HostB).
2. Stop HostA and wait for the down event.
3. Stop HostB.
4. Wait for the down event on HostB. If the issue occurs, this event never arrives.

Environment

None

Pull Requests

None

Activity

Olivier Michallat
December 7, 2015, 4:30 PM

We've decided to revert this ticket. The change caused another race condition that could lead to a host being marked down if protocol negotiation failed at startup. We're reverting to the previous behavior, which is less severe (as explained above, the only impact is that a host could be marked down later than expected). The connection shutdown sequence will be re-examined to provide a better fix.

Won't Fix

Assignee

Unassigned

Reporter

Andy Tolbert

Labels

None

PM Priority

None

Reproduced in

2.0.12

Affects versions

Fix versions

None

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Components

Priority

Major