Control connection does not reconnect when the control host gets removed

Description

Scenario:

  • Control connection is connected to "host4"

  • host4 gets removed from the cluster

  • Later, the driver tries to use the control connection (for example, to refresh another host's info). The attempt fails, and no reconnection is scheduled.

Environment

None

Pull Requests

None

Activity

Olivier Michallat
December 21, 2014, 4:29 PM

The bug came from this check in ControlConnection#signalError:

The check assumes that a reconnect was already triggered, which does not hold for removed hosts. The code used to work around that by checking host metadata, but this was accidentally lost in commit 04479ae.

Reestablished the metadata check and also added an explicit reconnect in ControlConnection#onRemove, which seems better than waiting for the connection to fail.
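
For readers without the driver source at hand, the failure mode and the two-part fix described above can be sketched with a toy model. This is not the driver's code: ControlConnectionSketch, its fields, and the applyFix flag are all hypothetical names invented for illustration, and "reconnect" here simply picks the first host still present in the metadata.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only; hypothetical names, NOT the driver's ControlConnection. */
class ControlConnectionSketch {
    // Stands in for the cluster metadata: the hosts the driver currently knows.
    private final Set<String> knownHosts = new LinkedHashSet<>();
    private final boolean applyFix;
    private String connectedHost;
    private int reconnectCount = 0;

    ControlConnectionSketch(List<String> hosts, String initialHost, boolean applyFix) {
        this.knownHosts.addAll(hosts);
        this.connectedHost = initialHost;
        this.applyFix = applyFix;
    }

    /** Called when a host is removed from the cluster. */
    void onRemove(String host) {
        knownHosts.remove(host);
        // Fix, part 2: reconnect right away instead of waiting for a failure.
        if (applyFix && host.equals(connectedHost)) {
            reconnect();
        }
    }

    /** Called when a request on the control connection fails. */
    void signalError() {
        if (!applyFix) {
            // Buggy assumption: a reconnect was already triggered elsewhere.
            // For a removed host nothing triggered it, so we stay stuck.
            return;
        }
        // Fix, part 1: the restored metadata check. A host that is no longer
        // in the metadata was removed, so reconnect explicitly.
        if (!knownHosts.contains(connectedHost)) {
            reconnect();
        }
    }

    private void reconnect() {
        connectedHost = null;
        for (String candidate : knownHosts) {
            connectedHost = candidate; // first remaining host, in insertion order
            break;
        }
        reconnectCount++;
    }

    String getConnectedHost() { return connectedHost; }

    int getReconnectCount() { return reconnectCount; }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("host1", "host2", "host4");

        ControlConnectionSketch buggy = new ControlConnectionSketch(hosts, "host4", false);
        buggy.onRemove("host4");
        buggy.signalError();
        System.out.println("without fix: " + buggy.getConnectedHost());

        ControlConnectionSketch fixed = new ControlConnectionSketch(hosts, "host4", true);
        fixed.onRemove("host4");
        System.out.println("with fix: " + fixed.getConnectedHost());
    }
}
```

In this model the unfixed variant stays pointed at host4 after the removal, while the fixed variant moves to host1 as soon as onRemove runs. The metadata check in signalError then acts as a fallback for any path that reaches a failure before the removal notification is processed.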

Andy Tolbert
December 22, 2014, 7:49 PM
Edited

Created an integration test (https://github.com/datastax/java-driver/pull/260) and validated it against 2.0 and 2.1. The test scenario does the following:

  1. Create a 3-node CCM cluster.

  2. Create and initialize a Cluster instance with a single contact point of node1.

  3. Ensure that the current control connection is connected to node1.

  4. Decommission node1.

  5. Ensure that the control connection is connected to another node.

Fixed

Assignee

Andy Tolbert

Reporter

Olivier Michallat

Labels

None

PM Priority

None

Reproduced in

None

Affects versions

Fix versions

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Priority

Critical