Remote hosts never come back up after going down

Description

When a host comes back up after being down, a request processor (unlike a control connection) notifies a load balancing policy of the host-up event only if that policy does not ignore the host.

Sadly some policies ignore remote hosts that are down.

Together, this means that once a remote host has gone down, a request processor can never use it again.
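To make the interaction concrete, here is a minimal C++ model of it. All names (DcAwarePolicyModel, notify_host_up_gated, the Distance enum, the sample address and DC names) are hypothetical and not taken from the driver source. The policy is simplified to treat any remote host it does not currently know to be up as ignored, which is the behavior described above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

enum class Distance { Local, Remote, Ignore };

struct Host {
  std::string address;
  std::string dc;
};

// Hypothetical DC-aware policy model (not the driver's class). It only
// tracks hosts it has been told are up, so a remote host that went down
// is reported as Ignore.
class DcAwarePolicyModel {
 public:
  explicit DcAwarePolicyModel(std::string local_dc)
      : local_dc_(std::move(local_dc)) {}

  Distance distance(const Host& h) const {
    if (h.dc == local_dc_) return Distance::Local;
    return up_.count(h.address) ? Distance::Remote : Distance::Ignore;
  }
  void on_host_up(const Host& h) { up_.insert(h.address); }
  void on_host_down(const Host& h) { up_.erase(h.address); }
  bool usable(const Host& h) const { return up_.count(h.address) > 0; }

 private:
  std::string local_dc_;
  std::set<std::string> up_;
};

// Hypothetical request-processor path as described in this report: the
// host-up event is forwarded only if the policy does not ignore the host.
void notify_host_up_gated(DcAwarePolicyModel& policy, const Host& h) {
  if (policy.distance(h) != Distance::Ignore) policy.on_host_up(h);
}

// A remote host goes down and then comes back; returns whether the
// policy considers it usable again.
bool remote_usable_after_outage() {
  DcAwarePolicyModel policy("dc1");
  const Host remote{"10.0.1.1", "dc2"};
  policy.on_host_up(remote);
  assert(policy.usable(remote));         // usable before the outage
  policy.on_host_down(remote);           // distance(remote) is now Ignore...
  notify_host_up_gated(policy, remote);  // ...so this host-up event is dropped
  return policy.usable(remote);          // false: the host is never used again
}
```

In this model, remote_usable_after_outage() returns false: the check that decides whether to deliver the host-up event depends on state that only that event could restore.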

Test case:

My test case is:

  • Start nodes in two datacenters.

  • Set up a token-aware, dc-aware policy.

  • Take down the local DC. Observe traffic still goes to the remote DC.

  • Take down the remote DC. Observe traffic fails (as expected).

  • Bring up the remote DC. Expect traffic will start going to the remote DC.

With 2.14.0, in the last step I instead observe that traffic continues to fail. Logs and netstat show that the control connection to the remote DC is up, but the request processor does not use the remote DC.

I have a simple (two-line) fix for this: see PR https://github.com/datastax/cpp-driver/pull/463

This fix resolves the problem by always notifying a policy of every host-up event, regardless of the host's distance. I've tested the fix locally and it has the desired effect: in the last step of the test case, traffic does go to the remote DC.
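A minimal C++ model of the fix, replaying the test case above with one local and one remote host. Every name here (DcAwarePolicyModel, notify_host_up_always, replay_test_case, the addresses and DC names) is hypothetical; this is a simplified sketch, not the driver's implementation or the diff in the PR. The only point it illustrates is that the host-up event is forwarded unconditionally, even though the remote host's distance is still "ignore" when the event arrives.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class Distance { Local, Remote, Ignore };

struct Host {
  std::string address;
  std::string dc;
};

// Hypothetical DC-aware policy model: remote hosts not known to be up are Ignore.
class DcAwarePolicyModel {
 public:
  explicit DcAwarePolicyModel(std::string local_dc)
      : local_dc_(std::move(local_dc)) {}

  Distance distance(const Host& h) const {
    if (h.dc == local_dc_) return Distance::Local;
    return up_.count(h.address) ? Distance::Remote : Distance::Ignore;
  }
  void on_host_up(const Host& h) { up_.insert(h.address); }
  void on_host_down(const Host& h) { up_.erase(h.address); }

  // Simplified query plan: first up local host, else first up remote host.
  std::string pick(const std::vector<Host>& hosts) const {
    for (const Host& h : hosts)
      if (h.dc == local_dc_ && up_.count(h.address)) return h.address;
    for (const Host& h : hosts)
      if (h.dc != local_dc_ && up_.count(h.address)) return h.address;
    return "";  // no usable host: the request fails
  }

 private:
  std::string local_dc_;
  std::set<std::string> up_;
};

// The fix as described: forward every host-up event; distance is not consulted.
void notify_host_up_always(DcAwarePolicyModel& policy, const Host& h) {
  policy.on_host_up(h);
}

// Replays the test case; returns the host chosen after the remote DC recovers.
std::string replay_test_case() {
  const Host local{"10.0.0.1", "dc1"};
  const Host remote{"10.0.1.1", "dc2"};
  const std::vector<Host> hosts{local, remote};
  DcAwarePolicyModel policy("dc1");
  notify_host_up_always(policy, local);
  notify_host_up_always(policy, remote);

  policy.on_host_down(local);                      // take down the local DC
  assert(policy.pick(hosts) == remote.address);    // traffic goes to the remote DC
  policy.on_host_down(remote);                     // take down the remote DC
  assert(policy.pick(hosts).empty());              // traffic fails, as expected
  assert(policy.distance(remote) == Distance::Ignore);
  notify_host_up_always(policy, remote);           // bring up the remote DC
  return policy.pick(hosts);                       // the remote host is chosen again
}
```

Under these assumptions, replay_test_case() returns the remote host's address; had the host-up event been gated on distance, the final pick() would have returned an empty string.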

Please could you merge this fix and issue a release? This bug is blocking our testing of failure cases for our geographically-distributed application. Thanks.

Environment

Tested with C/C++ driver 2.14.0 on CentOS 7.

Status

Assignee

Unassigned

Reporter

Keith Wansbrough

Labels

None

PM Priority

None

Reproduced in

2.14.0

External issue ID

None

Doc Impact

None

Reviewer

None

Pull Request

None

Size

None

Affects versions

Priority

Major