When a host comes back up after being down, a request processor (unlike the control connection) notifies the load balancing policy only if the policy does not currently ignore that host.
Unfortunately, some policies ignore remote hosts that are down.
In combination, this means that once a remote host has gone down, it can never be used by a request processor again.
My test case is:
1. Start nodes in two datacenters.
2. Set up a token-aware, dc-aware policy.
3. Take down the local DC. Observe traffic still goes to the remote DC.
4. Take down the remote DC. Observe traffic fails (as expected).
5. Bring up the remote DC. Expect traffic will start going to the remote DC.
In the last step, with 2.14.0, I instead observe that traffic continues to fail. Logs and netstat show that the control connection to the remote DC is up, but the request processor does not use the remote DC.
I have a simple (two-line) fix for this - see PR https://github.com/datastax/cpp-driver/pull/463
This fix resolves the problem by always notifying the policy of every host-up event, regardless of that host's distance. I've tested the fix locally, and it has the desired effect: in the last step of the test case, traffic does go to the remote DC.
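To illustrate the interaction, here is a toy model of the behavior described above. It is not the driver's code: `DcAwarePolicy`, `RequestProcessor`, `always_notify` and the addresses are hypothetical names made up for this sketch. It only assumes what this report states: the policy ignores remote hosts that are down, and the request processor skips host-up notifications for ignored hosts.

```cpp
#include <map>
#include <string>
#include <utility>

// Toy model only: hypothetical types, not the driver's real classes.
enum class Distance { Local, Remote, Ignore };

struct Host {
  std::string dc;
  bool up;  // The policy's view of the host, updated only via notifications.
};

// A DC-aware policy that ignores remote hosts it believes are down.
class DcAwarePolicy {
public:
  explicit DcAwarePolicy(std::string local_dc) : local_dc_(std::move(local_dc)) {}

  void add_host(const std::string& addr, const std::string& dc) {
    hosts_[addr] = Host{dc, true};
  }
  void on_up(const std::string& addr) { hosts_.at(addr).up = true; }
  void on_down(const std::string& addr) { hosts_.at(addr).up = false; }

  Distance distance(const std::string& addr) const {
    const Host& h = hosts_.at(addr);
    if (h.dc == local_dc_) return Distance::Local;
    return h.up ? Distance::Remote : Distance::Ignore;
  }

  // A host can serve requests if the policy thinks it is up and does not ignore it.
  bool can_route_to(const std::string& addr) const {
    return hosts_.at(addr).up && distance(addr) != Distance::Ignore;
  }

private:
  std::string local_dc_;
  std::map<std::string, Host> hosts_;
};

class RequestProcessor {
public:
  RequestProcessor(DcAwarePolicy& policy, bool always_notify)
      : policy_(policy), always_notify_(always_notify) {}

  void notify_host_down(const std::string& addr) { policy_.on_down(addr); }

  void notify_host_up(const std::string& addr) {
    // Reported behavior: the policy is not told about hosts it currently ignores,
    // so a down remote host stays "down" in the policy's view forever.
    if (!always_notify_ && policy_.distance(addr) == Distance::Ignore) return;
    policy_.on_up(addr);
  }

private:
  DcAwarePolicy& policy_;
  bool always_notify_;
};

// Replays the test case: both DCs go down, then the remote DC comes back.
bool remote_usable_after_restart(bool always_notify) {
  DcAwarePolicy policy("dc1");
  policy.add_host("10.0.1.1", "dc1");  // local DC
  policy.add_host("10.0.2.1", "dc2");  // remote DC
  RequestProcessor processor(policy, always_notify);

  processor.notify_host_down("10.0.1.1");
  processor.notify_host_down("10.0.2.1");
  processor.notify_host_up("10.0.2.1");
  return policy.can_route_to("10.0.2.1");
}
```

In this model, `remote_usable_after_restart(false)` returns `false` (the remote host stays ignored), while `remote_usable_after_restart(true)` returns `true`, which mirrors what I see with and without the fix.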
Please could you merge this fix and issue a release? This bug is blocking our testing of failure cases for our geographically-distributed application. Thanks.
Tested with C/C++ driver 2.14.0 on CentOS 7.