Low throughput sessions can throw "com.datastax.driver.core.exceptions.NoHostAvailableException" after a rolling restart of a cluster.
Description
{bold}Summary:{bold}
Low throughput sessions can throw "com.datastax.driver.core.exceptions.NoHostAvailableException" after a rolling restart of a cluster. This is an issue because the driver doesn't attempt to reestablish connections to any available nodes on the first request following the restart.
{bold}Steps to reproduce:{bold}
1) Build cluster object, create session, run queries
2) Stop running queries
3) Roll a restart through the cluster, waiting for each node to come up before restarting the next
4) Run a query. Error thrown: com.datastax.driver.core.exceptions.NoHostAvailableException
{bold}Analysis:{bold}
As the rolling restart progresses through the cluster, connections (c.d.d.c.Connection) are marked as defunct by c.d.d.c.Connection.Dispatcher.channelClosed(). By the end of the rolling restart, all connections have been marked as defunct. On the next call to c.d.d.c.RequestHandler.query(), every connection throws a c.d.d.c.ConnectionException from c.d.d.c.Connection.write() because the c.d.d.c.Connection.isDefunct flag is set to true. This causes the connections to be returned to the connection pool and scheduled for reconnection, but that comes too late to save the current request. The current request fails with a c.d.d.c.exceptions.NoHostAvailableException because all current connections are marked as defunct. This happens in c.d.d.c.RequestHandler.sendRequest(), where every host in the query plan fails because c.d.d.c.RequestHandler.query() returns false.
As a workaround, I also plan to add a background thread to the KS REST API that runs a lightweight query periodically. This should prevent all connections from becoming defunct.
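The background-thread workaround could look roughly like the sketch below. It is not driver code: `SessionHeartbeat` is a hypothetical helper name, and the probe is passed in as a plain `Runnable` so the class depends only on the JDK. The assumption is that the caller supplies a probe that runs a cheap query on the existing session.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of the driver): periodically runs a
 * lightweight probe so that an otherwise idle session keeps using its
 * connections.
 */
final class SessionHeartbeat implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "session-heartbeat");
                t.setDaemon(true); // do not keep the JVM alive just for the heartbeat
                return t;
            });
    private final Runnable probe;
    private final long period;
    private final TimeUnit unit;

    SessionHeartbeat(Runnable probe, long period, TimeUnit unit) {
        this.probe = probe;
        this.period = period;
        this.unit = unit;
    }

    /** Schedules the probe to run repeatedly, waiting one period between runs. */
    void start() {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                probe.run();
            } catch (RuntimeException e) {
                // Swallow the failure: an uncaught exception would cancel every
                // future run, and a failed probe (for example during the restart
                // itself) is expected and harmless here.
            }
        }, period, period, unit);
    }

    /** Stops the heartbeat thread. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With the driver, the probe might be something like `() -> session.execute("SELECT release_version FROM system.local")`, scheduled every 30 seconds or so; both the query and the interval are assumptions to tune. If a probe hits defunct connections, it is the probe that fails and triggers reconnection, rather than a user request.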
Fixed in upcoming 2.0.4 with related ticket JAVA-204.
Martin Grotzke July 12, 2014 at 6:39 PM
Ok, thanks!
Olivier Michallat July 12, 2014 at 2:34 PM
You should still set keepalive. The fix I mention (plus other recent changes) improves the behavior of the driver when connections have been dropped. But keepalive is what prevents connections from being dropped in the first place.
We are considering a mechanism to check connections for a future release.
Martin Grotzke July 12, 2014 at 8:41 AM
Should we still set keepAlive, or should it work without this?
Olivier Michallat June 27, 2014 at 2:27 PM
Could you retry that with 2.0.3?
I think the fix for JAVA-367 will solve this problem.