Driver sometimes does not re-connect to nodes when available

Description

We have 3 hosts in the Cassandra cluster.
When I block all 3 hosts, we get the expected messages from the driver that each host is considered unhealthy.
After one host is unblocked, the driver occasionally remains unable to connect (I have not found reliable reproduction steps). The state appears to be permanent: I have left it like this for hours. The driver throws the following error:

Cassandra.NoHostAvailableException: No host is available to be queried (no host tried)
   at Cassandra.Requests.PrepareHandler.<GetNextConnection>d__11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.Requests.PrepareHandler.<Prepare>d__6.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.Requests.PrepareHandler.<Prepare>d__4.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.Session.<PrepareAsync>d__53.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.Mapping.Statements.StatementFactory.<GetStatementAsync>d__8.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.Data.Linq.CqlQueryBase`1.<InternalExecuteAsync>d__33.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.Data.Linq.CqlQueryBase`1.<ExecuteAsync>d__35.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.Data.Linq.CqlQuerySingleElement`1.<ExecuteAsync>d__5.MoveNext()

Environment

None

Activity

Jorge Bay Gondra
February 27, 2018, 7:08 PM

Reconnection occurs in the background, and its timing is determined by the configured IReconnectionPolicy.

https://docs.datastax.com/en/drivers/csharp/3.4/html/M_Cassandra_Builder_WithReconnectionPolicy.htm

The default reconnection policy is ExponentialReconnectionPolicy, which waits exponentially longer between successive reconnection attempts.
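
As a rough illustration of the policy behaviour described above, the sketch below prints the first few delays produced by an ExponentialReconnectionPolicy schedule, then builds a cluster that uses a ConstantReconnectionPolicy instead. The base delay, maximum delay, constant delay, and contact point addresses are assumptions chosen for illustration; they are not taken from this ticket, and the driver's actual defaults may differ.

```csharp
using System;
using Cassandra;

class ReconnectionPolicySketch
{
    static void Main()
    {
        // Assumed values for illustration: 1 s base delay, 10 min maximum delay.
        var exponential = new ExponentialReconnectionPolicy(1000, 10 * 60 * 1000);

        // A schedule yields the successive delays between reconnection attempts.
        IReconnectionSchedule schedule = exponential.NewSchedule();
        for (var attempt = 1; attempt <= 12; attempt++)
        {
            Console.WriteLine("Attempt {0}: wait {1} ms", attempt, schedule.NextDelayMs());
        }

        // Alternative: retry at a fixed interval so the wait never grows.
        // The addresses and the 5 s delay are hypothetical.
        Cluster cluster = Cluster.Builder()
            .AddContactPoints("10.0.0.1", "10.0.0.2", "10.0.0.3")
            .WithReconnectionPolicy(new ConstantReconnectionPolicy(5000))
            .Build();

        // cluster.Connect() would open the session; omitted here because it
        // requires a reachable Cassandra node.
    }
}
```

With a long maximum delay, a host that has been down for a while may only be retried after several minutes, which can look like a permanent failure from the application's side. A constant policy trades that for more frequent connection attempts.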

You should check the driver logs to see what's happening under the hood.
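
One possible way to surface the driver logs is sketched below. It assumes the driver is writing to System.Diagnostics.Trace (its trace-based logging, as opposed to a custom logger provider) and that console output is acceptable; the trace level chosen is an assumption, not a recommendation from this ticket.

```csharp
using System;
using System.Diagnostics;

class DriverLoggingSketch
{
    static void Main()
    {
        // Raise the driver's trace level before creating the cluster.
        // TraceLevel.Info is an assumed choice; Verbose produces more detail.
        Cassandra.Diagnostics.CassandraTraceSwitch.Level = TraceLevel.Info;

        // Send trace output to the console and flush after every write.
        Trace.Listeners.Add(new TextWriterTraceListener(Console.Out));
        Trace.AutoFlush = true;

        // Build the cluster and connect after this point so that host
        // down/up events and reconnection attempts appear in the output.
    }
}
```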

We have several unit and integration tests covering the scenario you've described. My guess is that this is related to the default reconnection policy backing off exponentially.

Jorge Bay Gondra
May 10, 2018, 9:58 PM

I'm closing this one as there hasn't been further feedback in the past months.

Incomplete

Assignee

Unassigned

Reporter

Nick Laycock

Reproduced in

3.4.0.1

Priority

Major