Periodic NoHostAvailableException

Description

We are seeing a periodic NoHostAvailableException (stack trace attached). It probably starts after a temporary glitch in the connections to Cassandra, and the driver never fully recovers from it. The application keeps throwing errors until it is restarted, which immediately resolves the problem.

I am attaching a file that contains the number of such errors in each second. You can see that the errors occur every minute, which seems strange. It looks as if some periodic task inside the driver causes the errors at the same time.
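
To make the "every minute" observation concrete, the per-second error counts can be reduced to the gaps between the starts of consecutive error bursts. The following is only a minimal sketch: the `ErrorPeriodicity` object, its method names, and the input format (a list of epoch seconds in which at least one error occurred) are assumptions for illustration, since the attached file's format is not shown here.

```scala
// Hypothetical helper, not part of the driver: finds the spacing between error bursts.
object ErrorPeriodicity {

  // Returns the first second of each burst. Seconds that are at most
  // `maxGap` apart are treated as belonging to the same burst.
  def burstStarts(errorSeconds: Seq[Long], maxGap: Long = 1L): Seq[Long] = {
    val sorted = errorSeconds.distinct.sorted
    if (sorted.isEmpty) Seq.empty
    else sorted.head +: sorted.zip(sorted.tail).collect {
      case (prev, next) if next - prev > maxGap => next
    }
  }

  // Returns the distance, in seconds, between consecutive burst starts.
  def burstIntervals(errorSeconds: Seq[Long], maxGap: Long = 1L): Seq[Long] = {
    val starts = burstStarts(errorSeconds, maxGap)
    starts.zip(starts.drop(1)).map { case (a, b) => b - a }
  }
}
```

If nearly all intervals come out at the same value (for example 60), that supports the suspicion of a fixed-rate task rather than random connection failures.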

Driver version: 2.0.8
Cassandra version: 1.2 and 2.0 (happens on both)

Driver configuration:
val poolingOptions = new PoolingOptions()
  .setCoreConnectionsPerHost(HostDistance.LOCAL, 8)
  .setMaxConnectionsPerHost(HostDistance.LOCAL, 50)

val socketOptions = new SocketOptions()
  .setConnectTimeoutMillis(5000)
  .setReadTimeoutMillis(5000)

val clusterBuilder = Cluster.builder()
  .addContactPoints(cassandraSeeds: _*)
  .withPort(cassandraPort)
  .withPoolingOptions(poolingOptions)
  .withLoadBalancingPolicy(new LimitedRoundRobinPolicy(cassandraAllowedHosts))
  .withReconnectionPolicy(new ConstantReconnectionPolicy(500))
  .withRetryPolicy(new LoggingRetryPolicy(DefaultRetryPolicy.INSTANCE))
  .withSocketOptions(socketOptions)
  .withProtocolVersion(protocolVersion)

Any idea why this could happen? Thanks.

Environment

None

Pull Requests

None

Activity

Olivier Michallat
December 23, 2014, 4:02 PM

First, I recommend upgrading to 2.0.9, which was just released.

Read the release announcement, which explains how to configure the read timeout. I think it is too low in your case.

You should also set core=max on the pool to avoid a known issue (also detailed in the link above). Note that each connection can handle up to 128 simultaneous requests, so 50 is probably more than you need. If you want to monitor how many connections you're really using, see the comments on JAVA-456.
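
As a rough sketch of what these two suggestions could look like when applied to the reporter's configuration: the pool is fixed at 8 connections (core = max), which at 128 simultaneous requests per connection allows up to 8 × 128 = 1024 in-flight requests per local host. The wrapper object name and the read timeout value of 12000 ms are assumptions chosen for illustration, not values taken from this ticket; the release announcement should be consulted for the actual guidance.

```scala
import com.datastax.driver.core.{HostDistance, PoolingOptions, SocketOptions}

// Hypothetical wrapper object; only the option values differ from the reported setup.
object SuggestedDriverConfig {

  // core == max keeps the pool at a fixed size, so it never grows or shrinks at runtime.
  // 8 connections * 128 requests per connection = 1024 concurrent requests per host.
  val poolingOptions: PoolingOptions = new PoolingOptions()
    .setCoreConnectionsPerHost(HostDistance.LOCAL, 8)
    .setMaxConnectionsPerHost(HostDistance.LOCAL, 8)

  // The read timeout is raised from the reported 5000 ms.
  // 12000 ms is an illustrative assumption, not a value recommended in this thread.
  val socketOptions: SocketOptions = new SocketOptions()
    .setConnectTimeoutMillis(5000)
    .setReadTimeoutMillis(12000)
}
```

These objects would then be passed to the builder through `withPoolingOptions` and `withSocketOptions`, as in the original configuration.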

Jakub Janeček
December 23, 2014, 5:24 PM

Thank you for your suggestions. I will try to update the driver and tweak the configuration, but probably not sooner than January, for obvious reasons.

However, the periodicity seems too weird! Is there some kind of periodic task inside the driver? Can you point us in some direction where we should look for issues?

Alex Popescu
October 8, 2015, 1:21 AM

Please feel free to reopen if the issue still occurs.

Cannot Reproduce

Assignee

Olivier Michallat

Reporter

Jakub Janeček

Labels

None

PM Priority

None

Reproduced in

None

Affects versions

Fix versions

None

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Components

Priority

Major