SessionManager.toPreparedStatement() hangs when thread limit reached

Description

Hello,

We hit an issue when preparing statements under high load. When the thread limit is reached on the server, SessionManager.toPreparedStatement() hangs, with a RejectedExecutionException thrown in the Futures.transform() method.

Here is the call stack with driver v2.0.8:

I've found an interesting article on the issue:
http://www.concurrentaffair.org/2012/10/27/problems-with-rejectedexecutionhandler-and-futures/
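
For illustration only (this is not the driver's internal code), here is a minimal sketch of the saturation the article describes: a bounded pool with the default abort policy throws RejectedExecutionException once both the worker and its queue are full, and, as the article explains, a future whose completion listener is rejected that way can be left incomplete, blocking its callers.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.RejectedExecutionException;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class RejectionSketch {
        public static void main(String[] args) {
            // One worker thread, a queue of one, and the default AbortPolicy.
            ExecutorService executor = new ThreadPoolExecutor(
                    1, 1, 0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<Runnable>(1),
                    new ThreadPoolExecutor.AbortPolicy());

            Runnable slow = new Runnable() {
                public void run() {
                    try { Thread.sleep(5000); } catch (InterruptedException ignored) { }
                }
            };

            executor.submit(slow); // occupies the single worker thread
            executor.submit(slow); // fills the queue
            try {
                executor.submit(slow); // pool and queue are full -> rejected
            } catch (RejectedExecutionException e) {
                System.out.println("rejected: " + e);
            }
            executor.shutdownNow();
        }
    }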

Could you please investigate?
Thanks in advance for your help.
Minh

Environment

C* v2.0.8, 4 nodes, RF=4
Java Driver v2.0.8

Pull Requests

None

Activity

Ngoc Minh
November 26, 2014, 11:10 AM

By activating the DEBUG logs, we found out that we "share" a Cluster among our jobs. Here is how a job is implemented (an illustrative sketch follows the two steps):
1. In job creation:

2. At the end of a job, we perform:
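
(The actual snippets are not reproduced here; what follows is only an illustrative sketch of that per-job pattern, with hypothetical names, against the 2.0 driver API.)

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    // Hypothetical per-job lifecycle, roughly matching the two steps above.
    public class Job {
        private final Cluster cluster;
        private final Session session;

        Job(String contactPoint) {
            // 1. In job creation: build a dedicated Cluster and open a Session.
            this.cluster = Cluster.builder().addContactPoint(contactPoint).build();
            this.session = cluster.connect();
        }

        void end() {
            // 2. At the end of the job: close the Cluster
            //    (this also closes the sessions it created).
            cluster.close();
        }
    }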

In the DEBUG log, we got: (full log attached to the ticket)

So, it seems that we did not understand how to create a Cluster and/or Session and how to close them. Could you please help?

Thanks in advance.

Olivier Michallat
November 26, 2014, 12:52 PM

build() creates a new cluster each time, so my guess is that your initialization code is shared among all your jobs. Can you add a log statement in there to check how many times it is called?
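
A rough sketch of that suggestion (the names here are hypothetical, not your actual job code): wrap the per-job initialization and log how many times build() really runs.

    import java.util.concurrent.atomic.AtomicInteger;

    import com.datastax.driver.core.Cluster;

    // Hypothetical wrapper around the per-job initialization, counting how many
    // times Cluster.Builder.build() is actually invoked.
    public class ClusterFactory {
        private static final AtomicInteger BUILD_CALLS = new AtomicInteger();

        public static Cluster newCluster(String contactPoint) {
            Cluster cluster = Cluster.builder().addContactPoint(contactPoint).build();
            System.out.println("build() call #" + BUILD_CALLS.incrementAndGet());
            return cluster;
        }
    }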

Ngoc Minh
November 26, 2014, 1:04 PM

We definitely call build() for each job (cf. the attached logs):

The thread qtp13890541-49 closed its cluster:

But it seems that the "clusters" used by other jobs (55, 48, 49) were closed as well:

Ngoc Minh
November 26, 2014, 1:21 PM

In the log, under high activity, we got:

  • 15 jobs created

  • 15 "Starting new cluster with contact points" -> each job creates its own cluster with Cluster.Builder.build()

  • 60 "New Cassandra host" -> OK, because we have 4 nodes (15 clusters x 4 nodes = 60)

  • 4 "Shutting down" -> 4 clusters were shut down at the end of their jobs

  • 10 "Cluster already closed" -> 10 jobs terminated but could not close their cluster

  • 1 exception related to the remaining job (15 - 4 - 10 = 1) -> it caused that job to hang -> leaked resources.

Thanks again.

Ngoc Minh
November 26, 2014, 2:07 PM

Hello,

We found out where the problem is: some of the created clusters were shared among our jobs due to incorrect job initialization.
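
For example (purely hypothetical code, not our actual jobs), an initialization bug of this kind would be a cluster reference kept in a shared field: build() still runs once per job, but every job ends up closing whatever cluster the field currently points to, while its own cluster leaks.

    import com.datastax.driver.core.Cluster;

    // Hypothetical illustration of the accidental sharing: the field is static,
    // so each new job overwrites it, and the close at the end of a job may hit
    // a cluster that belongs to (or was already closed by) another job.
    public class SharedJobInit {
        private static Cluster cluster; // shared by accident

        static Cluster create(String contactPoint) {
            cluster = Cluster.builder().addContactPoint(contactPoint).build();
            return cluster;
        }

        static void end() {
            cluster.close();
        }
    }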

Thanks a lot for your help!

Could you please close the ticket?
Minh

Not a Problem

Assignee

Olivier Michallat

Reporter

Ngoc Minh

Labels

None

PM Priority

None

Reproduced in

None

Affects versions

Fix versions

None

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Components

Priority

Critical