Connection Thrash when core and max connections are different

Description

Final decision:

The pool resizing algorithm was improved to better handle oscillations around "edge" capacities:

  • maintain a total count of in-flight queries per pool: totalInFlight. Also maintain a max of this value over time: maxTotalInFlight.

  • when totalInFlight exceeds our current capacity, add a connection immediately. We estimate the capacity as (nbConnections - 1) * 128 + 100, where 100 is a configurable threshold for the last connection, so that we preemptively add the new connection before the last one actually fills up.

  • every 10 seconds, check if we need to shrink the pool: compute the ideal size for maxTotalInFlight. If we are above that, discard some connections. Then reset maxTotalInFlight for the next iteration.

  • discarded connections are not closed immediately, but moved to a "trash", where they stay for 2 minutes (configurable). If we need a new connection within that threshold, a trashed connection can be "resurrected".
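
The rules above can be sketched as a small model. This is an illustration only, not the driver's actual code: the class, field, and method names are hypothetical, time is passed in explicitly as milliseconds, the core/max bounds are left out for brevity, and resetting maxTotalInFlight to the current in-flight count (rather than to zero) is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of the resizing rules; not the driver's implementation. */
class PoolSizingSketch {
    static final int MAX_REQUESTS_PER_CONNECTION = 128; // from the capacity formula
    static final int NEW_CONNECTION_THRESHOLD = 100;    // threshold for the last connection
    static final long TRASH_TTL_MILLIS = 120_000L;      // trashed connections linger 2 minutes

    int open;              // connections currently serving queries
    int totalInFlight;     // in-flight queries across the pool
    int maxTotalInFlight;  // peak of totalInFlight since the last shrink check
    int opened;            // brand-new connections created (for illustration)
    int resurrected;       // connections taken back out of the trash
    final Deque<Long> trash = new ArrayDeque<>(); // trash timestamps, oldest first

    PoolSizingSketch(int initialConnections) {
        this.open = initialConnections;
    }

    /** Estimated capacity: (nbConnections - 1) * 128 + 100. */
    static int capacity(int nbConnections) {
        return (nbConnections - 1) * MAX_REQUESTS_PER_CONNECTION + NEW_CONNECTION_THRESHOLD;
    }

    /** Smallest connection count whose estimated capacity covers the load. */
    static int idealSize(int inFlight) {
        int n = 1;
        while (capacity(n) < inFlight) {
            n++;
        }
        return n;
    }

    void onQueryStart(long nowMillis) {
        totalInFlight++;
        maxTotalInFlight = Math.max(maxTotalInFlight, totalInFlight);
        if (totalInFlight > capacity(open)) {
            addConnection(nowMillis); // grow immediately
        }
    }

    void onQueryEnd() {
        totalInFlight--;
    }

    void addConnection(long nowMillis) {
        purgeTrash(nowMillis);
        if (!trash.isEmpty()) {
            trash.pollLast(); // resurrect the most recently trashed connection
            resurrected++;
        } else {
            opened++;         // a real pool would open a socket here
        }
        open++;
    }

    /** Meant to run every 10 seconds. */
    void shrinkCheck(long nowMillis) {
        int ideal = idealSize(maxTotalInFlight);
        while (open > ideal) {
            open--;
            trash.addLast(nowMillis); // discard to the trash, do not close yet
        }
        maxTotalInFlight = totalInFlight; // assumption: restart the peak from the current load
        purgeTrash(nowMillis);
    }

    /** Drops (closes) trashed connections older than the TTL. */
    void purgeTrash(long nowMillis) {
        while (!trash.isEmpty() && nowMillis - trash.peekFirst() >= TRASH_TTL_MILLIS) {
            trash.pollFirst();
        }
    }

    public static void main(String[] args) {
        PoolSizingSketch pool = new PoolSizingSketch(1);
        for (int i = 0; i < 101; i++) pool.onQueryStart(0L);       // exceeds capacity(1)
        for (int i = 0; i < 101; i++) pool.onQueryEnd();
        pool.shrinkCheck(10_000L);                                 // peak still justifies 2
        pool.shrinkCheck(20_000L);                                 // idle period: shrink to 1
        for (int i = 0; i < 101; i++) pool.onQueryStart(30_000L);  // reuses the trashed one
        System.out.println("open=" + pool.open + " opened=" + pool.opened
                + " resurrected=" + pool.resurrected);
    }
}
```

In this model, a load that oscillates around an "edge" capacity within the trash window reuses the same connection instead of repeatedly opening and closing sockets.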


Initial ticket description:

In one of our production applications, we see repeated attempts to create a new connection, service a request, and immediately close that connection. This leaves thousands of connections in the TIME_WAIT state. The app becomes very sluggish and has to be bounced to recover.

Looking at the Java driver code, I see the following:

addConnectionIfUnderMaximum() is looking at MaxConnectionsPerHost to decide if it has to open new connections.

returnConnection is looking at CoreConnectionsPerHost to decide if the connection has to be closed.

Based on that, we could experience connection thrashing because max and core are different. The app uses the default pooling options of core=2 and max=8.
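
The mismatch can be illustrated with a toy model. This is a sketch under assumptions, not the driver's code: only the two method names come from the driver, while the class, the counters, the "spike" loop, and the idea that the pool starts with core connections are hypothetical simplifications. Growth is bounded by max, while the close decision on return compares against core.

```java
/** Toy model of the open/close churn; not the driver's implementation. */
class ThrashSketch {
    final int core;
    final int max;
    int open;    // currently open connections
    int opened;  // connections created during the run
    int closed;  // connections closed during the run

    ThrashSketch(int core, int max) {
        this.core = core;
        this.max = max;
        this.open = core; // assumption: the pool starts with core connections
    }

    /** Growth is bounded by max, as described for addConnectionIfUnderMaximum(). */
    boolean addConnectionIfUnderMaximum() {
        if (open >= max) {
            return false;
        }
        open++;
        opened++;
        return true;
    }

    /** The close decision looks at core, as described for returnConnection. */
    void returnConnection() {
        if (open > core) {
            open--;
            closed++; // the closed socket would then sit in TIME_WAIT
        }
    }

    /** Each spike asks for one extra connection and then returns it. */
    static int simulate(int core, int max, int spikes) {
        ThrashSketch pool = new ThrashSketch(core, max);
        for (int i = 0; i < spikes; i++) {
            pool.addConnectionIfUnderMaximum();
            pool.returnConnection();
        }
        return pool.closed;
    }

    public static void main(String[] args) {
        System.out.println("core=2, max=8: closed " + simulate(2, 8, 1000));
        System.out.println("core=8, max=8: closed " + simulate(8, 8, 1000));
    }
}
```

With core=2 and max=8, every spike in this model opens a connection that is closed again on return. With core equal to max, neither branch fires, so no connections are churned.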

Is it intentional that returnConnection looks at CoreConnectionsPerHost? The Javadoc for PoolingOptions (http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/PoolingOptions.html) says: "When the pool exceeds the maximum number of connections, connections in excess are reclaimed."

We are attempting to set core and max to the same value to see if that works around the problem, but wanted to make you aware of this connection thrashing.

See the attached connection-logs-for-one-node.txt, which shows all driver logs for a given Cassandra node.

Environment

None

Pull Requests

None

Status

Assignee

Andy Tolbert

Reporter

Vishy Kasar

Labels

None

PM Priority

None

Reproduced in

None

External issue ID

None

Doc Impact

None

Reviewer

None

Size

None

Components

Fix versions

Affects versions

2.0.4

Priority

Major