The pool resizing algorithm was improved to better handle oscillations around "edge" capacities:
Maintain a total count of in-flight queries per pool: totalInFlight. Also track the maximum this value reaches over time: maxTotalInFlight.
When totalInFlight exceeds the current capacity, add a connection immediately. The capacity is estimated as (nbConnections - 1) * 128 + 100, where 100 is a configurable threshold for the last connection, so that the new connection is added preemptively, before the pool actually fills up.
Every 10 seconds, check whether the pool needs to shrink: compute the ideal size for maxTotalInFlight and, if the pool is larger than that, discard the excess connections. Then reset maxTotalInFlight for the next iteration.
Discarded connections are not closed immediately but moved to a "trash", where they stay for 2 minutes (configurable). If a new connection is needed within that time, a trashed connection can be "resurrected" instead of opening a new one.
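The rules above can be sketched as a small toy model. This is only an illustration under stated assumptions, not the driver's actual code: the class, method, and field names other than totalInFlight, maxTotalInFlight, and nbConnections are hypothetical, time is passed in explicitly, the pool is assumed never to shrink below its initial size, and maxTotalInFlight is assumed to reset to the current in-flight count.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the resizing rules listed above; not the driver's actual code.
class PoolSizingSketch {
    static final int MAX_IN_FLIGHT_PER_CONNECTION = 128; // from the capacity formula
    static final int LAST_CONNECTION_THRESHOLD = 100;    // configurable threshold
    static final long TRASH_TTL_MILLIS = 2 * 60 * 1000L; // 2 minutes, configurable

    final int coreSize;       // assumption: the pool never shrinks below this
    int nbConnections;
    int totalInFlight;
    int maxTotalInFlight;
    int opened;               // brand-new connections created
    int resurrected;          // connections taken back from the trash
    final Deque<Long> trash = new ArrayDeque<>(); // time each connection was trashed

    PoolSizingSketch(int coreSize) {
        if (coreSize < 1) throw new IllegalArgumentException("coreSize must be >= 1");
        this.coreSize = coreSize;
        this.nbConnections = coreSize;
    }

    // (nbConnections - 1) * 128 + 100
    static int estimatedCapacity(int nbConnections) {
        return (nbConnections - 1) * MAX_IN_FLIGHT_PER_CONNECTION + LAST_CONNECTION_THRESHOLD;
    }

    // Smallest size (at least minSize) whose estimated capacity covers inFlight.
    static int idealSize(int inFlight, int minSize) {
        int size = minSize;
        while (estimatedCapacity(size) < inFlight) size++;
        return size;
    }

    void onQueryStart(long nowMillis) {
        totalInFlight++;
        maxTotalInFlight = Math.max(maxTotalInFlight, totalInFlight);
        if (totalInFlight > estimatedCapacity(nbConnections)) addConnection(nowMillis);
    }

    void onQueryEnd() {
        totalInFlight--;
    }

    private void addConnection(long nowMillis) {
        // Drop trashed connections that have outlived the trash delay.
        while (!trash.isEmpty() && nowMillis - trash.peekFirst() >= TRASH_TTL_MILLIS) {
            trash.pollFirst();
        }
        if (trash.isEmpty()) {
            opened++;          // nothing to reuse: open a new connection
        } else {
            trash.pollLast();  // resurrect the most recently trashed connection
            resurrected++;
        }
        nbConnections++;
    }

    // Intended to be called every 10 seconds.
    void shrinkCheck(long nowMillis) {
        int ideal = idealSize(maxTotalInFlight, coreSize);
        while (nbConnections > ideal) {
            nbConnections--;
            trash.addLast(nowMillis); // trashed, not closed
        }
        maxTotalInFlight = totalInFlight; // assumption: reset to the current load
    }

    public static void main(String[] args) {
        PoolSizingSketch pool = new PoolSizingSketch(1);
        for (int i = 0; i < 101; i++) pool.onQueryStart(0L);
        for (int i = 0; i < 101; i++) pool.onQueryEnd();
        pool.shrinkCheck(10_000L); // peak of 101 still recorded: no shrink
        pool.shrinkCheck(20_000L); // quiet period: one connection is trashed
        for (int i = 0; i < 101; i++) pool.onQueryStart(30_000L);
        System.out.println("connections=" + pool.nbConnections
                + " opened=" + pool.opened + " resurrected=" + pool.resurrected);
    }
}
```

In this model a load that hovers around the 100-query edge costs at most one real connection open: the second burst reuses the trashed connection rather than opening a new one, which is the oscillation the change is meant to absorb.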
Initial ticket description:
In one of our production applications, we see repeated attempts to create a new connection, service a request, and immediately close that connection. This leaves thousands of connections in the TIME_WAIT state. The app becomes very sluggish and has to be bounced to recover.
Looking at the Java driver code, I see the following:
addConnectionIfUnderMaximum() is looking at MaxConnectionsPerHost to decide if it has to open new connections.
returnConnection is looking at CoreConnectionsPerHost to decide if the connection has to be closed.
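Based on that, we could experience connection thrashing because max and core are different: a connection opened because the pool is under max is closed again as soon as it is returned, because the pool is over core. The app uses the default pooling options of core=2 and max=8.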
Is it intentional that returnConnection looks at CoreConnectionsPerHost? The Javadoc for PoolingOptions (http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/PoolingOptions.html) says: "When the pool exceeds the maximum number of connections, connections in excess are reclaimed."
We are trying to set core and max to the same value to see if that works around the problem, but wanted to make you aware of this connection thrashing.
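To make the suspected mechanism concrete, here is a deliberately simplified toy model of it. It is not the driver's code: the class and method names are hypothetical, and the only behavior modeled is the one described in this ticket, namely that a connection is opened while the pool is under max and closed on return while the pool is over core.

```java
// Toy model of the open/close asymmetry described in this ticket.
// Hypothetical names; the real driver logic has more conditions than this.
class ThrashSketch {
    final int core;
    final int max;
    int open;    // connections currently open
    int opened;  // total connections opened after startup
    int closed;  // total connections closed

    ThrashSketch(int core, int max) {
        this.core = core;
        this.max = max;
        this.open = core; // the pool starts at its core size
    }

    // Mirrors the "under maximum" check: open one more if below max.
    void borrowUnderLoad() {
        if (open < max) {
            open++;
            opened++;
        }
    }

    // Mirrors the "over core" check: close one on return if above core.
    void returnConnection() {
        if (open > core) {
            open--;
            closed++;
        }
    }

    // Each burst borrows once under load, then returns the connection.
    void runBursts(int bursts) {
        for (int i = 0; i < bursts; i++) {
            borrowUnderLoad();
            returnConnection();
        }
    }

    public static void main(String[] args) {
        ThrashSketch defaults = new ThrashSketch(2, 8);
        defaults.runBursts(1000);
        System.out.println("core=2 max=8: opened=" + defaults.opened + " closed=" + defaults.closed);

        ThrashSketch equal = new ThrashSketch(8, 8);
        equal.runBursts(1000);
        System.out.println("core=8 max=8: opened=" + equal.opened + " closed=" + equal.closed);
    }
}
```

In this model every burst with core=2 and max=8 opens and then closes one connection, while with core equal to max no connection is ever opened or closed after startup, which is why the workaround seemed worth trying.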
See the attached connection-logs-for-one-node.txt, which shows all driver logs for a given Cassandra node.