Connection Trash when core and max connections are different
The pool resizing algorithm was improved to better handle oscillations around "edge" capacities:
- Maintain a total count of in-flight queries per pool: totalInFlight. Also track the maximum of this value over time: maxTotalInFlight.
- When totalInFlight exceeds the current capacity, add a connection immediately. The capacity is estimated as (nbConnections - 1) * 128 + 100, where 100 is a configurable threshold for the last connection, so that the new connection is added preemptively, before the pool actually fills up.
- Every 10 seconds, check whether the pool needs to shrink: compute the ideal size for maxTotalInFlight and, if the pool is larger than that, discard some connections. Then reset maxTotalInFlight for the next iteration.
- Discarded connections are not closed immediately but moved to a "trash", where they stay for 2 minutes (configurable). If a new connection is needed within that window, a trashed connection can be "resurrected".
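The resizing rules above can be sketched roughly as follows. This is an illustrative model only, not the driver's actual code: the class PoolSizingSketch, its fields and its method names are hypothetical. Clamping the ideal size to the core/max bounds, and resetting maxTotalInFlight to the current in-flight count rather than to zero, are assumptions the description does not spell out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the resizing rules; all names here are hypothetical. */
final class PoolSizingSketch {
    static final int REQUESTS_PER_CONNECTION = 128;      // the 128 in the capacity formula
    static final int LAST_CONNECTION_THRESHOLD = 100;    // configurable threshold for the last connection
    static final long TRASH_TIMEOUT_MS = 2 * 60 * 1000L; // trashed connections linger for 2 minutes

    final int core, max;
    int open;              // connections currently in the pool
    int totalInFlight;     // current in-flight queries
    int maxTotalInFlight;  // high-water mark since the last shrink check
    int created;           // brand-new connections opened
    int resurrected;       // connections taken back out of the trash
    final Deque<Long> trash = new ArrayDeque<>(); // time (ms) at which each connection was trashed

    PoolSizingSketch(int core, int max) {
        this.core = core;
        this.max = max;
        this.open = core;
    }

    /** Estimated capacity: (nbConnections - 1) * 128 + 100. */
    static int capacity(int nbConnections) {
        return (nbConnections - 1) * REQUESTS_PER_CONNECTION + LAST_CONNECTION_THRESHOLD;
    }

    /** Smallest size whose estimated capacity covers inFlight, clamped to [core, max] (assumption). */
    static int idealSize(int inFlight, int core, int max) {
        int n = core;
        while (n < max && capacity(n) < inFlight) {
            n++;
        }
        return n;
    }

    /** Called whenever the in-flight count changes; grows the pool by one connection if needed. */
    void onInFlightChanged(int newTotalInFlight, long nowMs) {
        totalInFlight = newTotalInFlight;
        maxTotalInFlight = Math.max(maxTotalInFlight, totalInFlight);
        if (totalInFlight > capacity(open) && open < max) {
            purgeExpired(nowMs);
            if (trash.pollLast() != null) {
                resurrected++; // reuse a trashed connection instead of opening a new one
            } else {
                created++;
            }
            open++;
        }
    }

    /** Periodic check (every 10 seconds in the description): trash connections above the ideal size. */
    void shrinkCheck(long nowMs) {
        int ideal = idealSize(maxTotalInFlight, core, max);
        while (open > ideal) {
            open--;
            trash.addLast(nowMs);
        }
        maxTotalInFlight = totalInFlight; // reset for the next iteration (assumption: reset to current load)
        purgeExpired(nowMs);
    }

    /** Drops trashed connections whose timeout has elapsed (this is where they would really be closed). */
    private void purgeExpired(long nowMs) {
        while (!trash.isEmpty() && nowMs - trash.peekFirst() >= TRASH_TIMEOUT_MS) {
            trash.pollFirst();
        }
    }

    public static void main(String[] args) {
        PoolSizingSketch pool = new PoolSizingSketch(2, 8);
        pool.onInFlightChanged(300, 0L);      // exceeds capacity(2), so one connection is added
        pool.onInFlightChanged(50, 11_000L);  // load drops
        pool.shrinkCheck(20_000L);            // high-water mark still covers the earlier peak
        pool.shrinkCheck(30_000L);            // now the extra connection goes to the trash
        pool.onInFlightChanged(300, 40_000L); // load returns within 2 minutes
        System.out.println("open=" + pool.open + " created=" + pool.created
                + " resurrected=" + pool.resurrected);
    }
}
```

In this model a load spike that comes back within the trash timeout reuses the trashed connection rather than opening a new socket, which is the behaviour the last bullet describes.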
Initial ticket description:
In one of our production applications, we see repeated attempts to create a new connection, service a request and immediately close that connection. This leaves thousands of connections in the TIME_WAIT state. The app becomes very sluggish and has to be bounced to recover.
Looking at the Java driver code, I see the following:
addConnectionIfUnderMaximum() is looking at MaxConnectionsPerHost to decide if it has to open new connections.
returnConnection is looking at CoreConnectionsPerHost to decide if the connection has to be closed.
Based on that, we could experience connection trash because max and core are different. The app uses the default pooling options of core=2 and max=8.
Is it intentional that returnConnection looks at CoreConnectionsPerHost? The Javadoc for PoolingOptions (http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/PoolingOptions.html) says: "When the pool exceeds the maximum number of connections, connections in excess are reclaimed."
We are attempting to set core and max to the same value to see if that works around the problem, but wanted to make you aware of this connection trash.
See the attached connection-logs-for-one-node.txt that shows all driver logs for a given cassandra node.
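The asymmetry described in this ticket (growing against max, shrinking against core) can be illustrated with a toy model. This is a deliberately simplified sketch, not the driver's code: the class ThrashSketch and its methods are hypothetical, and it assumes one request per connection, whereas the real pool multiplexes many requests on each connection.

```java
/** Toy model of a pool that opens connections up to max but closes any connection above core on return. */
final class ThrashSketch {
    final int core, max;
    int open;    // currently open connections
    int busy;    // connections serving a request (one request per connection in this model)
    int opened;  // connections opened beyond the initial core set
    int closed;  // connections closed on return

    ThrashSketch(int core, int max) {
        this.core = core;
        this.max = max;
        this.open = core;
    }

    /** Borrow: when every connection is busy, grow the pool as long as it is under max. */
    void borrow() {
        if (busy == open) {
            if (open >= max) {
                throw new IllegalStateException("pool exhausted");
            }
            open++;
            opened++;
        }
        busy++;
    }

    /** Return: a pool above core closes a connection as soon as one is handed back. */
    void giveBack() {
        busy--;
        if (open > core) {
            open--;
            closed++;
        }
    }

    /** Runs `bursts` rounds in which `burstSize` requests are borrowed and then all returned. */
    static ThrashSketch simulate(int core, int max, int bursts, int burstSize) {
        ThrashSketch pool = new ThrashSketch(core, max);
        for (int i = 0; i < bursts; i++) {
            for (int j = 0; j < burstSize; j++) {
                pool.borrow();
            }
            for (int j = 0; j < burstSize; j++) {
                pool.giveBack();
            }
        }
        return pool;
    }

    public static void main(String[] args) {
        ThrashSketch defaults = simulate(2, 8, 1000, 3); // core=2, max=8
        ThrashSketch pinned = simulate(8, 8, 1000, 3);   // core == max
        System.out.println("core=2,max=8: opened=" + defaults.opened + " closed=" + defaults.closed);
        System.out.println("core=8,max=8: opened=" + pinned.opened + " closed=" + pinned.closed);
    }
}
```

With load that hovers just above what the core connections can serve, every burst in this model opens one connection and closes it again, while setting core equal to max opens and closes none.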
I'm not sure this is related to the original issue. Could you start a new discussion on the mailing list with a bit more detail?
@Olivier Michallat: Got it! I realised it later when I rechecked the issue description. I feel my problem was quite different. I tried to do an inter-data-center write using the defaults for PoolingOptions. I used DCRoundRobin with WhiteListPolicy, so my local quorum inserts were constrained to the remote DC. After a few hours, I started getting NoHostException even though all the Cassandra nodes in the other centre were up and running. When I observed the connections on my app server, I found a lot of TCP connections in either the TIME_WAIT or ESTABLISHED state, which resulted in TCP drops.
Is there some kind of lag when it comes to inter-data-centre operations? All my operations go through a private link, which is 10 times slower than local writes. The writes were high-intensity, as this was a data migration from MySQL to Cassandra.
The fix changes the algorithm used to resize the connection pools. The previous algorithm could lead to connection leaks (see initial description).
This is not related to the number of sessions. Each session has one connection pool per host. The fix will apply to all connection pools.
Ran a 5-hour load test in which the number of concurrent requests was governed by a 'max query permit' setting. This setting was adjusted randomly every 5 minutes. The following graph plots the number of open connections per host against the number of query permits. The solid green line represents the concurrent query permits at that time. As the number of concurrent query permits changes, the number of connections per host correctly grows or shrinks shortly thereafter.
The cluster was configured with default settings (2 core connections, 8 max connections, 100 requests per connection), except for the idle timeout, which was lowered from 2 minutes to 20 seconds.
My Cassandra version is 2.0.3. I would be using one session per keyspace, but I am thinking of reducing the number of sessions based on usage, as each session creates connections equal to the number of core connections.
I have six keyspaces right now and am going by the driver defaults. I am wondering whether I need to set the pooling options for each session. So, as a first cut, are you planning to support only the single-session case? Can you please elaborate on the fix?