While testing the fix for the original issue, I identified a similar, yet distinct, connection leak that is harder to reproduce. If a connection pool is being replaced/closed, it is possible that another thread attempting to borrow a connection creates a non-core connection in that pool. If the thread responsible for creating the close events has already processed the connections that existed in the pool, this new connection is never closed.
I took a heap dump, looked at the Connection in question, and found a PooledConnection with the following qualities:
Mark for trash is 'false', implying it was never 'closed' or put into the reaper.
The pool that this connection belongs to has a closeFuture that is composed of only 2 futures.
Yet, the number of connections in this pool is 3.
To reproduce:
Create a Cluster with PoolingOptions set to the minimum number of max simultaneous requests (25), and low timeouts (e.g. 5-20 ms).
Submit a bunch of concurrent requests (a sketch of this setup follows these steps).
Find leaking socket connections by continually executing:
and look for ip:port combinations that persist for minutes.
Take a heap dump.
Open the heap dump in jvisualvm/jhat and execute the following OQL query:
where PORT is the port of the connection.
Observe whether or not the connection belongs to a closed/closing Pool by looking at the pool#closeFuture reference to see if it is set.
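A minimal sketch of the first two steps, assuming a 2.0-era driver API (the exact PoolingOptions/SocketOptions setter names vary between driver versions, and the contact point, keyspace, and query are placeholders):

{code:java}
import java.util.concurrent.TimeUnit;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SocketOptions;

public class PoolLeakRepro {
    public static void main(String[] args) {
        // Low per-connection request threshold so the pool grows beyond core
        // connections quickly, and very low read timeouts so pools get
        // closed/replaced under load.
        PoolingOptions pooling = new PoolingOptions()
                .setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.LOCAL, 25);
        SocketOptions socket = new SocketOptions()
                .setReadTimeoutMillis(20); // 5-20 ms as in the description

        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withPoolingOptions(pooling)
                .withSocketOptions(socket)
                .build();
        Session session = cluster.connect("mykeyspace");

        // Hammer the cluster with concurrent requests for a few minutes.
        long end = System.nanoTime() + TimeUnit.MINUTES.toNanos(5);
        while (System.nanoTime() < end) {
            for (int i = 0; i < 100; i++) {
                session.executeAsync("SELECT * FROM mytable WHERE id = 1");
            }
        }
        cluster.close();
    }
}
{code}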
This leak is much less severe than the original one: that issue leaks all connections in a pool, whereas this one only leaks non-core connections that are created while the pool is being closed (a very small window).
I pushed the fix to the java419 branch (since there are some changes in progress on the pool, better avoid another merge).
It was surprisingly quick to reproduce: under high pressure (20 ms timeouts), I could observe 14 connections in that state after 30 seconds. I modified the OQL request slightly:
This fix seems to greatly improve the problem, but doesn't completely fix it. It took ~12 hours of runtime, but I was able to produce 2 connection leaks. The scenario seems the same: the closeFuture only has 2 references, but there are 3 connections in the pool. So there may still be a very small timing window where this can happen.
Heap dump attached:
Right, I think I see it. The thread that calls closeAsync: a) creates the iterator on connections (in discardAvailableConnections), and b) sets the closeFuture reference that will make isClosed() return true.
Meanwhile the thread that creates the connection: 1) adds it to connections, and 2) checks isClosed().
There is still a race if 1 and 2 happen between a and b.
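A simplified, hypothetical sketch of that ordering problem, loosely modeled on the pool's closeAsync/discardAvailableConnections interplay (not the driver's actual code):

{code:java}
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;

// Stripped-down pool that only illustrates the race window.
class RacyPool {
    final List<Object> connections = new CopyOnWriteArrayList<Object>();
    final AtomicReference<Object> closeFuture = new AtomicReference<Object>();

    boolean isClosed() {
        return closeFuture.get() != null;
    }

    // Thread A: closing the pool.
    void closeAsync() {
        // (a) the iterator snapshots only the connections that exist right now
        Iterator<Object> it = connections.iterator();
        while (it.hasNext()) {
            close(it.next());
        }
        // (b) only afterwards is the pool marked as closed
        closeFuture.set(new Object());
    }

    // Thread B: borrowing, possibly growing the pool beyond core connections.
    void maybeSpawnNewConnection() {
        Object connection = new Object();
        // (1) add the new connection to the shared collection
        connections.add(connection);
        // (2) check whether the pool was closed in the meantime
        if (isClosed()) {
            close(connection); // defensive close
        }
        // If both (1) and (2) run after (a) but before (b), nobody ever
        // closes this connection: thread A's iterator never saw it, and
        // thread B still observed isClosed() == false.
    }

    void close(Object connection) { /* release the underlying socket */ }
}
{code}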
After 20 hours of runtime, I can no longer reproduce this issue with the latest fix.
N.B.: this is fixed as part of the pull request for (#262).