Possible connection leak when replacing pool while non-core connections are being created

Description

While testing the fix for I identified a similar, yet different connection leak that is harder to reproduce. If a connection pool is being replaced or closed, another thread that is attempting to borrow a connection may create a non-core connection in that pool. If the thread responsible for creating the close events has already processed the existing connections in the pool, this new connection is never closed.

I took a heap dump, looked at the connection in question, and found a PooledConnection with the following properties:

  1. Mark for trash is 'false', implying it was never 'closed' or put into the reaper.

  2. The pool that this connection belongs to has a closeFuture that is composed of only 2 futures.

  3. Yet, the number of connections in this pool is 3.

To reproduce:

  1. Create a Cluster with PoolingOptions set to the minimum number of max simultaneous requests (25) and low timeouts (e.g. 5-20 ms).

  2. Submit a bunch of concurrent requests.

  3. Find leaking socket connections by continually executing:

    and look for IP:port combinations that exist for minutes.

  4. Take a heap dump.

  5. Open heap dump in jvisualvm/jhat and execute the following query:

    where PORT is the port of the connection.

  6. Observe whether or not the connection belongs to a closed/closing Pool by looking at the pool#closeFuture reference to see if it is set.
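
Step 3 can be automated once the socket listings are captured. A minimal sketch, assuming each listing has already been parsed into a set of "ip:port" strings (the class LongLivedEndpointTracker and its method are hypothetical and not part of the driver). It reports endpoints that have stayed present in every consecutive snapshot for at least a given duration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: tracks how long each "ip:port" endpoint has been
// continuously present across successive snapshots of open sockets.
class LongLivedEndpointTracker {

    // Time at which each currently open endpoint was first observed.
    private final Map<String, Long> firstSeenMillis = new HashMap<>();

    // Records one snapshot and returns, in sorted order, the endpoints that
    // have been present in every snapshot for at least minAgeMillis.
    List<String> observe(long nowMillis, Set<String> endpoints, long minAgeMillis) {
        // Forget endpoints that disappeared: their sockets were closed.
        firstSeenMillis.keySet().retainAll(endpoints);

        List<String> suspects = new ArrayList<>();
        for (String endpoint : new TreeSet<>(endpoints)) {
            long first = firstSeenMillis.computeIfAbsent(endpoint, k -> nowMillis);
            if (nowMillis - first >= minAgeMillis) {
                suspects.add(endpoint);
            }
        }
        return suspects;
    }
}
```

Feeding it one snapshot every few seconds with a threshold of a couple of minutes leaves only the endpoints worth looking up in the heap dump. Core connections are long-lived by design, so they need to be excluded by hand.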

This leak is much less severe than as that issue leaks all connections in a pool, whereas this one only leaks non-core connections that are created while the pool is being closed (a very small window).

Environment

None

Pull Requests

None

Activity

Olivier Michallat
January 7, 2015, 10:08 AM

I pushed the fix to the java419 branch (since there are some changes in progress on the pool, it is better to avoid another merge).

It was surprisingly quick to reproduce: under high pressure (20 ms timeouts), I could observe 14 connections in that state after 30 seconds. I modified the OQL request slightly:

Andy Tolbert
January 8, 2015, 5:05 PM

This fix greatly reduces the problem, but doesn't completely eliminate it. It took ~12 hours of runtime, but I was able to produce 2 connection leaks. The scenario seems the same: the close future only has 2 references, but there are 3 connections in the pool. So there may still be a very small timing window where this can happen.

Heap dump attached:

Olivier Michallat
January 9, 2015, 4:22 PM

Right, I think I see it. The thread that calls closeAsync: a) creates the iterator on connections (in discardAvailableConnections), and b) sets the closeFuture reference that will make isClosed() return true.

Meanwhile the thread that creates the connection: 1) adds it to connections, and 2) checks isClosed().

There is still a race if 1 and 2 happen between a and b.
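
The interleaving above can be replayed deterministically. This is a simplified sketch, not the driver's actual code: PoolCloseRaceSketch, Conn, and the closing flag are hypothetical stand-ins for the pool, its connections, and the closeFuture reference. It runs the order a, 1, 2, b, and then an order in which b happens before a, which is one possible way to close the window (not necessarily the fix that was applied):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, single-threaded replay of the race; plain fields are enough
// because the steps are executed one after another in a fixed order.
class PoolCloseRaceSketch {

    static final class Conn {
        boolean closed;
        void close() { closed = true; }
    }

    final List<Conn> connections = new ArrayList<>();
    boolean closing; // stands in for "closeFuture is set", i.e. isClosed()

    // Step (a): capture the connections that the closing thread will close.
    List<Conn> snapshot() { return new ArrayList<>(connections); }

    // Step (b): make isClosed() return true.
    void markClosed() { closing = true; }

    // Steps (1) and (2): add the new connection, then check isClosed().
    void addNewConnection(Conn c) {
        connections.add(c);   // (1)
        if (closing) {        // (2)
            c.close();
        }
    }

    // Returns how many connections in the pool are still open after close.
    static int leakedAfter(boolean markBeforeSnapshot) {
        PoolCloseRaceSketch pool = new PoolCloseRaceSketch();
        pool.connections.add(new Conn());
        pool.connections.add(new Conn());

        Conn late = new Conn();
        List<Conn> toClose;
        if (markBeforeSnapshot) {
            pool.markClosed();            // b
            toClose = pool.snapshot();    // a
            pool.addNewConnection(late);  // 1, 2: sees the flag, closes itself
        } else {
            toClose = pool.snapshot();    // a
            pool.addNewConnection(late);  // 1, 2: inside the window
            pool.markClosed();            // b
        }
        for (Conn c : toClose) {
            c.close();
        }

        int leaked = 0;
        for (Conn c : pool.connections) {
            if (!c.closed) {
                leaked++;
            }
        }
        return leaked;
    }

    public static void main(String[] args) {
        System.out.println(leakedAfter(false)); // prints 1
        System.out.println(leakedAfter(true));  // prints 0
    }
}
```

The racy order ends with 2 closed connections and 3 connections in the pool, which matches what the heap dump showed. With b before a, a connection added before the snapshot is closed by the closing thread, and one added after it sees the flag and closes itself.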

Andy Tolbert
January 10, 2015, 5:50 PM

After 20 hours of runtime, can no longer reproduce this issue with the latest fix.

Olivier Michallat
February 3, 2015, 1:16 PM

N.B: this is fixed as part of the pull request for (#262).

Fixed

Assignee

Andy Tolbert

Reporter

Andy Tolbert

Labels

None

PM Priority

None

Reproduced in

None

Affects versions

Fix versions

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Components

Priority

Major