DriverChannel.getAvailableIds() is currently based on a volatile variable. Picking a channel from the pool is a check-then-act, and not completely thread-safe in the sense that the in-flight count observed when the channel is picked might have changed by the time a stream id is assigned in InFlightHandler.
This appears to be a problem for high-traffic scenarios where the number of simultaneous requests is higher than max-requests-per-connection. If a large number of requests start at the same time, they might all select the first channel because its observed count is still 0. But by the time they request stream ids, some of them will get rejected and throw BusyConnectionException, while there was still room on another channel.
This can be easily reproduced in a constrained test scenario: max-requests-per-connection = 20, connection.pool.local.size = 2, then launch 40 client threads that synchronously execute the same request in a loop. Errors even persist when the number of connections is increased.
To mitigate this, we can go back to a model closer to 3.x: expose a fully thread-safe counter (atomic integer), that gets incremented as soon as the connection gets acquired. This guarantees that a stream id will be available if the acquisition succeeded.
DriverChannel.getAvailableIds()
is currently based on a volatile variable. Picking a channel from the pool is a check-then-act, and not completely thread-safe in the sense that the in-flight count observed when the channel is picked might have changed by the time a stream id is assigned inInFlightHandler
.This appears to be a problem for high-traffic scenarios where the number of simultaneous requests is higher than
max-requests-per-connection
. If a large number of requests start at the same time, they might all select the first channel because its observed count is still 0. But by the time they request stream ids, some of them will get rejected and throwBusyConnectionException
, while there was still room on another channel.This can be easily reproduced in a constrained test scenario:
max-requests-per-connection
= 20,connection.pool.local.size
= 2, then launch 40 client threads that synchronously execute the same request in a loop. Errors even persist when the number of connections is increased.To mitigate this, we can go back to a model closer to 3.x: expose a fully thread-safe counter (atomic integer), that gets incremented as soon as the connection gets acquired. This guarantees that a stream id will be available if the acquisition succeeded.