Consider sharing Connection.Factory.timer HashedWheelTimer between Cluster Instances

Description

The timer belonging to Connection.Factory is a HashedWheelTimer. Since a Connection.Factory is created and shared for each Cluster instance, the driver will ultimately have a HashedWheelTimer for each Cluster instance (ultimately a thread).

The netty documentation states:

Do not create many instances.

HashedWheelTimer creates a new thread whenever it is instantiated and started. Therefore, you should make sure to create only one instance and share it across your application. One of the common mistakes, that makes your application unresponsive, is to create a new instance for every connection.

This isn't that large of a problem since it is not expected that a user will be creating so many Cluster instances, but would be a nice optimization.

There is nothing in the Timer that is necessarily bound to a Cluster instance, so is there any reason why the Driver couldn't share this timer between all Cluster instances? The timer could be created statically within Cluster and passed into Connection.Factory, and it's ThreadFactory configured for creating Daemon threads.

Environment

None

Pull Requests

None

Activity

Show:
Alex Dutra
June 21, 2020, 5:32 PM

This ticket has been closed on 2020-06-21 due to prolonged inactivity, as part of an automatic housekeeping procedure.

If you think that this was inappropriate, feel free to re-open the ticket. If possible, please provide any context that could explain why the issue described in this ticket is still relevant.

Also, please note that enhancements and feature requests cannot be accepted anymore for legacy driver versions (OSS 1.x, 2.x, 3.x and DSE 1.x).

Thank you for your understanding.

Andy Tolbert
September 9, 2015, 3:32 PM

With introduced in 2.0.11 we can now customize the construction of HWTs using NettyOptions so now it is possible for users to share them between Cluster instances if they want to.

After doing some profiling, I really wonder if we should consider making sharing the HWT the default? On the other hand I do still agree with 's comment about sharing static non-immutable resources. In this case the HWT would be tied to the default NettyOptions instance if NettyOptions is not overriden. The timer thread is almost always entirely idle, even when sharing between many Cluster instances. The amount of work it does is going to be roughly tied to the number of speculative executions bound to Cluster instances using it.

Another argument you could make against this is that you we shouldn't be inconsistent between sharing the HashedWheelTimer and not sharing the EventLoopGroup. So I think either we should by default share both or share neither. In any case, you should really only be creating 1 Cluster instance for each C* cluster you are connecting to, so in the general case you should only be creating 1 Cluster instance.

Andy Tolbert
March 17, 2015, 3:06 PM
Edited

, we resolved the issue. There is an issue in Netty 3.9.0final where it would not decrement it's timer count if the timer was never used so netty was incorrectly logging that there were > 256 timers in use.

Olivier Michallat
March 17, 2015, 3:02 PM

The initial report was from someone using a Clojure wrapper, I'll dig into that as well to check why it's creating so many Cluster instances.

Andy Tolbert
March 17, 2015, 2:58 PM

Imo, using staticly shared non-immutable ressources in a library has a bad smell and I think having a Cluster instance being as isolated a unit as possible is a good rule of thumb design wise.

Hrmm, I think you've convinced me. I also think the optimization (switching from timer-per-cluster to 1 timer) could also introduce other unanticipated issues/contention. The recommendation to create only one instance and share it from the netty doc is directed at a per-connection scope, and cluster scope is much less specific/wasteful.

I do think sharing the timer between the Cluster and the Cluster's ChannelFactory might be an ok improvement though. Since we'll be upgrading to Netty 4.x soon and the implementation has changed this all may be moot, but it's something I'll keep in mind while testing.

Won't Do

Assignee

Unassigned

Reporter

Andy Tolbert

Labels

None

PM Priority

None

Affects versions

Fix versions

None

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

External issue ID

None

Components

Priority

Minor