Consider backing off to nio if epoll is present but fails to load

Description

Currently the Java driver falls back to NIO if it detects that io.netty.channel.epoll.Epoll is not on the classpath.

However, if the user has Epoll on their classpath but it fails to load because of incompatibilities between the Netty jars on that classpath, the driver fails to initialize a Cluster.

An example of this is when the user has an older version of netty-all on their classpath (possibly pulled in by dependent libraries like Hadoop), alongside a newer version of netty-transport that was pulled in by the driver. Since netty-all is an uber jar of all the Netty libraries, it includes netty-transport-native-epoll. In such a situation, the user will see an exception like this:

A couple of users reported this after upgrading to 3.3.0 (which uses Netty 4.0.47.Final instead of Netty 4.0.44.Final like 3.2.0).

In this case the user doesn't have epoll on their classpath because they explicitly want to use it; it is there only because a dependent library happened to pull it in as a transitive dependency. We should therefore fall back gracefully to NIO and inform the user of this.
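To make the proposed behavior concrete, here is a minimal sketch of such a fallback. It is not the driver's actual code: the TransportSelector class and its Transport enum are hypothetical names invented for this example. It probes the Epoll class reflectively, so it compiles without Netty on the classpath, and it assumes the class exposes the static isAvailable() method that Netty's Epoll provides.

```java
import java.lang.reflect.Method;

/** Hypothetical helper for this sketch; not part of the driver. */
public final class TransportSelector {

    /** Transports the sketch can choose between. */
    public enum Transport { EPOLL, NIO }

    private TransportSelector() {}

    /** Probes Netty's Epoll class by its usual fully qualified name. */
    public static Transport select() {
        return select("io.netty.channel.epoll.Epoll");
    }

    /** Probes the given class name; parameterized so the fallback paths can be exercised. */
    public static Transport select(String epollClassName) {
        try {
            // Class.forName also runs the static initializer, which is where
            // mismatched Netty jars typically surface as a LinkageError.
            Class<?> epoll = Class.forName(epollClassName);
            Method isAvailable = epoll.getMethod("isAvailable");
            if (Boolean.TRUE.equals(isAvailable.invoke(null))) {
                return Transport.EPOLL;
            }
            // A real driver would use its logger rather than System.err.
            System.err.println("Epoll is on the classpath but reports unavailable; falling back to NIO");
        } catch (ClassNotFoundException e) {
            // Epoll is simply absent: the existing, silent fallback case.
        } catch (ReflectiveOperationException | LinkageError | RuntimeException e) {
            // Epoll is present but unusable: fall back and tell the user why.
            System.err.println("Epoll is on the classpath but failed to load (" + e
                    + "); falling back to NIO");
        }
        return Transport.NIO;
    }
}
```

As the comment from Andy Tolbert further down notes, a load-time check like this only covers failures that happen while the class loads; it cannot catch epoll code that loads fine and fails later in the event loop.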

Environment

None

Pull Requests

None

Activity

Alex Dutra
June 21, 2020, 5:51 PM

This ticket has been closed on 2020-06-22 due to prolonged inactivity, as part of an automatic housekeeping procedure.

If you think that this was inappropriate, feel free to re-open the ticket. If possible, please provide any context that could explain why the issue described in this ticket is still relevant.

Also, please note that enhancements and feature requests cannot be accepted anymore for legacy driver versions (OSS 1.x, 2.x, 3.x and DSE 1.x).

Thank you for your understanding.

Andy Tolbert
August 18, 2017, 8:19 PM

I think the best solution would be to stop attempting to load Epoll by default. The issue described here is only one of the ways this can fail, and it has also failed in less obvious ways for users, e.g. the epoll code loads and works for a time, then throws an unexpected exception in the event loop. It would be difficult to handle this in all cases.

I propose the following:

  • version 4.0+: Don't attempt to load the epoll code at all. Since we let users override the event loop themselves, this is something they can opt into. The benefits of netty-transport-native-epoll are not very apparent for most users and it has caused a lot of issues, so it shouldn't be the default behavior.

  • version 3.x: Leave things as is and add a FAQ entry describing the issue. I think changing the default in a minor/patch version is not the best idea, since this has always worked transparently (when it worked), so we shouldn't change it for the sake of changing it.
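As a rough illustration of the opt-in idea for 4.0+, the sketch below treats epoll as off unless the user explicitly asks for it. The TransportPreference class and the example.driver.useEpoll property name are invented for this example; neither is an actual driver setting.

```java
import java.util.Properties;

/** Hypothetical helper for this sketch; not part of the driver. */
public final class TransportPreference {

    // Hypothetical property name, invented for this example.
    public static final String USE_EPOLL_PROPERTY = "example.driver.useEpoll";

    private TransportPreference() {}

    /** Returns true only when the user explicitly opted in; the default is NIO. */
    public static boolean epollRequested(Properties props) {
        return Boolean.parseBoolean(props.getProperty(USE_EPOLL_PROPERTY, "false"));
    }
}
```

With this shape, a transitive netty-all dependency can never change which transport is used; only an explicit user choice can.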

Won't Do

Assignee

Unassigned

Reporter

Andy Tolbert