Netty upgrade to 4.x
Description
The Java driver currently uses Netty 3.9. Upgrading to 4.x will bring valuable improvements, such as buffer pooling.
Note: the maven-shade-plugin configuration (introduced by JAVA-538) will have to be updated; in particular, Netty 4 moves the root package from org.jboss.netty to io.netty.
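As a concrete illustration of the pooling improvement, here is a minimal sketch at the raw Netty 4 level (not driver code): a client bootstrap can opt into the pooled allocator so that ByteBufs are drawn from a shared pool instead of being allocated per message.

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.channel.ChannelOption;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioSocketChannel;

// Sketch: enable Netty 4's pooled ByteBuf allocator for a client bootstrap.
// In the 4.0 series pooling is an explicit opt-in; the default allocator is unpooled.
final class PooledAllocatorSketch {
    static Bootstrap newBootstrap() {
        return new Bootstrap()
                .group(new NioEventLoopGroup())
                .channel(NioSocketChannel.class)
                .option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);
    }
}
```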
Activity
We noticed that Netty 4.0 enables TCP_NODELAY by default, where previous versions did not (PR 939). A user has the option to configure TCP_NODELAY in the driver via SocketOptions.setTcpNoDelay; if a user does not set this value, it is left to Netty's default or the default behavior of Java sockets. We are considering whether we should take any action, so I ran some tests with no delay on and off to see if there was any performance difference. Note that the impact of disabling Nagle's algorithm by enabling TCP_NODELAY is very environment-dependent, so I performed this test with a client in the same EC2 region as my cassandra cluster (us-west-1) and with one that wasn't (us-east-1).
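For reference, the driver-level knob looks like this; a minimal sketch assuming the 2.x API, with a placeholder contact point:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.SocketOptions;

public class TcpNoDelayExample {
    public static void main(String[] args) {
        // Pin TCP_NODELAY explicitly instead of relying on the Netty/JDK default.
        SocketOptions socketOptions = new SocketOptions()
                .setTcpNoDelay(true); // false would re-enable Nagle's algorithm

        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withSocketOptions(socketOptions)
                .build();
        try {
            // ... run requests ...
        } finally {
            cluster.close();
        }
    }
}
```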
This test used the same workload as the previous tests documented in the comments below. The environment used was the following:
8 node cassandra cluster running 2.0.13 on c3.2xlarge instances in us-west-1.
Driver clients running on c3.2xlarge instance in each us-east-1 and us-west-1 zones.
Each test configuration was executed for 10 minutes against the java561 branch.
| Client Location | No Delay | Requests Completed | Throughput |
|---|---|---|---|
| us-west-1 | on | 41767342 | 69155/sec |
| us-west-1 | off | 41534979 | 68209/sec |
| us-east-1 | on | 24443146 | 40411/sec |
| us-east-1 | off | 19740311 | 32689/sec |
Interestingly enough, the performance improvement is more apparent for the remote client, with a throughput improvement of roughly 19%. To ensure this wasn't an anomaly, I ran the test several times and observed similar performance. For the local client there is an improvement, but only a marginal one (1.3%). It's possible we were pushing the cassandra cluster to its limits when using a local client, so any increase would not be as noticeable.
My assertion is that Nagle's algorithm (TCP_NODELAY off) is really only beneficial on slow links / high-latency networks (such as a mobile network). I would consider it the more common use case that the client is either in the same datacenter as the cassandra cluster or has a reasonably fast and reliable connection to it.
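For context on where this default lives: at the Netty 4 level, TCP_NODELAY is just a channel option on the client bootstrap. A minimal sketch (illustrative only, not the driver's actual connection code) of pinning it explicitly rather than relying on Netty's default:

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;

// Illustration: whatever SocketOptions resolves to would be applied here as a
// channel option when the connection bootstrap is built.
final class NoDelayBootstrapSketch {
    static Bootstrap create(boolean tcpNoDelay) {
        return new Bootstrap()
                .group(new NioEventLoopGroup())
                .channel(NioSocketChannel.class)
                .option(ChannelOption.TCP_NODELAY, tcpNoDelay)
                .handler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        // protocol handlers would go into ch.pipeline() here
                    }
                });
    }
}
```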
Executed the same test with the client running on an n1-standard-8 instance:
| Test | Requests Completed | Throughput |
|---|---|---|
| 2.0.9.2 | 22000475 | 36667/sec |
| java622 | 21997387 | 36662/sec |
| w/o coalescing | 21221093 | 35368/sec |
| w/ epoll | 21375001 | 35625/sec |
Running with a larger-capacity client instance, the CPU usage is pretty much the same across all the tests (~75-80%). I think this further confirms that I was pushing an n1-standard-1 to its limits previously. Contrary to the previous test, more pressure is being put on the cassandra side with a larger client instance, enough that the cluster is now the bottleneck. Since the 'java622' configuration performs about the same as or better than '2.0.9.2' in both cases, whether the client or the cluster is overtaxed, I think we can conclude that the netty 4.x upgrade does not negatively impact performance.
Verified against the java622 branch that the driver behaves correctly after upgrading to netty 4.0.26.Final. I also ran a mixed-request load duration test in the following environment:
8 node cassandra cluster running 2.0.13 on n1-standard-1 instances.
Driver client running on n1-standard-1 instance.
Each test configuration was executed for 10 minutes.
| Test | Requests Completed | Throughput |
|---|---|---|
| 2.0.9.2 | 10483857 | 17473/sec |
| java622 | 11141451 | 18569/sec |
| w/o coalescing | 4494692 | 7491/sec |
| w/ epoll | 11154753 | 18591/sec |
The 'w/o coalescing' configuration was inhibited by CPU utilization. The client ran on an n1-standard-1 instance, which has only 1 vCPU, and that CPU was completely consumed throughout the test. Without coalescing, a write syscall is made for each individual write, which is very CPU-intensive. All other configurations used about the same amount of CPU (~75-85%). I suspect that if I were to try a larger instance, the w/o and w/ coalescing numbers would be more in line with one another (see the test results in https://datastax-oss.atlassian.net/browse/JAVA-562#icft=JAVA-562). I will validate this tomorrow, but this provides evidence of a tangible benefit of using the coalescing configuration as the default.
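To make the syscall point concrete, here is a minimal, hypothetical Netty 4 sketch (not the driver's actual flusher): write() only appends to the channel's outbound buffer, and flush() is what triggers the actual socket write(s), so batching several writes behind one flush is what cuts the syscall count.

```java
import io.netty.channel.Channel;
import java.util.List;

// Hypothetical helper illustrating the general idea behind write coalescing;
// this is not the driver's actual implementation.
final class CoalescingSketch {
    // One flush per message: roughly one write syscall per request.
    static void flushPerWrite(Channel channel, List<Object> messages) {
        for (Object msg : messages) {
            channel.writeAndFlush(msg);
        }
    }

    // Coalesced: enqueue everything first, then flush once, so the pending
    // buffers go out to the socket in far fewer syscalls.
    static void coalesced(Channel channel, List<Object> messages) {
        for (Object msg : messages) {
            channel.write(msg);
        }
        channel.flush();
    }
}
```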
Using netty-transport-native-epoll did not seem to offer much benefit. I suspect this could be because epoll provides more value when there are many socket connections. With an 8 node cluster, there are only 65 connections (8 per node * 8 nodes + 1 control connection).
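For reference, switching to the native transport is mostly a matter of swapping the event loop group and channel class; a minimal sketch (illustrative only, not the driver's code), assuming netty-transport-native-epoll is on the classpath and the client runs on Linux:

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollSocketChannel;

// Illustration: the NIO event loop group and channel class are simply replaced
// by their epoll counterparts; everything else in the bootstrap stays the same.
final class EpollBootstrapSketch {
    static Bootstrap newEpollBootstrap() {
        return new Bootstrap()
                .group(new EpollEventLoopGroup())
                .channel(EpollSocketChannel.class);
    }
}
```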