In Akka we have some basic performance tests for our persistence library, which is built on top of the Cassandra driver.
We noticed a 40% drop in write throughput when upgrading from 4.3 to 4.5.
I ran git bisect on the driver to find the commit that introduced the regression:
git bisect good
24757424b70b3e7bd889e94e8d1acf313ba70fec is the first bad commit
commit 24757424b70b3e7bd889e94e8d1acf313ba70fec
Author: olim7t <omichallat+github@gmail.com>
Date: Mon Feb 3 16:22:59 2020 -0800
JAVA-2637: Bump Netty to 4.1.45
I've also confirmed this by running 4.5.0 with Netty overridden to 4.1.39.Final.
I will log a Netty issue for this, but I'm trying to come up with a simpler reproducing case.
Ideally I'd like something that involves just the driver, not Akka Persistence. However, simply executing queries does not easily reproduce the issue; I think it might have to do with the amount of work done in future callbacks.
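On the callback hypothesis: with CompletableFuture, a non-async callback registered before the future completes runs on whichever thread completes it. For the driver's CompletionStage results that is presumably one of its Netty I/O threads, so heavy work in such callbacks could compete with other tasks on that thread. The minimal sketch below uses only JDK executors; the thread names io-loop and app-pool and the class name are hypothetical stand-ins, and no driver code is involved.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: which thread runs a CompletableFuture callback?
public class CallbackThreadSketch {

    static String[] run() {
        // Stand-in for the driver's I/O thread that completes responses.
        ExecutorService ioLoop = Executors.newSingleThreadExecutor(r -> new Thread(r, "io-loop"));
        // Stand-in for an application-owned pool.
        ExecutorService app = Executors.newSingleThreadExecutor(r -> new Thread(r, "app-pool"));
        try {
            CompletableFuture<String> response = new CompletableFuture<>();
            // Registered while the future is still incomplete: runs on the completing thread.
            CompletableFuture<String> syncCb =
                response.thenApply(v -> Thread.currentThread().getName());
            // Async variant: always runs on the supplied executor.
            CompletableFuture<String> asyncCb =
                response.thenApplyAsync(v -> Thread.currentThread().getName(), app);
            // Complete the "response" from the I/O thread.
            ioLoop.execute(() -> response.complete("row"));
            return new String[] { syncCb.join(), asyncCb.join() };
        } finally {
            ioLoop.shutdown();
            app.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] names = run();
        System.out.println("thenApply ran on:      " + names[0]); // io-loop
        System.out.println("thenApplyAsync ran on: " + names[1]); // app-pool
    }
}
```

If the callback work is what matters, moving it to an async variant with an application executor would be one way to test that in a driver-only reproduction.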
I'm not very familiar with Akka, could you help me understand the query execution model? From what I've gathered so far:
in CassandraLoadTypedSpec, the main iteration is in testThroughput:
the processor behavior is implemented (mocked?) in object Processor, the message ends up in this method:
from there it gets murky, but I see methods like CassandraJournal.writeMessages, which is probably what ends up executing the query.
Any news on this front? We are trying to narrow down the scope of code involved in the regression, but we could use your help (see Olivier’s last comment).
We have a reproduction in pure Java, see this cassandra-user ML thread.
I'm raising the priority on this, we'll most likely release a patch version in the next few days to downgrade Netty to 4.1.43. We'll also raise an issue with the Netty project.
Upon further investigation, it's not obvious that Netty is to blame, at least not directly. The driver does its own message coalescing in an attempt to limit the number of I/O syscalls, see DefaultWriteCoalescer. If I replace the current implementation with a "no-op" one (flush after every write), 4.1.43 and 4.1.45 are back to the same order of magnitude.
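To illustrate what coalescing buys, here is a toy model in which each flush stands in for one syscall. It is not the driver's DefaultWriteCoalescer: the class and method names are hypothetical, and for simplicity it batches by a fixed count rather than by event loop scheduling. It compares the "no-op" strategy (flush after every write) with flushing once per group of buffered writes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of write coalescing; NOT the driver's DefaultWriteCoalescer.
// Each flush() stands in for one write/flush syscall on the socket.
public class CoalescingSketch {

    static final class Channel {
        final List<String> pending = new ArrayList<>();
        int flushes = 0;

        void write(String msg) {
            pending.add(msg); // buffered only, nothing hits the socket yet
        }

        void flush() {
            if (pending.isEmpty()) {
                return; // nothing buffered: no syscall
            }
            pending.clear();
            flushes++;
        }
    }

    // "No-op" coalescer: flush after every write, one syscall per message.
    static int flushPerWrite(List<String> msgs) {
        Channel ch = new Channel();
        for (String m : msgs) {
            ch.write(m);
            ch.flush();
        }
        return ch.flushes;
    }

    // Coalescing: buffer writes and flush once per group of up to maxBatch messages.
    static int coalesced(List<String> msgs, int maxBatch) {
        Channel ch = new Channel();
        int inBatch = 0;
        for (String m : msgs) {
            ch.write(m);
            if (++inBatch == maxBatch) {
                ch.flush();
                inBatch = 0;
            }
        }
        ch.flush(); // send the remainder, if any
        return ch.flushes;
    }

    public static void main(String[] args) {
        List<String> msgs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            msgs.add("query-" + i);
        }
        System.out.println("flush per write: " + flushPerWrite(msgs)); // 100
        System.out.println("coalesced (16):  " + coalesced(msgs, 16)); // 7
    }
}
```

With 100 messages, flushing per write costs 100 flushes, while groups of 16 cost 7 (six full groups plus the remainder). The real coalescer only wins when several writes are actually pending at flush time.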
It looks like something changed in the way the event loop handles scheduled tasks, and that doesn't play well with our coalescer implementation. I suspect this line in particular.
I should also mention that both examples are a bit contrived: executing synchronous requests in a loop means that the coalescer only ever handles one write at a time, which doesn't give it a chance to do its job. If I parallelize the load across multiple client threads, the problem immediately goes away. A performance drop on a basic example is still a bad look, though, so I will keep investigating to see how we can adapt the coalescer code.
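A sketch of why a synchronous loop starves the coalescer, under a simplified model: writes go into a queue, and at most one flush task at a time is scheduled on a single-thread executor standing in for the event loop. The names Coalescer, sequential and busyLoop are hypothetical, and this is not how the driver or Netty implement it. In the sequential case the next write is only issued after the previous one has been flushed, so every flush carries exactly one message. In busyLoop the event loop is held busy with a latch (a deterministic stand-in for load from concurrent clients), writes pile up, and one flush drains them all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of an event-loop-scheduled coalescer (not the driver's code).
public class EventLoopCoalescerSketch {

    static final class Coalescer {
        private final ExecutorService eventLoop;
        private final Queue<CompletableFuture<Void>> pending = new ConcurrentLinkedQueue<>();
        private final AtomicBoolean flushScheduled = new AtomicBoolean(false);
        final List<Integer> batchSizes = new CopyOnWriteArrayList<>(); // one entry per "syscall"

        Coalescer(ExecutorService eventLoop) {
            this.eventLoop = eventLoop;
        }

        CompletableFuture<Void> write() {
            CompletableFuture<Void> done = new CompletableFuture<>();
            pending.add(done);
            // Schedule at most one flush task at a time on the event loop.
            if (flushScheduled.compareAndSet(false, true)) {
                eventLoop.execute(this::flush);
            }
            return done;
        }

        private void flush() {
            // Reset first so a write arriving during the drain schedules a new flush.
            flushScheduled.set(false);
            List<CompletableFuture<Void>> batch = new ArrayList<>();
            CompletableFuture<Void> f;
            while ((f = pending.poll()) != null) {
                batch.add(f);
            }
            if (batch.isEmpty()) {
                return;
            }
            batchSizes.add(batch.size());
            batch.forEach(w -> w.complete(null)); // "response" for each flushed write
        }
    }

    // Synchronous client: wait for each write before issuing the next one.
    static List<Integer> sequential(int n) {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            Coalescer c = new Coalescer(loop);
            for (int i = 0; i < n; i++) {
                c.write().join();
            }
            return new ArrayList<>(c.batchSizes);
        } finally {
            loop.shutdown();
        }
    }

    // Event loop busy while n writes are issued without waiting.
    static List<Integer> busyLoop(int n) {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            Coalescer c = new Coalescer(loop);
            CountDownLatch release = new CountDownLatch(1);
            loop.execute(() -> {
                try {
                    release.await(); // stand-in for other work on the event loop
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            List<CompletableFuture<Void>> writes = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                writes.add(c.write());
            }
            release.countDown();
            CompletableFuture.allOf(writes.toArray(new CompletableFuture[0])).join();
            return new ArrayList<>(c.batchSizes);
        } finally {
            loop.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + sequential(5)); // [1, 1, 1, 1, 1]
        System.out.println("busy loop:  " + busyLoop(5));   // [5]
    }
}
```

In this model the sequential client pays one flush per request no matter how the flush is scheduled, so any extra latency in running the scheduled flush task is paid on every single request.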
The fix is merged. Here is a final run validating it on the Akka Persistence test:
I'll proceed to release 4.6.1.