Improve connection initialization times
Description
When creating a Connection, initializeTransport sends a STARTUP request to the Cassandra node and awaits its response. Depending on how responsive Cassandra is, this can block the executing thread for a noticeable amount of time, delaying work queued on that thread's executor (such as initializing connection pools for other hosts).
This is normally not a concern, but becomes more impactful when the connection is authenticated. The driver will wait for an authentication response from Cassandra, which can take time. I've observed in a local CCM cluster that it takes Cassandra on the order of 60-120ms to respond to an 'AUTH_RESPONSE' with an 'AUTH_SUCCESS' message (it has to use bcrypt to hash the password and compare it with data from the system_auth.users table). This slows down initialization of connection pools while creating sessions by a large factor.
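To illustrate the effect, here's a minimal, self-contained sketch (not driver code; the host count, executor size, and 100ms sleep are stand-ins) of how a blocking handshake serializes initialization when the executor is smaller than the number of hosts:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingInitSketch {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for Cluster.Manager.executor: a small fixed-size pool.
        ExecutorService executor = Executors.newFixedThreadPool(2);
        CountDownLatch done = new CountDownLatch(6);
        long start = System.nanoTime();
        for (int host = 0; host < 6; host++) {
            final int h = host;
            executor.execute(() -> {
                try {
                    // Stand-in for the blocking STARTUP + AUTH round trips.
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.printf("host %d pool ready at %d ms%n", h,
                        (System.nanoTime() - start) / 1_000_000);
                done.countDown();
            });
        }
        done.await(); // hosts finish in waves of 2: ~100ms, ~200ms, ~300ms
        executor.shutdown();
    }
}
```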
Here are some numbers comparing cluster.connect() time over 1024 attempts using no auth vs. using password auth (a harness sketch follows the two configurations):
3 node CCM cluster - Cassandra 2.0.13, Driver 2.0.9.2, 2 core connections
3 node CCM cluster w/ Authentication - Cassandra 2.0.13, Driver 2.0.9.2, 2 core connections
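For reference, a hypothetical harness roughly the shape of what produced these numbers; the contact point, credentials, and output format are placeholders:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ConnectBenchmark {
    public static void main(String[] args) {
        for (int i = 0; i < 1024; i++) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")                 // placeholder contact point
                    // .withCredentials("cassandra", "cassandra") // password-auth variant
                    .build();
            long start = System.nanoTime();
            Session session = cluster.connect();                  // the call being timed
            System.out.println("connect " + i + ": "
                    + (System.nanoTime() - start) / 1_000_000 + " ms");
            session.close();
            cluster.close();
        }
    }
}
```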
Environment
Pull Requests
Activity
Excellent, that sounds great, thanks!
Referring to your earlier comment:
However, that will only resolve things at a per-host level. So if we are creating pools in parallel, and waiting in a Cluster.Manager.executor thread until all connections are established on a host, we'd only be able to establish n hosts at a time (where n is the size of the executor).
Since the change spans pool creation as well, we would compose the connection creation futures into a pool creation future. The intent is to avoid Cluster.Manager.executor altogether.
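To make the idea concrete, a sketch of that composition using Guava's futures (which the 2.0 driver already depends on); openConnectionAsync and its resolve-immediately behavior are hypothetical stand-ins, not the actual driver change:

```java
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.SettableFuture;

import java.util.ArrayList;
import java.util.List;

class PoolInitSketch {
    // Hypothetical async primitive: the returned future resolves once the
    // connection's STARTUP/AUTH exchange completes on the I/O thread.
    static ListenableFuture<String> openConnectionAsync(String host, int i) {
        SettableFuture<String> f = SettableFuture.create();
        f.set(host + "#" + i); // stand-in: resolve immediately
        return f;
    }

    // The pool-creation future completes when all connection futures do;
    // no executor thread ever blocks waiting on a handshake.
    static ListenableFuture<List<String>> createPoolAsync(String host, int coreConnections) {
        List<ListenableFuture<String>> connections = new ArrayList<>();
        for (int i = 0; i < coreConnections; i++) {
            connections.add(openConnectionAsync(host, i));
        }
        return Futures.allAsList(connections);
    }
}
```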
Hi Vishy, sure thing. Here are the results going back to 2 core connections and not discarding the cluster every time, with the session warmup:
With Authentication
Without Authentication
The mean time doubled in each case (as connections are created one at a time).
One thing of note: in my original configuration, NON_BLOCKING_EXECUTOR_SIZE was also greater than the number of hosts I had (4 > 3). Since the pools are created in parallel when it's the only session in that Cluster, all host pools were created at the same time. If I had 8 hosts, for example, as you are aware only 4 host pools could be created at a time, which would slow things down even more. I think that is the main thing the change would not address (although maybe it does and I'm misreading it). Do you think it should?
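For anyone reproducing this: with h hosts and an executor of size n, pools come up in ceil(h/n) waves, each paying the full STARTUP+AUTH latency. A sketch of raising the size; the system property name is my assumption about what the 2.0 driver reads (it defaults to the number of available cores):

```java
public class ExecutorSizing {
    public static void main(String[] args) {
        // e.g. 8 hosts with an executor of size 4 means ceil(8/4) = 2 waves.
        // Set the property before building the Cluster so it takes effect,
        // or pass -Dcom.datastax.driver.NON_BLOCKING_EXECUTOR_SIZE=8 on the
        // command line instead.
        System.setProperty("com.datastax.driver.NON_BLOCKING_EXECUTOR_SIZE", "8");
    }
}
```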
Thanks Andy. These test results make sense. Now I wonder why the first test showed such a big difference between the auth and no-auth cases.
In order to make a proper apples-to-apples comparison, can you now re-run the first test (see the sketch after this list):
1. after discarding the first few session creations
2. by not creating the cluster every time
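Something along these lines (a sketch only; the contact point, warmup count, and iteration count are placeholders):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class WarmedUpBenchmark {
    public static void main(String[] args) {
        // Build the Cluster once and reuse it for every measurement.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        for (int i = 0; i < 10; i++) {       // 1. discard the first few connects
            cluster.connect().close();
        }
        for (int i = 0; i < 1024; i++) {     // 2. measured runs, same Cluster
            long start = System.nanoTime();
            Session session = cluster.connect();
            System.out.println((System.nanoTime() - start) / 1_000_000 + " ms");
            session.close();
        }
        cluster.close();
    }
}
```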
Here are my results with your suggestions. As we suspected, initializing the pool now takes roughly the time needed to authenticate the connection, plus some overhead. Another thing I hadn't considered previously: I was creating the Cluster every time, which added time for initializing the control connection and collecting information about the cluster. After removing that, my mean time dropped to 171ms.
Reusing the cluster, reducing the core size to 1, and increasing NON_BLOCKING_EXECUTOR_SIZE to be greater than the number of hosts, I get the following results:
With Authentication
Without Authentication
This ~80ms of overhead seems consistent with my profiling of Cassandra, so I don't anticipate anything else getting in the way on the driver side. In my local environment, bcrypt hashing of the password was taking 60-100ms on the Cassandra end. There are two issues open against Cassandra to optimize authentication (CASSANDRA-8085, CASSANDRA-7715).
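For a rough sense of that server-side cost, a standalone jBCrypt timing sketch; I'm assuming Cassandra's PasswordAuthenticator uses 2^10 salt rounds, and the credential strings are placeholders (requires the org.mindrot jbcrypt artifact):

```java
import org.mindrot.jbcrypt.BCrypt;

public class BcryptCost {
    public static void main(String[] args) {
        // Hash once up front, the way a stored credential would be.
        String hashed = BCrypt.hashpw("cassandra", BCrypt.gensalt(10));
        long start = System.nanoTime();
        BCrypt.checkpw("cassandra", hashed); // one auth-time comparison
        System.out.printf("checkpw took %d ms%n",
                (System.nanoTime() - start) / 1_000_000);
    }
}
```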