Improve connection initialization times

Description

When creating a Connection, initializeTransport will send a STARTUP request to the Cassandra node and await its response. Depending on the responsiveness of Cassandra, this may cause the thread executing the request to wait some amount of time, delaying work queued up in that thread's executor (like initializing connection pools for other hosts).

This is normally not a concern, but it becomes more impactful when the connection is authenticated. The driver waits for an authentication response from Cassandra, which can take time. In a local CCM cluster, I've observed that Cassandra takes on the order of 60-120ms to respond to an 'AUTH_RESPONSE' with an 'AUTH_SUCCESS' message (it has to use bcrypt to hash the password and compare it with data from the system_auth.users table). This slows down initialization of connection pools while creating sessions by a large factor.

Here are some numbers comparing cluster.connect() time over 1024 attempts using no auth vs. using password auth:

3 node CCM cluster - Cassandra 2.0.13, Driver 2.0.9.2, 2 core connections

3 node CCM cluster w/ Authentication - Cassandra 2.0.13, Driver 2.0.9.2, 2 core connections

Environment

None

Pull Requests

None

Activity

Andy Tolbert
March 28, 2015, 5:36 PM

spans pool creation as well, we would compose the connection creation futures into a pool creation future. The intent is to avoid Cluster.Manager.executor altogether.

Excellent, that sounds great, thanks!

Olivier Michallat
March 28, 2015, 8:44 AM

Referring to your earlier comment:

However, that will only resolve things at a per-host level. So if we are creating pools in parallel, and waiting in a Cluster.Manager.executor thread until all connections are established on a host, we'd only be able to establish n hosts at a time (where n is the size of the executor).

spans pool creation as well, we would compose the connection creation futures into a pool creation future. The intent is to avoid Cluster.Manager.executor altogether.
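The composition described above could be sketched roughly as follows. This is only an illustration using Java 8's CompletableFuture, not the driver's actual code; PoolInitSketch, openConnection and createPool are hypothetical names, and the timer stands in for an asynchronous STARTUP/authentication round trip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class PoolInitSketch {
    // Daemon timer thread that simulates the server answering after a delay.
    static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "fake-io");
                t.setDaemon(true);
                return t;
            });

    // Hypothetical stand-in for opening one connection without blocking the caller.
    static CompletableFuture<String> openConnection(String host, int index, long delayMs) {
        CompletableFuture<String> future = new CompletableFuture<>();
        TIMER.schedule(() -> future.complete(host + "#" + index), delayMs, TimeUnit.MILLISECONDS);
        return future;
    }

    // The pool future completes once every connection future has completed.
    // No thread waits in between, so no executor size limits the parallelism.
    static CompletableFuture<List<String>> createPool(String host, int coreConnections, long delayMs) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int i = 0; i < coreConnections; i++) {
            pending.add(openConnection(host, i, delayMs));
        }
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> {
                    List<String> connections = new ArrayList<>();
                    for (CompletableFuture<String> f : pending) {
                        connections.add(f.join()); // already complete, does not block
                    }
                    return connections;
                });
    }

    public static void main(String[] args) {
        System.out.println(createPool("host1", 2, 80).join());
    }
}
```

With this shape, the per-connection delays overlap, so a pool with several core connections should become ready in roughly the time of one round trip rather than the sum of all of them.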

Andy Tolbert
March 27, 2015, 11:08 PM

Hi Vishy, sure thing. Here are the results going back to 2 core connections and not discarding the cluster every time, with the session warmup:

With Authentication

Without Authentication

The mean time doubled in each case (as connections are created one at a time).

One thing of note is that with my original configuration, NON_BLOCKING_EXECUTOR_SIZE was also greater than the number of hosts I had (4 > 3). Since the pools are created in parallel, if it's the only session in that Cluster, all host pools were created at the same time. If I had 8 hosts, for example, only 4 host pools could be created at a time, as you are aware, which would slow things down even more. I think that is the main thing would not address (although maybe it does and I'm misreading it), do you think it should?
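To make the executor-size concern concrete, here is a rough back-of-the-envelope model, not a measurement. It assumes each host's pool creation occupies one executor thread for its full duration and that connections within a pool are opened one at a time; PoolWaveEstimate and its parameters are hypothetical names, and the 80ms figure is only an illustrative per-connection cost.

```java
final class PoolWaveEstimate {
    // Estimated wall-clock time to create pools for all hosts when each pool
    // creation blocks one executor thread and opens its connections sequentially.
    static long estimateMillis(int hosts, int executorSize, int coreConnections, long perConnectionMs) {
        if (hosts < 0 || executorSize <= 0 || coreConnections < 0 || perConnectionMs < 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Hosts are processed in "waves" of at most executorSize at a time.
        long waves = (hosts + executorSize - 1) / executorSize; // ceiling division
        return waves * coreConnections * perConnectionMs;
    }

    public static void main(String[] args) {
        System.out.println(estimateMillis(3, 4, 2, 80)); // 3 hosts fit in one wave: 160
        System.out.println(estimateMillis(8, 4, 2, 80)); // 8 hosts need two waves: 320
    }
}
```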

Vishy Kasar
March 27, 2015, 9:34 PM

Thanks Andy. These test results make sense. Now I wonder why the first test showed such a big difference between the auth and no-auth cases.

In order to make a proper apples-to-apples comparison, can you now run the first test

1. after discarding the first few session creations
2. by not creating the cluster every time
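A measurement loop along these lines could look like the sketch below. It is a generic timing helper under stated assumptions, not the harness actually used for the numbers in this ticket; ConnectTimer and meanMillis are hypothetical names, and the action would be something like a cluster.connect() call on a Cluster instance that is built once and reused.

```java
final class ConnectTimer {
    // Runs the action `warmup` times without timing it, then returns the mean
    // duration in milliseconds over `measured` timed runs.
    static double meanMillis(Runnable action, int warmup, int measured) {
        if (warmup < 0 || measured <= 0) {
            throw new IllegalArgumentException("warmup must be >= 0 and measured > 0");
        }
        for (int i = 0; i < warmup; i++) {
            action.run(); // discarded: first few session creations
        }
        long totalNanos = 0;
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            action.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / measured;
    }

    public static void main(String[] args) {
        // Placeholder action; in a real test this would create (and close) a session
        // on a shared Cluster rather than sleeping.
        Runnable fakeConnect = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println(meanMillis(fakeConnect, 5, 20) + " ms mean");
    }
}
```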

Andy Tolbert
March 27, 2015, 9:19 PM

Here are my results with your suggestions. As we suspected, initializing the pool now takes roughly the time needed to authenticate the connection, plus some overhead. Another thing I hadn't considered previously was that I was creating the Cluster every time, which added time for initializing the control connection and collecting information about the cluster. After removing that, my mean time went to 171ms.

Reusing the cluster, reducing the core size to 1, and increasing NON_BLOCKING_EXECUTOR_SIZE to greater than the number of hosts, I get the following results:

With Authentication

Without Authentication

This ~80ms of overhead seems consistent with my profiling of Cassandra, so I don't anticipate anything else is getting in the way on the driver side. In my local environment, bcrypt hashing of the password was taking 60-100ms on the Cassandra end. There are two issues open for Cassandra to optimize authentication (CASSANDRA-8085, CASSANDRA-7715).

Duplicate

Assignee

Unassigned

Reporter

Andy Tolbert

Labels

None

PM Priority

None

Affects versions

Fix versions

None

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Components

Priority

Major