Reconnection issues in v3.2.2
Description
Environment
Debian Jessie
Node v6.9.1
driver v3.2.2
Cassandra v2.2.6
Activity
Marko Obrovac July 6, 2017 at 4:35 PM
Makes sense. I will open a new ticket for the start-up behaviour and attach logs with warm-up enabled.
(I don't seem to be able to close this issue myself, so please go ahead and close it.)
Jorge Bay Gondra July 6, 2017 at 2:35 PM
Any updates?
Should we close this one and use a different ticket if there is any unexpected behaviour during startup?
Jorge Bay Gondra June 28, 2017 at 8:26 AM
The logs you shared were not using warmup (or at least it didn't look like it), so my conclusion from the logs is that if you want shorter and more predictable startup periods, you should enable it. This suggestion is more important when using v3.2.2+, as the driver distributes the load between replicas more evenly in some cases since https://datastax-oss.atlassian.net/browse/NODEJS-358#icft=NODEJS-358 was applied. Otherwise, connect() will yield once a single valid connection is established, and you probably don't want that in your case (maybe we should make warmup = true the default in the driver).
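A minimal sketch of the difference described above, using simulated hosts instead of a real cluster: without warm-up, startup completes as soon as the first connection opens; with warm-up, it waits until every host's pool is open. openConnection, openPool, the host list and the latencies are hypothetical; in cassandra-driver the corresponding switch is the client option pooling: { warmup: true }.

```javascript
'use strict';

// Hypothetical stand-in for opening one connection to a host:
// resolves with the host name after latencyMs milliseconds.
function openConnection(host, latencyMs) {
  return new Promise((resolve) => setTimeout(() => resolve(host), latencyMs));
}

// Hypothetical per-host pool: opens `size` connections in parallel and
// resolves once all of them are open.
function openPool(host, latencyMs, size) {
  const connections = [];
  for (let i = 0; i < size; i++) {
    connections.push(openConnection(host, latencyMs));
  }
  return Promise.all(connections).then(() => host);
}

// Without warm-up: resolve once a single connection is open; the remaining
// pools would still have to be filled afterwards, while traffic is flowing.
function connectWithoutWarmup(hosts) {
  return Promise.race(hosts.map((h) => openConnection(h.name, h.latencyMs)));
}

// With warm-up: resolve only when every host's pool is fully open.
// (In cassandra-driver this corresponds to { pooling: { warmup: true } }.)
function connectWithWarmup(hosts, poolSize) {
  return Promise.all(hosts.map((h) => openPool(h.name, h.latencyMs, poolSize)));
}

// Simulated hosts with different connection latencies (made-up values).
const hosts = [
  { name: 'cass1', latencyMs: 10 },
  { name: 'cass2', latencyMs: 30 },
  { name: 'cass3', latencyMs: 50 },
];

connectWithoutWarmup(hosts).then((first) => console.log('without warm-up, ready after', first));
connectWithWarmup(hosts, 2).then((all) => console.log('with warm-up, ready after', all.join(', ')));
```

The trade-off is a slightly longer connect() in exchange for not opening the remaining connections while application requests are already arriving.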
About the CPU and memory usage, we don't see anything out of the ordinary on our end. You can share the output of node --prof-process with us if you see something unexpected during startup.
"We do not issue any requests before the client starts up, i.e. before connect() completes"
I see... My guess would be that the caller (an external package, as the client is exported) is queuing requests, i.e. adding a large number of then() handlers before connect() completes?
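To illustrate the guess above: if a caller chains a then() handler per request onto the same pending connect() promise, nothing runs until connect() resolves, and then every queued request fires in a single burst. The sketch below simulates this with a manually resolved promise; createDeferredConnect, queueRequestsBeforeConnect and demo are hypothetical names, not driver APIs.

```javascript
'use strict';

// Hypothetical stand-in for client.connect(): a promise resolved manually.
function createDeferredConnect() {
  let resolveConnect;
  const ready = new Promise((resolve) => { resolveConnect = resolve; });
  return { ready, resolveConnect };
}

// Simulates a caller issuing `count` requests before connect() completes:
// each request only chains a then() handler onto the same pending promise.
function queueRequestsBeforeConnect(ready, count, executed) {
  for (let i = 0; i < count; i++) {
    ready.then(() => executed.push(i)); // placeholder for client.execute(...)
  }
}

function demo(count) {
  const { ready, resolveConnect } = createDeferredConnect();
  const executed = [];
  queueRequestsBeforeConnect(ready, count, executed);
  const pendingBeforeConnect = executed.length; // nothing has run yet
  resolveConnect();
  // All queued handlers run in the same microtask flush once connect resolves,
  // before this final then() callback is reached.
  return Promise.resolve().then(() => ({
    pendingBeforeConnect,
    executedAfterConnect: executed.length,
  }));
}

demo(1000).then((result) => console.log(result));
// → { pendingBeforeConnect: 0, executedAfterConnect: 1000 }
```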
Marko Obrovac June 27, 2017 at 7:41 PM
"The fix would be to enable warmup"
I did try to enable warm-up, but it didn't seem to change things: CPU utilisation and memory usage are still elevated.
"and start serving app requests only after the client driver is fully connected / all connection pools are created."
I am not sure I follow. We do not issue any requests before the client starts up, i.e. before connect() completes, so I am a bit confused as to what you were referring to.
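One way to follow the suggestion to "start serving app requests only after the client driver is fully connected" is to await connect() before the HTTP server starts listening, so no request can reach the client earlier. This is a sketch assuming a client with a promise-returning connect(); startService and listenHttp are hypothetical helpers, and only the http module calls are real Node.js APIs.

```javascript
'use strict';

const http = require('http');

// Connect the database client first and only then start accepting
// application traffic. `client` is assumed to expose a promise-returning
// connect(); `listen` is a hypothetical callback that starts the server.
async function startService(client, listen) {
  await client.connect(); // with warm-up enabled, this waits for all pools
  return listen();        // requests can only arrive after this point
}

// One possible `listen` implementation using Node's built-in http module.
function listenHttp(port, handler) {
  return new Promise((resolve, reject) => {
    const server = http.createServer(handler);
    server.once('error', reject);
    server.listen(port, () => resolve(server));
  });
}
```

For example, startService(client, () => listenHttp(8080, app)), where client would be a cassandra-driver Client and app the request handler (both names illustrative).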
Jorge Bay Gondra June 26, 2017 at 8:00 AM
Let us know if we can close this issue.
After upgrading our cluster to NodeJS driver v3.2.2, we noticed a regression in the reconnection mechanism. During the initial establishment of connections with the Cassandra nodes, if one of them times out, the process enters an endless reconnection loop (with a reconnection delay of 0 seconds) and consumes 100% of the CPU. There is no such behaviour with v3.2.1, so this is a regression.
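To make the 0-second delay concrete: a reconnection policy is expected to space attempts out, so a constant delay of 0 turns reconnection into a tight loop. Below is a simplified, hypothetical exponential back-off schedule in the spirit of the driver's ExponentialReconnectionPolicy (the real policy's formula and defaults may differ). It also shows how restarting such a schedule after every failure, when the schedule starts with no delay, yields 0 each time. This illustrates the symptom only; it is not a diagnosis of the driver's code.

```javascript
'use strict';

// Hypothetical exponential back-off schedule: yields baseDelay, 2*baseDelay,
// 4*baseDelay, ... capped at maxDelay. If startWithNoDelay is true, the very
// first attempt is immediate, but every later attempt waits.
function* exponentialSchedule(baseDelay, maxDelay, startWithNoDelay) {
  if (startWithNoDelay) {
    yield 0;
  }
  let attempt = 0;
  while (true) {
    yield Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
    attempt++;
  }
}

// Take the first n delays from one schedule that is reused across failures.
function firstDelays(schedule, n) {
  const out = [];
  for (let i = 0; i < n; i++) {
    out.push(schedule.next().value);
  }
  return out;
}

// Illustrative anti-pattern: creating a fresh schedule on every failure
// means a schedule that starts with no delay always returns 0.
function delaysWhenRestartingSchedule(n) {
  const out = [];
  for (let i = 0; i < n; i++) {
    out.push(exponentialSchedule(1000, 10000, true).next().value);
  }
  return out;
}

console.log(firstDelays(exponentialSchedule(1000, 10000, true), 6));
// → [ 0, 1000, 2000, 4000, 8000, 10000 ]
console.log(delaysWhenRestartingSchedule(4));
// → [ 0, 0, 0, 0 ]
```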
Looking at the commits, we suspect that the culprit might be https://github.com/datastax/nodejs-driver/commit/03e82f04239d86a22cf7f0b9fba418729b4f2e8a but have yet to test that theory.