Reconnection issues in v3.2.2

Description

After upgrading our cluster to the NodeJS driver v3.2.2, we noticed a regression in the reconnection mechanism. During the initial establishment of connections with the Cassandra nodes, if one of them times out, the process enters an endless reconnection loop (with a reconnection delay of 0 seconds) and consumes 100% of CPU. This behaviour does not occur with v3.2.1, so it is a regression.

Looking at the commits, we suspect that the culprit might be https://github.com/datastax/nodejs-driver/commit/03e82f04239d86a22cf7f0b9fba418729b4f2e8a but have yet to test that theory.
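
To make the failure mode concrete, here is a minimal sketch contrasting a schedule that always returns a 0 ms reconnection delay with an exponential backoff schedule. It is not the driver's code: the generator names and the base and maximum delay values are hypothetical, chosen only for illustration. With a 0 ms delay, each failed attempt is retried immediately, which would be consistent with the busy loop described above.

```javascript
// Illustrative only: hypothetical schedules, not the driver's implementation.

// A schedule that always yields 0 ms: every failed attempt is retried immediately.
function* zeroDelaySchedule() {
  while (true) {
    yield 0;
  }
}

// Exponential backoff: baseDelayMs, then doubled each time, capped at maxDelayMs.
function* exponentialSchedule(baseDelayMs, maxDelayMs) {
  let delay = baseDelayMs;
  while (true) {
    yield Math.min(delay, maxDelayMs);
    delay = Math.min(delay * 2, maxDelayMs);
  }
}

// Collect the first n delays from a schedule.
function take(schedule, n) {
  const delays = [];
  for (const d of schedule) {
    if (delays.length >= n) break;
    delays.push(d);
  }
  return delays;
}

console.log(take(zeroDelaySchedule(), 3));             // [ 0, 0, 0 ]
console.log(take(exponentialSchedule(1000, 8000), 5)); // [ 1000, 2000, 4000, 8000, 8000 ]
```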

Environment

Debian Jessie
Node v6.9.1
driver v3.2.2
Cassandra v2.2.6

Pull Requests

None

Affects versions

Fix versions

None

Attachments

3

Activity

Marko Obrovac 
July 6, 2017 at 4:35 PM

Makes sense. I will open a new ticket for the start-up and attach logs when warm-up is enabled.

(I don't seem to be able to close this issue myself, so please go ahead and close it.)

Jorge Bay Gondra 
July 6, 2017 at 2:35 PM

Any updates?
Should we close this one and use a different ticket if there is any unexpected behaviour during startup?

Jorge Bay Gondra 
June 28, 2017 at 8:26 AM

The logs you shared were not using warmup (or at least it didn't look like it), so my conclusion from the logs is that, if you want shorter and more predictable startup periods, you should enable it. This suggestion matters more with v3.2.2+, as the driver distributes the load between replicas more evenly in some cases since https://datastax-oss.atlassian.net/browse/NODEJS-358#icft=NODEJS-358 was applied. Otherwise, connect() will yield as soon as a single valid connection is established, which is not what you want in your case (maybe we should make warmup = true the default in the driver).
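
For reference, a minimal sketch of what enabling warmup could look like in the client options. The helper name buildClientOptions, the contact points and the keyspace are hypothetical placeholders; only the pooling.warmup option is the setting discussed above. The driver itself is not loaded here, so the intended usage is shown in comments.

```javascript
// Sketch only: builds the options object. The contact points and keyspace
// passed in below are hypothetical placeholders.
function buildClientOptions(contactPoints, keyspace) {
  return {
    contactPoints,
    keyspace,
    pooling: {
      // Ask the driver to open its connection pools during connect(),
      // instead of yielding after the first valid connection.
      warmup: true
    }
  };
}

const options = buildClientOptions(['10.0.0.1', '10.0.0.2'], 'my_keyspace');

// With the driver installed, the options would be used roughly like this:
//   const cassandra = require('cassandra-driver');
//   const client = new cassandra.Client(options);
//   client.connect().then(() => { /* start serving app requests here */ });
```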

About the CPU and memory usage, we don't see anything out of the ordinary on our end. You can share the output of node --prof-process with us if you see something unexpected during startup.

"We do not issue any requests before the client starts up, i.e. before connect() completes"

I see... My guess would be that the caller (an external package, as it's exported) is queuing, i.e. adding a high number of then() handlers before it completes?
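
A small sketch of the queuing pattern I have in mind, under the assumption that callers chain work on a shared connect promise. fakeConnect is a hypothetical stand-in for client.connect(), and the handler count and delay are arbitrary; nothing here touches the driver.

```javascript
// Hypothetical stand-in for client.connect(): resolves after delayMs.
function fakeConnect(delayMs) {
  return new Promise((resolve) => setTimeout(resolve, delayMs));
}

async function demo() {
  const connected = fakeConnect(50);
  let ran = 0;
  const pending = [];

  // Callers attach handlers while the connection is still being established.
  for (let i = 0; i < 1000; i++) {
    pending.push(connected.then(() => { ran++; }));
  }

  // Nothing has run yet: all 1000 handlers are queued on the pending promise.
  const ranBefore = ran;

  // Once connect resolves, every queued handler runs.
  await Promise.all(pending);
  return { ranBefore, ranAfter: ran };
}
```

All queued handlers become runnable at the same moment the promise resolves, so a large backlog could show up as a burst of CPU and memory use right after startup.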

Marko Obrovac 
June 27, 2017 at 7:41 PM
(edited)

"The fix would be to enable warmup"

I did try to enable warm-up, but it didn't seem to have changed things: CPU utilisation and memory are still elevated.

"and start serving app requests only after the client driver is fully connected / all connection pools are created."

I am not sure I follow. We do not issue any requests before the client starts up, i.e. before connect() completes, so I am a bit confused as to what you were referring to.

Jorge Bay Gondra 
June 26, 2017 at 8:00 AM

Let us know if we can close this issue.

Cannot Reproduce

Created June 13, 2017 at 9:02 PM
Updated July 6, 2017 at 4:36 PM
Resolved July 6, 2017 at 4:36 PM
