Driver is unable to correctly reestablish connection with previously decommissioned node

Description

Hello!

Recently we ran into a very strange driver behaviour.

After the decommissioned node returns to the cluster, the driver starts refreshing node status as expected, but this is followed by an exception.

The exception is repeated every time the driver tries to execute a request on this node, flooding the logs with errors.

Restarting the application resolves the error.

Also, the driver is still able to execute queries.

Steps to reproduce:

  1. Set up a cluster of 3 nodes: 1 DC, 3 racks (1 node in each rack).

  2. (Not sure if related, but in my case all keyspaces have a replication factor of 3.)

  3. Make sure that the driver has established at least one connection to every node by writing/reading data. (Also not sure if related, but operations are executed with LocalQuorum.)

  4. Decommission a node while writing/reading data.

  5. Make sure the driver has removed the decommissioned node.

  6. Return the decommissioned node to the Cassandra ring (remove all of its data before it rejoins).

  7. Wait for the node to finish joining.

  8. The driver will start to throw exceptions.

 

Environment

Cassandra Driver is used under .NET Framework 4.6.1 on Windows Server.

Activity

Joao Reis
July 17, 2019, 8:20 PM

Thanks for the info, I was able to reproduce this. Indeed, this happens because the driver attempts to reuse the old connection pool for the new Host, since the IPs are the same. The fix should be simple to implement, but until it is released there are two ways to avoid this issue:

1 - Restart the application after removing the node. Then everything will work correctly when the node joins again because the connection pool for the previous node will not exist.
2 - Change the IP of the node after it is decommissioned and shut down.
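
To make the explanation above concrete, here is a toy model in Python. It is not the driver's code: the class names (`Host`, `ConnectionPool`, `Session`), the `simulate` helper, the addresses, and the assumption that the stale pool has been shut down are all hypothetical and chosen only for illustration. It shows how a pool map keyed by IP address can hand a rejoined node the pool that belonged to the decommissioned one, and why both workarounds sidestep that.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    address: str        # IP address, unchanged when the same machine rejoins
    host_id: uuid.UUID  # a rejoined node (data wiped) gets a new identity


class ConnectionPool:
    def __init__(self, host):
        self.host = host
        self.closed = False

    def borrow(self):
        if self.closed:
            raise RuntimeError(f"pool for {self.host.address} was shut down")
        return f"connection to {self.host.address}"


class Session:
    def __init__(self):
        self.pools = {}  # keyed by address only (assumption of this model)

    def on_host_removed(self, host):
        pool = self.pools.get(host.address)
        if pool is not None:
            pool.closed = True  # the stale entry stays in the dict

    def execute(self, host):
        # An existing entry for the same address is reused as-is.
        pool = self.pools.setdefault(host.address, ConnectionPool(host))
        return pool.borrow()


def simulate(restart=False, new_address=None):
    session = Session()
    old = Host("10.0.0.3", uuid.uuid4())
    session.execute(old)            # pool created for the original node
    session.on_host_removed(old)    # node decommissioned
    if restart:
        session = Session()         # workaround 1: fresh process, no old pools
    # workaround 2: the node comes back under a different address
    rejoined = Host(new_address or old.address, uuid.uuid4())
    try:
        return session.execute(rejoined)
    except RuntimeError as exc:
        return f"error: {exc}"


print(simulate())                        # same IP, no restart: stale pool is reused
print(simulate(restart=True))            # workaround 1
print(simulate(new_address="10.0.0.4"))  # workaround 2
```

In this model only the first call fails, and it keeps failing on every request to that address, which matches the repeated errors described in the report.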

Лев Димов
July 17, 2019, 9:22 PM

Yes, we’ve come up with the same ways to resolve the issue. Restarting the applications (not to mention the number of services and their replicas that use Cassandra) carries a risk of missing “useful” errors.

Is there any estimate of when the fix might be released?

Thank you!

Joao Reis
July 17, 2019, 9:32 PM

I'll have an update on our plans for the next release before the end of this week.

Лев Димов
July 17, 2019, 9:47 PM

Looking forward to that. Thanks!

Joao Reis
July 19, 2019, 12:55 AM

The next minor release of the C# drivers (OSS 3.11.0 / DSE 2.8.0) will contain this bug fix. The release date will be somewhere around the last week of July.

Fixed

Assignee

Unassigned

Reporter

Лев Димов

Sprint

C# P-NEXT

Priority

Major