DataStax C# Driver for Apache Cassandra
CSHARP-784

Driver is unable to correctly reestablish connection with previously decommissioned node

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects versions: 3.8.0, 3.10.1
    • Fix versions: 3.11.0, DSE-2.8.0
    • Components: None
    • Labels:
      None
    • Environment:

      Cassandra Driver is used under .NET Framework 4.6.1 on Windows Server.

    • Sprint:
      C# P-NEXT

      Description

      Hello!

      Recently we ran into some very strange driver behaviour.

      After a decommissioned node returns to the cluster, the driver refreshes node status as expected, but an exception follows:

      2019-07-16 16:22:24,432 2222118 INFO  [T-1cd5933a(-)] [Cassandra] [Cassandra.ControlConnection] Refreshing node list
      2019-07-16 16:22:24,432 2222118 INFO  [T-1cd5933a(-)] [Cassandra] [Cassandra.ControlConnection] Received Node status change event: host 10.217.11.94:9042 is UP
      2019-07-16 16:22:24,432 2222118 INFO  [T-1cd5933a(-)] [Cassandra] [Cassandra.ControlConnection] Received status change event for host 10.217.11.94:9042 but it was not found
      2019-07-16 16:22:24,432 2222118 INFO  [T-1cd5933a(-)] [Cassandra] [Cassandra.ControlConnection] Node list retrieved successfully
      2019-07-16 16:22:24,432 2222118 INFO  [T-1cd5933a(-)] [Cassandra] [Cassandra.ControlConnection] Retrieving keyspaces metadata
      2019-07-16 16:22:24,448 2222133 INFO  [T-1cd5933a(-)] [Cassandra] [Cassandra.ControlConnection] Updating keyspaces metadata
      2019-07-16 16:22:24,448 2222133 INFO  [T-1cd5933a(-)] [Cassandra] [Cassandra.MetadataHelpers.ReplicationStrategyFactory] Replication Strategy class name not recognized: LocalStrategy
      2019-07-16 16:22:24,448 2222133 INFO  [T-1cd5933a(-)] [Cassandra] [Cassandra.MetadataHelpers.ReplicationStrategyFactory] Replication Strategy class name not recognized: LocalStrategy
      2019-07-16 16:22:24,448 2222133 INFO  [T-1cd5933a(-)] [Cassandra] [Cassandra.ControlConnection] Rebuilding token map
      2019-07-16 16:22:24,448 2222133 ERROR [T-1cd5933a(-)] [Cassandra] [Cassandra.Session] Exception while trying borrow a connection from a pool
      System.Net.Sockets.SocketException (0x80004005): A request to send or receive data was disallowed because the socket is not connected and (when sending on a datagram socket using a sendto call) no address was supplied
         at Cassandra.HostConnectionPool.<EnsureCreate>d__58.MoveNext()
      --- End of stack trace from previous location where exception was thrown ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
         at Cassandra.HostConnectionPool.<BorrowConnection>d__36.MoveNext()
      --- End of stack trace from previous location where exception was thrown ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
         at Cassandra.Requests.RequestHandler.<GetConnectionFromHost>d__37.MoveNext()
      2019-07-16 16:22:24,463 2222149 INFO  [T-1cd5933a(-)] [Cassandra] [Cassandra.ControlConnection] Finished building TokenMap for 47 keyspaces and 3 hosts. It took 15 milliseconds.

      The exception is repeated every time the driver tries to execute a request on this node, flooding the logs with errors.

      Application restart resolves the error.

      Also, the driver is still able to execute queries.
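
A minimal sketch for triaging a driver log for this symptom, assuming the log is available as plain text lines. The two patterns are copied from the log excerpt above; the function name `triage` and the idea of counting matches are illustrative assumptions, not part of the driver or of this report's tooling.

```python
import re
from collections import Counter

# Patterns copied from the log lines quoted in this report.
UNKNOWN_HOST = re.compile(
    r"Received status change event for host (\S+) but it was not found"
)
BORROW_ERROR = "Exception while trying borrow a connection from a pool"


def triage(lines):
    """Return (hosts reported UP but unknown to the driver, number of borrow errors)."""
    unknown_hosts = Counter()
    borrow_errors = 0
    for line in lines:
        match = UNKNOWN_HOST.search(line)
        if match:
            unknown_hosts[match.group(1)] += 1
        if BORROW_ERROR in line:
            borrow_errors += 1
    return unknown_hosts, borrow_errors


# Two lines taken from the excerpt above, used as sample input.
sample = [
    "2019-07-16 16:22:24,432 2222118 INFO  [T-1cd5933a(-)] [Cassandra] "
    "[Cassandra.ControlConnection] Received status change event for host "
    "10.217.11.94:9042 but it was not found",
    "2019-07-16 16:22:24,448 2222133 ERROR [T-1cd5933a(-)] [Cassandra] "
    "[Cassandra.Session] Exception while trying borrow a connection from a pool",
]
hosts, errors = triage(sample)
print(dict(hosts), errors)
```

A host that appears in the first result while the error count keeps growing would match the behaviour described here; the sketch only reads logs and does not confirm the root cause.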

      Steps to reproduce:

      1. Set up a cluster of 3 nodes: 1 DC, 3 racks (1 node in each rack)
      2. (Not sure if related, but in my case all keyspaces use Replication Factor 3)
      3. Make sure the driver has established at least one connection to every node by writing and reading data (also not sure if related, but operations are executed with LocalQuorum)
      4. Decommission a node while writing/reading data
      5. Make sure the driver has removed the decommissioned node (it logs the line below):
        [Cassandra.Connections.HostConnectionPool] Host decommissioned. Closing pool #32185163 to <host_ip>:9042

      6. Return the decommissioned node to the Cassandra ring (remove all of its data before it joins)
      7. Wait for the node to finish joining
      8. The driver will start to throw exceptions

      UPD: grammar

            People

            • Assignee:
              Unassigned
              Reporter:
              axel12.94 Лев Димов
            • Votes:
              0
              Watchers:
              2