Driver tries to reconnect to a contact point based on its resolved IP, not its original DNS name.

Description

When the Cluster object is initialized and contact points are added, they are kept as IP addresses (List<InetSocketAddress> in Cluster.Manager), even if they were originally specified as DNS names. If the Cassandra node(s) go down and are restarted under a different IP address but the same DNS record, the driver cannot re-establish the connection because it keeps trying the old IP address.

class Manager implements Connection.DefaultResponseHandler {
............................
// Initial contacts point
final List<InetSocketAddress> contactPoints;
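
To illustrate why holding an InetSocketAddress pins the IP, here is a minimal JDK-only sketch (it does not use the driver). The class and method names are hypothetical; only the java.net calls are real. An address built with the public constructor is looked up once at construction time, while createUnresolved keeps just the host name.

```java
import java.net.InetSocketAddress;

// Hypothetical demo class, not part of the driver.
final class ResolvedVsUnresolved {

    // The constructor attempts the DNS lookup immediately; the IP it finds
    // is stored in the object and never refreshed.
    static InetSocketAddress resolveNow(String host, int port) {
        return new InetSocketAddress(host, port);
    }

    // No lookup happens here; only the host name and port are stored.
    static InetSocketAddress keepName(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress pinned = resolveNow("localhost", 9042);
        InetSocketAddress byName = keepName("localhost", 9042);

        System.out.println("pinned: unresolved=" + pinned.isUnresolved()
                + " address=" + pinned.getAddress());
        System.out.println("byName: unresolved=" + byName.isUnresolved()
                + " host=" + byName.getHostString());
    }
}
```

A list of addresses created the first way still points at the old IP after the DNS record changes, which matches the behavior described above.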

This makes Cassandra impractical to use in a clustered environment such as Docker + Kubernetes. We run Cassandra in a Docker container on a Kubernetes cluster, where it is exposed as a Kubernetes Service. Clients in other containers connect to it through the Kubernetes Service name, which is essentially a DNS record for the IP address of the running pod. If the pod crashes and Kubernetes restarts it, it may come back under a different IP address. Because of the limitation above, the client cannot reach it: the driver keeps trying to reconnect to the old IP address instead of going through the Service name.

At the end of the day, if the driver was given a DNS name for a contact point, it should use that name when reconnecting, not the cached IP address.
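
The requested behavior could look roughly like the sketch below. This is not driver code: ReResolvingContactPoint, currentAddress and ip are hypothetical names, and the resolver is passed in as a function so that a DNS change can be simulated without a network. The only point is that the host name is kept and looked up again on every reconnection attempt.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical helper: remembers the DNS name instead of the resolved IP.
final class ReResolvingContactPoint {
    private final String hostName;
    private final int port;
    private final Function<String, InetAddress> resolver;

    ReResolvingContactPoint(String hostName, int port,
                            Function<String, InetAddress> resolver) {
        this.hostName = hostName;
        this.port = port;
        this.resolver = resolver;
    }

    // Called before each (re)connection attempt: resolves the name again.
    InetSocketAddress currentAddress() {
        return new InetSocketAddress(resolver.apply(hostName), port);
    }

    // Builds an IPv4 address from literal octets, without any DNS lookup.
    static InetAddress ip(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(
                    new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Simulated DNS record for the Service name (assumption for the demo).
        AtomicReference<InetAddress> record = new AtomicReference<>(ip(10, 0, 0, 1));
        ReResolvingContactPoint cp =
                new ReResolvingContactPoint("cassandra", 9042, name -> record.get());

        System.out.println(cp.currentAddress().getAddress().getHostAddress());
        record.set(ip(10, 0, 0, 2)); // pod restarted under a new IP
        System.out.println(cp.currentAddress().getAddress().getHostAddress());
    }
}
```

In a real client the resolver would be something like InetAddress.getByName, subject to the JDK's own DNS caching.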

Environment

Docker + Kubernetes on Red Hat Linux

Pull Requests

None

Activity

Jan Bols
July 31, 2018, 10:43 AM

Do you have any workarounds when using Cassandra + k8s?

Typically, Cassandra is deployed as a StatefulSet on k8s. StatefulSets give each Cassandra pod a stable name but not a stable IP address (see https://kubernetes.io/docs/tutorials/stateful-application/cassandra/).

As I understand it, this setup is pretty useless when you want to connect to it using a Java client.

Andy Tolbert
July 31, 2018, 8:54 PM
Edited

I think this is more a limitation of Cassandra and, by association, of the driver's dependence on it for node discovery. After establishing a connection to a contact point, the driver no longer uses the DNS names.

The driver uses the C* nodes' system.peers table to determine which nodes to connect to. The peers table reports inet addresses, not DNS names. As Olivier mentions, as long as you can maintain a connection to an existing node, you can retrieve any topology updates from that node.

Based on user feedback, we have recognized that the system.peers table may not be the be-all and end-all for node discovery, depending on your environment. For the next major version of the Java driver (4.0, currently in beta), node topology management is completely pluggable via TopologyMonitor. This would enable users to write their own custom means of discovering node topology. Topology information could be resolved from a source external to Cassandra, so if you need to swap things out, the driver could be made aware of this without needing to maintain connectivity to existing nodes to retrieve it, although this is a scenario we need to test and explore more.

Olivier Michallat
February 13, 2019, 3:43 AM

Additionally, we added a config option in driver 4 to keep contact point addresses unresolved (JAVA-1978). The address will be re-resolved each time the driver tries to open a new connection (provided that the JDK-level caching settings are also set accordingly), which should help address this use case.
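
For the JDK-level caching part, a minimal sketch is shown below. It uses only the standard networkaddress.cache.ttl security property; the class and method names are hypothetical, and the 30-second value is an arbitrary example. The driver-side option is, to the best of my knowledge, advanced.resolve-contact-points in the driver 4 configuration, but check the reference configuration for your version.

```java
import java.security.Security;

// Hypothetical helper for tuning the JVM's DNS cache; not part of the driver.
final class DnsCacheSettings {

    // Sets how long (in seconds) successful lookups are cached by the JVM
    // and returns the value now in effect in the security properties.
    // The JDK typically reads this policy once, so call it early at startup,
    // before the first host name lookup.
    static String setPositiveTtlSeconds(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
        return Security.getProperty("networkaddress.cache.ttl");
    }

    public static void main(String[] args) {
        // 30 seconds is an assumed example value, not a recommendation.
        System.out.println("networkaddress.cache.ttl = " + setPositiveTtlSeconds(30));
    }
}
```

With a long or unlimited cache TTL the JVM can keep returning the old IP even when the driver asks for a fresh lookup, which is why both settings matter.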

Driver 4.0.0-rc1 will be released a few weeks from now, and we plan to go GA a few more weeks after that, so I'm going to mark this as a duplicate of the other ticket.

Yunzhi Shi
March 28, 2019, 2:14 AM

Does the C++ driver have this fix? Thanks.

Michael Fero
April 3, 2019, 12:57 PM

This has not been resolved at the moment. To track the progress of this issue for the C++ driver, you can navigate to this JIRA issue and click Start watching this issue.

Duplicate

Assignee

Unassigned

Reporter

Dmytro Stepanchuk

Labels

None

PM Priority

None

Reproduced in

None

Affects versions

Fix versions

None

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Components

Priority

Major