reconnection_policy is not enforced in python client?

Description

The issue:

1. I have a single-node Apache Cassandra 3.11.6 cluster.
2. I first write 2750 rows of 10000 columns each in an Apache Airflow DAG task (this succeeds and the data is persisted in Cassandra).
3. Immediately afterwards, I try to connect to Cassandra to perform various reads inside another set of parallel tasks of the same Airflow DAG, and it fails with
`ERROR - ('Unable to connect to any servers', {'10.0.1.135:9042':
OperationTimedOut('errors=None, last_host=None')})`
4. I have configured `reconnection_policy=ConstantReconnectionPolicy(delay=10)` in `connection.setup`, but it seems it is *not* enforced, or I am misreading how the "retry" is supposed to work on the client side.
5. `nodetool status` shows UN, which means Up/Normal.
6. I have multiple DAG tasks running in parallel pulling data from Cassandra. Some of them finish successfully (green), but some fail because of the OperationTimedOut exception.

Log

I don't get retries on failed connections, as you can see in the following Apache Airflow log:

  • it *started* at [2020-07-23 01:06:27,926]

  • and it *errored* at [2020-07-23 01:06:30,951]

which is just 3 seconds. However, in `connection.setup`, I set:

Python code

settings.CASSANDRA is

get_session():

cassandra.yaml

Environment

CentOS 7
Cassandra 3.11.6
Single-node Apache Cassandra cluster

Pull Requests

None

Activity

Alan Boudreault
July 23, 2020, 12:25 PM

Hi,

It is a bit hard to understand what's going on here because I don't know what that DjangoOperator task is (we are not the maintainers of the django-cassandra project). This in the log makes me think you are trying to reuse an existing session or an already initialized Cluster in another process:

Is this the case? If so, this might be the issue: a session per process is required. As for the reconnection policy, it does not kick in until you have an initialized and connected session. From the logs, we can see that it is the first connection (`Cluster.connect`) that fails.
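
To illustrate the two points above, here is a minimal sketch (not the driver's or django-cassandra's own mechanism) of retrying the *initial* connect by hand and keeping one session per process. The helper names `connect_with_retry` and `get_process_session`, and the `connect` callable they take, are hypothetical and made up for this example; the code uses only the standard library so it does not assume anything about the reporter's setup.

```python
import os
import time


def connect_with_retry(connect, attempts=5, delay=10.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call connect() until it succeeds or `attempts` is exhausted.

    `connect` is any zero-argument callable that returns a session
    (hypothetical: in a real setup it would wrap the initial connect).
    Waits a constant `delay` seconds between failed attempts.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # constant delay, no backoff
    raise last_error


# One session per process, keyed by PID so that a forked worker
# never reuses a session object created by its parent.
_sessions = {}


def get_process_session(connect, **retry_kwargs):
    """Return this process's session, creating it on first use."""
    pid = os.getpid()
    if pid not in _sessions:
        _sessions[pid] = connect_with_retry(connect, **retry_kwargs)
    return _sessions[pid]
```

With the DataStax driver, `connect` could be something like `lambda: Cluster(["10.0.1.135"]).connect()` with `retry_on=(NoHostAvailable,)`. That wiring is an assumption about how one might use the sketch, not something taken from the logs. Each parallel Airflow task would then call `get_process_session(...)` inside the task itself rather than at import time.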

Incomplete

Assignee

Unassigned

Reporter

Dmitry Semenov

Fix versions

None

Reproduced in

None

PM Priority

None

External issue ID

None

Doc Impact

None

Reviewer

None

Size

None

Pull Request

None

Affects versions

Priority

Critical