DowngradingConsistencyRetryPolicy does not work with EACH_QUORUM when 1 DC is down

Description

When a query is made with EACH_QUORUM and 1 DC is down, C* may return an UnavailableException with 0 alive replicas, i.e.:

In this case there are other alive replicas, just not in one of the datacenters. Because of this, DowngradingConsistencyRetryPolicy does not retry, as it sees 0 alive replicas.

In this case it would be nice if it retried with a lower CL. I'm not certain which CL it should try, since Downgrading only retries once; maybe it should go off the number of required replicas?


Resolution: we decided to downgrade to ONE when the initial CL was EACH_QUORUM and Cassandra replies with 0 live replicas. This is somewhat consistent with what the policy does for other cases, i.e. it infers the retry CL only from the number of replicas alive in the DC that failed.
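To make the rule concrete, here is a minimal, self-contained sketch of the decision described above. It is not the driver's code: the class `DowngradeSketch`, the `Level` enum and the `retryLevel` method are hypothetical names, and the THREE/TWO/ONE staircase for non-zero alive counts is an assumption about how the policy handles the other cases. Only the EACH_QUORUM branch comes directly from this ticket.

```java
import java.util.Optional;

// Hypothetical model of the downgrade rule; these are NOT the driver's types.
final class DowngradeSketch {
    enum Level { ONE, TWO, THREE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL }

    /**
     * Returns the consistency level to retry with, or empty to rethrow.
     * aliveReplicas is the count reported in the UnavailableException.
     */
    static Optional<Level> retryLevel(int aliveReplicas, Level initial) {
        // Assumed staircase: pick the highest simple level the alive count can satisfy.
        if (aliveReplicas >= 3) return Optional.of(Level.THREE);
        if (aliveReplicas == 2) return Optional.of(Level.TWO);
        if (aliveReplicas == 1) return Optional.of(Level.ONE);
        // With EACH_QUORUM, 0 alive only describes the DC that failed;
        // replicas in other DCs may still be up, so retry at ONE.
        if (initial == Level.EACH_QUORUM) return Optional.of(Level.ONE);
        // For any other level, 0 alive replicas means there is nothing to retry against.
        return Optional.empty();
    }
}
```

With this sketch, `retryLevel(0, Level.EACH_QUORUM)` yields ONE, while `retryLevel(0, Level.QUORUM)` yields an empty result, i.e. the exception is rethrown.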

This might seem like a big jump, since we retry with a very low CL when possibly many more replicas were alive (LOCAL_QUORUM or even QUORUM might work in specific situations). For cases where this is not acceptable, we recommend writing a custom policy with the desired behavior.
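As an illustration of what such a custom rule could look like, the sketch below retries once at LOCAL_QUORUM instead of ONE. Everything in it is hypothetical (`LocalQuorumFallbackSketch`, `Level`, `retryLevel`, and the choice of LOCAL_QUORUM). It models only the decision; a real policy would put equivalent logic in the driver's unavailable callback, assumed here to be `RetryPolicy.onUnavailable`.

```java
import java.util.Optional;

// Hypothetical decision rule for a custom policy; NOT the driver's types.
final class LocalQuorumFallbackSketch {
    enum Level { ONE, LOCAL_QUORUM, QUORUM, EACH_QUORUM }

    /**
     * Returns the level to retry with, or empty to rethrow.
     * retryCount is the number of retries already attempted for this query.
     */
    static Optional<Level> retryLevel(int aliveReplicas, Level initial, int retryCount) {
        // Retry at most once, mirroring the "only retries once" behavior.
        if (retryCount > 0) return Optional.empty();
        // 0 alive under EACH_QUORUM may just mean one remote DC is down,
        // so a quorum in the local DC might still be reachable.
        if (initial == Level.EACH_QUORUM && aliveReplicas == 0) {
            return Optional.of(Level.LOCAL_QUORUM);
        }
        // All other cases are left to rethrow in this sketch.
        return Optional.empty();
    }
}
```

Whether LOCAL_QUORUM can succeed depends on the local datacenter being one of those still up. If the DC that is down is the coordinator's own, the retry will fail again.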

Environment

None

Pull Requests

None

Status

Assignee

Alexandre Dutra

Reporter

Andy Tolbert

Labels

None

PM Priority

None

Reproduced in

None

Affects versions

None

Fix versions

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Sprint

Priority

Major