TokenAwarePolicy should respect child policy ordering

Description

Chaining LatencyAwarePolicy and TokenAwarePolicy has different downsides depending on the order of chaining:

  • LatencyAware(TokenAware(..)) - if all three replicas for a given partition are beyond exclusionThreshold, LatencyAwarePolicy sends them to the back of the line, meaning we end up coordinating through a non-replica. That coordinator still has to wait on CL replicas, so we incur an unnecessary hop.

  • TokenAware(LatencyAware(..)) - TokenAware puts the replicas for the partition at the front of the line, regardless of what the childPolicy has decided. So for practical purposes, this chaining is the same as TokenAware(..), with LAP having no effect on replica ordering.

In practice what would be ideal is to prefer local replicas to all other coordinators, but within those replicas prefer non-slow nodes. One approach to achieving this would be for TokenAwarePolicy to still prefer local replicas, but retain their ordering from the childPolicy.

For example, say we have nodes A,B,C,D,E,F, where C,D,E are replicas for a partition, and B & C are considered slow nodes by LatencyAware (that is, they are beyond exclusionThreshold). LAP would return a query plan like A,D,E,F,B,C. In the current implementation, TokenAware would prepend C,D,E giving us C,D,E,A,F,B. The proposed implementation in this ticket would give us D,E,C,A,F,B: as a replica, C would still be preferred over A,F,B, but D & E would be preferred first.
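To make the example concrete, here is a minimal sketch of the two orderings using plain strings in place of driver Host objects. The class and method names (ReplicaOrdering, prependReplicas, retainChildOrder) are hypothetical and are not part of the driver; the "current" method only models the behavior as described in this ticket, not the actual TokenAwarePolicy code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: hosts are modeled as strings, not driver Host objects.
class ReplicaOrdering {

    // Current behavior as described in this ticket: replicas come first in the
    // replica list's own order, then the rest of the child plan with replicas skipped.
    static List<String> prependReplicas(List<String> childPlan, List<String> replicas) {
        List<String> plan = new ArrayList<>(replicas);
        for (String host : childPlan) {
            if (!replicas.contains(host)) {
                plan.add(host);
            }
        }
        return plan;
    }

    // Proposed behavior: replicas still come first, but in the order the child
    // policy returned them; non-replicas follow, also in child order.
    static List<String> retainChildOrder(List<String> childPlan, List<String> replicas) {
        List<String> preferred = new ArrayList<>();
        List<String> others = new ArrayList<>();
        for (String host : childPlan) {
            if (replicas.contains(host)) {
                preferred.add(host);
            } else {
                others.add(host);
            }
        }
        preferred.addAll(others);
        return preferred;
    }

    public static void main(String[] args) {
        // Child (LatencyAware) plan from the example: B and C were pushed to the back.
        List<String> childPlan = Arrays.asList("A", "D", "E", "F", "B", "C");
        List<String> replicas = Arrays.asList("C", "D", "E");

        System.out.println(prependReplicas(childPlan, replicas));  // [C, D, E, A, F, B]
        System.out.println(retainChildOrder(childPlan, replicas)); // [D, E, C, A, F, B]
    }
}
```

Note that in this sketch a replica that the child policy omits from its plan is also omitted from the result; whether the real policy should behave that way is a separate design question.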

Given that users tend to make DCAwareRoundRobinPolicy the innermost child policy, the shuffle option in TokenAware becomes redundant when child ordering is retained, at least in terms of balancing requests across replicas.

So I propose that the retain-ordering behavior be the only option in TokenAware, replacing the 2 behaviors that exist today.

Environment

None

Pull Requests

None

Status

Assignee

Unassigned

Reporter

Mike Bulman

Labels

None

PM Priority

A

Affects versions

Fix versions

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Sprint

Priority

Major