Chaining LatencyAwarePolicy and TokenAwarePolicy has different downsides depending on the order of chaining:
LatencyAware(TokenAware(..)) - if all three replicas for a given partition are beyond exclusionThreshold, LatencyAwarePolicy sends them to the back of the line, meaning we end up coordinating through a non-replica node. That coordinator still has to wait on CL replicas, so we incur an unnecessary hop.
TokenAware(LatencyAware(..)) - TokenAware puts the replicas for the partition at the front of the line, regardless of what the childPolicy has decided. So for practical purposes, this chaining behaves the same as TokenAware(..) with LAP excluded completely, at least as long as a replica is available.
In practice what would be ideal is to prefer local replicas to all other coordinators, but within those replicas prefer non-slow nodes. One approach to achieving this would be for TokenAwarePolicy to still prefer local replicas, but retain their ordering from the childPolicy.
For example, say we have nodes A,B,C,D,E,F, where C,D,E are the replicas for a partition, and B & C are considered slow by LatencyAware (that is, they are beyond exclusionThreshold). LAP would return a query plan like A,D,E,F,B,C. In the current implementation, TokenAware would prepend C,D,E, giving us C,D,E,A,F,B. The proposed implementation in this ticket would give us D,E,C,A,F,B: as a replica, C would still be preferred over A,F,B, but D & E would be tried first.
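The reordering in this example can be sketched as a stable partition of the child plan: replicas first, everything else after, each group keeping the order the child policy chose. This is only a rough illustration and not the driver's code. The class RetainOrderSketch and the helper replicasFirst are hypothetical names, hosts are plain strings, and the plan is built eagerly as a list, whereas a real policy would produce its query plan lazily as an iterator over hosts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RetainOrderSketch {
    // Hypothetical helper (not a driver API): moves replicas to the front of
    // the child plan while preserving the relative order the child chose.
    static <T> List<T> replicasFirst(List<T> childPlan, Set<T> replicas) {
        List<T> plan = new ArrayList<>(childPlan.size());
        // First pass: replicas, in child-policy order.
        for (T host : childPlan) {
            if (replicas.contains(host)) {
                plan.add(host);
            }
        }
        // Second pass: all remaining hosts, also in child-policy order.
        for (T host : childPlan) {
            if (!replicas.contains(host)) {
                plan.add(host);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        // Plan as LAP would return it in the example: slow nodes B and C last.
        List<String> lapPlan = Arrays.asList("A", "D", "E", "F", "B", "C");
        Set<String> replicas = new HashSet<>(Arrays.asList("C", "D", "E"));
        System.out.println(replicasFirst(lapPlan, replicas)); // [D, E, C, A, F, B]
    }
}
```

Because both passes walk the child plan in order, whatever the child decided (latency exclusion, round-robin rotation) carries over within each group.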
Given that users tend to make DCAwareRoundRobinPolicy the innermost child policy, the shuffle option in TokenAware becomes redundant when child ordering is retained, at least in terms of balancing requests across replicas.
So I propose that the retain-ordering behavior be the only option in TokenAware, replacing the two behaviors that exist today.