Improve speculative retry by waiting for next response if first response is a failure

Description

If we have a slow request that triggers a speculative retry, and the first response we get back is a failure (maybe we didn't reach CL, or something else went wrong), the driver reports this failure to the application. But the next execution may finish with a successful response, in which case the driver has responded with a failure even though the request was successful.
In some use cases it might make sense to ignore the first response if it is a failure and wait for the next one, provided there is more than one outstanding execution of the request.
I think this could be done by extending the retry policy logic so the retry policy could decide to ignore a response and wait for the next response if there is another ongoing execution and the failure might be non-permanent.
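
The behaviour proposed above can be illustrated outside the driver. The following is only a minimal sketch using plain `CompletableFuture`s in place of the driver's executions; the class `FirstSuccess` and the method `firstSuccess` are hypothetical names and are not part of the driver or of its retry policy API. A failure is held back while another execution is still pending, and it is reported only once every execution has failed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class FirstSuccess {

    // Hypothetical helper: completes with the first successful result.
    // Fails only after every execution has failed, reporting the last failure seen.
    static <T> CompletableFuture<T> firstSuccess(List<CompletableFuture<T>> executions) {
        CompletableFuture<T> result = new CompletableFuture<>();
        if (executions.isEmpty()) {
            result.completeExceptionally(new IllegalArgumentException("no executions"));
            return result;
        }
        // Shared state: number of executions that have not failed yet.
        AtomicInteger pending = new AtomicInteger(executions.size());
        for (CompletableFuture<T> execution : executions) {
            execution.whenComplete((value, error) -> {
                if (error == null) {
                    // First success wins; later completions are no-ops.
                    result.complete(value);
                } else if (pending.decrementAndGet() == 0) {
                    // No execution left that could still succeed.
                    result.completeExceptionally(error);
                }
            });
        }
        return result;
    }

    public static void main(String[] args) {
        CompletableFuture<String> initial = new CompletableFuture<>();
        CompletableFuture<String> speculative = new CompletableFuture<>();
        CompletableFuture<String> result = firstSuccess(Arrays.asList(initial, speculative));

        // The speculative execution fails first: the failure is held back.
        speculative.completeExceptionally(new RuntimeException("unavailable"));
        System.out.println(result.isDone()); // prints "false"

        // The initial execution then succeeds: the caller sees the success.
        initial.complete("rows");
        System.out.println(result.join()); // prints "rows"
    }
}
```

Even in this toy form the executions have to share a counter, which hints at the kind of coordination a real implementation would need between executions.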

Environment

None

Pull Requests

None

Activity

Olivier Michallat
May 17, 2017, 4:16 PM

If there is another ongoing execution we ignore this response and wait for the next one

That's a simple concept, but in practice it will be complicated to implement. Anything that requires executions to coordinate with each other means more synchronization, and possibly additional shared state, in RequestHandler. I would hold off adding more complexity in that central piece of the code unless we have clear evidence of a benefit, and as you said the scenario you describe above is a corner case.

we can't trust that an unavailable error means that no attempt to write was made

If you use the default retry policy, the first retry decision on unavailable is tryNextHost, so the response from node B won't be rethrown to the client.

Tommy Stendahl
May 18, 2017, 7:33 AM

I can start to prepare a pull request, but I have not done much practical coding on this solution and it would require some testing, so I'm not sure it can be done quickly. It might also turn out, as Olivier suggests, that this will be too complicated to implement, in which case I might be better off solving it in my client. I still believe this would be a nicer solution, so I'll give it a try.

Tommy Stendahl
May 30, 2017, 8:39 AM

I've done some work on this and the logic is really simple, but as Olivier pointed out there are concurrency problems that are hard to solve. The code is available in my github branch: java1440.
I don't think it's a good idea to solve this by introducing more synchronization, so I might have to rethink my approach to the problem.

Tommy Stendahl
August 25, 2017, 8:27 AM

I've done some more work on this, but I realize that my approach is a dead end: it would require way too much synchronization. I thought would help here but it really doesn't. So for now I will give up trying to handle this inside the driver and will try to find a way to handle it in my application code instead.
But there is one thing that I find a bit strange: the ExecutionInfo is not included in the exceptions if a request fails, it's only included in the ResultSet. Especially with it would be very helpful to get the ExecutionInfo with the exceptions. Is there a reason why it is like this?

Alexandre Dutra
June 21, 2020, 5:11 PM

This ticket has been closed due to more than 2 years of inactivity, as part of an automatic housekeeping procedure.

If you think that this was inappropriate, feel free to re-open the ticket. If possible, please provide any context that could explain why the issue described in this ticket is still relevant.

Thanks for your understanding.

Won't Do

Assignee

Unassigned

Reporter

Tommy Stendahl

Labels

None

PM Priority

None

Affects versions

Fix versions

None

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Priority

Major