LoadBalancingPolicy.distance() called before init()

Description

In rare cases, LoadBalancingPolicy.distance() gets called before init(). We have a custom load balancing policy whose distance() assumes that some data structures were initialized in init(). If distance() is called before init(), those structures are not initialized and we get an NPE.

The log for one such case in production is attached.

at mme.cassandraclient.CustomPolicy.distance(CustomPolicy.java:214)
at com.datastax.driver.core.Cluster$Manager.onAdd(Cluster.java:1748)
at com.datastax.driver.core.Cluster$Manager.access$1200(Cluster.java:1103)
at com.datastax.driver.core.Cluster$Manager$7.runMayThrow(Cluster.java:1717)
at com.datastax.driver.core.ExceptionCatchingRunnable.run(ExceptionCatchingRunnable.java:32)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

We had a similar error before: JAVA-613: LoadBalancingPolicy onUp/onDown called before init
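
Until the driver side is fixed, a custom policy could guard against this ordering itself. The following is only a minimal sketch under stated assumptions: GuardedPolicy, its localHosts field, and the Distance enum are hypothetical stand-ins (the real CustomPolicy is not shown in this ticket, and hosts are plain strings here rather than driver Host objects). Returning IGNORED before init() is an arbitrary choice for illustration; which default is safe depends on the policy.

```java
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in for the driver's HostDistance, defined here so the sketch compiles on its own. */
enum Distance { LOCAL, IGNORED }

/** Hypothetical policy sketch: tolerates distance() being called before init(). */
final class GuardedPolicy {
    // Stays null until init() runs, mirroring the state that triggers the NPE.
    private volatile Set<String> localHosts;

    /** Builds the data structure that distance() depends on. */
    void init(Collection<String> hosts) {
        Set<String> set = ConcurrentHashMap.newKeySet();
        set.addAll(hosts);
        localHosts = set; // publish only after the set is fully populated
    }

    /** Returns a default instead of dereferencing null when init() has not run yet. */
    Distance distance(String host) {
        Set<String> snapshot = localHosts;
        if (snapshot == null) {
            return Distance.IGNORED; // assumption: ignoring the host is an acceptable default
        }
        return snapshot.contains(host) ? Distance.LOCAL : Distance.IGNORED;
    }
}
```

This only avoids the NPE; it does not address the ordering problem in the driver itself.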


Fix: buffer control connection events until the initialization is complete, and then replay them.
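
The buffer-and-replay idea can be sketched roughly as follows. This is not the driver's actual implementation: BufferingEventGate and its method names are hypothetical, and events are modeled as plain Runnables rather than control connection messages.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical helper, not part of the driver: holds events back until initialization completes. */
final class BufferingEventGate {
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private boolean initialized = false;

    /** Runs the event immediately once initialized; otherwise buffers it for later replay. */
    synchronized void submit(Runnable event) {
        if (initialized) {
            event.run();
        } else {
            pending.add(event);
        }
    }

    /** Marks initialization as complete and replays buffered events in arrival order. */
    synchronized void markInitialized() {
        initialized = true;
        Runnable event;
        while ((event = pending.poll()) != null) {
            event.run();
        }
    }
}
```

In this sketch, events run while the lock is held, which keeps ordering simple at the cost of blocking concurrent submitters during replay. A real implementation would likely hand events off to an executor instead.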

Environment

None

Pull Requests

None

Activity

Olivier Michallat
October 21, 2015, 10:10 AM

That's JAVA-954.

Vishy Kasar
October 19, 2015, 9:30 PM

I still see the issue in 2.0.11:

at CustomLBP.distance(CustomLBP)
at com.datastax.driver.core.Cluster$Manager.onAdd(Cluster.java:1901)
at com.datastax.driver.core.Cluster$Manager.access$1400(Cluster.java:1242)
at com.datastax.driver.core.Cluster$Manager$3.onReconnection(Cluster.java:1783)
at com.datastax.driver.core.AbstractReconnectionHandler.run(AbstractReconnectionHandler.java:126)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Olivier Michallat
May 29, 2015, 9:18 AM

This relates to since buffering events will probably be similar to debouncing them.

Olivier Michallat
May 29, 2015, 9:16 AM

I think this might be caused by a server-side UP event received on the control connection while it is initializing.
That would trigger an onAdd that calls distance to check if the host is ignored.

One way to solve this would be to buffer events until the initialization is complete, and then replay them.

Fixed

Assignee

Alex Dutra

Reporter

Vishy Kasar

Labels

None

PM Priority

None

Reproduced in

2.0.9.2

Affects versions

Fix versions

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Components

Priority

Major