The server limits the number of connections to 5. The client bombards the server over one host. At some point the control connection is marked down and re-established, and the following error is logged:
2014-11-13 14:42:33 ERROR Cluster:1564 - Unknown error during reconnection to /ip1:9042, scheduling retry in 16000 milliseconds
java.lang.IllegalArgumentException: rpc_address is not a column defined in this metadata
at com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
at com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
at com.datastax.driver.core.ArrayBackedRow.getInet(ArrayBackedRow.java:239)
at com.datastax.driver.core.ControlConnection.refreshNodeInfo(ControlConnection.java:438)
at com.datastax.driver.core.Cluster$Manager$5.onReconnection(Cluster.java:1540)
at com.datastax.driver.core.AbstractReconnectionHandler.run(AbstractReconnectionHandler.java:124)
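The trace above shows the failure coming from a by-name column lookup: `getInet` resolves `rpc_address` against the row's column metadata and throws because that name is absent. The following is a minimal, self-contained sketch of that failure mode and of a guarded lookup. It is not the driver's code: `RowSketch` and its methods are hypothetical stand-ins, and the column names in `main` are chosen purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a result row; NOT the driver's ArrayBackedRow.
final class RowSketch {
    private final Map<String, Object> columns = new LinkedHashMap<>();

    // Adds a column to this row and returns the row for chaining.
    RowSketch with(String name, Object value) {
        columns.put(name, value);
        return this;
    }

    // True if the column is part of this row's metadata.
    boolean contains(String name) {
        return columns.containsKey(name);
    }

    // By-name lookup that fails like the trace does when the column is absent.
    Object get(String name) {
        if (!columns.containsKey(name)) {
            throw new IllegalArgumentException(name + " is not a column defined in this metadata");
        }
        return columns.get(name);
    }

    // Guarded lookup: returns null instead of throwing when the column is absent.
    Object getOrNull(String name) {
        return contains(name) ? get(name) : null;
    }

    public static void main(String[] args) {
        // Illustrative row whose metadata has no rpc_address column.
        RowSketch row = new RowSketch().with("data_center", "DC1").with("rack", "r1");
        System.out.println(row.getOrNull("rpc_address"));
        try {
            row.get("rpc_address");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking `contains` before reading is one defensive option; whether the driver fix takes this route is not stated in this report.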
The client code is shown below:
PlainTextAuthProvider authProvider = new PlainTextAuthProvider("user", "pwd");
Collection<InetSocketAddress> whiteList = new ArrayList<InetSocketAddress>();
whiteList.add(new InetSocketAddress("ip1", 9042));
WhiteListPolicy policy = new WhiteListPolicy(new DCAwareRoundRobinPolicy("DC1"), whiteList);
Cluster cluster = Cluster.builder().withAuthProvider(authProvider).withLoadBalancingPolicy(policy).addContactPoint("ip1").build();
cluster.getConfiguration().getPoolingOptions().setMinSimultaneousRequestsPerConnectionThreshold(HostDistance.LOCAL, 0);
cluster.getConfiguration().getPoolingOptions().setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.LOCAL, 1);
cluster.getConfiguration().getPoolingOptions().setMaxConnectionsPerHost(HostDistance.LOCAL, 100);
cluster.getConfiguration().getPoolingOptions().setCoreConnectionsPerHost(HostDistance.LOCAL, 3);
final Session session = cluster.connect("system");
ExecutorService executorService = Executors.newFixedThreadPool(10);
for (int i = 0; i < 100; ++i) {
    executorService.execute(new Runnable() {
        @Override
        public void run() {
            while (true) readSystemPeersAndAssert(session);
        }
    });
}
Uninterruptibles.sleepUninterruptibly(1, TimeUnit.HOURS);
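One detail of the reproduction code above is worth noting: `Executors.newFixedThreadPool(10)` runs at most 10 tasks at a time, and because each submitted task loops forever, only the first 10 of the 100 submitted tasks ever start while the rest stay queued. The sketch below demonstrates this with plain JDK classes and smaller numbers. The class name `PoolSketch`, the pool size of 2, and the task count of 5 are illustrative assumptions, and the tasks are made interruptible so the example terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; not part of the reporter's code.
final class PoolSketch {
    // Submits `tasks` never-ending tasks to a fixed pool of `threads` threads
    // and returns how many of them actually started running.
    static int countStarted(int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger started = new AtomicInteger();
        CountDownLatch poolSaturated = new CountDownLatch(Math.min(threads, tasks));
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                started.incrementAndGet();
                poolSaturated.countDown();
                // Stand-in for the endless query loop; exits only on interrupt.
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        Thread.sleep(10);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
        try {
            poolSaturated.await();   // every worker thread is now busy
            Thread.sleep(200);       // queued tasks get time to start, but no thread is free
            pool.shutdownNow();      // interrupts running tasks, discards queued ones
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return started.get();
    }

    public static void main(String[] args) {
        System.out.println(countStarted(2, 5));
    }
}
```

If 100 concurrent readers are wanted, the pool size would need to match the number of submitted tasks.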
Full trace is attached.
Reproduced against 2.0.8 and 2.1.3 by running a stress test against a single-node Cassandra cluster and injecting resets on connections between the client and host. The issue manifests reasonably quickly this way (usually in under a minute).
Validated that, with the fix on the 2.0 and 2.1 branches, the issue is no longer present.
Reproduced against Cassandra version 2.0.12 while trying to reconnect.
Details:
2015-04-19 16:09:10.918 ERROR [Reconnection-1] driver.core.Cluster:onUnknownException:1577 - Unknown error during reconnection to localhost/127.0.0.1:9042, scheduling retry in 100 milliseconds
java.lang.IllegalArgumentException: rpc_address is not a column defined in this metadata
at com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
at com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
at com.datastax.driver.core.ArrayBackedRow.getIndexOf(ArrayBackedRow.java:69)
at com.datastax.driver.core.AbstractGettableData.getInet(AbstractGettableData.java:169)
at com.datastax.driver.core.ControlConnection.refreshNodeInfo(ControlConnection.java:429)
at com.datastax.driver.core.Cluster$Manager$5.onReconnection(Cluster.java:1553)
at com.datastax.driver.core.AbstractReconnectionHandler.run(AbstractReconnectionHandler.java:92)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
2.0.12 must be the version of Cassandra you are referring to, correct? What version of the driver are you using? This should be fixed in versions 2.0.9 and 2.1.4+ of the driver.
Andy Tolbert, I made a version mistake. So you are saying that in 2.1.4 (and later) I won't get the exception? Then I'll test it. Thank you very much.
That is correct, 2.1.4+ should have the fix.
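The version guidance in this thread (fix in driver 2.0.9 on the 2.0 line and 2.1.4+ on the 2.1 line) can be turned into a quick check. The sketch below is illustrative only: `FixCheck` and `hasFix` are hypothetical names, the thresholds come solely from the comments above, treating 2.0.x releases after 2.0.9 as fixed is an assumption, and any other release line is reported as unknown rather than guessed.

```java
import java.util.Optional;

// Hypothetical helper; thresholds are taken from this thread only.
final class FixCheck {
    // Returns whether the given driver version should contain the fix,
    // or empty for versions this thread says nothing about.
    static Optional<Boolean> hasFix(String driverVersion) {
        String[] parts = driverVersion.split("\\.");
        if (parts.length < 3) {
            return Optional.empty();
        }
        int major;
        int minor;
        int patch;
        try {
            major = Integer.parseInt(parts[0]);
            minor = Integer.parseInt(parts[1]);
            patch = Integer.parseInt(parts[2]);
        } catch (NumberFormatException e) {
            return Optional.empty();   // e.g. qualifiers such as "-SNAPSHOT"
        }
        if (major == 2 && minor == 0) {
            return Optional.of(patch >= 9);   // assumption: 2.0.9 and later 2.0.x
        }
        if (major == 2 && minor == 1) {
            return Optional.of(patch >= 4);   // 2.1.4+
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(hasFix("2.0.8"));
        System.out.println(hasFix("2.1.4"));
    }
}
```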