With the fixes to paging / fetchSize in 2.0.7, I thought I would try using the prefetch logic by adding a sequence like the following wherever the result set might be large and benefit from paging:
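Such a prefetch sequence with the 2.0 driver's paging API typically checks `getAvailableWithoutFetching()` against a threshold and calls `fetchMoreResults()` before the buffer runs dry. The sketch below models that decision with a local stand-in for the driver's ResultSet; `PagedResults`, `PrefetchDemo`, and `consumeAll` are illustrative names, and the paging here is a synchronous simulation, not the real (asynchronous) driver behavior:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Local stand-in for a paged result set. Method names mirror the DataStax
// 2.0 driver's ResultSet API (getAvailableWithoutFetching, isFullyFetched,
// fetchMoreResults), but the implementation is a simulation.
class PagedResults {
    private final int totalRows;
    private final int pageSize;
    private int fetched = 0;                      // rows pulled from the "server" so far
    private final Deque<Integer> buffer = new ArrayDeque<>();

    PagedResults(int totalRows, int pageSize) {
        this.totalRows = totalRows;
        this.pageSize = pageSize;
        fetchMoreResults();                       // first page, as execute() would
    }

    int getAvailableWithoutFetching() { return buffer.size(); }

    boolean isFullyFetched() { return fetched >= totalRows; }

    void fetchMoreResults() {                     // synchronous here; async in the driver
        int end = Math.min(fetched + pageSize, totalRows);
        for (int i = fetched; i < end; i++) buffer.add(i);
        fetched = end;
    }

    boolean isExhausted() { return buffer.isEmpty() && isFullyFetched(); }

    Integer one() {
        // The real driver blocks and fetches implicitly when the buffer is empty.
        if (buffer.isEmpty() && !isFullyFetched()) fetchMoreResults();
        return buffer.poll();
    }
}

public class PrefetchDemo {
    // The prefetch sequence: when at most `prefetchLimit` rows remain
    // buffered, request the next page early. A limit of -1 disables
    // prefetching entirely.
    static int consumeAll(PagedResults rs, int prefetchLimit) {
        int count = 0;
        while (!rs.isExhausted()) {
            if (prefetchLimit >= 0
                    && rs.getAvailableWithoutFetching() <= prefetchLimit
                    && !rs.isFullyFetched()) {
                rs.fetchMoreResults();
            }
            rs.one();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 70,000 rows, fetch size 1000, prefetch when 100 rows remain.
        System.out.println(consumeAll(new PagedResults(70000, 1000), 100)); // 70000
        System.out.println(consumeAll(new PagedResults(70000, 1000), -1));  // 70000
    }
}
```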
Having done this, I find that when I read all the rows of a logical row spanning partitions, the query intermittently stops without returning all of them.
My schema is essentially the same as described before in CASSANDRA-6825 and CASSANDRA-6826. The difference from CASSANDRA-6825 lies in how I distributed the rows over the partitions. In CASSANDRA-6825, I was writing the rows in long runs of 10,000 per partition before choosing a partition for the next run. In this case, I randomly assigned a partition to each row by taking the first four hex digits of the ec hash modulo the number of partitions, 7. The query is the same as that described in CASSANDRA-6826:
SELECT ec, ea, rd FROM sr WHERE s = ? AND partition IN ? and l = ? ALLOW FILTERING;
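The row-to-partition assignment described above can be sketched as follows. This is an illustration, not the actual test code; how the ec hash itself is computed is not specified in the report, so the function takes the hex string as input:

```java
public class PartitionAssign {
    // Take the first four hex digits of the ec hash and reduce modulo the
    // number of partitions (7 in the report).
    static int partitionFor(String ecHashHex, int numPartitions) {
        int prefix = Integer.parseInt(ecHashHex.substring(0, 4), 16);
        return prefix % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("00ff3ab2", 7)); // 0x00ff = 255; 255 % 7 = 3
        System.out.println(partitionFor("ffff0000", 7)); // 0xffff = 65535; 65535 % 7 = 1
    }
}
```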
This problem is very intermittent. I expanded my JUnit test program to run this test multiple times. The query to read all the rows could fail to return the correct number in any one of the runs (the first, the second, or the third) while returning correct results in the others. It could succeed in all three runs, then later fail in all three, even when using the same test seed and generating identical test data.
If I set the prefetch limit very low, i.e., issue the query for more rows only near the end, when, say, only one row remains, I don't see a failure. If I set the limit higher, e.g., prefetching when 100 rows remain, I can reproduce the failing behavior.
This is not limited to my dual-core laptop. I was able to provoke the same failing behavior on an 8-core desktop system by setting the prefetch point at 500 rows while leaving the query fetch size at 1000 rows.
The only thing that is certain is that if I disable prefetching by setting the limit to -1, the correct counts always appear.
To drill down on this problem, I added code to the test validation to check each row in order, to determine exactly where a row was dropped, instead of just checking the total count at the end. What I found was that the first dropped rows appeared in partition 0. In the one case I analyzed in detail, the returned rows skipped ahead 759 rows within the partition. The expected and actual rows were in the same date range, so I could run a select count covering the two endpoints, and it showed 760 rows inclusive. So it appears that, at random points in the ResultSet, a run of rows is skipped.
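The per-row validation just described can be sketched as a parallel walk over the expected and returned key sequences, reporting the index of the first divergence. The names here are illustrative, not taken from the actual JUnit test:

```java
import java.util.List;

public class SkipCheck {
    // Walk expected and returned key sequences in parallel; return the index
    // of the first position where the returned stream diverges (skips ahead),
    // or -1 if the sequences match.
    static int firstSkip(List<Integer> expected, List<Integer> actual) {
        int n = Math.min(expected.size(), actual.size());
        for (int i = 0; i < n; i++) {
            if (!expected.get(i).equals(actual.get(i))) return i;
        }
        // A shorter actual list means rows were dropped at the tail.
        return actual.size() < expected.size() ? n : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstSkip(List.of(0, 1, 2, 3), List.of(0, 1, 3))); // 2
        System.out.println(firstSkip(List.of(0, 1, 2), List.of(0, 1, 2)));    // -1
    }
}
```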
DataStax Cassandra 2.0.7
Single node cluster on a dual-core Windows laptop