Pagination does not work as expected

Description

On 2 different clusters, we don't get the expected result when reading data: pagination stops before the end of the available data. (On some other clusters holding less data, we don't have any problem. All clusters were installed with the same procedure.)

The query, executed with a SimpleStatement, is: SELECT * FROM XYDataTable WHERE XYDataIdColumn = <id> AND XYTimeColumn > <date> LIMIT 100000
The table is defined as: CREATE TABLE IF NOT EXISTS XYDataTable (XYDataIdColumn text, XYTimeColumn timestamp, XYValueColumn double, PRIMARY KEY(XYDataIdColumn, XYTimeColumn))

  1. With automatic paging, starting from the beginning of 3 million rows, we usually get 20K rows but sometimes 25K (a multiple of 5000, which is the default page size).

  2. After switching to manual paging (i.e. statement.SetAutoPage(false).SetPagingState(rows.PagingState)), we usually get 1 or 2 complete chunks of 100K rows, and then it stops in the middle of the 3rd one (this time the count is not even a multiple of 5000).
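A minimal sketch of the manual paging loop from item 2, for comparison when debugging. It is not the reporter's code: `ManualPager`, `FetchAll`, and the `fetchPage` delegate are hypothetical names, and the delegate abstracts away the driver call so the loop logic can be checked without a cluster. The key assumption is that paging ends only when no paging state comes back, not when a page happens to be short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: drains every page of a query by feeding the paging
// state returned with each page back into the next request.
public static class ManualPager
{
    // fetchPage receives the paging state of the previous page (null for the
    // first request) and returns that page's rows plus the next paging state,
    // which is null once there are no more pages.
    public static List<T> FetchAll<T>(
        Func<byte[], (IReadOnlyList<T> Rows, byte[] NextState)> fetchPage)
    {
        var all = new List<T>();
        byte[] state = null;
        do
        {
            var (rows, next) = fetchPage(state);
            all.AddRange(rows);
            // Only a missing paging state ends the loop; a short or empty
            // page on its own is not treated as the end of the data.
            state = next;
        } while (state != null);
        return all;
    }
}
```

To wire this to the driver, `fetchPage` would build the `SimpleStatement`, call `SetAutoPage(false)`, call `SetPagingState(state)` when `state` is not null, run it with `session.Execute`, and return the page's rows together with `RowSet.PagingState`.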

Note that with a tool like TablePlus, I can read all the data without any problem, whatever the starting point.

Environment

Cassandra community edition 3.11.4 (cluster 3 nodes)
DataStax C# driver 3.16.1

Attachments: 1

Activity

Joao Reis 
December 5, 2024 at 3:52 PM

Good news! It’s great that you were able to find the issue. I’d suggest keeping driver logging enabled, since it can help debug future driver issues, but perhaps reducing the level to info or warning to cut down the “log spam”.

xavier le galles 
December 5, 2024 at 3:50 PM

Hi again,
Just to let you know, we found the problem, and it has nothing to do with the driver. It comes from the server (3 instances), which has major inconsistencies even though we use the QUORUM consistency level. Using a specific tool, we were able to determine that, for a given key, 2 of the instances are missing data that the 3rd one has. We are not responsible for the hosting, and we suspect that the client's IT did something unusual with the persistent volumes, because the server does not complain at all and does not even seem to be aware of the problem. I cannot believe that Cassandra suffers from such a problem… We are now trying to repair everything carefully.
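The standard quorum arithmetic shows why QUORUM reads could not hide this. It assumes a replication factor RF = 3 (one replica per node of the 3-node cluster), which the ticket does not actually state. R is the number of replicas a QUORUM read waits for, and W = 1 models rows that effectively exist on only one replica:

```latex
\begin{aligned}
q &= \left\lfloor RF/2 \right\rfloor + 1 = \lfloor 3/2 \rfloor + 1 = 2 = R \\
R + W &= 2 + 1 = 3 \not> RF = 3
\end{aligned}
```

Because the overlap condition R + W > RF fails, a QUORUM read can be answered entirely by the two replicas that lack the rows.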

xavier le galles 
December 3, 2024 at 6:06 PM

It’s set to info, as requested in your previous message. I will try Debug then.

Joao Reis 
December 3, 2024 at 3:28 PM

What log level are you setting in Serilog? Try info, debug, or verbose.

xavier le galles 
December 3, 2024 at 3:01 PM

Hi, sorry to bother you, but we have enabled logging on the Cassandra session and we are a bit surprised by the results: we get only a few log entries when sending commands, and nothing at all when querying. We get proper logs when opening the session, but very few after that.
The problem we face occurs mainly when querying the database, not during updates, so I'm afraid the logs won't help. The level is Information for the Cassandra namespace.
Thanks for your help.
Diagnostics.AddLoggerProvider(new SerilogLoggerProvider(Log.Logger));
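A configuration sketch for raising the driver's log level through Serilog, as suggested in the thread. It assumes the Serilog, Serilog.Sinks.Console, Serilog.Extensions.Logging, and CassandraCSharpDriver packages, and assumes the driver's logger categories start with "Cassandra" (its namespace). The provider should be registered before the Cluster is built.

```csharp
using Cassandra;
using Serilog;
using Serilog.Events;
using Serilog.Extensions.Logging;

// Information by default, but Debug for sources under "Cassandra"
// (assumed to match the driver's logger category names).
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .MinimumLevel.Override("Cassandra", LogEventLevel.Debug)
    .WriteTo.Console()
    .CreateLogger();

// Register the Serilog-backed provider with the driver before Cluster.Builder()...Build().
Diagnostics.AddLoggerProvider(new SerilogLoggerProvider(Log.Logger));
```

Once debugging is done, the override can be lowered back to `LogEventLevel.Information` or `LogEventLevel.Warning`.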

Resolution: Cannot Reproduce

Details

Created March 15, 2021 at 8:17 AM
Updated December 5, 2024 at 3:52 PM
Resolved October 26, 2021 at 2:00 PM