Reusing `Cassandra.Data.Linq.Table<T>` for querying returns wrong results

Description

I refactored some code: instead of creating a new Table<TEntity> object for every operation, I thought I could get rid of this duplication by creating it once and reusing it.
The result worked locally in testing (unit tests, including concurrent ones, and manual tests), but after deploying to production, some queries unexpectedly returned no results, even though the data is there and is returned by other queries with different filters.

Since the Table<TEntity> class is not documented as thread-safe, it looks like the breakage is my own fault. I'm not sure whether it is intended to be thread-safe; if it is, this would be a bug.
Even if it isn't a bug, it would be nice if the class were either made thread-safe or able to detect misuse and throw an exception naming the likely cause (as List<T> does, for example, when it is modified during enumeration).
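The fail-fast option suggested above can be sketched as a small guard, assuming the goal is only to detect concurrent use rather than to make the class thread-safe. This is a hypothetical Python illustration: SingleThreadGuard, ConcurrentUseError and HypotheticalTable are invented names, not part of the C# driver. It adapts the List<T> idea rather than copying it: List<T> compares a version counter to detect modification during enumeration, while this guard uses a non-blocking lock to detect a second caller entering while the first is still inside.

```python
import threading


class ConcurrentUseError(RuntimeError):
    """Raised when a guarded object is entered while it is already in use."""


class SingleThreadGuard:
    """Hypothetical fail-fast guard: at most one caller may be inside at a time.

    Unlike a plain lock it never waits; a second concurrent caller gets an
    exception naming the likely cause instead of silently interleaving.
    """

    def __init__(self, name: str) -> None:
        self._name = name
        self._lock = threading.Lock()

    def __enter__(self) -> "SingleThreadGuard":
        # Non-blocking acquire: failure means another call is still in progress.
        if not self._lock.acquire(blocking=False):
            raise ConcurrentUseError(
                f"{self._name} is already in use by another call; it is not "
                "thread-safe, so create one instance per operation instead of sharing it"
            )
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._lock.release()
        return False  # never swallow exceptions raised inside the guarded block


class HypotheticalTable:
    """Stand-in for a non-thread-safe query object (not the driver's Table<T>)."""

    def __init__(self) -> None:
        self._guard = SingleThreadGuard("HypotheticalTable")

    def run_query(self, work):
        # Every public operation runs inside the guard, so concurrent misuse
        # is reported at the call site rather than producing wrong results.
        with self._guard:
            return work()
```

Raising instead of blocking is the point of the design: a lock would quietly serialize callers, whereas the exception points directly at the shared instance.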

I think this is best explained by showing how I changed the code and describing its behavior:

Entity + Mapping

Original Way to Access the Data

After Refactoring

The significant change is that Table<ExportFolderAssignmentCassandraEntity> is now created in the class constructor and shared across all methods.

Symptoms

  • .GetAllForServerUser(serverUserId) would correctly return all elements

  • however, retrieving a single element via .Get(serverUserId, Id) would sometimes return nothing instead of the expected element.

    • no elements are being added or removed in the meantime

    • it doesn't affect all combinations of serverUserId and Id; some queries returned the expected result while others returned nothing

    • within the same process, the results seemed consistent: a combination of serverUserId and Id that returns no result still returns nothing half an hour later.

Environment

  • Apache Cassandra 3.11.13.2

  • Client:

    • .NET 6

    • running on Linux

  • CassandraCSharpDriver 3.18.0

Activity


Joao Reis 
May 15, 2023 at 11:31 AM

Thanks for the update! Yeah I’ll close this and if we see a report similar to this I’ll reopen.

Bruno Juchli 
May 15, 2023 at 10:52 AM

@Joao Reis
Sorry, I completely forgot about this.
In the meantime, I tried to reproduce it with a minimal repro: a unit test that concurrently adds rows to and reads rows from a Cassandra table, then stops after a while. I used Rider's "run until failure" option and stopped it after 35 minutes.
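The repro approach described in this comment (writers and readers hitting one shared object concurrently for a fixed time) can be sketched roughly as follows. This is a hypothetical Python harness: stress_shared_object and its get/put callables are invented stand-ins for the application's own query methods, not Cassandra driver APIs. It only records reads that come back empty for keys whose write has already completed.

```python
import threading
import time


def stress_shared_object(get, put, keys, duration_s=0.5, readers=4):
    """Run one writer and several readers concurrently against shared callables.

    Hypothetical harness: get(key) and put(key, value) stand in for the
    application's query methods. Returns the keys a reader failed to find
    after the writer had confirmed the write.
    """
    written = []                      # keys whose put() has completed
    written_lock = threading.Lock()
    misses = []
    stop = threading.Event()

    def writer():
        for key in keys:
            if stop.is_set():
                break
            put(key, f"value-{key}")
            with written_lock:
                written.append(key)

    def reader():
        while not stop.is_set():
            with written_lock:
                snapshot = list(written)
            for key in snapshot:
                # The key is known to exist, so an empty result is a miss.
                if get(key) is None:
                    with written_lock:
                        misses.append(key)

    threads = [threading.Thread(target=writer)]
    threads += [threading.Thread(target=reader) for _ in range(readers)]
    for t in threads:
        t.start()
    time.sleep(duration_s)            # fixed run time, like stopping the test manually
    stop.set()
    for t in threads:
        t.join()
    return misses
```

Counting a miss only for keys whose write has completed avoids false positives from reads that simply race ahead of the writer.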

There is one obvious difference between my test setup and our real setup: the test uses a single Cassandra node, while production uses a cluster.
Also, the test now uses driver version 3.19.2.

I think it is OK to close this issue. If someone else experiences this problem, it can still be reopened.

If anyone wants to look at the code I used to try to reproduce the issue, it's here:

Joao Reis 
May 12, 2023 at 10:48 AM

Any news on this?

Bruno Juchli 
December 5, 2022 at 6:04 PM

@Joao Reis
Thanks for taking the time to look into this. I guess I'll have to strip our application down until I get a minimal repro.
This will probably take a while; I'll report back when I have more info.

Joao Reis 
December 5, 2022 at 4:32 PM

Can you provide a small console application that reproduces the issue? I have a single-file console application that uses your code to insert and read data, but I'm struggling to find a way to execute these statements concurrently so that the issue reproduces.

Cannot Reproduce

Details


Created December 1, 2022 at 1:08 PM
Updated May 15, 2023 at 11:31 AM
Resolved May 15, 2023 at 11:31 AM