Cassandra filter push down not working properly on 3.0.0

Description

Filter push down does not seem to work properly on version 3.0.0.
I tried to `explain` queries filtering on columns of type `text`, `int`, and `float`, and most of them are not pushed down (in the explain output, `Cassandra Filters` is empty).
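
For reference, a minimal sketch of the kind of check described above (e.g. run in spark-shell); the keyspace, table, column names and connection host are illustrative assumptions, not taken from an actual schema:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("pushdown-check")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .getOrCreate()

// Load a Cassandra table through the connector's DataFrame source.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "shop", "table" -> "orders"))
  .load()

// Filters on text, int and float columns; the physical plan from explain()
// shows which of them were actually handed down to Cassandra.
df.filter(col("note") === "gift").explain()   // text
df.filter(col("quantity") > 3).explain()      // int
df.filter(col("price") > 1.0).explain()       // float
```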

Environment

None

Pull Requests

None

Activity

Alex Ott
October 20, 2020, 7:23 AM

That’s the correct behaviour of Cassandra & the Spark Cassandra connector:

  • `id` can’t be used with inequality operators because it’s a partition key and it’s hashed

  • inequality operators can usually be applied only to clustering columns (the first one, or a later one if all preceding clustering columns are restricted by an equality operation)

  • other predicates may be pushed down if you have supporting structures - secondary indexes (equality only), DSE Search, etc.

The documentation for 2.5.x has a bigger set of rules: https://github.com/datastax/spark-cassandra-connector/blob/b2.5/doc/14_data_frames.md#full-list-of-predicate-pushdown-restrictions, but it’s still not completely correct/clear.

Basically, the rule of thumb is: if you can make your query work as CQL in cqlsh, then it will be pushed down by SCC (see the sketch below for how these rules play out).
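
For illustration, a rough sketch of how these rules map onto DataFrame filters, assuming a hypothetical table `shop.orders` with partition key `id` and clustering column `ts` (all names and the schema here are assumptions made for this example, not from the ticket):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumed schema (illustrative only):
//   CREATE TABLE shop.orders (
//     id int, ts timestamp, price float, note text,
//     PRIMARY KEY (id, ts)
//   );

val spark = SparkSession.builder().getOrCreate()
val orders = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "shop", "table" -> "orders"))
  .load()

// Partition key restricted by equality: valid CQL, so it can be pushed down.
orders.filter(col("id") === 42).explain()

// Partition key with an inequality: not valid CQL (the key is token-hashed), so not pushed.
orders.filter(col("id") > 42).explain()

// Range on a clustering column with the partition key restricted by equality: can be pushed.
orders.filter(col("id") === 42 && col("ts") > "2020-01-01").explain()

// Range on a regular column: not pushed without a supporting structure
// (secondary indexes are equality-only; DSE Search, etc.).
orders.filter(col("price") > 1.0).explain()
```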

Jay Ng
October 20, 2020, 9:29 AM

Thanks for the explanation. However, something is still unclear to me.
Does this mean we can only push down a range predicate on the first clustering key?
The restriction stated:
Only push down non-partition key column predicates with =, >, <, >=, <= predicate
I thought it only disallowed range predicate push down on primary key columns, but allowed it for other columns.


I tested in cqlsh, and with `ALLOW FILTERING` the query works fine with all clustering key columns (basically all columns, regardless of the performance).
And the query was working on the previous DSC 2.5.0; is there any reason for not allowing this operation on the latest version?

Alex Ott
October 20, 2020, 9:49 AM

Are you sure that it was working? You need to look at the explain output and make sure that there is a star character next to the filter. By default, older versions printed all filters, even those that weren’t pushed; pushed filters were marked with a star character.

Jay Ng
October 20, 2020, 10:22 AM
Edited

This is what I tested with Spark 2.4.4 and DSC 2.5.1 on the orders table.

I can see `PushedFilters: [*IsNotNull(price), GreaterThan(price,1.0)]` in the explain message.
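
A sketch of the kind of test described here (the `orders` table, the `price` column and the reported `PushedFilters` line are from this comment; the keyspace name, connection host and session setup are assumptions):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .getOrCreate()

val orders = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "shop", "table" -> "orders"))
  .load()

// The explain output reported above contained:
//   PushedFilters: [*IsNotNull(price), GreaterThan(price,1.0)]
orders.filter(col("price") > 1.0).explain()
```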

Alex Ott
October 20, 2020, 3:45 PM

As you can see, only IsNotNull is marked with a star; GreaterThan doesn’t have a star character, which means that it isn’t pushed.

Another sign of that is the Filter operation above the Scan - it’s performed by Spark after reading all the data from the table. If the filter were pushed, there wouldn’t be any additional filtering (see the sketch below).
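
For reference, a sketch contrasting the two plan shapes described above; the plan excerpts in the comments illustrate the shape only and are not captured output from this ticket:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().getOrCreate()
val orders = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "shop", "table" -> "orders"))
  .load()

// Not pushed: a Filter node sits above the Scan, and GreaterThan has no star:
//   *Filter (price > 1.0)
//   +- *Scan ... PushedFilters: [*IsNotNull(price), GreaterThan(price,1.0)]
orders.filter(col("price") > 1.0).explain()

// Pushed (e.g. equality on the assumed partition key `id`): no Filter node above the Scan:
//   *Scan ... PushedFilters: [*EqualTo(id,42)]
orders.filter(col("id") === 42).explain()
```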

Assignee

Jaroslaw Grabowski

Reporter

Jay Ng

Fix versions

None

Labels

None

Reviewer

None

Reviewer 2

None

Pull Request

None

Components

Affects versions

Priority

Major