Cassandra filter push down not working properly on 3.0.0
Filter push down doesn't seem to work properly on version 3.0.0.
I tried to `explain` queries on columns of type `text`, `int`, and `float`, and it seems most of them cannot be pushed down (in the explain output, `Cassandra Filters` is empty).
As you can see, only `IsNotNull` is marked with a star; `GreaterThan` doesn't have a star character, which means it isn't pushed.
Another sign of that is the `Filter` operation above `Scan` - it's performed by Spark after reading all the data from the table. If the filter were pushed down, there would be no additional filtering step.
This is what I tested with Spark 2.4.4 and DSC 2.5.1 on the `orders` table.
I can see `PushedFilters: [*IsNotNull(price), GreaterThan(price,1.0)]` in the explain message.
Are you sure that it was working? You need to look at the explain output and make sure there is a star character next to the filter - by default, older versions printed all filters, even those that weren't pushed; pushed filters were marked with a star character.
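For illustration (a hypothetical fragment pieced together from the filters above; exact formatting varies across Spark versions), an explain output where only `IsNotNull` was pushed might look like:

```
== Physical Plan ==
*Filter (price#5 > 1.0)
+- *Scan org.apache.spark.sql.cassandra ... PushedFilters: [*IsNotNull(price), GreaterThan(price,1.0)]
```

Here both the surviving `Filter` node and the missing star on `GreaterThan` indicate that the predicate was evaluated by Spark, not by Cassandra.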
Thanks for the explanation. However, something is still unclear to me.
Does it mean that we can use a range predicate only with the first clustering key?
The restriction stated:
Only push down no-partition key column predicates with =, >, <, >=, <= predicate
I thought it just disallowed range predicate push down for primary key columns, but allowed it for other columns.
I tested in cqlsh, and with `ALLOW FILTERING` the query works fine on all clustering key columns (basically all columns, regardless of performance).
And the query was working on the previous DSC 2.5.0; is there any reason not to allow this operation in the latest version?
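For example (the schema here is hypothetical, loosely based on the `orders` table mentioned above), cqlsh accepts a range predicate on a later clustering column only with `ALLOW FILTERING`:

```sql
-- hypothetical schema: partition key id, clustering columns (ts, price)
CREATE TABLE orders (id int, ts timestamp, price float, PRIMARY KEY (id, ts, price));

-- works without ALLOW FILTERING: range on the first clustering column
SELECT * FROM orders WHERE id = 1 AND ts > '2020-01-01';

-- requires ALLOW FILTERING: range on a later clustering column without restricting ts
SELECT * FROM orders WHERE id = 1 AND price > 1.0 ALLOW FILTERING;
```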
That's the correct behaviour of Cassandra & the Spark Cassandra Connector:
- `id` can't be used with inequality operators because it's a partition key and its value is hashed
- inequality operators can usually be applied only to clustering columns (the first one, or a later one if all preceding clustering columns are restricted by equality predicates)
- other predicates may be pushed down if you have supporting structures - secondary indexes (equality only), DSE Search, etc.
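The clustering-column rule above can be sketched in Python (a simplified illustration of the rule, not SCC's actual implementation; the function name and signature are mine):

```python
def can_push_inequality(clustering_columns, equality_restricted, column):
    """Decide whether an inequality predicate on `column` is eligible for push down.

    clustering_columns: ordered list of clustering column names.
    equality_restricted: set of columns already restricted by an '=' predicate.
    """
    if column not in clustering_columns:
        # partition key or regular column: no inequality push down
        # (absent supporting structures such as indexes)
        return False
    idx = clustering_columns.index(column)
    # eligible only if every preceding clustering column is restricted by equality
    return all(c in equality_restricted for c in clustering_columns[:idx])

cols = ["ts", "price"]
print(can_push_inequality(cols, set(), "ts"))      # True: first clustering column
print(can_push_inequality(cols, set(), "price"))   # False: ts not restricted
print(can_push_inequality(cols, {"ts"}, "price"))  # True: all preceding restricted
```

This mirrors why a range on `price` is pushed only when `ts` is pinned with an equality predicate.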
The documentation for 2.5.x has a bigger set of rules: https://github.com/datastax/spark-cassandra-connector/blob/b2.5/doc/14_data_frames.md#full-list-of-predicate-pushdown-restrictions, but it's still not completely correct/clear.
Basically, the rule of thumb is: if you can make your query work as CQL in cqlsh (without `ALLOW FILTERING`), then it will be pushed down by SCC.