distinct() raises NullPointerException in Catalyst-generated code when a LIMIT clause is used

Description

In general:
Using a two-column table with a LIMIT clause leads to an NPE in DataFrame.distinct() if the limit is less than the table size.
No such behaviour is observed for the Elasticsearch Spark connector on the same data and code (select -> distinct).

Steps to reproduce: use the code published in the GitHub repo.
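The failure can be sketched without Cassandra at all, as a minimal self-contained Spark 2.0.2 job (the names and in-memory data below are illustrative, not the actual repro code from the repo):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of the reported failure: distinct() after limit()
// triggers an NPE in Catalyst-generated code on Spark 2.0.2.
object DistinctAfterLimitRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("distinct-after-limit-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Any two-column DataFrame will do; no Cassandra source is needed.
    val df = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("key", "value")

    // LIMIT smaller than the table size, followed by distinct():
    // on Spark 2.0.2 this path fails with a NullPointerException.
    df.limit(2).distinct().show()

    spark.stop()
  }
}
```

Per the comments below, any aggregate following the limit (e.g. `df.limit(2).groupBy("key").count()`) hits the same code path; the fix landed upstream via SPARK-18528.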

Environment

Apache Spark v.2.0.2 (1 master + 2 workers)
Apache Cassandra 3.3 (standalone)

Pull Requests

None

Activity

Russell Spitzer 
December 13, 2016 at 10:29 PM

Marking as Invalid since this is purely a Spark Bug

Russell Spitzer 
December 13, 2016 at 10:28 PM

My issue was a dupe; there is a PR waiting to fix this in Spark 2.0:
https://issues.apache.org/jira/browse/SPARK-18528

Russell Spitzer 
December 13, 2016 at 10:12 PM

This breaks on any aggregate following the limit

Russell Spitzer 
December 13, 2016 at 10:09 PM

Russell Spitzer 
December 13, 2016 at 10:03 PM

The same problem exists without Cassandra.

I'll report this as a Spark bug.

Invalid

Details

Created December 13, 2016 at 2:59 PM
Updated February 19, 2017 at 6:51 PM
Resolved December 13, 2016 at 10:29 PM