distinct() raises NullPointerException in Catalyst-generated code when a LIMIT clause is used

Description

In general:
Using a two-column table with a LIMIT clause leads to an NPE in DataFrame.distinct() if the limit is less than the table size.
No such behaviour is observed for the Elasticsearch Spark connector on the same data and code (select -> distinct).

Steps to reproduce: use the code published in the GitHub repo.
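The failure can be sketched without Cassandra at all, as a minimal self-contained Spark 2.0.2 job (the names and in-memory data below are illustrative, not the actual repro code from the repo):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of the reported failure: distinct() after limit()
// triggers an NPE in Catalyst-generated code on Spark 2.0.2.
object DistinctAfterLimitRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("distinct-after-limit-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Any two-column DataFrame will do; no Cassandra source is needed.
    val df = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("key", "value")

    // LIMIT smaller than the table size, followed by distinct():
    // on Spark 2.0.2 this path fails with a NullPointerException.
    df.limit(2).distinct().show()

    spark.stop()
  }
}
```

Per the comments below, any aggregate following the limit (e.g. `df.limit(2).groupBy("key").count()`) hits the same code path; the fix landed upstream via SPARK-18528.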

Environment

Apache Spark v.2.0.2 (1 master + 2 workers)
Apache Cassandra 3.3 (standalone)

Pull Requests

None

Activity

Russell Spitzer 
December 13, 2016 at 10:29 PM

Marking as Invalid since this is purely a Spark Bug

Russell Spitzer 
December 13, 2016 at 10:28 PM

My issue was a dupe; there is a PR waiting to fix this in Spark 2.0:
https://issues.apache.org/jira/browse/SPARK-18528

Russell Spitzer 
December 13, 2016 at 10:12 PM

This breaks on any aggregate following the limit

Russell Spitzer 
December 13, 2016 at 10:09 PM

Russell Spitzer 
December 13, 2016 at 10:03 PM

The same problem exists without Cassandra.

I'll report this as a Spark bug.

Invalid

Details

Created December 13, 2016 at 2:59 PM
Updated February 19, 2017 at 6:51 PM
Resolved December 13, 2016 at 10:29 PM