Change Preferred Location to Use Hostname, Address for both Broadcast and Local

Description

On some systems the executors are bound to IP addresses that are not resolvable by reverse DNS. When the preferred location method returns only one form of a node's address (for example, only the result of getHostName), the resulting string may not match the one the executor advertises.

For example, if the executor is advertised as "127.0.0.1" and we use getHostName, we may return "localhost". Since "localhost" != "127.0.0.1", the two will not match and Spark won't recognize the locality.
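A minimal Java sketch of the mismatch, using only the standard java.net.InetAddress API. The class HostFormMismatch and its sameLocation helper are hypothetical; sameLocation stands in for the exact string comparison this ticket describes. getHostAddress on an IP literal returns that literal, while getHostName performs a reverse lookup whose result depends on the local resolver (often "localhost" for 127.0.0.1).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration class, not connector code.
final class HostFormMismatch {

    // Hypothetical stand-in for an exact string comparison of locations.
    static boolean sameLocation(String advertised, String preferred) {
        return advertised.equals(preferred);
    }

    public static void main(String[] args) throws UnknownHostException {
        // An IP literal is parsed directly, so no forward DNS lookup happens here.
        InetAddress node = InetAddress.getByName("127.0.0.1");
        String byAddress = node.getHostAddress(); // the literal "127.0.0.1"
        String byName = node.getHostName();       // reverse lookup; result depends on the resolver

        String executorHost = "127.0.0.1";        // the executor is advertised by IP
        System.out.println("getHostAddress -> " + byAddress + ", match: " + sameLocation(executorHost, byAddress));
        System.out.println("getHostName    -> " + byName + ", match: " + sameLocation(executorHost, byName));
    }
}
```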

We will most likely also need to include the listen addresses of the C* nodes, to make sure that every interface a Spark worker could be bound to is covered.

Pull Requests

None

Activity

Brian Cantoni April 30, 2015 at 5:37 PM

Abhinav Chawade April 30, 2015 at 5:31 PM
Edited

+BrianC Reopening the bug to backport the patch to 1.1.

Brian Cantoni April 26, 2015 at 4:13 PM

Changes have been merged. For reference this was the PR with code review comments/discussion: https://github.com/datastax/spark-cassandra-connector/pull/635

April 23, 2015 at 11:57 AM

I reviewed it and have a few questions on the PR.

Russell Spitzer April 17, 2015 at 11:11 PM

In this ticket we've made sure that preferredLocations returns all of the following strings for both sc.cassandraTable and repartitionByCassandraReplica:

Broadcast.Host, Broadcast.Address, Local.Host, Local.Address

Previously this looked like:
Broadcast.Host, Local.Host for sc.cassandraTable
and
Broadcast.Host for repartitionByCassandraReplica

This is important because Spark uses string comparison to determine whether an executor's location is the same as the partition's. Passing all of these strings increases our chance of matching a user's configuration.
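A sketch of building the four strings for one node and checking an executor against them, in Java using only java.net.InetAddress. The class PreferredLocationStrings, its methods, and the example host names and IPs are hypothetical, not the connector's code. It also assumes that "Local" refers to the node's listen address and that the locality check is plain string equality, as the comment above says.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the connector's implementation.
final class PreferredLocationStrings {

    // Builds an InetAddress with a fixed host name, so getHostName needs no DNS lookup.
    static InetAddress addr(String host, int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(host, new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Broadcast.Host, Broadcast.Address, Local.Host, Local.Address, in that order,
    // with duplicates dropped (e.g. when broadcast and listen addresses are identical).
    static List<String> candidates(InetAddress broadcast, InetAddress local) {
        Set<String> out = new LinkedHashSet<>();
        out.add(broadcast.getHostName());
        out.add(broadcast.getHostAddress());
        out.add(local.getHostName());
        out.add(local.getHostAddress());
        return new ArrayList<>(out);
    }

    // Locality as exact string equality against the executor's advertised host.
    static boolean isNodeLocal(String executorHost, List<String> preferred) {
        return preferred.contains(executorHost);
    }

    public static void main(String[] args) {
        InetAddress broadcast = addr("node1.example", 10, 0, 0, 5);
        InetAddress local = addr("node1-internal", 192, 168, 1, 5);
        List<String> preferred = candidates(broadcast, local);
        System.out.println("preferred locations: " + preferred);
        for (String executor : new String[] {"node1.example", "10.0.0.5", "192.168.1.5", "localhost"}) {
            System.out.println(executor + " node-local: " + isNodeLocal(executor, preferred));
        }
    }
}
```

A LinkedHashSet keeps the Broadcast-before-Local order while removing repeats, so a node whose broadcast and listen addresses coincide yields two strings instead of four.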

Fixed

Details

Created April 17, 2015 at 12:50 AM
Updated April 30, 2015 at 5:37 PM
Resolved April 28, 2015 at 4:54 PM