We've previously limited read parallelism to the number of Spark cores, plus prefetching. This may be unsuitable for users who want to run with a limited number of Spark cores. To fix this, we can subdivide each partition's token range into `parallelism` pieces, request each of those pieces concurrently, and merge the resulting iterators.
The single TokenRange iterator would be changed to multiple TokenRange iterators that are combined with something like `concatMap` from RxScala. The difficulty here is that we don't want to break the token ordering of the results while still maintaining parallelism. It's not clear whether there is a way to do this without buffering the whole result, or whether token ordering even needs to be preserved now that we have the Partitioner code. Losing the order would break `spanBy`.
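A minimal sketch of the subdivision idea, using plain Scala Futures instead of RxScala. All names here (`TokenRange`, `fetch`, `readInParallel`) are hypothetical stand-ins, not the connector's actual API. Note the trade-off it illustrates: the sub-range queries run concurrently, but results are concatenated in sub-range order, so token ordering is preserved at the cost of buffering each sub-range's results (not the whole partition).

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical token range [start, end); real Cassandra tokens wrap around,
// which this sketch ignores.
case class TokenRange(start: Long, end: Long) {
  // Split into n contiguous sub-ranges covering [start, end).
  def subdivide(n: Int): Seq[TokenRange] = {
    val step = (end - start) / n
    (0 until n).map { i =>
      val s = start + i * step
      val e = if (i == n - 1) end else s + step
      TokenRange(s, e)
    }
  }
}

// Stand-in for a CQL query over one sub-range; yields rows in token order.
def fetch(range: TokenRange): Iterator[Long] =
  (range.start until range.end).iterator

// Issue all sub-range queries concurrently, then concatenate their buffered
// results in sub-range order, preserving overall token order.
def readInParallel(range: TokenRange, parallelism: Int): Iterator[Long] = {
  val futures = range.subdivide(parallelism).map(r => Future(fetch(r).toVector))
  futures.iterator.flatMap(f => Await.result(f, 10.seconds))
}
```

If token ordering turns out not to matter, the concatenation step could be replaced with a merge that yields whichever sub-range's results arrive first, avoiding the per-sub-range buffering entirely.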