FunctionItem currently uses Java Serialization for serializing the contained RuntimeIterator.
Additionally, it's not clear whether Spark closures make use of Kryo for the iterators. It would be beneficial for performance to use Kryo in these operations.
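For reference, here is a minimal sketch of what "Java Serialization" means for an item that holds an iterator. The classes `FunctionItemSketch` and `ConstantIterator` are hypothetical stand-ins made up for illustration, not the project's actual `FunctionItem` and `RuntimeIterator`; only the `java.io` calls are real.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class JavaSerializationSketch {

    // Hypothetical stand-in for a RuntimeIterator (assumption, not the real class).
    static class ConstantIterator implements Serializable {
        private static final long serialVersionUID = 1L;
        final String value;

        ConstantIterator(String value) {
            this.value = value;
        }
    }

    // Hypothetical stand-in for a FunctionItem that contains an iterator.
    static class FunctionItemSketch implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final ConstantIterator body;

        FunctionItemSketch(String name, ConstantIterator body) {
            this.name = name;
            this.body = body;
        }
    }

    // Writes the object graph (item plus contained iterator) with Java IO.
    static byte[] serialize(Serializable object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.toByteArray();
    }

    // Reads the object graph back from the byte array.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        FunctionItemSketch original = new FunctionItemSketch("f", new ConstantIterator("42"));
        byte[] data = serialize(original);
        FunctionItemSketch copy = (FunctionItemSketch) deserialize(data);
        // Print the round-tripped fields and the serialized size for comparison.
        System.out.println(copy.name + " -> " + copy.body.value + " (" + data.length + " bytes)");
    }
}
```

The printed byte count gives a rough baseline against which a Kryo-based encoding of the same object could later be compared.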
The discussion of this issue can be found in the following PR:
A good place to look is the Spark configuration. Two things should be configurable. The first is the serialization of results (the Items in the RDDs), which Stefan already switched to Kryo during his thesis (and you did the same for manual serialization in DataFrames). The second is the serialization of all closures passed to Spark in transformations, which is where the Iterators are serialized; we are unsure whether this uses Java IO or Kryo.
While I do not have the exact parameters in mind, this may be useful reading to start with:
However, since the FunctionItem problem is fixed for now by using Java IO for the iterator, this is not urgent. You can look into it when you see fit.
Originally posted by @ghislainfourny in #353
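As a starting point for the configuration question raised in the quoted comment, the fragment below shows the standard Spark properties that select Kryo for data serialization, written in `spark-defaults.conf` style. It is a sketch, not the project's actual configuration. To my understanding, `spark.serializer` governs shuffled and cached data, while closures have been serialized with Java serialization regardless of this setting since Spark 2.0; that should be checked against the Spark version in use.

```
# Use Kryo for data that Spark shuffles or caches (for example, the Items in RDDs).
spark.serializer                 org.apache.spark.serializer.KryoSerializer

# Optional: fail fast when an unregistered class is serialized with Kryo.
# Left at the default here; enabling it requires registering every class used.
spark.kryo.registrationRequired  false
```

If the closure serializer is indeed fixed to Java serialization, any Kryo-based speedup for the iterators would have to come from custom serialization inside the classes themselves rather than from configuration.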