Use Kryo Serialization for FunctionItem and RuntimeIterator #355

Open
CanBerker opened this issue Nov 1, 2019 · 0 comments
Collaborator

CanBerker commented Nov 1, 2019

FunctionItem currently uses Java Serialization for serializing the contained RuntimeIterator.
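
To make the current behaviour concrete, here is a minimal sketch of a Java serialization round trip for an item that contains an iterator. `SketchFunctionItem` and `SketchIterator` are hypothetical stand-ins invented for this illustration, not the project's actual classes; the point is only that every contained object must implement `Serializable` and goes through `ObjectOutputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class JavaSerializationSketch {

    // Hypothetical stand-in for a RuntimeIterator: it only carries an expression string.
    static class SketchIterator implements Serializable {
        private static final long serialVersionUID = 1L;
        final String expression;

        SketchIterator(String expression) {
            this.expression = expression;
        }
    }

    // Hypothetical stand-in for a FunctionItem that contains an iterator as its body.
    static class SketchFunctionItem implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final SketchIterator body;

        SketchFunctionItem(String name, SketchIterator body) {
            this.name = name;
            this.body = body;
        }
    }

    // Serializes the whole object graph with plain Java serialization.
    static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Reads the object graph back from the byte array.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SketchFunctionItem item = new SketchFunctionItem("local:f", new SketchIterator("$x + 1"));
        byte[] data = serialize(item);
        SketchFunctionItem copy = (SketchFunctionItem) deserialize(data);
        System.out.println(copy.name + " -> " + copy.body.expression + " (" + data.length + " bytes)");
    }
}
```

Printing the byte count gives a rough baseline for comparing against a Kryo-based encoding of the same object.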

Additionally, it's not clear whether Spark closures make use of Kryo for the iterators. It would be beneficial for performance to use Kryo in these operations.

The discussion regarding this issue can be found in the following PR:

A place to look would be the Spark configuration. Two things should be configurable. The first is the serialization of results (the Items in the RDDs), which Stefan already switched to Kryo during his thesis (and you did the same for manual serialization in DataFrames). The second is the serialization of all closures passed to Spark in transformations, which is where the Iterators are serialized; we are unsure whether this uses Java IO or Kryo.
While I do not have the exact parameters in mind, this may be useful reading to start with:

https://ogirardot.wordpress.com/2015/01/09/changing-sparks-default-java-serialization-to-kryo/
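
As a starting point, the data-serialization side is typically switched through Spark properties such as the ones below. This is a sketch in `spark-defaults.conf` style, not this project's actual configuration: the registrator class name is a hypothetical placeholder, and these properties are documented as governing serialized data, so whether they have any effect on closures is exactly the open question here.

```
# Use Kryo instead of Java serialization for data that Spark serializes
spark.serializer                 org.apache.spark.serializer.KryoSerializer
# Hypothetical class that would register Item classes with Kryo
spark.kryo.registrator           com.example.SketchKryoRegistrator
# When true, serializing an unregistered class fails instead of falling back to writing full class names
spark.kryo.registrationRequired  false
```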

However, since the FunctionItem problem is fixed for now by using Java IO for the iterator, this is not urgent. You can look into it when you see fit.

Originally posted by @ghislainfourny in #353
