Very slow deserialization #100

Open
ezkangaroo opened this issue Dec 11, 2023 · 3 comments

Comments

@ezkangaroo

I found that in practice, DoubleArrayAhoCorasick::<V>::deserialize_unchecked() can be extremely slow. It typically takes 4-5 s on my M1 machine (16 GB RAM).

Here are some stats:

heap_bytes=90855648 // .heap_bytes()
num_states=7440593  // .num_states()
2023-12-11T01:35:52.049237Z  INFO load_index{index_file="index.bin"}:load:deserialize
2023-12-11T01:35:56.495537Z  INFO ...
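
As a rough sanity check on the figures above, the sketch below turns them into per-state and per-second rates. It assumes the two log timestamps bracket only the deserialization call and that heap_bytes is close to the serialized size; neither is confirmed in the report. The helper names (elapsed_secs, load_stats) are made up for this illustration.

```rust
/// Figures copied from the report above.
const HEAP_BYTES: u64 = 90_855_648;
const NUM_STATES: u64 = 7_440_593;

/// Seconds between the two log timestamps (01:35:52.049237 and 01:35:56.495537).
/// Assumption: nothing but deserialization happens between them.
fn elapsed_secs() -> f64 {
    56.495_537 - 52.049_237
}

/// Returns (bytes per state, megabytes per second, nanoseconds per state).
fn load_stats() -> (f64, f64, f64) {
    let secs = elapsed_secs();
    let bytes_per_state = HEAP_BYTES as f64 / NUM_STATES as f64;
    let mb_per_sec = HEAP_BYTES as f64 / secs / 1e6;
    let ns_per_state = secs * 1e9 / NUM_STATES as f64;
    (bytes_per_state, mb_per_sec, ns_per_state)
}
```

Under those assumptions this works out to roughly 12 bytes per state, about 20 MB/s, and about 600 ns per state, which suggests the cost is per-element decoding work rather than raw memory or disk bandwidth.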
@ezkangaroo
Author

ezkangaroo commented Dec 13, 2023

I'm happy to add serde's Serialize and Deserialize traits (maybe behind a crate feature named serde). I was able to get this down to ~1s with serde. Also, if we do this, we can skip unsafe. Let me know; happy to create a patch PR.

@vbkaisetsu
Member

@ezkangaroo The DoubleArrayAhoCorasick structure internally uses unsafe functions, so I don't want to implement Serialize and Deserialize traits.
Instead, it is possible to use serde internally and provide a deserialization API as an unsafe function.

You are welcome to write a pull request!
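
A minimal sketch of the API shape described above: the decoding step is ordinary safe code, but the function is still declared unsafe because later lookups skip bounds checks. Everything here is hypothetical (ToyAutomaton, its table layout, and all its methods); it is not the crate's real structure, and it uses hand-rolled little-endian decoding from std in place of serde so that it stays self-contained.

```rust
/// Hypothetical toy automaton: `next[s]` is the state reached from state `s`.
/// Invariant relied on by `walk`: every entry is a valid index into `next`.
pub struct ToyAutomaton {
    next: Vec<u32>,
}

/// Safe decoding step: little-endian u32 values, trailing bytes ignored.
fn decode(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

impl ToyAutomaton {
    /// Validating constructor: rejects tables with out-of-range transitions.
    pub fn new(next: Vec<u32>) -> Option<Self> {
        let n = next.len();
        if next.iter().all(|&t| (t as usize) < n) {
            Some(Self { next })
        } else {
            None
        }
    }

    pub fn serialize(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.next.len() * 4);
        for v in &self.next {
            out.extend_from_slice(&v.to_le_bytes());
        }
        out
    }

    /// Fast path: decodes without validating the transitions.
    ///
    /// # Safety
    /// `bytes` must have been produced by `serialize` on a valid automaton.
    pub unsafe fn deserialize_unchecked(bytes: &[u8]) -> Self {
        Self { next: decode(bytes) }
    }

    /// Checked path: same decoding plus an O(n) validation pass.
    pub fn deserialize(bytes: &[u8]) -> Option<Self> {
        if bytes.len() % 4 != 0 {
            return None;
        }
        Self::new(decode(bytes))
    }

    /// Follows `steps` transitions starting from `start`.
    pub fn walk(&self, start: u32, steps: usize) -> u32 {
        assert!((start as usize) < self.next.len(), "start state out of range");
        let mut s = start;
        for _ in 0..steps {
            // SAFETY: `start` was bounds-checked above, and every table entry
            // is a valid index by the type invariant.
            s = unsafe { *self.next.get_unchecked(s as usize) };
        }
        s
    }
}
```

In this toy, loading a corrupted table through deserialize_unchecked would make walk read out of bounds, so the unsafe marker on the constructor is what carries that obligation to the caller.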

@vbkaisetsu
Member

Tip.

Although deserialization itself is a safe process, there is no guarantee that the resulting automaton is valid, and an invalid one can cause memory access violations in some cases, so the function must be marked unsafe to alert the caller.

This is similar to String::from_utf8_unchecked, etc.
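
For comparison, here is how the same checked/unchecked split looks with the standard library's String. The wrapper names decode_checked and decode_unchecked are made up for this illustration; only String::from_utf8 and String::from_utf8_unchecked are real std APIs.

```rust
/// Checked conversion: scans every byte and rejects invalid UTF-8.
fn decode_checked(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Unchecked conversion: skips the scan entirely.
///
/// # Safety
/// `bytes` must be valid UTF-8; otherwise later string operations
/// may exhibit undefined behavior.
unsafe fn decode_unchecked(bytes: Vec<u8>) -> String {
    // SAFETY: the validity requirement is forwarded to the caller.
    unsafe { String::from_utf8_unchecked(bytes) }
}
```

The conversion call itself does nothing dangerous; the unsafe marker exists because code that later uses the value trusts an invariant nobody verified.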
