Bad performance when reading large array of strings. #24
This package has not been touched in a long, long time, and I don't think that there is anyone in the Julia community who understands its inner workings anymore.
Hi @Chakerbh,

The README suggests using mmap for large files. This seems much faster:

```julia
julia> for i in 1:5
           items = i * 10000000
           json_file = open("/tmp/json", "w")
           write(json_file, JSON.json(Dict("a" => "a", "b" => repeat(["test"], items))))
           close(json_file)
           json_file = open("/tmp/json")
           s = String(Mmap.mmap(json_file))
           t = @elapsed collect(keys(LazyJSON.value(s)))
           println("$items $t")
           close(json_file)
       end
10000000 0.184056137
20000000 0.368260537
30000000 0.594284688
40000000 0.792973539
50000000 1.011099886
```

The problem seems to be here on line 98, where the data is read using `readavailable` (lines 92 to 98 in 53c63f0).
Printing out the number of bytes read at that point, it looks like `readavailable` only returns 32k at a time, even though there are many MB available. Bug in Base?
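For reference, here is a minimal sketch (not from the original thread) of one way to observe that behaviour; it assumes the `/tmp/json` file written by the loop above:

```julia
# Compare how many bytes a single readavailable call returns on a plain
# file IOStream with the total file size. Assumes /tmp/json exists (see above).
path = "/tmp/json"
io = open(path)
chunk = readavailable(io)          # one call on the freshly opened stream
println("readavailable returned ", length(chunk), " bytes")
println("file size is ", filesize(path), " bytes")
close(io)
```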
I think the IOString.jl interface is intended for streaming from network sockets, so it may be that `readavailable` is behaving as designed there. Anyway, if I replace line 98, I get:

```julia
julia> for i in 1:5
           items = i * 10000000
           json_file = open("/tmp/json", "w")
           write(json_file, JSON.json(Dict("a" => "a", "b" => repeat(["test"], items))))
           close(json_file)
           json_file = open("/tmp/json")
           t = @elapsed collect(keys(LazyJSON.value(json_file)))
           println("$items $t")
           close(json_file)
       end
10000000 0.673818205
20000000 0.817360912
30000000 1.864231564
40000000 2.132910674
50000000 2.84851389
```

However, if you are dealing with local files, use Mmap.
If you make a `StringView` (from the StringViews.jl package) over the mmapped bytes:

```julia
s = StringView(Mmap.mmap(json_file))
j = LazyJSON.value(s)
```
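Putting that together into a self-contained snippet (a sketch under the same assumptions as the loops above: StringViews.jl is installed and `/tmp/json` already exists; the timing line is only illustrative):

```julia
# Sketch: hand LazyJSON a StringView over the mmapped file instead of a String.
# Assumes the StringViews.jl package and the /tmp/json file from the loops above.
using Mmap, StringViews, LazyJSON

json_file = open("/tmp/json")
s = StringView(Mmap.mmap(json_file))   # string-like view over the mmapped bytes
t = @elapsed collect(keys(LazyJSON.value(s)))
println("elapsed: ", t, " seconds")
close(json_file)
```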
Reading the file directly with `LazyJSON.value(json_file)` is asymptotically problematic when `json_file` contains a large array of strings. I ran this code, and as you can see from the result it is far from linear (the second column is in seconds and the first is the number of items in the array), compared to `JSON.parse`. We did some profiling and it seems that most of the time is spent in `src/LazyJSON.jl`, lines 478 to 496 in 53c63f0.
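For anyone who wants to reproduce that measurement, a minimal sketch using Julia's standard `Profile` library (assuming the same `/tmp/json` test file as in the benchmarks above) could look like this:

```julia
# Sketch: profile where the time goes when collecting keys of a large lazy value.
# Assumes LazyJSON is installed and /tmp/json was written as in the loops above.
using Profile, Mmap, LazyJSON

json_file = open("/tmp/json")
s = String(Mmap.mmap(json_file))

collect(keys(LazyJSON.value(s)))   # warm-up run so compilation is not profiled

Profile.clear()
@profile collect(keys(LazyJSON.value(s)))
Profile.print(maxdepth = 15)       # look for frames pointing into src/LazyJSON.jl
close(json_file)
```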