Explore "nanoarrow-js" #41
Comments
Look at the zarrita.js implementation to consider TypeScript typing for this approach. Seems like type guards would be very useful here.
let arrayData: ArrowArray = ...;
function isStringArray(data: ArrowArray): data is StringArray
Keep in mind though that if StringArray doesn't change the actual interface, a |
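For illustration, a minimal sketch of what such a type guard could look like. The `ArrowArray` and `StringArray` shapes, the `format` discriminant (borrowed from the C Data Interface format strings, where "u" denotes utf8), and the `summarize` helper are all assumptions made up for this example, not part of any existing library.

```typescript
// Hypothetical types for illustration only; not an existing API.
interface ArrowArray {
  /** Assumed discriminant, using C Data Interface format strings ("u" = utf8). */
  format: string;
  length: number;
}

/** Narrower type: the literal `format` makes it structurally distinct. */
interface StringArray extends ArrowArray {
  format: "u";
}

/** Type guard: lets the compiler narrow ArrowArray to StringArray. */
function isStringArray(data: ArrowArray): data is StringArray {
  return data.format === "u";
}

/** Example consumer that relies on the narrowing. */
function summarize(data: ArrowArray): string {
  if (isStringArray(data)) {
    // Inside this branch, `data` has type StringArray.
    return `utf8 array of length ${data.length}`;
  }
  return `array with format "${data.format}"`;
}
```

Giving the narrowed type a literal-typed field is one way to make the guard meaningful, since TypeScript compares interfaces structurally.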
Arrow JS is a larger library but it is super treeshakeable. So if you don't need IPC reading/writing for example, you can get a much smaller bundle. If you just import one type, it can be tiny. |
As a disclaimer, I'm horrible at bundling, so it's very possible I'm doing something wrong, but in geoarrow/geoarrow-js#20 I found that the
I was suspicious because originally the unminified worker output from the latter still had IPC read/write code. In the end, because I knew in this worker I was only using attributes of the So, naively, it seems to enable tree shaking I have to ensure imports are from the internal file? Or maybe I'm using esbuild wrong 🤷‍♂️ In any case, as I mentioned here, I'm already spread too thin and don't think I have the bandwidth to make a stable |
I think esbuild doesn't treeshake. We have a bundle test in arrow that compares different bundlers.
I filed an issue about it at evanw/esbuild#1922 but it sounds like esbuild will expect annotations so we should add those if esbuild is becoming popular. I have pretty good experiences with rollup. Would be awesome if the problem just went away with a better bundler so you don't have to rewrite the Arrow APIs. |
I'm going to close this because I don't have the maintenance bandwidth to try and implement data structures for Arrow outside of Arrow JS, and I don't have a use case at this point where Arrow JS's bundle size is a deal-breaker. |
Arrow JS is a big library! It's not really a tenable dependency for a very bundle size conscious library or application.
This is actually the same story as in C/C++/Python. The C++ Arrow library got so big that many projects didn't want to depend on it. That's why nanoarrow was created: as a super minimal library that works with the C Data Interface representation of Arrow arrays.
I think there's definitely potential for a low-level Arrow library in JS that hews very closely to the C Data Interface.
Data structures would essentially be the JS counterpart of C Data Interface structs. All array data (no matter the logical type) would be a Uint8Array that could later be viewed as another type or as strings.
Because array data are all Uint8Arrays, an array could either be "owned" in JS memory or "viewed" from wasm memory. So the memory safety wouldn't be great, but this is JS after all!
It would make sense to have toArrowJS and fromArrowJS functions that convert to and from Arrow JS arrays/Data instances.
An emphasis should be placed on a functional API instead of a class API to keep bundle size low.
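To make the idea concrete, here is a rough sketch of what such a structure and functional API might look like. Every name here (`ArrowArray`, `asInt32`, `isValid`, `getString`) is hypothetical, and the struct is a simplified take on the C Data Interface `ArrowArray` (no dictionary or release callback). It assumes the standard utf8 layout of validity bitmap, int32 offsets, and data bytes.

```typescript
// Hypothetical sketch; none of these names come from an existing library.

/** Simplified JS counterpart of the C Data Interface `ArrowArray` struct. */
interface ArrowArray {
  length: number;
  nullCount: number;
  offset: number;
  /** Raw bytes only: each buffer may own JS memory or view wasm memory. */
  buffers: (Uint8Array | null)[];
  children: ArrowArray[];
}

/** Reinterpret bytes as int32 without copying (byteOffset must be 4-byte aligned). */
function asInt32(bytes: Uint8Array): Int32Array {
  return new Int32Array(bytes.buffer, bytes.byteOffset, bytes.byteLength >> 2);
}

/** True when slot `i` is valid; a missing validity buffer means no nulls. */
function isValid(array: ArrowArray, i: number): boolean {
  const validity = array.buffers[0];
  if (!validity) return true;
  const bit = array.offset + i;
  // Arrow validity bitmaps use least-significant-bit ordering.
  return ((validity[bit >> 3] >> (bit & 7)) & 1) === 1;
}

const decoder = new TextDecoder();

/** Read slot `i` of a utf8 layout: buffers = [validity, int32 offsets, data]. */
function getString(array: ArrowArray, i: number): string | null {
  const offsetsBytes = array.buffers[1];
  const data = array.buffers[2];
  if (!offsetsBytes || !data) throw new Error("expected a utf8 layout");
  if (!isValid(array, i)) return null;
  const offsets = asInt32(offsetsBytes);
  const slot = array.offset + i;
  return decoder.decode(data.subarray(offsets[slot], offsets[slot + 1]));
}

// Hand-built array holding ["ab", null, "cde"].
const example: ArrowArray = {
  length: 3,
  nullCount: 1,
  offset: 0,
  buffers: [
    new Uint8Array([0b00000101]), // validity: slots 0 and 2 are valid
    new Uint8Array(new Int32Array([0, 2, 2, 5]).buffer), // offsets
    new TextEncoder().encode("abcde"), // data
  ],
  children: [],
};
```

Because the functions are free-standing rather than methods on a class, a bundler can drop whichever ones an application never imports. The same `getString` would work unchanged if the buffers were `Uint8Array` views over wasm memory, with the caveat that such views become invalid if that memory grows.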
Ideally, this would allow high-performance programs to rely on Arrow memory without fear of a huge bundle size impact! But this would be complementary to, not competitive with, Arrow JS.