Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hex_tarball:unpack/2, create/2 API changes #64

Open
wojtekmach opened this issue May 28, 2019 · 7 comments
Open

hex_tarball:unpack/2, create/2 API changes #64

wojtekmach opened this issue May 28, 2019 · 7 comments

Comments

@wojtekmach
Copy link
Member

Currently the API is:

-type contents() :: #{filename() => binary()}.
-type tarball() :: binary().
%% ...

-spec unpack(tarball(), memory) ->
                {ok, #{checksum => checksum(), metadata => metadata(), contents => contents()}} |
                {error, term()};
            (tarball(), filename()) ->
                {ok, #{checksum => checksum(), metadata => metadata()}} |
                {error, term()}.

so we are only able to accept binaries. I'm wondering if we should accept filenames too, so instead of binary() we accept {binary, binary()} | filename:filename() and deprecate the former? Eventually, at say v1.0, we'd accept {binary, binary()} | filename:filename_all().

This is a low-level library so I'm ok with pretty low level interface but I think this change would make it more convenient to use and I wish I'd have started with that.

Thoughts? cc @ferd @tsloughter

@ferd
Copy link
Contributor

ferd commented May 28, 2019

why not add unpack_file ? It would allow you to carry the intent ("the binary blob is actually a filename") in the function name itself, rather than asking the user to wrap the type in a tuple ("this is a filename binary, and this is a binary binary")

Otherwise I'd expect consistency: unpack({file, Name} | {binary, Bin}, ...)

@wojtekmach
Copy link
Member Author

Otherwise I'd expect consistency: unpack({file, Name} | {binary, Bin}, ...)

I was going for similar API to erl_tar:

extract(Name) -> RetValue
Name = filename() | {binary,binary()} | {file,Fd}

but I think I prefer your suggestion of unpack_file (and, consequently, an unpack_docs_file) better. Thanks 👍

@wojtekmach
Copy link
Member Author

Somewhat related, we allow:

  1. hex_tarball:create(Metadata, ["foo.erl"]).
  2. hex_tarball:create(Metadata, [{"foo.erl", "/path/on/disk/to/foo.erl"}]).
  3. hex_tarball:create(Metadata, [{"foo.erl", <<"-module(foo).">>}]).

Option 1. is a convenience inherited from erl_tar and I'd be ok with dropping it to keep api surface smaller.

@starbelly suggested extracting option 3. into a separate function, it could have the following signature:

-spec create_from_memory(metadata(), [{filename:filename_all(), binary()}]).

If we put all this together, do we have:

  1. unpack, unpack_file, create, create_from_memory
  2. unpack, unpack_from_memory, create, create_from_memory
  3. unpack_from_file, unpack_from_memory, create_from_files, create_from_memory
  4. ???

I think I like option 2 the most. Thoughts?

One more thing, as mentioned in the initial comment we allow:

  1. unpack(Tarball, memory)
  2. unpack(Tarball, OutputPath)

That is, we allow unpacking to a file on disk or we return it from the function. We currently does not have such capability for create, we always return the generated tarball as binary, we don't save it on disk. Should we mimic the memory option there, or wait until someone actually needs it? If we're doing big changes to the API I'd prefer to them at once I think.

@wojtekmach wojtekmach changed the title hex_tarball:unpack/2 API changes hex_tarball:unpack/2, create/2 API changes Dec 16, 2019
@starbelly
Copy link
Contributor

@wojtekmach I too like option 2. I also like being able to pass the output path in or an atom. Though I can also see the argument for having unpack_from_memory/1 as it is one less argument to get wrong.

@wojtekmach
Copy link
Member Author

wojtekmach commented Dec 16, 2019

oh, so unpack_from_memory/1,2? Yes, that's an option too, basically instead of:

-spec unpack(tarball(), memory) ->
                {ok, #{checksum => checksum(), metadata => metadata(), contents => contents()}} |
                {error, term()};
            (tarball(), filename()) ->
                {ok, #{checksum => checksum(), metadata => metadata()}} |
                {error, term()}.

we have:

-spec unpack(tarball()) ->
                {ok, #{checksum => checksum(), metadata => metadata(), contents => contents()}} |
                {error, term()}.
-spec unpack(tarball(), filename()) ->
                {ok, #{checksum => checksum(), metadata => metadata()}} |
                {error, term()}.

right? I guess that's a win, I kinda like that we don't have 2 different sets of returns. But then again, we actually would have:

-spec unpack(filename())
-spec unpack(filename(), filename())
-spec unpack_from_memory(binary())
-spec unpack_from_memory(binary(), filename())

does that look good? I think it's a little bit weird that in the function name we describe the input but not the output, is it? but then again calling it unpack_from_memory_to_memory would be terrible.

@starbelly
Copy link
Contributor

starbelly commented Dec 16, 2019

I see what you mean in regards to symmetry and that is a con. It's definitely a pick your poison situation :) As for the rationale of one less argument to pass around, this is true, but I don't actually see it being a problem. So, I think if we put that argument aside, which one is simpler for the user? I'm not entirely sure, I tend to favor being explicit though.
If we may ever have a need for yet another variation unpack or create then I would probably in favor of arguments vs function names, but I can't think of another variation 🤔

Note, that all approaches leads to two entry points for create* for me so that the validation parts are nice and clear 👍

@starbelly
Copy link
Contributor

starbelly commented Dec 16, 2019

@wojtekmach Something I'm doing right now as part of the validation PR is definitely making sense on my end and that is on topic with this issue perhaps is ...

Instead of passing around 2 arguments here and one here, and a list of tuples here, etc... I'm modifying the internal functions to work with a package() :

 -type source() :: filesystem | memory.
 -type package() :: #{ source => source(),
                       files => files(),
                       metadata => metadata(),
                       errors => errors(),
                       warnings => warnings(),
                       content_size => integer()
                    }.

This is a map vs a record for interoperability. I'm wondering if we're doing an API change, maybe expose this and helper functions for working with it, such as new_package/2 or new_package_from_filesystem|memory/2 or new_package/3 which would take a metadata and files params or in the case of new_package/3 those two params and a filesystem|memory atom.

Operations on the package within hex_tarball would be opaque via the helper, even in internal functions (e.g., updating the files, adding errors or warnings, etc.).

Entry point functions could either return a package() (which would contain the tarball()) or just return a tarball() .

In addition, intentions (if needed) can be placed in the structure and the execution carried out by internal functions. This also would keeps things open and extensible for the future as well as making the implementation more clear and hopefully more declarative. I think even if we don't accept a package() at the entry points it still makes sense to work with this internally within hex_tarball.

To note on another thing I kind of threw in the mix here : filesystem vs files. In both the case of creating or unpacking to/from files, we're still working with files, so maybe filesystem is a better way to distinguish the source.

Sorry to add entropy to the convo here, but I figured since we're throwing out ideas and this is making sense on my end, throw it in :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants