Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spread 'frozen' behavior through code base & document in dev docs #1616

Open
ctb opened this issue Jun 20, 2021 · 2 comments
Open

Spread 'frozen' behavior through code base & document in dev docs #1616

ctb opened this issue Jun 20, 2021 · 2 comments

Comments

@ctb
Copy link
Contributor

ctb commented Jun 20, 2021

If FrozenSourmashSignature #1610 goes in, we'll have a de facto documentation of everywhere in the code base where we modify signatures and sketches. This touches on a bunch of things that @luizirber and I have discussed in oblique ways - mostly when it's causing trouble for him in the Rust layer - and I think it might be time to double down and think more holistically on how to think about the codebase as a whole.

Related topics -

  • it's not been entirely clear how to think about signatures, minhash objects, and databases in the context of long running processes (batch processes, client/server, Web backends, etc.).
  • who "owns" a signature when it was loaded from a database? and when can the rust code assume that it has ownership of the backend signature, and is allowed to change it?
  • which parts of the code are "allowed" to make binding changes to a signature or sketch that should be reflected in a database?
  • which parts of the code are allowed to make changes to databases? and how do we enforce this?
  • related: what do the APIs return - original or changed signatures? per revisit LinearIndex.select behavior for scaled (and num?) #1433 and many of the recent changes, I've doubled down on returning original signatures from the search APIs, so that all the flattening and downsampling is done dynamically during search. I still think this is a good idea.

Based on #1610 and #1508, I do think we've done an OK job - it turns out that SourmashSignature and MinHash objects are mostly adjusted or changed in the top-level command-line functions, which limits the footprint of things we need to pay attention to. We could make this a policy, though shrug.

We could also make a clearer distinction between immutable database types (load & use, but don't change) and mutable database types (so far, mostly in memory ones, or when building on-disk ones). I'm not quite sure how to do this best, but strengthening the disconnect between building databases and using databases seems like a good idea.

Related: #1494 and discussions therein. Also see #1577 cc @gustavo-salazar, who might have the "honor" of further exploring some of the consequences of long-running sourmash processes :).

@ctb
Copy link
Contributor Author

ctb commented Jun 20, 2021

(the larger goal is to have sourmash continue to support nice CLI UX, but also be a robust and performant library that can be used for long-running processes and/or underpin other libraries and workflows)

@ctb
Copy link
Contributor Author

ctb commented Aug 3, 2022

ref #616 and #2133.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant