-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable listing of zip files in fromfiles for index, gather, search #354
base: main
Are you sure you want to change the base?
Conversation
maybe of interest - sourmash-bio/sourmash#3150 - please add notes and questions as you have them ;) |
I'm pretty new to Rust so gonna post my attempts and compile errors so far, in case it's helpful. I feel like I'm repeating incantations that I don't fully understand, so hopefully this at least helps me figure out what's going on.
First compile error:
|
let collection = if sigpath.extension().map_or(false, |ext| ext == "zip") { |
sourmash_plugin_branchwater/src/utils.rs
Line 602 in 5624f21
.filter_map(|path| match Signature::from_path(path) { |
Second compile error: error[E0599]: expected value, found macro 'line'
error[E0423]: expected value, found macro `line`
--> src/utils.rs:641:58
|
641 | match collection_from_zipfile(&PathBuf::from(line), &report_type) {
| ^^^^
| |
| not a value
| help: a local variable with a similar name exists: `lines`
This is me not knowing Rust. How can the path be initialized from the line in filter_map
, to be tested for both the extension and then be referenced with &
?
Third compile error: error[E0599]: no method named 'or_else' found for struct 'Collection' in the current scope
error[E0599]: no method named `or_else` found for struct `Collection` in the current scope
--> src/utils.rs:639:33
|
639 | let collection = collection.or_else(
| -----------^^^^^^^ method not found in `Collection`
This one I don't understand at all, because collection_from_signature
and load_collection
door_else
all over the place! Does some other thing need to be set for collection
?
sourmash_plugin_branchwater/src/utils.rs
Line 691 in 5624f21
collection.or_else(|| match collection_from_manifest(&sigpath, &report_type) { |
Fourth compile error: error[E0277]: can't compare 'std::option::Option<&str>' with '&str'
error[E0277]: can't compare `std::option::Option<&str>` with `&str`
--> src/utils.rs:640:81
|
640 | lines.par_iter().filter_map(|line| Some(PathBuf::from(line).extension() == "zip")) {
| ^^ no implementation for `std::option::Option<&str> == &str`
|
= help: the trait `PartialEq<&str>` is not implemented for `std::option::Option<&str>`
= help: the following other types implement trait `PartialEq<Rhs>`:
<std::option::Option<Box<U>> as PartialEq<rkyv::niche::option_box::ArchivedOptionBox<T>>>
<std::option::Option<T> as PartialEq>
<std::option::Option<U> as PartialEq<rkyv::option::ArchivedOption<T>>>
I'm not clear on how to get the extension of the file specified in the lines. I thought PathBuf.from(line).extension()
would create a string that could be compared, like this:
sourmash_plugin_branchwater/src/utils.rs
Line 678 in 5624f21
let collection = if sigpath.extension().map_or(false, |ext| ext == "zip") { |
Fifth compile error: error[E0308]: mismatched types
error[E0308]: mismatched types
--> src/utils.rs:645:21
|
641 | / match collection_from_zipfile(&PathBuf::from(line), &report_type) {
642 | | Ok(coll) => Some((coll, 0)),
643 | | Err(e) => {
644 | | last_error = Some(e);
645 | | None
| | ^^^^ expected `()`, found `Option<_>`
646 | | }
647 | | }
| |_____________- expected this to be `()`
|
= note: expected unit type `()`
found enum `std::option::Option<_>`
At this point, I'm lucky to even get into this loop, so I'm not too worried about this yet :)
EDIT: formatting with backticks to make look a bit nicer
…in collection_from_pathlist
Okay, I think this is closer. I created a fn collection_from_zipfile_or_signature_or_manifest(
sigpath: &Path,
report_type: &ReportType,
) -> Result<Option<Collection>, Option<anyhow::Error>> { Now, the only compiler error is a error[E0308]: mismatched types
--> src/utils.rs:654:9
|
654 | Ok((collection, n_failed))
| ^^^^^^^^^^ expected `Collection`, found `Vec<Collection>`
|
= note: expected struct `Collection`
found struct `Vec<Collection>` I'm not fully understanding how to manage this, since I thought the |
Hmm from looking in |
src/utils.rs
Outdated
.filter_map(|path| match collection_from_zipfile_or_signature_or_manifest(&path, &report_type) { | ||
Ok(collection) => { | ||
// For each record in the collection, get its path filename | ||
Some(collection | ||
.unwrap() | ||
.manifest() | ||
.iter() | ||
.filter_map(|record| match record.internal_location(){ | ||
PathBuf => Some(location) | ||
|
||
}) | ||
.collect()) | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting compiler error that Some
doesn't exist? Not sure what to make of this..
error[E0425]: cannot find value `location` in this scope
--> src/utils.rs:653:41
|
653 | PathBuf => Some(location)
| ^^^^^^^^ not found in this scope
src/utils.rs
Outdated
.filter_map(|path| match collection_from_zipfile_or_signature_or_manifest(&path, &report_type) { | ||
Ok(collection) => { | ||
// For each record in the collection, get its path filename | ||
Some(collection | ||
.unwrap() | ||
.manifest() | ||
.iter() | ||
.map(|record| record.internal_location()) | ||
.collect()) | ||
}, | ||
Err(err) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tried using map
here, but then the type couldn't be inferred:
error[E0283]: type annotations needed
--> src/utils.rs:653:22
|
645 | .filter_map(|path| match collection_from_zipfile_or_signature_or_manifest(&path, &report_type) {
| ---------- type must be known at this point
...
653 | .collect())
| ^^^^^^^ cannot infer type of the type parameter `B` declared on the method `collect`
|
= note: cannot satisfy `_: Send`
note: required by a bound in `rayon::iter::ParallelIterator::filter_map`
--> /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/mod.rs:840:12
|
837 | fn filter_map<P, R>(self, filter_op: P) -> FilterMap<Self, P>
| ---------- required by a bound in this associated function
...
840 | R: Send,
| ^^^^ required by this bound in `ParallelIterator::filter_map`
help: consider specifying the generic argument
|
653 | .collect::<Vec<_>>())
| ++++++++++
I learned that It seems that My plan now is to:
Let me know if this is on the right track. The code is doing some double nesting things right now which is probably not the best, so happy to change to something more canonical Rust. I tried both with |
So, this seems to be working! (maybe?) Tests are passing:
📦 Built wheel for CPython 3.12 to /var/folders/7z/r1593ybs1sj2ks5zzl9vy8840000gn/T/.tmpW3hV1Q/sourmash_plugin_branchwater-0.9.6.dev0-cp312-cp312-macosx_10_14_x86_64.whl
✏️ Setting installed package as editable
🛠 Installed sourmash_plugin_branchwater-0.9.6.dev0
==================================================================== test session starts =====================================================================
platform darwin -- Python 3.12.3, pytest-8.2.1, pluggy-1.5.0 -- /Users/olgabot/anaconda3/envs/branchwater-dev/bin/python3.12
cachedir: .pytest_cache
rootdir: /Users/olgabot/code/sourmash_plugin_branchwater
configfile: pyproject.toml
collected 261 items / 260 deselected / 1 selected
src/python/tests/test_index.py::test_index_multiple_manifests PASSED [100%]
====================================================================== warnings summary ======================================================================
src/python/tests/sourmash_tst_utils.py:10
/Users/olgabot/code/sourmash_plugin_branchwater/src/python/tests/sourmash_tst_utils.py:10: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================== 1 passed, 260 deselected, 1 warning in 7.36s ======================================================= To inspect the output, added (base)
Sun 30 Jun - 16:05 /var/folders/7z/r1593ybs1sj2ks5zzl9vy8840000gn/T/sourmashtest_22dlbn8t
@olgabot ll
Permissions Size User Group Date Modified Name
.rw-r--r-- 23k olgabot staff 30 Jun 16:05 2.fa.sig.gz.mf.zip
.rw-r--r-- 43k olgabot staff 30 Jun 16:05 47.fa.sig.gz.mf.zip
.rw-r--r-- 44k olgabot staff 30 Jun 16:05 63.fa.sig.gz.mf.zip
.rw-r--r-- 272 olgabot staff 30 Jun 16:05 manifest_zips.txt
drwxr-xr-x - olgabot staff 30 Jun 16:05 out.db
(base)
Sun 30 Jun - 16:05 /var/folders/7z/r1593ybs1sj2ks5zzl9vy8840000gn/T/sourmashtest_22dlbn8t
@olgabot ll out.db
Permissions Size User Group Date Modified Name
.rw-r--r-- 0 olgabot staff 30 Jun 16:05 000012.log
.rw-r--r-- 175k olgabot staff 30 Jun 16:05 000013.sst
.rw-r--r-- 1.7k olgabot staff 30 Jun 16:05 000014.sst
.rw-r--r-- 16 olgabot staff 30 Jun 16:05 CURRENT
.rw-r--r-- 36 olgabot staff 30 Jun 16:05 IDENTITY
.rw-r--r-- 0 olgabot staff 30 Jun 16:05 LOCK
.rw-r--r-- 60k olgabot staff 30 Jun 16:05 LOG
.rw-r--r-- 642 olgabot staff 30 Jun 16:05 MANIFEST-000005
.rw-r--r-- 15k olgabot staff 30 Jun 16:05 OPTIONS-000009
.rw-r--r-- 15k olgabot staff 30 Jun 16:05 OPTIONS-000011 But unclear if this output folder is the correct To try using the created database, I did (branchwater-dev)
Sun 30 Jun - 16:08 /var/folders/7z/r1593ybs1sj2ks5zzl9vy8840000gn/T/sourmashtest_22dlbn8t
@olgabot sourmash scripts fastmultigather 63.fa.sig.gz.mf.zip out.db
== This is sourmash version 4.8.9. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==
=> sourmash_plugin_branchwater 0.9.6.dev0; cite Irber et al., doi: 10.1101/2022.11.02.514947
ksize: 31 / scaled: 1000 / moltype: DNA / threshold bp: 50000
gathering all sketches in '63.fa.sig.gz.mf.zip' against 'out.db' using 16 threads
Loaded DB
Reading query(s) from: '63.fa.sig.gz.mf.zip'
Loaded 1 query signature(s)
thread '<unnamed>' panicked at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sourmash-0.14.1/src/storage.rs:628:48:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
File "/Users/olgabot/anaconda3/envs/branchwater-dev/bin/sourmash", line 8, in <module>
sys.exit(main())
^^^^^^
File "/Users/olgabot/anaconda3/envs/branchwater-dev/lib/python3.12/site-packages/sourmash/__main__.py", line 20, in main
retval = mainmethod(args)
^^^^^^^^^^^^^^^^
File "/Users/olgabot/code/sourmash_plugin_branchwater/src/python/sourmash_plugin_branchwater/__init__.py", line 163, in main
status = sourmash_plugin_branchwater.do_fastmultigather(args.query_paths,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Option::unwrap()` on a `None` value With the backtrace, Full `sourmash scripts fastmultigather` output with backtrace(branchwater-dev)
✘ Sun 30 Jun - 16:08 /var/folders/7z/r1593ybs1sj2ks5zzl9vy8840000gn/T/sourmashtest_22dlbn8t
@olgabot export RUST_BACKTRACE=full RUST_LOG=trace
(branchwater-dev)
Sun 30 Jun - 16:14 /var/folders/7z/r1593ybs1sj2ks5zzl9vy8840000gn/T/sourmashtest_22dlbn8t
@olgabot sourmash scripts fastmultigather 63.fa.sig.gz.mf.zip out.db
== This is sourmash version 4.8.9. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==
=> sourmash_plugin_branchwater 0.9.6.dev0; cite Irber et al., doi: 10.1101/2022.11.02.514947
ksize: 31 / scaled: 1000 / moltype: DNA / threshold bp: 50000
gathering all sketches in '63.fa.sig.gz.mf.zip' against 'out.db' using 16 threads
Loaded DB
Reading query(s) from: '63.fa.sig.gz.mf.zip'
Loaded 1 query signature(s)
thread '<unnamed>' panicked at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sourmash-0.14.1/src/storage.rs:628:48:
called `Option::unwrap()` on a `None` value
stack backtrace:
0: 0x155aa9e65 - std::backtrace_rs::backtrace::libunwind::trace::h6f224ea69532b9c4
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
1: 0x155aa9e65 - std::backtrace_rs::backtrace::trace_unsynchronized::h7e173ef23e4514eb
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x155aa9e65 - std::sys_common::backtrace::_print_fmt::hb68789b79179df75
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/sys_common/backtrace.rs:68:5
3: 0x155aa9e65 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h46bc8e86d021398c
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/sys_common/backtrace.rs:44:22
4: 0x155ac8fc3 - core::fmt::rt::Argument::fmt::hf606bd67a54daa54
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/fmt/rt.rs:142:9
5: 0x155ac8fc3 - core::fmt::write::h1c24e945c7ef8357
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/fmt/mod.rs:1120:17
6: 0x155aa77ae - std::io::Write::write_fmt::hd9fe01a40900d808
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/io/mod.rs:1846:15
7: 0x155aa9c39 - std::sys_common::backtrace::_print::h3bfcf7a1dfb9a746
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/sys_common/backtrace.rs:47:5
8: 0x155aa9c39 - std::sys_common::backtrace::print::hdbfcabb7d363540a
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/sys_common/backtrace.rs:34:9
9: 0x155aab4c5 - std::panicking::default_hook::{{closure}}::hecb24382dfeeea98
10: 0x155aab23e - std::panicking::default_hook::h78dca947ab9e89eb
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:292:9
11: 0x155aab933 - std::panicking::rust_panic_with_hook::hb83cee65df04957a
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:781:13
12: 0x155aab80c - std::panicking::begin_panic_handler::{{closure}}::hbe6e44d88f82bc77
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:651:13
13: 0x155aaa359 - std::sys_common::backtrace::__rust_end_short_backtrace::h6560cb9c1f1224c1
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/sys_common/backtrace.rs:171:18
14: 0x155aab586 - rust_begin_unwind
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:647:5
15: 0x155aef125 - core::panicking::panic_fmt::h2aac8cf45f7ae617
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/panicking.rs:72:14
16: 0x155aef1f7 - core::panicking::panic::h4a7e975994ee54c1
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/panicking.rs:144:5
17: 0x155aef088 - core::option::unwrap_failed::h46e1b950e2bec24f
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/option.rs:1978:5
18: 0x15541d51c - core::option::Option<T>::unwrap::ha844df2bb8f7c47f
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/option.rs:931:21
19: 0x15541d51c - <sourmash::storage::MemStorage as sourmash::storage::Storage>::load_sig::h91196c07e71c948f
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sourmash-0.14.1/src/storage.rs:628:12
20: 0x1553ea261 - <std::sync::rwlock::RwLock<L> as sourmash::storage::Storage>::load_sig::hae08c7c075c3a91f
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sourmash-0.14.1/src/storage.rs:215:9
21: 0x155419dc6 - <sourmash::storage::InnerStorage as sourmash::storage::Storage>::load_sig::h5fdb003b953dd3b8
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sourmash-0.14.1/src/storage.rs:171:25
22: 0x1553c6c9b - sourmash::collection::Collection::sig_for_dataset::h9b931ea66e8df55a
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sourmash-0.14.1/src/collection.rs:178:19
23: 0x15537b90f - <sourmash::index::revindex::disk_revindex::RevIndex as sourmash::index::revindex::RevIndexOps>::gather::hab3b73cd197c9e6a
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sourmash-0.14.1/src/index/revindex/disk_revindex.rs:339:29
24: 0x155167e81 - <sourmash::index::revindex::RevIndex as sourmash::index::revindex::RevIndexOps>::gather::h9a74d626905065be
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sourmash-0.14.1/src/index/revindex/mod.rs:50:1
25: 0x1551647b7 - sourmash_plugin_branchwater::mastiff_manygather::mastiff_manygather::{{closure}}::h73189abd155ec8b5
at /Users/olgabot/code/sourmash_plugin_branchwater/src/mastiff_manygather.rs:70:39
26: 0x1551af37b - <rayon::iter::filter_map::FilterMapFolder<C,P> as rayon::iter::plumbing::Folder<T>>::consume::he68eca72d1c01790
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/filter_map.rs:123:36
27: 0x1551b3186 - rayon::iter::plumbing::Folder::consume_iter::h7aeb3ccd5b6f1a32
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/plumbing/mod.rs:178:20
28: 0x1551884ad - <rayon::iter::map::MapFolder<C,F> as rayon::iter::plumbing::Folder<T>>::consume_iter::ha7b9c9287517aaa6
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/map.rs:248:21
29: 0x15510afdd - rayon::iter::plumbing::Producer::fold_with::h15711142466a1e6c
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/plumbing/mod.rs:109:9
30: 0x1551bb435 - rayon::iter::plumbing::bridge_producer_consumer::helper::h70e96589b0d62aac
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/plumbing/mod.rs:437:13
31: 0x1551b6df1 - rayon::iter::plumbing::bridge_producer_consumer::h833f1810d0d01c85
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/plumbing/mod.rs:396:12
32: 0x1551b5bef - <rayon::iter::plumbing::bridge::Callback<C> as rayon::iter::plumbing::ProducerCallback<I>>::callback::h80010e9b057f1ad1
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/plumbing/mod.rs:372:13
33: 0x15510ad4a - <<rayon::iter::enumerate::Enumerate<I> as rayon::iter::IndexedParallelIterator>::with_producer::Callback<CB> as rayon::iter::plumbing::ProducerCallback<I>>::callback::h64c28e23065c8fab
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/enumerate.rs:78:17
34: 0x1551ace5f - <rayon::slice::Iter<T> as rayon::iter::IndexedParallelIterator>::with_producer::hfee0d6b876cd6531
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/slice/mod.rs:826:9
35: 0x15510b891 - <rayon::iter::enumerate::Enumerate<I> as rayon::iter::IndexedParallelIterator>::with_producer::h5f3cb5d218edf374
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/enumerate.rs:62:16
36: 0x1551c6f3d - rayon::iter::plumbing::bridge::hfa21054829fe1a61
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/plumbing/mod.rs:356:12
37: 0x15510b5dc - <rayon::iter::enumerate::Enumerate<I> as rayon::iter::ParallelIterator>::drive_unindexed::h764ef738ae0ba75b
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/enumerate.rs:38:9
38: 0x155187eec - <rayon::iter::map::Map<I,F> as rayon::iter::ParallelIterator>::drive_unindexed::h70798aec081ff73d
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/map.rs:49:9
39: 0x1551b46d8 - <rayon::iter::filter_map::FilterMap<I,P> as rayon::iter::ParallelIterator>::drive_unindexed::h912e986c1f764d91
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/filter_map.rs:46:9
40: 0x1550c16e5 - <rayon::iter::flatten::Flatten<I> as rayon::iter::ParallelIterator>::drive_unindexed::h00791cf5f0798cce
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/flatten.rs:38:9
41: 0x1550ac454 - <rayon::iter::map_with::MapWith<I,T,F> as rayon::iter::ParallelIterator>::drive_unindexed::hcafba9497f179ebc
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/map_with.rs:53:9
42: 0x1550f6acf - rayon::iter::try_reduce::try_reduce::h7a68390d0648840f
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/try_reduce.rs:21:5
43: 0x1550abd64 - rayon::iter::ParallelIterator::try_reduce::h7aab82da02d0d392
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/mod.rs:1087:9
44: 0x1550c11c1 - rayon::iter::ParallelIterator::try_for_each_with::h568bd72808bd6d49
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-1.10.0/src/iter/mod.rs:523:9
45: 0x1551f914e - sourmash_plugin_branchwater::mastiff_manygather::mastiff_manygather::h9fdf9888d2a86984
at /Users/olgabot/code/sourmash_plugin_branchwater/src/mastiff_manygather.rs:54:16
46: 0x15517eafe - sourmash_plugin_branchwater::do_fastmultigather::h814b200ad22dec20
at /Users/olgabot/code/sourmash_plugin_branchwater/src/lib.rs:122:15
47: 0x15517f93a - sourmash_plugin_branchwater::__pyfunction_do_fastmultigather::hb64ce83255049474
at /Users/olgabot/code/sourmash_plugin_branchwater/src/lib.rs:106:1
48: 0x1551942d6 - pyo3::impl_::trampoline::fastcall_with_keywords::{{closure}}::h0f8ff0dc78ec7ff5
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.21.2/src/impl_/trampoline.rs:45:29
49: 0x15519422b - pyo3::impl_::trampoline::trampoline::{{closure}}::hf36748a7cd33a7a3
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.21.2/src/impl_/trampoline.rs:187:54
50: 0x1550e7ff6 - std::panicking::try::do_call::h46969c12a76f2fc1
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:554:40
51: 0x1550ed8cd - ___rust_try
52: 0x1550e37b7 - std::panicking::try::h7589864a77e3035a
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:518:19
53: 0x1550a1664 - std::panic::catch_unwind::h9ba174ca62996f50
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panic.rs:142:14
54: 0x155194151 - pyo3::impl_::trampoline::trampoline::hd3b63f930fcf6ac6
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.21.2/src/impl_/trampoline.rs:187:9
55: 0x15509732d - pyo3::impl_::trampoline::fastcall_with_keywords::hfaee83008e4857ad
at /Users/olgabot/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.21.2/src/impl_/trampoline.rs:45:13
56: 0x15517efe4 - sourmash_plugin_branchwater::<impl sourmash_plugin_branchwater::do_fastmultigather::MakeDef>::_PYO3_DEF::trampoline::h1b0e605360433e6a
at /Users/olgabot/code/sourmash_plugin_branchwater/src/lib.rs:106:1
57: 0x100d863e6 - _cfunction_vectorcall_FASTCALL_KEYWORDS
58: 0x100e8d00b - __PyEval_EvalFrameDefault
59: 0x100e4a1ee - _PyEval_EvalCode
60: 0x100f121a7 - _run_mod
61: 0x100f11f75 - _pyrun_file
62: 0x100f119ff - __PyRun_SimpleFileObject
63: 0x100f11364 - __PyRun_AnyFileObject
64: 0x100f4064b - _pymain_run_file_obj
65: 0x100f3ff19 - _pymain_run_file
66: 0x100f3f78c - _Py_RunMain
67: 0x100ca23a8 - _main
68: 0x7ff809ebe366 - <unknown>
Traceback (most recent call last):
File "/Users/olgabot/anaconda3/envs/branchwater-dev/bin/sourmash", line 8, in <module>
sys.exit(main())
^^^^^^
File "/Users/olgabot/anaconda3/envs/branchwater-dev/lib/python3.12/site-packages/sourmash/__main__.py", line 20, in main
retval = mainmethod(args)
^^^^^^^^^^^^^^^^
File "/Users/olgabot/code/sourmash_plugin_branchwater/src/python/sourmash_plugin_branchwater/__init__.py", line 163, in main
status = sourmash_plugin_branchwater.do_fastmultigather(args.query_paths,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Option::unwrap()` on a `None` value However, it seems that
Thoughts? Happy to hear any Rust suggestions as I copied some internal code from Let me know what you think! |
Hi @olgabot I can't quite tell what you're trying to do in this PR 😅 The code in
By contrast, over in #235,
creates a standalone manifest CSV file - a CSV file that contains a pointer to sketches in another file. Revisiting your test, you're essentially doing this, right?
and then running things like This also doesn't currently work (except maybe in this branch!) and it's great to get it working, but it's different from manifests. |
when I run manysearch on list.txt in the main branch, I get:
If that's what you want to get working, then a better PR title here would be "enable listing of zip files in fromfiles". |
(and I verify that under this PR, the above command works fine 🎉 ) |
Yes, sorry, I got confused by manifests vs zip files vs signatures here 😅 . The goal of this PR is to allow for |
Whelp, I spoke too soon ... tried running the full test suite locally and even the first few test scripts had a ton of errors: (branchwater-dev)
✘ Sun 30 Jun - 17:35 ~/code/sourmash_plugin_branchwater origin ☊ olgabot/index-multiple-manifests 1●
@olgabot make test
python -m pytest
==================================================================== test session starts =====================================================================
platform darwin -- Python 3.12.3, pytest-8.2.1, pluggy-1.5.0
rootdir: /Users/olgabot/code/sourmash_plugin_branchwater
configfile: pyproject.toml
collected 261 items
src/python/tests/test_cluster.py ..........FFFF.. [ 6%]
src/python/tests/test_gather.py .F.F...FF...FF.F...F.F.F.F.F....F.FF [ 19%]
src/python/tests/test_index.py .F......FF.F....FF [ 26%]
src/python/tests/test_multigather.py .FFFFFFFF..FFFF.FFFFFFFFF.FF.FFFFFFFFFF.FF......FF [ 45%]
src/python/tests/test_multisearch.py .FFF.FFF.FFF....FF.FFF More work to be done! |
heh, I almost had suggested We (I) have been inconsistent, it seems - the branchwater plugin documentation uses fromfile (e.g. link), while sourmash uses pathlist (e.g. see refactoring here). So anyway two comments -
tl;dr it would be good to fix the PR name/description somehow. On a side note - This issue is also worth skimming for @bluegenes discussion of how the code works: #264 |
Hello, Hope you are doing well! I was able to get my ONE test to work, but then it broke all the other tests. Let me share what I’ve tried so far. The changes in this PR:
How to convert
|
Got rid of errors in (branchwater-dev) (base)
✘ ♥ 77% Fri 5 Jul - 12:00 ~/code/sourmash_plugin_branchwater origin ☊ olgabot/index-multiple-manifests ↑1 ✔
@olgabot pytest | head -n 20
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-8.2.1, pluggy-1.5.0
rootdir: /Users/olgabot/code/sourmash_plugin_branchwater
configfile: pyproject.toml
collected 261 items
src/python/tests/test_cluster.py ................ [ 6%]
src/python/tests/test_gather.py ............FF.F...................F [ 19%]
src/python/tests/test_index.py .................. [ 26%]
src/python/tests/test_multigather.py ......FFF.....F..FF.F.F...F..F....F [ 40%]
F.....F......FF [ 45%]
src/python/tests/test_multisearch.py .................F....F............ [ 59%]
... [ 60%]
src/python/tests/test_pairwise.py .........F................ [ 70%]
src/python/tests/test_search.py .......................FF....F.......... [ 85%]
.............. [ 91%]
src/python/tests/test_sketch.py ....................F.F [100%] |
hi @olgabot - not sure if this applies to this PR, but I wanted to alert you to some considerations that are coming up for manifests and relative path interpretation over in #390 (comment). Happy to chat more if you think it is relevant. |
(I think as long as you are working with fromfiles/pathlists you're fine, and the main recommendation for you currently is - get things working for absolute paths first, because relative path interpretation is a bit wibbly and might need some tender loving care in the future.) |
@olgabot I tackled this from another angle - standalone manifests - and I ended up writing a whole new struct, |
Thanks @ctb! Sorry I missed your message, it's been a bit crazy on my side. If |
Hmm though it does seem like |
(right, for internal storage it can work, but ... probably not yet ;) |
Internal storage is fine for me since all the files are on s3 anyway, so referencing other blobs isn't the best idea. However, I'm still not able to do Error message from `sourmash index list_of_zips.txt`
|
#430 will probably not implement this for For now, Although, I didn't think about supporting it just for internal storage... that should actually work. Will see what I can do :). |
Implemented in #451, which I'll be merging into #450 and #430. I'll update here when it's officially tested, but it should work now. |
#430 is merged, and all the functionality is released in sourmash_plugin_branchwater v0.9.8 🎉 . |
Fixes #347, #266, #235.
This PR adds the ability to do
index|multigather|multisearch|pairwise|search
, usingfromfiles
containing.zip
files created bymanysketch
.