genome_fetcher.py error #26

Open
takeiga opened this issue Mar 29, 2024 · 4 comments

@takeiga

takeiga commented Mar 29, 2024

After updating to 0.4.3, genome_fetcher.py raises the following error:

DAJIN2 --control barcode01 --sample barcode02 --allele actc1L_cont_knockin.fa --name 02 --genome xenLae2 --threads 8
2024-03-29 11:48:17, INFO, barcode01 is now processing...
2024-03-29 11:48:19, ERROR, Catch an Exception. Traceback:
Traceback (most recent call last):
  File "/home/igawa/miniconda3/envs/dajin2/bin/DAJIN2", line 8, in <module>
    sys.exit(execute())
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 236, in execute
    execute_single_mode(arguments)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 47, in execute_single_mode
    core.execute_control(arguments)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/core.py", line 26, in execute_control
    ARGS: FormattedInputs = preprocess.format_inputs(arguments)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/input_formatter.py", line 96, in format_inputs
    genome_coordinates = get_genome_coordinates(genome_urls, fasta_alleles, is_cache_genome, tempdir)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/input_formatter.py", line 67, in get_genome_coordinates
    genome_coordinates = preprocess.fetch_coordinates(genome_coordinates, genome_urls, fasta_alleles["control"])
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/genome_fetcher.py", line 29, in fetch_coordinates
    coordinate_start = fetch_seq_coordinates(genome, blat_url, seq_start)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/genome_fetcher.py", line 18, in fetch_seq_coordinates
    raise ValueError(f"{seq[:60]}... is not found in {genome}")
ValueError: TTATAATTCAGCATCTAGACAGCAGCAACAAGCATTACCCTGGAATGGTTCATAATATGC... is not found in xenLae2

I confirmed that the run completes successfully when I swap in the older genome_fetcher.py, so the error probably comes from the updated version.
Thank you for your effort, anyway!

takeiga added the bug (Something isn't working) label on Mar 29, 2024
@akikuno
Owner

akikuno commented Mar 29, 2024

Thank you for the report!

As you noted, I updated genome_fetcher.py so that it requires a perfect match between the control sequence and the reference genome. Your control sequence in actc1L_cont_knockin.fa may therefore not match the reference exactly. Could you share the control sequence from actc1L_cont_knockin.fa? I would like to use it to track down the cause of the error.
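To illustrate what a perfect-match requirement implies, here is a minimal sketch. It is not the actual code in genome_fetcher.py (which, according to the traceback, queries a BLAT URL); the function names `reverse_complement` and `find_exact_match` and the toy sequences are hypothetical. It only shows that a single differing base is enough for the lookup to fail with the kind of ValueError reported above.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def find_exact_match(query: str, reference: str) -> tuple[int, str]:
    """Return (0-based start, strand) of an exact occurrence of query.

    Hypothetical helper: searches the forward strand first, then the
    reverse strand, and raises ValueError when there is no perfect match.
    """
    query = query.upper()
    reference = reference.upper()
    pos = reference.find(query)
    if pos != -1:
        return pos, "+"
    pos = reference.find(reverse_complement(query))
    if pos != -1:
        return pos, "-"
    raise ValueError(f"{query[:60]}... is not found in the reference")


# Toy reference; a real genome would not be searched this way.
reference = "GGGACGTTTACCC"
print(find_exact_match("ACGTTTA", reference))  # exact match on the + strand
try:
    find_exact_match("ACGTATA", reference)      # one base differs
except ValueError as err:
    print(err)
```

With an exact-match rule like this, a control sequence taken from a different assembly fails as soon as it contains one nucleotide that differs from the chosen reference.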

> I confirmed that the run completes successfully when I swap in the older genome_fetcher.py, so the error probably comes from the updated version.

I am pleased to hear this. Your feedback is greatly appreciated! Thank you for your valuable contribution.

@takeiga
Author

takeiga commented Mar 29, 2024

Our control sequence is derived from the latest NCBI Xenopus laevis reference genome (Xenopus_laevis_v10.1). However, I specified the older UCSC assembly, xenLae2, as the genome for DAJIN2, and the two assemblies differ at some nucleotide positions. So I now understand why genome_fetcher.py raised the error, and I will use the UCSC data when needed.
Personally, I hope a future update of DAJIN2 can support NCBI data as the reference, if possible.
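For anyone hitting the same error, a quick way to confirm that two assemblies disagree is to compare the control sequence with the corresponding reference region base by base. The sketch below is only an illustration: `list_mismatches` is a hypothetical helper, the sequences are toy data, and it assumes the two sequences are already aligned and of equal length (substitutions only, no indels).

```python
def list_mismatches(control: str, reference: str) -> list[tuple[int, str, str]]:
    """List (0-based position, control base, reference base) for each difference.

    Assumes both sequences are the same length and already aligned,
    so only substitutions are reported.
    """
    if len(control) != len(reference):
        raise ValueError("sequences must have the same length")
    pairs = zip(control.upper(), reference.upper())
    return [(i, c, r) for i, (c, r) in enumerate(pairs) if c != r]


# Toy example: one substitution at position 3.
print(list_mismatches("ACGTAC", "ACGAAC"))
```

An empty list means the two sequences are identical over the compared region; any entry explains why an exact-match lookup would fail.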

akikuno added the request (New feature or request) label on Mar 31, 2024
@akikuno
Owner

akikuno commented Mar 31, 2024

Thank you for the explanation!
I'll rethink the method for obtaining genome coordinates.
It might take some time, but I'll share the information here once it's updated.

akikuno self-assigned this on Jul 28, 2024
@akikuno
Owner

akikuno commented Oct 19, 2024

[Memorandum]

For genomes not included in UCSC DAS, use GGGenome to convert genome coordinates.
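As a rough sketch of what a GGGenome lookup could look like, the helper below only builds a query URL and performs no network request. Everything in it is an assumption to verify against the GGGenome help page: the URL layout (base / database / allowed mismatches / sequence.format), the `json` format suffix, and the function name `build_gggenome_url`. The database identifier `DB_NAME` is a placeholder, not a real GGGenome database name.

```python
from urllib.parse import quote


def build_gggenome_url(
    db: str,
    seq: str,
    mismatches: int = 0,
    fmt: str = "json",
    base: str = "https://gggenome.dbcls.jp",
) -> str:
    """Build a GGGenome search URL (assumed layout, see the note above).

    mismatches=0 asks for perfect matches only, mirroring the
    exact-match requirement discussed in this issue.
    """
    return f"{base}/{quote(db)}/{mismatches}/{quote(seq.upper())}.{fmt}"


# "DB_NAME" is a placeholder for whichever database identifier applies.
print(build_gggenome_url("DB_NAME", "ttataattcagcatctagac"))
```

The returned hits would then need to be parsed for chromosome, start, end, and strand to obtain the genome coordinates; the response schema is not covered here.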
