[Bug] TypeError: unsupported operand type(s) for -: 'NoneType' and 'int' when running write_tfrecord_species.py #6

Open
jolespin opened this issue Oct 7, 2024 · 3 comments

jolespin commented Oct 7, 2024

I'd like to train a model on some genomes I have available locally. My gene models are in GFF format, so I converted them to GTF with gffread, extracted the longest isoform, and then tried creating tfrecords, but the script failed.
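For readers reproducing this preprocessing, the "longest isoform" step could be sketched as below. This is only one possible approach, not necessarily what was done here: it assumes a GTF whose attribute column carries `gene_id` and `transcript_id`, and it ranks isoforms by summed CDS length. The function name `longest_isoforms` is hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def longest_isoforms(gtf_lines):
    """Keep only the GTF lines of the transcript with the largest summed
    CDS length in each gene (hypothetical helper, ties keep the first seen)."""
    cds_len = defaultdict(int)  # transcript_id -> summed CDS length
    gene_of = {}                # transcript_id -> gene_id
    records = []                # (transcript_id, original line)

    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid, gid = attrs.get("transcript_id"), attrs.get("gene_id")
        if tid is None or gid is None:
            continue
        gene_of[tid] = gid
        records.append((tid, line))
        if fields[2] == "CDS":
            # GTF coordinates are 1-based and inclusive.
            cds_len[tid] += int(fields[4]) - int(fields[3]) + 1

    best = {}  # gene_id -> transcript_id of the longest isoform so far
    for tid, gid in gene_of.items():
        if gid not in best or cds_len[tid] > cds_len[best[gid]]:
            best[gid] = tid

    keep = set(best.values())
    return [line for tid, line in records if tid in keep]
```

Ranking by CDS length rather than transcript span is a design choice; either criterion is a reasonable reading of "longest isoform".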

I'm attaching the input files in case they're useful. Here is the command and its output:

$ python Tiberius/bin/write_tfrecord_species.py --fasta ${SPECIES}.fa --gtf ${SPECIES}.longest.gtf --out tfrecords/${SPECIES}
2024-10-07 21:43:22.216480: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/efs/sandbox/sandbox/development/jolespin/Testing/EukGeneModeling/tiberius_testing/Tiberius/bin/write_tfrecord_species.py", line 371, in <module>
    main()
  File "/home/ec2-user/SageMaker/efs/sandbox/sandbox/development/jolespin/Testing/EukGeneModeling/tiberius_testing/Tiberius/bin/write_tfrecord_species.py", line 309, in main
    fasta, ref = get_species_data_hmm(genome_path=args.fasta, annot_path=args.gtf, 
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/SageMaker/efs/sandbox/sandbox/development/jolespin/Testing/EukGeneModeling/tiberius_testing/Tiberius/bin/write_tfrecord_species.py", line 119, in get_species_data_hmm
    f_chunk = fasta.get_flat_chunks(strand='+', pad=False)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/SageMaker/efs/sandbox/sandbox/development/jolespin/Testing/EukGeneModeling/tiberius_testing/Tiberius/bin/genome_fasta.py", line 143, in get_flat_chunks
    // (self.chunksize - self.overlap) + 1
        ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

pt_mag.tar.gz

LarsGab (Collaborator) commented Oct 8, 2024

Hi,

thanks for using Tiberius!

The script requires the --wsize argument, which specifies the sequence size of each training example. For example, we used --wsize 9999 for training with the mammalian genomes.
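To illustrate why omitting --wsize ends in this particular TypeError, here is a minimal sketch. It is not the Tiberius source: `build_parser` and `count_chunks` are hypothetical stand-ins, and the sketch assumes the unset option defaults to `None` and reaches a chunk-count expression like the one shown in the traceback. It also shows how marking the option as required would turn the failure into a usage error at parse time.

```python
import argparse


def build_parser(require_wsize: bool) -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options used in this issue."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--fasta")
    parser.add_argument("--gtf")
    parser.add_argument("--out")
    # Without required=True, an omitted --wsize silently becomes None.
    parser.add_argument("--wsize", type=int, required=require_wsize)
    return parser


def count_chunks(seq_len: int, chunksize, overlap: int = 0) -> int:
    """Hypothetical stand-in for the chunk arithmetic in the traceback;
    the exact formula in genome_fasta.py is an assumption."""
    return (seq_len - overlap) // (chunksize - overlap) + 1


if __name__ == "__main__":
    argv = ["--fasta", "genome.fa", "--gtf", "genes.gtf", "--out", "out"]
    args = build_parser(require_wsize=False).parse_args(argv)
    try:
        count_chunks(100_000, args.wsize)
    except TypeError as err:
        print("Without --wsize:", err)

    args = build_parser(require_wsize=True).parse_args(argv + ["--wsize", "9999"])
    print("With --wsize 9999:", count_chunks(100_000, args.wsize), "chunks")
```

With `required=True`, argparse rejects a command line that lacks --wsize before any genome data is loaded, which is easier to diagnose than a failure deep inside the chunking code.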

I seem to have forgotten to include it in the documentation. I'm sorry.

Let me know if you encounter any other issues.

Best,
Lars

jolespin (Author) commented Oct 9, 2024

Thanks for the insight. Can you recommend a good parameter choice for diatoms? I'm testing the model on a set of algal genomes that need gene calls. The alternative is MetaEuk, but with my database it uses a considerable amount of resources.

KatharinaHoff (Member) commented Oct 9, 2024 via email
