I'd like to train a model on some genomes I have available locally. I have my gene models in GFF format so I converted to GTF with gffread, extracted the longest isoform, then tried creating tfrecords but the script failed.
I'm attaching the input files in case they're useful:
$ python Tiberius/bin/write_tfrecord_species.py --fasta ${SPECIES}.fa --gtf ${SPECIES}.longest.gtf --out tfrecords/${SPECIES}
2024-10-07 21:43:22.216480: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/efs/sandbox/sandbox/development/jolespin/Testing/EukGeneModeling/tiberius_testing/Tiberius/bin/write_tfrecord_species.py", line 371, in <module>
    main()
  File "/home/ec2-user/SageMaker/efs/sandbox/sandbox/development/jolespin/Testing/EukGeneModeling/tiberius_testing/Tiberius/bin/write_tfrecord_species.py", line 309, in main
    fasta, ref = get_species_data_hmm(genome_path=args.fasta, annot_path=args.gtf,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/SageMaker/efs/sandbox/sandbox/development/jolespin/Testing/EukGeneModeling/tiberius_testing/Tiberius/bin/write_tfrecord_species.py", line 119, in get_species_data_hmm
    f_chunk = fasta.get_flat_chunks(strand='+', pad=False)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/SageMaker/efs/sandbox/sandbox/development/jolespin/Testing/EukGeneModeling/tiberius_testing/Tiberius/bin/genome_fasta.py", line 143, in get_flat_chunks
    // (self.chunksize - self.overlap) + 1
        ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
The script requires the --wsize argument, which sets the sequence length of each training example. For example, we used --wsize 9999 when training on the mammalian genomes.
I seem to have forgotten to include it in the documentation; I'm sorry.
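For anyone hitting the same TypeError: the traceback suggests the chunk size is still None when the chunk count is computed, which is what happens if --wsize is never passed. Below is a minimal sketch of how a script could catch this earlier. It is not the Tiberius code: num_chunks and build_parser are hypothetical names, the numerator of the formula is an assumption (only the divisor and the "+ 1" are visible in the traceback), and making --wsize a required argparse option is just one possible design.

```python
import argparse


def num_chunks(seq_len, chunksize, overlap=0):
    """Hypothetical helper: number of windows used to tile a sequence.

    Only the "// (chunksize - overlap) + 1" part is taken from the
    traceback; the numerator here is an assumption for illustration.
    """
    if chunksize is None:
        # Without this guard, "None - overlap" raises the TypeError above.
        raise ValueError("chunksize is not set; pass --wsize on the command line")
    if chunksize <= overlap:
        raise ValueError("chunksize must be larger than overlap")
    return (seq_len - overlap) // (chunksize - overlap) + 1


def build_parser():
    """Hypothetical parser: marks --wsize as required so a missing value
    fails at argument parsing instead of deep inside the chunking code."""
    parser = argparse.ArgumentParser(description="sketch of a tfrecord writer CLI")
    parser.add_argument("--wsize", type=int, required=True,
                        help="sequence length of each training example")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(num_chunks(seq_len=1_000_000, chunksize=args.wsize))
```

With a guard like this, running without --wsize stops with a usage message rather than a TypeError inside get_flat_chunks.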
Thanks for the insight. Can you recommend a good parameter choice for diatoms? I'm testing the model on a bunch of algae genomes that need gene calls. The alternative is MetaEuk, but with my database it uses a considerable amount of resources.
Please send me an email (or give me a phone call). I am willing to share results on this and possibly collaborate, but the current results are not yet good enough for me to publish a parameter set.
Josh L. Espinoza ***@***.***> wrote on Wed, 9 Oct 2024 at 18:18:
I'd like to train a model on some genomes I have available locally. I have my gene models in GFF format so I converted to GTF with gffread, extracted the longest isoform, then tried creating tfrecords but the script failed. I'm attaching the input files in case it's useful:
pt_mag.tar.gz