impute some of these cells with the trained model #57

Open
Accompany0313 opened this issue Sep 15, 2024 · 1 comment
Accompany0313 commented Sep 15, 2024

Hi Ruochi,
Nice work!

When I use Higashi to impute data at 10 kb resolution, the program always crashes due to memory limits. Can I instead impute subsets of the cells using a model trained on the complete dataset? For example, I would train on all 4,238 cells of the Lee2019 dataset, then impute those same 4,238 cells separately in batches of 1,000 at a time. Since I reload only the 1,000 cells of each batch when imputing it, I am wondering whether this has any impact on the results.
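For concreteness, the batching scheme described above could be sketched as follows (cell count and batch size taken from the question; this is just the index bookkeeping, not a Higashi call):

```python
# Split the 4238 Lee2019 cells into consecutive batches of up to 1000 cells,
# preserving the original cell order within and across batches.
n_cells, batch_size = 4238, 1000
batches = [list(range(start, min(start + batch_size, n_cells)))
           for start in range(0, n_cells, batch_size)]
# Yields 5 batches: four of 1000 cells and a final batch of 238.
```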

Here is the code I used to train the model:

```python
from higashi.Higashi_wrapper import *
import numpy as np

config = "/home/zzl/ygc/Higashi/tan2021/10K/100K_128/config.JSON"
higashi_model = Higashi(config)

# Preprocessing on the full dataset
higashi_model.generate_chrom_start_end()
higashi_model.extract_table()
higashi_model.create_matrix()
higashi_model.prep_model()

# Training
higashi_model.train_for_embeddings()
higashi_model.train_for_imputation_nbr_0()
higashi_model.train_for_imputation_with_nbr()
```

Here is the code where I impute one batch of 1,000 cells:

```python
from higashi.Higashi_wrapper import *
import numpy as np

config = "/home/zzl/ygc/Higashi/tan2021/10K/100K_128/config.JSON"
higashi_model = Higashi(config)

# Preprocessing, this time on the 1,000-cell subset
higashi_model.generate_chrom_start_end()
higashi_model.extract_table()
higashi_model.create_matrix()
higashi_model.prep_model()

# Imputation with the previously trained model
higashi_model.impute_no_nbr()
higashi_model.impute_with_nbr()
```

ruochiz (Collaborator) commented Sep 17, 2024

Hmm, the only potential issue is that if the cells you feed in are not the first x cells of the original dataset, the cell embeddings will be offset. You can still hack around this by replacing the cell embeddings .npy file with the embeddings of that subset of cells, in the same order as the subset input.
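If it helps, the workaround described above could look roughly like this. This is only a sketch: the file name `cell_embeddings.npy` and the 128-dimensional embedding size are placeholders I made up for illustration, not Higashi's actual output path or settings, so substitute whatever embeddings file your run produces.

```python
import numpy as np

# Sketch of the suggested hack: make the embeddings file contain exactly the
# cells being imputed, in the same order as they are fed to the model.
# ASSUMPTIONS: "cell_embeddings.npy" and the 128-d size are placeholders.
n_cells, dim = 4238, 128
all_embed = np.random.rand(n_cells, dim)   # stands in for the full-run embeddings

batch = np.arange(1000, 2000)              # e.g. the second batch of 1000 cells
subset_embed = all_embed[batch]            # rows match the order of the batch input

# Overwrite the embeddings file so row i corresponds to the i-th subset cell.
np.save("cell_embeddings.npy", subset_embed)
```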
