decontX not reproducible? #378
Replies: 4 comments
-
Hi @trnguyenuka, thanks for using our tool! I am sorry about the extra hassle related to reproducibility. I don't think the underlying algorithm or the random initialization for the variational inference has changed. However, the initialization related to the clustering has been updated, so the initial cluster labels that decontX uses may be somewhat different in newer versions. We rely on functions from the scater and scuttle packages for the initial clustering, so if things changed in newer versions of those packages, this may also change the output of decontX somewhat. If you can get the original decontX cluster labels and supply them in to the If the initial clustering is similar, then the decontX results should be similar. I wonder if you can check whether the major cell types are being correctly clustered in your newer decontX runs. You bring up a good point related to reproducibility that it is a generally good tip to put the
-
Hi Mr. Campbell, thank you so much for your prompt and detailed response. I have found the cause of my issue: That was not because of Thank you again and all the best,
-
Thanks so much @trnguyenuka! Yes, that is correct: we also make use of the uwot package, and underlying changes to it will also affect the clustering. I am going to move this to a Discussion thread so others can also see it.
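As a rough sketch of the earlier suggestion to reuse the original cluster labels, the helper below reruns decontX with labels supplied by the caller, so the result no longer depends on how the installed scater, scuttle, or uwot versions cluster the cells. The helper name is hypothetical, and passing labels through an argument called `z` is an assumption based on my reading of the celda documentation; please check `?decontX` for your installed version.

```r
# Hypothetical helper: rerun decontX with caller-supplied cluster labels.
# Assumption: celda::decontX() takes the labels via `z` and a `seed` argument.
run_decontx_with_labels <- function(sce, labels, seed = 12345) {
  # One label per cell (cells are the columns of the count object).
  if (length(labels) != ncol(sce)) {
    stop("Expected one label per cell (column) of the input.")
  }
  if (!requireNamespace("celda", quietly = TRUE)) {
    stop("The celda package is required for this sketch.")
  }
  celda::decontX(sce, z = labels, seed = seed)
}
```

If an object from the older run is still available, its labels could be passed as `labels`; where they are stored depends on the celda version, so that lookup is left to the reader.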
-
Hi @joshua-d-campbell, I have recently noticed that the results I get from I have a Docker image that provides R and all my R libraries, including this one. When I run decontX on a Linux instance that has 4 cores, I get different results from when I run it on an instance that has 32 cores. I explicitly pass a seed value (42 of course ;) in both cases (although the default seed value of 12345 would be fine too). Watching htop while decontX is running, it seems there are at least two phases of the algorithm that are parallelized. Perhaps the parallelization is actually happening in libraries that this project uses, like uwot. Let me know if you'd like me to create an issue for this, or if you need more info from me. I am using celda_1.16.1 with R 4.3.3. (Not sure if any releases since 1.16.1 might have addressed this?)
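To make a report like this easier to quantify, here is a minimal base R sketch for comparing the per-cell contamination estimates saved from two runs (for example, the 4-core and the 32-core instance). The function name and the assumption that each run's estimates were exported as a plain numeric vector in the same cell order are mine; the 0.5 cutoff mirrors the "AmbientRNA < 0.5" filter from the original post.

```r
# Hypothetical helper: summarise how far two sets of per-cell estimates differ.
compare_contamination <- function(a, b, cutoff = 0.5, tol = 1e-8) {
  stopifnot(is.numeric(a), is.numeric(b), length(a) == length(b))
  abs_diff <- abs(a - b)
  list(
    # TRUE when every cell agrees within the tolerance.
    identical = all(abs_diff <= tol),
    # Largest per-cell disagreement between the two runs.
    max_abs_diff = max(abs_diff),
    # Cells that would pass the filter in one run but not in the other.
    n_flipped = sum((a < cutoff) != (b < cutoff))
  )
}

# Recording the core count next to each result helps when comparing machines.
n_cores <- parallel::detectCores()
```

A non-zero `n_flipped` indicates that the difference between machines is large enough to change which cells survive the downstream filter.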
-
Hello campbio,
thank you for developing a great tool. I have been using it to estimate contamination in my scRNA-seq projects for a while. Recently, when I re-ran an analysis pipeline on an old dataset, I noticed a big difference in the estimated contamination levels output by decontX. Since I apply the filter "AmbientRNA < 0.5", this difference changes the results of all downstream steps...
I'm quite sure that I changed neither my code nor the input data, so I'm wondering where the problem could come from. I see that the function "decontX" has an input argument "seed"; does that mean "decontX" relies on a stochastic algorithm?
Moreover, do different versions of "celda" heavily affect the results? I guess I made a silly mistake: I didn't keep a record of the SessionInfo from the original run, so now I cannot recall which version I used back then ... :(
Thank you very much and I'm looking forward to your reply.
Best regards,
H.N
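For the missing session record mentioned in the post, a minimal base R sketch of saving one next to each analysis output could look like the following. The function name, the output file, and the list of packages to record are illustrative assumptions; the package names are the ones that come up in this thread.

```r
# Hypothetical helper: write sessionInfo() plus selected package versions to a file.
record_session <- function(path,
                           pkgs = c("celda", "uwot", "scater", "scuttle")) {
  # Keep only packages that are installed, without loading them.
  is_installed <- vapply(
    pkgs, function(p) nzchar(system.file(package = p)), logical(1)
  )
  installed <- pkgs[is_installed]
  versions <- vapply(
    installed, function(p) as.character(utils::packageVersion(p)), character(1)
  )
  lines <- c(
    utils::capture.output(utils::sessionInfo()),
    "",
    "Selected package versions:",
    paste(installed, versions)
  )
  writeLines(lines, path)
  invisible(path)
}
```

Calling `record_session("sessionInfo.txt")` at the end of a pipeline run leaves a plain-text record that can be compared against later runs.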