Reference data set for automated build testing #9
I think at one point we want to compare our results with the output of the CellRanger pipeline, right?
I was actually wondering whether we should go with CellRanger or do it in a different way?
It's never too early! I've found it super useful to have these for early development work. If possible, it's best to find something from yeast or another organism with a small reference genome, to keep the file size small. Otherwise we'll need to mess around subsampling the data to a single chromosome or something to make the tests run quickly (possible, but a faff).
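As a rough illustration of the single-chromosome subsampling mentioned above, here is a minimal pure-Python sketch that keeps only one chromosome from a reference FASTA. The function and file names are hypothetical; in practice a tool like `samtools faidx` would do this more robustly.

```python
def subset_fasta(fasta_text, keep_chrom):
    """Keep only the FASTA record whose header ID matches keep_chrom."""
    kept, keeping = [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A header starts a new record; keep it only if the first
            # whitespace-delimited token (the sequence ID) matches.
            keeping = line[1:].split()[0] == keep_chrom
        if keeping:
            kept.append(line)
    return "\n".join(kept) + "\n" if kept else ""


# Hypothetical usage: write a one-chromosome reference for fast CI tests.
# with open("genome.fa") as fh:
#     small = subset_fasta(fh.read(), "chrI")
# with open("genome.chrI.fa", "w") as out:
#     out.write(small)
```

The matching reads would still need to be subset to the same chromosome (e.g. after alignment), which is the fiddly part alluded to in the comment.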
@wikiselev - as to which tools to use, it's probably nice to create a separate issue for that. But also check out ideas.md if you haven't already. I think it was @subwaystation's idea that we'd want to compare output to CellRanger, not necessarily run CellRanger. Phil
@wikiselev - Currently, we would not just "rebuild" CellRanger. I would rather regard it as a reference pipeline, but we are free to build it differently, depending on what we find out over the next few weeks. I think we should probably schedule a new hangout call for further discussion :)
I feel that CellRanger is widely used and in demand by lots of users, so rebuilding it makes sense as a starting point. Also, keeping in mind that it's 10X's own solution, I doubt we can do significantly better.
I agree - we should start with CellRanger and then improve upon that once we have something working reasonably well.
@wikiselev I mean, don't get me wrong, CellRanger might be a good customised solution. Imho the first goal would be to put it in a scientific workflow framework, including stable environments for the tools with Singularity as the container solution, and give the community the possibility to easily install and run it on any cluster, plus have it reproducible. Modularity of the tools should make it easier to customise the pipeline (e.g. a different mapper, etc.). Moreover, this would be a good basis for future benchmarks of the pipeline. I do not completely agree on the performance point, though. Take, for example, the duplicate-removal step: I would really like to see the performance differences between different tools here, as this is a crucial step :)
Hey guys,
maybe it is too early for that, but I was thinking about which reference data sets to use for pipeline evaluation and automated build testing.
We can use this thread to collect ideas :)