########################################
#              CrisprCon               #
#     Crispr/Cas allele analysis.      #
#               EMBL-EBI               #
########################################
# PROJECT URLs:
https://github.com/mpi2/crisprcon
http://www.mousephenotype.org/
# CONTACT:
International Mouse Phenotyping Consortium (IMPC):
Mouse Informatics Project leader:
Terry Meehan ([email protected])
############
# CONTENTS #
############
1. General Information
2. Variant calling (crispFa)
3. gRNA consequence prediction (crispRNA)
4. References
# Last updated: June 2018
##########################
# 1. General Information #
##########################
# Reference genome
All variant calls are relative to a reference genome, e.g. C57BL/6J (GRCm38).
An example of the last major release of the reference genome can be found here:
ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Mus_musculus/GRCm38/
# Annotation file
A GFF file is used for the consequence prediction of the variants.
An example of a GFF file can be found here:
ftp://ftp.ensembl.org/pub/release-92/gff3/mus_musculus/Mus_musculus.GRCm38.92.gff3.gz
# Software requirements
CrisprCon requires BWA, SAMtools, BCFtools and BEDtools to be available on the PATH.
BCFtools csq is run with the --force option, so BCFtools 1.9 or later is required.
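A pre-flight check can confirm these requirements before a run. The sketch below is not part of CrisprCon: the helper names (missing_tools, parse_bcftools_version, check_environment) are hypothetical. It assumes the executables are named bwa, samtools, bcftools and bedtools, and that the first line printed by "bcftools --version" has the form "bcftools X.Y".

```python
import re
import shutil
import subprocess

# Assumed executable names for the four required tools.
REQUIRED_TOOLS = ("bwa", "samtools", "bcftools", "bedtools")
MIN_BCFTOOLS = (1, 9)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def parse_bcftools_version(version_text):
    """Extract (major, minor) from the first line of `bcftools --version`."""
    lines = version_text.splitlines()
    match = re.match(r"bcftools\s+(\d+)\.(\d+)", lines[0] if lines else "")
    if match is None:
        raise ValueError("unrecognised bcftools version output")
    return int(match.group(1)), int(match.group(2))


def check_environment():
    """Raise RuntimeError if a tool is missing or BCFtools is older than 1.9."""
    missing = missing_tools()
    if missing:
        raise RuntimeError("not found on PATH: " + ", ".join(missing))
    result = subprocess.run(["bcftools", "--version"],
                            capture_output=True, text=True, check=True)
    if parse_bcftools_version(result.stdout) < MIN_BCFTOOLS:
        raise RuntimeError("BCFtools 1.9 or later is required")
```

Calling check_environment() before invoking crisprcon.py turns a missing tool or an old BCFtools into an early, readable error. Versions are compared as integer tuples so that 1.10 sorts after 1.9.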
Usage: ./crisprcon.py [-h] {crispFa,crispRNA}
crispFa Variant caller for CRISPR edited alleles
crispRNA gRNA consequence predictor. Takes the gRNA coordinates and outputs a VCF with the predicted consequences
#####################################
# 2. Variant calling (crispFa) #
#####################################
Usage: ./crisprcon.py crispFa [-h] [-i INPUT_FA] [-o OUTPUT_VCF] [-f REF_GENOME]
[-r COORDS] [-g GFF] [-d OUT_DIR] [-t TEMP_DIR]
[-p PREFIX] [-c GRNA GRNA]
Positional arguments:
-i INPUT_FA, --input INPUT_FA
Input FASTA.
-o OUTPUT_VCF, --output OUTPUT_VCF
Output VCF file.
-f REF_GENOME, --fasta-ref REF_GENOME
Reference file in fasta format (must be .fai indexed)
-r COORDS, --region COORDS
Coordinates of the CRISPR/Cas target region (format: chr:start-end)
-g GFF, --gff-annot GFF
gff3 annotation file
Optional arguments:
-d OUT_DIR, --dir OUT_DIR
Optional: Directory for the output files. If not present, it will be created. Default: current directory
-t TEMP_DIR, --temp TEMP_DIR
Optional: Directory for temporary and intermediate files. Default: creates "temp" directory at the output directory
-p PREFIX, --prefix PREFIX
Optional: Prefix for temporary files. Default: input_name
-c GRNA GRNA, --crispr-grna GRNA GRNA
Optional: gRNA pair coordinates used for CRISPR. Recommended for validating indels and flagging potential false positives. Repeat the flag for each additional pair.
E.g.: -c 72206978 72207000 -c 72206982 72207004
-h, --help Show this help message and exit
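The options above can be assembled into a crispFa call programmatically. This is a minimal sketch, not code from CrisprCon: parse_region and build_crispfa_cmd are hypothetical helpers, every file name and the target region are placeholders, and only the flags listed above are used. The gRNA coordinates are the ones from the -c example.

```python
import re

REGION_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def parse_region(region):
    """Split 'chr:start-end' into (chrom, start, end), checking start <= end."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"expected chr:start-end, got {region!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("region start must not exceed region end")
    return chrom, start, end


def build_crispfa_cmd(input_fa, output_vcf, ref_genome, region, gff, grna_pairs=()):
    """Assemble the crispFa argument list; each gRNA pair gets its own -c flag."""
    parse_region(region)  # fail early on a malformed region string
    cmd = ["./crisprcon.py", "crispFa", "-i", input_fa, "-o", output_vcf,
           "-f", ref_genome, "-r", region, "-g", gff]
    for start, end in grna_pairs:
        cmd += ["-c", str(start), str(end)]
    return cmd


# Placeholder inputs: none of these files ship with CrisprCon, and the
# region is invented for illustration.
cmd = build_crispfa_cmd("allele.fa", "allele.vcf", "GRCm38.fa",
                        "1:72206900-72207100", "annotation.gff3",
                        grna_pairs=[(72206978, 72207000), (72206982, 72207004)])
print(" ".join(cmd))
# To execute the command: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems with paths, and validating the region first catches a malformed -r value before any alignment starts.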
##############################################
# 3. gRNA consequence prediction (crispRNA) #
##############################################
Usage: ./crisprcon.py crispRNA [-h] [-o OUTPUT_VCF] [-f REF_GENOME] [-g GFF]
[-c GRNA GRNA] [-s DONOR] [-d OUT_DIR] [-t TEMP_DIR]
[-p PREFIX]
Positional arguments:
-o OUTPUT_VCF, --output OUTPUT_VCF
Output VCF file.
-f REF_GENOME, --fasta-ref REF_GENOME
Reference file in fasta format (must be .fai indexed)
-g GFF, --gff-annot GFF
gff3 annotation file
-c GRNA GRNA, --crispr-grna GRNA GRNA
gRNA sequence and coordinates used for CRISPR (format: sequence chr:start-end). Repeat the flag for each additional pair.
E.g.: -c GATTCTGGCATCATCTATGTGGG 1:72206978-72207000 -c GCTTCTGCCAATATCTATTTGGG 1:72206982-72207004
-s DONOR, --seq-donor DONOR
Sequence of the oligo donor
Optional arguments:
-d OUT_DIR, --dir OUT_DIR
Optional: Directory for the output files. If not present, it will be created. Default: current directory
-t TEMP_DIR, --temp TEMP_DIR
Optional: Directory for temporary and intermediate files. Default: creates "temp" directory at the output directory
-p PREFIX, --prefix PREFIX
Optional: Prefix for temporary files. Default: input_name
-h, --help Show this help message and exit
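A crispRNA call can be assembled the same way. This is a sketch under stated assumptions rather than CrisprCon code: check_grna and build_crisprna_cmd are hypothetical helpers, the file names are placeholders, and the script is invoked as ./crisprcon.py as in the general usage line. The length check assumes 1-based inclusive coordinates, which is consistent with the two gRNAs in the -c example (23 bases each); it is not a check CrisprCon is documented to perform. The donor sequence is passed only when one is given.

```python
import re

GRNA_COORD_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def check_grna(sequence, coords):
    """Check that a gRNA is plain DNA and as long as its coordinate span.

    Assumes 1-based inclusive coordinates (an assumption of this sketch).
    """
    match = GRNA_COORD_RE.match(coords)
    if match is None:
        raise ValueError(f"expected chr:start-end, got {coords!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if not re.fullmatch(r"[ACGTacgt]+", sequence):
        raise ValueError(f"gRNA sequence is not plain DNA: {sequence!r}")
    if end - start + 1 != len(sequence):
        raise ValueError("gRNA length does not match its coordinate span")


def build_crisprna_cmd(output_vcf, ref_genome, gff, grnas, donor=None):
    """Assemble the crispRNA argument list; each gRNA gets its own -c flag."""
    cmd = ["./crisprcon.py", "crispRNA", "-o", output_vcf,
           "-f", ref_genome, "-g", gff]
    for sequence, coords in grnas:
        check_grna(sequence, coords)
        cmd += ["-c", sequence, coords]
    if donor is not None:
        cmd += ["-s", donor]
    return cmd


# Placeholder file names; the gRNAs are the ones from the -c example above.
cmd = build_crisprna_cmd(
    "predicted.vcf", "GRCm38.fa", "annotation.gff3",
    grnas=[("GATTCTGGCATCATCTATGTGGG", "1:72206978-72207000"),
           ("GCTTCTGCCAATATCTATTTGGG", "1:72206982-72207004")])
print(" ".join(cmd))
```

Checking each gRNA before building the command catches swapped arguments (coordinates given before the sequence) and off-by-one coordinate errors early.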
#################
# 4. References #
#################
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N,
Marth G, Abecasis G, Durbin R; 1000 Genome Project Data
Processing Subgroup. The Sequence Alignment/Map format and
SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. Epub 2009 Jun
8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA,
Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R;
1000 Genomes Project Analysis Group. The variant call format and
VCFtools. Bioinformatics. 2011 Aug 1;27(15):2156-8. Epub 2011 Jun
7. PubMed PMID: 21653522; PubMed Central PMCID: PMC3137218.
Petr Danecek, Shane A McCarthy; BCFtools/csq: haplotype-aware variant
consequences, Bioinformatics, Volume 33, Issue 13, 1 July 2017,
Pages 2037–2039.
Li H, Durbin R. Fast and accurate short read alignment
with Burrows-Wheeler transform. Bioinformatics. 2009 Jul
15;25(14):1754-60. Epub 2009 May 18. PubMed PMID: 19451168; PubMed
Central PMCID: PMC2705234.
Aaron R. Quinlan, Ira M. Hall; BEDTools: a flexible suite of utilities
for comparing genomic features, Bioinformatics, Volume 26, Issue 6,
15 March 2010, Pages 841–842.