update README

pangenome · Jan 3, 2025 · ff15b0c
1 parent 4cc6a98
commit ff15b0c
Showing 1 changed file with 58 additions and 15 deletions.
diff --git a/README.md b/README.md
@@ -6,19 +6,6 @@ Pangenome graphs and whole genome multiple alignments are powerful tools, but th
 Often, we would like to be able to break a small piece out of a pangenome without constructing the whole thing.
 `impg` lets us do this by projecting sequence ranges through many-way (e.g. all-vs-all) pairwise alignments built by tools like `wfmash` and `minimap2`.
 
-## What does `impg` do?
-
-At its core, `impg` lifts over ranges from a target sequence (used as reference) into the queries (the other sequences aligned to the sequence used as reference) described in alignments.
-In effect, it lets us pick up homologous loci from all genomes mapped onto our specific target region.
-This is particularly useful when you're interested in comparing a specific genomic region across different individuals, strains, or species in a pangenomic or comparative genomic setting.
-The output is provided in BED, BEDPE and PAF formats, making it straightforward to use to extract FASTA sequences for downstream use in multiple sequence alignment (like `mafft`) or pangenome graph building (e.g., `pggb` or `minigraph-cactus`).
-
-## How does it work?
-
-`impg` uses coitrees (implicit interval trees) to provide efficient range lookup over the input alignments.
-CIGAR strings are converted to a compact delta encoding.
-This approach allows for fast and memory-efficient projection of sequence ranges through alignments.
-
 ## Using `impg`
 
 Getting started with `impg` is straightforward. Here's a basic example of how to use the command-line utility:
@@ -45,7 +32,7 @@ In this example, `-p` specifies the path to the PAF file, `-r` defines the targe
 That is, for each collected range, we then find what sequence ranges are aligned onto it.
 This is done progressively until we've closed the set of alignments connected to the initial target range.
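
The transitive search described in the context lines above amounts to a worklist closure over alignment records. Below is a minimal Python sketch of that idea, not `impg`'s actual implementation. It assumes alignments are simplified to hypothetical `(t_name, t_start, t_end, q_name, q_start, q_end)` tuples and that each hit contributes its whole query interval, with no CIGAR-level trimming:

```python
from collections import deque

def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end

def transitive_closure(alignments, seed):
    """Collect every range reachable from `seed` by repeatedly following alignments.

    alignments: list of (t_name, t_start, t_end, q_name, q_start, q_end) tuples
                (a hypothetical, simplified stand-in for PAF records)
    seed:       (name, start, end)
    """
    seen = {seed}
    queue = deque([seed])
    while queue:
        name, start, end = queue.popleft()
        for t_name, t_start, t_end, q_name, q_start, q_end in alignments:
            if t_name == name and overlaps(start, end, t_start, t_end):
                hit = (q_name, q_start, q_end)
                if hit not in seen:  # only enqueue ranges we have not visited
                    seen.add(hit)
                    queue.append(hit)
    return seen

# Made-up example data: chr1 -> sampleA -> sampleB is connected, chr2 is not.
alignments = [
    ("chr1", 0, 5000, "sampleA#chr1", 100, 5100),
    ("sampleA#chr1", 0, 6000, "sampleB#chr1", 200, 6200),
    ("chr2", 0, 1000, "sampleC#chr2", 0, 1000),
]
closure = transitive_closure(alignments, ("chr1", 1000, 2000))
```

The loop stops once no new range is discovered, which is what "closing the set" means here. A real implementation would replace the linear scan with an interval-tree lookup.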
 
-### Installation
+## Installation
 
 To compile and install `impg` from source, you'll need a recent rust build toolchain and cargo.
 
@@ -57,10 +44,66 @@ To compile and install `impg` from source, you'll need a recent rust build toolc
    ```bash
    cd impg
    ```
-3. Compile the tool (requires rust build tools):
+3. Compile the tool:
    ```bash
    cargo install --force --path .
    ```
+## Commands
+
+`impg` provides three main commands:
+
+### Query
+Query overlaps in the alignment:
+```bash
+# Query a single region
+impg query -p alignments.paf -r chr1:1000-2000 
+
+# Query multiple regions from a BED file
+impg query -p alignments.paf -b regions.bed
+
+# Enable transitive overlap search
+impg query -p alignments.paf -r chr1:1000-2000 -x
+
+# Output in PAF format
+impg query -p alignments.paf -r chr1:1000-2000 -P
+```
+
+### Partition
+Partition the alignment into smaller pieces:
+```bash
+impg partition -p alignments.paf -w 1000000 -s chr1 -d 10000 -l 5000
+```
+- `-w`: Window size for partitioning
+- `-s`: Prefix of sequence names to start partitioning from
+- `-d`: Maximum distance to merge intervals in each partition
+- `-l`: Minimum length for intervals in each partition (this can lead to overlapping partitions)
+
+### Stats
+Print alignment statistics:
+```bash
+impg stats -p alignments.paf
+```
+
+### Common Options
+
+All commands support these options:
+- `-p, --paf-file`: Path to PAF file (gzipped or uncompressed)
+- `-t, --num-threads`: Number of threads (default: 1)
+- `-I, --force-reindex`: Force regeneration of index
+- `-v, --verbose`: Verbosity level (0=error, 1=info, 2=debug)
+
+## What does `impg` do?
+
+At its core, `impg` lifts over ranges from a target sequence (used as reference) into the queries (the other sequences aligned to the sequence used as reference) described in alignments.
+In effect, it lets us pick up homologous loci from all genomes mapped onto our specific target region.
+This is particularly useful when you're interested in comparing a specific genomic region across different individuals, strains, or species in a pangenomic or comparative genomic setting.
+The output is provided in BED, BEDPE and PAF formats, making it straightforward to use to extract FASTA sequences for downstream use in multiple sequence alignment (like `mafft`) or pangenome graph building (e.g., `pggb` or `minigraph-cactus`).
+
+## How does it work?
+
+`impg` uses coitrees (implicit interval trees) to provide efficient range lookup over the input alignments.
+CIGAR strings are converted to a compact delta encoding.
+This approach allows for fast and memory-efficient projection of sequence ranges through alignments.
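
To make the projection step in the moved "How does it work?" text concrete, here is a hedged Python sketch of lifting a target range into query coordinates by walking a CIGAR string. It is an illustration only: it walks the plain CIGAR rather than the compact delta encoding the README mentions, handles the forward strand only, and the function name `project_range` and its parameters are hypothetical.

```python
import re

# Matches the CIGAR operations handled in this sketch: match/mismatch, insertion, deletion.
CIGAR_OP = re.compile(r"(\d+)([MIDX=])")

def project_range(cigar, target_start, query_start, range_start, range_end):
    """Map the target interval [range_start, range_end) to query coordinates.

    Forward strand only. Returns (q_start, q_end), or None if no aligned
    (M/=/X) base of the alignment falls inside the requested range.
    """
    t_pos, q_pos = target_start, query_start
    q_lo = q_hi = None
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":  # consumes both target and query
            lo = max(t_pos, range_start)
            hi = min(t_pos + length, range_end)
            if lo < hi:  # this block intersects the requested range
                if q_lo is None:
                    q_lo = q_pos + (lo - t_pos)
                q_hi = q_pos + (hi - t_pos)
            t_pos += length
            q_pos += length
        elif op == "D":  # consumes target only
            t_pos += length
        elif op == "I":  # consumes query only
            q_pos += length
    return None if q_lo is None else (q_lo, q_hi)

# Alignment starting at target 100 / query 50; project target 105-120.
projected = project_range("10=2D5=3I10=", 100, 50, 105, 120)
```

The projected interval spans the 3-base insertion, so it is one base longer than the 15-base target range even though 2 target bases are deleted in the query.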
 
 ## Authors