Update the docs (#236)
* Update docstrings

* Update the README

* Update CLI reference

* Update requirements.txt for the docs

* Update test dataset download instructions

* Add docs for hictk metadata

* Update CLI reference

* Update docs for C++ API

* Update docs and tutorials for hictk

* hictk load: document supported compression algorithms

* Update docs for To*Matrix transformers

* Update CITATION.cff and add workflow to lint CITATION.cff

* Rewrite generate_cli_reference script in python

* Fix incorrect display of std::uint8_t default values in the CLI help message

* Remove unnecessary extension from docs/conf.py

* Switch to using build.commands in .readthedocs.yaml in preparation for RTD Addons

* Check for broken links when building the docs

* Add script to automate updating doc links in index.rst

* Update links to the doc in the readme [no ci]
robomics authored Oct 11, 2024
1 parent dfd3db3 commit a905013
Showing 44 changed files with 1,143 additions and 725 deletions.
62 changes: 62 additions & 0 deletions .github/workflows/lint-cff.yml
@@ -0,0 +1,62 @@
# Copyright (C) 2024 Roberto Rossini <[email protected]>
# SPDX-License-Identifier: MIT

name: Lint CITATION.cff

on:
  push:
    branches: [main]
    paths:
      - ".github/workflows/lint-cff.yml"
      - "CITATION.cff"

  pull_request:
    paths:
      - ".github/workflows/lint-cff.yml"
      - "CITATION.cff"

# https://stackoverflow.com/a/72408109
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

defaults:
  run:
    shell: bash

jobs:
  lint-cff:
    runs-on: ubuntu-latest
    name: Lint CITATION.cff

    steps:
      - uses: actions/checkout@v4
        with:
          sparse-checkout: CITATION.cff
          sparse-checkout-cone-mode: false

      - name: Generate DESCRIPTION file
        run: |
          cat << EOF > DESCRIPTION
          Package: hictk
          Title: What the Package Does (One Line, Title Case)
          Version: 0.0.0.9000
          Authors@R:
              person("First", "Last", , "[email protected]", role = c("aut", "cre"))
          Description: What the package does (one paragraph).
          License: MIT
          Encoding: UTF-8
          Roxygen: list(markdown = TRUE)
          RoxygenNote: 7.3.2
          Imports:
              cffr
          EOF
      - name: Setup R
        uses: r-lib/actions/setup-r@v2

      - name: Add requirements
        uses: r-lib/actions/setup-r-dependencies@v2

      - name: Lint CITATION.cff
        run: Rscript -e 'cffr::cff_validate("CITATION.cff")'
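
The `concurrency.group` expression in the workflow above relies on `||` returning its first truthy operand: pull request runs are grouped by PR number, while push runs (where no PR number exists) fall back to the git ref. A minimal Python sketch of that resolution follows. The function name and the example values are illustrative only and are not part of the workflow.

```python
from typing import Optional


def concurrency_group(workflow: str, pr_number: Optional[int], ref: str) -> str:
    """Mimic `${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}`."""
    # `||` in GitHub Actions expressions yields the first truthy operand, so the
    # PR number is used when present and the git ref is used otherwise.
    key = pr_number if pr_number else ref
    return f"{workflow}-{key}"


if __name__ == "__main__":
    # Hypothetical inputs: a pull_request event and a push to main.
    print(concurrency_group("Lint CITATION.cff", 236, "refs/pull/236/merge"))
    print(concurrency_group("Lint CITATION.cff", None, "refs/heads/main"))
```

With `cancel-in-progress: true`, a new run that resolves to the same group key cancels the run already in progress for that key, so repeated pushes to one PR do not pile up lint jobs.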
21 changes: 11 additions & 10 deletions .readthedocs.yaml
@@ -5,18 +5,19 @@
version: 2

build:
os: ubuntu-22.04
apt_packages:
- librsvg2-bin
os: ubuntu-24.04
tools:
python: "3.11"
python: "3.12"

sphinx:
configuration: docs/conf.py

python:
install:
- requirements: docs/requirements.txt
commands:
- pip install -r docs/requirements.txt
- docs/update_index_links.py --root-dir "$PWD" --inplace
- make -C docs linkcheck
- make -C docs html
- make -C docs latexpdf
- mkdir -p "$READTHEDOCS_OUTPUT/pdf"
- cp -r docs/_build/html "$READTHEDOCS_OUTPUT/"
- cp docs/_build/latex/hictk.pdf "$READTHEDOCS_OUTPUT/pdf/"

formats:
- pdf
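
To try the `build.commands` sequence outside Read the Docs, the commands can be replayed from the repository root. The sketch below is one hedged way to do that in Python: the helper names, the `_readthedocs` output directory, and the dry-run default are assumptions made for illustration, while the commands themselves are copied from the diff above with `$PWD` and `$READTHEDOCS_OUTPUT` replaced by function arguments.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def rtd_build_commands(repo_root: Path, output_dir: Path) -> List[List[str]]:
    """Return the build.commands steps as argv lists (hypothetical helper)."""
    pdf_dir = output_dir / "pdf"
    return [
        ["pip", "install", "-r", "docs/requirements.txt"],
        ["docs/update_index_links.py", "--root-dir", str(repo_root), "--inplace"],
        ["make", "-C", "docs", "linkcheck"],
        ["make", "-C", "docs", "html"],
        ["make", "-C", "docs", "latexpdf"],
        ["mkdir", "-p", str(pdf_dir)],
        ["cp", "-r", "docs/_build/html", f"{output_dir}/"],
        ["cp", "docs/_build/latex/hictk.pdf", f"{pdf_dir}/"],
    ]


def run_build(repo_root: Path, output_dir: Path, dry_run: bool = True) -> None:
    """Print each step; execute it from repo_root only when dry_run is False."""
    for cmd in rtd_build_commands(repo_root, output_dir):
        print("+", shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step, e.g. a broken link.
            subprocess.run(cmd, cwd=repo_root, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands that would be executed.
    run_build(Path.cwd(), Path("_readthedocs"), dry_run=True)
```

Because `linkcheck` runs before the HTML and PDF builds, a broken link would fail the build early when the steps are executed with `check=True`. The `--inplace` flag presumably rewrites files in the working tree, so a real (non-dry) local run is best done on a clean checkout.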
35 changes: 29 additions & 6 deletions CITATION.cff
@@ -15,8 +15,19 @@ abstract: 'Blazing fast toolkit to work with .hic and .cool files.'
doi: '10.5281/zenodo.8214220'
url: 'https://github.com/paulsengroup/hictk'
repository-code: 'https://github.com/paulsengroup/hictk'
repository-artifact: 'https://github.com/paulsengroup/hictk/pkgs/container/hictk'
type: software
license: MIT
keywords:
- bioinformatics
- cxx
- conversion
- cooler
- cli-application
- hic
- cxx17
- cxx-library
- hictk
preferred-citation:
type: article
authors:
@@ -30,10 +41,22 @@ preferred-citation:
orcid: 'https://orcid.org/0000-0002-7918-5495'
email: [email protected]
affiliation: 'Department of Biosciences, University of Oslo'
doi: '10.1101/2023.11.26.568707'
url: 'https://doi.org/10.1101/2023.11.26.568707'
journal: 'Cold Spring Harbor Laboratory'
year: 2023
month: 11
doi: '10.1093/bioinformatics/btae408'
url: 'https://academic.oup.com/bioinformatics/article/40/7/btae408/7698028'
journal: 'Bioinformatics'
year: 2024
month: 06
title: 'hictk: blazing fast toolkit to work with .hic and .cool files'
abstract: 'We developed hictk, a toolkit that can transparently operate on .hic and .cool files with excellent performance. The toolkit is written in C++ and consists of a C++ library with Python bindings as well as CLI tools to perform common operations directly from the shell, including converting between .hic and .mcool formats. We benchmark the performance of hictk and compare it with other popular tools and libraries. We conclude that hictk significantly outperforms existing tools while providing the flexibility of natively working with both file formats without code duplication.'
abstract: >
Hi-C is gaining prominence as a method for mapping genome organization.
With declining sequencing costs and a growing demand for higher-resolution data, efficient tools for processing Hi-C datasets at different resolutions are crucial.
Over the past decade, the .hic and Cooler file formats have become the de-facto standard to store interaction matrices produced by Hi-C experiments in binary format.
Interoperability issues make it unnecessarily difficult to convert between the two formats and to develop applications that can process each format natively.
We developed hictk, a toolkit that can transparently operate on .hic and .cool files with excellent performance.
The toolkit is written in C++ and consists of a C++ library with Python and R bindings as well as CLI tools to perform common operations directly from the shell, including converting between .hic and .mcool formats. We benchmark the performance of hictk and compare it with other popular tools and libraries.
We conclude that hictk significantly outperforms existing tools while providing the flexibility of natively working with both file formats without code duplication.
The hictk library, Python bindings and CLI tools are released under the MIT license as a multi-platform application available at github.com/paulsengroup/hictk.
Pre-built binaries for Linux and macOS are available on bioconda.
Python bindings for hictk are available on GitHub at github.com/paulsengroup/hictkpy, while R bindings are available on GitHub at github.com/paulsengroup/hictkR.
37 changes: 19 additions & 18 deletions README.md
@@ -23,41 +23,42 @@ hictk is a blazing fast toolkit to work with .hic and .cool files.

This repository hosts `hictk`: a set of CLI tools to work with Cooler, as well as `libhictk`: the C++ library underlying `hictk`.

Python bindings for `libhictk` are available at [paulsengroup/hictkpy](https://github.com/paulsengroup/hictkpy).
Python bindings for `libhictk` are available at [paulsengroup/hictkpy](https://github.com/paulsengroup/hictkpy), while R bindings are published at [paulsengroup/hictkR](https://github.com/paulsengroup/hictkR).

hictk is capable of reading files in `.cool`, `.mcool`, `.scool` and `.hic` format (including hic v9) as well as writing `.hic`, `.cool` and `.mcool` files.

## Installing hictk

hictk is developed on Linux and tested on Linux, MacOS and Windows.

hictk can be installed using containers, bioconda or directly from source. Refer to [Installation](https://hictk.readthedocs.io/en/latest/installation.html) for more information.
hictk can be installed using containers, bioconda or directly from source. Refer to [Installation](https://hictk.readthedocs.io/en/stable/installation.html) for more information.

## Running hictk

hictk provides the following subcommands:

| subcommand | description |
| ---------------------- | ---------------------------------------------------------------------------------- |
| **balance** | Balance HiC matrices using ICE, SCALE or VC. |
| **convert** | Convert matrices to a different format. |
| **dump** | Dump data from .hic and Cooler files to stdout. |
| **fix-mcool** | Fix corrupted .mcool files. |
| **load** | Build .cool and .hic files from interactions in various text formats. |
| **merge** | Merge multiple Cooler or .hic files into a single file. |
| **rename-chromosomes** | Rename chromosomes found in a Cooler file. |
| **validate** | Validate .hic and Cooler files. |
| **zoomify** | Convert single-resolution Cooler and .hic files to multi-resolution by coarsening. |

Refer to [Quickstart (CLI)](https://hictk.readthedocs.io/en/latest/quickstart_cli.html) and [CLI Reference](https://hictk.readthedocs.io/en/latest/cli_reference.html) for more details.
| subcommand | description |
| ---------------------- | ---------------------------------------------------------------------------------------------- |
| **balance** | Balance Hi-C files using ICE, SCALE, or VC. |
| **convert** | Convert Hi-C files between different formats. |
| **dump** | Read interactions and other kinds of data from .hic and Cooler files and write them to stdout. |
| **fix-mcool** | Fix corrupted .mcool files. |
| **load** | Build .cool and .hic files from interactions in various text formats. |
| **merge** | Merge multiple Cooler or .hic files into a single file. |
| **metadata** | Print file metadata to stdout. |
| **rename-chromosomes** | Rename chromosomes found in a Cooler file. |
| **validate** | Validate .hic and Cooler files. |
| **zoomify** | Convert single-resolution Cooler and .hic files to multi-resolution by coarsening. |

Refer to [Quickstart (CLI)](https://hictk.readthedocs.io/en/stable/quickstart_cli.html) and [CLI Reference](https://hictk.readthedocs.io/en/stable/cli_reference.html) for more details.

## Using libhictk

libhictk can be installed in various ways, including with Conan and CMake FetchContent. Section [Quickstart (API)](https://hictk.readthedocs.io/en/latest/quickstart_api.html) of hictk documentation contains further details on how this can be accomplished.
libhictk can be installed in various ways, including with Conan and CMake FetchContent. Section [Quickstart (API)](https://hictk.readthedocs.io/en/stable/quickstart_api.html) of hictk documentation contains further details on how this can be accomplished.

[Quickstart (API)](https://hictk.readthedocs.io/en/latest/quickstart_api.html) also showcases the basic functionality offered by libhictk. For more complex examples refer to the sample programs under the [examples/](./examples/) folder as well as to the [source code](./src/hictk/) of hictk.
[Quickstart (API)](https://hictk.readthedocs.io/en/stable/quickstart_api.html) also showcases the basic functionality offered by libhictk. For more complex examples refer to the sample programs under the [examples/](./examples/) folder as well as to the [source code](./src/hictk/) of hictk.

The public C++ API of hictk is documented in the [C++ API Reference](https://hictk.readthedocs.io/en/latest/cpp_api/index.html) section of hictk documentation.
The public C++ API of hictk is documented in the [C++ API Reference](https://hictk.readthedocs.io/en/stable/cpp_api/index.html) section of hictk documentation.

## Citing

Binary file added docs/assets/4dnucleome_bug_notice.pdf
Binary file not shown.
28 changes: 15 additions & 13 deletions docs/balancing_matrices.rst
@@ -27,20 +27,22 @@ The following is an example showing how to balance a .cool file using ICE.
user@dev:/tmp$ hictk balance ice 4DNFIZ1ZVXC8.mcool::/resolutions/1000
[2023-10-01 13:18:02.119] [info]: Running hictk v0.0.2-f83f93e
[2023-10-01 13:18:02.130] [info]: Writing interactions to temporary file /tmp/4DNFIZ1ZVXC8.tmp0...
[2023-10-01 13:18:05.098] [info]: Initializing bias vector...
[2023-10-01 13:18:05.099] [info]: Masking rows with fewer than 10 nnz entries...
[2023-10-01 13:18:06.298] [info]: Masking rows using mad_max=5...
[2023-10-01 13:18:06.971] [info]: Iteration 1: 36874560.192587376
[2023-10-01 13:18:07.634] [info]: Iteration 2: 21347543.04950776
[2023-10-01 13:18:08.307] [info]: Iteration 3: 7819314.542541969
[2024-09-26 16:02:19.731] [info]: Running hictk v1.0.0-fbdcb591
[2024-09-26 16:02:19.731] [info]: balancing using ICE (GW_ICE)
[2024-09-26 16:02:19.734] [info]: Writing interactions to temporary file /tmp/hictk-tmp-XXXX1ZC9FF/4DNFIZ1ZVXC8.mcool.tmp...
[2024-09-26 16:02:22.480] [info]: Initializing bias vector...
[2024-09-26 16:02:22.482] [info]: Masking rows with fewer than 10 nnz entries...
[2024-09-26 16:02:23.392] [info]: Masking rows using mad_max=5...
[2024-09-26 16:02:23.860] [info]: Iteration 1: 36452362.243888594
[2024-09-26 16:02:24.327] [info]: Iteration 2: 21649057.88060747
[2024-09-26 16:02:24.792] [info]: Iteration 3: 7890065.688497526
...
[2023-10-01 13:19:20.365] [info]: Iteration 105: 2.1397932757529552e-05
[2023-10-01 13:19:21.146] [info]: Iteration 106: 1.6604770462001875e-05
[2023-10-01 13:19:21.870] [info]: Iteration 107: 1.2885285040054778e-05
[2023-10-01 13:19:22.608] [info]: Iteration 108: 9.99900768769869e-06
[2023-10-01 13:19:22.619] [info]: Writing weights to 4DNFIZ1ZVXC8.mcool::/resolutions/1000/bins/weight...
[2024-09-26 16:03:12.285] [info]: Iteration 107: 2.0533518142916073e-05
[2024-09-26 16:03:12.752] [info]: Iteration 108: 1.601698258037195e-05
[2024-09-26 16:03:13.216] [info]: Iteration 109: 1.2493901433163442e-05
[2024-09-26 16:03:13.681] [info]: Iteration 110: 9.745791018854495e-06
[2024-09-26 16:03:13.707] [info]: Writing weights to 4DNFIZ1ZVXC8.mcool::/resolutions/1000/bins/GW_ICE...
[2024-09-26 16:03:13.708] [info]: Linking weights to 4DNFIZ1ZVXC8.mcool::/resolutions/1000/bins/weight...
When balancing files in .mcool or .hic formats, all resolutions are balanced.
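
The `Iteration N: value` lines in the log above can be extracted to inspect how quickly balancing converges. The following is a minimal sketch that assumes the log format shown in this example. The regular expression and function name are illustrative and are not part of hictk. In both logs the run stops once the reported value drops below 1e-5, which is consistent with a tolerance of that size, although the source does not state it.

```python
import re
from typing import List, Tuple

# Matches lines such as "[...] [info]: Iteration 110: 9.745791018854495e-06".
ITERATION_RE = re.compile(r"\[info\]: Iteration (\d+): ([0-9.eE+-]+)")


def parse_iterations(log_text: str) -> List[Tuple[int, float]]:
    """Return (iteration, reported value) pairs found in a hictk balance log."""
    return [
        (int(m.group(1)), float(m.group(2)))
        for m in ITERATION_RE.finditer(log_text)
    ]


# Lines copied from the updated log output above.
SAMPLE_LOG = """\
[2024-09-26 16:02:23.860] [info]: Iteration 1: 36452362.243888594
[2024-09-26 16:02:24.327] [info]: Iteration 2: 21649057.88060747
[2024-09-26 16:03:13.216] [info]: Iteration 109: 1.2493901433163442e-05
[2024-09-26 16:03:13.681] [info]: Iteration 110: 9.745791018854495e-06
"""

if __name__ == "__main__":
    iterations = parse_iterations(SAMPLE_LOG)
    last_iter, last_value = iterations[-1]
    print(f"{len(iterations)} iterations parsed; last: {last_iter} -> {last_value:.3e}")
```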
