SPAN Peak Analyzer

+----------------------------------+
|SPAN Semi-supervised Peak Analyzer|
+----------------------------|/----+
           ,        ,
      __.-'|'-.__.-'|'-.__
    ='=====|========|====='=
    ~_^~-^~~_~^-^~-~~^_~^~^~^

SPAN Peak Analyzer is a multipurpose peak caller capable of processing a broad range of ChIP-seq, ATAC-seq, and single-cell ATAC-seq datasets.
In semi-supervised mode, it robustly handles multiple replicates and noise by leveraging a limited amount of manual annotation.

Open Access Paper: https://doi.org/10.1093/bioinformatics/btab376

Citation: Shpynov O, Dievskii A, Chernyatchik R, Tsurinov P, Artyomov MN. Semi-supervised peak calling with SPAN and JBR Genome Browser. Bioinformatics. 2021 May 21.

Latest release

See the releases section for up-to-date information.

Requirements

Download and install Java 8.

Peak calling

To analyze a single (possibly replicated) biological condition, use the analyze command. For details, run:

$ java -jar span.jar analyze --help

The <output.bed> file will contain predicted and FDR-controlled peaks in the ENCODE broadPeak (BED 6+3) format:

<chromosome> <peak start offset> <peak end offset> <peak_name> <score> . <coverage or fold change> <-log10 p-value> <-log10 Q-value>
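
For downstream scripting, the nine columns above can be read into typed records. The following is a minimal sketch, not part of SPAN: the `Peak` class and the `parse_peaks` and `significant` helpers are hypothetical names, and the code assumes whitespace-separated columns with p- and Q-values stored on the -log10 scale, as in the ENCODE broadPeak specification.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Peak:
    """One line of a broadPeak (BED 6+3) file; names are illustrative."""
    chromosome: str
    start: int
    end: int
    name: str
    score: int
    signal: float        # coverage or fold change
    neg_log10_p: float   # assumed -log10 scale (ENCODE broadPeak)
    neg_log10_q: float   # assumed -log10 scale (ENCODE broadPeak)

    @property
    def length(self) -> int:
        return self.end - self.start


def parse_peaks(lines: Iterable[str]) -> Iterator[Peak]:
    """Yield a Peak for every data line, skipping blanks and header lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 9:
            raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
        chrom, start, end, name, score, _strand, signal, p, q = fields[:9]
        yield Peak(chrom, int(start), int(end), name, int(float(score)),
                   float(signal), float(p), float(q))


def significant(peaks: Iterable[Peak], q_cutoff: float = 0.05) -> List[Peak]:
    """Keep peaks whose Q-value is at most q_cutoff."""
    threshold = -math.log10(q_cutoff)
    return [peak for peak in peaks if peak.neg_log10_q >= threshold]
```

Passing an open file handle to `parse_peaks` and the result to `significant` gives the subset of peaks below a stricter cutoff than the one used during peak calling.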

Examples:

Regular peak calling
java -Xmx8G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -p Results.peak
Semi-supervised peak calling
java -Xmx8G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes --labels Labels.bed -p Results.peak
Model fitting only
java -Xmx8G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -m Model.span
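
When SPAN is launched from a script, the same command lines can be assembled programmatically. The helper below is a hypothetical sketch, not part of SPAN: the function name and its parameters are assumptions. It uses only flags that appear in this README, takes `--labels` from the options table, and joins replicate files with commas.

```python
from typing import List, Optional, Sequence


def span_analyze_command(
    treatment: Sequence[str],
    chrom_sizes: str,
    control: Sequence[str] = (),
    peaks: Optional[str] = None,
    labels: Optional[str] = None,
    model: Optional[str] = None,
    jar: str = "span.jar",
    heap: str = "8G",
) -> List[str]:
    """Build an argument list for `span.jar analyze` (illustrative helper)."""
    if not treatment:
        raise ValueError("at least one treatment file is required")
    # One shared control, or one control per treatment file.
    if control and len(control) not in (1, len(treatment)):
        raise ValueError("pass one control or one control per treatment file")

    cmd = ["java", f"-Xmx{heap}", "-jar", jar, "analyze",
           "-t", ",".join(treatment)]  # replicates are comma-separated
    if control:
        cmd += ["-c", ",".join(control)]
    cmd += ["--cs", chrom_sizes]
    if peaks:
        cmd += ["-p", peaks]       # omit to stop after model fitting
    if labels:
        cmd += ["--labels", labels]  # semi-supervised mode
    if model:
        cmd += ["-m", model]
    return cmd
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names that contain spaces.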

Differential peak calling

To compare two (possibly replicated) biological conditions, use the compare command. See help for details:

$ java -jar span.jar compare --help

Command line options

Parameter	Description
`-t, --treatment TREATMENT` required	Treatment file. Supported formats: BAM, BED, or BED.gz. Multiple files should be separated by commas: `-t A,B,C`. They are processed as replicates on the model level.
`-c, --control CONTROL`	Control file. Multiple files should be separated by commas. Either a single control file or a separate control file for each treatment file is expected. The same rules apply as for `-t, --treatment`.
`-cs, --chrom.sizes CHROMOSOMES_SIZES` required	Chromosome sizes file for the genome build used in TREATMENT and CONTROL files. Can be downloaded from UCSC.
`-b, --bin BIN_SIZE`	Peak analysis is performed on read coverage tiled into consecutive bins of configurable size.
`-f, --fdr FDR`	False Discovery Rate cutoff to call significant regions.
`-g, --gap GAP`	Gap size to merge spatially close peaks. Useful for wide histone modifications.
`-p, --peaks PEAKS`	Resulting peaks file in ENCODE broadPeak (BED 6+3) format. If omitted, only the model fitting step is performed.
`--labels LABELS`	Labels BED file. Used in semi-supervised peak calling.
`-m, --model MODEL`	Path to the SPAN model file. Required for further semi-supervised peak calling.
`-w, --workdir PATH`	Path to the working directory. Used to save coverage and model cache.
`--bg-sensitivity SENSITIVITY`	Configures background sensitivity for peaks. Recommended value for generic ChIP-seq: `0.2`. Recommended value for TFs and ATAC-seq: `0.8`.
`--clip CLIP`	Clip peaks to improve peak density using local signal coverage. Recommended value for generic ChIP-seq: `0.4`. Recommended value for TFs and ATAC-seq: `0.8`.
`--ext`	Save extended states information to model file. Required for model visualization in JBR Genome Browser.
`--fragment FRAGMENT`	Fragment size. If provided, reads are shifted appropriately. If not provided, the shift is estimated from the data. `--fragment 0` is recommended for ATAC-seq data processing.
`-kd, --keep-duplicates`	Keep duplicates. By default, SPAN filters out redundant reads aligned at the same genomic position. Recommended for bulk single-cell ATAC-seq data processing.
`-i, --iterations`	Maximum number of iterations for the Expectation-Maximization (EM) algorithm.
`--tr, --threshold`	Convergence threshold for the EM algorithm. Use the `--debug` option to see detailed information.
`--threads THREADS`	Configure the parallelism level.
`-l, --log LOG`	Path to the log file. If not provided, it is created in the working directory.
`-d, --debug`	Print debug information, useful for troubleshooting.
`-q, --quiet`	Turn off standard output.
`-kc, --keep-cache`	Keep cache files. By default, SPAN creates cache files in the working directory and removes them after the computation is done.
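
To make the `--bin` and `--gap` options more concrete, the sketch below tiles read start positions into fixed-size bins and then merges enriched bins that lie close together. It illustrates the general idea only and is not SPAN's implementation: the function names are hypothetical, the set of enriched bins is taken as given, and the gap is counted in bins, which is an assumption of this example.

```python
from typing import Iterable, List, Tuple


def bin_counts(read_starts: Iterable[int], chrom_length: int,
               bin_size: int) -> List[int]:
    """Count reads per consecutive bin of `bin_size` bp along a chromosome."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return counts


def merge_enriched_bins(enriched: Iterable[int], bin_size: int,
                        gap: int) -> List[Tuple[int, int]]:
    """Merge enriched bin indices into (start, end) peaks in bp.

    Two enriched bins end up in the same peak when at most `gap`
    non-enriched bins separate them (gap in bins: an assumption here).
    """
    peaks: List[Tuple[int, int]] = []
    previous = None
    for index in sorted(set(enriched)):
        if previous is not None and index - previous - 1 <= gap:
            # Close enough: extend the current peak to cover this bin.
            peaks[-1] = (peaks[-1][0], (index + 1) * bin_size)
        else:
            peaks.append((index * bin_size, (index + 1) * bin_size))
        previous = index
    return peaks
```

With a larger gap, fragmented signal is bridged into fewer and wider peaks, which matches the note that the option is useful for wide histone modifications.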

Example

A step-by-step example with a test dataset is available here.

Pipeline

SPAN can be used as part of a Snakemake pipeline.
An example of a ChIP-seq analysis pipeline, from raw reads to visualization and peak calling, can be found here.

Build from sources

Clone the bioinf-commons library under the project root.

git clone git@github.com:JetBrains-Research/bioinf-commons.git

Run the following command to build the SPAN jar:

./gradlew shadowJar

The SPAN jar file will be generated in the folder build/libs.

FAQ

Q: What is the average running time?
A: SPAN is capable of processing a single ChIP-Seq track in less than 20 minutes on an average laptop.
Q: Which operating systems are supported?
A: SPAN is written in Kotlin and runs on any platform supported by Java.
Q: Where did you get this lovely span picture?
A: From ascii.co.uk, the original author goes by the name jgs.

Error reporting

Use GitHub issues to suggest new features or report bugs.

Authors

JetBrains Research BioLabs
