Research group


SPAN Peak Analyzer

|SPAN Semi-supervised Peak Analyzer|
           ,        ,

SPAN is a semi-supervised multipurpose peak caller capable of processing a broad range of ChIP-seq, ATAC-seq, and single-cell ATAC-seq datasets that robustly handles multiple replicates and noise by leveraging limited manual annotation information.



  • Part of an integrated peak calling solution
  • Works with both conventional and ultra-low-input ChIP-seq data
  • Works with both narrow and wide modifications
  • Works with both single-end and paired-end libraries
  • Fragment size prediction for single-end libraries
  • Capable of processing tracks with different signal-to-noise ratios
  • Supports optional control track
  • Supports replicates on the model level
  • False Discovery Rate correction
  • Experimental: differential peak calling
  • New Multistart during model fitting


Download Description
span-0.13.5244.jar Multi-platform JAR package


  1. 4 GB RAM minimum.
  2. Download and install Java 8.
  3. Download the <build>.chrom.sizes chromosome sizes of the organism you want to analyze from the UCSC website
    Here is the file used in our study.

Usage of SPAN

java -Xmx4G -jar span-0.13.5244.jar [-h] [--version] analyze

Use java -Xmx memory settings to configure memory usage. 4 gigabytes are used in the examples.

Example of regular peak calling: java -Xmx4G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -p Results.peak

Example of supervised peak calling: java -Xmx4G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -l Labels.bed -p Results.peak

Example of model fitting: java -Xmx4G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes

Peak calling

To analyze a single (possibly replicated) biological condition use the analyze command.

-b, --bin BIN_SIZE

Peak analysis is performed on read coverage tiled into consequent bins of configurable size. Default value is 200bp, approximately the length of one nucleosome.

-t, --treatment TREATMENT

Required. ChIP-seq treatment file. Supported formats: BAM, BED, BED.gz or bigWig file. If multiple files are provided, they are treated as replicates. Multiple files should be separated by commas: -t A,B,C . Multiple files are processed as replicates on the model level.

-c, --control CONTROL

Control file. Multiple files should be separated by commas. A single control file or a separate file per each treatment file is required. 
Follow the instructions for -t, --treatment TREATMENT.

-cs, --chrom.sizes CHROMOSOMES_SIZES

Required. Chromosome sizes file for the genome build used in TREATMENT and CONTROL files. 
Can be downloaded at

--fragment FRAGMENT

Fragment size. If provided, reads are shifted appropriately. If not provided, the shift is estimated from the data. 
--fragment 0 argument is necessary for ATAC-Seq data processing.

-k, --keep-dup

Keep duplicates. By default SPAN filters out redundant reads aligned at the same genomic position.
--keep-dup argument is necessary for single cell ATAC-Seq data processing.

-m, --model MODEL

This option is used to specify SPAN model path, if not provided, model name is composed of input names and other arguments.

-p, --peaks PEAKS

Resulting peaks file in ENCODE broadPeak* (BED 6+3) format. If omitted, only the model fitting step is performed.

-f, --fdr FDR

Minimum FDR cutoff to call significant regions, default value is 1.0E-6. 
SPAN reports p- and q- values for the null hypothesis that a given bin is not enriched with a histone modification. Peaks are formed from a list of truly (in the FDR sense) enriched bins for the analyzed biological condition by thresholding the Q-value with a cutoff FDR and merging spatially close peaks using GAPoption to broad ones. This is equivalent to controlling FDR. 
q-values are are calculated from p-values using the Benjamini-Hochberg procedure.

-g, --gap GAP

Gap size to merge spatially close peaks. Useful for wide histone modifications. Default value is 5, i.e. peaks separated by 5* BIN distance or less are merged.

--labels LABELS

Labels BED file. Used in semi-supervised peak calling.

-d, --debug

Print all the debug information, used for troubleshooting.

-q, --quiet

Turn off output.

-w, --workdir PATH

Path to the working directory (stores coverage and model caches).

--threads THREADS

Configures the parallelism level.
SPAN utilizes both multithreading and specialized processor extensions like SSE2, AVX, etc. Parallel computations were performed using an open-source library viktor for parallel matrix computations in Kotlin programming language.

--ms, --multistarts

Number of multistart runs using different model initializations. Use 0 to disable (default: 5)

--ms-iterations, --msi

Number of iterations for each multistart run (default: 5)

--threshold, --tr

Convergence threshold for EM algorithm, use --debug option to see detailed info (default: 0.1)

Supervised peak calling

When LABELS parameter is given, it is used to optimize peak caller parameters for markup.

Model fitting

SPAN workflow consists of several steps:

  1. Convert raw reads to tags with user-supplied FRAGMENT parameter or maximum cross-correlation estimate.
  2. Compute coverage for the whole genome tiled into bins of BIN base pairs.
  3. Fit 3-state hidden Markov model that classifies bins as ZERO states with no coverage, LOW states of non-specific binding, and HIGH states of specific binding.
  4. Compute posterior HIGH state probability for each bin.
  5. Trained model is saved into .span binary format.
  6. Peaks are computed using trained model and FDR and GAP parameters.
  7. If LABELS are provided, optimal parameters are computed to conform with them.

Model fitting mode produces a trained model file in binary format as output, which can be:

  1. visualized directly in JBR Genome Browser
  2. used in an integrated peak calling pipeline

Output files

  • If OUTPUT file is given, it will contain predicted and FDR-controlled peaks in the ENCODE broadPeak format, i.e. BED 6+3: 
    <chromosome> <peak start> <peak end> <peak name> <score> . <coverage / foldchange> <-log p-value> <-log Q-value>
    Same format is used by MACS2 peak caller.
    • chromosome name
    • start position of the peak
    • end position of the peak
    • peak name
    • score of the peak, computed as log10(qvalue) * log(peak length). Useful for peak ranking with wide histone modifications.
    • . (represents the strand)
    • total peak coverage averaged over replicates or fold-change of the coverage for differential analysis.
    • -log10(pvalue) of the null-hypothesis that a given peak is in ZERO or LOW state.
    • -log10(qvalue), calculated from p-values using the Benjamini-Hochberg procedure. Median value for a merged peak.
  • In the case of SPAN model fitting, it produces the model file in binary format. 
    NOTE: after the model is trained once, it will be reused automatically in other modes.

Study Cases

As a benchmark we applied the SPAN peak calling approach to public conventional ChIP-seq datasets as well as to a ULI ChIP-seq dataset.

CD14+ classical monocytes tracks available from the ENCODE database were a natural choice for a conventional ChIP-seq dataset. 
We also used the data from Hocking et al. to evaluate SPAN.

Chen C et al. presented an ultra-low-input micrococcal nuclease-based native ChIP (ULI-NChIP) and sequencing method to generate genome-wide histone mark profiles with high resolution and reproducibility from as few as one thousand cells. We used these tracks to test the semi-supervised approach in extreme conditions.

SPAN produced high quality peak calling in all of these cases, see report
This suggests that SPAN Peak Analyzer can be used as a general purpose peak calling solution.


SPAN is available as a tool in the official ToolShed for Galaxy. You can ask your Galaxy administrator to install it.

Error reporting

Report any errors or comments in the public SPAN issue tracker.


Q: What is the average running time? 
A: SPAN is capable of processing a single ChIP-Seq track in less than 1 hour on an average laptop (MacBook Pro 2015).

Q: Which operating systems are supported? 
A: SPAN is developed in modern Kotlin programming language and can be executed on any platform supported by java.

Q: Is differential peak calling supported? 
A: This is an experimental feature, for details see: java -Xmx4G -jar span.jar compare -h

Q: Where is SPAN source code? 
A: Source code is available at GitHub

Q: Where did you get this lovely span picture? 
A: From, it seems the original author goes by the name jgs.

Modified Wed Aug 12 17:19:12 2020 UTC