bicycle


bicycle (bisulfite-based methylcytosine caller) is a next-generation sequencing bioinformatics pipeline for analyzing whole-genome bisulfite sequencing data. It processes data from directional (Lister) and non-directional (Cokus) bisulfite sequencing protocols, from both single-end and paired-end sequencing, and calls methylation for cytosines in CG and non-CG (CHG and CHH) contexts.

bicycle takes as input the bisulfite sequencing files of the different samples (FASTQ format) and a reference genome (FASTA format). It then performs the following steps:

  • Generation and indexing of Watson and Crick bisulfite-converted versions of the reference genome.
  • In-silico bisulfite conversion of the sequenced reads.
  • Read alignment.
  • Estimation of the bisulfite conversion error.
  • Identification of clonal and ambiguous reads.
  • Detection of cytosine methylation in CG and non-CG contexts, with non-CG to CG context correction when appropriate.
  • Calculation of methylation ratios, beta scores and the weighted mean of cytosine methylation status.
  • Genomic annotation of methylated regions.
  • Differential methylation analysis for cytosines (DMC) and genomic regions (DMR).
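
The in-silico conversion step can be illustrated with a minimal Python sketch. This is not bicycle's actual code: the function names and the example sequences are hypothetical, and the sketch only shows the general idea of collapsing C to T (Watson strand) and G to A (Crick strand, expressed in forward-strand coordinates) so that converted reads can be aligned against converted references.

```python
def bisulfite_convert_watson(seq: str) -> str:
    """Replace every C with T, as full bisulfite conversion would on the Watson strand."""
    return seq.upper().replace("C", "T")


def bisulfite_convert_crick(seq: str) -> str:
    """Replace every G with A: the Crick-strand C->T change seen from the forward strand."""
    return seq.upper().replace("G", "A")


# Hypothetical toy reference and read, for illustration only.
reference = "ACGTCCGGA"
watson_reference = bisulfite_convert_watson(reference)
crick_reference = bisulfite_convert_crick(reference)

read = "ACGTC"
converted_read = bisulfite_convert_watson(read)

print(watson_reference)  # ATGTTTGGA
print(crick_reference)   # ACATCCAAA
print(converted_read)    # ATGTT
```

After alignment with the reduced alphabet, the original (unconverted) read sequence is what reveals which cytosines survived the bisulfite treatment, so a real pipeline has to keep it alongside the converted one.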


Main features of bicycle

  • Statistical methylcytosine calling. Every reference cytosine whose methylation level (methylated vs. unmethylated reads) is higher than expected from the error rate is called as a methylcytosine. The error rate can be obtained in three ways, the first two of them empirical:
    • From a control genome: an unmethylated genome is used to calculate the bisulfite conversion error rate as the per-context methylation level detected in that control genome.
    • From barcodes: if the experiment includes barcodes with unmethylated cytosines, bicycle takes the ratio of unconverted cytosines in the barcodes as the error rate.
    • Fixed error rate given by the user (not empirical).
  • Output in a custom VCF.
  • Several filters:
    • Removal of ambiguous reads, i.e. reads that align to both the Watson and the Crick reference genomes.
    • Removal of incorrectly converted reads (those with more than three unconverted cytosines in a non-CG context).
    • Trimming of reads to the 'x'-th mismatch (default: 4).
    • Removal of "clonal" reads (reads that share the same 5' alignment position); of each such group, only the read with the highest summed quality score is kept.
  • CH to CG context correction (possible SNPs).
  • Multithreading support. Both the alignment (performed with Bowtie) and the methylcytosine calling (implemented with GATK) support multi-core processing.
  • Step by step execution, allowing the re-execution of parts with different parameters.
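
To make the statistical calling idea concrete, here is a minimal Python sketch. It assumes a one-sided binomial test against the conversion error rate and a significance threshold `alpha`; the text above does not state which test or threshold bicycle uses, so the test, the function names and the numbers are all illustrative assumptions, not bicycle's implementation.

```python
from math import comb


def conversion_error_rate(unconverted: int, total: int) -> float:
    """Fraction of cytosines in known-unmethylated sequence (control genome
    or barcodes) that were still read as C after bisulfite treatment."""
    if total <= 0:
        raise ValueError("total must be positive")
    return unconverted / total


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_methylated(meth_reads: int, depth: int, error_rate: float,
                  alpha: float = 0.01) -> bool:
    """Call a cytosine methylated when seeing this many methylated reads
    is unlikely to be explained by conversion errors alone.
    alpha is an assumed threshold, not a bicycle default."""
    return binomial_tail(meth_reads, depth, error_rate) < alpha


# Hypothetical numbers: 50 unconverted cytosines out of 10000 in the control.
error_rate = conversion_error_rate(50, 10_000)

print(is_methylated(8, 20, error_rate))  # True: far more than errors explain
print(is_methylated(1, 20, error_rate))  # False: compatible with the error rate
```

A real analysis tests very many cytosines at once, so some form of multiple-testing correction would normally be applied on top of the per-site p-values; that is left out of this sketch.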