Features¶
This page is intended for:
Developers: in order to be sure they are developing the right project that fulfills requirements provided in this document.
Users: in order to get familiar with the idea of the project and suggest other features that would make it even more functional.
Cpipe should include the following features.
Supported formats¶
Cpipe should support the following format of Raw Data
as input:
Format | Type | Steps |
---|---|---|
FASTQ | Seq | |
BED | Mapped | Skip mapping |
BAM | Mapped | Skip mapping |
Cpipe may not support the following format in current version:
Format | Type | Solution |
---|---|---|
SRA | Seq | Use SRA Toolkit to convert to FASTQ format |
BED | Summit | |
BED | Peak | |
wig | Profile | |
Bigwig | Profile |
Data preprocession¶
Convert the raw sequencing data into intervals and profiles.
- Use Bowtie for tag alignment (mapping)
- Use MACS2 for peak calling
Correlation¶
Focus on the visulization of similarity between replicates.
- Draw the venn diagram for peaks if there’re less than 3 replicates (treatment or control)
Association Study¶
Focus on association between intervals (result of peak calling) and traits like genome annotation.
- CEAS: Annotate the given intervals and scores with genome features
- Conservation Plot: Calculates the PhastCons scores in several intervals sets
Motif¶
analysis the motif of the binding sites.
Quality control¶
Based on Chip-seq pipeline and Cistrome DC database, QC program will generate a comprehensive quality control report about a particular dataset as well as the relative result compared to the whole DC database.
- Basic information: Species, Cell Type, Tissue Origin, Cell line, Factor, Experiment, Platform, Treatment and Control.
- Reads Genomic Mapping QC measurement: QC of raw sequence data with FastQC, FastQC score distribution, Basic mapping QC statistics, Mappable reads ratio, Mappable Redundant rate.
- Peak calling QC measurement: Peak calling summary, High confident Peak, Peaks overlapped with DHS(Dnase Hypersensitivity sites), Velcro ratio(human only), Profile correlation within union peak regions, Peaks overlap between Replicates.
- Functional Genomic QC measurement: Peak Height distribution, Meta Gene distribution, Peak conservation score, Motif QCmeasurement analysis.