The ARGO Data Platform accepts raw sequencing data in both FASTQ and BAM (aligned or unaligned) format. The first processing step in the DNA-Seq Pipeline is uniformly aligning samples to the GRCh38 reference genome. For details, please see the latest version of the ARGO DNA Alignment.
- All alignments are performed using GRCh38 as the human reference genome
- Submitted FASTQ or BAM files(s)
- Submitted sequencing reads (FASTQ or BAM) are converted into lane level (i.e read group level) BAMs.
- Picard:CollectQualityYieldMetrics is used for read group level BAM QC.
- BWA-MEM (version 0.7.17-r1188) is performed to map the reads to the reference genome for each read group.
- Biobambam (version 2.0.153) is used to merge all read group level mapped BAMs into sample level BAM and mark duplicates per library.
- Samtools:stats is used to calculate Alignment QC metrics.
- Picard:CollectOxoGMetrics is used to calculate the
OxoQscore for oxidative artifact assessment.
- Aligned Reads CRAM and Aligned Reads Index
- QC metrics files