Software and reference data

  • Quality check of raw reads

    Quality checking and adapter removal for raw reads are done with the wrapper tool “Trim Galore!”, which combines adapter removal by Cutadapt with quality checks by FastQC.

  • Alignment and variant calling

    This workflow is based on the GATK Best Practices, with the addition of a second variant caller. The workflow requires paths to a number of programs and reference datasets, all of which must be downloaded and installed first:

  • Reference genome

    In the NEXT bioinformatics network, we use the Genomic Data Commons (GDC) version of the GRCh38 reference genome. The reference genome and associated index files are available for download from the GDC.

  • GATK bundle

    The GATK workflow needs a number of reference datasets with known variants. A resource bundle with all necessary files for the GATK workflow is provided by the Broad Institute.

  • COSMIC

    Somatic variant calling with Mutect2 uses a whitelist of mutations previously seen in cancers, stored in a VCF file. VCF files containing both coding and non-coding mutations can be downloaded by following the instructions on the COSMIC download page.
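
Because the workflow depends on paths to several programs and reference datasets, it can help to check that everything is in place before starting a run. The following is a minimal sketch of such a check, not part of the workflow itself. The function name `find_missing`, the dictionary keys, and every path shown are hypothetical placeholders and must be replaced with the locations used on your own system.

```python
import shutil
from pathlib import Path


def find_missing(programs, reference_files):
    """Return the names of programs and reference files that cannot be found.

    programs: maps a label to a command name or an explicit path to an executable.
    reference_files: maps a label to the path of a reference file.
    """
    missing = []
    for name, executable in programs.items():
        # shutil.which accepts both bare command names (looked up on PATH)
        # and explicit paths to an executable file.
        if shutil.which(executable) is None:
            missing.append(name)
    for name, path in reference_files.items():
        if not Path(path).is_file():
            missing.append(name)
    return missing


# Hypothetical configuration: all command names and paths are placeholders.
programs = {
    "trim_galore": "trim_galore",
    "gatk": "gatk",
}
reference_files = {
    "reference_genome": "/path/to/reference/genome.fa",
    "gatk_known_variants": "/path/to/gatk_bundle/known_variants.vcf.gz",
    "cosmic_whitelist": "/path/to/cosmic/cosmic_mutations.vcf.gz",
}

if __name__ == "__main__":
    for name in find_missing(programs, reference_files):
        print(f"Missing: {name}")
```

Running the script prints one line per program or file that could not be located, and prints nothing when all of them are found.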