CLC Germline workflow
- FASTQ files are imported into CLC.
- Reads that did not pass a quality filter are ignored.
- Sequence ends are trimmed.
  - Quality trimming:
    - Phred score > 20
    - Trim ambiguous nucleotides
  - Sequence filtering:
    - Remove one 3’-terminal nucleotide
    - Read length must be in the range [30, 500]; shorter and longer reads are discarded.
- Map reads to the reference (hg19). For the algorithm, see: http://www.clcbio.com/files/whitepapers/whitepaper-on-CLC-read-mapper.pdf
  - Mapping parameters:
    - match score 1
    - mismatch cost 2
    - linear gap cost
    - insertion cost 3
    - deletion cost 3
    - length fraction 0.95
    - similarity fraction 0.95
    - global alignment
    - ignore non-specific matches.
- Local Realignment.
  - Realign unaligned ends
  - Perform local realignment 2 times.
- Variant Detection. See: http://www.clcbio.com/files/whitepapers/whitepaper-probabilistic-variant-caller-1.pdf
  - Minimum coverage: 10
  - Minimum count: 3
  - Minimum frequency: 25%
  - Restrict calling to the target region given by SureSelect.
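
The numeric thresholds listed in the workflow can be read as simple pass/fail checks. The Python sketch below is only an illustration of that reading, not CLC's implementation: the names `keep_read`, `CandidateVariant`, and `passes_thresholds` are hypothetical, the length range is assumed to be inclusive, and the minimum frequency is assumed to be a percentage computed as supporting reads divided by coverage.

```python
from dataclasses import dataclass

# Thresholds copied from the workflow description.
MIN_READ_LENGTH = 30
MAX_READ_LENGTH = 500
MIN_COVERAGE = 10
MIN_COUNT = 3
MIN_FREQUENCY_PERCENT = 25.0  # assumption: the listed value 25 is a percentage


def keep_read(length: int) -> bool:
    """Return True if a trimmed read of this length survives the length filter.

    Assumption: both ends of the range [30, 500] are inclusive.
    """
    return MIN_READ_LENGTH <= length <= MAX_READ_LENGTH


@dataclass
class CandidateVariant:
    """Hypothetical record for one candidate variant (not a CLC data structure)."""
    coverage: int  # number of reads covering the position
    count: int     # number of reads supporting the variant allele


def passes_thresholds(variant: CandidateVariant) -> bool:
    """Apply the minimum coverage, count, and frequency thresholds."""
    if variant.coverage < MIN_COVERAGE or variant.count < MIN_COUNT:
        return False
    # Assumption: frequency = supporting reads / coverage, expressed in percent.
    frequency = 100.0 * variant.count / variant.coverage
    return frequency >= MIN_FREQUENCY_PERCENT
```

Under these assumptions, a candidate supported by 3 of 12 reads (25%) passes, while one supported by 4 of 20 reads (20%) fails on frequency and one supported by 4 of 8 reads fails on coverage.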