Variant Analysis Workflow in Bioinformatics

Introduction to Variant Analysis

Rare DNA changes within a population of cells are the first manifestations of mutations, which are the fuel for evolution. The study of genomic variation between species and between individual organisms has been revolutionized by next-generation sequencing (NGS) technologies.

 

Variant detection and analysis refers to sequencing the genome of an individual or a population of a species with high-throughput sequencing technology and analyzing the differences, in order to capture a large proportion of the genetic variation present, including single nucleotide polymorphisms (SNPs), insertions and deletions (InDels), structural variations (SVs), and copy number variations (CNVs). The detected variants can be used to develop molecular markers and build a genetic polymorphism database, laying the groundwork for uncovering evolutionary relationships and identifying functional genes.
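
To make the variant categories above concrete, here is a minimal Python sketch that labels a variant from its reference and alternate alleles, as they might appear in the REF and ALT columns of a VCF record. The function name and the label strings are illustrative assumptions, not part of any particular tool.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a variant by comparing its REF and ALT alleles.

    Hypothetical helper for illustration; real callers apply richer rules.
    """
    # Symbolic alleles such as <DEL> or <DUP> are how VCF usually
    # represents large structural or copy number variants.
    if alt.startswith("<") and alt.endswith(">"):
        return "structural/copy-number"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "multi-nucleotide substitution"
    # Length differences mean bases were gained or lost (an InDel).
    return "insertion" if len(alt) > len(ref) else "deletion"


for ref, alt in [("A", "G"), ("A", "ATG"), ("ATG", "A"), ("N", "<DEL>")]:
    print(ref, alt, classify_variant(ref, alt))
```

In practice the boundary between a long InDel and a structural variant is a size convention that differs between tools, so the labels here should be read as a rough guide only.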

 

Principles of Variant Analysis

Read Mapping: Reads in a FASTQ file appear in an essentially arbitrary order, determined by where each one was located on the flow cell during sequencing. That order carries no genomic information, and checking each read individually would be impossible, so read mapping is critical: every read is aligned to a reference genome to establish where in the genome it came from.
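
The idea behind read mapping can be sketched with a toy seed-and-extend mapper in Python. This is only an illustration under simplifying assumptions (a tiny made-up reference, no gaps, a fixed seed length); production mappers use compressed indexes and gapped alignment. All names and sequences below are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return the 0-based position of the best gap-free placement, or None."""
    best = None
    # Seed: look up the first k bases of the read in the index.
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        # Extend: count mismatches over the full read length.
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return None if best is None else best[0]


reference = "ACGTTGCATGCCGATAGCTA"  # toy reference, not real data
k = 4
index = build_kmer_index(reference, k)
reads = ["GCCGATAG", "TTGCATGC", "CATGACGA"]  # arbitrary order, as in a FASTQ file
positions = [map_read(r, reference, index, k) for r in reads]
print(sorted(zip(positions, reads)))  # reads ordered by genomic position
```

Once each read has a position, the reads can be sorted along the genome, which is what makes the downstream per-site comparison possible.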

 

Single Nucleotide Variant Analysis: This step studies single nucleotide variants (SNVs), which are differences in a single nucleotide base between members of a population or between the paired chromosomes of one individual.
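
As a rough illustration of how SNVs can be found once reads are mapped, the Python sketch below piles up the bases covering each reference position and reports positions where a non-reference base is well supported. The thresholds, names, and data are assumptions chosen for the example; real variant callers also model base quality, mapping quality, and genotype likelihoods.

```python
from collections import Counter


def call_snvs(reference, alignments, min_depth=3, min_alt_fraction=0.2):
    """Naive pileup-based SNV caller.

    alignments: list of (0-based start, read sequence), assumed gap-free.
    Returns tuples of (position, ref base, alt base, alt count, depth).
    """
    pileup = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            if start + offset < len(reference):
                pileup[start + offset][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_alt_fraction:
                calls.append((pos, reference[pos], base, n, depth))
    return calls


reference = "ACGTTGCATG"  # toy reference
alignments = [
    (0, "ACGTAGCA"),
    (1, "CGTAGCAT"),
    (2, "GTTGCATG"),
    (2, "GTAGCATG"),
]
print(call_snvs(reference, alignments))
```

Here three of the four reads covering position 4 carry an A where the reference has a T, so that single site is reported.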

 

Structural Variant Analysis: This step studies structural variants (SVs), which are variations in the structure of an organism's chromosomes. SVs cover a variety of microscopic and submicroscopic kinds of variation within a single species' genome, including deletions, duplications, copy-number variants, insertions, inversions, and translocations.

 

Applications and Advantages of Variant Analysis

The following areas can benefit from variant detection analysis based on whole-genome resequencing or reduced representation genome sequencing: (1) molecular marker development, (2) pathogenic gene identification, (3) molecular breeding, (4) trait gene mining, (5) evolutionary studies, and (6) construction of genetic polymorphism databases.

 

The following are some of the benefits of variant analysis:
- variant detection analysis is suitable for a wide range of species and does not require a reference genome;
- variation detection is more thorough;
- new mutations can be identified.

 

Some of the tools used for Variant Analysis

MultiQC/FastQC Summary Tool: MultiQC works directly on FastQC reports to produce summary reports quickly, making it easy both to spot samples that stand out from the rest of a group and to make global decisions about how to handle a set of files.
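
The kind of aggregation described above can be sketched in a few lines of Python. This is not MultiQC's code or API; it assumes each FastQC report provides a summary with tab-separated lines of status, module name, and file name, and the sample names and statuses below are invented for the example.

```python
from collections import defaultdict


def summarize_fastqc(summary_texts):
    """Build a module -> {sample: status} table from FastQC-style summaries."""
    table = defaultdict(dict)
    for text in summary_texts:
        for line in text.strip().splitlines():
            status, module, sample = line.split("\t")
            table[module][sample] = status
    return table


def failed_modules(table):
    """List, per sample, the modules whose status is FAIL."""
    flagged = defaultdict(list)
    for module, per_sample in table.items():
        for sample, status in per_sample.items():
            if status == "FAIL":
                flagged[sample].append(module)
    return dict(flagged)


# Hypothetical summaries for two samples.
summaries = [
    "PASS\tBasic Statistics\tsampleA.fastq\nPASS\tAdapter Content\tsampleA.fastq",
    "PASS\tBasic Statistics\tsampleB.fastq\nFAIL\tAdapter Content\tsampleB.fastq",
]
table = summarize_fastqc(summaries)
print(failed_modules(table))
```

Laying the results out per module across samples is what makes an outlier such as sampleB easy to see at a glance.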

 

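Trimmomatic: Trimmomatic trims adapters in two modes, Simple and Palindrome. In Simple mode, each adapter sequence is tested against the reads, and if a sufficiently accurate match is identified, the read is clipped appropriately. Palindrome mode is designed for paired-end data, where adapter read-through is detected by aligning the two reads of a pair against each other.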
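
The 'simple' approach can be illustrated with a short Python sketch that slides an adapter along a read and clips at the first sufficiently good match. It is a simplification: Trimmomatic scores candidate alignments against a threshold, whereas this sketch just counts mismatches. The function name, parameters, and sequences are assumptions made for the example.

```python
def simple_clip(read: str, adapter: str, max_mismatches: int = 1,
                min_overlap: int = 4) -> str:
    """Clip the read at the first position where the adapter matches.

    A partial adapter at the 3' end counts as a match if at least
    min_overlap bases overlap. Mismatch counting stands in for a real score.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        mismatches = sum(read[start + i] != adapter[i] for i in range(overlap))
        if mismatches <= max_mismatches:
            return read[:start]  # keep only the bases before the adapter
    return read  # no adapter found, read is left untouched


adapter = "AGATCGGAAGAGC"      # example adapter sequence
insert = "ACGTTGCATGCC"        # toy genomic fragment
print(simple_clip(insert + "AGATCGGA", adapter))  # adapter read-through is removed
print(simple_clip(insert, adapter))               # clean read is unchanged
```

Allowing a small number of mismatches matters because sequencing errors can occur inside the adapter itself.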

 

Genome Assembly (SPAdes): SPAdes is a de Bruijn graph assembler that has quickly become a standard choice in many labs and workflows.
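
For readers unfamiliar with de Bruijn graphs, the following Python sketch builds one from a few overlapping toy reads and walks it to recover a contig. It shows the general technique only and is not how SPAdes is implemented; a real assembler adds error correction, multiple k-mer sizes, and extensive graph simplification. Function names and reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unambiguous_path(graph, start):
    """Extend a contig while the current node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in visited:
            break  # stop on a cycle rather than loop forever
        visited.add(node)
        contig += node[-1]
    return contig


reads = ["ACGTTG", "GTTGCA", "TGCATC"]  # overlapping toy reads
graph = build_de_bruijn(reads, k=4)
print(walk_unambiguous_path(graph, "ACG"))
```

Because the reads are broken into k-mers first, overlaps between reads never have to be computed pairwise; they show up as shared nodes in the graph.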

 

Human Trios: Trio (or familial) analysis, in which a child is sequenced together with both parents, has proven to be extremely effective in diagnosing rare childhood diseases. The best-known publication in this field describes the first case in which whole-exome sequencing was credited with saving a life.
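
One reason trios are so informative is that the parents' genotypes let an analysis single out variants that appear only in the child. The Python sketch below shows that filter in its simplest form, using VCF-style genotype strings. The variant identifiers and genotypes are invented, and a real analysis would also weigh read depth, genotype quality, and inheritance models.

```python
def find_de_novo_candidates(genotypes):
    """Return variants where the child carries an alternate allele
    and both parents are homozygous reference.

    genotypes: dict mapping a variant id to a dict with 'child',
    'mother', and 'father' genotype strings such as '0/1'.
    """
    candidates = []
    for variant, gt in genotypes.items():
        child_alleles = gt["child"].replace("|", "/").split("/")
        child_has_alt = "1" in child_alleles
        parents_ref = all(gt[p] in ("0/0", "0|0") for p in ("mother", "father"))
        if child_has_alt and parents_ref:
            candidates.append(variant)
    return candidates


# Hypothetical genotype calls for three sites.
trio_calls = {
    "site1": {"child": "0/1", "mother": "0/0", "father": "0/0"},  # candidate
    "site2": {"child": "0/1", "mother": "0/1", "father": "0/0"},  # inherited
    "site3": {"child": "0/0", "mother": "0/0", "father": "0/1"},  # not in child
}
print(find_de_novo_candidates(trio_calls))
```

The same table of genotypes can be filtered differently, for example for recessive inheritance, by changing the condition on the parents.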

 

Annovar Annotations: ANNOVAR is a variant annotation tool. Given a VCF file from an uncharacterized specimen, it puts the detected variants in context using a wealth of established data about genes, other known SNPs, gene variants, and so on.
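
What "putting variants in context" means can be shown with a toy gene-based annotation step in Python. This is not ANNOVAR's interface or database format; the gene names, coordinates, and variants are placeholders invented for the example, and real annotators use indexed lookups rather than a linear scan.

```python
def annotate(variants, genes):
    """Attach overlapping gene names to each variant.

    variants: list of (chrom, 1-based position, ref, alt)
    genes: dict mapping gene name -> (chrom, start, end), inclusive
    """
    annotated = []
    for chrom, pos, ref, alt in variants:
        hits = [
            name
            for name, (g_chrom, start, end) in genes.items()
            if g_chrom == chrom and start <= pos <= end
        ]
        annotated.append((chrom, pos, ref, alt, hits or ["intergenic"]))
    return annotated


# Placeholder gene models and variants, not real coordinates.
genes = {"GENE_A": ("chr1", 100, 500), "GENE_B": ("chr1", 450, 900)}
variants = [("chr1", 120, "A", "G"), ("chr1", 470, "C", "T"), ("chr2", 50, "G", "A")]
for row in annotate(variants, genes):
    print(row)
```

Annotation against known SNP databases follows the same pattern, with the lookup keyed on position and alleles instead of gene intervals.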

 

Genome Analysis Toolkit (GATK): The Genome Analysis Toolkit (GATK) is a collection of programs developed by the Broad Institute and supported by an extensive documentation website. Its workflows make use of read mappers, read aligners, variant callers, and a range of supporting utilities.

 

About CD Genomics Bioinformatics Analysis

The bioinformatics analysis department of CD Genomics provides novel solutions for data-driven innovation aimed at discovering the hidden potential in biological data, tapping new insights related to life science research, and predicting new prospects.

 
