FAQ and Definitions of Terms Used in Neurobioinfo Core Facility Workflows

Please note that the links presented below are believed to be good resources, but they do not constitute an endorsement of the institutions or companies presenting these definitions or tools.

Suggestions for additions to any of our pages, particularly this one, are most welcome. Send them to neurobioinfo@mcgill.ca.

Frequently Asked Questions (FAQ):

Q001 - What is the best e-mail to use to contact you?
A001 - neurobioinfo@mcgill.ca

Q002 - What is the best way to initiate a new project with the Neuro Bioinformatics Core Facility?
A002 - Fill out the form here.

Q003 - Do I need to have some Compute Canada disk and/or compute allocation before I initiate a project with you?
A003 - No. In fact, it is a good idea to talk to us before you start your project. Contact us at neurobioinfo@mcgill.ca or fill out the intake form here.

Q004 - Where do I go to get a Compute Canada account?
A004 - After talking to us, you can get started here: Compute Canada Account at McGill.

Have more questions? Contact us at: neurobioinfo@mcgill.ca

Definitions

A short glossary of terms and resources used at the Neuro Bioinformatics Core Facility:

Commonly used Tags:

  • Definition
  • File format
  • Resource
  • Software
  • Wikipedia

Alignment metrics

BAM

BED

CNV

  • Wikipedia: Copy number variation
  • Definition: A copy number variation (CNV) is when the number of copies of a particular gene varies from one individual to the next. Following the completion of the Human Genome Project, it became apparent that the genome experiences gains and losses of genetic material. (From NHGRI: https://www.genome.gov/genetics-glossary/Copy-Number-Variation)
  • Reference: Copy number variation in human health, disease, and evolution. Zhang F, Gu W, Hurles ME, Lupski JR. Annu Rev Genomics Hum Genet. 2009;10:451-81. PMID: 19715442
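One common way to detect CNVs from sequencing data is to compare read depth across the genome: a region present in extra copies collects proportionally more reads, and a deleted region collects fewer. The sketch below assumes depth has already been summarized into fixed-size bins, that most bins are at normal diploid copy number (so the median depth stands for 2 copies), and it ignores GC bias and noise. The function names are hypothetical, not taken from any particular CNV caller.

```python
from statistics import median


def estimate_copy_numbers(bin_depths, ploidy=2):
    """Estimate an integer copy number for each genomic bin from its read depth.

    Assumption: most bins are at the normal ploidy, so the median depth
    across all bins represents `ploidy` copies.
    """
    baseline = median(bin_depths)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot normalise")
    # Scale each bin relative to the baseline and round to a whole copy.
    return [round(ploidy * depth / baseline) for depth in bin_depths]


def call_cnvs(bin_depths, ploidy=2):
    """Return (bin_index, copy_number) for bins whose copy number differs from `ploidy`."""
    copies = estimate_copy_numbers(bin_depths, ploidy)
    return [(i, cn) for i, cn in enumerate(copies) if cn != ploidy]


if __name__ == "__main__":
    depths = [30, 31, 29, 30, 15, 14, 45, 46, 30]  # made-up per-bin depths
    print(call_cnvs(depths))
```

With these made-up depths, bins 4 and 5 come out as single-copy losses and bins 6 and 7 as three-copy gains. Real CNV callers add normalization, segmentation of neighbouring bins, and statistical models on top of this basic idea.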

CRAM

  • File format
  • Wikipedia: CRAM (file format)
  • Definition: CRAM is a compressed columnar file format for storing biological sequences aligned to a reference sequence, initially devised by Markus Hsi-Yang Fritz et al. CRAM was designed to be an efficient reference-based alternative to the Sequence Alignment Map (SAM) and Binary Alignment Map (BAM) file formats. (From Wikipedia: CRAM (file format))
  • Original Reference: Efficient storage of high throughput DNA sequencing data using reference-based compression. Hsi-Yang Fritz M, Leinonen R, Cochrane G, Birney E. Genome Res. 2011 May;21(5):734-40. PMID: 21245279 
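The core idea behind reference-based compression is that an aligned read mostly matches the reference, so it is enough to store where it aligns plus the few bases that differ. The sketch below illustrates only that idea; it is not the CRAM encoding itself, assumes an ungapped alignment (no insertions or deletions), and uses hypothetical function names and toy sequences.

```python
def encode_against_reference(read, reference, start):
    """Store a read as (start, length, mismatches) relative to the reference.

    Assumption: ungapped alignment, so read position i lines up with
    reference position start + i.
    """
    if start < 0 or start + len(read) > len(reference):
        raise ValueError("read does not fit inside the reference")
    # Keep only the offsets where the read differs from the reference.
    diffs = [(i, base) for i, base in enumerate(read)
             if reference[start + i] != base]
    return start, len(read), diffs


def decode_from_reference(encoded, reference):
    """Rebuild the original read from the reference and the stored mismatches."""
    start, length, diffs = encoded
    bases = list(reference[start:start + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCATG"  # toy reference sequence
    read = "ACGTTAGGCA"              # toy read aligned at position 4
    encoded = encode_against_reference(read, reference, 4)
    print(encoded)
    print(decode_from_reference(encoded, reference) == read)
```

Here the 10-base read is stored as its position, its length and a single mismatch. The saving grows with read length and accuracy, which is why the reference sequence must be available to decode such files.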

FASTQ

GVCF

  • File format
  • Also see: VCF
  • Also see: GATK
  • Definition: GVCF stands for Genomic VCF. A gVCF is a kind of VCF, so the basic format specification is the same as for a regular VCF (see the spec documentation here), but a Genomic VCF contains extra information. The key difference between a regular VCF and a gVCF is that the gVCF has records for all sites, whether there is a variant call there or not. 
  • The gVCF extended format includes additional information about “blocks” of consecutive sites that match the reference, along with their qualities.
  • From the GATK pages: GVCF - Genomic Variant Call Format
  • Broad Institute GitHub: What is a GVCF and how is it different from a ‘regular’ VCF?
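To see why gVCF files stay manageable even though they cover every site, it helps to sketch how runs of non-variant sites can be collapsed into reference blocks. The sketch below assumes a toy per-site input of (position, alternate allele or None, genotype quality) and merges consecutive non-variant positions into one block that keeps the lowest quality in the run. The record layout is hypothetical and is not the real gVCF line format; tools such as GATK also split blocks into genotype-quality bands.

```python
def to_gvcf_records(sites):
    """Collapse consecutive non-variant sites into reference blocks.

    `sites` is a list of (position, alt_allele_or_None, genotype_quality),
    sorted by position. Returns ("REF_BLOCK", start, end, min_gq) and
    ("VARIANT", position, alt, gq) tuples.
    """
    records = []
    block = None  # current reference block as [start, end, min_gq]
    for pos, alt, gq in sites:
        # Extend the open block if this is the next reference-matching site.
        if alt is None and block is not None and pos == block[1] + 1:
            block[1] = pos
            block[2] = min(block[2], gq)
            continue
        # Otherwise close any open block before handling this site.
        if block is not None:
            records.append(("REF_BLOCK", block[0], block[1], block[2]))
            block = None
        if alt is None:
            block = [pos, pos, gq]
        else:
            records.append(("VARIANT", pos, alt, gq))
    if block is not None:
        records.append(("REF_BLOCK", block[0], block[1], block[2]))
    return records


if __name__ == "__main__":
    toy_sites = [(100, None, 40), (101, None, 35), (102, None, 50),
                 (103, "T", 60), (104, None, 20), (106, None, 30)]
    for record in to_gvcf_records(toy_sites):
        print(record)
```

A regular VCF would keep only the variant at position 103. The gVCF-style output also reports that positions 100 to 102, 104 and 106 were examined and matched the reference, and position 105 (absent from the input) stays a gap rather than being silently merged.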

GATK

  • Software
  • Resource
  • Definition: The Genome Analysis Toolkit (GATK) is a set of tools for identifying SNPs and indels in germline DNA and RNA-seq data. Its scope is now expanding to include somatic short variant calling, and to tackle copy number (CNV) and structural variation (SV). GATK also includes many utilities to perform related tasks such as processing and quality control of high-throughput sequencing data, and bundles the popular Picard toolkit.
  • From the Broad Institute: Genome Analysis Toolkit
  • Original publication: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. Genome Res. 2010 Sep;20(9):1297-303. PMID: 20644199

GWAS

  • Definition: A genome-wide association study (GWAS) is an approach used in genetics research to associate specific genetic variations with particular diseases. The method involves scanning the genomes from many different people and looking for genetic markers that can be used to predict the presence of a disease. (From the NHGRI/NIH)
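At its simplest, "scanning the genomes from many different people" means running one association test per genetic variant and keeping the variants whose p-values pass a strict genome-wide threshold (5 × 10⁻⁸ is the conventional choice). The sketch below uses a basic allelic chi-square test on case/control allele counts. This is only one of several possible tests; real studies usually fit regression models with covariates. The variant IDs and counts are made up for illustration.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test on a 2x2 table of allele counts.

    Rows are cases and controls; columns are alternate and reference
    allele counts. Uses 1 degree of freedom and no continuity correction.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    if 0 in row_sums or 0 in col_sums:
        return 0.0, 1.0  # nothing to test, e.g. a monomorphic site
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def genome_wide_hits(variant_counts, alpha=5e-8):
    """Return IDs of variants whose allelic test p-value is below `alpha`."""
    return [vid for vid, counts in variant_counts.items()
            if allelic_chi_square(*counts)[1] < alpha]


if __name__ == "__main__":
    counts = {"var_a": (300, 700, 100, 900),  # hypothetical strong association
              "var_b": (30, 70, 10, 90)}      # nominal only, not genome-wide
    print(genome_wide_hits(counts))
```

The same allele-frequency difference gives a p-value near 4 × 10⁻⁴ for the small sample (var_b) but passes genome-wide significance with ten times more people (var_a). This is one reason GWAS need very large cohorts.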

IGV

Mapping rate

  • Mapping rate is the “Uniquely mapped reads %”, defined as the proportion of uniquely mapped reads among all input reads. For a very good library it exceeds 90%, and for a good library it should be above 80%. (From PMID: 26334920)
  • A reference: Mapping RNA-seq Reads with STAR. Dobin A, Gingeras TR. Curr Protoc Bioinformatics. 2015 Sep 3;51:11.14.1-11.14.19. PMID: 26334920
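The calculation and the quality thresholds above can be sketched as follows. The helper that reads the value from a STAR `Log.final.out` file assumes that the file contains a line of the form `Uniquely mapped reads % |	89.78%`. Please check this against your STAR version before relying on it. The function names and the "below 80%" label are hypothetical.

```python
def mapping_rate(uniquely_mapped, input_reads):
    """Uniquely mapped reads as a percentage of all input reads."""
    if input_reads <= 0:
        raise ValueError("input_reads must be positive")
    return 100.0 * uniquely_mapped / input_reads


def assess_library(rate_percent):
    """Label a library using the thresholds quoted above (>90% very good, >80% good)."""
    if rate_percent > 90:
        return "very good"
    if rate_percent > 80:
        return "good"
    return "below 80%: worth investigating"


def parse_star_unique_rate(log_text):
    """Extract 'Uniquely mapped reads %' from the text of a STAR Log.final.out file.

    Assumption: the value sits after a '|' on the same line, with a trailing '%'.
    Returns None if the line is not found.
    """
    for line in log_text.splitlines():
        if "Uniquely mapped reads %" in line:
            return float(line.split("|")[1].strip().rstrip("%"))
    return None


if __name__ == "__main__":
    rate = mapping_rate(9_200_000, 10_000_000)  # made-up read counts
    print(rate, assess_library(rate))
```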

miRNA

  • Wikipedia: microRNA
  • Definition: A microRNA (abbreviated miRNA) is a small single-stranded non-coding RNA molecule (containing about 22 nucleotides) found in plants, animals and some viruses, that functions in RNA silencing and post-transcriptional regulation of gene expression. miRNAs function via base-pairing with complementary sequences within mRNA molecules. As a result, these mRNA molecules are silenced by one or more of the following processes:
    1. Cleavage of the mRNA strand into two pieces
    2. Destabilization of the mRNA through shortening of its poly(A) tail
    3. Less efficient translation of the mRNA into proteins by ribosomes
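The base-pairing step can be illustrated by searching an mRNA for sequences complementary to part of a miRNA. The sketch below uses the "seed" region (nucleotides 2 to 8 of the miRNA), a common simplification in target prediction. Actual targeting also depends on pairing outside the seed and on site context. Both sequences in the example are made-up toy strings, not real miRNA or mRNA sequences.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence (A/U/G/C only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def seed_match_sites(mirna, mrna, seed_start=2, seed_end=8):
    """0-based positions in `mrna` that pair perfectly with the miRNA seed.

    The seed is taken as nucleotides seed_start..seed_end (1-based, inclusive).
    Because pairing is antiparallel, the target site is the seed's reverse complement.
    """
    seed = mirna.upper()[seed_start - 1:seed_end]
    site = reverse_complement_rna(seed)
    target = mrna.upper()
    hits = []
    i = target.find(site)
    while i != -1:
        hits.append(i)
        i = target.find(site, i + 1)
    return hits


if __name__ == "__main__":
    toy_mirna = "AUGCAUCGAUUAGCUAGCUAGC"   # hypothetical 22-nt sequence
    toy_mrna = "GGGCGAUGCAUUUUCGAUGCAGG"   # hypothetical mRNA fragment
    print(seed_match_sites(toy_mirna, toy_mrna))
```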

Picard

QC report

Reference Genome

  • Definition: A reference genome (also known as a reference assembly) is a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species. … Instead a reference provides a haploid mosaic of different DNA sequences from each donor. (From Wikipedia: Reference genome).
  • Genome Reference Consortium: Human Reference Genome: GRCh38.p13 (the latest minor release at the time of writing)

SAM

  • File format
  • Also see: BAM
  • Definition: Sequence Alignment Map (SAM) is a text-based format, developed by Heng Li and colleagues, originally for storing biological sequences aligned to a reference sequence. (From Wikipedia: SAM (File Format))
  • Sequence Alignment/Map Format Specification
  • Original publication: The Sequence Alignment/Map format and SAMtools. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. Bioinformatics. 2009 Aug 15;25(16):2078-9. PMID: 19505943
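Because SAM is plain text, one alignment line can be read with ordinary string handling. Each alignment line has 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), optionally followed by TAG:TYPE:VALUE fields. The minimal sketch below parses one such line and decodes two bits of the FLAG field (0x4 = read unmapped, 0x10 = read reverse-complemented). For real work, a dedicated library or samtools is the safer choice. The function name and the example record are hypothetical.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Parse one SAM alignment line (not an '@' header line) into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    # Convert the integer-valued mandatory fields.
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    record["TAGS"] = cols[11:]  # optional TAG:TYPE:VALUE fields, left as strings
    flag = record["FLAG"]
    record["is_unmapped"] = bool(flag & 0x4)
    record["is_reverse"] = bool(flag & 0x10)
    return record


if __name__ == "__main__":
    example = "read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"
    print(parse_sam_line(example))
```

BAM stores the same records in a compressed binary form, which is why it is preferred for large files even though SAM is easier to inspect by eye.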

Structural variations

  • Also see: CNV
  • Definition: Genomic structural variation is the variation in the structure of an organism’s chromosome. It consists of many kinds of variation in the genome of one species and usually includes microscopic and submicroscopic types, such as deletions, duplications, copy-number variants, insertions, inversions and translocations. Originally, a structural variation was defined as affecting a sequence length of about 1 kb to 3 Mb, which is larger than SNPs and smaller than chromosome abnormalities (though the definitions have some overlap). (From Wikipedia entry)
  • Wikipedia: Structural variation
  • Reference: Structural variation in the human genome. Feuk L, Carson AR, Scherer SW. Nat Rev Genet. 2006 Feb;7(2):85-97. PMID: 16418744
  • Reference: Structural variant calling: the long and the short of it. Mahmoud M, Gobet N, Cruz-Dávalos DI, Mounier N, Dessimoz C, Sedlazeck FJ. Genome Biol. 2019 Nov 20;20(1):246. PMID: 31747936
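The original size-based definition can be expressed as a simple range check on the variant length. The sketch below assumes a variant given by 1-based, inclusive start and end coordinates and uses the 1 kb and 3 Mb limits quoted above. As the definition notes, these boundaries overlap in practice, so the labels are only a rough guide. The function names and labels are hypothetical.

```python
def sv_length(start, end):
    """Length in bp of a variant spanning 1-based, inclusive coordinates."""
    if end < start:
        raise ValueError("end must not be before start")
    return end - start + 1


def size_class(length_bp, sv_min=1_000, sv_max=3_000_000):
    """Place a variant length relative to the original 1 kb to 3 Mb SV range."""
    if length_bp < sv_min:
        return "below SV range (SNP or small indel scale)"
    if length_bp <= sv_max:
        return "within SV range"
    return "above SV range (chromosome abnormality scale)"


if __name__ == "__main__":
    for start, end in [(500, 549), (10_000, 10_999), (1, 5_000_000)]:  # made-up intervals
        print(start, end, size_class(sv_length(start, end)))
```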

VCF


Want to see other entries? Send your suggestions to neurobioinfo@mcgill.ca!