Step 8: Differential gene expression (DGE) analysis

In Step 8, DGE analysis is performed to identify differentially expressed genes (DEGs) between two conditions. Before computing DGE, users can add metadata containing phenotypic and experimental data to the Seurat object; these metadata can then be used to define the groups for DGE analysis. To define the contrasts used in the DGE analysis, users must edit the contrast matrices before submitting the command to compute DGE. scRNAbox can compute DGE between conditions using all cells or cell type groups. Furthermore, scRNAbox provides two frameworks for computing DGE:

1) Cell-based DGE
Cells are used as replicates and DGE is computed using the Seurat FindMarkers function (Macosko et al. 2015). While FindMarkers supports several statistical frameworks for computing DGE, the default method in our implementation is MAST, which is tailored for scRNAseq data (Finak et al. 2015).

2) Sample-based DGE
Samples are used as replicates by applying a pseudo-bulk analysis. The Seurat AggregateExpression function is used to compute the sum of RNA counts for each gene across all cells from a particular sample (Cao et al. 2022). The DESeq2 statistical framework is then used to compute DGE between conditions using the aggregated counts (Love et al. 2014).
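To make the aggregation idea behind the sample-based framework concrete, here is a toy sketch. It does not call Seurat or DESeq2; the file name, gene names, sample names, and counts are all made up for illustration, and it only shows how per-cell counts collapse into one sum per gene and sample.

```shell
# Toy long-format counts: one row per (cell, sample, gene).
# All names and values are hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/toy_counts.txt" <<'EOF'
cell sample gene count
c1 Control1 GeneA 3
c2 Control1 GeneA 5
c3 Case1 GeneA 2
c1 Control1 GeneB 0
c2 Control1 GeneB 1
c3 Case1 GeneB 7
EOF

# Sum counts across cells for each (sample, gene) pair: the pseudo-bulk idea.
pseudobulk=$(awk 'NR > 1 { sum[$2 " " $3] += $4 }
                  END { for (k in sum) print k, sum[k] }' "$workdir/toy_counts.txt" | LC_ALL=C sort)

printf '%s\n' "$pseudobulk"
```

Each output row is one sample and gene pair with its summed count, which is the kind of aggregated matrix a bulk-style statistical framework takes as input.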

The following parameters are adjustable for Step 8:

DGE method	Parameter	Default	Description
General	par_save_RNA	Yes	Whether or not to export an RNA expression matrix
General	par_save_metadata	Yes	Whether or not to export a metadata dataframe
General	par_seurat_object	NULL	If users already have a Seurat object, they may provide the path to the Seurat object to initiate the pipeline at Step 8
Add metadata	par_merge_meta	orig.ident	The column from the Seurat metadata that will be used to merge the new metadata. This column must also exist in the submitted csv file containing the new metadata.
Add metadata	par_metadata	NULL	csv file containing metadata to be added to the Seurat object
Cell-based DGE with all cells	par_run_cell_based_all_cells	Yes	Whether or not to compute cell-based DGE with all cells
Cell-based DGE with cell type groups	par_run_cell_based_cell_type_groups	Yes	Whether or not to compute cell-based DGE with cell type groups
Sample-based DGE with all cells	par_run_sample_based_all_cells	Yes	Whether or not to compute sample-based DGE with all cells
Sample-based DGE with cell type groups	par_run_sample_based_cell_type_groups	Yes	Whether or not to compute sample-based DGE with cell type groups
Cell-based DGE	par_statistical_method	MAST	Which statistical framework to use for computing cell-based DGE

Add metadata

To add metadata to the Seurat object, use the following command:

bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 8 \
--addmeta T

An example of a metadata csv file is available here.
The resulting output files are deposited into ~/working_directory/step8. For a description of the outputs see here.
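Because the merge column set by par_merge_meta must also exist in the csv file, a quick check before launching the step may help. The sketch below is illustrative only: the file contents, the DiseaseStatus and Batch_ID columns, and the temporary path are hypothetical, and the check is not part of scRNAbox.

```shell
# Hypothetical metadata csv; the first column matches the default par_merge_meta (orig.ident).
workdir=$(mktemp -d)
meta="$workdir/metadata.csv"
cat > "$meta" <<'EOF'
orig.ident,DiseaseStatus,Batch_ID
Control1,HealthyControl,B1
Control2,HealthyControl,B2
Case1,Disease,B1
Case2,Disease,B2
EOF

# Column that will be used for merging (assumed to equal par_merge_meta).
merge_col="orig.ident"

# Split the header on commas and look for an exact match of the merge column.
if head -n 1 "$meta" | tr ',' '\n' | grep -qxF "$merge_col"; then
  merge_col_found=yes
else
  merge_col_found=no
fi
echo "merge column '$merge_col' present in csv: $merge_col_found"
```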

Contrast matrices

Cell-based DGE using all cells

To perform cell-based DGE using all cells, users must fill in the step8_contrast_cell_based_all_cells.txt file located in ~/working_directory/job_info/parameters. The contrast matrix contains the following columns:

contrast_name: An informative name for the contrast. This will appear as the name of the output spreadsheet.
meta_data_variable: The metadata slot containing the Sample IDs defined in group1 and group2
group1: A list of sample IDs to be contrasted against the sample IDs listed in group2
group2: A list of sample IDs to be contrasted against the sample IDs listed in group1

Multiple contrasts can be defined in the same file, one per row. In addition, multiple samples can be listed under group1 and group2, separated by commas. For example:

contrast_name meta_data_variable group1 group2
Design1 orig.ident Control1,Control2,Control3 Case1,Case2,Case3
Design3 DiseaseStatus HealthyControl Disease
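A stray space inside a comma-separated group list would shift the columns of a contrast row. The following sketch checks that every row has as many fields as the header. It assumes whitespace-separated columns as displayed in the example above (the delimiter the pipeline actually expects is not stated here), and it writes to a temporary directory rather than to the real parameters folder.

```shell
# Write an example contrast file to a temporary location (hypothetical path).
workdir=$(mktemp -d)
contrast="$workdir/step8_contrast_cell_based_all_cells.txt"
cat > "$contrast" <<'EOF'
contrast_name meta_data_variable group1 group2
Design1 orig.ident Control1,Control2,Control3 Case1,Case2,Case3
Design3 DiseaseStatus HealthyControl Disease
EOF

# Collect the line numbers of rows whose field count differs from the header.
bad_rows=$(awk 'NR == 1 { n = NF; next } NF != n { print NR }' "$contrast")

if [ -z "$bad_rows" ]; then
  echo "all rows have the same number of columns as the header"
else
  echo "rows with an unexpected number of columns: $bad_rows"
fi
```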

Cell-based DGE using cell type groups

To perform cell-based DGE using cell type groups, users must fill in the step8_contrast_cell_based_celltype_groups.txt file located in ~/working_directory/job_info/parameters. The contrast matrix contains the following columns:

contrast_name: An informative name for the contrast. This will appear as the name of the output spreadsheet.
meta_data_celltype: The metadata slot containing cell type annotations
cell_type: The cell type used to compute DGE
meta_data_variable: The metadata slot containing the Sample IDs defined in group1 and group2
group1: A list of sample IDs to be contrasted against the sample IDs listed in group2
group2: A list of sample IDs to be contrasted against the sample IDs listed in group1

Multiple contrasts can be defined in the same file, one per row. In addition, multiple samples can be listed under group1 and group2, separated by commas. For example:

contrast_name meta_data_celltype cell_type meta_data_variable group1 group2
Design1 Annotation1 Neuron orig.ident Control1,Control2,Control3 Case1,Case2,Case3
Design2 Annotation2 Microglia DiseaseStatus HealthyControl Disease

Sample-based DGE using all cells

To perform sample-based DGE using all cells, users must fill in the step8_contrast_sample_based_all_cells.txt file located in ~/working_directory/job_info/parameters. The contrast matrix contains the following columns:

ContrastName: An informative name for the contrast. This will appear as the name of the output spreadsheet.
MainContrast: The metadata slot containing the two groups used for the main contrast (e.g. case and control)
SampleID: The metadata slot containing the sample IDs of the individual subjects (e.g. sample 1, sample 2, etc.)

ContrastName MainContrast SampleID
Design DiseaseStatus orig.ident

In addition, users may add further columns if they want to group their samples by other variables. For example, users may wish to group samples by experimental batch:

ContrastName MainContrast SampleID Batch
Design DiseaseStatus orig.ident Batch_ID

In this case, Batch is arbitrary, but Batch_ID must be a metadata slot.
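Since every value in a sample-based contrast row (other than the contrast name) must name a metadata slot, one way to catch typos is to compare those values against the metadata column names. The sketch below is a rough illustration only: it assumes the relevant slots are all listed in a hypothetical metadata csv, whereas in practice some slots (such as orig.ident) may exist only in the Seurat object.

```shell
# Hypothetical metadata csv and sample-based contrast file.
workdir=$(mktemp -d)
cat > "$workdir/metadata.csv" <<'EOF'
orig.ident,DiseaseStatus,Batch_ID
Control1,HealthyControl,B1
Case1,Disease,B2
EOF
cat > "$workdir/contrast.txt" <<'EOF'
ContrastName MainContrast SampleID Batch
Design DiseaseStatus orig.ident Batch_ID
EOF

# Fields 2..NF of each contrast row are expected to be metadata slot names.
missing=""
for slot in $(awk 'NR > 1 { for (i = 2; i <= NF; i++) print $i }' "$workdir/contrast.txt"); do
  # Exact, fixed-string match against the csv header columns.
  if ! head -n 1 "$workdir/metadata.csv" | tr ',' '\n' | grep -qxF "$slot"; then
    missing="$missing $slot"
  fi
done

if [ -z "$missing" ]; then
  echo "all referenced slots were found in the metadata header"
else
  echo "slots not found in the metadata header:$missing"
fi
```

With the contrast row changed to reference Batch_Id instead of Batch_ID, the same check would report Batch_Id as not found, since the match is case-sensitive.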

Sample-based DGE using cell type groups

To perform sample-based DGE using cell type groups, users must fill in the step8_contrast_sample_based_celltype_groups.txt file located in ~/working_directory/job_info/parameters. The contrast matrix contains the following columns:

ContrastName: An informative name for the contrast. This will appear as the name of the output spreadsheet.
CellType: The metadata slot containing cell type annotations
MainContrast: The metadata slot containing the two groups used for the main contrast (e.g. case and control)
SampleID: The metadata slot containing the sample IDs of the individual subjects (e.g. sample 1, sample 2, etc.)

ContrastName CellType MainContrast SampleID
Design Annotation1 DiseaseStatus orig.ident

In addition, users may add further columns if they want to group their samples by other variables. For example, users may wish to group samples by experimental batch:

ContrastName CellType MainContrast SampleID Batch
Design Annotation1 DiseaseStatus orig.ident Batch_ID

In this case, Batch is arbitrary, but Batch_ID must be a metadata slot.

Compute DGE

To compute DGE, use the following command:

bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 8 \
--rundge T

The resulting output files are deposited into ~/working_directory/step8. For a description of the outputs see here.