Standard analysis track: Midbrain dataset
Contents
- Introduction
- Downloading the midbrain dataset
- Installation
- scRNAbox: Standard Analysis Track
- Analysis of differential gene expression outputs
- Publication-ready figures
- Job Configurations
Introduction
This guide illustrates the steps taken to analyze the midbrain dataset (Smajic et al. 2022) that was presented in our pre-print manuscript. This dataset describes single-nuclei transcriptomes from the post-mortem midbrains of five individuals with Parkinson’s disease (PD) and six controls sequenced separately.
Downlaoding the midbrain dataset
In you want to use the midbrain dataset to test the scRNAbox pipeline, please see here for detialed instructions on how to download the publicly available data.
Installation
scrnabox.slurm installation
To download the latest version of scrnabox.slurm
(v0.1.52.50) run the following command:
wget https://github.com/neurobioinfo/scrnabox/releases/download/v0.1.52.5/scrnabox.slurm.zip
unzip scrnabox.slurm.zip
For a description of the options for running scrnabox.slurm
run the following command:
bash /pathway/to/scrnabox.slurm/launch_scrnabox.sh -h
If the scrnabox.slurm
has been installed properly, the above command should return the folllowing:
scrnabox pipeline version 0.1.52.50
-------------------
mandatory arguments:
-d (--dir) = Working directory (where all the outputs will be printed) (give full path)
--steps = Specify what steps, e.g., 2 to run step 2. 2-6, run steps 2 through 6
optional arguments:
-h (--help) = See helps regarding the pipeline arguments.
--method = Select your preferred method: HTO and SCRNA for hashtag, and Standard scRNA, respectively.
--msd = You can get the hashtag labels by running the following code (HTO Step 4).
--markergsea = Identify marker genes for each cluster and run marker gene set enrichment analysis (GSEA) using EnrichR libraries (Step 7).
--knownmarkers = Profile the individual or aggregated expression of known marker genes.
--referenceannotation = Generate annotation predictions based on the annotations of a reference Seurat object (Step 7).
--annotate = Add clustering annotations to Seurat object metadata (Step 7).
--addmeta = Add metadata columns to the Seurat object (Step 8).
--rundge = Perform differential gene expression contrasts (Step 8).
--seulist = You can directly call the list of Seurat objects to the pipeline.
--rcheck = You can identify which libraries are not installed.
-------------------
For a comprehensive help, visit https://neurobioinfo.github.io/scrnabox/site/ for documentation.
CellRanger installation
For information regarding the installation of CellRanger, please visit the 10X Genomics documentation. If CellRanger is already installed on your HPC system, you may skip the CellRanger installation procedures.
For our analysis of the midbrain dataset we used the 10XGenomics GRCh38-3.0.0 reference genome and CellRanger v5.0.1. For more information regarding how to prepare reference genomes for the CellRanger counts pipeline, please see the 10X Genomics documentation.
R library preparation and R package installation
We must prepapre a common R library where we will load all of the required R packages. If the required R packages are already installed on your HPC system in a common R library, you may skip the following procedures.
We will first install R
. The analyses presented in our pre-print manuscript were conducted using v4.2.1.
# install R
module load r/4.2.1
Then, we will run the installation code, which creates a directory where the R packages will be loaded and will install the required R packages:
# Folder for R packages
R_PATH=~/path/to/R/library
mkdir -p $R_PATH
# Install package
Rscript ./scrnabox.slurm/soft/R/install_packages.R $R_PATH
scRNAbox pipeline
Step 0: Set up
Now that scrnabox.slurm
, CellRanger
, R
, and the required R packages have been installed, we can proceed to our analysis with the scRNAbox pipeline. We will create a pipeline
folder designated for the analysis and run Step 0, selecting the standard analysis track (--method SCRNA
), using the following code:
mkdir pipeline
cd pipeline
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 0 \
--method SCRNA
Next, we will navigate to the scrnabox_config.ini
file in ~/pipeline/job_info/configs
to define the HPC account holder (ACCOUNT), the path to the environmental module (MODULEUSE), the path to CellRanger from the environmental module directory (CELLRANGER), CellRanger version (CELLRANGER_VERSION), R version (R_VERSION), and the path to the R library (R_LIB_PATH):
cd ~/pipeline/job_info/configs
nano scrnabox_config.ini
ACCOUNT=account-name
MODULEUSE=/path/to/environmental/module
CELLRANGER=/path/to/cellranger/from/module/directory
CELLRANGER_VERSION=5.0.1
R_VERSION=4.2.1
R_LIB_PATH=/path/to/R/library
Next, we can check to see if all of the required R packages have been properly installed using the following command:
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 0 \
--rcheck
Step 1: FASTQ to gene expression matrix
In Step 1, we will run the CellRanger counts pipeline to generate feature-barcode expression matrices from the FASTQ files. While it is possible to manually prepare the library.csv
files for each of the 11 samples in the experiment prior to running Step 1, we are going to opt for automated library preparation. For more information regarding the manual prepartion of library.csv
files, please see the the CellRanger library preparation tutorial.
For our analysis of the midbrain dataset we set the following execution parameters for Step 1 (~/pipeline/job_info/parameters/step1_par.txt
):
Parameter | Value |
---|---|
par_automated_library_prep | Yes |
par_fastq_directory | /path/to/directory/containing/fastqs |
par_sample_names | PD1, PD2, PD3, PD4, PD5, CTRL1, CTRL2, CTRL3, CTRL4, CTRL5, CTRL6 |
par_rename_samples | Yes |
par_new_sample_names | Parkinson1, Parkinson2, Parkinson3, Parkinson4, Parkinson5, Control1, Control2, Control3, Control4, Control5, Control6 |
par_paired_end_seq | TRUE |
par_ref_dir_grch | ~/genome/10xGenomics/refdata-cellranger-GRCh38-3.0.0 |
par_r1_length | NULL (commented out) |
par_r2_length | NULL (commented out) |
par_mempercode | 30 |
par_include_introns | Yes |
par_no_target_umi_filter | NULL (commented out) |
par_expect_cells | NULL (commented out) |
par_force_cells | NULL (commented out) |
par_no_bam | NULL (commented out) |
Note: The parameters file for each step is located in ~/pipeline/job_info/parameters
. For a comprehensive description of the execution parameters for each step see here.
Given that CellRanger runs a user interface and is not submitted as a job, it is recommended to run Step 1 in a 'screen', which will allow the the task to keep running if the connection is broken. To run Step 1, use the following command:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
screen -S run_Midbrain_application_case
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 1
The outputs of the CellRanger counts pipeline are deposited into ~/pipeline/step1
.
Step 2: Create Seurat object and remove ambient RNA
In Step 2, we are going to use the CellRanger-generated feature-barcode matrices to produce unique Seurat objects for each of the 11 samples. In this step, we have the option to correct the expression matrices for ambient RNA contamination; however, because Smajic et al. did not perform this analytical procedure we will skip it. In addition, we will perform cell cycle scoring. Prior to performing cell cycle scoring, we must normalize and scale the counts matrix.
For our analysis of the midbrain dataset we set the following execution parameters for Step 2 (~/pipeline/job_info/parameters/step2_par.txt
):
Parameter | Value |
---|---|
par_save_RNA | Yes |
par_save_metadata | Yes |
par_ambient_RNA | No |
par_min.cells_L | 1 |
par_normalization.method | LogNormalize |
par_scale.factor | 10000 |
par_selection.method | vst |
par_nfeatures | 2500 |
We can run Step 2 using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 2
Step 2 produces the following outputs for each sample. As an example we show the outputs for sample Control1:
step2
├── figs2
│ ├── ambient_RNA_estimation_Control1.pdf
│ ├── ambient_RNA_markers_Control1.pdf
│ ├── cell_cyle_dim_plot_Control1.pdf
│ ├── vioplot_Control1.pdf
│ └── zoomed_in_vioplot_Control1.pdf
├── info2
│ ├── estimated_ambient_RNA_Control1.txt
│ ├── MetaData_Control1.txt
│ ├── meta_info_Control1.txt
│ ├── Control1_ambient_rna_summary.rds
│ ├── Control1_RNA.txt
│ ├── sessionInfo.txt
│ └── summary_Control1.txt
└── objs2
└── Control1.rds
Note: For a comprehensive description of the outputs for each analytical step, please see the Outputs section of the scRNAbox documentation.
Figure 1. Figures produced by Step 2 of the scRNAbox pipeline. The figures for the Control1 sample are shown as an example. A) Estimated ambient RNA contamination rate (Rho) by SoupX. Estimates of the RNA contamination rate using various estimators are visualized via a frequency distribution; the true contamination rate is assigned as the most frequent estimate (red line; 5.1%). B) Log10 ratios of observed counts to expected counts for marker genes from each cluster. Clusters are defined by the CellRanger counts pipeline. The red line displays the estimated RNA contamination rate if the estimation was based entirely on the corresponding gene. C) Principal component analysis (PCA) of Seurat S and G2M cell cycle reference genes. D) Violin plots showing the distribution of cells according to quality control metrics calculated in Step 2. E) Zoomed in violin plots, from the minimum to the mean, showing the distribution of cells according to quality control metrics calculated in Step 2.
Step 3: Quality control and filtering
In Step 3, we are going to perform quality control procedures and filter out low quality cells. We are going to filter out cells with < 1000 unique RNA transcripts, < 1500 total RNA transcripts, and > 10% mitochondria and ribosomal RNA. In addition, we are going to remove mitochondrial-encoded and ribosomal genes.
For our analysis of the midbrain dataset we set the following execution parameters for Step 3 (~/pipeline/job_info/parameters/step2_par.txt
):
Parameter | Value |
---|---|
par_save_RNA | Yes |
par_save_metadata | Yes |
par_seurat_object | NULL |
par_nFeature_RNA_L | 1000 |
par_nFeature_RNA_U | NULL |
par_nCount_RNA_L | 1500 |
par_nCount_RNA_U | NULL |
par_mitochondria_percent_L | 0 |
par_mitochondria_percent_U | 10 |
par_ribosomal_percent_L | 0 |
par_ribosomal_percent_U | 10 |
par_remove_mitochondrial_genes | Yes |
par_remove_ribosomal_genes | Yes |
par_remove_genes | NULL |
par_regress_cell_cycle_genes | No |
par_regress_custom_genes | No |
par_regress_genes | NULL |
par_normalization.method | LogNormalize |
par_scale.factor | 10000 |
par_selection.method | vst |
par_nfeatures | 2500 |
par_top | 10 |
par_npcs_pca | 30 |
We can run Step 3 using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 3
Step 3 produces the following outputs for each sample. As an example we show the outputs for sample Control1:
step3
├── figs3
│ ├── dimplot_pca_Control1.pdf
│ ├── elbowplot_Control1.pdf
│ ├── filtered_QC_vioplot_Control1.pdf
│ └── VariableFeaturePlot_Control1.pdf
├── info3
│ ├── MetaData_Control1.txt
│ ├── meta_info_Control1.txt
│ ├── most_variable_genes_Control1.txt
│ ├── Control1_RNA.txt
│ ├── sessionInfo.txt
│ └── summary_Control1.txt
└── objs3
└── Control1.rds
Figure 2. Figures produced by Step 3 of the scRNAbox pipeline. The figures for the Control1 sample are shown as an example. A) Violin plots showing the distribution of cells according to quality control metrics after filtering by user-defined thresholds. B) Scatter plot showing the top 2500 most variable features; the top 10 most variable features are labelled. C) Principal component analysis (PCA) visualizing the first two principal component (PC). D) Elbow plot to visualize the percentage of variance explained by each PC.
Step 4: Doublet removal
In this Step, we are going to identify doublets (erroneous barcodes produced by two or more cells) and remove them from downstream analyses using the DoubletFinder tool (McGinnis et al. 2019). For optimal performance, DoubletFinder requires the user to define the following parameters:
- The number of statistically significant PCs (par_PCs)
- The number of artificial doublets to generate (par_pN)
- The expected doublet rate for each sample (par_expected_doublet_rate)
The number of statistically significant PCs can be informed by the elbow plots produced in Step 3; in this case the top 25 PCs should maintain a robust compression of the data across samples. DoubletFinder is largely invariant to the number of artifical doublets generated, therefore we will maintain the default parameter of 0.25. The expected doublet rate can be informed by the number of recovered cells (~8% for ~10,000 cells recovered). The number of recovered cells can be informed by the barcodes.tsv.gz
file produced by the CellRanger counts pipeline, which is located in ~/pipeline/step1/<sample>/output_folder/outs/filtered_feature_bc_matrix
.
The expected doublet rates are approximations obtained from the 10X Genomics Next GEM Single Cell 3' v3.1 documentation, which was used by Smajic et al. for library preparation.
For our analysis of the midbrain dataset we set the following execution parameters for Step 4 (~/pipeline/job_info/parameters/step4_par.txt
):
Parameter | Value |
---|---|
par_save_RNA | Yes |
par_save_metadata | Yes |
par_seurat_object | NULL |
par_RunUMAP_dims | 25 |
par_RunUMAP_n.neighbors | 65 |
par_dropDN | Yes |
par_PCs | 25 |
par_pN | 0.25 |
par_sct | FALSE |
par_sample_names | Control1, Control2, Control3, Control4, Control5, Control6, Parkinson1, Parkinson2, Parkinson3, Parkinson4, Parkinson5 |
par_expected_doublet_rate | 0.042, 0.042, 0.023, 0.04, 0.027, 0.053, 0.023, 0.053, 0.034, 0.023, 0.05 |
We can run Step 4 using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 4
Step 4 produces the following outputs for each sample. As an example we show the outputs for sample Control1:
step4
├── figs4
│ ├── Control1_DF.classifications.pdf
│ └── Control1_doublet_summary.pdf
├── info4
│ ├── MetaData_Control1.txt
│ ├── meta_info_Control1.txt
│ ├── n_predicted_doublets_Control1.txt
│ ├── Control1_RNA.txt
│ └── sessionInfo.txt
└── objs4
└── Control1.rds
Figure 3. Figures produced by Step 4 of the scRNAbox pipeline standard track. The figures for the Control1 sample are shown as an example. A) Results of doublet detection analysis with DoubletFinder. Left: violin plot displaying the distribution of the proportion of artificial nearest neighbours (pANN) across singlets and doublets. Right: a bar plot of the number of predicted singlets and doublets. B) Uniform Manifold Approximation Projection (UMAP) plots coloured by droplet assignments (singlet or doublet).
Step 5: Integration
In Step 5, we are going to integrate the individual Seurat objects to enable joint analyses across all 11 samples. We will then perform normalization, scaling and linear dimensional reduction on the integrated assay. The outputs from Step 5 will inform the optimal clustering parameters for Step 6.
For our analysis of the midbrain dataset we set the following execution parameters for Step 5:
Parameter | Value |
---|---|
par_save_RNA | Yes |
par_save_metadata | Yes |
par_seurat_object | NULL |
par_one_seurat | No |
par_integrate_seurat | Yes |
par_merge_seurat | No |
par_DefaultAssay | RNA |
par_normalization.method | LogNormalize |
par_scale.factor | 10000 |
par_selection.method | vst |
par_nfeatures | 4000 |
par_FindIntegrationAnchors_dim | 25 |
par_RunPCA_npcs | 30 |
par_RunUMAP_dims | 25 |
par_RunUMAP_n.neighbors | 65 |
par_compute_jackstraw | yes |
We can run Step 5 using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 5
Step 5 produces the following outputs:
step5
├── figs5
│ ├── integrated_DimPlot_pca.pdf
│ ├── integrated_DimPlot_umap.pdf
│ ├── integrated_elbow.pdf
│ └── integrated_Jackstraw_plot.pdf
├── info5
│ ├── int_meta_info_seu_step5.csv
│ ├── sessionInfo.txt
│ ├── seu_int_RNA.txt
│ └── seu_int_MetaData.txt
└── objs5
└── seu_step5.rds
Figure 4. Figures produced by Step 5 of the scRNAbox pipeline standard track. A) Principal component analysis (PCA) visualizing the first two principal components (PC) of the integrated assay, colour coded by sample. B) Uniform Manifold Approximation and Projections (UMAP) plot of the integrated assay, colour coded by sample. C) Elbow plot to visualize the percentage of variance explained by each PC. D) Jackstraw plot to visualize the distribution of p-values for each PC.
Step 6: Clustering
In Step 6, we will cluster the cells to indentify groups with similar expression profiles. Based on the Elbow and Jackstraw plots produced in Step 5, we are going to use the first 25 PCs as input. We will cluster the cells at clustering resolutions ranging from 0.0 to 2.0. To determine the stability of clusters at each clustering resolution, we will run the Louvain clustering algorithm 25 times for each resolution, while shuffling the order of the nodes in the graph for each iteration. We will then compute the Adjusted Rand Index (ARI) between pairs of clusters at a given clustering resolution.
For our analysis of the midbrain dataset we set the following execution parameters for Step 6:
Parameter | Value |
---|---|
par_save_RNA | Yes |
par_save_metadata | Yes |
par_seurat_object | NULL |
par_skip_integration | No |
par_FindNeighbors_dims | 25 |
par_RunUMAP_dims | 25 |
par_FindNeighbors_k.param | 30 |
par_FindNeighbors_prune.SNN | 1/15 |
par_FindClusters_resolution | 0, 0.05, 0.2, 0.6, 0.8, 1.0, 1.2, 1.4, 1.5, 1.6, 1.8, 2.0 |
par_compute_ARI | Yes |
par_RI_reps | 25 |
We can run Step 6 using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 6
Step 6 produces the following outputs:
step6
├── ARI
│ ├── ARI.pdf
│ └── clustering_ARI.xlsx
├── figs6
│ ├── clustree_int.pdf
│ ├── integrated_snn_res.0.05.pdf
│ ├── integrated_snn_res.0.2.pdf
│ ├── integrated_snn_res.0.6.pdf
│ ├── integrated_snn_res.0.8.pdf
│ ├── integrated_snn_res.0.pdf
│ ├── integrated_snn_res.1.2.pdf
│ ├── integrated_snn_res.1.4.pdf
│ ├── integrated_snn_res.1.5.pdf
│ ├── integrated_snn_res.1.6.pdf
│ ├── integrated_snn_res.1.8.pdf
│ ├── integrated_snn_res.1.pdf
│ └── integrated_snn_res.2.pdf
├── info6
│ ├── meta_info.csv
│ ├── sessionInfo.txt
│ ├── seu_MetaData.txt
│ └── seu_RNA.txt
└── objs6
└── seu_step6.rds
Figure 5. Figures produced by Step 6 of the scRNAbox pipeline. A) Uniform Manifold Approximation and Projections (UMAP) plot, coloured according to the clusters identified at a resolution of 1.5. UMAP plots are produced for each user-defined clustering resolution. B) Mean (top panel) and standard deviation (sd; middle panel) of the Adjusted RNA Index (ARI) between clustering pairs at each user-defined clustering resolution. The bottom panel shows the number of clusters at each user-defined clustering resolution. C) ClustTree plot to visualize inter-cluster dynamics at varying cluster resolutions.
Step 7: Cluster annotation
In Step 7, we are going to annotate the clusters identified in Step 6 to identify the cell types comprising the midbrain dataset. scRNAbox provides three distinct tools for cluster annotations:
- Tool 1: Cluster marker and gene set enrichment analysis (GSEA)
- Tool 2: Expression profiling of known marker genes
- Tool 3: Reference-based annotation
Additionally, users can add cluster annotations to the Seurat object.
For comprehensive description of each cluster annotation tool, please see the Step 7:Cluster annotation section of the scRNAbox documentation or our pre-print manuscript.
Tool 1: Cluster marker GSEA
Using Tool 1, we are first going to identify differentially expressed marker genes for each cluster. We must define the number of marker genes for each cluster that we want scRNAbox to report and select a clustering resolution that we want to annotate. In this case we will report the top two marker genes for each cluster at a clustering resolution of 1.5.
To identify the marker genes for each cluster, we set the following execution parameters for Step 7 (step7_par.txt
):
Annotation tool | Parameter | Value |
---|---|---|
General | par_save_RNA | No |
General | par_save_metadata | No |
General | par_seurat_object | NULL |
General | par_level_cluster | integrated_snn_res.1.5 |
Tool 1 | par_run_find_marker | Yes |
Tool 1 | par_run_enrichR | No |
Tool 1 | par_top_sel | 2 |
Tool 1 | par_db | NULL |
We can identify the marker genes for each cluster using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 7 \
--markergsea T
The above code produces the following outputs:
step7
├── figs7
│ └── marker
│ └── heatmap.pdf
├── info7
│ ├── marker
│ │ ├── cluster_just_genes.xlsx
│ │ ├── ClusterMarkers.csv
│ │ ├── ClusterMarkers.rds
│ │ ├── cluster_whole.xlsx
│ │ └── top_sel.csv
│ └── sessionInfo_find_marker.txt
└── objs7
└── seu_step7.rds
Figure 6. Figure produced by Tool 1 (marker gene gene set enrichment analysis) of scRNAbox's cluster annotation module (Step 7). A heatmap is produced to visualize the expression of the top markers genes at the cell level, stratified by cluster. The top marker genes for each cluster identified at a clustering resolution of 1.5 is shown.
Now that we have identified the marker genes for each cluster, we will perform a gene set enrichment analysis (GSEA); we will test the differentially expressed genes (DEG) in the positive direction (Log2 fold-change > 0.00) for enrichment across gene set libraries that define cell types using the EnrichR tool. For this analysis, we will use the following libraries:
- Descartes_Cell_Types_and_Tissue_2021;
- CellMarker_Augmented_2021;
- Azimuth_Cell_Types_2021 cell type libraries.
To perform GSEA, we set the following execution parameters for Step 7 (step7_par.txt
):
Annotation tool | Parameter | Value |
---|---|---|
General | par_save_RNA | No |
General | par_save_metadata | No |
General | par_seurat_object | NULL |
General | par_level_cluster | integrated_snn_res.1.5 |
Tool 1 | par_run_find_marker | No |
Tool 1 | par_run_enrichR | Yes |
Tool 1 | par_top_sel | 2 |
Tool 1 | par_db | NULL |
If your HPC allows access to the internet, we can perform GSEA using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 7 \
--markergsea T
Note: If your HPC does not allow access to the internet, you will have to run GSEA locally. For more information, please see the Step 7:Cluster annotation section of the scRNAbox documentation.
The above code produces the following outputs. As an example, we are only showing the outputs for cluster 0:
step7
└── annot_enrich
└── cluster0
├── Er_genes_clust_0_Azimuth_Cell_Types_2021.csv
├── Er_genes_clust_0_CellMarker_Augmented_2021.csv
├── Er_genes_clust_0_Descartes_Cell_Types_and_Tissue_2021.csv
├── plotenrich_clust_0_1.pdf
├── plotenrich_clust_0_2.pdf
└── plotenrich_clust_0_3.pdf
Figure 7. Figures produced by Tool 1 (marker gene gene set enrichment analysis) of scRNAbox's cluster annotation module (Step 7). Upon identifying the top marker genes for each cluster at the user-defined clustering resolution, users can perfrom a gene set enrichment analysis (GSEA) using the EnrichR tool for all marker genes in the positive direction (Log2 fold-change > 0.00). Bar plots are produced to visualize the most enriched terms for each cluster. As an example, the top enrichment results across A) Azimuth Cell Types 2021, B) Descartes Cell Types and Tissue 2021, and C) CellMarker Augmented 2021 cell type libraries for cluster 5 are shown.
Note: It is possible to identify cluster-specific marker genes and perform GSEA at the same time by setting both par_run_find_marker= "Yes"
and par_run_enrichR= "Yes
.
Tool 2: Expression profiling of known marker genes
Using Tool 2, we are going to profile the midbrain dataset for known cell type marker genes. First, we are going to visualize the expression of the marker genes used by Smajic et al. to define their clusters:
Cell type | Gene |
---|---|
Oligodendrocytes | MOBP |
OPC | VCAN |
Astrocytes | AQP4 |
Ependymal | FOXJ1 |
Microglia | CD74 |
Endothelial | CLDN5 |
Pericytes | PDGFRB |
Excitatory neurons | SLC17A6 |
Inhibitory neurons | GAD2 |
GABAergic neurons | GAD2, GRIK1 |
Dopaminergic neurons (DaN) | TH |
Degenerating DaN | CADPS2 |
To visualize these features, we set the following execution parameters for Step 7 (step7_par.txt
):
Annotation tool | Parameter | Value |
---|---|---|
General | par_save_RNA | No |
General | par_save_metadata | No |
General | par_seurat_object | NULL |
General | par_level_cluster | integrated_snn_res.1.5 |
Tool 2 | par_run_module_score | No |
Tool 2 | par_run_visualize_markers | Yes |
Tool 2 | par_module_score | NULL |
Tool 2 | par_select_features_list | MOBP, VCAN, AQP4, FOXJ1, CD74, CLDN5, GFRB, SLC17A6, GAD2, GRIK1, TH, CADPS2 |
Tool 2 | par_select_features_csv | NULL |
We can visualize the expression of these features using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 7 \
--knownmarkers T
The above code produces the following outputs:
step7
├── figs7
└── visualize_features
├── list_dot_plot.pdf
├── list_feature_plot.pdf
└── list_violin_plot.pdf
Figure 8. Figures produced by Tool 2 (profiling the expression of known marker genes) of scRNAbox's cluster annotation module (Step 7). ScRNAbox allows users to visualize the individual and aggregated expression of known marker genes. For the midbrain dataset, we visualized the individual expression of the marker genes used by Smajic et al. to define their clusters. Feature plots visualizing the expression of each individual gene at the cell level are shown. Additionally, violin plots and dot plots are produced to visualize the expression of individual genes at the cluster level.
Next, we are going to use a csv file to defineline multiple lists of cell type marker genes:
da_neurons | NPC_orStemLike | mature_neurons | excitatory_neurons | inhbitory_neurons | astrocytes | oligodendrocytes | radial_glia | epithelial | microglia |
---|---|---|---|---|---|---|---|---|---|
TH | DCX | RBFOX3 | GRIA2 | GAD1 | GFAP | MBP | PTPRC | HES1 | IBA1 |
SLC6A3 | NEUROD1 | SYP | GRIA1 | GAD2 | S100B | MOG | AIF1 | HES5 | P2RY12 |
SLC18A2 | TBR1 | VAMP1 | GRIA4 | GAT1 | AQP4 | OLIG1 | ADGRE1 | SOX2 | P2RY13 |
SOX6 | PCNA | VAMP2 | GRIN1 | PVALB | APOE | OLIG2 | VIM | SOX10 | TREM119 |
NDNF | MKI67 | TUBB3 | GRIN2B | GABR2 | SOX9 | SOX10 | TNC | NES | GPR34 |
SNCG | SOX2 | SYT1 | GRIN2A | GABR1 | SLC1A3 | PTPRZ1 | CDH1 | SIGLECH | |
ALDH1A1 | NES | BSN | GRIN3A | GBRR1 | FAM107A | NOTCH1 | TREM2 | ||
CALB1 | PAX6 | HOMER1 | GRIN3 | GABRB2 | HOPX | CX3CR1 | |||
TACR2 | SLC17A6 | GRIP1 | GABRB1 | LIFR | FCRLS | ||||
SLC17A6 | CAMK2A | GABRB3 | ITGB5 | OLFML3 | |||||
SLC32A1 | GABRA6 | IL6ST | HEXB | ||||||
OTX2 | GABRA1 | SLC1A3 | TGFBR1 | ||||||
GRP | GABRA4 | SALL1 | |||||||
LPL | TRAK2 | MERTK | |||||||
CCK | PROS1 | ||||||||
VIP |
Note: This is csv file is available here.
We will use this csv file to visualize the individual expression of each gene and calculate the module score, which will allow us to visualize the aggregated expression of each gene set.
To visualize the unique and aggregated expression of these features, we set the following execution parameters for Step 7 (step7_par.txt
):
Annotation tool | Parameter | Value |
---|---|---|
General | par_save_RNA | No |
General | par_save_metadata | No |
General | par_seurat_object | NULL |
General | par_level_cluster | integrated_snn_res.1.5 |
Tool 2 | par_run_module_score | Yes |
Tool 2 | par_run_visualize_markers | Yes |
Tool 2 | par_module_score | ~/scrnabox/tutorial/Midbrain_dataset_example_files/module_score.csv |
Tool 2 | par_select_features_list | NULL |
Tool 2 | par_select_features_csv | ~/scrnabox/tutorial/Midbrain_dataset_example_files/module_score.csv |
We can visualize the individual and aggregated expression of these features using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 7 \
--knownmarkers T
The above code produces the following outputs:
step7
├── figs7
│ ├── module_score
│ │ ├── module_score_astrocytes.pdf
│ │ ├── module_score_da_neurons.pdf
│ │ ├── module_score_epithelial.pdf
│ │ ├── module_score_excitatory_neurons.pdf
│ │ ├── module_score_inhbitory_neurons.pdf
│ │ ├── module_score_mature_neurons.pdf
│ │ ├── module_score_microglia.pdf
│ │ ├── module_score_NPC_orStemLike.pdf
│ │ ├── module_score_oligodendrocytes.pdf
│ │ └── module_score_radial_glia.pdf
│ └── visualize_features
│ ├── astrocytes_dot_plot.pdf
│ ├── astrocytes_feature_plot.pdf
│ ├── astrocytes_violin_plot.pdf
│ ├── da_neurons_dot_plot.pdf
│ ├── da_neurons_feature_plot.pdf
│ ├── da_neurons_violin_plot.pdf
│ ├── epithelial_dot_plot.pdf
│ ├── epithelial_feature_plot.pdf
│ ├── epithelial_violin_plot.pdf
│ ├── excitatory_neurons_dot_plot.pdf
│ ├── excitatory_neurons_feature_plot.pdf
│ ├── excitatory_neurons_violin_plot.pdf
│ ├── inhbitory_neurons_dot_plot.pdf
│ ├── inhbitory_neurons_feature_plot.pdf
│ ├── inhbitory_neurons_violin_plot.pdf
│ ├── mature_neurons_dot_plot.pdf
│ ├── mature_neurons_feature_plot.pdf
│ ├── mature_neurons_violin_plot.pdf
│ ├── microglia_dot_plot.pdf
│ ├── microglia_feature_plot.pdf
│ ├── microglia_violin_plot.pdf
│ ├── NPC_orStemLike_dot_plot.pdf
│ ├── NPC_orStemLike_feature_plot.pdf
│ ├── NPC_orStemLike_violin_plot.pdf
│ ├── oligodendrocytes_dot_plot.pdf
│ ├── oligodendrocytes_feature_plot.pdf
│ ├── oligodendrocytes_violin_plot.pdf
│ ├── radial_glia_dot_plot.pdf
│ ├── radial_glia_feature_plot.pdf
│ └── radial_glia_violin_plot.pdf
└── info7
└── module_score
└── geneset_by_cluster.csv
Figure 9. Figures produced by Tool 2 (profiling the expression of known marker genes) of scRNAbox's cluster annotation module (Step 7). ScRNAbox allows users to visualize the individual and aggregated expression of known marker genes. For the midbrain dataset, we used gene sets of well known marker genes for cell types in the human midbrain. The microglia gene set is shown as an example. A) Dot plot visualizing the individual expression of marker genes in the microglia gene set. B) Uniform manifold approximation and projection (UMAP) plot visualizing the module score of the microglia gene set at the cell level. The module score allows for profiling the aggregated expression of multiple genes.
Tool 3: Reference-based annotation
Using Method 3, we are going to leverage the cell-type annotations from a reference Seurat object and generate annotation predictions for the query dataset. For reference-based annotation we must define the path to a our reference Seurat object and the column of the reference Seurat object's metadata that contains the cell type annotations. For the midbrain dataset, we are going to use a reference Seurat object from Kamath et al.. The reference Seurat object was obtain from the Broad Institute Single Cell Portal
To perform reference-based annotations, we set the following execution parameters for Step 7 (step7_par.txt
):
Annotation tool | Parameter | Value |
---|---|---|
General | par_save_RNA | Yes |
General | par_save_metadata | Yes |
General | par_seurat_object | NULL |
General | par_level_cluster | integrated_snn_res.1.5 |
Tool 3 | par_reference | /path/to/Kamath/reference/seurat/object |
Tool 3 | par_reference_name | Kamath |
Tool 3 | par_level_celltype | Cell_Type |
Tool 3 | par_FindTransferAnchors_dim | 10 |
Tool 3 | par_futureglobalsmaxSize | 60000 * 1024^2 |
We can perform reference-based annotations using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 7 \
--referenceannotation T
The above code produces the following outputs:
step7
├── figs7
│ └── reference_based_annotation
│ └── Kammath_UMAP_transferred_labels.pdf
├── info7
│ ├── reference_based_annotation
│ │ └── Kammath_prediction_summary.xlsx
│ ├── sessionInfo_annotate.txt
│ ├── seu_MetaData.txt
│ └── seu_RNA.txt
└── objs7
└── seu_step7.rds
Figure 10. Figure produced by Tool 3 (reference-based annotation) of scRNAbox's cluster annotation module (Step 7). ScRNAbox allows users to predict cell type annotations of their query dataset based on the predictions of a reference dataset. Left: Uniform Manifold Approximation and Projections (UMAP) plot of clustered and annotated reference Seurat object; single-nucleus RNA sequencing (scRRNAseq) of midbrain tissue produced by Kamath et al., coloured by cell type. Right: UMAP of the label transfer predictions for the midbrain dataset, coloured by predicted cell type. Abbreviations: astro, astrocytes; da, dopaminergic neurons; endo, endothelial cells; nonda, non-dopaminergic neurons; mg, microglia; olig, oligodendrocytes; opc, oligodendrocyte precursor cells.
Annotate
Now that we have used the three cluster annotation tools available in the scRNAbox pipeline, we are going to manually curate the results and add cluster annotations to the midbrain dataset.
To add annotations, we set the following execution parameters for Step 7 (step7_par.txt
):
Annotation tool | Parameter | Value |
---|---|---|
General | par_save_RNA | Yes |
General | par_save_metadata | Yes |
General | par_seurat_object | NULL |
General | par_level_cluster | integrated_snn_res.1.5 |
Annotate | par_annotate_resolution | integrated_snn_res.1.5 |
Annotate | par_name_metadata | clustering_annotation |
Annotate | par_annotate_labels | Oligodendrocyte, Oligodendrocyte, Excitatory, Oligodendrocyte, Oligodendrocyte, Microglia, OPC, Oligodendrocyte, Oligodendrocyte , Inhibitory, Atrocyte, Endothlial, Oligodendrocyte, Microglia, Oligodendrocyte, Astrocyte, Oligodendrocyte, Oligodendrocyte, Atrocyte, Oligodendrocyte, Pericyte, Ependymal, OPC, Pericyte, GABA, Inhibitory, Oligodendrocyte, Microglia, Astrocyte, Endothelial, DaN, CADPS2, Microglia |
We can add our annotations using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 7 \
--annotate T
The above code produces the following outputs:
step7
├── figs7
│ └── annotate
│ ├── clustering_annotation_cluster_annotation.pdf
│ └── clustering_annotation_split_cluster_annotation.pdf
├── info7
│ ├── meta_info_seu_step7.txt
│ ├── sessionInfo_annotate.txt
│ ├── seu_MetaData.txt
│ └── seu_RNA.txt
└── objs7
└── seu_step7.rds
Figure 11. Annotated Seurat object produced in Step 7 of the scRNAbox pipeline. Uniform Manifold Approximation and Projections (UMAP) plots showing the final cluster annotations for the midbrain dataset. Abbreviations: astro, astrocytes; DaN, dopaminergic neurons; endo, endothelial cells; excit, excitatory neurons; GABA, GABAergic neurons; inhib, inhibitory neurons; mg, microglia; olig, oligodendrocytes; opc, oligodendrocyte precursor cells; peri, pericytes.
Step 8: Differential gene expression analysis
In Step 8, we are going to perform differential gene expression (DGE) analysis between PD and controls. Prior to computing DGE, we are going to add additional metadata to the Seurat object.
Add metadata
We are going to add metadata to the Seurat object using a csv file containing the additional metadata:
Sample_ID | DiseaseStatus |
---|---|
Parkinson1 | PD |
Parkinson2 | PD |
Parkinson3 | PD |
Parkinson4 | PD |
Parkinson5 | PD |
Control1 | HC |
Control2 | HC |
Control3 | HC |
Control4 | HC |
Control5 | HC |
Control6 | HC |
Note: This is csv file is available here.
To add metadata, we set the following execution parameters for Step 8 (step8_par.txt
):
DGE method | Parameter | Value |
---|---|---|
General | par_save_RNA | Yes |
General | par_save_metadata | Yes |
General | par_seurat_object | NULL |
Add metadata | par_merge_meta | Sample_ID |
Add metadata | par_metadata | ~/scrnabox/tutorial/Midbrain_dataset_example_files/metadata.csv |
We can add metadata to the Seurat object using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 8 \
--addmeta T
The above code produces the following outputs:
step8
├── info8
│ ├── sessionInfo_add_metadata.txt
│ ├── seu_MetaData.txt
│ └── seu_RNA.txt
└── objs8
└── seu_step8.rds
Next, we can perform DGE analysis. ScRNAbox can compute DGE between two condition conditions (in our case PD vs control) using all cells or cell type groups. Furthermore, scRNAbox provides two frameworks for computing DGE:
1) Cell-based DGE
Cells are used as replicates and DGE is computed using the Seurat FindMarkers (Macosko et al. 2015). While FindMarkers supports several statistical frameworks to compute DGE, we set the default method in our implementation to MAST, which is tailored for scRNAseq data (Finak et al. 2015)
2) Sample-based DGE
Samples are used as replicates by applying a pseudo-bulk analysis. The Seurat AggregateExpression function is used to compute the sum of RNA counts for each gene across all cells from a particular sample (Cao et al. 2022). The DESq2 statistical framework is then used to compute DGE between conditions using the aggregated counts. (Love et al. 2014)
Cell-based DGE using all cells
To perform cell-based DGE using all cells we must begin by preparing the contrast matrix (step8_contrast_cell_based_all_cells.txt
):
contrast_name meta_data_variable group1 group2
HCvPD DiseaseStatus HC PD
To perform cell-based DGE using all cells, we set the following execution parameters for Step 8 (step8_par.txt
):
DGE method | Parameter | Value |
---|---|---|
General | par_save_RNA | No |
General | par_save_metadata | No |
General | par_seurat_object | NULL |
Cell-based DGE with all cells | par_run_cell_based_all_cells | Yes |
Cell-based DGE | par_statistical_method | MAST |
We can perform cell-based DGE using all cells using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 8 \
--rundge T
The above code produces the following outputs:
step8
└── Cell_based_all_cells
└── HCvPD
├── HCvPD_DEG.csv
└── HCvPD_volcano_plot.pdf
Cell-based DGE using cell type groups
To perform cell-based DGE using cell type groups we must begin by preparing the contrast matrix (step8_contrast_sample_based_celltype_groups.txt
):
contrast_name meta_data_celltype cell_type meta_data_variable group1 group2
OligocendrocytesPDvHC clustering_annotation Oligodendrocyte DiseaseStatus HC PD
MicrogliaPDvHC clustering_annotation Microglia DiseaseStatus HC PD
ExcitatoryPDvHC clustering_annotation Excitatory DiseaseStatus HC PD
InhibitoryPDvHC clustering_annotation Inhibitory DiseaseStatus HC PD
GABAPDvHC clustering_annotation GABA DiseaseStatus HC PD
OPCPDvHC clustering_annotation OPC DiseaseStatus HC PD
AstrocytesPDvHC clustering_annotation Atrocyte DiseaseStatus HC PD
EndothelialPDvHC clustering_annotation Endothelial DiseaseStatus HC PD
PericytesPDvHC clustering_annotation Pericyte DiseaseStatus HC PD
EpendymalPDvHC clustering_annotation Ependymal DiseaseStatus HC PD
DaNPDvHC clustering_annotation DaN DiseaseStatus HC PD
To perform cell-based DGE using all cells, we set the following execution parameters for Step 8 (step8_par.txt
):
DGE method | Parameter | Value |
---|---|---|
General | par_save_RNA | No |
General | par_save_metadata | No |
General | par_seurat_object | NULL |
Cell-based DGE with cell type groups | par_run_cell_based_cell_type_groups | Yes |
Cell-based DGE | par_statistical_method | MAST |
We can perform cell-based DGE using all cells using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 8 \
--rundge T
The above code produces the following outputs:
step8
└── Cell_based_celltype_groups
├── AstrocytesPDvHC
│ ├── AstrocytesPDvHC_DEG.csv
│ └── AstrocytesPDvHC_volcano_plot.pdf
├── DaNPDvHC
│ ├── DaNPDvHC_DEG.csv
│ └── DaNPDvHC_volcano_plot.pdf
├── EndothelialPDvHC
│ ├── EndothelialPDvHC_DEG.csv
│ └── EndothelialPDvHC_volcano_plot.pdf
├── EpendymalPDvHC
│ ├── EpendymalPDvHC_DEG.csv
│ └── EpendymalPDvHC_volcano_plot.pdf
├── ExcitatoryPDvHC
│ ├── ExcitatoryPDvHC_DEG.csv
│ └── ExcitatoryPDvHC_volcano_plot.pdf
├── GABAPDvHC
│ ├── GABAPDvHC_DEG.csv
│ └── GABAPDvHC_volcano_plot.pdf
├── InhibitoryPDvHC
│ ├── InhibitoryPDvHC_DEG.csv
│ └── InhibitoryPDvHC_volcano_plot.pdf
├── MicrogliaPDvHC
│ ├── MicrogliaPDvHC_DEG.csv
│ └── MicrogliaPDvHC_volcano_plot.pdf
├── OligocendrocytesPDvHC
│ ├── OligocendrocytesPDvHC_DEG.csv
│ └── OligocendrocytesPDvHC_volcano_plot.pdf
├── OPCPDvHC
│ ├── OPCPDvHC_DEG.csv
│ └── OPCPDvHC_volcano_plot.pdf
└── PericytesPDvHC
├── PericytesPDvHC_DEG.csv
└── PericytesPDvHC_volcano_plot.pdf
Sample-based DGE using all cells
To perform sample-based DGE using all cells we must begin by preparing the contrast matrix (step8_contrast_sample_based_all_cells.txt
):
ContrastName MainContrast SampleID
PDvControl DiseaseStatus Sample_ID
To perform sample-based DGE using all cells, we set the following execution parameters for Step 8 (step8_par.txt
):
DGE method | Parameter | Value |
---|---|---|
General | par_save_RNA | No |
General | par_save_metadata | No |
General | par_seurat_object | NULL |
Sample-based DGE with all cells | par_run_sample_based_all_cells | Yes |
We can perform sample-based DGE using all cells using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 8 \
--rundge T
The above code produces the following outputs:
step8
└── Sample_based_all_cells
└── PDvControl
├── Aggregated_expression_summary.csv
├── DGE_AllCellsMainContrast HC vs PD.csv
├── DGE_AllCellsMainContrast HC vs PD.pdf
└── SampleBased_DGEsummarytable.csv
Sample-based DGE using cell type groups
To perform sample-based DGE using cell type groups we must begin by preparing the contrast matrix (step8_contrast_sample_based_celltype_groups.txt
):
ContrastName CellType MainContrast SampleID
PDvControlbulk clustering_annotation DiseaseStatus Sample_ID
To perform sample-based DGE using cell type groups, we set the following execution parameters for Step 8 (step8_par.txt
):
DGE method | Parameter | Value |
---|---|---|
General | par_save_RNA | No |
General | par_save_metadata | No |
General | par_seurat_object | NULL |
Sample-based DGE with cell type groups | par_run_sample_based_cell_type_groups | Yes |
We can perform sample-based DGE using cell type groups using the following code:
export SCRNABOX_HOME=~/scrnabox/scrnabox.slurm
export SCRNABOX_PWD=~/pipeline
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 8 \
--rundge T
The above code produces the following outputs:
step8
└── sample_based_celltype_groups
└── PDvControlbulk
├── Aggregated_expression_summary.csv
├── figs
│ ├── DGE_AstrocytesMainContrast HC vs PD.pdf
│ ├── DGE_DaNMainContrast HC vs PD.pdf
│ ├── DGE_EndothelialMainContrast HC vs PD.pdf
│ ├── DGE_EpendymalMainContrast HC vs PD.pdf
│ ├── DGE_ExcitatoryMainContrast HC vs PD.pdf
│ ├── DGE_GABAMainContrast HC vs PD.pdf
│ ├── DGE_InhibitoryMainContrast HC vs PD.pdf
│ ├── DGE_MicrogliaMainContrast HC vs PD.pdf
│ ├── DGE_OligodendrocytesMainContrast HC vs PD.pdf
│ ├── DGE_OPCMainContrast HC vs PD.pdf
│ └── DGE_PericytesMainContrast HC vs PD.pdf
├── info
│ ├── DGE_AstrocytesMainContrast HC vs PD.csv
│ ├── DGE_DaNMainContrast HC vs PD.csv
│ ├── DGE_EndothelialMainContrast HC vs PD.csv
│ ├── DGE_EpendymalMainContrast HC vs PD.csv
│ ├── DGE_ExcitatoryMainContrast HC vs PD.csv
│ ├── DGE_GABAMainContrast HC vs PD.csv
│ ├── DGE_InhibitoryMainContrast HC vs PD.csv
│ ├── DGE_MicrogliaMainContrast HC vs PD.csv
│ ├── DGE_OligodendrocytesMainContrast HC vs PD.csv
│ ├── DGE_OPCMainContrast HC vs PD.csv
│ └── DGE_PericytesMainContrast HC vs PD.csv
└── SampleBased_DGEsummarytable.csv
Figure 12. Figures produced by Step 8 of the scRNAbox pipeline. Step 8 of the scRNAbox pipeline allows users to compute DGE between groups by two methods: 1) using cells as replicates with MAST (cell-based) and 2) using samples as replicates with DESeq2 (sample-based). Volcano plots are produced for each user-defined DGE contrast. For the midbrain dataset, we leveraged both methods to compute DGE between Parkinson's disease (PD) subjects and controls across A) all cells, B) astrocytes, C) dopaminergic neurons (DaN), D) endothelial cells, E) Ependymal cells, F) excitatory neurons, G) GABAergic neurons, H) Inhibitory neurons, I) microglia, J) oligodendrocytes, K) oligodendrocyte precursor cells (OPC), and H) Pericytes.
Analysis of differential gene expression outputs
For the code used to perform downstream analysis of the differentially expressed genes presented in our pre-print manuscript see here.
Publication-ready figures
The code used to produce the publication-ready figures used in our pre-print manuscript is avaliable here here.
Job Configurations
The following job configurations were used for our analysis of the midbrain dataset. Job Configurations can be modified for each analytical step in the scrnabox_config.ini
file in ~/pipeline/job_info/configs
Step | THREADS_ARRAY | MEM_ARRAY | WALLTIME_ARRAY |
---|---|---|---|
Step2 | 4 | 40g | 00-05:00 |
Step3 | 4 | 40g | 00-05:00 |
Step4 | 4 | 40g | 00-05:00 |
Step5 | 4 | 100g | 00-05:00 |
Step6 | 4 | 100g | 00-05:00 |
Step7 MarkerGSEA | 4 | 40g | 00-05:00 |
Step7 KnownMarkers | 4 | 40g | 00-02:00 |
Step7 ReferenceAnnotation | 4 | 200g | 00-12:00 |
Step7 Annotate | 4 | 40g | 00-01:00 |
Step8 AddMeta | 4 | 40g | 00-02:00 |
Step8 RunDGE | 4 | 40g | 00-12:00 |