Outputs of each step in the scRNAbox pipeline
- Introduction
- Outputs
  - Step 1: FASTQ to gene expression matrix
  - Step 2: Create Seurat object and remove ambient RNA
  - Step 3: Quality control and generation of filtered data objects
  - Step 4: Doublet removal (standard track)
  - Step 4: Demultiplexing and doublet detection (HTO track)
  - Step 5: Creation of a single Seurat object from all samples
  - Step 6: Clustering
  - Step 7: Cluster annotation
  - Step 8: Differential gene expression
Introduction
The outputs of each step of the scRNAbox pipeline are written to a step-specific folder in the working directory. Each step folder contains three subfolders, shown here for Step 1:
working_directory
└── step1
    ├── figs1
    ├── info1
    └── objs1
- The figs/ folder (figs1/ in Step 1) contains figures;
- The info/ folder (info1/ in Step 1) contains text files and tables;
- The objs/ folder (objs1/ in Step 1) contains intermediate Seurat RDS objects.
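To check which outputs a step has produced, you can walk its folder. The following is a minimal sketch, assuming the layout above (a stepN folder containing figsN, infoN and objsN); the function name list_step_outputs and the sample name sampleA are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def list_step_outputs(working_dir: str | Path, step: int) -> dict[str, list[str]]:
    """List the files in the figs, info and objs subfolders of one step.

    Assumes the layout working_dir/stepN/{figsN, infoN, objsN}.
    A missing subfolder is reported as an empty list.
    """
    step_dir = Path(working_dir) / f"step{step}"
    outputs: dict[str, list[str]] = {}
    for kind in ("figs", "info", "objs"):
        sub = step_dir / f"{kind}{step}"
        if sub.is_dir():
            outputs[kind] = sorted(p.name for p in sub.iterdir() if p.is_file())
        else:
            outputs[kind] = []
    return outputs


# Usage example on a throwaway directory that mimics a Step 2 run
# for a hypothetical sample called "sampleA".
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("step2/figs2/vioplot_sampleA.pdf",
                "step2/info2/summary_sampleA.txt",
                "step2/objs2/sampleA.rds"):
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    demo = list_step_outputs(tmp, 2)

print(demo)
```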
Note: If you re-run an analytical step, the outputs from the previous run are automatically overwritten. To keep the outputs from a previous run, copy them to a separate directory before re-running the step. The one exception is annotation in Step 7: you can re-run the annotation step as many times as you wish, and each iteration adds a new metadata column to the existing Seurat object.
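One way to preserve a previous run is to copy the whole step folder to a timestamped location before re-running. A minimal sketch, assuming the stepN folder layout described above; backup_step and the backups folder are hypothetical names, not part of scRNAbox.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_step(working_dir, step, backup_root):
    """Copy working_dir/stepN to backup_root/stepN_<timestamp> and return the copy's path."""
    src = Path(working_dir) / f"step{step}"
    if not src.is_dir():
        raise FileNotFoundError(f"No outputs found for step {step}: {src}")
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = Path(backup_root) / f"step{step}_{stamp}"
    # copytree creates missing parent folders and raises FileExistsError if
    # dest already exists, so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest


# Usage example on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    work, backups = Path(tmp) / "working_directory", Path(tmp) / "backups"
    (work / "step3" / "info3").mkdir(parents=True)
    (work / "step3" / "info3" / "sessionInfo.txt").write_text("R session info")
    copy = backup_step(work, 3, backups)
    backed_up = (copy / "info3" / "sessionInfo.txt").read_text()
    copy_name = copy.name

print(copy_name, backed_up)
```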
Outputs
Step 1: FASTQ to gene expression matrix
Step 1 produces all of the standard outputs of the CellRanger count pipeline. For more information on these outputs, see the CellRanger documentation.
Step 2: Create Seurat object and remove ambient RNA
Output type | Name | Description |
---|---|---|
Figure | ambient_RNA_estimation_sample_name.pdf | Sample-specific probability density plot showing the ambient RNA estimation. For more information see here |
Figure | ambient_RNA_markers_sample_name.pdf | Sample-specific figure showing the marker genes used for ambient RNA estimation. For more information see [here](https://cran.r-project.org/web/packages/SoupX/vignettes/pbmcTutorial.html) |
Figure | vioplot_sample_name.pdf | Sample-specific violin plot showing the distribution of cells according to QC metrics |
Figure | zoomed_in_vioplot_sample_name.pdf | Sample-specific violin plot showing the distribution of cells according to QC metrics, zoomed in to show only the range from the minimum to the mean value. |
Figure | cell_cycle_dim_plot_sample_name.pdf | Sample-specific principal component analysis of cell-cycle genes, colour-coded by the cell cycle score of each cell. |
Info | sample_name_ambient_rna_summary.rds | Sample-specific summary of ambient RNA estimation by SoupX |
Info | sample_name_RNA.txt | Sample-specific sparse matrix of RNA assay |
Info | estimated_ambient_RNA_sample_name.txt | Sample-specific ambient RNA estimation. |
Info | MetaData_sample_name.txt | Sample-specific dataframe showing the Seurat object metadata |
Info | meta_info_sample_name.txt | Sample-specific text file showing the column names of the Seurat object metadata |
Info | summary_sample_name.txt | Sample-specific text file showing the summary of QC metrics (Minimum, 1st Quartile, Median, Mean, 3rd Quartile, Maximum) |
Info | sessionInfo.txt | Session information for the R session |
Data object | sample_name.rds | Sample-specific intermediate Seurat RDS object |
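The summary_sample_name.txt files report a six-number summary of each QC metric. A minimal Python sketch of the same statistics, assuming the values of one metric (for example the Seurat metadata column nFeature_RNA) have been exported as a list of numbers; the values below are hypothetical. Quartiles use linear interpolation, which corresponds to R's default (type 7) quantiles; R's summary() also rounds what it prints, so printed values may differ slightly.

```python
import statistics


def six_number_summary(values):
    """Min, 1st quartile, median, mean, 3rd quartile and max of one QC metric.

    method="inclusive" interpolates linearly between order statistics,
    matching R's default type 7 quantiles.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "Min": data[0],
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(data),
        "3rd Qu.": q3,
        "Max": data[-1],
    }


# Hypothetical nFeature_RNA values for five cells.
summary = six_number_summary([200, 500, 800, 1200, 3000])
print(summary)
```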
Step 3: Quality control and generation of filtered data objects
Output type | Name | Description |
---|---|---|
Figure | dimplot_pca_sample_name.pdf | Sample-specific PCA showing the first two PCs |
Figure | elbow_sample_name.pdf | Sample-specific elbow plot showing the percentage of variance explained by each PC |
Figure | filtered_QC_vioplot_sample_name.pdf | Sample-specific violin plot showing the distribution of cells according to QC metrics after filtering |
Figure | VariableFeaturePlot_sample_name.pdf | Sample-specific figure showing the most variably expressed genes |
Info | sample_name_RNA.txt | Sample-specific sparse matrix of RNA assay |
Info | MetaData_sample_name.txt | Sample-specific dataframe showing the Seurat object metadata |
Info | meta_info_sample_name.txt | Sample-specific text file showing the column names of the Seurat object metadata |
Info | most_variable_genes_sample_name.txt | Sample-specific text file listing the most variably expressed genes |
Info | summary_sample_name.txt | Sample-specific text file showing the summary of QC metrics (Minimum, 1st Quartile, Median, Mean, 3rd Quartile, Maximum) |
Info | sessionInfo.txt | Session information for the R session |
Data object | sample_name.rds | Sample-specific intermediate Seurat RDS object |
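The elbow plots are used to judge how many PCs carry most of the variance. As a minimal sketch, assuming you have the standard deviation of each computed PC (in R, Seurat's Stdev() on the "pca" reduction returns these), the percentage explained by each PC can be derived as below. Note that these percentages are relative to the computed PCs only, not to the total variance of the expression data; the input values are hypothetical.

```python
def percent_variance(pc_stdevs):
    """Percentage of the variance captured by the computed PCs that each PC explains."""
    variances = [sd ** 2 for sd in pc_stdevs]  # variance = standard deviation squared
    total = sum(variances)
    return [100 * v / total for v in variances]


# Hypothetical standard deviations for the first four PCs.
pct = percent_variance([4.0, 3.0, 2.0, 1.0])
print([round(p, 2) for p in pct])
```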
Step 4: Doublet removal (standard track)
Output type | Name | Description |
---|---|---|
Figure | sample_nameDF.classifications.pdf | Sample-specific UMAP plot showing droplet classifications (singlet or doublet) |
Figure | sample_doublet_summary.pdf | Sample-specific violin plot showing pANN values (DoubletFinder's proportion of artificial nearest neighbours) across singlet and doublet assignments, and sample-specific bar plot showing the number of singlets and doublets. |
Info | n_predicted_doublets_sample_name.txt | Sample-specific text file showing the number of identified doublets. |
Info | sample_name_RNA.txt | Sample-specific sparse matrix of RNA assay |
Info | MetaData_sample_name.txt | Sample-specific dataframe showing the Seurat object metadata |
Info | meta_info_sample_name.txt | Sample-specific text file showing the column names of the Seurat object metadata |
Info | sessionInfo.txt | Session information for the R session |
Data object | sample_name.rds | Sample-specific intermediate Seurat RDS object |
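The doublet summary and n_predicted_doublets outputs boil down to counting droplet classifications. A minimal sketch, assuming each cell barcode carries a DoubletFinder-style label of "Singlet" or "Doublet"; the barcodes and the function name summarize_doublets are hypothetical.

```python
from collections import Counter


def summarize_doublets(labels):
    """Count singlets and doublets and return the singlet barcodes to keep.

    labels: mapping of cell barcode -> "Singlet" or "Doublet" (assumed labels).
    """
    counts = Counter(labels.values())
    n_cells = sum(counts.values())
    singlets = [barcode for barcode, label in labels.items() if label == "Singlet"]
    summary = {
        "Singlet": counts.get("Singlet", 0),
        "Doublet": counts.get("Doublet", 0),
        "doublet_fraction": counts.get("Doublet", 0) / n_cells if n_cells else 0.0,
    }
    return summary, singlets


labels = {"AAAC-1": "Singlet", "AAAG-1": "Doublet", "AACT-1": "Singlet", "AAGT-1": "Singlet"}
summary, kept = summarize_doublets(labels)
print(summary, kept)
```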
Step 4: Demultiplexing and doublet detection (HTO track)
Output type | Name | Description |
---|---|---|
Figure | run_name_DotPlot_HTO_MSD.pdf | Run-specific dot plot showing the enrichment of barcode labels across cell assignments |
Figure | run_name_Heatmap_HTO_MSD.pdf | Run-specific heatmap showing the enrichment of barcode labels across cell assignments |
Figure | run_name_Ridgeplot_HTO_MSD.pdf | Run-specific ridge plot showing the enrichment of barcode labels across cell assignments |
Figure | run_name_HTO_dimplot_pca_.pdf | Run-specific PCA of antibody assay |
Figure | run_name_HTO_dimplot_umap_.pdf | Run-specific UMAP of antibody assay |
Figure | run_name_nCounts_RNA_MSD.pdf | Run-specific violin plot showing the number of unique transcripts across cell assignments |
Info | run_name.rds_old_antibody_label_MULTIseqDemuxHTOcounts.csv | Run-specific list of sample-specific barcode labels used in the experiment |
Info | run_name_MULTIseqDemuxHTOcounts.csv | Run-specific number of cells assigned to each sample |
Info | run_namefiltered_MULTIseqDemuxHTOcounts.csv | Run-specific number of cells assigned to each sample after removal of doublet and negative droplets |
Info | run_name_meta_info_.txt | Run-specific text file showing the column names of the Seurat object metadata |
Info | run_name_MetaData.txt | Run-specific dataframe showing the Seurat object metadata |
Info | run_name_RNA.txt | Run-specific sparse matrix of RNA assay |
Info | sessionInfo.txt | Session information for the R session |
Data object | run_name.rds | Run-specific intermediate Seurat RDS object |
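The two MULTIseqDemuxHTOcounts tables differ only in whether doublet and negative droplets are counted. A minimal sketch of that difference, assuming each droplet's call is either a sample barcode label or one of "Doublet" and "Negative" (the convention used by Seurat's MULTIseqDemux); the labels HTO1 and HTO2 and the function name are hypothetical.

```python
from collections import Counter

# Calls that do not correspond to a sample (assumed MULTIseqDemux convention).
NON_SAMPLE_LABELS = {"Doublet", "Negative"}


def assignment_counts(assignments):
    """Return per-label counts before and after removing doublet and negative droplets."""
    all_counts = Counter(assignments)
    filtered = Counter(
        {label: n for label, n in all_counts.items() if label not in NON_SAMPLE_LABELS}
    )
    return all_counts, filtered


calls = ["HTO1", "HTO1", "HTO2", "Doublet", "Negative", "HTO2", "HTO2"]
all_counts, filtered_counts = assignment_counts(calls)
print(dict(all_counts), dict(filtered_counts))
```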
Step 5: Creation of a single Seurat object from all samples
Output type | Name | Description |
---|---|---|
Figure | intergrated_DimPlot_pca.pdf | PCA showing the first two PCs of integrated assay, colour-coded by sample |
Figure | integrated_DimPlot_umap.pdf | UMAP of integrated assay, colour-coded by sample |
Figure | integrated_elbow.pdf | Elbow plot to visualize the percentage of variance explained by each PC for the integrated assay |
Figure | integrated_Jackstraw.pdf | Jackstraw plot to visualize the distribution of p-values for each PC for the integrated assay |
Figure | merge_DimPlot_pca.pdf | PCA showing the first two PCs of merged object, colour-coded by sample |
Figure | merge_DimPlot_umap.pdf | UMAP of merged object, colour-coded by sample |
Figure | merge_elbow.pdf | Elbow plot to visualize the percentage of variance explained by each PC for the merged object |
Figure | merge_Jackstraw.pdf | Jackstraw plot to visualize the distribution of p-values for each PC for the merged object |
Info | seu_int_RNA.txt | Sparse matrix of integrated assay |
Info | seu_int_MetaData.txt | Dataframe showing the integrated object metadata |
Info | integrated_meta_info_seu_step5.csv | Text file showing the column names of the integrated object metadata |
Info | seu_merge_RNA.txt | Sparse matrix of merged data object |
Info | seu_merge_MetaData.txt | Dataframe showing the merged object metadata |
Info | merge_meta_info_seu_step5.csv | Text file showing the column names of the merged object metadata |
Info | sessionInfo.txt | Session information for the R session |
Data object | seu_step5.rds | Integrated intermediate Seurat RDS object |
Step 6: Clustering
Output type | Name | Description |
---|---|---|
Figure | clustree_int.pdf | Clustree plot showing cluster stability across the user-defined clustering resolutions |
Figure | integrated_snn_res.pdf | UMAP plot at the user-defined clustering resolution |
Figure | ARI.pdf | Mean and standard deviation of the Adjusted Rand Index (ARI) between clustering pairs at a user-defined resolution |
Info | clustering_ARI.xlsx | Excel file showing the mean and standard deviation of the ARI between clustering pairs at a user-defined resolution |
Info | seu_RNA.txt | Sparse matrix of integrated assay |
Info | seu_MetaData.txt | Dataframe showing the Seurat object metadata |
Info | meta_info.csv | Text file showing the column names of the Seurat object metadata |
Info | sessionInfo.txt | Session information for the R session |
Data object | seu_step6.rds | Intermediate Seurat RDS object |
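The ARI outputs compare clusterings pairwise and report the mean and standard deviation. The following is a minimal, generic sketch of the standard (Hubert and Arabie) Adjusted Rand Index and of its mean and sample standard deviation over all pairs of clusterings; it illustrates the metric and is not the scRNAbox implementation. The function names and the example clusterings are hypothetical.

```python
import statistics
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two clusterings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same cells")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pairs of cells placed together in both clusterings (contingency table cells).
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of cells placed together within each clustering (row and column sums).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case: both clusterings put all cells in one cluster,
        # or both put every cell in its own cluster.
        return 1.0
    return (index - expected) / (max_index - expected)


def ari_mean_sd(clusterings):
    """Mean and sample standard deviation of the ARI over all pairs of clusterings."""
    scores = [adjusted_rand_index(a, b) for a, b in combinations(clusterings, 2)]
    sd = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return statistics.mean(scores), sd


# Three hypothetical clusterings of the same four cells.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
mean_ari, sd_ari = ari_mean_sd(runs)
print(mean_ari, sd_ari)
```

Because the ARI only looks at which cells are grouped together, the first two clusterings score 1.0 even though their cluster numbers are swapped.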
Step 7: Cluster annotation
Cluster annotation method | Output type | Name | Description |
---|---|---|---|
Tool 1: Cluster marker GSEA | Figure | heatmap.pdf | Heatmap showing the expression of the top marker genes across cells, stratified by cluster |
Tool 1: Cluster marker GSEA | Figure | plotenrich.pdf | Barplot showing the 20 most enriched terms for a particular cluster and cell type library |
Tool 2: Profile known markers | Figure | module_score_gene_set.pdf | UMAP plot showing the module score across cells for user-defined gene sets |
Tool 2: Profile known markers | Figure | select_feature_dot_plot.pdf | Dotplot showing the expression of user-defined features at the cluster level |
Tool 2: Profile known markers | Figure | select_feature_violin_plot.pdf | Violin plot showing the expression of user-defined features at the cluster level |
Tool 2: Profile known markers | Figure | select_feature_feature_plot.pdf | UMAP plots showing the expression of user-defined features at the cell level |
Tool 3: Reference-based annotations | Figure | UMAP_transferred_labels.pdf | UMAP plots showing the cluster annotations from the reference Seurat object projected onto the query Seurat object |
Annotate | Figure | clustering_name_cluster_annotation.pdf | UMAP plot of the integrated assay showing the cluster annotation |
Annotate | Figure | clustering_name_split_cluster_annotation.pdf | UMAP plot of the integrated assay showing the cluster annotation, split by sample |
General | Info | meta_info_seu_step7.txt | Text file showing the column names of the Seurat object metadata |
General | Info | sessionInfo_marker.txt | Session information for the R session |
Tool 1: Cluster marker GSEA | Info | cluster_just_genes.xlsx | Excel file showing the marker genes for each cluster |
Tool 1: Cluster marker GSEA | Info | cluster_whole.xlsx | Excel file showing the marker genes and corresponding summary statistics for each cluster |
Tool 1: Cluster marker GSEA | Info | ClusterMarkers.csv | csv file showing the marker genes and corresponding summary statistics for each cluster |
Tool 1: Cluster marker GSEA | Info | top_sel.csv | csv file showing the top n marker genes for each cluster; n is defined by the user in the execution parameters |
Tool 1: Cluster marker GSEA | Info | Er.genes.csv | Enrichment terms and the corresponding statistics for a particular cluster and cell type library |
Tool 1: Cluster marker GSEA | Data object | ClusterMarkers.rds | RDS object containing the marker genes for each cluster |
Tool 2: Profile known markers | Info | geneset_by_cluster.csv | Mean module score across clusters for each user-defined gene set |
Tool 3: Reference-based annotations | Info | reference_predictions_summary.xlsx | Number of cells from each cluster assigned a particular annotation based on the reference |
General | Data object | seu_step7.rds | Intermediate Seurat RDS object |
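The top_sel.csv output keeps only the top n marker genes per cluster. A minimal sketch of that selection from ClusterMarkers.csv, assuming the file has Seurat FindAllMarkers-style columns named cluster, gene and avg_log2FC, and ranking by avg_log2FC; the pipeline's actual ranking may differ (for example, it may filter on adjusted p-values first). The gene names and the function name top_markers are hypothetical.

```python
import csv
import io
from collections import defaultdict


def top_markers(rows, n):
    """Return the n genes with the largest avg_log2FC for each cluster."""
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row["cluster"]].append(row)
    return {
        cluster: [
            r["gene"]
            for r in sorted(markers, key=lambda r: float(r["avg_log2FC"]), reverse=True)[:n]
        ]
        for cluster, markers in by_cluster.items()
    }


# Hypothetical excerpt of a marker table.
example_csv = """gene,cluster,avg_log2FC,p_val_adj
GENE_A,0,2.5,1e-10
GENE_B,0,1.2,1e-5
GENE_C,0,3.1,1e-8
GENE_D,1,0.9,1e-3
GENE_E,1,1.8,1e-6
"""
top = top_markers(csv.DictReader(io.StringIO(example_csv)), n=2)
print(top)
```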
Step 8: Differential gene expression
DGE method | Output type | Name | Description |
---|---|---|---|
General | Figure | contrast_name.pdf | Volcano plot showing the differentially expressed genes for the user-defined contrast |
Cell-based DGE | Info | contrast_name_DEG.csv | Differentially expressed genes identified for the user-defined contrast |
Sample-based DGE | Info | Aggregated_expression_summary.csv | Aggregated counts across user-defined sample groups |
Sample-based DGE | Info | SampleBased_DGEsummarytable.csv | Number of differentially expressed genes in the positive and negative direction for each user-defined contrast |
Sample-based DGE | Info | DGE_contrast_name.csv | Differentially expressed genes identified for the user-defined contrast |
General | Info | seu_RNA.txt | Sparse matrix of integrated assay |
General | Info | seu_MetaData.txt | Dataframe showing the Seurat object metadata |
General | Info | meta_info.csv | Text file showing the column names of the Seurat object metadata |
General | Info | sessionInfo.txt | Session information for the R session |
General | Data object | seu_step8.rds | Intermediate Seurat RDS object |
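SampleBased_DGEsummarytable.csv reports, for each contrast, how many genes are differentially expressed in the positive and negative direction. A minimal sketch of such a count from a per-contrast results table, assuming columns named log2FoldChange and padj and a cutoff of 0.05 on the adjusted p-value; the column names, the cutoff and the gene names are assumptions, so adjust them to the header of your DGE_contrast_name.csv file.

```python
import csv
import io


def count_directional_degs(rows, lfc_col="log2FoldChange", padj_col="padj", padj_cutoff=0.05):
    """Count significant genes with positive and negative fold changes.

    Column names and the cutoff are assumptions. Rows with a missing
    adjusted p-value ("" or "NA") are skipped.
    """
    up = down = 0
    for row in rows:
        padj = row[padj_col]
        if padj in ("", "NA") or float(padj) >= padj_cutoff:
            continue
        lfc = float(row[lfc_col])
        if lfc > 0:
            up += 1
        elif lfc < 0:
            down += 1
    return {"positive": up, "negative": down}


# Hypothetical results for one contrast.
example = """gene,log2FoldChange,padj
G1,1.5,0.001
G2,-2.0,0.01
G3,0.8,0.2
G4,-0.5,NA
G5,2.2,0.04
"""
counts = count_directional_degs(csv.DictReader(io.StringIO(example)))
print(counts)
```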