Step 5: Creation of a single Seurat object from all samples

In Step 5, individual Seurat objects from each sample are combined to enable the joint analysis across samples. Users can either merge or integrate their Seurat objects (Stuart et al. 2019). Alternatively, if experiments are limited to a single sequencing run, merging/integration can be bypassed; however, Step 5 must still be run because normalization, scaling, and linear dimensional reduction is then performed to inform the optimal parameters for clustering in Step 6.

Note: For more information regarding the difference between merging and integration, please see our pre-print manuscript here.


The following parameters are adjustable for Step 5 (~/working_directory/job_info/parameters/step5_par.txt):

Parameter Default Description
par_save_RNA Yes Whether or not to export an RNA expression matrix
par_save_metadata Yes Whether or not to export a metadata dataframe
par_seurat_object NULL If users already have a Seurat object(s), they may provide the path to a directory that contains an existing Seurat object(s) to initiate the pipeline at Step 5
par_one_seurat No Whether or not the experiment comprises of only one sequencing run. If this parameter is set to "Yes", set par_integrate_seurat and par_merge_seurat to "No".
par_integrate_seurat Yes Whether or not to integrate the samples. If "Yes", par_merge_seurat must be "No".
par_merge_seurat No Whether or not to merge the samples. If "Yes", par_integrate_seurat must be "No".
par_DefaultAssay RNA The assay to perform normalization, scaling, and linear dimensiona reduction on. For most use cases this will be RNA.
par_normalization.method LogNormalize Method to use for normalization
par_scale.factor 10000 Scale factor for scaling the data
par_selection.method vst Method for detecting top variable features
par_nfeatures 2500 Number of features to select as top variable features
par_FindIntegrationAnchors_dim 25 Which dimensions to use from the canonical correlation analysis (CCA) to specify the neighbor search space
par_RunPCA_npcs 30 Total Number of principal components to compute and store for principal component analysis (PCA)
par_RunUMAP_dims 25 Number of dimensions to use as input features for uniform manifold approximation and projection (UMAP)
par_RunUMAP_n.neighbors 45 Number of neighboring points used in local approximations of manifold structure
par_compute_jackstraw No Whether or not to perform JackStraw computation. This computation takes a long time.

To run Step 5, use the following command:

bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 5 

The resulting output files are deposited into ~/working_directory/step5. For a description of the outputs see here.