Step 4: Doublet removal (standard track)

In Step 4 of the standard analysis track, doublets (barcodes that represent two or more cells sequenced together) are identified with the DoubletFinder tool (McGinnis et al. 2019) and, optionally, removed from downstream analyses.


The following parameters are adjustable for Step 4 of the standard track (~/working_directory/job_info/parameters/step4_par.txt):

Parameter Default Description
par_save_RNA Yes Whether or not to export an RNA expression matrix
par_save_metadata Yes Whether or not to export a metadata dataframe
par_seurat_object NULL If users already have one or more Seurat objects, they may provide the path to the directory that contains them to initiate the pipeline at Step 4
par_RunUMAP_dims 25 Number of dimensions to use as input features for uniform manifold approximation and projection (UMAP)
par_RunUMAP_n.neighbors 45 Number of neighboring points used in local approximations of manifold structure
par_dropDN Yes Whether or not to remove predicted doublets from downstream analyses
par_PCs 25 The number of statistically significant principal components. Can be informed by the elbow plot produced in Step 3
par_pN 0.25 The number of artificial doublets to generate, expressed as a proportion of the merged real-artificial data. DoubletFinder is largely invariant to this parameter. We suggest keeping 0.25
par_sct FALSE Logical representing whether SCTransform was used during original Seurat object pre-processing
par_sample_names NULL A list of sample names for each sample in the experiment, corresponding to the expected doublet rates listed in the parameter below. Sample names should be the same as those used to produce the samples_info folder during the setup procedures.
par_expected_doublet_rate NULL A vector of expected doublet rates for each sample (e.g. for a 5% expected doublet rate, write 0.05). The expected doublet rates for each sample should be listed in the same order as the sample names in the above parameter. Make sure to have as many expected doublet rates listed as you have samples.
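
Because par_sample_names and par_expected_doublet_rate must have the same number of entries in the same order, a quick check before launching the step can catch a mismatch early. The following is a minimal sketch, not part of the pipeline: the helper names count_entries and check_step4_lengths are hypothetical, and the sketch assumes that each parameter is written on one line as an R-style vector, e.g. par_sample_names= c("Control1","Parkinson1"). If your step4_par.txt uses a different layout, adjust the patterns accordingly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not provided by the pipeline).
# ASSUMPTION: parameters are written one per line as R-style vectors,
# e.g.  par_expected_doublet_rate= c(0.05,0.05)

# Count the comma-separated entries inside c(...) on a single line.
count_entries() {
  local line="$1" inner
  # Keep only the text between "c(" and the closing ")".
  inner=$(printf '%s\n' "$line" | sed -n 's/.*c(\(.*\)).*/\1/p')
  # No vector on the line (e.g. the default NULL) counts as zero entries.
  if [ -z "$inner" ]; then
    echo 0
    return
  fi
  printf '%s\n' "$inner" | awk -F',' '{ print NF }'
}

# Compare the number of sample names with the number of doublet rates.
check_step4_lengths() {
  local par_file="$1" names rates
  names=$(grep -m1 '^par_sample_names' "$par_file")
  rates=$(grep -m1 '^par_expected_doublet_rate' "$par_file")
  if [ "$(count_entries "$names")" -eq "$(count_entries "$rates")" ]; then
    echo "OK"
  else
    echo "MISMATCH"
    return 1
  fi
}

# Usage (path taken from this page):
# check_step4_lengths ~/working_directory/job_info/parameters/step4_par.txt
```

The check only compares counts; it cannot tell whether the rates are listed in the same order as the sample names, so the order still needs to be verified by eye.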

Note: For more information regarding the expected doublet rates, please see the 10X Genomics documentation.
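
As a rough illustration of how a value for par_expected_doublet_rate might be derived, the sketch below applies a commonly quoted rule of thumb of about 0.8% doublets per 1,000 recovered cells. This figure is an assumption used for illustration only; the actual rate depends on the kit and chemistry, so please confirm it against the 10X Genomics documentation for your experiment. The function name estimate_doublet_rate is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by the pipeline).
# ASSUMPTION: ~0.8% doublets per 1,000 recovered cells; check the
# 10X Genomics documentation for the rate that applies to your chemistry.
estimate_doublet_rate() {
  # $1 = number of recovered cells in the sample
  awk -v cells="$1" 'BEGIN { printf "%.3f\n", cells / 1000 * 0.008 }'
}

estimate_doublet_rate 5000   # prints 0.040
```

Under this assumption, a sample with roughly 5,000 recovered cells would be entered as 0.04 in par_expected_doublet_rate.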


To run Step 4, use the following command:

bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 4 

The resulting output files are deposited into ~/working_directory/step4. For a description of the outputs see here.