Step 4: Demultiplexing and doublet detection (HTO track)

In Step 4 of the HTO track, pooled samples are demultiplexed and doublets are identified from the expression matrices of the sample-specific barcodes. This is done with Seurat’s implementation (MULTIseqDemux) of the tag assignment algorithm described for MULTI-seq (McGinnis et al., 2019).

The following parameters are adjustable for Step 4 (~/working_directory/job_info/parameters/step4_par.txt):

Parameter	Default	Description
par_save_RNA	Yes	Whether or not to export an RNA expression matrix
par_save_metadata	Yes	Whether or not to export a metadata dataframe
par_seurat_object	NULL	If users already have one or more Seurat objects, they may provide the path to the directory containing them to start the pipeline at Step 4
par_normalization.method	CLR	Method for normalizing the HTO assay
par_scale.factor	1000	Scale factor for scaling the HTO assay
par_selection.method	vst	Method for selecting the most variable features in the HTO assay
par_nfeatures	5	Number of features to select as top variable features for the HTO assay. This value depends on the number of sample-specific barcodes used in the experiment
par_dims_umap	5	Number of dimensions to use as input features for uniform manifold approximation and projection (UMAP) of the HTO assay
par_n.neighbor	65	Number of neighboring points to use in local approximations of manifold structure
par_dimensionality_reduction	Yes	Whether or not to perform linear dimensionality reduction on the HTO assay
par_npcs_pca	30	Total number of principal components to compute and store for principal component analysis (PCA) of the HTO assay
par_dropDN	Yes	Whether or not to remove predicted doublets and negatives from downstream analyses
par_label_dropDN	Doublet, Negative	Labels used to identify doublet and negative droplets
par_quantile	0.9	The quantile to use for droplet classification using MULTIseqDemux
par_autoThresh	TRUE	Whether or not to perform automated threshold finding to define the best quantile for droplet classification using MULTIseqDemux
par_maxiter	5	Maximum number of iterations to use if autoThresh = TRUE
par_RidgePlot_ncol	3	Number of columns used to display RidgePlots, which visualize the enrichment of barcode labels across samples
par_old_antibody_label	NULL	If you wish to rename the barcode labels, first list the existing barcode labels in this parameter. Old antibody labels can be identified in the "_old_antibody_label_MULTIseqDemuxHTOcounts" file produced by running Step 4 with --msd T
par_new_antibody_label	NULL	If you wish to rename the barcode labels, list the new labels corresponding to the old labels listed in the parameter above
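
Before renaming labels, it can help to check how the two label parameters are currently set. The following is a minimal sketch that assumes only that the parameter names appear literally in step4_par.txt; show_label_params is a hypothetical helper and not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline): print the lines of a Step 4
# parameter file that mention the two label-renaming parameters.
show_label_params() {
    par_file="$1"
    # -n prefixes each matching line with its line number in the file
    grep -n -E 'par_(old|new)_antibody_label' "$par_file"
}
```

It could be called as show_label_params ~/working_directory/job_info/parameters/step4_par.txt. The pattern matches on the parameter names only, so it makes no assumption about how values are written in the file.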

To demultiplex the samples and identify doublets, the first step is to obtain the barcode labels used in the analysis by running the following command:

bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 4 \
--msd T

Note: This step produces the old_antibody_label_MULTIseqDemuxHTOcounts.csv file, which contains the existing HTO labels. To give the labels more descriptive names, set par_old_antibody_label and par_new_antibody_label in the execution parameters of this step.
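
Once the command above has finished, the label file can be located and printed with a short helper. This is a sketch under stated assumptions: show_hto_labels is a hypothetical function, and because the file's exact location inside the working directory is not given here, it searches the whole directory tree by file name.

```shell
# Hypothetical helper (not part of the pipeline): find and print the label
# file written by the --msd T run. The file's location inside the working
# directory is an assumption left open, so the whole tree is searched.
show_hto_labels() {
    workdir="$1"
    find "$workdir" -type f -name '*old_antibody_label_MULTIseqDemuxHTOcounts*' |
    while IFS= read -r f; do
        echo "== $f"   # show which file the labels come from
        cat "$f"
    done
}
```

For example, show_hto_labels "${SCRNABOX_PWD}" prints every matching file under the working directory. The labels it prints are the values to list in par_old_antibody_label.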

Next, demultiplex the samples and identify doublets by running the following command:

bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 4

The resulting output files are deposited into ~/working_directory/step4. For a description of the outputs see here.