Step 2: Create Seurat object and remove ambient RNA
In Step 2, the CellRanger outputs generated in Step 1 (expression matrix, features, and barcodes) are used to create a Seurat object for each sample. The amount of ambient RNA is estimated, and gene expression profiles can optionally be corrected for ambient RNA contamination using SoupX (Young et al. 2020). The feature-barcode expression matrices from CellRanger (if ambient RNA is not removed) or from SoupX (if ambient RNA is removed) are then converted into Seurat objects. Quality control metrics are then computed to inform filtering in Step 3, including:
- the number of unique genes detected (nFeature_RNA);
- the total number of transcripts (nCount_RNA);
- the percentage of mitochondrial-encoded transcripts;
- the percentage of ribosomal gene transcripts.
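The four metrics listed above can be illustrated with a small, self-contained sketch. This is not the pipeline's own code (scRNAbox computes these metrics in R with Seurat); it is a plain-Python approximation on a made-up count matrix. The gene names, the counts, the function name `qc_metrics`, and the use of the `MT-` and `RPS`/`RPL` gene-name prefixes to identify mitochondrial and ribosomal genes are all assumptions made for illustration.

```python
# Toy count matrix: counts[gene][cell_barcode] -> raw transcript count.
# Gene names and values are invented for illustration only.
counts = {
    "MT-CO1": {"cellA": 5, "cellB": 0},
    "RPS6": {"cellA": 10, "cellB": 2},
    "GAPDH": {"cellA": 30, "cellB": 8},
    "SNCA": {"cellA": 5, "cellB": 0},
}


def qc_metrics(counts, cell):
    """Compute per-cell QC metrics from a gene-by-cell count dictionary."""
    per_gene = {gene: by_cell.get(cell, 0) for gene, by_cell in counts.items()}
    total = sum(per_gene.values())
    # Number of genes with at least one transcript detected in this cell.
    n_genes = sum(1 for c in per_gene.values() if c > 0)
    # Assumption: mitochondrial genes start with "MT-", ribosomal with "RPS"/"RPL".
    mito = sum(c for g, c in per_gene.items() if g.startswith("MT-"))
    ribo = sum(c for g, c in per_gene.items() if g.startswith(("RPS", "RPL")))
    return {
        "n_genes": n_genes,
        "n_counts": total,
        "percent_mt": 100.0 * mito / total if total else 0.0,
        "percent_ribo": 100.0 * ribo / total if total else 0.0,
    }


for cell in ("cellA", "cellB"):
    print(cell, qc_metrics(counts, cell))
```

In this toy example, cellA has 50 transcripts across 4 genes, of which 10% are mitochondrial and 20% are ribosomal. Thresholds on values like these are what Step 3 uses for filtering.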
Normalization and scaling are then performed on the individual Seurat objects prior to cell-cycle scoring.
The following parameters are adjustable for Step 2 (~/working_directory/job_info/parameters/step2_par.txt):
Parameter | Default | Description |
---|---|---|
par_save_RNA | Yes | Whether or not to export an RNA expression matrix |
par_save_metadata | Yes | Whether or not to export a metadata dataframe |
par_ambient_RNA | Yes | Whether or not to correct the feature-barcode expression matrices for ambient RNA contamination |
par_min.cells_L | 3 | Only retain genes expressed in at least this many cells |
par_normalization.method | LogNormalize | Method to use for normalization |
par_scale.factor | 10000 | Scale factor for cell-level normalization |
par_selection.method | vst | Method for choosing the top variable features |
par_nfeatures | 2500 | Number of features to select as top variable features |
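To make two of these parameters more concrete, the sketch below mimics the effect of par_min.cells_L and of LogNormalize with par_scale.factor. It is a plain-Python approximation, not the pipeline's code: in scRNAbox these operations are carried out by Seurat in R. The helper names `filter_genes` and `log_normalize` and the toy counts are hypothetical. The normalization follows Seurat's documented behaviour for LogNormalize: each count is divided by the cell's total counts, multiplied by the scale factor, and transformed with the natural-log function log1p.

```python
import math

# Toy count matrix: counts[gene][cell_barcode] -> raw transcript count.
# Values are invented for illustration only.
counts = {
    "geneX": {"c1": 1, "c2": 4, "c3": 2},
    "geneY": {"c1": 0, "c2": 6, "c3": 1},  # detected in only 2 cells
    "geneZ": {"c1": 3, "c2": 6, "c3": 2},
}


def filter_genes(counts, min_cells=3):
    """Keep genes detected (count > 0) in at least `min_cells` cells."""
    return {
        gene: by_cell
        for gene, by_cell in counts.items()
        if sum(1 for c in by_cell.values() if c > 0) >= min_cells
    }


def log_normalize(counts, scale_factor=10000):
    """Per cell: log1p(count / total_counts_in_cell * scale_factor)."""
    totals = {}
    for by_cell in counts.values():
        for cell, c in by_cell.items():
            totals[cell] = totals.get(cell, 0) + c
    return {
        gene: {
            cell: math.log1p(c / totals[cell] * scale_factor) if totals[cell] else 0.0
            for cell, c in by_cell.items()
        }
        for gene, by_cell in counts.items()
    }


kept = filter_genes(counts, min_cells=3)  # geneY is dropped
normalized = log_normalize(kept, scale_factor=10000)
print(sorted(kept))
print(normalized["geneX"]["c1"])  # log1p(1 / 4 * 10000)
```

Here the gene filter is applied before normalization, so each cell's total is computed from the retained genes only.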
To run Step 2, use the following command:
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 2
The resulting output files are deposited into ~/working_directory/step2. For a description of the outputs see here.