Step 3: Quality control and generation of filtered data objects

In Step 3, low-quality cells are filtered out based on user-defined thresholds for:

  • the number of unique transcripts (genes; nFeature_RNA);
  • the total number of transcripts (nCount_RNA);
  • the percentage of mitochondrial-encoded transcripts;
  • the percentage of ribosomal gene transcripts.

In addition, users can remove a custom list of genes from the dataset or regress them out. Finally, normalization and scaling are performed on the filtered Seurat objects.
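
The threshold-based filtering described above can be illustrated with a minimal sketch. This is plain Python for illustration only, not the pipeline's own code (scRNAbox operates on Seurat objects in R). The `CellQC` class, its field names, the example barcodes, and the choice of inclusive bounds are all assumptions made for this example; the numeric bounds are the Step 3 defaults.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical field names, not scRNAbox's)."""
    barcode: str
    n_feature_rna: int    # number of unique genes detected
    n_count_rna: int      # total number of transcripts
    percent_mito: float   # % of transcripts from mitochondrial-encoded genes
    percent_ribo: float   # % of transcripts from ribosomal genes


# (lower, upper) bounds set to the Step 3 defaults.
# Treating the bounds as inclusive is an assumption of this sketch.
THRESHOLDS = {
    "n_feature_rna": (300, 10000),
    "n_count_rna": (300, 20000),
    "percent_mito": (0, 20),
    "percent_ribo": (0, 100),
}


def passes_qc(cell, thresholds=THRESHOLDS):
    """Return True if every metric lies within its [lower, upper] bounds."""
    return all(
        lower <= getattr(cell, metric) <= upper
        for metric, (lower, upper) in thresholds.items()
    )


# Hypothetical cells used only to exercise the filter.
cells = [
    CellQC("AAAC", 1200, 3500, 4.2, 11.0),  # within all bounds
    CellQC("AAAG", 150, 280, 2.0, 9.5),     # too few genes and transcripts
    CellQC("AAAT", 2500, 8000, 35.0, 8.0),  # mitochondrial percentage too high
]

kept = [cell.barcode for cell in cells if passes_qc(cell)]
print(kept)
```

A cell is retained only if it satisfies all four pairs of bounds at once, so tightening any single threshold can only reduce the number of retained cells.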


The following parameters are adjustable for Step 3 (~/working_directory/job_info/parameters/step3_par.txt):

| Parameter | Default | Description |
|---|---|---|
| par_save_RNA | Yes | Whether or not to export an RNA expression matrix |
| par_save_metadata | Yes | Whether or not to export a metadata dataframe |
| par_seurat_object | NULL | Path to a directory containing one or more existing Seurat objects; provide this to start the pipeline at Step 3 |
| par_nFeature_RNA_L | 300 | Minimum number of unique RNA transcripts a cell must express to be retained |
| par_nFeature_RNA_U | 10000 | Maximum number of unique RNA transcripts a cell may express to be retained |
| par_nCount_RNA_L | 300 | Minimum number of total RNA transcripts a cell must have to be retained |
| par_nCount_RNA_U | 20000 | Maximum number of total RNA transcripts a cell may have to be retained |
| par_mitochondria_percent_L | 0 | Minimum percentage of mitochondrial-encoded gene transcripts for a cell to be retained |
| par_mitochondria_percent_U | 20 | Maximum percentage of mitochondrial-encoded gene transcripts for a cell to be retained |
| par_ribosomal_percent_L | 0 | Minimum percentage of ribosomal gene transcripts for a cell to be retained |
| par_ribosomal_percent_U | 100 | Maximum percentage of ribosomal gene transcripts for a cell to be retained |
| par_remove_mitochondrial_genes | No | Whether or not to remove mitochondrial genes |
| par_remove_ribosomal_genes | No | Whether or not to remove ribosomal genes |
| par_remove_genes | NULL | List of gene identifiers to remove from the data |
| par_regress_cell_cycle_genes | No | Whether or not to regress cell cycle genes |
| par_regress_custom_genes | No | Whether or not to regress a custom list of genes |
| par_regress_genes | NULL | List of custom genes to regress |
| par_normalization.method | LogNormalize | Method to use for normalization |
| par_scale.factor | 10000 | Scale factor applied during cell-level normalization |
| par_selection.method | vst | Method for choosing the top variable features |
| par_nfeatures | 2500 | Number of features to select as top variable features |
| par_top | 10 | Number of most variable features to be reported in the csv file |
| par_npcs_pca | 30 | Total number of principal components to compute and store for principal component analysis (PCA) |
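
To make the roles of par_normalization.method and par_scale.factor more concrete, the sketch below shows log-normalization as Seurat documents it for LogNormalize: each gene count in a cell is divided by that cell's total count, multiplied by the scale factor, and transformed with the natural log of one plus the value. This is a plain-Python illustration under those assumptions, not the pipeline's R code; the `log_normalize` function and the gene names are hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10000):
    """Log-normalize one cell's gene counts.

    Each count is divided by the cell's total count, multiplied by
    scale_factor, and transformed with log1p (natural log of 1 + x).
    """
    total = sum(counts.values())
    if total == 0:
        # A cell with no counts has nothing to normalize.
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


# Hypothetical raw counts for a single cell.
cell_counts = {"GENE_A": 5, "GENE_B": 0, "GENE_C": 95}

normalized = log_normalize(cell_counts)
for gene, value in normalized.items():
    print(f"{gene}: {value:.3f}")
```

Because every cell is rescaled to the same total before the log transform, differences in sequencing depth between cells do not dominate the normalized values, and genes with zero counts remain at zero.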

To run Step 3, use the following command:

bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 3

The resulting output files are deposited into ~/working_directory/step3. For a description of the outputs see here.