Step 3: Quality control and generation of filtered data objects
In Step 3, low-quality cells are filtered out based on user-defined thresholds for:
- the number of unique transcripts (genes; nFeature_RNA);
- the total number of transcripts (nCount_RNA);
- the percentage of mitochondrial-encoded transcripts;
- the percentage of ribosomal gene transcripts.
In addition, users can remove or regress a custom gene list from the dataset. Finally, normalization and scaling are performed on the filtered Seurat objects.
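The filtering logic described above can be sketched as follows. This is a minimal illustration in Python, not the pipeline's own code, which operates on Seurat objects in R. The thresholds mirror the Step 3 defaults from the parameter table; the metric names `percent_mt` and `percent_ribo`, the example cells, and the use of inclusive bounds are assumptions made for the sketch.

```python
# Illustrative sketch only: the pipeline applies these filters to Seurat objects in R.
# Thresholds mirror the Step 3 defaults; metric names percent_mt / percent_ribo are hypothetical.
THRESHOLDS = {
    "nFeature_RNA": (300, 10000),   # par_nFeature_RNA_L / par_nFeature_RNA_U
    "nCount_RNA": (300, 20000),     # par_nCount_RNA_L / par_nCount_RNA_U
    "percent_mt": (0, 20),          # par_mitochondria_percent_L / _U
    "percent_ribo": (0, 100),       # par_ribosomal_percent_L / _U
}


def passes_qc(cell, thresholds=THRESHOLDS):
    """Return True if every QC metric lies within its bounds (inclusive bounds are an assumption)."""
    return all(lower <= cell[metric] <= upper
               for metric, (lower, upper) in thresholds.items())


# Made-up example cells, for illustration only.
cells = [
    {"barcode": "cell_A", "nFeature_RNA": 2500, "nCount_RNA": 8000,
     "percent_mt": 4.2, "percent_ribo": 12.0},
    {"barcode": "cell_B", "nFeature_RNA": 150, "nCount_RNA": 400,
     "percent_mt": 3.0, "percent_ribo": 10.0},   # too few unique transcripts
    {"barcode": "cell_C", "nFeature_RNA": 3000, "nCount_RNA": 9000,
     "percent_mt": 35.0, "percent_ribo": 8.0},   # mitochondrial percentage too high
]

retained = [cell["barcode"] for cell in cells if passes_qc(cell)]
print(retained)  # ['cell_A']
```

A cell is retained only if it satisfies all four threshold pairs at once, so tightening any single bound can only reduce the number of retained cells.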
The following parameters are adjustable for Step 3 (~/working_directory/job_info/parameters/step3_par.txt):
| Parameter | Default | Description | 
|---|---|---|
| par_save_RNA | Yes | Whether or not to export an RNA expression matrix | 
| par_save_metadata | Yes | Whether or not to export a metadata dataframe | 
| par_seurat_object | NULL | If users already have one or more Seurat objects, they may provide the path to a directory containing them to start the pipeline at Step 3 |
| par_nFeature_RNA_L | 300 | Lower threshold: only retain cells with at least this number of unique RNA transcripts |
| par_nFeature_RNA_U | 10000 | Upper threshold: only retain cells with at most this number of unique RNA transcripts |
| par_nCount_RNA_L | 300 | Lower threshold: only retain cells with at least this number of total RNA transcripts |
| par_nCount_RNA_U | 20000 | Upper threshold: only retain cells with at most this number of total RNA transcripts |
| par_mitochondria_percent_L | 0 | Lower threshold: only retain cells with at least this percentage of mitochondrial-encoded transcripts |
| par_mitochondria_percent_U | 20 | Upper threshold: only retain cells with at most this percentage of mitochondrial-encoded transcripts |
| par_ribosomal_percent_L | 0 | Lower threshold: only retain cells with at least this percentage of ribosomal gene transcripts |
| par_ribosomal_percent_U | 100 | Upper threshold: only retain cells with at most this percentage of ribosomal gene transcripts |
| par_remove_mitochondrial_genes | No | Whether or not to remove mitochondrial genes | 
| par_remove_ribosomal_genes | No | Whether or not to remove ribosomal genes | 
| par_remove_genes | NULL | If users want to remove specific genes from their data, they may define a list of gene identifiers | 
| par_regress_cell_cycle_genes | No | Whether or not to regress cell cycle genes | 
| par_regress_custom_genes | No | Whether or not to regress a custom list of genes | 
| par_regress_genes | NULL | List of custom genes to regress | 
| par_normalization.method | LogNormalize | Method to use for normalization | 
| par_scale.factor | 10000 | Scale factor used for cell-level normalization |
| par_selection.method | vst | Method for choosing the top variable features | 
| par_nfeatures | 2500 | Number of features to select as top variable features | 
| par_top | 10 | Number of most variable features to be reported in the csv file | 
| par_npcs_pca | 30 | Total number of principal components to compute and store for principal component analysis (PCA) |
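
To make the roles of par_normalization.method and par_scale.factor more concrete, the sketch below reproduces the LogNormalize calculation as described in the Seurat documentation: each gene count is divided by the cell's total count, multiplied by the scale factor, and transformed with log(1 + x). It assumes that these parameters are passed through to Seurat's normalization unchanged. The function name `log_normalize` and the example counts are hypothetical and serve only as illustration.

```python
import math


def log_normalize(counts, scale_factor=10000):
    """Normalize one cell's gene counts: count / total * scale_factor, then log(1 + x)."""
    total = sum(counts)
    if total == 0:
        # A cell with no counts has nothing to normalize; return zeros.
        return [0.0 for _ in counts]
    return [math.log1p(count / total * scale_factor) for count in counts]


# Made-up counts for four genes in a single cell (total = 1000).
cell_counts = [0, 5, 45, 950]
normalized = log_normalize(cell_counts)  # scale_factor defaults to 10000, as in par_scale.factor

for raw, value in zip(cell_counts, normalized):
    print(f"raw={raw:4d}  normalized={value:.4f}")
```

Because every cell is rescaled to the same total before the log transform, the normalized values are comparable across cells with different sequencing depths.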
To run Step 3, use the following command:
bash $SCRNABOX_HOME/launch_scrnabox.sh -d ${SCRNABOX_PWD} --steps 3
The resulting output files are deposited into ~/working_directory/step3. For a description of the outputs see here.