Step 3: Quality control and generation of filtered data objects
In Step 3, low quality cells are filtered based on the user-defined thresholds for:
- the number of unique transcripts (genes; nFeaturesRNA);
- the total number of transcripts (nCountsRNA);
- the percentage of mitochondrial-encoded transcripts;
- the percentage of ribosome gene transcripts.
In addition, users can remove or regress a custom gene list from the dataset. Finally, normalization and scaling is performed on the filtered Seurat objects.
The following parameters are adjustable for Step 3 (~/working_directory/job_info/parameters/step3_par.txt
):
Parameter | Default | Description |
---|---|---|
par_save_RNA | Yes | Whether or not to export an RNA expression matrix |
par_save_metadata | Yes | Whether or not to export a metadata dataframe |
par_seurat_object | NULL | If users already have a Seurat object(s), they may provide the path to a directory that contains an existing Seurat object(s) to initiate the pipeline at Step 3 |
par_nFeature_RNA_L | 300 | Only retain cells expressing a minimum number of unique RNA transcripts |
par_nFeature_RNA_U | 10000 | Only retain cells expressing a maximum number of unique RNA transcripts |
par_nCount_RNA_L | 300 | Only retain cells with a minimum number of total RNA transcripts |
par_nCount_RNA_U | 20000 | Only retain cells with a maximum number of total RNA transcripts |
par_mitochondria_percent_L | 0 | Only retain cells with a minimum percentage of mitochondrial-encoded genes |
par_mitochondria_percent_U | 20 | Only retain cells with a maximum percentage of mitochondrial-encoded genes |
par_ribosomal_percent_L | 0 | Only retain cells with a minimum percentage of ribosome genes |
par_ribosomal_percent_U | 100 | Only retain cells with a maximum percentage of ribosome genes |
par_remove_mitochondrial_genes | No | Whether or not to remove mitochondrial genes |
par_remove_ribosomal_genes | No | Whether or not to remove ribosomal genes |
par_remove_genes | NULL | If users want to remove specific genes from their data, they may define a list of gene identifiers |
par_regress_cell_cycle_genes | No | Whether or not to regress cell cycle genes |
par_regress_custom_genes | No | Whether or not to regress a custom list of genes |
par_regress_genes | NULL | List of custom genes to regress |
par_normalization.method | LogNormalize | Method to use for normalization |
par_scale.factor | 10000 | Scale factor for scaling the data |
par_selection.method | vst | Method for choosing the top variable features |
par_nfeatures | 2500 | Number of features to select as top variable features |
par_top | 10 | Number of most variable features to be reported in the csv file |
par_npcs_pca | 30 | Total Number of principal components to compute and store for principal component analysis (PCA) |
To run Step 3, use the following command:
bash $SCRNABOX_HOME/launch_scrnabox.sh \
-d ${SCRNABOX_PWD} \
--steps 3
The resulting output files are deposited into ~/working_directory/step3
. For a description of the outputs see here.