Downloading the midbrain dataset
The single-nucleus RNA sequencing (snRNAseq) data produced by Smajic et al. is publicly available in the Gene Expression Omnibus (GEO) under accession code GSE157783. To download the data, we first need SRAtoolkit; install it if it is not already available on your High-Performance Computing (HPC) system. The following code creates a directory for the raw data, then downloads and unpacks SRAtoolkit:
mkdir data_download
cd data_download
wget --output-document sratoolkit.tar.gz https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
tar -vxzf sratoolkit.tar.gz
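# The folder name depends on the SRAtoolkit version that was downloaded; adjust 3.0.5 if yours differs
export PATH=$PATH:$PWD/sratoolkit.3.0.5-ubuntu64/bin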
For more information regarding the SRAtoolkit, please visit the documentation.
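Because the download link points to the current release, the extracted folder may carry a version other than 3.0.5. As a minimal sketch, the snippet below looks up the extracted folder by name pattern instead of hard-coding the version, then checks that prefetch and fasterq-dump can be found. The helper name find_sra_bin is hypothetical and not part of SRAtoolkit, and the snippet assumes the archive was unpacked in the current directory.

```shell
# Print the bin folder of the sratoolkit.* directory found directly inside $1.
# Returns non-zero if no such directory (with a bin folder) exists.
find_sra_bin() {
    sra_dir=$(find "$1" -maxdepth 1 -type d -name 'sratoolkit.*' | sort | tail -n 1)
    if [ -n "$sra_dir" ] && [ -d "$sra_dir/bin" ]; then
        printf '%s\n' "$sra_dir/bin"
    else
        return 1
    fi
}

# Add the toolkit to PATH only if it was found in the current directory.
if sra_bin=$(find_sra_bin "$PWD"); then
    export PATH="$PATH:$sra_bin"
fi

# Confirm that the two tools used later in this tutorial are reachable.
for tool in prefetch fasterq-dump; do
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool was not found on PATH" >&2
    fi
done
```

If either tool is reported as missing, revisit the PATH setting before starting the download.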
The Sequence Read Archive (SRA) run identifiers for each of the 11 samples in the midbrain dataset are:
Sample | SRR |
---|---|
PD1 | SRR12621862 |
PD2 | SRR12621863 |
PD3 | SRR12621864 |
PD4 | SRR12621865 |
PD5 | SRR12621866 |
CTRL1 | SRR12621867 |
CTRL2 | SRR12621868 |
CTRL3 | SRR12621869 |
CTRL4 | SRR12621870 |
CTRL5 | SRR12621871 |
CTRL6 | SRR12621872 |
Note: If you simply want to test scRNAbox, consider including only a subset of samples in the test run, as processing all 11 samples takes substantially longer. In this case, we suggest including at least three PD samples and three controls to facilitate differential gene expression (DGE) contrasts in Step 8.
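As an illustration of such a test subset, the sketch below takes the sample-to-run mapping from the table above and keeps the first three PD samples and the first three controls. Which three samples per group to keep is an arbitrary choice made here for the example; the variable names are hypothetical.

```shell
# Sample-to-run mapping copied from the table above.
samples="PD1 SRR12621862
PD2 SRR12621863
PD3 SRR12621864
PD4 SRR12621865
PD5 SRR12621866
CTRL1 SRR12621867
CTRL2 SRR12621868
CTRL3 SRR12621869
CTRL4 SRR12621870
CTRL5 SRR12621871
CTRL6 SRR12621872"

# Keep at most three samples per group (PD or CTRL), in table order.
subset=$(printf '%s\n' "$samples" | awk '
    { group = ($1 ~ /^PD/) ? "PD" : "CTRL" }
    ++seen[group] <= 3 { print }
')

printf '%s\n' "$subset"
```

The second column of the result lists the run identifiers to download for the test run.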
To download the FASTQ files for all 11 samples, run the following code. Please note that this may take a very long time.
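# If you installed SRAtoolkit above, add it to your PATH (run this from within data_download)
export PATH=$PATH:$PWD/sratoolkit.3.0.5-ubuntu64/bin
# Alternatively, if your HPC system provides SRAtoolkit as a module, load it instead (module names and versions vary between systems)
module load StdEnv/2020 gcc/9.3.0
module load sra-toolkit/3.0.0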
#PD1
prefetch SRR12621862
fasterq-dump SRR12621862
#PD2
prefetch SRR12621863
fasterq-dump SRR12621863
#PD3
prefetch SRR12621864
fasterq-dump SRR12621864
#PD4
prefetch SRR12621865
fasterq-dump SRR12621865
#PD5
prefetch SRR12621866
fasterq-dump SRR12621866
#CTRL1
prefetch SRR12621867
fasterq-dump SRR12621867
#CTRL2
prefetch SRR12621868
fasterq-dump SRR12621868
#CTRL3
prefetch SRR12621869
fasterq-dump SRR12621869
#CTRL4
prefetch SRR12621870
fasterq-dump SRR12621870
#CTRL5
prefetch SRR12621871
fasterq-dump SRR12621871
#CTRL6
prefetch SRR12621872
fasterq-dump SRR12621872
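The same prefetch and fasterq-dump calls can also be written as a loop. The sketch below is one possible way to do this; the function name download_runs and the DRY_RUN switch are hypothetical conveniences introduced for the example, not SRAtoolkit features. It relies on the 11 run identifiers being consecutive, as listed in the table above.

```shell
# Run prefetch and fasterq-dump for every accession passed as an argument.
# With DRY_RUN=1 the commands are printed instead of executed.
download_runs() {
    for srr in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "prefetch $srr"
            echo "fasterq-dump $srr"
        else
            prefetch "$srr" && fasterq-dump "$srr" \
                || echo "Download failed for $srr" >&2
        fi
    done
}

# Build the list SRR12621862 ... SRR12621872 (PD1 to CTRL6).
runs=""
n=12621862
while [ "$n" -le 12621872 ]; do
    runs="$runs SRR$n"
    n=$((n + 1))
done

# Preview the commands that would be run.
DRY_RUN=1 download_runs $runs
```

To start the real download, call download_runs $runs without DRY_RUN=1 from within data_download.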
If the FASTQ files for all 11 samples have been downloaded properly, the data_download folder should contain the following:
data_download
├── SRR12621862
│ └── SRR12621862.sra
├── SRR12621862_1.fastq
├── SRR12621862_2.fastq
├── SRR12621863
│ └── SRR12621863.sra
├── SRR12621863_1.fastq
├── SRR12621863_2.fastq
├── SRR12621864
│ └── SRR12621864.sra
├── SRR12621864_1.fastq
├── SRR12621864_2.fastq
├── SRR12621865
│ └── SRR12621865.sra
├── SRR12621865_1.fastq
├── SRR12621865_2.fastq
├── SRR12621866
│ └── SRR12621866.sra
├── SRR12621866_1.fastq
├── SRR12621866_2.fastq
├── SRR12621867
│ └── SRR12621867.sra
├── SRR12621867_1.fastq
├── SRR12621867_2.fastq
├── SRR12621868
│ └── SRR12621868.sra
├── SRR12621868_1.fastq
├── SRR12621868_2.fastq
├── SRR12621869
│ └── SRR12621869.sra
├── SRR12621869_1.fastq
├── SRR12621869_2.fastq
├── SRR12621870
│ └── SRR12621870.sra
├── SRR12621870_1.fastq
├── SRR12621870_2.fastq
├── SRR12621871
│ └── SRR12621871.sra
├── SRR12621871_1.fastq
├── SRR12621871_2.fastq
├── SRR12621872
│ └── SRR12621872.sra
├── SRR12621872_1.fastq
└── SRR12621872_2.fastq
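Rather than comparing the listing by eye, the presence of the files can be checked with a short script. This is a sketch only: the function name check_fastqs is hypothetical, and the script assumes data_download sits in your home directory, as in the copy commands used later in this tutorial. It reports files that are missing or empty but does not verify that a download is complete.

```shell
# Print every expected FASTQ file that is missing or empty in directory $1.
# Returns non-zero if at least one file is reported.
check_fastqs() {
    status=0
    n=12621862
    while [ "$n" -le 12621872 ]; do
        for mate in 1 2; do
            f="$1/SRR${n}_${mate}.fastq"
            if [ ! -s "$f" ]; then
                echo "missing or empty: $f"
                status=1
            fi
        done
        n=$((n + 1))
    done
    return "$status"
}

if check_fastqs "$HOME/data_download"; then
    echo "All 22 FASTQ files are present"
fi
```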
Next, we will rename the FASTQ files according to the CellRanger nomenclature and transfer them to a folder named fastqs. For more information regarding the nomenclature required by the CellRanger count pipeline, please visit CellRanger's documentation.
Note: The fastqs folder should only contain FASTQ files for the experiment.
# Create the folder in the home directory, where the cp commands below expect it
mkdir -p ~/fastqs
#PD1
cp ~/data_download/SRR12621862_1.fastq ~/fastqs/PD1_S1_L001_R1_001.fastq
cp ~/data_download/SRR12621862_2.fastq ~/fastqs/PD1_S1_L001_R2_001.fastq
#PD2
cp ~/data_download/SRR12621863_1.fastq ~/fastqs/PD2_S1_L001_R1_001.fastq
cp ~/data_download/SRR12621863_2.fastq ~/fastqs/PD2_S1_L001_R2_001.fastq
#PD3
cp ~/data_download/SRR12621864_1.fastq ~/fastqs/PD3_S1_L001_R1_001.fastq
cp ~/data_download/SRR12621864_2.fastq ~/fastqs/PD3_S1_L001_R2_001.fastq
#PD4
cp ~/data_download/SRR12621865_1.fastq ~/fastqs/PD4_S1_L001_R1_001.fastq
cp ~/data_download/SRR12621865_2.fastq ~/fastqs/PD4_S1_L001_R2_001.fastq
#PD5
cp ~/data_download/SRR12621866_1.fastq ~/fastqs/PD5_S1_L001_R1_001.fastq
cp ~/data_download/SRR12621866_2.fastq ~/fastqs/PD5_S1_L001_R2_001.fastq
#Ctrl1
cp ~/data_download/SRR12621867_1.fastq ~/fastqs/CTRL1_S1_L001_R1_001.fastq
cp ~/data_download/SRR12621867_2.fastq ~/fastqs/CTRL1_S1_L001_R2_001.fastq
#Ctrl2
cp ~/data_download/SRR12621868_1.fastq ~/fastqs/CTRL2_S1_L001_R1_001.fastq
cp ~/data_download/SRR12621868_2.fastq ~/fastqs/CTRL2_S1_L001_R2_001.fastq
#Ctrl3
cp ~/data_download/SRR12621869_1.fastq ~/fastqs/CTRL3_S1_L001_R1_001.fastq
cp ~/data_download/SRR12621869_2.fastq ~/fastqs/CTRL3_S1_L001_R2_001.fastq
#Ctrl4
cp ~/data_download/SRR12621870_1.fastq ~/fastqs/CTRL4_S1_L001_R1_001.fastq
cp ~/data_download/SRR12621870_2.fastq ~/fastqs/CTRL4_S1_L001_R2_001.fastq
#Ctrl5
cp ~/data_download/SRR12621871_1.fastq ~/fastqs/CTRL5_S1_L001_R1_001.fastq
cp ~/data_download/SRR12621871_2.fastq ~/fastqs/CTRL5_S1_L001_R2_001.fastq
#Ctrl6
cp ~/data_download/SRR12621872_1.fastq ~/fastqs/CTRL6_S1_L001_R1_001.fastq
cp ~/data_download/SRR12621872_2.fastq ~/fastqs/CTRL6_S1_L001_R2_001.fastq
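The copy commands above follow one pattern per sample, so they can also be generated from the sample-to-run mapping. The sketch below shows one way to do this; the function name stage_fastqs is hypothetical, and the paths assume that data_download and fastqs live in your home directory, as in the commands above.

```shell
# Copy each run's FASTQ pair from directory $1 to directory $2, renaming
# <run>_<1|2>.fastq to <sample>_S1_L001_R<1|2>_001.fastq.
stage_fastqs() {
    mkdir -p "$2"
    while read -r sample srr; do
        for mate in 1 2; do
            cp "$1/${srr}_${mate}.fastq" "$2/${sample}_S1_L001_R${mate}_001.fastq"
        done
    done <<'EOF'
PD1 SRR12621862
PD2 SRR12621863
PD3 SRR12621864
PD4 SRR12621865
PD5 SRR12621866
CTRL1 SRR12621867
CTRL2 SRR12621868
CTRL3 SRR12621869
CTRL4 SRR12621870
CTRL5 SRR12621871
CTRL6 SRR12621872
EOF
}

# Run only if the download folder exists.
if [ -d "$HOME/data_download" ]; then
    stage_fastqs "$HOME/data_download" "$HOME/fastqs"
fi
```

For a test run with fewer samples, delete the unwanted lines from the mapping between the EOF markers.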
If the above steps were conducted properly, the fastqs folder should contain the following files:
fastqs
├── CTRL1_S1_L001_R1_001.fastq
├── CTRL1_S1_L001_R2_001.fastq
├── CTRL2_S1_L001_R1_001.fastq
├── CTRL2_S1_L001_R2_001.fastq
├── CTRL3_S1_L001_R1_001.fastq
├── CTRL3_S1_L001_R2_001.fastq
├── CTRL4_S1_L001_R1_001.fastq
├── CTRL4_S1_L001_R2_001.fastq
├── CTRL5_S1_L001_R1_001.fastq
├── CTRL5_S1_L001_R2_001.fastq
├── CTRL6_S1_L001_R1_001.fastq
├── CTRL6_S1_L001_R2_001.fastq
├── PD1_S1_L001_R1_001.fastq
├── PD1_S1_L001_R2_001.fastq
├── PD2_S1_L001_R1_001.fastq
├── PD2_S1_L001_R2_001.fastq
├── PD3_S1_L001_R1_001.fastq
├── PD3_S1_L001_R2_001.fastq
├── PD4_S1_L001_R1_001.fastq
├── PD4_S1_L001_R2_001.fastq
├── PD5_S1_L001_R1_001.fastq
└── PD5_S1_L001_R2_001.fastq
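Since the fastqs folder should contain nothing but the FASTQ files for the experiment, it can help to list any entry that does not match the expected names. The following is a sketch under the assumption that all 11 samples were staged with the names shown above and that fastqs is in your home directory; the function name list_unexpected is hypothetical. It flags stray entries only and does not check that all 22 files are present.

```shell
# Print entries in directory $1 that are not one of the 22 expected FASTQ names.
list_unexpected() {
    ls -A "$1" | grep -Ev '^(PD[1-5]|CTRL[1-6])_S1_L001_R[12]_001\.fastq$' || true
}

if [ -d "$HOME/fastqs" ]; then
    extra=$(list_unexpected "$HOME/fastqs")
    if [ -n "$extra" ]; then
        echo "Unexpected entries in fastqs:"
        echo "$extra"
    else
        echo "fastqs contains only the expected file names"
    fi
fi
```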