Downloading the PBMC dataset

The scRNA-seq data produced by Stoeckius et al. are publicly available in the Gene Expression Omnibus (GEO) under accession GSE108313. To download the data, we must first install the SRA Toolkit (if it is not already installed on your high-performance computing (HPC) system). The following code creates a directory for the raw data, then downloads and extracts the SRA Toolkit:

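# Create the raw-data directory in your home directory (the cp commands further down assume ~/data_download)
mkdir -p ~/data_download
cd ~/data_download
wget --output-document sratoolkit.tar.gz https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
tar -vxzf sratoolkit.tar.gz
# The extracted folder name depends on the current release; replace 3.0.5 with the version you downloaded (check with: ls -d sratoolkit.*)
export PATH=$PATH:$PWD/sratoolkit.3.0.5-ubuntu64/bin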
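Because the `current` tarball is updated over time, the folder it extracts to (here `sratoolkit.3.0.5-ubuntu64`) may carry a different version number on your system. A minimal sketch, assuming the tarball was extracted into the given directory and the folder follows the `sratoolkit.<version>-ubuntu64` pattern, that finds the newest extracted `bin` folder instead of hard-coding the version. The function name `find_sratoolkit_bin` is hypothetical, and `sort -V` assumes GNU coreutils (as on Ubuntu).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the bin/ folder of the newest extracted SRA Toolkit
# found directly under a parent directory (default: current directory).
find_sratoolkit_bin() {
    local parent="${1:-$PWD}"
    local latest
    # Version-sort the candidates (so 3.0.10 ranks above 3.0.5) and keep the last.
    latest=$(printf '%s\n' "$parent"/sratoolkit.*-ubuntu64 | sort -V | tail -n 1)
    # If nothing matched, the glob stays unexpanded and is not a directory.
    if [ ! -d "$latest/bin" ]; then
        echo "No extracted SRA Toolkit found in $parent" >&2
        return 1
    fi
    printf '%s\n' "$latest/bin"
}

# Usage: add the detected folder to PATH, then confirm the tools resolve.
# bin_dir=$(find_sratoolkit_bin) && export PATH="$PATH:$bin_dir"
# command -v prefetch fasterq-dump
```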

For more information about the SRA Toolkit, please see its documentation.

The Sequence Read Archive (SRA) run accessions for the RNA and antibody (hashtag oligonucleotide, HTO) assays are:

Assay      SRA run accession
RNA        SRR8281306
Antibody   SRR8281307

To download the FASTQ files for the RNA and antibody assays, run the following code. Note that this may take a long time (potentially hours, depending on your connection).

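# Either add the SRA Toolkit downloaded above to PATH (run from the data_download folder)...
export PATH=$PATH:$PWD/sratoolkit.3.0.5-ubuntu64/bin
# ...or, if your HPC system provides the SRA Toolkit as a module, load it instead (module names and versions vary by system)
module load StdEnv/2020 gcc/9.3.0
module load sra-toolkit/3.0.0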

#RNA
prefetch SRR8281306 --max-size 100GB
fasterq-dump SRR8281306

#Antibody
prefetch SRR8281307 --max-size 100GB
fasterq-dump SRR8281307 
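Both runs follow the same two-step pattern (`prefetch`, then `fasterq-dump`), so the commands can also be written as a loop over the accessions in the table above. A minimal sketch, assuming the SRA Toolkit is already on `PATH`; it stops at the first failing command. The function name `download_runs` and the `DRY_RUN` variable are hypothetical: with `DRY_RUN=1` the commands are only printed, which is a cheap way to check the plan before a long download.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run prefetch, then fasterq-dump, for each SRA run given.
# Set DRY_RUN=1 to print the commands instead of executing them.
download_runs() {
    local run
    local runner=()
    # In dry-run mode, prefix every command with 'echo'.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        runner=(echo)
    fi
    for run in "$@"; do
        "${runner[@]}" prefetch "$run" --max-size 100GB || return 1
        "${runner[@]}" fasterq-dump "$run" || return 1
    done
}

# Usage: RNA (SRR8281306) and antibody (SRR8281307) runs.
# DRY_RUN=1 download_runs SRR8281306 SRR8281307   # preview the commands
# download_runs SRR8281306 SRR8281307             # actually download
```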

If the FASTQ files for the RNA and antibody assays have been downloaded properly, the data_download folder should contain the following:

data_download
├── SRR8281306
│   └── SRR8281306.sra
├── SRR8281306_1.fastq
├── SRR8281306_2.fastq
├── SRR8281307
│   └── SRR8281307.sra
├── SRR8281307_1.fastq
└── SRR8281307_2.fastq
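Beyond checking that the files exist, a quick sanity check is that the `_1` and `_2` files of each run hold the same number of reads, since their records are paired. A minimal sketch, assuming uncompressed FASTQ with four lines per record (as written by `fasterq-dump`); the function name `check_fastq_pair` is hypothetical. It reads every file in full, so expect it to take a while on files of this size.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the read counts of <run>_1.fastq and <run>_2.fastq.
# Usage: check_fastq_pair <run> [<dir>]   (dir defaults to the current directory)
check_fastq_pair() {
    local run="$1" dir="${2:-.}"
    local r1="$dir/${run}_1.fastq" r2="$dir/${run}_2.fastq"
    local f n1 n2
    for f in "$r1" "$r2"; do
        # -s: the file exists and is non-empty
        [ -s "$f" ] || { echo "Missing or empty: $f" >&2; return 1; }
    done
    # Each FASTQ record spans four lines, so reads = lines / 4.
    n1=$(( $(wc -l < "$r1") / 4 ))
    n2=$(( $(wc -l < "$r2") / 4 ))
    if [ "$n1" -ne "$n2" ]; then
        echo "$run: read counts differ (R1=$n1, R2=$n2)" >&2
        return 1
    fi
    echo "$run: $n1 read pairs"
}

# Usage, from inside data_download:
# check_fastq_pair SRR8281306 && check_fastq_pair SRR8281307
```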

Next, we will rename the FASTQ files according to the CellRanger naming convention and copy them to a folder named fastqs. For more information regarding the nomenclature required by the CellRanger count pipeline (cellranger count), please see the CellRanger documentation.

Note: The fastqs folder should only contain FASTQ files for the experiment.
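The target names used below share one pattern: sample name, sample number (`S1`), three-digit lane (`L001`), read type (`R1`/`R2`), and `001`. A minimal sketch that builds names in this pattern, so additional samples or lanes can be renamed consistently. The function name `cellranger_fastq_name` is hypothetical, the `.fastq` extension simply mirrors the files used here, and the CellRanger documentation remains the authority on the accepted pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a CellRanger-style FASTQ name,
# <sample>_S<sample_number>_L<3-digit lane>_<read type>_001.fastq
# Usage: cellranger_fastq_name <sample> <read type> [<lane>] [<sample number>]
cellranger_fastq_name() {
    local sample="$1" read_type="$2" lane="${3:-1}" sample_number="${4:-1}"
    case "$read_type" in
        R1|R2|I1|I2) ;;   # read 1/2 and index read 1/2
        *) echo "Unexpected read type: $read_type" >&2; return 1 ;;
    esac
    printf '%s_S%d_L%03d_%s_001.fastq\n' "$sample" "$sample_number" "$lane" "$read_type"
}

# Usage:
# cellranger_fastq_name run1GEX R1    # run1GEX_S1_L001_R1_001.fastq
```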

# Create the fastqs folder in the home directory, where the cp commands below write
mkdir -p ~/fastqs

# RNA assay
cp ~/data_download/SRR8281306_1.fastq  ~/fastqs/run1GEX_S1_L001_R1_001.fastq
cp ~/data_download/SRR8281306_2.fastq  ~/fastqs/run1GEX_S1_L001_R2_001.fastq

# Antibody (HTO) assay
cp ~/data_download/SRR8281307_1.fastq ~/fastqs/run1HTO_S1_L001_R1_001.fastq
cp ~/data_download/SRR8281307_2.fastq ~/fastqs/run1HTO_S1_L001_R2_001.fastq
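Copying duplicates every FASTQ file on disk. Where storage quota is tight, one alternative (not part of the original workflow) is to create symbolic links carrying the CellRanger names instead of copies; the original files in data_download must then stay in place. A minimal sketch, assuming the `<run>_1.fastq`/`<run>_2.fastq` layout shown earlier and a single lane; the function name `stage_fastqs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: symlink SRA FASTQs into a folder under CellRanger-style names.
# Usage: stage_fastqs <src_dir> <dest_dir> <run> <sample> [<run> <sample> ...]
stage_fastqs() {
    local src dest run sample mate target
    # Absolute source path, so the links resolve from any working directory.
    src=$(cd "$1" >/dev/null && pwd) || return 1
    dest="$2"
    shift 2
    if [ "$#" -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "Expected <run> <sample> pairs" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    while [ "$#" -ge 2 ]; do
        run="$1" sample="$2"
        shift 2
        for mate in 1 2; do
            target="$src/${run}_${mate}.fastq"
            [ -e "$target" ] || { echo "Missing: $target" >&2; return 1; }
            # <run>_1.fastq -> <sample>_S1_L001_R1_001.fastq (and _2 -> R2)
            ln -sf "$target" "$dest/${sample}_S1_L001_R${mate}_001.fastq" || return 1
        done
    done
}

# Usage, equivalent in naming to the cp commands above:
# stage_fastqs ~/data_download ~/fastqs SRR8281306 run1GEX SRR8281307 run1HTO
```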

If the above steps completed successfully, the fastqs folder should contain the following files:

fastqs
├── run1GEX_S1_L001_R1_001.fastq
├── run1GEX_S1_L001_R2_001.fastq
├── run1HTO_S1_L001_R1_001.fastq
└── run1HTO_S1_L001_R2_001.fastq
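To confirm that the folder matches this listing exactly, and therefore holds nothing besides the experiment's FASTQ files, the contents can be compared against the expected names. A minimal sketch; the function name `verify_fastq_dir` is hypothetical, and it uses bash process substitution to show a diff when the contents differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if <dir> contains exactly the listed names.
# Usage: verify_fastq_dir <dir> <file> [<file> ...]
verify_fastq_dir() {
    local dir="$1"
    shift
    local expected actual
    [ -d "$dir" ] || { echo "Not a directory: $dir" >&2; return 1; }
    expected=$(printf '%s\n' "$@" | sort)
    # Every entry directly inside dir, hidden files included (no recursion).
    actual=$(ls -1A "$dir" | sort)
    if [ "$expected" != "$actual" ]; then
        echo "Contents of $dir differ from the expected list:" >&2
        diff <(printf '%s\n' "$expected") <(printf '%s\n' "$actual") >&2
        return 1
    fi
    echo "$dir: OK"
}

# Usage:
# verify_fastq_dir ~/fastqs \
#     run1GEX_S1_L001_R1_001.fastq run1GEX_S1_L001_R2_001.fastq \
#     run1HTO_S1_L001_R1_001.fastq run1HTO_S1_L001_R2_001.fastq
```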