Manual CellRanger Library Preparation
Introduction
Step 1 (FASTQ to gene expression matrix) of the scRNAbox pipeline leverages the CellRanger counts pipeline to generate gene expression matrices from FASTQ files. In order to run the CellRanger counts pipeline, libraries must be generated to define the information of the FASTQ files. For the Standard analysis track, a library.csv
file must be generated for each sample. For the HTO analysis track, a library.csv
and feature_ref.csv
file must be generated for each sequencing run. Athough scRNAbox provides an option to automate the library preparation process, it is important that users understand the information that is required for these files.
In this tutorial we demonstrate how users can manually prepare the library.csv
and feature_ref.csv
files. The information presented in this tutorial can also be used to inform the parameters that must be defined by the user for automated library preparation.
Standard analysis track
library.csv
The library.csv
file defines the necessary information of the FASTQ files for the experiment, including the gene expression and antibody assays. The structure of the library.csv
file should be:
fastqs,sample,library_type
path/to/fastqs/directory/,SampleName,Gene Expression
- The
fastqs
column defines the path to the directory that contains the FASTQ files for the experiment.
- The
sample
column defines the sample name of the corresponding FASTQ file. Please note that FASTQ files must be named according to standard CellRanger nomenclature (e.g. CTRL1_S1_L001_R1_001.fastq). For more information, please visit CellRanger's documentation. - The
library_type
column defines the assay type. For the Standard Analysis Track, thelibrary_type
will be "Gene Expression".
Example: manual library preparation
For the Standard Analysis Track, if the experiment comprises four samples (two case and two controls), the following steps should be taken for manual library preparation:
1) Navigate to the working directory and create a samples_info
folder:
cd ~/working_directory
mkdir samples_info
2) Navigate to the samples_info
folder and create a folder for each sample run:
cd samples_info
mkdir case1
mkdir case2
mkdir control1
mkdir control2
3) Navigate to the folder for each sample and create the library.csv
file.
After performing steps 1-3 above, the structure of the samples_info folder (~working_directory/samples_info
) for an experiment with four samples should be:
working_directory
└── samples_info
├── case1
│ └──library.csv
├── case2
│ └── library.csv
├── control1
│ └── library.csv
└── control2
└── library.csv
HTO analysis track
library.csv
The library.csv
file defines the necessary information of the FASTQ files for the experiment, including the gene expression and antibody assays. The structure of the library.csv
file should be:
fastqs,sample,library_type
path/to/fastqs/directory/,SampleNameGEX,Gene Expression
path/to/fastqs/directory/,SampleNameHTO,Antibody Capture
- The
fastqs
column defines the path to the directory that contains the FASTQ files for the experiment.
- The
sample
column defines the sample name of the corresponding FASTQ file. Please note that FASTQ files must be named according to standard CellRanger nomenclature (e.g. CTRL1_S1_L001_R1_001.fastq). For more information please visit CellRanger's documentation. - The
library_type
column defines the assay type. For the Cell Hashtag Analysis track, each sequencing run should have a "Gene Expression" and "Antibody Capture" assay. For more information, please visit CellRanger's documentation
feature_ref.csv
The feature_ref.csv
file defines the necessary information for processing the sample-specific barcodes that will eventually be used to demultiplex the pooled samples. For example, if there are four samples pooled together with four unique barcode identifiers, the structure of the feature_ref.csv
file should be:
id,name,read,pattern,sequence,feature_type
Hash1,B0251_TotalSeqB,R2,5PNNNNNNNNNN(BC),GTCAACTCTTTAGCG,Antibody Capture
Hash2,B0252_TotalSeqB,R2,5PNNNNNNNNNN(BC),TGATGGCCTATTGGG,Antibody Capture
Hash3,B0253_TotalSeqB,R2,5PNNNNNNNNNN(BC),TTCCGCCTCTCTTTG,Antibody Capture
Hash4,B0254_TotalSeqB,R2,5PNNNNNNNNNN(BC),AGTAAGTTCAGCGTA,Antibody Capture
- The
id
column defines the barcode ID which will be used to track the feature counts.
- The
name
column defines the arbitrary name for the barcode identifier.
- The
read
column defines which RNA sequencing read contains the barcode sequence. This value Will be either R1 or R2.
- The
pattern
column defines the pattern of the barcode identifiers. For more information please visit the 10X Genomics documentation
- The
sequence
column defines nucleotide sequence associated with the barcode identifier.
- The
feature_type
column defines the type of feature used for sample identification. Please ensure that the feature_type in thefeature_ref.csv
file matches a library_type in thelibrary.csv
file.
For more information regarding the preparation of the feature_ref.csv
, please see CellRanger's documentation.
Example: manual library preparation
For the Cell Hashtag Analysis Track, if the experiment comprises four sequencing, the following steps should be taken for manual library preparation:
1) Navigate to the working directory and create a samples_info
folder:
cd ~/working_directory
mkdir samples_info
2) Navigate to the samples_info
folder and create a folder for each sequencing run:
cd samples_info
mkdir run1
mkdir run2
mkdir run3
mkdir run4
3) Navigate to the folder for each sequencing and create the library.csv
file.
4) Navigate to the folder for each sequencing and create the feature_ref.csv
file.
After performing steps 1-4 above, the structure of the samples_info folder (~working_directory/samples_info
) for an experiment with four sequencing runs should be:
working_directory
├── samples_info
├── run1
│ ├── library.csv
│ └── feature_ref.csv
├── run2
│ ├── library.csv
│ └── feature_ref.csv
├── run3
│ ├── library.csv
│ └── feature_ref.csv
└── run4
├── library.csv
└── feature_ref.csv