Step 3: Genetic demultiplexing by constituent demultiplexing tools
In Step 3, we will demultiplex the pooled samples with each of Ensemblex's constituent genetic demultiplexing tools. The constituent genetic demultiplexing tools will vary depending on the version of the Ensemblex pipeline being used:
NOTE: The analytical parameters for each constiuent tool can be adjusted using the the ensemblex_config.ini
file located in ~/working_directory/job_info/configs
. For a comprehensive description of how to adjust the analytical parameters of the Ensemblex pipeline please see Execution parameters.
Demultiplexing with prior genotype information
When demultiplexing with prior genotype information, Ensemblex leverages the sample labels from
Demuxalot
To run Demuxalot use the following code:
ensemblex_HOME=/path/to/ensemblex.pip
ensemblex_PWD=/path/to/working_directory
bash $ensemblex_HOME/launch_ensemblex.sh -d $ensemblex_PWD --step demuxalot
If Demuxalot completed successfully, the following files should be available in ~/working_directory/demuxalot
working_directory
└── demuxalot
├── Demuxalot_result.csv
└── new_snps_single_file.betas
Demuxlet
To run Demuxlet use the following code:
ensemblex_HOME=/path/to/ensemblex.pip
ensemblex_PWD=/path/to/working_directory
bash $ensemblex_HOME/launch_ensemblex.sh -d $ensemblex_PWD --step demuxlet
If Demuxlet completed successfully, the following files should be available in ~/working_directory/demuxlet
working_directory
└── demuxlet
├── outs.best
├── pileup.cel.gz
├── pileup.plp.gz
├── pileup.umi.gz
└── pileup.var.gz
Souporcell
To run Souporcell use the following code:
ensemblex_HOME=/path/to/ensemblex.pip
ensemblex_PWD=/path/to/working_directory
bash $ensemblex_HOME/launch_ensemblex.sh -d $ensemblex_PWD --step souporcell
If Souporcell completed successfully, the following files should be available in ~/working_directory/souporcell
working_directory
└── souporcell
├── alt.mtx
├── cluster_genotypes.vcf
├── clusters_tmp.tsv
├── clusters.tsv
├── fq.fq
├── minimap.sam
├── minitagged.bam
├── minitagged_sorted.bam
├── minitagged_sorted.bam.bai
├── Pool.vcf
├── ref.mtx
└── soup.txt
Vireo-GT
To run Vireo-GT use the following code:
ensemblex_HOME=/path/to/ensemblex.pip
ensemblex_PWD=/path/to/working_directory
bash $ensemblex_HOME/launch_ensemblex.sh -d $ensemblex_PWD --step vireo
If Vireo-GT completed successfully, the following files should be available in ~/working_directory/vireo_gt
working_directory
└── vireo_gt
├── cellSNP.base.vcf.gz
├── cellSNP.cells.vcf.gz
├── cellSNP.samples.tsv
├── cellSNP.tag.AD.mtx
├── cellSNP.tag.DP.mtx
├── cellSNP.tag.OTH.mtx
├── donor_ids.tsv
├── fig_GT_distance_estimated.pdf
├── fig_GT_distance_input.pdf
├── GT_donors.vireo.vcf.gz
├── _log.txt
├── prob_doublet.tsv.gz
├── prob_singlet.tsv.gz
└── summary.tsv
Upon demultiplexing the pooled samples with each of Ensemblex's constituent genetic demultiplexing tools, we can proceed to Step 4 where we will process the output files of the consituent tools with the Ensemblex algorithm to generate the ensemble sample classifications: Application of Ensemblex
Demultiplexing without prior genotype information
When demultiplexing without prior genotype information, Ensemblex leverages the sample labels from
Freemuxlet
To run Freemuxlet use the following code:
ensemblex_HOME=/path/to/ensemblex.pip
ensemblex_PWD=/path/to/working_directory
bash $ensemblex_HOME/launch_ensemblex.sh -d $ensemblex_PWD --step freemuxlet
If Freemuxlet completed successfully, the following files should be available in ~/working_directory/freemuxlet
working_directory
└── freemuxlet
├── outs.clust1.samples.gz
├── outs.clust1.vcf
├── outs.lmix
├── pileup.cel.gz
├── pileup.plp.gz
├── pileup.umi.gz
└── pileup.var.gz
Souporcell
To run Souporcell use the following code:
ensemblex_HOME=/path/to/ensemblex.pip
ensemblex_PWD=/path/to/working_directory
bash $ensemblex_HOME/launch_ensemblex.sh -d $ensemblex_PWD --step souporcell
If Souporcell completed successfully, the following files should be available in ~/working_directory/souporcell
working_directory
└── souporcell
├── alt.mtx
├── cluster_genotypes.vcf
├── clusters_tmp.tsv
├── clusters.tsv
├── fq.fq
├── minimap.sam
├── minitagged.bam
├── minitagged_sorted.bam
├── minitagged_sorted.bam.bai
├── Pool.vcf
├── ref.mtx
└── soup.txt
Vireo
To run Vireo use the following code:
ensemblex_HOME=/path/to/ensemblex.pip
ensemblex_PWD=/path/to/working_directory
bash $ensemblex_HOME/launch_ensemblex.sh -d $ensemblex_PWD --step vireo
If Vireo completed successfully, the following files should be available in ~/working_directory/vireo
working_directory
└── vireo
├── cellSNP.base.vcf.gz
├── cellSNP.cells.vcf.gz
├── cellSNP.samples.tsv
├── cellSNP.tag.AD.mtx
├── cellSNP.tag.DP.mtx
├── cellSNP.tag.OTH.mtx
├── donor_ids.tsv
├── fig_GT_distance_estimated.pdf
├── GT_donors.vireo.vcf.gz
├── _log.txt
├── prob_doublet.tsv.gz
├── prob_singlet.tsv.gz
└── summary.tsv
Demuxalot
NOTE: Because the Demuxalot algorithm requires prior genotype information, the Ensemblex pipeline uses the predicted vcf file generated by Freemuxlet as input into Demuxalot when prior genotype information is not available. Therefore, it is important to wait for Freemuxlet to complete before running Demuxalot. To check if the required Freemuxlet-generated vcf file is available prior to running Demuxalot, you can use the following code:
if test -f /path/to/working_directory/freemuxlet/outs.clust1.vcf; then
echo "File exists."
fi
Upon confirming that the required Freemuxlet-generated file exists, we can run Demuxalot using the following code:
ensemblex_HOME=/path/to/ensemblex.pip
ensemblex_PWD=/path/to/working_directory
bash $ensemblex_HOME/launch_ensemblex.sh -d $ensemblex_PWD --step demuxalot
If Demuxalot completed successfully, the following files should be available in ~/working_directory/demuxalot
working_directory
└── demuxalot
├── Demuxalot_result.csv
└── new_snps_single_file.betas
Upon demultiplexing the pooled samples with each of Ensemblex's constituent genetic demultiplexing tools, we can proceed to Step 4 where we will process the output files of the consituent tools with the Ensemblex algorithm to generate the ensemble sample classifications: Application of Ensemblex