We use Cell Ranger to analyze the transcriptomes of individual cells.
Cell Ranger carries out several key tasks in processing the paired-end
reads. First, it aligns the reads to the reference genome provided by
10x Genomics and performs read filtering, barcode counting, and unique
molecular identifier (UMI) counting. These steps ensure accurate read
mapping, provide quality control, and quantify gene expression. Next,
Cell Ranger uses the cellular barcodes introduced by the Chromium
platform to generate feature-barcode matrices. It clusters the cells
based on these matrices and performs gene expression analysis, which
enables the identification of distinct cell populations and the
exploration of gene expression patterns. We aggregate multiple samples
using Cell Ranger’s aggr pipeline, which combines the data from all
samples into a single experiment-wide feature-barcode matrix. This
integrated matrix allows a unified analysis across all samples,
facilitating comparisons between them and providing a comprehensive
view of the cellular landscape. All subsequent analyses were performed
on the aggregated data.
To assess the quality of the single-cell RNA sequencing (scRNA-seq)
data, we first examine the sequencing statistics presented in Table 1.
These statistics include metrics such as UMI counts, Q30 rates (the
percentage of bases with a Phred quality score of 30 or higher, i.e. an
estimated base-call error rate of at most 0.1%), and mapping rates to
various genomic regions. Together, these measures indicate how reliable
the sequencing data are.
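To make the Q30 definition concrete, the following Python sketch computes a Q30 rate from FASTQ quality strings. It assumes the standard Phred+33 encoding used by current Illumina instruments; the function names and toy quality strings are hypothetical and are not part of Cell Ranger, which reports its Q30 metrics directly.

```python
# Minimal sketch of a Q30 calculation, assuming Phred+33 quality encoding.


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def q30_rate(qualities: list[str], threshold: int = 30) -> float:
    """Fraction of all base calls whose Phred score is >= threshold."""
    total = passing = 0
    for quality in qualities:
        scores = phred_scores(quality)
        total += len(scores)
        passing += sum(1 for q in scores if q >= threshold)
    return passing / total if total else 0.0


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error), hence P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # Hypothetical reads: 'F' encodes Q37, '?' encodes Q30, '5' encodes Q20.
    reads = ["FFFF?", "FF555"]
    print(f"Q30 rate: {q30_rate(reads):.0%}")  # Q30 rate: 70%
    print(f"Error probability at Q30: {error_probability(30)}")
```

Seven of the ten toy bases score Q30 or higher, so the Q30 rate is 70%.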
Additionally, we review the barcode rank plots for individual samples
(not shown here), which are a useful tool for evaluating scRNA-seq data
based on the distribution of UMI counts. The plot ranks all barcodes by
their total UMI count, in descending order, on log-log axes. Barcodes
associated with cells form a plateau of high UMI counts on the left,
whereas barcodes from empty partitions, which capture only ambient RNA,
form a low-count tail on the right. The knee point marks the sharp
change in slope between these two regions, i.e. the transition from
high-abundance to low-abundance barcodes. This point is crucial in
determining the appropriate threshold for filtering out low-quality or
background barcodes from the dataset. In an ideal sample, the plateau
ends in a steep cliff, indicating that the cell-calling algorithm
clearly distinguished intact cells from background barcodes; a gradual
slope instead suggests weaker separation between cells and background.
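The sketch below illustrates how a barcode rank curve is built and how a knee can be located on it. It uses a simple heuristic (the point lying farthest above the straight line joining the first and last points on log-log axes); this heuristic is an assumption for illustration and is not the cell-calling algorithm Cell Ranger uses. The function names and toy UMI counts are hypothetical.

```python
import math


def barcode_rank_curve(umi_counts):
    """(rank, total UMI count) pairs, barcodes sorted by count in descending order."""
    counts = sorted((c for c in umi_counts if c > 0), reverse=True)
    return list(enumerate(counts, start=1))


def knee_rank(curve):
    """Rank of the point lying farthest above the chord joining the first and
    last points of the curve on log10-log10 axes.

    Illustrative heuristic only; this is NOT Cell Ranger's cell-calling algorithm.
    """
    if len(curve) < 3:
        return curve[-1][0] if curve else 0
    xs = [math.log10(rank) for rank, _ in curve]
    ys = [math.log10(count) for _, count in curve]
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    # Vertical distance of each point above the chord; the knee maximises it.
    above = [y - (ys[0] + slope * (x - xs[0])) for x, y in zip(xs, ys)]
    best = max(range(len(curve)), key=above.__getitem__)
    return curve[best][0]


if __name__ == "__main__":
    # Hypothetical sample: 5 cell-containing barcodes and 95 ambient-RNA barcodes.
    umis = [10] * 95 + [1000] * 5
    rank = knee_rank(barcode_rank_curve(umis))
    print(f"Knee at rank {rank}; barcodes ranked 1..{rank} are candidate cells")
```

In this toy example the plateau of five high-count barcodes ends at rank 5, which is where the heuristic places the knee.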
By considering these sequencing statistics and analyzing the
barcode rank plot, we can thoroughly evaluate the quality and
characteristics of the scRNA-seq data, which is essential for subsequent
analyses and interpretations.
Table 1: Sequencing and Mapping Stats
Metric | Value |
---|---|
Aggregation | |
Pre-Normalization Total Number of Reads | 401,690,202 |
Post-Normalization Total Number of Reads | 389,670,221 |
Pre-Normalization Mean Reads per Cell | 26,788 |
Post-Normalization Mean Reads per Cell | 25,987 |
Fraction of Reads Kept (Sample-01) | 100% |
Fraction of Reads Kept (Sample-02) | 94.4% |
Pre-Normalization Total Reads per Cell (Sample-01) | 25,790 |
Pre-Normalization Total Reads per Cell (Sample-02) | 27,709 |
Pre-Normalization Confidently Mapped Barcoded Reads per Cell (Sample-01) | 14,874 |
Pre-Normalization Confidently Mapped Barcoded Reads per Cell (Sample-02) | 15,750 |
Cells | |
Estimated Number of Cells | 14,995 |
Fraction Reads in Cells | 92.4% |
Median Genes per Cell | 1,587 |
Median UMI Counts per Cell | 4,292 |
Chemistry Batch Correction | |
Batch Effect Score Before Correction | 1.35863 |
Batch Effect Score After Correction | 1.653769 |
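The aggregation metrics in Table 1 can be cross-checked with simple arithmetic. By default, aggr normalizes samples by subsampling reads from higher-depth samples until all samples have the same number of confidently mapped reads per cell. Assuming that mode was used here, the fraction of reads kept for each sample is the lowest mapped-reads-per-cell value divided by that sample's own value. The Python sketch below uses only values from Table 1; the function names are hypothetical.

```python
def fraction_reads_kept(mapped_reads_per_cell):
    """Per-sample subsampling fraction when every sample is brought down to the
    lowest confidently-mapped-reads-per-cell value (assumed 'mapped' normalization)."""
    target = min(mapped_reads_per_cell.values())
    return {sample: target / value for sample, value in mapped_reads_per_cell.items()}


def mean_reads_per_cell(total_reads, n_cells):
    """Mean reads per cell = total reads / estimated number of cells."""
    return total_reads / n_cells


if __name__ == "__main__":
    # Values copied from Table 1.
    mapped = {"Sample-01": 14_874, "Sample-02": 15_750}
    for sample, fraction in fraction_reads_kept(mapped).items():
        print(f"{sample}: {fraction:.1%} of reads kept")  # 100.0% and 94.4%
    cells = 14_995
    print(round(mean_reads_per_cell(401_690_202, cells)))  # 26788 (pre-normalization)
    print(round(mean_reads_per_cell(389_670_221, cells)))  # 25987 (post-normalization)
```

The computed values (100%, 94.4%, 26,788, and 25,987) agree with the corresponding rows of Table 1.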
Next, we calculated the percentage of each cell's UMIs that map to
mitochondrial genes (%mito) using Seurat. Mitochondrial genes encode
proteins of the respiratory chain, as well as mitochondrial rRNAs and
tRNAs. An elevated %mito can indicate cell stress, cell damage, or
technical artifacts during library preparation: for example, when a
cell's membrane is compromised, cytoplasmic mRNA leaks out while
mitochondrial transcripts are retained. A higher %mito therefore
suggests lower data quality or the presence of compromised cells.
Conversely, a lower %mito suggests that most transcripts originate from
nuclear genes, consistent with intact cells and better data quality.
Figure 1 Violin plots (each dot is one cell) showing the number of genes detected per cell, the number of UMIs per cell, and the percentage of reads of mitochondrial origin.
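As an illustration of the %mito calculation, the Python sketch below computes the percentage of a cell's UMIs assigned to mitochondrial genes. In Seurat this is typically done with `PercentageFeatureSet(object, pattern = "^mt-")`; the `mt-` prefix assumes mouse gene symbols, consistent with the mm10 reference listed in Table 2. The 20% cutoff, the gene counts, and the function names are hypothetical; this report does not state which cutoff, if any, was applied.

```python
def percent_mito(cell_counts, prefix="mt-"):
    """Percentage of a cell's UMIs from genes whose symbol starts with `prefix`."""
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in cell_counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total


def flag_high_mito(cells, max_percent=20.0):
    """Return the IDs of cells whose %mito exceeds a (hypothetical) cutoff."""
    return [cell_id for cell_id, counts in cells.items() if percent_mito(counts) > max_percent]


if __name__ == "__main__":
    # Hypothetical UMI counts per gene for two cells.
    cells = {
        "cell_A": {"Actb": 150, "Gapdh": 30, "mt-Co1": 15, "mt-Nd1": 5},
        "cell_B": {"Actb": 40, "mt-Co1": 50, "mt-Cytb": 10},
    }
    for cell_id, counts in cells.items():
        print(f"{cell_id}: {percent_mito(counts):.1f}% mito")  # 10.0% and 60.0%
    print("Flagged:", flag_high_mito(cells))  # Flagged: ['cell_B']
```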
After these quality checks, we perform two common types of
dimensionality reduction using the Seurat R package: t-distributed
stochastic neighbor embedding (t-SNE) and Uniform Manifold
Approximation and Projection (UMAP). As shown in Figure 2, both t-SNE
and UMAP produce plots in which each point represents an individual
cell, and cells with similar gene expression profiles are located close
to each other.
Afterward, we use our in-house algorithm, developed with the CellMarker
2.0 database, to identify marker genes associated with your gene of
interest, and we use Seurat to identify the corresponding clusters.
This approach reveals genes that are specifically expressed in
particular cell types or subpopulations, providing insight into the
biological characteristics and functions of these cells.
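Our in-house annotation algorithm is not reproduced here. As a purely hypothetical illustration of marker-based annotation in general, the sketch below scores each cluster against a small reference table of cell-type markers (in the spirit of CellMarker 2.0, but not taken from it) and assigns the cell type whose reference markers are best represented among the cluster's marker genes. The reference table, the scoring rule, and the function names are all assumptions.

```python
# Illustrative marker sets only (common mouse markers); NOT an excerpt of CellMarker 2.0.
REFERENCE_MARKERS = {
    "T cells": {"Cd3e", "Cd3d", "Cd2"},
    "B cells": {"Cd79a", "Cd79b", "Ms4a1", "Cd19"},
    "Macrophages": {"Adgre1", "Cd68", "Lyz2", "C1qa"},
}


def annotate_clusters(cluster_markers, reference=REFERENCE_MARKERS):
    """Label each cluster with the cell type whose reference markers are most
    represented among the cluster's markers; 'Unknown' when none overlap."""
    labels = {}
    for cluster, markers in cluster_markers.items():
        found = set(markers)
        # Score = fraction of a cell type's reference markers found in this cluster.
        scores = {cell_type: len(found & ref) / len(ref) for cell_type, ref in reference.items()}
        best_type, best_score = max(scores.items(), key=lambda item: item[1])
        labels[cluster] = best_type if best_score > 0 else "Unknown"
    return labels


if __name__ == "__main__":
    # Hypothetical top marker genes per Seurat cluster.
    markers = {
        "0": ["Cd3e", "Cd3d", "Il7r"],
        "1": ["Cd79a", "Ms4a1", "Cd74"],
        "2": ["Hba-a1", "Hbb-bs"],
    }
    print(annotate_clusters(markers))
    # {'0': 'T cells', '1': 'B cells', '2': 'Unknown'}
```

Scoring by the fraction of reference markers found keeps cell types with long marker lists from being favored simply because they have more genes to match.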
Figure 2 Dimensionality Reduction using UMAP (top) and t-SNE (bottom)
After cell-type assignment, differential expression (DE) analysis was
performed with Seurat for each cell type. To ensure reliable results,
only genes meeting two criteria were retained for downstream analysis:
an absolute log fold change of 0.2 or greater, and detection in at
least 10% of cells in either population. Figure 3 shows the up- and
down-regulated genes for each cell type.
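The two gene filters can be sketched as follows; in Seurat's `FindMarkers` they correspond to the `logfc.threshold` and `min.pct` parameters. This Python example applies only the pre-filtering step, not the statistical test. It assumes normalized (non-log) expression values and computes the log fold change as the difference of the natural log of (mean + 1) in each group, which approximately mirrors how Seurat 3 reports `avg_logFC`. Gene names and values are hypothetical.

```python
import math


def pct_expressing(values):
    """Fraction of cells in which the gene is detected (value > 0)."""
    return sum(1 for v in values if v > 0) / len(values)


def log_fold_change(group1, group2):
    """Natural-log fold change of mean expression, computed as ln(mean + 1) per group."""
    mean1 = sum(group1) / len(group1)
    mean2 = sum(group2) / len(group2)
    return math.log(mean1 + 1) - math.log(mean2 + 1)


def passes_filters(group1, group2, logfc_threshold=0.2, min_pct=0.1):
    """Keep a gene if |logFC| >= threshold AND it is detected in >= min_pct of
    the cells in at least one of the two populations."""
    lfc = log_fold_change(group1, group2)
    pct1, pct2 = pct_expressing(group1), pct_expressing(group2)
    return abs(lfc) >= logfc_threshold and max(pct1, pct2) >= min_pct


if __name__ == "__main__":
    # Hypothetical normalized expression: (cells of one cell type, all other cells).
    genes = {
        "gene_a": ([4, 3, 5, 0], [0, 1, 0, 0]),  # strong change, widely detected
        "gene_b": ([1, 1, 1, 1], [1, 1, 1, 1]),  # no change
        "gene_c": ([0] * 19 + [50], [0] * 20),   # large change, detected in only 5% of cells
    }
    kept = [g for g, (a, b) in genes.items() if passes_filters(a, b)]
    print(kept)  # ['gene_a']
```

`gene_c` illustrates why the detection filter matters: a single high-expressing cell produces a large fold change that would otherwise pass.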
Figure 3 Up-regulated (left) and down-regulated (right) differentially expressed genes
Figure 4 Expression values for the top N genes in each cell type
Figure 5 Enrichment Analysis using KEGG Pathways
Single-cell pseudotime and trajectory analyses are computational
methods used to infer the developmental progression and lineage
relationships of individual cells within a population. These approaches
use single-cell transcriptomic data to infer the temporal ordering of
cells and their relationships in gene expression space, enabling the
reconstruction of developmental trajectories and the identification of
key regulatory events during cellular development. Pseudotime analysis
orders cells along an inferred trajectory, representing their
progression from an undifferentiated state to a mature cell type. The
left plot in Figure 6 shows the pseudotime analysis performed with the
R package Slingshot. Trajectory analysis maps the branching patterns
and connections between cell lineages, providing insight into cellular
differentiation and dynamic processes such as cell fate decisions and
cell state transitions. We performed trajectory analysis with the
Monocle 2 package; the result is shown in the right plot in Figure 6.
Figure 6 Pseudotime (Left) and Trajectory (Right) Analysis
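To make the idea behind lineage inference concrete, the sketch below implements a highly simplified version of Slingshot's first stage. It builds a minimum spanning tree over cluster centroids (using Prim's algorithm), measures each cluster's distance along the tree from a chosen root cluster, and assigns a cell a pseudotime by projecting it onto the tree edge leading to its cluster. Slingshot itself goes further and fits simultaneous principal curves, and Monocle 2 uses a different method altogether; the cluster names, coordinates, and function names here are hypothetical.

```python
import math


def minimum_spanning_tree(centroids):
    """Prim's algorithm on the complete graph of cluster centroids.
    Returns (parent, child, edge_length) tuples."""
    names = list(centroids)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        parent, child = min(
            ((p, c) for p in in_tree for c in names if c not in in_tree),
            key=lambda pc: math.dist(centroids[pc[0]], centroids[pc[1]]),
        )
        edges.append((parent, child, math.dist(centroids[parent], centroids[child])))
        in_tree.add(child)
    return edges


def distances_from_root(edges, root):
    """Path length along the tree from `root` to each cluster, and each cluster's parent."""
    neighbours = {}
    for a, b, length in edges:
        neighbours.setdefault(a, []).append((b, length))
        neighbours.setdefault(b, []).append((a, length))
    dist, parent = {root: 0.0}, {root: None}
    stack = [root]
    while stack:
        node = stack.pop()
        for nxt, length in neighbours.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + length
                parent[nxt] = node
                stack.append(nxt)
    return dist, parent


def cell_pseudotime(cell, cluster, centroids, dist, parent):
    """Project a cell onto the tree edge joining its cluster's parent to its cluster.
    Cells in the root cluster all get pseudotime 0 in this simplified sketch."""
    p = parent[cluster]
    if p is None:
        return 0.0
    a, b = centroids[p], centroids[cluster]
    ab = [bi - ai for ai, bi in zip(a, b)]
    ac = [ci - ai for ai, ci in zip(a, cell)]
    t = sum(x * y for x, y in zip(ab, ac)) / sum(x * x for x in ab)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return dist[p] + t * math.dist(a, b)


if __name__ == "__main__":
    # Hypothetical 2-D (e.g. UMAP) centroids; "stem" is chosen as the root cluster.
    centroids = {"stem": (0, 0), "progenitor": (3, 0), "mature_A": (6, 1), "mature_B": (5, -3)}
    edges = minimum_spanning_tree(centroids)
    dist, parent = distances_from_root(edges, "stem")
    print(sorted(dist, key=dist.get))  # ['stem', 'progenitor', 'mature_A', 'mature_B']
    print(round(cell_pseudotime((4.5, 0.5), "mature_A", centroids, dist, parent), 3))
```

The tree branches at `progenitor` into two lineages, which is the kind of branching structure the right plot of Figure 6 visualizes for the real data.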
Overall, the analysis was completed successfully. All supporting documents, including the raw data, have been transferred to you and should assist with any further validation and with the pursuit of your key research questions.
Table 2: List of software used in the analysis pipeline.
Software | Version |
---|---|
Cellranger | 7.1.10 |
R-Seurat | 3.2.2 |
R-clusterProfiler | 3.18.1 |
R-slingshot | 1.8.0 |
Reference genome and annotation | mm10 |
R | 4.2.0 |
Table 3: List of important files from the data analysis
Path | Description |
---|---|
01.CellRange | Cell Ranger output files |
02.QC-Analysis | QC plots and tables |
03.HeatMap-Analysis | Heatmap analysis of the top 10 genes |
04.Clustering-Analysis | Plots and tables for clustering analysis performed using t-SNE and UMAP |
05.DE-Analysis | Plots and tables for differential expression analysis, including KEGG enrichment |
05.DE-Analysis/*-DE-genes.csv | Differentially expressed genes per cell type |
Address: 126 Corporate Boulevard, South Plainfield, New Jersey 07080
Email: custom-services@admerahealth.com
Phone: 908-222-0533