zhanghao-njmu/SCP

An end-to-end Single-Cell Pipeline designed to facilitate comprehensive analysis and exploration of single-cell data.

RPythonC++scrna-seqshiny-appssingle-cellshinyappssingle-cell-genomicssingle-cell-rna-seqsingle-cell-analysissingle-cell-atac-seqseuratsingle-cell-omicsscrna-seq-analysissingle-cell-sequencingshiny-appscanpyscatac-seqanndatascrna-pipelineseurat-objectssingle-cell-pipeline
This is stars and forks stats for /zhanghao-njmu/SCP repository. As of 29 Apr, 2024 this repository has 201 stars and 40 forks.

SCP: Single-Cell Pipeline SCP provides a comprehensive set of tools for single-cell data processing and downstream analysis. The package includes the following facilities: Integrated single-cell quality control methods. Pipelines embedded with multiple methods for normalization, feature reduction, and cell population identification (standard Seurat workflow). Pipelines embedded with multiple integration methods for scRNA-seq or scATAC-seq data, including Uncorrected, Seurat, scVI, MNN, fastMNN, Harmony, Scanorama, BBKNN, CSS, LIGER, Conos, ComBat. Multiple single-cell downstream analyses such as identification of differential features, enrichment analysis, GSEA analysis, identification of dynamic features, PAGA, RNA velocity, Palantir, Monocle2, Monocle3, etc. Multiple methods for automatic annotation of single-cell data and methods for projection between single-cell datasets. High-quality data visualization methods. Fast deployment of single-cell data into SCExplorer, a shiny app that provides an interactive visualization interface. The functions in the SCP package are all developed around the Seurat object and are compatible with other Seurat functions. R version requirement R >= 4.1.0 Installation in the global R environment You can install the latest version of SCP from GitHub with: if (!require("devtools", quietly = TRUE)) { install.packages("devtools") } devtools::install_github("zhanghao-njmu/SCP") Create SCP python environment To run functions such as RunPAGA or RunSCVELO, SCP requires conda to create a separate python environment. The default environment name is "SCP_env". You can specify the environment name for SCP by setting options(SCP_env_name="new_name") Now, you can run PrepareEnv() to create the python environment for SCP. If the conda binary is not found, it will automatically download and install miniconda. SCP::PrepareEnv() To force SCP to use a specific conda binary, it is recommended to set reticulate.conda_binary R option: options(reticulate.conda_binary = "/path/to/conda") SCP::PrepareEnv() If the download of miniconda or pip packages is slow, you can specify the miniconda repo and PyPI mirror according to your network region. SCP::PrepareEnv( miniconda_repo = "https://mirrors.bfsu.edu.cn/anaconda/miniconda", pip_options = "-i https://pypi.tuna.tsinghua.edu.cn/simple" ) Available miniconda repositories: https://repo.anaconda.com/miniconda (default) http://mirrors.aliyun.com/anaconda/miniconda https://mirrors.bfsu.edu.cn/anaconda/miniconda https://mirrors.pku.edu.cn/anaconda/miniconda https://mirror.nju.edu.cn/anaconda/miniconda https://mirrors.sustech.edu.cn/anaconda/miniconda https://mirrors.xjtu.edu.cn/anaconda/miniconda https://mirrors.hit.edu.cn/anaconda/miniconda Available PyPI mirrors: https://pypi.python.org/simple (default) https://mirrors.aliyun.com/pypi/simple https://pypi.tuna.tsinghua.edu.cn/simple https://mirrors.pku.edu.cn/pypi/simple https://mirror.nju.edu.cn/pypi/web/simple https://mirrors.sustech.edu.cn/pypi/simple https://mirrors.xjtu.edu.cn/pypi/simple https://mirrors.hit.edu.cn/pypi/web/simple Installation in an isolated R environment using renv If you do not want to change your current R environment or require reproducibility, you can use the renv package to install SCP into an isolated R environment. Create an isolated R environment if (!require("renv", quietly = TRUE)) { install.packages("renv") } dir.create("~/SCP_env", recursive = TRUE) # It cannot be the home directory "~" ! renv::init(project = "~/SCP_env", bare = TRUE, restart = TRUE) Option 1: Install SCP from GitHub and create SCP python environment renv::activate(project = "~/SCP_env") renv::install("BiocManager") renv::install("zhanghao-njmu/SCP", repos = BiocManager::repositories()) SCP::PrepareEnv() Option 2: If SCP is already installed in the global environment, copy SCP from the local library renv::activate(project = "~/SCP_env") renv::hydrate("SCP") SCP::PrepareEnv() Activate SCP environment first before use renv::activate(project = "~/SCP_env") library(SCP) data("pancreas_sub") pancreas_sub <- RunPAGA(srt = pancreas_sub, group_by = "SubCellType", linear_reduction = "PCA", nonlinear_reduction = "UMAP") CellDimPlot(pancreas_sub, group.by = "SubCellType", reduction = "draw_graph_fr") Save and restore the state of SCP environment renv::snapshot(project = "~/SCP_env") renv::restore(project = "~/SCP_env") Quick Start Data exploration CellQC Standard pipeline Integration pipeline Cell projection between single-cell datasets Cell annotation using bulk RNA-seq datasets Cell annotation using single-cell datasets PAGA analysis Velocity analysis Differential expression analysis Enrichment analysis(over-representation) Enrichment analysis(GSEA) Trajectory inference Dynamic features Interactive data visualization with SCExplorer Other visualization examples Data exploration The analysis is based on a subsetted version of mouse pancreas data. library(SCP) library(BiocParallel) register(MulticoreParam(workers = 8, progressbar = TRUE)) data("pancreas_sub") print(pancreas_sub) #> An object of class Seurat #> 47874 features across 1000 samples within 3 assays #> Active assay: RNA (15958 features, 3467 variable features) #> 2 other assays present: spliced, unspliced #> 2 dimensional reductions calculated: PCA, UMAP CellDimPlot( srt = pancreas_sub, group.by = c("CellType", "SubCellType"), reduction = "UMAP", theme_use = "theme_blank" ) CellDimPlot( srt = pancreas_sub, group.by = "SubCellType", stat.by = "Phase", reduction = "UMAP", theme_use = "theme_blank" ) FeatureDimPlot( srt = pancreas_sub, features = c("Sox9", "Neurog3", "Fev", "Rbp4"), reduction = "UMAP", theme_use = "theme_blank" ) FeatureDimPlot( srt = pancreas_sub, features = c("Ins1", "Gcg", "Sst", "Ghrl"), compare_features = TRUE, label = TRUE, label_insitu = TRUE, reduction = "UMAP", theme_use = "theme_blank" ) ht <- GroupHeatmap( srt = pancreas_sub, features = c( "Sox9", "Anxa2", # Ductal "Neurog3", "Hes6", # EPs "Fev", "Neurod1", # Pre-endocrine "Rbp4", "Pyy", # Endocrine "Ins1", "Gcg", "Sst", "Ghrl" # Beta, Alpha, Delta, Epsilon ), group.by = c("CellType", "SubCellType"), heatmap_palette = "YlOrRd", cell_annotation = c("Phase", "G2M_score", "Cdh2"), cell_annotation_palette = c("Dark2", "Paired", "Paired"), show_row_names = TRUE, row_names_side = "left", add_dot = TRUE, add_reticle = TRUE ) print(ht$plot) CellQC pancreas_sub <- RunCellQC(srt = pancreas_sub) CellDimPlot(srt = pancreas_sub, group.by = "CellQC", reduction = "UMAP") CellStatPlot(srt = pancreas_sub, stat.by = "CellQC", group.by = "CellType", label = TRUE) CellStatPlot( srt = pancreas_sub, stat.by = c( "db_qc", "outlier_qc", "umi_qc", "gene_qc", "mito_qc", "ribo_qc", "ribo_mito_ratio_qc", "species_qc" ), plot_type = "upset", stat_level = "Fail" ) Standard pipeline pancreas_sub <- Standard_SCP(srt = pancreas_sub) CellDimPlot( srt = pancreas_sub, group.by = c("CellType", "SubCellType"), reduction = "StandardUMAP2D", theme_use = "theme_blank" ) CellDimPlot3D(srt = pancreas_sub, group.by = "SubCellType") FeatureDimPlot3D(srt = pancreas_sub, features = c("Sox9", "Neurog3", "Fev", "Rbp4")) Integration pipeline Example data for integration is a subsetted version of panc8(eight human pancreas datasets) data("panc8_sub") panc8_sub <- Integration_SCP(srtMerge = panc8_sub, batch = "tech", integration_method = "Seurat") CellDimPlot( srt = panc8_sub, group.by = c("celltype", "tech"), reduction = "SeuratUMAP2D", title = "Seurat", theme_use = "theme_blank" ) UMAP embeddings based on different integration methods in SCP: Cell projection between single-cell datasets panc8_rename <- RenameFeatures(srt = panc8_sub, newnames = make.unique(capitalize(rownames(panc8_sub), force_tolower = TRUE)), assays = "RNA") srt_query <- RunKNNMap(srt_query = pancreas_sub, srt_ref = panc8_rename, ref_umap = "SeuratUMAP2D") ProjectionPlot( srt_query = srt_query, srt_ref = panc8_rename, query_group = "SubCellType", ref_group = "celltype" ) Cell annotation using bulk RNA-seq datasets data("ref_scMCA") pancreas_sub <- RunKNNPredict(srt_query = pancreas_sub, bulk_ref = ref_scMCA, filter_lowfreq = 20) CellDimPlot(srt = pancreas_sub, group.by = "KNNPredict_classification", reduction = "UMAP", label = TRUE) Cell annotation using single-cell datasets pancreas_sub <- RunKNNPredict( srt_query = pancreas_sub, srt_ref = panc8_rename, ref_group = "celltype", filter_lowfreq = 20 ) CellDimPlot(srt = pancreas_sub, group.by = "KNNPredict_classification", reduction = "UMAP", label = TRUE) pancreas_sub <- RunKNNPredict( srt_query = pancreas_sub, srt_ref = panc8_rename, query_group = "SubCellType", ref_group = "celltype", return_full_distance_matrix = TRUE ) CellDimPlot(srt = pancreas_sub, group.by = "KNNPredict_classification", reduction = "UMAP", label = TRUE) ht <- CellCorHeatmap( srt_query = pancreas_sub, srt_ref = panc8_rename, query_group = "SubCellType", ref_group = "celltype", nlabel = 3, label_by = "row", show_row_names = TRUE, show_column_names = TRUE ) print(ht$plot) PAGA analysis pancreas_sub <- RunPAGA( srt = pancreas_sub, group_by = "SubCellType", linear_reduction = "PCA", nonlinear_reduction = "UMAP" ) PAGAPlot(srt = pancreas_sub, reduction = "UMAP", label = TRUE, label_insitu = TRUE, label_repel = TRUE) Velocity analysis To estimate RNA velocity, you need to have both “spliced” and “unspliced” assays in your Seurat object. You can generate these matrices using velocyto, bustools, or alevin. pancreas_sub <- RunSCVELO( srt = pancreas_sub, group_by = "SubCellType", linear_reduction = "PCA", nonlinear_reduction = "UMAP" ) VelocityPlot(srt = pancreas_sub, reduction = "UMAP", group_by = "SubCellType") VelocityPlot(srt = pancreas_sub, reduction = "UMAP", plot_type = "stream") Differential expression analysis pancreas_sub <- RunDEtest(srt = pancreas_sub, group_by = "CellType", fc.threshold = 1, only.pos = FALSE) VolcanoPlot(srt = pancreas_sub, group_by = "CellType") DEGs <- pancreas_sub@tools$DEtest_CellType$AllMarkers_wilcox DEGs <- DEGs[with(DEGs, avg_log2FC > 1 & p_val_adj < 0.05), ] # Annotate features with transcription factors and surface proteins pancreas_sub <- AnnotateFeatures(pancreas_sub, species = "Mus_musculus", db = c("TF", "CSPA")) ht <- FeatureHeatmap( srt = pancreas_sub, group.by = "CellType", features = DEGs$gene, feature_split = DEGs$group1, species = "Mus_musculus", db = c("GO_BP", "KEGG", "WikiPathway"), anno_terms = TRUE, feature_annotation = c("TF", "CSPA"), feature_annotation_palcolor = list(c("gold", "steelblue"), c("forestgreen")), height = 5, width = 4 ) print(ht$plot) Enrichment analysis(over-representation) pancreas_sub <- RunEnrichment( srt = pancreas_sub, group_by = "CellType", db = "GO_BP", species = "Mus_musculus", DE_threshold = "avg_log2FC > log2(1.5) & p_val_adj < 0.05" ) EnrichmentPlot( srt = pancreas_sub, group_by = "CellType", group_use = c("Ductal", "Endocrine"), plot_type = "bar" ) EnrichmentPlot( srt = pancreas_sub, group_by = "CellType", group_use = c("Ductal", "Endocrine"), plot_type = "wordcloud" ) EnrichmentPlot( srt = pancreas_sub, group_by = "CellType", group_use = c("Ductal", "Endocrine"), plot_type = "wordcloud", word_type = "feature" ) EnrichmentPlot( srt = pancreas_sub, group_by = "CellType", group_use = "Ductal", plot_type = "network" ) To ensure that labels are visible, you can adjust the size of the viewer panel on Rstudio IDE. EnrichmentPlot( srt = pancreas_sub, group_by = "CellType", group_use = "Ductal", plot_type = "enrichmap" ) EnrichmentPlot(srt = pancreas_sub, group_by = "CellType", plot_type = "comparison") Enrichment analysis(GSEA) pancreas_sub <- RunGSEA( srt = pancreas_sub, group_by = "CellType", db = "GO_BP", species = "Mus_musculus", DE_threshold = "p_val_adj < 0.05" ) GSEAPlot(srt = pancreas_sub, group_by = "CellType", group_use = "Endocrine", id_use = "GO:0007186") GSEAPlot(srt = pancreas_sub, group_by = "CellType", plot_type = "comparison") Trajectory inference pancreas_sub <- RunSlingshot(srt = pancreas_sub, group.by = "SubCellType", reduction = "UMAP") FeatureDimPlot(pancreas_sub, features = paste0("Lineage", 1:3), reduction = "UMAP", theme_use = "theme_blank") CellDimPlot(pancreas_sub, group.by = "SubCellType", reduction = "UMAP", lineages = paste0("Lineage", 1:3), lineages_span = 0.1) Dynamic features pancreas_sub <- RunDynamicFeatures(srt = pancreas_sub, lineages = c("Lineage1", "Lineage2"), n_candidates = 200) ht <- DynamicHeatmap( srt = pancreas_sub, lineages = c("Lineage1", "Lineage2"), use_fitted = TRUE, n_split = 6, reverse_ht = "Lineage1", species = "Mus_musculus", db = "GO_BP", anno_terms = TRUE, anno_keys = TRUE, anno_features = TRUE, heatmap_palette = "viridis", cell_annotation = "SubCellType", separate_annotation = list("SubCellType", c("Nnat", "Irx1")), separate_annotation_palette = c("Paired", "Set1"), feature_annotation = c("TF", "CSPA"), feature_annotation_palcolor = list(c("gold", "steelblue"), c("forestgreen")), pseudotime_label = 25, pseudotime_label_color = "red", height = 5, width = 2 ) print(ht$plot) DynamicPlot( srt = pancreas_sub, lineages = c("Lineage1", "Lineage2"), group.by = "SubCellType", features = c("Plk1", "Hes1", "Neurod2", "Ghrl", "Gcg", "Ins2"), compare_lineages = TRUE, compare_features = FALSE ) FeatureStatPlot( srt = pancreas_sub, group.by = "SubCellType", bg.by = "CellType", stat.by = c("Sox9", "Neurod2", "Isl1", "Rbp4"), add_box = TRUE, comparisons = list( c("Ductal", "Ngn3 low EP"), c("Ngn3 high EP", "Pre-endocrine"), c("Alpha", "Beta") ) ) Interactive data visualization with SCExplorer PrepareSCExplorer(list(mouse_pancreas = pancreas_sub, human_pancreas = panc8_sub), base_dir = "./SCExplorer") app <- RunSCExplorer(base_dir = "./SCExplorer") list.files("./SCExplorer") # This directory can be used as site directory for Shiny Server. if (interactive()) { shiny::runApp(app) } Other visualization examples CellDimPlot CellStatPlot FeatureStatPlot GroupHeatmap You can also find more examples in the documentation of the function: Integration_SCP, RunKNNMap, RunMonocle3, RunPalantir, etc.
Read on GithubGithub Stats Page
repotechsstarsweeklyforksweekly
cdcepi/Flusight-forecast-dataR370630
cran-task-views/MixedModelsRMakefile22070
sr1919/RR0000
Homebrew/homebrew-cask-versionsRuby1.1k01.1k0
AmbiML/sparrow-kata-fullRustPythonC2400190
hananbeer/sqliditySolidityPython121080
aws-amplify/amplify-swiftSwiftCShell378+2174+2
jefferyf/dotfilesVim ScriptRubyShell0000
baldsealion/Steamdeck-Ultimate-Windows11-GuideRich Text Format248030
joeycastillo/Sensor-WatchCHTMLMakefile88401650