The primary objective of MGPfact is to deconstruct complex cellular trajectories into a series of interpretable bifurcation events, each uniquely characterizing a specific biological process, defined and represented by a group of influential genes. This targeted embedding approach endows MGPfact with a significant advantage in capturing subtle variations within biological pathways. Building on this, we have further developed differential gene analysis for various biological branching processes, as well as consensus trajectories that depict the intricate processes of biological development.
In this vignette, we demonstrate how to perform trajectory reconstruction using MGPfact.
Prior to running this vignette, please install MGPfactR and MGPfact.jl.
library(MGPfactR)
# Alternatively, you can set your julia_home directory.
julia_home = gsub("/julia$", "", system("which julia", intern = TRUE))
In principle, MGPfact accepts gene expression matrices in all forms. To align with the majority of existing analysis workflows, we utilize data that has been normalized for presentation. The test dataset originates from Saelens et al., and can be downloaded via a website link. `
The parameter dir
indicates the path where the analysis
results will be stored. CreateMGPfactObject
will construct
a folder under dir; if not set, it will default to the current
directory.
We can control the amount of downsampling using the parameter
max_murp
. By default, we believe that 50 to 150 MURP
subsets are sufficient to demonstrate the variations in differentiation
states.
ct = MURPDownsampling(ct, plot = T, max_murp = 50)
After downsampling, each MURP represents a MURP subset, and we can map the labels of the cells onto the downsampled MURP subsets.
ct = GetMURPMapLabel(ct, labels = c("time_point", "experiment"))
PlotLabelMerge(ct, labels = "time_point", width = 6)
# Store the PC results of MURP locally
SaveMURPDatToJulia(ct, murp_pc_number = 10)
MGPfact allows for the selection of initial nodes, and we offer two methods:
If there is no starting point, you can set start_murp
to
999, which is an index that does not exist in MURP.
# 1) Find start murp using cells start_cell = c('1_iN2_C82') sapply(start_cell, function(st){ ind = which(names(ct@MURP$Recommended_K_cl$cluster)==st) x =
# as.vector(ct@MURP$Recommended_K_cl$cluster[ind]) return(x) }) -> start_murp
# 2) choose start_murp using murp label
start_murp = c(24, 2, 35)
# Set settings The parameter 'chains_number' represents the number of Markov chains, and it is generally set to no more than 10. The parameter 'pse_optim_iterations' refers to the number of sampling
# iterations. Generally speaking, the greater the number of sampling iterations, the more stable the convergence of the results, but this will be accompanied by increased time consumption.
ct = SetSettings(ct, murp_pc_number = 3, trajectory_number = 3, pse_optim_iterations = 200, start_murp = start_murp, chains_number = 5)
save(ct, file = "ct.rda")
writeSettings(ct)
# 1. pse_optim
ct = RunningmodMGPpseudoT(ct, julia_home = julia_home, cores = 1)
ct <- GetIterCor(ct, iteration_list = list(c(1, getParams(ct, "pse_optim_iterations"))))
ct <- GetPredT(object = ct, chains = 1:getParams(ct, "chains_number"))
ct <- GetPseSdf(ct)
# This function can be used to obtain trajectory branch labels and pseudo time information.
sdf <- GetMURPInfo(ct)
ct <- GetBinTree(object = ct)
ct <- GetTbTree(object = ct)
ct <- GetTbTreeAllpoint(object = ct, save = T, labels = getParams(ct, "label"))
PlotBinBif(ct, plot_label = TRUE)
PlotPieBinLabel(ct, labels = "time_point")
PlotPieTbLabel(ct, labels = "time_point")
PlotPieConsensusMainLabel(ct, labels = "time_point")
PlotPieConsensusAllLabel(ct, labels = "time_point", cell_size = 5, cell_alpha = 0.2)
ct = GetSigmaFromJulia(object = ct, load = F, julia_home = julia_home)
ct = RunningmodMGPtrack(object = ct, iter = 800, cores = 10)
ct = GetTrackSdfFromOptim(object = ct)
TrajPlot(ct, save = TRUE, legend_title = "Phase", rug = F)
TrajPlot(ct, col = "time_point", save = TRUE, legend_title = "time_point")
Extract the high-weight gene sets for each trajectory, which are the decisive embedding features we referred to for each trajectory.
# filter highly weight genes
ct = WeightGeneFilter(ct, method = "weight_cut", weight_cut = 0.03)
# get gene weight
gene_weight = GetGeneWeight(ct)
# extract gene module for each trajectory
filter_g = WriteWeightGene(ct, "weight_cut")
x = read.csv("gene_weight_weight_cut_6923_6923.csv")
head(x)
Trajectory.1 | Weight.1 | Trajectory.2 | Weight.2 | Trajectory.3 | Weight.3 |
---|---|---|---|---|---|
Sparc | 0.120 | S100a6 | 0.113 | Cox6a1 | 0.085 |
Lgals1 | 0.120 | Malat1 | 0.106 | S100a6 | 0.083 |
1810020D17Rik | 0.110 | Acta2 | 0.096 | Actb | 0.081 |
Tpm2 | 0.103 | Ascl1 | 0.090 | Ftl1 | 0.078 |
Id3 | 0.099 | Id2 | 0.088 | Acta2 | 0.075 |
Id2 | 0.098 | Cox8b | 0.088 | Eif5a | 0.074 |
Differences of top 1 weight gene of each trajectory at different trajectory phase.
GeneBoxPhase(object = ct, gene = c("Lgals1", "S100a6", "Cox6a1"), aspect = "cell")
Additionally, we employed Gaussian process regression to design the estimation of the branching expression trend for any given gene across different trajectories, which we then visualized. The smooth curves represent the fitted values of expression, while the points indicate the average expression of the gene within the MURP subset.
ct = GetMURPGene(ct)
ct = GetCurveSdf(ct)
ct = GetGeneCurve(object = ct, genes = c("Lgals1", "S100a6", "Cox6a1"), n.samples = 100)
GeneCurvePlot(object = ct, gene = c("Lgals1", "S100a6", "Cox6a1"), col = "time_point")
We also offer various methods for calculating differential genes,
such as mean differences between branches
(ComputeMeanDiff
), statistical models for the interaction
effects of time and branches
(ComputeInteractionEffectGene
), and significant
correlations with time (ComputeCorGeneT
).
ct = ComputeInteractionEffectGene(ct, save_mod = FALSE, adjust_method = "fdr")
ct@BeautGene = ComputeSigGene(ct@BeautGene, "lm", "fdr", 0.05)
# write significant genes to txt
WriteGene2TXT(ct@BeautGene, "lm", "bonferroni", "c", 0.05)
save(ct, file = "ct.rda")
writeSettings(ct)
sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=zh_CN.UTF-8
## [5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=zh_CN.UTF-8
## [7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Asia/Shanghai
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.31 R6_2.5.1 fastmap_1.1.1 xfun_0.39
## [5] cachem_1.0.8 knitr_1.43 htmltools_0.5.5 rmarkdown_2.22
## [9] cli_3.6.1 sass_0.4.6 jquerylib_0.1.4 compiler_4.3.0
## [13] rstudioapi_0.14 tools_4.3.0 evaluate_0.21 bslib_0.5.0
## [17] yaml_2.3.7 formatR_1.14 rlang_1.1.1 jsonlite_1.8.5