The primary objective of MGPfact is to deconstruct complex cellular trajectories into a series of interpretable bifurcation events, each uniquely characterizing a specific biological process, defined and represented by a group of influential genes. This targeted embedding approach endows MGPfact with a significant advantage in capturing subtle variations within biological pathways. Building on this, we have further developed differential gene analysis for various biological branching processes, as well as consensus trajectories that depict the intricate processes of biological development.

In this vignette, we demonstrate how to perform trajectory reconstruction using MGPfact.

Prior to running this vignette, please install MGPfactR and MGPfact.jl.

library(MGPfactR)
# Alternatively, you can set your julia_home directory.
julia_home = gsub("/julia$", "", system("which julia", intern = TRUE))

Create MGPfact Object

In principle, MGPfact accepts gene expression matrices in all forms. To align with the majority of existing analysis workflows, we utilize data that has been normalized for presentation. The test dataset originates from Saelens et al., and can be downloaded via a website link. `

The parameter dir indicates the path where the analysis results will be stored. CreateMGPfactObject will construct a folder under dir; if not set, it will default to the current directory.

MURP downsampling

We can control the amount of downsampling using the parameter max_murp. By default, we believe that 50 to 150 MURP subsets are sufficient to demonstrate the variations in differentiation states.

ct = MURPDownsampling(ct, plot = T, max_murp = 50)

After downsampling, each MURP represents a MURP subset, and we can map the labels of the cells onto the downsampled MURP subsets.

ct = GetMURPMapLabel(ct, labels = c("time_point", "experiment"))
PlotLabelMerge(ct, labels = "time_point", width = 6)

Visualization of attribute tags on cells.

Visualization of attribute tags on MURP.

Initialize parameters

# Store the PC results of MURP locally
SaveMURPDatToJulia(ct, murp_pc_number = 10)

MGPfact allows for the selection of initial nodes, and we offer two methods:

If there is no starting point, you can set start_murp to 999, which is an index that does not exist in MURP.

# 1) Find start murp using cells start_cell = c('1_iN2_C82') sapply(start_cell, function(st){ ind = which(names(ct@MURP$Recommended_K_cl$cluster)==st) x =
# as.vector(ct@MURP$Recommended_K_cl$cluster[ind]) return(x) }) -> start_murp

# 2) choose start_murp using murp label
start_murp = c(24, 2, 35)

# Set settings The parameter 'chains_number' represents the number of Markov chains, and it is generally set to no more than 10.  The parameter 'pse_optim_iterations' refers to the number of sampling
# iterations. Generally speaking, the greater the number of sampling iterations, the more stable the convergence of the results, but this will be accompanied by increased time consumption.
ct = SetSettings(ct, murp_pc_number = 3, trajectory_number = 3, pse_optim_iterations = 200, start_murp = start_murp, chains_number = 5)
save(ct, file = "ct.rda")
writeSettings(ct)

Decomposable cellular trajectories.

1) Estimate model parameters through MCMC sampling
# 1. pse_optim
ct = RunningmodMGPpseudoT(ct, julia_home = julia_home, cores = 1)
2) Organize the results of parameter optimization.
ct <- GetIterCor(ct, iteration_list = list(c(1, getParams(ct, "pse_optim_iterations"))))
ct <- GetPredT(object = ct, chains = 1:getParams(ct, "chains_number"))
ct <- GetPseSdf(ct)

# This function can be used to obtain trajectory branch labels and pseudo time information.
sdf <- GetMURPInfo(ct)
3) Construct trajectory
ct <- GetBinTree(object = ct)
ct <- GetTbTree(object = ct)
ct <- GetTbTreeAllpoint(object = ct, save = T, labels = getParams(ct, "label"))
4) Trajectory visualization
PlotBinBif(ct, plot_label = TRUE)

PlotPieBinLabel(ct, labels = "time_point")

PlotPieTbLabel(ct, labels = "time_point")

PlotPieConsensusMainLabel(ct, labels = "time_point")

PlotPieConsensusAllLabel(ct, labels = "time_point", cell_size = 5, cell_alpha = 0.2)

Streamlined trajectory and factorizing gene modules

1) Obtain the covariance matrix corresponding to each bifurcation event.
ct = GetSigmaFromJulia(object = ct, load = F, julia_home = julia_home)
2) Streamlined trajectory
ct = RunningmodMGPtrack(object = ct, iter = 800, cores = 10)
ct = GetTrackSdfFromOptim(object = ct)
3) visualization
TrajPlot(ct, save = TRUE, legend_title = "Phase", rug = F)
TrajPlot(ct, col = "time_point", save = TRUE, legend_title = "time_point")

4) Factorizing gene modules

Extract the high-weight gene sets for each trajectory, which are the decisive embedding features we referred to for each trajectory.

# filter highly weight genes
ct = WeightGeneFilter(ct, method = "weight_cut", weight_cut = 0.03)

# get gene weight
gene_weight = GetGeneWeight(ct)

# extract gene module for each trajectory
filter_g = WriteWeightGene(ct, "weight_cut")
x = read.csv("gene_weight_weight_cut_6923_6923.csv")
head(x)
Trajectory.1 Weight.1 Trajectory.2 Weight.2 Trajectory.3 Weight.3
Sparc 0.120 S100a6 0.113 Cox6a1 0.085
Lgals1 0.120 Malat1 0.106 S100a6 0.083
1810020D17Rik 0.110 Acta2 0.096 Actb 0.081
Tpm2 0.103 Ascl1 0.090 Ftl1 0.078
Id3 0.099 Id2 0.088 Acta2 0.075
Id2 0.098 Cox8b 0.088 Eif5a 0.074

Differences of top 1 weight gene of each trajectory at different trajectory phase.

GeneBoxPhase(object = ct, gene = c("Lgals1", "S100a6", "Cox6a1"), aspect = "cell")

Additionally, we employed Gaussian process regression to design the estimation of the branching expression trend for any given gene across different trajectories, which we then visualized. The smooth curves represent the fitted values of expression, while the points indicate the average expression of the gene within the MURP subset.

ct = GetMURPGene(ct)
ct = GetCurveSdf(ct)
ct = GetGeneCurve(object = ct, genes = c("Lgals1", "S100a6", "Cox6a1"), n.samples = 100)
GeneCurvePlot(object = ct, gene = c("Lgals1", "S100a6", "Cox6a1"), col = "time_point")

Differentially expressed genes

We also offer various methods for calculating differential genes, such as mean differences between branches (ComputeMeanDiff), statistical models for the interaction effects of time and branches (ComputeInteractionEffectGene), and significant correlations with time (ComputeCorGeneT).

ct = ComputeInteractionEffectGene(ct, save_mod = FALSE, adjust_method = "fdr")
ct@BeautGene = ComputeSigGene(ct@BeautGene, "lm", "fdr", 0.05)

# write significant genes to txt
WriteGene2TXT(ct@BeautGene, "lm", "bonferroni", "c", 0.05)
save(ct, file = "ct.rda")
writeSettings(ct)
Session Info
sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=zh_CN.UTF-8        LC_COLLATE=zh_CN.UTF-8    
##  [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=zh_CN.UTF-8   
##  [7] LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Asia/Shanghai
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.31   R6_2.5.1        fastmap_1.1.1   xfun_0.39      
##  [5] cachem_1.0.8    knitr_1.43      htmltools_0.5.5 rmarkdown_2.22 
##  [9] cli_3.6.1       sass_0.4.6      jquerylib_0.1.4 compiler_4.3.0 
## [13] rstudioapi_0.14 tools_4.3.0     evaluate_0.21   bslib_0.5.0    
## [17] yaml_2.3.7      formatR_1.14    rlang_1.1.1     jsonlite_1.8.5