quick start for MGPfact

The primary objective of MGPfact is to deconstruct complex cellular trajectories into a series of interpretable bifurcation events, each uniquely characterizing a specific biological process, defined and represented by a group of influential genes. This targeted embedding approach endows MGPfact with a significant advantage in capturing subtle variations within biological pathways. Building on this, we have further developed differential gene analysis for various biological branching processes, as well as consensus trajectories that depict the intricate processes of biological development.

In this vignette, we demonstrate how to perform trajectory reconstruction using MGPfact.

Performed MURP downsampling of the preprocessed scRNA-seq data.
Decomposable cellular trajectories
Streamlined trajectory and factorizing gene modules
Differentially expressed genes.

Prior to running this vignette, please install MGPfactR and MGPfact.jl.

library(MGPfactR)
# Alternatively, you can set your julia_home directory.
julia_home = gsub("/julia$", "", system("which julia", intern = TRUE))

Create MGPfact Object

In principle, MGPfact accepts gene expression matrices in all forms. To align with the majority of existing analysis workflows, we utilize data that has been normalized for presentation. The test dataset originates from Saelens et al., and can be downloaded via a website link. `

The parameter dir indicates the path where the analysis results will be stored. CreateMGPfactObject will construct a folder under dir; if not set, it will default to the current directory.

MURP downsampling

We can control the amount of downsampling using the parameter max_murp. By default, we believe that 50 to 150 MURP subsets are sufficient to demonstrate the variations in differentiation states.

ct = MURPDownsampling(ct, plot = T, max_murp = 50)

After downsampling, each MURP represents a MURP subset, and we can map the labels of the cells onto the downsampled MURP subsets.

ct = GetMURPMapLabel(ct, labels = c("time_point", "experiment"))
PlotLabelMerge(ct, labels = "time_point", width = 6)

Visualization of attribute tags on cells.

Visualization of attribute tags on MURP.

Initialize parameters

# Store the PC results of MURP locally
SaveMURPDatToJulia(ct, murp_pc_number = 10)

MGPfact allows for the selection of initial nodes, and we offer two methods:

Provide specific cell names and determine their corresponding MURP through the names.
Select the starting point through the index labels of MURP in the figure above, and multiple starting points can be set.

If there is no starting point, you can set start_murp to 999, which is an index that does not exist in MURP.

# 1) Find start murp using cells start_cell = c('1_iN2_C82') sapply(start_cell, function(st){ ind = which(names(ct@MURP$Recommended_K_cl$cluster)==st) x =
# as.vector(ct@MURP$Recommended_K_cl$cluster[ind]) return(x) }) -> start_murp

# 2) choose start_murp using murp label
start_murp = c(24, 2, 35)

# Set settings The parameter 'chains_number' represents the number of Markov chains, and it is generally set to no more than 10.  The parameter 'pse_optim_iterations' refers to the number of sampling
# iterations. Generally speaking, the greater the number of sampling iterations, the more stable the convergence of the results, but this will be accompanied by increased time consumption.
ct = SetSettings(ct, murp_pc_number = 3, trajectory_number = 3, pse_optim_iterations = 200, start_murp = start_murp, chains_number = 5)
save(ct, file = "ct.rda")
writeSettings(ct)

Decomposable cellular trajectories.

1) Estimate model parameters through MCMC sampling

# 1. pse_optim
ct = RunningmodMGPpseudoT(ct, julia_home = julia_home, cores = 1)

2) Organize the results of parameter optimization.

ct <- GetIterCor(ct, iteration_list = list(c(1, getParams(ct, "pse_optim_iterations"))))
ct <- GetPredT(object = ct, chains = 1:getParams(ct, "chains_number"))
ct <- GetPseSdf(ct)

# This function can be used to obtain trajectory branch labels and pseudo time information.
sdf <- GetMURPInfo(ct)

3) Construct trajectory

ct <- GetBinTree(object = ct)
ct <- GetTbTree(object = ct)
ct <- GetTbTreeAllpoint(object = ct, save = T, labels = getParams(ct, "label"))

4) Trajectory visualization

PlotBinBif(ct, plot_label = TRUE)

PlotPieBinLabel(ct, labels = "time_point")

PlotPieTbLabel(ct, labels = "time_point")

PlotPieConsensusMainLabel(ct, labels = "time_point")

PlotPieConsensusAllLabel(ct, labels = "time_point", cell_size = 5, cell_alpha = 0.2)

Streamlined trajectory and factorizing gene modules

1) Obtain the covariance matrix corresponding to each bifurcation event.

ct = GetSigmaFromJulia(object = ct, load = F, julia_home = julia_home)

2) Streamlined trajectory

ct = RunningmodMGPtrack(object = ct, iter = 800, cores = 10)
ct = GetTrackSdfFromOptim(object = ct)

3) visualization

TrajPlot(ct, save = TRUE, legend_title = "Phase", rug = F)
TrajPlot(ct, col = "time_point", save = TRUE, legend_title = "time_point")

4) Factorizing gene modules

Extract the high-weight gene sets for each trajectory, which are the decisive embedding features we referred to for each trajectory.

# filter highly weight genes
ct = WeightGeneFilter(ct, method = "weight_cut", weight_cut = 0.03)

# get gene weight
gene_weight = GetGeneWeight(ct)

# extract gene module for each trajectory
filter_g = WriteWeightGene(ct, "weight_cut")

x = read.csv("gene_weight_weight_cut_6923_6923.csv")
head(x)

Trajectory.1	Weight.1	Trajectory.2	Weight.2	Trajectory.3	Weight.3
Sparc	0.120	S100a6	0.113	Cox6a1	0.085
Lgals1	0.120	Malat1	0.106	S100a6	0.083
1810020D17Rik	0.110	Acta2	0.096	Actb	0.081
Tpm2	0.103	Ascl1	0.090	Ftl1	0.078
Id3	0.099	Id2	0.088	Acta2	0.075
Id2	0.098	Cox8b	0.088	Eif5a	0.074

Differences of top 1 weight gene of each trajectory at different trajectory phase.

GeneBoxPhase(object = ct, gene = c("Lgals1", "S100a6", "Cox6a1"), aspect = "cell")

Additionally, we employed Gaussian process regression to design the estimation of the branching expression trend for any given gene across different trajectories, which we then visualized. The smooth curves represent the fitted values of expression, while the points indicate the average expression of the gene within the MURP subset.

ct = GetMURPGene(ct)
ct = GetCurveSdf(ct)
ct = GetGeneCurve(object = ct, genes = c("Lgals1", "S100a6", "Cox6a1"), n.samples = 100)

GeneCurvePlot(object = ct, gene = c("Lgals1", "S100a6", "Cox6a1"), col = "time_point")

Differentially expressed genes

We also offer various methods for calculating differential genes, such as mean differences between branches (ComputeMeanDiff), statistical models for the interaction effects of time and branches (ComputeInteractionEffectGene), and significant correlations with time (ComputeCorGeneT).

ct = ComputeInteractionEffectGene(ct, save_mod = FALSE, adjust_method = "fdr")
ct@BeautGene = ComputeSigGene(ct@BeautGene, "lm", "fdr", 0.05)

# write significant genes to txt
WriteGene2TXT(ct@BeautGene, "lm", "bonferroni", "c", 0.05)

save(ct, file = "ct.rda")
writeSettings(ct)

Session Info

sessionInfo()

## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=zh_CN.UTF-8        LC_COLLATE=zh_CN.UTF-8    
##  [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=zh_CN.UTF-8   
##  [7] LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Asia/Shanghai
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.31   R6_2.5.1        fastmap_1.1.1   xfun_0.39      
##  [5] cachem_1.0.8    knitr_1.43      htmltools_0.5.5 rmarkdown_2.22 
##  [9] cli_3.6.1       sass_0.4.6      jquerylib_0.1.4 compiler_4.3.0 
## [13] rstudioapi_0.14 tools_4.3.0     evaluate_0.21   bslib_0.5.0    
## [17] yaml_2.3.7      formatR_1.14    rlang_1.1.1     jsonlite_1.8.5