muon brings multimodal data objects and multimodal integration methods together.
Multi-omic factor analysis (MOFA) is a group factor analysis method that makes it possible to learn an interpretable latent space jointly across multiple modalities. Intuitively, it can be viewed as a generalisation of PCA for multi-omics data. More information about this method can be found on the MOFA website.
While there are quite a few options to configure the method for the task in question, running it with default options is simple with
>>> mu.tl.mofa(mdata)
>>> "X_mofa" in mdata.obsm
True
For example, the number of factors to learn can be adjusted with
n_factors, and training can be launched on the GPU when available with
gpu_mode=True.
By default, only highly variable features are used with
use_var='highly_variable'. If there is no column
.var['highly_variable'], all features are used. Any other feature selection can be provided with
use_var as long as it is defined for all the assays as a boolean value.
If the variability inside groups of observations (samples or cells) is of interest, and not between them,
groups_label can be provided to account for that during training. For instance, the variability between batches can be accounted for in the MOFA framework. See more details about the multi-group functionality in the MOFA+ FAQ.
Some observations might not be present in some modalities. While
muon.pp.intersect_obs() can be used to make sure only common observations are preserved in all the modalities, MOFA+ also provides an interface to deal with missing data. There are two strategies:
use_obs='intersection' to only use common observations for the training and
use_obs='union' to use all the observations, filling the missing pieces with missing values.
mu.tl.mofa(mdata, use_obs='union') # or mu.tl.mofa(mdata, use_obs='intersection')
Training a factor model on a GPU can considerably reduce the computational time. MOFA+ uses CuPy in order to take advantage of NVIDIA CUDA acceleration:
mu.tl.mofa(mdata, gpu_mode=True)
Familiar clustering algorithms can be run based on the neighbours information from different modalities with
muon.tl.leiden() and muon.tl.louvain(). The resolution can be set for each modality individually. More than that, the contribution of each modality can also be weighted.
>>> mu.tl.leiden(mdata, resolution=[2., .5])
>>> mu.tl.louvain(mdata, mod_weights=[1., .5])
Weighted nearest neighbours (WNN) is a procedure to define a neighbourhood graph for the samples across different feature sets (modalities). It has been described in Hao et al., 2020 and Swanson et al., 2020. As with other neighbour detection methods, it is available in the preprocessing module via
muon.pp.neighbors(). These learned distances can further be used e.g. to construct a latent space:
>>> mu.pp.neighbors(mdata)
>>> mu.tl.umap(mdata)
To manage the complexity of having to deal with multiple modalities, there is a handful of utility functions in muon. This includes in-place filtering: just as it works for a single modality,
muon.pp.filter_obs() and muon.pp.filter_var() will filter observations or variables, respectively, in each modality as well as in the attributes of the MuData object.
In order to keep only the observations present in all the modalities, there is
muon.pp.intersect_obs(). Using the in-place filtering under the hood, it will modify the
MuData object and the contained modalities to only have the common observations: