Multi omics analysis by MOFA and GLUE¶

MOFA is a factor analysis model that provides a general framework for the integration of multi-omic data sets in an unsupervised fashion.

Most of the time, however, we did not get paired cells in the multi-omics analysis. Here, we can pair cells using GLUE, a dimensionality reduction algorithm that can integrate different histological layers, and it can efficiently merge data from different histological layers.

This tutorial focuses on how to perform mofa in multi-omics using GLUE.

Paper: MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data and Multi-omics single-cell data integration and regulatory inference with graph-linked embedding

Code: https://github.com/bioFAM/mofapy2 and https://github.com/gao-lab/GLUE

Colab_Reproducibility：https://colab.research.google.com/drive/1zlakFf20IoBdyuOQDocwFQHu8XOVizRL?usp=sharing

We used the result anndata object rna-emb.h5ad and atac.emb.h5ad from GLUE'tutorial

In [1]:

Copied!

import omicverse as ov
ov.utils.ov_plot_set()
import omicverse as ov
ov.utils.ov_plot_set()

2023-05-19 03:10:24.139649: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-19 03:10:24.226148: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-05-19 03:10:24.226183: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-05-19 03:10:24.729457: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-05-19 03:10:24.729553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-05-19 03:10:24.729559: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

/mnt/data/env/pyomic/lib/python3.8/site-packages/phate/__init__.py

Load the data¶

We use ov.utils.read to read the h5ad files

In [2]:

Copied!

rna=ov.utils.read("chen_rna-emb.h5ad")
atac=ov.utils.read("chen_atac-emb.h5ad")
rna=ov.utils.read("chen_rna-emb.h5ad")
atac=ov.utils.read("chen_atac-emb.h5ad")

Pair the omics¶

Each cell in our rna and atac data has a feature vector, X_glue, based on which we can calculate the Pearson correlation coefficient to perform cell matching.

In [3]:

Copied!

pair_obj=ov.single.GLUE_pair(rna,atac)
pair_obj.correlation()
pair_obj=ov.single.GLUE_pair(rna,atac)
pair_obj.correlation()

......Extract GLUE layer from obs
......Prepare for pair
......Start to calculate the Pearson coef

Now Pearson block is 1/2: 100%|██████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.45it/s]
Now rna_index is 4999/9999, all is 9190: 100%|█████████████████████████████████████████████████████████████| 5000/5000 [01:00<00:00, 82.59it/s]

Now epoch is 0, 5000/9190

Now Pearson block is 1/2: 100%|██████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.70it/s]
Now rna_index is 9189/13379, all is 9190: 100%|████████████████████████████████████████████████████████████| 4190/4190 [00:58<00:00, 71.04it/s]

Now epoch is 1, 9190/9190

We counted the top 50 highly correlated cells in another histology layer for each cell in one of the histology layers to avoid missing data due to one cell being highly correlated with multiple cells. The default minimum threshold for high correlation is 0.9. We can obtain more paired cells by increasing the depth, but note that increasing the depth may lead to higher errors in cell matching

In [4]:

Copied!

res_pair=pair_obj.find_neighbor_cell(depth=20)
res_pair.to_csv('models/chen_pair_res.csv')
res_pair=pair_obj.find_neighbor_cell(depth=20)
res_pair.to_csv('models/chen_pair_res.csv')

Now depth is 19/20: 100%|██████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:01<00:00, 19.49it/s]

We filter to get paired cells

In [14]:

Copied!





rna1=rna[res_pair['omic_1']]
atac1=atac[res_pair['omic_2']]
rna1.obs.index=res_pair.index
atac1.obs.index=res_pair.index
rna1,atac1
rna1=rna[res_pair['omic_1']]
atac1=atac[res_pair['omic_2']]
rna1.obs.index=res_pair.index
atac1.obs.index=res_pair.index
rna1,atac1

Out[14]:

(View of AnnData object with n_obs × n_vars = 5802 × 28930
     obs: 'domain', 'cell_type', 'balancing_weight'
     var: 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'mean', 'std', 'chrom', 'chromStart', 'chromEnd', 'name', 'score', 'strand', 'thickStart', 'thickEnd', 'itemRgb', 'blockCount', 'blockSizes', 'blockStarts', 'gene_id', 'gene_type', 'mgi_id', 'havana_gene', 'tag'
     uns: '__scglue__', 'cell_type_colors', 'hvg', 'log1p', 'neighbors', 'pca', 'umap'
     obsm: 'X_glue', 'X_pca', 'X_umap'
     varm: 'PCs', 'X_glue'
     layers: 'counts'
     obsp: 'connectivities', 'distances',
 View of AnnData object with n_obs × n_vars = 5802 × 241757
     obs: 'domain', 'cell_type', 'balancing_weight'
     var: 'chrom', 'chromStart', 'chromEnd', 'highly_variable'
     uns: '__scglue__', 'cell_type_colors', 'neighbors', 'umap'
     obsm: 'X_glue', 'X_lsi', 'X_umap'
     varm: 'X_glue'
     obsp: 'connectivities', 'distances')

We can use mudata to store the multi-omics

In [6]:

Copied!

from mudata import MuData

mdata = MuData({'rna': rna1, 'atac': atac1})
mdata
from mudata import MuData

mdata = MuData({'rna': rna1, 'atac': atac1})
mdata

Out[6]:

MuData object with n_obs × n_vars = 5802 × 270687
  var:	'highly_variable', 'chrom', 'chromStart', 'chromEnd'
  2 modalities
    rna:	5802 x 28930
      obs:	'domain', 'cell_type', 'balancing_weight'
      var:	'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'mean', 'std', 'chrom', 'chromStart', 'chromEnd', 'name', 'score', 'strand', 'thickStart', 'thickEnd', 'itemRgb', 'blockCount', 'blockSizes', 'blockStarts', 'gene_id', 'gene_type', 'mgi_id', 'havana_gene', 'tag'
      uns:	'__scglue__', 'cell_type_colors', 'hvg', 'log1p', 'neighbors', 'pca', 'umap'
      obsm:	'X_glue', 'X_pca', 'X_umap'
      varm:	'PCs', 'X_glue'
      layers:	'counts'
      obsp:	'connectivities', 'distances'
    atac:	5802 x 241757
      obs:	'domain', 'cell_type', 'balancing_weight'
      var:	'chrom', 'chromStart', 'chromEnd', 'highly_variable'
      uns:	'__scglue__', 'cell_type_colors', 'neighbors', 'umap'
      obsm:	'X_glue', 'X_lsi', 'X_umap'
      varm:	'X_glue'
      obsp:	'connectivities', 'distances'

In [7]:

Copied!

mdata.write("chen_mu.h5mu",compression='gzip')
mdata.write("chen_mu.h5mu",compression='gzip')

MOFA prepare¶

In the MOFA analysis, we only need to use highly variable genes, for which we perform one filter

In [22]:

Copied!





rna1=mdata['rna']
rna1=rna1[:,rna1.var['highly_variable']==True]
atac1=mdata['atac']
atac1=atac1[:,atac1.var['highly_variable']==True]
rna1.obs.index=res_pair.index
atac1.obs.index=res_pair.index
rna1=mdata['rna']
rna1=rna1[:,rna1.var['highly_variable']==True]
atac1=mdata['atac']
atac1=atac1[:,atac1.var['highly_variable']==True]
rna1.obs.index=res_pair.index
atac1.obs.index=res_pair.index

In [23]:

Copied!

import random
random_obs_index=random.sample(list(rna1.obs.index),5000)
import random
random_obs_index=random.sample(list(rna1.obs.index),5000)

In [25]:

Copied!





from sklearn.metrics import adjusted_rand_score as ari
ari_random=ari(rna1[random_obs_index].obs['cell_type'], atac1[random_obs_index].obs['cell_type'])
ari_raw=ari(rna1.obs['cell_type'], atac1.obs['cell_type'])
print('raw ari:{}, random ari:{}'.format(ari_raw,ari_random))
from sklearn.metrics import adjusted_rand_score as ari
ari_random=ari(rna1[random_obs_index].obs['cell_type'], atac1[random_obs_index].obs['cell_type'])
ari_raw=ari(rna1.obs['cell_type'], atac1.obs['cell_type'])
print('raw ari:{}, random ari:{}'.format(ari_raw,ari_random))

raw ari:0.7497091922584931, random ari:0.7491244560264247

In [26]:

Copied!

#rna1=rna1[random_obs_index]
#atac1=atac1[random_obs_index]
#rna1=rna1[random_obs_index]
#atac1=atac1[random_obs_index]

MOFA analysis¶

In this part, we construct a model of mofa by scRNA-seq and scATAC-seq

In [28]:

Copied!

test_mofa=ov.single.pyMOFA(omics=[rna1,atac1],
                             omics_name=['RNA','ATAC'])
test_mofa=ov.single.pyMOFA(omics=[rna1,atac1],
                             omics_name=['RNA','ATAC'])

In [29]:

Copied!

test_mofa.mofa_preprocess()
test_mofa.mofa_run(outfile='models/chen_rna_atac.hdf5')
test_mofa.mofa_preprocess()
test_mofa.mofa_run(outfile='models/chen_rna_atac.hdf5')

        #########################################################
        ###           __  __  ____  ______                    ### 
        ###          |  \/  |/ __ \|  ____/\    _             ### 
        ###          | \  / | |  | | |__ /  \ _| |_           ### 
        ###          | |\/| | |  | |  __/ /\ \_   _|          ###
        ###          | |  | | |__| | | / ____ \|_|            ###
        ###          |_|  |_|\____/|_|/_/    \_\              ###
        ###                                                   ### 
        ######################################################### 
       
 
        
Groups names not provided, using default naming convention:
- group1, group2, ..., groupG

Successfully loaded view='RNA' group='group0' with N=5802 samples and D=2000 features...
Successfully loaded view='ATAC' group='group0' with N=5802 samples and D=25488 features...


Warning: 15 features(s) in view 0 have zero variance, consider removing them before training the model...

Warning: 204 features(s) in view 1 have zero variance, consider removing them before training the model...

Model options:
- Automatic Relevance Determination prior on the factors: True
- Automatic Relevance Determination prior on the weights: True
- Spike-and-slab prior on the factors: False
- Spike-and-slab prior on the weights: True
Likelihoods:
- View 0 (RNA): gaussian
- View 1 (ATAC): gaussian



GPU mode is activated



######################################
## Training the model with seed 112 ##
######################################


ELBO before training: -2115272814.25 

Iteration 1: time=19.35, ELBO=191213094.17, deltaELBO=2306485908.422 (109.03964221%), Factors=19
Iteration 2: time=16.58, ELBO=196818457.51, deltaELBO=5605363.336 (0.26499482%), Factors=18
Iteration 3: time=15.64, ELBO=196891451.69, deltaELBO=72994.187 (0.00345082%), Factors=17
Iteration 4: time=14.82, ELBO=196995605.43, deltaELBO=104153.733 (0.00492389%), Factors=16
Iteration 5: time=14.06, ELBO=197054469.84, deltaELBO=58864.412 (0.00278283%), Factors=15
Iteration 6: time=13.33, ELBO=197108040.24, deltaELBO=53570.400 (0.00253255%), Factors=14
Iteration 7: time=12.57, ELBO=197166806.53, deltaELBO=58766.287 (0.00277819%), Factors=13
Iteration 8: time=11.82, ELBO=197221227.29, deltaELBO=54420.761 (0.00257275%), Factors=12
Iteration 9: time=11.02, ELBO=197274435.32, deltaELBO=53208.035 (0.00251542%), Factors=11
Iteration 10: time=10.27, ELBO=197321461.12, deltaELBO=47025.800 (0.00222316%), Factors=10
Iteration 11: time=9.44, ELBO=197307721.09, deltaELBO=-13740.036 (0.00064956%), Factors=9
Warning, lower bound is decreasing...
Iteration 12: time=8.66, ELBO=197345997.01, deltaELBO=38275.920 (0.00180950%), Factors=8
Iteration 13: time=8.12, ELBO=197354473.50, deltaELBO=8476.491 (0.00040073%), Factors=8
Iteration 14: time=8.14, ELBO=197361554.07, deltaELBO=7080.568 (0.00033474%), Factors=8

Converged!



#######################
## Training finished ##
#######################


Saving model in models/chen_rna_atac.hdf5...

MOFA Visualization¶

In this part, we provide a series of function to visualize the result of mofa.

In [30]:

Copied!

pymofa_obj=ov.single.pyMOFAART(model_path='models/chen_rna_atac.hdf5')
pymofa_obj=ov.single.pyMOFAART(model_path='models/chen_rna_atac.hdf5')

In [31]:

Copied!

pymofa_obj.get_factors(rna1)
rna1
pymofa_obj.get_factors(rna1)
rna1

......Add factors to adata and store to adata.obsm["X_mofa"]

Out[31]:

AnnData object with n_obs × n_vars = 5802 × 2000
    obs: 'domain', 'cell_type', 'balancing_weight', 'factor1', 'factor2', 'factor3', 'factor4', 'factor5', 'factor6', 'factor7', 'factor8'
    var: 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'mean', 'std', 'chrom', 'chromStart', 'chromEnd', 'name', 'score', 'strand', 'thickStart', 'thickEnd', 'itemRgb', 'blockCount', 'blockSizes', 'blockStarts', 'gene_id', 'gene_type', 'mgi_id', 'havana_gene', 'tag'
    uns: '__scglue__', 'cell_type_colors', 'hvg', 'log1p', 'neighbors', 'pca', 'umap'
    obsm: 'X_glue', 'X_pca', 'X_umap', 'X_mofa'
    varm: 'PCs', 'X_glue'
    layers: 'counts'
    obsp: 'connectivities', 'distances'

Visualize the varience of each view¶

In [32]:

Copied!

pymofa_obj.plot_r2()
pymofa_obj.plot_r2()

Out[32]:

(<Figure size 160x240 with 2 Axes>,
 <AxesSubplot: title={'center': 'Varience'}, xlabel='View', ylabel='Factor'>)

No description has been provided for this image

In [33]:

Copied!

pymofa_obj.get_r2()
pymofa_obj.get_r2()

Out[33]:

	RNA	ATAC
0	1.531975	0.020675
1	-0.036567	1.056309
2	0.751953	0.039987
3	0.060023	0.674215
4	0.364469	0.251295
5	0.141428	0.051458
6	0.114254	0.053613
7	0.119133	0.034315

Visualize the correlation between factor and celltype¶

In [37]:

Copied!

pymofa_obj.plot_cor(rna1,'cell_type',figsize=(4,6))
pymofa_obj.plot_cor(rna1,'cell_type',figsize=(4,6))

Out[37]:

(<Figure size 320x480 with 2 Axes>,
 <AxesSubplot: title={'center': 'Correlation'}, xlabel='Factor', ylabel='cell_type'>)

In [38]:

Copied!

pymofa_obj.get_cor(rna1,'cell_type')
pymofa_obj.get_cor(rna1,'cell_type')

Out[38]:

	factor1	factor2	factor3	factor4	factor5	factor6	factor7	factor8
InV	0.665011	20.657980	9.248108	17.089494	13.422602	32.132796	24.124465	44.255535
Clau	2.826121	5.808516	0.811940	0.268856	0.263320	0.261479	0.105971	0.908162
E5Parm1	7.532620	17.957307	1.763991	27.139456	84.494624	0.502745	0.651795	0.132044
OliM	6.513966	17.629268	500.000000	24.141743	10.914886	23.621465	25.395877	4.926897
E4Thsd7a	16.938637	237.595015	10.175490	13.044958	45.402276	4.448403	1.102767	5.383185
Mic	18.721245	0.863540	97.697177	1.925362	1.950863	1.756524	0.071363	0.533441
InP	3.293589	38.579983	26.957391	25.219145	15.585743	161.657573	250.170196	141.766482
OliI	2.847773	3.391799	141.002571	4.200228	2.875140	228.418422	0.743355	6.744291
E5Tshz2	2.709568	7.122963	1.657839	18.795893	2.700763	0.392799	0.462671	2.154666
E5Galnt14	14.255253	4.202316	9.685683	15.230670	46.138543	0.958545	1.906899	3.441217
OPC	23.475918	18.455308	32.199756	14.478467	10.162953	82.261963	5.433134	3.726258
Peri	0.658162	1.418508	0.884916	1.978547	3.519710	0.687741	0.444232	0.211925
InN	1.333910	14.820248	14.356910	10.069938	6.115415	45.999768	31.705707	51.326725
E3Rmst	10.885583	20.024174	11.668498	4.168142	17.267523	0.959733	4.561186	10.930971
E3Rorb	36.750410	38.807227	48.466332	81.247901	285.221588	14.008255	36.506786	69.163363
E4Il1rapl2	18.191082	80.250141	11.056312	1.086077	6.790889	2.651834	1.754884	5.952761
E5Sulf1	5.240678	1.483262	3.790759	6.827697	135.848121	1.140704	0.166260	0.283728
E6Tle4	29.553202	39.878014	32.630424	104.177836	500.000000	16.817520	5.015146	0.432723
InS	3.389144	31.927599	6.973416	17.450485	6.059428	81.276278	74.919020	125.230026
Endo	0.638567	1.657187	0.245722	1.829868	0.430014	0.024523	0.465449	0.896513
E2Rasgrf2	68.612084	10.432200	43.132372	500.000000	99.122213	23.616769	7.072289	4.607651
Ast	500.000000	72.170444	9.236814	67.929809	42.193385	5.108831	0.439351	4.559738

In [46]:

Copied!

pymofa_obj.plot_factor(rna1,'cell_type','Ast',figsize=(3,3),
                    factor1=1,factor2=3,)
pymofa_obj.plot_factor(rna1,'cell_type','Ast',figsize=(3,3),
                    factor1=1,factor2=3,)

Out[46]:

(<Figure size 240x240 with 1 Axes>,
 <AxesSubplot: title={'center': 'Ast'}, xlabel='X_mofa1', ylabel='X_mofa3'>)

Visualize the factor in UMAP¶

To visualize the GLUE’s learned embeddings, we use the pymde package wrapperin scvi-tools. This is an alternative to UMAP that is GPU-accelerated.

You can use sc.tl.umap insteaded.

In [41]:

Copied!





from scvi.model.utils import mde
import scanpy as sc
sc.pp.neighbors(rna1, use_rep="X_glue", metric="cosine")
rna1.obsm["X_mde"] = mde(rna1.obsm["X_glue"])
from scvi.model.utils import mde
import scanpy as sc
sc.pp.neighbors(rna1, use_rep="X_glue", metric="cosine")
rna1.obsm["X_mde"] = mde(rna1.obsm["X_glue"])

computing neighbors
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:12)

In [47]:

Copied!





sc.pl.embedding(
    rna1,
    basis="X_mde",
    color=["factor1","factor3","cell_type"],
    frameon=False,
    ncols=3,
    #palette=ov.utils.pyomic_palette(),
    show=False,
    cmap='Greens',
    vmin=0,
)
sc.pl.embedding(
    rna1,
    basis="X_mde",
    color=["factor1","factor3","cell_type"],
    frameon=False,
    ncols=3,
    #palette=ov.utils.pyomic_palette(),
    show=False,
    cmap='Greens',
    vmin=0,
)

Out[47]:

[<AxesSubplot: title={'center': 'factor1'}, xlabel='X_mde1', ylabel='X_mde2'>,
 <AxesSubplot: title={'center': 'factor3'}, xlabel='X_mde1', ylabel='X_mde2'>,
 <AxesSubplot: title={'center': 'cell_type'}, xlabel='X_mde1', ylabel='X_mde2'>]

Weights ranked¶

A visualization of factor weights familiar to MOFA and MOFA+ users is implemented with some modifications in plot_weight_gene_d1, plot_weight_gene_d2, and plot_weights.

In [48]:

Copied!

pymofa_obj.plot_weight_gene_d1(view='RNA',factor1=1,factor2=3,)
pymofa_obj.plot_weight_gene_d1(view='RNA',factor1=1,factor2=3,)

Out[48]:

(<Figure size 240x240 with 1 Axes>,
 <AxesSubplot: xlabel='factor_1', ylabel='factor_3'>)

In [50]:

Copied!

pymofa_obj.plot_weights(view='RNA',factor=1,
                        ascending=False)
pymofa_obj.plot_weights(view='RNA',factor=1,
                        ascending=False)

Out[50]:

(<Figure size 240x320 with 1 Axes>,
 <AxesSubplot: title={'center': 'factor_1'}, xlabel='Feature rank', ylabel='Weight'>)

Weights heatmap¶

While trying to annotate factors, a global overview of top features defining them could be helpful.

In [51]:

Copied!

pymofa_obj.plot_top_feature_heatmap(view='RNA')
pymofa_obj.plot_top_feature_heatmap(view='RNA')

ranking genes
    finished: added to `.uns['rank_genes_groups']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:00)
WARNING: dendrogram data not found (using key=dendrogram_Factor). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
WARNING: You’re trying to run this on 2000 dimensions of `.X`, if you really want this, set `use_rep='X'`.
         Falling back to preprocessing with `sc.pp.pca` and default params.
computing PCA
    with n_comps=15
    finished (0:00:00)
Storing dendrogram info using `.uns['dendrogram_Factor']`

Out[51]:

{'mainplot_ax': <AxesSubplot: >,
 'group_extra_ax': <AxesSubplot: >,
 'gene_group_ax': <AxesSubplot: >,
 'color_legend_ax': <AxesSubplot: title={'center': 'Mean expression\nin group'}>}

In [ ]: