Multi omics analysis by MOFA¶

MOFA is a factor analysis model that provides a general framework for the integration of multi-omic data sets in an unsupervised fashion.

This tutorial focuses on how to perform mofa in multi-omics like scRNA-seq and scATAC-seq

Paper: MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data

Code: https://github.com/bioFAM/mofapy2

Colab_Reproducibility：https://colab.research.google.com/drive/1UPGQA3BenrC-eLIGVtdKVftSnOKIwNrP?usp=sharing

Part.1 MOFA Model¶

In this part, we construct a model of mofa by scRNA-seq and scATAC-seq

import omicverse as ov
rna=ov.utils.read('data/sample/rna_p_n_raw.h5ad')
atac=ov.utils.read('data/sample/atac_p_n_raw.h5ad')

/Users/fernandozeng/miniforge3/envs/scbasset/lib/python3.8/site-packages/phate/__init__.py

rna,atac

(AnnData object with n_obs × n_vars = 1876 × 25596
     obs: 'Type', 'sample',
 AnnData object with n_obs × n_vars = 1876 × 71272
     obs: 'Type', 'sample')

We only need to add anndata to ov.single.mofa to construct the base model

test_mofa=ov.single.pyMOFA(omics=[rna,atac],
                             omics_name=['RNA','ATAC'])

test_mofa.mofa_preprocess()
test_mofa.mofa_run(outfile='models/brac_rna_atac.hdf5')

        #########################################################
        ###           __  __  ____  ______                    ### 
        ###          |  \/  |/ __ \|  ____/\    _             ### 
        ###          | \  / | |  | | |__ /  \ _| |_           ### 
        ###          | |\/| | |  | |  __/ /\ \_   _|          ###
        ###          | |  | | |__| | | / ____ \|_|            ###
        ###          |_|  |_|\____/|_|/_/    \_\              ###
        ###                                                   ### 
        ######################################################### 
       
 
        
Groups names not provided, using default naming convention:
- group1, group2, ..., groupG

Successfully loaded view='RNA' group='group0' with N=1876 samples and D=25596 features...
Successfully loaded view='ATAC' group='group0' with N=1876 samples and D=71272 features...


Warning: 8795 features(s) in view 0 have zero variance, consider removing them before training the model...

Warning: 34 features(s) in view 1 have zero variance, consider removing them before training the model...

Model options:
- Automatic Relevance Determination prior on the factors: True
- Automatic Relevance Determination prior on the weights: True
- Spike-and-slab prior on the factors: False
- Spike-and-slab prior on the weights: True
Likelihoods:
- View 0 (RNA): gaussian
- View 1 (ATAC): gaussian



GPU mode is activated, but GPU not found... switching to CPU mode
For GPU mode, you need:
1 - Make sure that you are running MOFA+ on a machine with an NVIDIA GPU
2 - Install CUPY following instructions on https://docs-cupy.chainer.org/en/stable/install.html



######################################
## Training the model with seed 112 ##
######################################


ELBO before training: -2415164577.49 

Iteration 1: time=55.65, ELBO=-21568895.23, deltaELBO=2393595682.263 (99.10693890%), Factors=19
Iteration 2: time=55.70, ELBO=53947093.49, deltaELBO=75515988.721 (3.12674297%), Factors=18
Iteration 3: time=55.13, ELBO=55272104.20, deltaELBO=1325010.708 (0.05486213%), Factors=17
Iteration 4: time=50.87, ELBO=55846669.26, deltaELBO=574565.064 (0.02378989%), Factors=16
Iteration 5: time=50.76, ELBO=56054695.49, deltaELBO=208026.230 (0.00861334%), Factors=16
Iteration 6: time=46.63, ELBO=56193608.97, deltaELBO=138913.480 (0.00575172%), Factors=16

Part.2 MOFA Analysis¶

After get the model by mofa, we need to analysis the factor about different omics, we provide some method to do this

load data¶

import omicverse as ov
ov.utils.ov_plot_set()

/Users/fernandozeng/miniforge3/envs/scbasset/lib/python3.8/site-packages/phate/__init__.py

rna=ov.utils.read('data/sample/rna_test.h5ad')

add value of factor to anndata¶

rna=ov.single.factor_exact(rna,hdf5_path='data/sample/MOFA_POS.hdf5')
rna

AnnData object with n_obs × n_vars = 1792 × 3000
    obs: 'Type', 'sample', 'cell_type', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'factor1', 'factor2', 'factor3', 'factor4', 'factor5', 'factor6', 'factor7', 'factor8', 'factor9', 'factor10', 'factor11'
    var: 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'hvg'

analysis of the correlation between factor and cell type¶

ov.single.factor_correlation(adata=rna,cluster='cell_type',factor_list=[1,2,3,4,5])

	factor1	factor2	factor3	factor4	factor5
B cell-2	9.078459	5.625337	8.424565	7.768767	0.268084
B cell-1	0.489567	3.898347	0.115866	0.854649	0.686837
B cell-4	0.279002	3.701674	0.487245	0.631280	0.205712
Natural killer cell	1.127901	27.896554	0.235219	0.241705	0.282893
B cell-3	0.706249	40.881023	2.267589	0.382177	5.123728
T cell-1	0.704842	3.701813	0.464042	0.396243	0.459828
Endothelial cell	NaN	NaN	NaN	NaN	NaN
Monocyte	0.248902	0.284968	0.147168	0.399443	0.132685
Epithelial cell	0.435818	2.957245	0.183706	0.366285	0.225828
Natural killer T (NKT) cell-1	0.119266	0.276104	0.111210	0.030626	0.097991
Plasmacytoid dendritic cell	0.019003	0.052997	0.431551	0.024616	0.067029
Regulatory T (Treg) cell	0.553421	7.303644	0.323052	0.252761	0.187171
B cell	1.096582	12.407881	0.154599	0.827906	0.495789
Natural killer T (NKT) cell	0.039153	0.765500	0.243345	0.035279	0.093851
T cell	1.763401	32.202576	0.854274	0.870854	0.627218

Get the gene/feature weights of different factor¶

ov.single.get_weights(hdf5_path='data/sample/MOFA_POS.hdf5',view='RNA',factor=1)

	feature	weights	abs_weights	sig
0	b'FAM174B'	-2.911107e-10	2.911107e-10	-
1	b'SYNE2'	-3.744153e-09	3.744153e-09	-
2	b'LUZP1'	-1.840838e-10	1.840838e-10	-
3	b'GGT1'	-3.171818e-10	3.171818e-10	-
4	b'LINC02210'	-4.588388e-10	4.588388e-10	-
...	...	...	...	...
2995	b'TREM2'	4.046089e-11	4.046089e-11	+
2996	b'CPVL'	1.353563e-10	1.353563e-10	+
2997	b'MIPEP'	-1.360450e-11	1.360450e-11	-
2998	b'INO80E'	-4.721143e-09	4.721143e-09	-
2999	b'RABAC1'	2.386149e-03	2.386149e-03	+

3000 rows × 4 columns

Part.3 MOFA Visualize¶

To visualize the result of mofa, we provide a series of function to do this.

pymofa_obj=ov.single.pyMOFAART(model_path='data/sample/MOFA_POS.hdf5')

We get the factor of each cell at first

pymofa_obj.get_factors(rna)
rna

......Add factors to adata and store to adata.obsm["X_mofa"]

AnnData object with n_obs × n_vars = 1792 × 3000
    obs: 'Type', 'sample', 'cell_type', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'factor1', 'factor2', 'factor3', 'factor4', 'factor5', 'factor6', 'factor7', 'factor8', 'factor9', 'factor10', 'factor11'
    var: 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'hvg'
    obsm: 'X_mofa'

We can also plot the varience in each View

pymofa_obj.plot_r2()

(<Figure size 160x240 with 2 Axes>,
 <Axes: title={'center': 'Varience'}, xlabel='View', ylabel='Factor'>)

../_images/eab62365a8b0a6f9ccbb0f3dfadaf8c825f481fbb06903fa33264e12c99a54fe.png

pymofa_obj.get_r2()

	RNA	Enhancer	Promoter
0	52.653359	0.020433	0.030287
1	-0.427822	11.423973	10.044684
2	7.485323	0.042972	0.044618
3	5.582933	0.029609	0.040193
4	2.447518	0.084279	0.102553
5	1.264806	0.379936	0.315503
6	0.142767	0.743871	0.637848
7	0.066169	0.717554	0.653633
8	0.061457	0.451830	0.363793
9	0.322288	0.275757	0.240563
10	0.132726	0.044343	0.059752

Visualize the correlation between factor and celltype¶

pymofa_obj.plot_cor(rna,'cell_type')

(<Figure size 480x240 with 2 Axes>,
 <Axes: title={'center': 'Correlation'}, xlabel='Factor', ylabel='cell_type'>)

../_images/a1aa96a95a97f0cc4f2b9c950acf24248b1a3de3f644d189f9fcde718cfb4e3b.png

We found that factor6 is correlated to Epithelial

pymofa_obj.plot_factor(rna,'cell_type','Epi',figsize=(3,3),
                    factor1=6,factor2=10,)

../_images/44bfbcd52fbefe09a4ede9537f213ac5eb2bf5650b0adbcc4f8ec740c5105ebe.png

(<Figure size 240x240 with 1 Axes>,
 <Axes: title={'center': 'Epi'}, xlabel='X_mofa6', ylabel='X_mofa10'>)

import scanpy as sc
sc.pp.neighbors(rna)
sc.tl.umap(rna)
sc.pl.embedding(
    rna,
    basis="X_umap",
    color=["factor6","cell_type"],
    frameon=False,
    ncols=2,
    #palette=ov.utils.pyomic_palette(),
    show=False,
    cmap='Greens',
    vmin=0,
)
#plt.savefig("figures/umap_factor6.png",dpi=300,bbox_inches = 'tight')

computing neighbors
    using 'X_pca' with n_pcs = 50
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:00)
computing UMAP
    finished: added
    'X_umap', UMAP coordinates (adata.obsm) (0:00:01)

[<Axes: title={'center': 'factor6'}, xlabel='X_umap1', ylabel='X_umap2'>,
 <Axes: title={'center': 'cell_type'}, xlabel='X_umap1', ylabel='X_umap2'>]

../_images/a18a53356f4329974a717314c19c433b5e54bb3408e255344209bb984ff2baca.png

pymofa_obj.plot_weight_gene_d1(view='RNA',factor1=6,factor2=10,)

(<Figure size 240x240 with 1 Axes>,
 <Axes: xlabel='factor_6', ylabel='factor_10'>)

../_images/edc799ec726e1e53f6c77d383eeddb7fc848d42b62488887fbb7b9c3d69d40a1.png

pymofa_obj.plot_weights(view='RNA',factor=6,color='#5de25d',
                        ascending=True)

(<Figure size 240x320 with 1 Axes>,
 <Axes: title={'center': 'factor_6'}, xlabel='Feature rank', ylabel='Weight'>)

../_images/705f3c430ce2fe27cd96714db2554943f88653f135da8eb54afabad0b3118105.png

pymofa_obj.plot_top_feature_heatmap(view='RNA')

ranking genes
    finished: added to `.uns['rank_genes_groups']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:00)
WARNING: dendrogram data not found (using key=dendrogram_Factor). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
WARNING: You’re trying to run this on 3000 dimensions of `.X`, if you really want this, set `use_rep='X'`.
         Falling back to preprocessing with `sc.pp.pca` and default params.
computing PCA
    with n_comps=21
    finished (0:00:00)
Storing dendrogram info using `.uns['dendrogram_Factor']`

{'mainplot_ax': <Axes: >,
 'group_extra_ax': <Axes: >,
 'gene_group_ax': <Axes: >,
 'color_legend_ax': <Axes: title={'center': 'Mean expression\nin group'}>}

../_images/de51eadc7eb82d565e99fdfb480d382748808fa0b671dc21eae9438885f7fc11.png