Single RNA-seq to Spatial RNA-seq¶

Single2Spatial is used for scRNA-seq transformat to stRNA-seq. We extracted the deep-forest part of the Bulk2Space algorithm and constructed an algorithm that can calculate the spatial RNA-seq from scRNA-seq. In addition, we have redesigned the input and output of the data so that it can be more compatible with the analysis conventions in the Python environment.

Paper: De novo analysis of bulk RNA-seq data at spatially resolved single-cell resolution

Code: https://github.com/ZJUFanLab/bulk2space

Colab_Reproducibility：https://colab.research.google.com/drive/1E3TlGUW2bxEew8JYCLE9qA6saYbIjBnY?usp=sharing

This tutorial walks through how to read, set-up and train the model from scRNA-seq and reference stRNA-seq data. We use the pdac datasets as example

In [ ]:

Copied!





import scanpy as sc
import pandas as pd
import numpy as np
import omicverse as ov
import matplotlib.pyplot as plt

ov.utils.ov_plot_set()
import scanpy as sc
import pandas as pd
import numpy as np
import omicverse as ov
import matplotlib.pyplot as plt

ov.utils.ov_plot_set()

loading data¶

Single2Spatial calcualted the the spot of celltype in spatial RNA-seq from scRNA-seq by deep-forest tree, we need an average normalize, log and scale the expression matrix of scRNA-seq,

We also need an anndata object of stRNA-seq as reference to calculate. The data can be downloaded from here. The processed the PDAC scRNA-seq data and ST data (GSE111672)

In [3]:

Copied!





import anndata
raw_data=pd.read_csv('data/pdac/sc_data.csv', index_col=0)
single_data=anndata.AnnData(raw_data.T)
single_data.obs = pd.read_csv('data/pdac/sc_meta.csv', index_col=0)[['Cell_type']]
single_data
import anndata
raw_data=pd.read_csv('data/pdac/sc_data.csv', index_col=0)
single_data=anndata.AnnData(raw_data.T)
single_data.obs = pd.read_csv('data/pdac/sc_meta.csv', index_col=0)[['Cell_type']]
single_data

Out[3]:

AnnData object with n_obs × n_vars = 1926 × 19104
    obs: 'Cell_type'

In [4]:

Copied!





raw_data=pd.read_csv('data/pdac/st_data.csv', index_col=0)
spatial_data=anndata.AnnData(raw_data.T)
spatial_data.obs = pd.read_csv('data/pdac/st_meta.csv', index_col=0)
spatial_data
raw_data=pd.read_csv('data/pdac/st_data.csv', index_col=0)
spatial_data=anndata.AnnData(raw_data.T)
spatial_data.obs = pd.read_csv('data/pdac/st_meta.csv', index_col=0)
spatial_data

Out[4]:

AnnData object with n_obs × n_vars = 428 × 19104
    obs: 'Spot', 'xcoord', 'ycoord'

set up, training, saving, and loading¶

We can now set up the Single2Spatial object, which will ensure everything the model needs is in place for training. We need to specify the cell type of the scRNA-seq and the spot_key of the stRNA-seq. And specify the number of marker genes for each cell type for training.

In [10]:

Copied!





st_model=ov.bulk2single.Single2Spatial(single_data=single_data,
                                           spatial_data=spatial_data,
                                           celltype_key='Cell_type',
                                           spot_key=['xcoord','ycoord'],
                                           #gpu='mps',
                                          )
st_model=ov.bulk2single.Single2Spatial(single_data=single_data,
                                           spatial_data=spatial_data,
                                           celltype_key='Cell_type',
                                           spot_key=['xcoord','ycoord'],
                                           #gpu='mps',
                                          )

...loading data

Now we can start to train our Single2Spatial model.

In [11]:

Copied!





%%time
sp_adata=st_model.train(spot_num=500,
                cell_num=10,
                df_save_dir='data/pdac/predata_net/save_model',
                df_save_name='pdac_df',
                k=10,num_epochs=1000,batch_size=1000,predicted_size=32)
%%time
sp_adata=st_model.train(spot_num=500,
                cell_num=10,
                df_save_dir='data/pdac/predata_net/save_model',
                df_save_name='pdac_df',
                k=10,num_epochs=1000,batch_size=1000,predicted_size=32)

ranking genes
    finished: added to `.uns['rank_genes_groups']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:03)
sucessfully create positive data
sucessfully create negative data
select top 500 marker genes of each cell type...
ranking genes
    finished: added to `.uns['rank_genes_groups']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:03)
...The model size is 14478
...Net training

Epoch [1000/1000], Loss: 0.5809: 100%|██████| 1000/1000 [05:31<00:00,  3.02it/s]

...Net training done!
...save trained net in data/pdac/predata_net/save_model/pdac_df.pth.
Calculating scores...

Now calculating scores for spot 428/428: 100%|██| 14/14 [01:07<00:00,  4.84s/it]

Calculating scores done.
...save trained net in data/pdac/predata_net/save_model/pdac_df.pth.
CPU times: user 7min 42s, sys: 3min 1s, total: 10min 43s
Wall time: 7min 7s

We can also load our previously trained model directly

In [6]:

Copied!

%%time
sp_adata=st_model.load(modelsize=14478,df_load_dir='data/pdac/predata_net/save_model/pdac_df.pth',
                k=10,predicted_size=32)
%%time
sp_adata=st_model.load(modelsize=14478,df_load_dir='data/pdac/predata_net/save_model/pdac_df.pth',
                k=10,predicted_size=32)

select top 500 marker genes of each cell type...
ranking genes
    finished: added to `.uns['rank_genes_groups']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:03)
Calculating scores...

Now calculating scores for spot 428/428: 100%|██| 14/14 [01:02<00:00,  4.43s/it]

Calculating scores done.
CPU times: user 36.7 s, sys: 51.4 s, total: 1min 28s
Wall time: 1min 13s

Investigate deconvolution results at spot-level resolution¶

We also evaluated the spatial deconvolution results at spot-level resolution. We firstly calculated cell type proportion per spot and then aggregated gene expression of cells per spot.

In [12]:

Copied!

sp_adata_spot=st_model.spot_assess()
sp_adata_spot=st_model.spot_assess()

Plot spatial expression patterns of marker genes with single cell resolution¶

The spatial expression patterns of marker genes were also consistent with the spatial distribution of cell types.

In [13]:

Copied!





sc.pl.embedding(
    sp_adata,
    basis="X_spatial",
    color=['REG1A', 'CLDN1', 'KRT16', 'MUC5B'],
    frameon=False,
    ncols=4,
    #save='_figure1_gene.png',
    show=False,
)
sc.pl.embedding(
    sp_adata,
    basis="X_spatial",
    color=['REG1A', 'CLDN1', 'KRT16', 'MUC5B'],
    frameon=False,
    ncols=4,
    #save='_figure1_gene.png',
    show=False,
)

Out[13]:

[<Axes: title={'center': 'REG1A'}, xlabel='X_spatial1', ylabel='X_spatial2'>,
 <Axes: title={'center': 'CLDN1'}, xlabel='X_spatial1', ylabel='X_spatial2'>,
 <Axes: title={'center': 'KRT16'}, xlabel='X_spatial1', ylabel='X_spatial2'>,
 <Axes: title={'center': 'MUC5B'}, xlabel='X_spatial1', ylabel='X_spatial2'>]

No description has been provided for this image

We also can plot gene in spot level

In [14]:

Copied!





sc.pl.embedding(
    sp_adata_spot,
    basis="X_spatial",
    color=['REG1A', 'CLDN1', 'KRT16', 'MUC5B'],
    frameon=False,
    ncols=4,
    #save='_figure1_gene.png',
    show=False,
)
sc.pl.embedding(
    sp_adata_spot,
    basis="X_spatial",
    color=['REG1A', 'CLDN1', 'KRT16', 'MUC5B'],
    frameon=False,
    ncols=4,
    #save='_figure1_gene.png',
    show=False,
)

Out[14]:

[<Axes: title={'center': 'REG1A'}, xlabel='X_spatial1', ylabel='X_spatial2'>,
 <Axes: title={'center': 'CLDN1'}, xlabel='X_spatial1', ylabel='X_spatial2'>,
 <Axes: title={'center': 'KRT16'}, xlabel='X_spatial1', ylabel='X_spatial2'>,
 <Axes: title={'center': 'MUC5B'}, xlabel='X_spatial1', ylabel='X_spatial2'>]

Plot cell type proportion per spot¶

We can visualize the cell type proportion per spot by sc.pl.embedding

In [15]:

Copied!





sc.pl.embedding(
    sp_adata_spot,
    basis="X_spatial",
    color=['Acinar cells','Cancer clone A','Cancer clone B','Ductal'],
    frameon=False,
    ncols=4,
    show=False,
    #save='_figure1_celltype_spot.png',
)
sc.pl.embedding(
    sp_adata_spot,
    basis="X_spatial",
    color=['Acinar cells','Cancer clone A','Cancer clone B','Ductal'],
    frameon=False,
    ncols=4,
    show=False,
    #save='_figure1_celltype_spot.png',
)

Out[15]:

[<Axes: title={'center': 'Acinar cells'}, xlabel='X_spatial1', ylabel='X_spatial2'>,
 <Axes: title={'center': 'Cancer clone A'}, xlabel='X_spatial1', ylabel='X_spatial2'>,
 <Axes: title={'center': 'Cancer clone B'}, xlabel='X_spatial1', ylabel='X_spatial2'>,
 <Axes: title={'center': 'Ductal'}, xlabel='X_spatial1', ylabel='X_spatial2'>]

In [16]:

Copied!





sc.pl.embedding(
    sp_adata,
    basis="X_spatial",
    color=['Cell_type'],
    frameon=False,
    ncols=4,
    show=False,
    palette=ov.utils.ov_palette()[11:]
)
sc.pl.embedding(
    sp_adata,
    basis="X_spatial",
    color=['Cell_type'],
    frameon=False,
    ncols=4,
    show=False,
    palette=ov.utils.ov_palette()[11:]
)

Out[16]:

<Axes: title={'center': 'Cell_type'}, xlabel='X_spatial1', ylabel='X_spatial2'>