Spatial deconvolution with reference scRNA-seq

This notebook presents a fully reproducible workflow for spot-based spatial deconvolution using the unified API omicverse.space.Deconvolution, which integrates Tangram and cell2location. It follows a Nature Protocols–style structure with clear purpose, inputs/outputs, parameters, timing, saving, and troubleshooting.

  • Audience: Bioinformatics practitioners with basic Python/Scanpy familiarity.

  • Data types: 10x Visium or similar spot-based spatial transcriptomics; a matched scRNA-seq reference from the same tissue/region.

  • Outcomes: Spot-level cell-type composition/intensity maps, trained models for reuse, and publication-ready figures.

Inputs and Outputs

  • Inputs:

    • scRNA-seq reference with clear cell-type annotations (gene IDs harmonized, preferably ENSEMBL)

    • Spatial transcriptomics counts (e.g., 10x Space Ranger outputs) from matched tissue/region

  • Outputs:

    • Deconvolution matrices and cell-type spatial intensity/probability maps

    • Saved models/parameters for quick reload and reuse

    • Key figures: spatial heatmaps, multi-target overlays, local pie charts

Workflow Overview with Estimated Timing

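  1. Prepare scRNA-seq reference (10–20 min for your own data; about 1 min with the example dataset used here)

  2. Prepare spatial transcriptomics (10–20 min for your own data; about 1 min with the example dataset used here)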

  3. Tangram deconvolution: preprocess → fit → save/reuse (15–30 min)

  4. cell2location: reference learning → spatial mapping → save/reuse (30–120 min; GPU faster)

  5. Visualization and export (5–15 min)

Tip: Each step below documents purpose, inputs/outputs, and critical parameters to support reproducibility and adaptation to your data.

import squidpy as sq

import omicverse as ov
#print(f"omicverse version: {ov.__version__}")
import scanpy as sc
#print(f"scanpy version: {sc.__version__}")
ov.plot_set(font_path='Arial')

# Enable auto-reload for development
%load_ext autoreload
%autoreload 2
🔬 Starting plot initialization...
Using already downloaded Arial font from: /tmp/omicverse_arial.ttf
Registered as: Arial
🧬 Detecting GPU devices…
✅ NVIDIA CUDA GPUs detected: 1
    • [CUDA 0] NVIDIA H100 80GB HBM3
      Memory: 79.1 GB | Compute: 9.0

   ____            _     _    __                  
  / __ \____ ___  (_)___| |  / /__  _____________ 
 / / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \ 
/ /_/ / / / / / / / /__ | |/ /  __/ /  (__  )  __/ 
\____/_/ /_/ /_/_/\___/ |___/\___/_/  /____/\___/                                              

🔖 Version: 1.7.8rc1   📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.

Step 1: Prepare the scRNA-seq reference (1 min)

Purpose: load and standardize the single-cell reference ensuring consistent cell-type annotations and harmonized gene IDs (prefer ENSEMBL).

  • Inputs: public lymph node/spleen/tonsil scRNA-seq or your own dataset

  • Outputs: AnnData object (adata_sc) with normalized variable names and cell-type annotations

  • Key points:

    • The reference should cover major cell types/states expected in the spatial sample.

    • Harmonize gene IDs with the spatial data (ENSEMBL or symbols) to avoid failed mappings.
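The harmonization check described above can be sketched as follows. This is a minimal illustration, assuming ENSEMBL IDs may carry version suffixes (such as `.17`) in one dataset but not the other. The helper names `strip_version` and `overlap_report` are hypothetical and not part of omicverse or Scanpy; the toy lists stand in for `adata_sc.var_names` and `adata_sp.var_names`.

```python
def strip_version(gene_id):
    """Drop a trailing ENSEMBL version suffix, e.g. 'ENSG00000156738.17' -> 'ENSG00000156738'."""
    # Only touch ENSEMBL-style IDs; gene symbols that contain dots are left alone.
    return gene_id.split(".", 1)[0] if gene_id.startswith("ENS") else gene_id


def overlap_report(ref_genes, sp_genes):
    """Return the shared genes and the fraction of reference genes found in the spatial data."""
    ref = {strip_version(g) for g in ref_genes}
    sp = {strip_version(g) for g in sp_genes}
    shared = sorted(ref & sp)
    fraction = len(shared) / len(ref) if ref else 0.0
    return shared, fraction


# Toy inputs standing in for adata_sc.var_names and adata_sp.var_names (assumption)
ref_genes = ["ENSG00000156738.17", "ENSG00000010610", "ENSG00000167286.9"]
sp_genes = ["ENSG00000156738", "ENSG00000010610.10", "ENSG00000105369"]

shared, fraction = overlap_report(ref_genes, sp_genes)
print(f"{len(shared)} shared genes ({fraction:.0%} of the reference)")
```

A low fraction at this stage is usually a sign that one dataset uses symbols and the other ENSEMBL IDs, which is cheaper to catch here than after a failed fit.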

Data Link: https://cell2location.cog.sanger.ac.uk/paper/integrated_lymphoid_organ_scrna/RegressionNBV4Torch_57covariates_73260cells_10237genes/sc.h5ad

adata_sc = ov.datasets.sc_ref_Lymph_Node()
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(3,3))
ov.utils.embedding(
    adata_sc,
    basis="X_umap",
    color=['Subset'],
    title='Subset',
    frameon='small',
    #ncols=1,
    wspace=0.65,
    #palette=ov.utils.pyomic_palette()[11:],
    show=False,
    ax=ax
)
<Axes: title={'center': 'Subset'}, xlabel='X_umap1', ylabel='X_umap2'>
[Figure: UMAP embedding of the scRNA-seq reference, colored by Subset]

Step 2: Prepare spatial transcriptomics (1 min)

Purpose: load 10x Visium (Space Ranger outputs) or similar to obtain a coordinate-aware spatial AnnData (adata_sp).

  • Inputs: Visium count matrix and spatial coordinates (from the spatial folder)

  • Outputs: AnnData object (adata_sp) with spot coordinates and counts

  • Key points:

    • Ensure maximal gene overlap with the scRNA-seq reference; map gene IDs if necessary.

    • For multiple samples, keep batch labels explicit to support merging and visualization.
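For the multi-sample case mentioned above, one possible way to keep batch labels explicit is to tag every spot with its sample ID and make the spot identifier unique before merging. The sketch below uses plain Python structures rather than AnnData; `tag_spots`, the sample IDs, and the barcodes are all hypothetical placeholders.

```python
def tag_spots(samples):
    """Build one row per spot with an explicit sample label and a unique spot ID.

    samples: dict mapping sample ID -> list of spot barcodes.
    """
    rows = []
    for sample_id, barcodes in samples.items():
        for barcode in barcodes:
            rows.append(
                {
                    "spot_id": f"{sample_id}_{barcode}",  # unique across samples
                    "sample": sample_id,                  # explicit batch label
                    "barcode": barcode,
                }
            )
    return rows


# Two hypothetical slides that happen to share a barcode
samples = {
    "slide_A": ["AAACAAGT-1", "AAACACCA-1"],
    "slide_B": ["AAACAAGT-1"],
}
rows = tag_spots(samples)
spot_ids = [row["spot_id"] for row in rows]
print(spot_ids)
```

Visium barcodes repeat across slides, so prefixing with the sample ID avoids silent collisions when the tables are concatenated.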

adata_sp = sc.datasets.visium_sge(sample_id="V1_Human_Lymph_Node")
adata_sp.obs['sample'] = list(adata_sp.uns['spatial'].keys())[0]
adata_sp.var_names_make_unique()
reading /scratch/users/steorra/analysis/omic_test/data/V1_Human_Lymph_Node/filtered_feature_bc_matrix.h5
 (0:00:00)

Step 3: Tangram deconvolution (15–30 min)

Tangram maps scRNA-seq expression into spatial coordinates to infer cell-type distribution and proportions. We use omicverse.space.Deconvolution for a consistent interface.
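To make the "Score" values printed during training easier to read, here is a toy sketch of the quantity behind them: the per-gene cosine similarity between observed spatial expression and single-cell expression projected through a cell-by-spot mapping matrix. It is a simplified illustration in plain Python (fixed mapping, no density prior or regularizers), not Tangram's implementation, and every name in it is hypothetical.

```python
import math


def project(mapping, sc_expr):
    """Project cells x genes expression onto spots.

    mapping: cells x spots weights; sc_expr: cells x genes.
    Returns spots x genes, where result[s][g] = sum_c mapping[c][s] * sc_expr[c][g].
    """
    n_cells, n_spots, n_genes = len(mapping), len(mapping[0]), len(sc_expr[0])
    return [
        [sum(mapping[c][s] * sc_expr[c][g] for c in range(n_cells)) for g in range(n_genes)]
        for s in range(n_spots)
    ]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_gene_score(predicted, observed):
    """Average, over genes, of the cosine similarity between predicted and observed spot profiles."""
    n_genes = len(observed[0])
    per_gene = [
        cosine([row[g] for row in predicted], [row[g] for row in observed])
        for g in range(n_genes)
    ]
    return sum(per_gene) / n_genes


sc_expr = [[5.0, 0.0], [0.0, 4.0], [1.0, 1.0]]       # 3 cells x 2 genes
observed = [[5.5, 0.5], [0.5, 4.5]]                  # 2 spots x 2 genes
good_mapping = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # cells x spots, rows sum to 1
bad_mapping = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]   # first two cells swapped

good_score = mean_gene_score(project(good_mapping, sc_expr), observed)
bad_score = mean_gene_score(project(bad_mapping, sc_expr), observed)
print(round(good_score, 3), round(bad_score, 3))
```

A mapping that places cells where their marker genes are actually observed scores close to 1, while a mismatched mapping scores much lower; this is why a rising score during training indicates an improving fit.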

decov_obj=ov.space.Deconvolution(
    adata_sc=adata_sc,
    adata_sp=adata_sp
)

Step 3.1 Tangram preprocessing

Purpose: filter and normalize the scRNA-seq and spatial data so that fitting is robust. Start from raw counts; as the log below shows, the original counts are preserved in the 'counts' layer.

decov_obj.preprocess_sc(
    mode='shiftlog|pearson',n_HVGs=3000,target_sum=1e4,
)
decov_obj.preprocess_sp(
    mode='pearsonr',n_svgs=3000,target_sum=1e4,
)
🔍 [2025-09-20 03:25:56] Running preprocessing in 'cpu' mode...
Begin robust gene identification
    After filtration, 10237/10237 genes are kept.
    Among 10237 genes, 9838 genes are robust.
✅ Robust gene identification completed successfully.
Begin size normalization: shiftlog and HVGs selection pearson

🔍 Count Normalization:
   Target sum: 10000.0
   Exclude highly expressed: True
   Max fraction threshold: 0.2
   ⚠️ Excluding 17 highly-expressed genes from normalization computation

✅ Count Normalization Completed Successfully!
   ✓ Processed: 73,260 cells × 9,838 genes
   ✓ Runtime: 2.70s

🔍 Highly Variable Genes Selection (Experimental):
   Method: pearson_residuals
   Target genes: 3,000
   Theta (overdispersion): 100

✅ Experimental HVG Selection Completed Successfully!
   ✓ Selected: 3,000 highly variable genes out of 9,838 total (30.5%)
   ✓ Results added to AnnData object:
     • 'highly_variable': Boolean vector (adata.var)
     • 'highly_variable_rank': Float vector (adata.var)
     • 'highly_variable_nbatches': Int vector (adata.var)
     • 'highly_variable_intersection': Boolean vector (adata.var)
     • 'means': Float vector (adata.var)
     • 'variances': Float vector (adata.var)
     • 'residual_variances': Float vector (adata.var)
    Time to analyze data in cpu: 10.15 seconds.
✅ Preprocessing completed successfully.
    Added:
        'highly_variable_features', boolean vector (adata.var)
        'means', float vector (adata.var)
        'variances', float vector (adata.var)
        'residual_variances', float vector (adata.var)
        'counts', raw counts layer (adata.layers)
    End of size normalization: shiftlog and HVGs selection pearson
✓ scRNA-seq data is preprocessed
🔍 [2025-09-20 03:26:10] Running preprocessing in 'cpu' mode...
Begin robust gene identification
    After filtration, 25187/36601 genes are kept.
    Among 25187 genes, 22411 genes are robust.
✅ Robust gene identification completed successfully.
Begin size normalization: shiftlog and HVGs selection pearson

🔍 Count Normalization:
   Target sum: 10000.0
   Exclude highly expressed: True
   Max fraction threshold: 0.2
   ⚠️ Excluding 1 highly-expressed genes from normalization computation
   Excluded genes: ['IGKC']

✅ Count Normalization Completed Successfully!
   ✓ Processed: 4,035 cells × 22,411 genes
   ✓ Runtime: 0.44s

🔍 Highly Variable Genes Selection (Experimental):
   Method: pearson_residuals
   Target genes: 3,000
   Theta (overdispersion): 100

✅ Experimental HVG Selection Completed Successfully!
   ✓ Selected: 3,000 highly variable genes out of 22,411 total (13.4%)
   ✓ Results added to AnnData object:
     • 'highly_variable': Boolean vector (adata.var)
     • 'highly_variable_rank': Float vector (adata.var)
     • 'highly_variable_nbatches': Int vector (adata.var)
     • 'highly_variable_intersection': Boolean vector (adata.var)
     • 'means': Float vector (adata.var)
     • 'variances': Float vector (adata.var)
     • 'residual_variances': Float vector (adata.var)
    Time to analyze data in cpu: 2.04 seconds.
✅ Preprocessing completed successfully.
    Added:
        'highly_variable_features', boolean vector (adata.var)
        'means', float vector (adata.var)
        'variances', float vector (adata.var)
        'residual_variances', float vector (adata.var)
        'counts', raw counts layer (adata.layers)
    End of size normalization: shiftlog and HVGs selection pearson
✓ spatial transcriptomics data is preprocessed

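Step 3.2 Run Tangram deconvolution

Purpose: fit the Tangram mapping, using the Subset column of the reference as cell-type labels. The call below trains in 'cells' mode for 500 epochs on the first CUDA device; set 'device' to 'cpu' if no GPU is available.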

decov_obj.deconvolution(
    method='Tangram',celltype_key_sc='Subset',
    tangram_kwargs={'mode':'cells','num_epochs':500,'device':'cuda:0'}
)
tangram have been install version: 1.0.4
ranking genes
    finished: added to `.uns['Subset_rank_genes_groups']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:04)
...Calculate The Number of Markers: 1290
INFO:root:832 training genes are saved in `uns``training_genes` of both single cell and spatial Anndatas.
INFO:root:1291 overlapped genes are saved in `uns``overlap_genes` of both single cell and spatial Anndatas.
INFO:root:uniform based density prior is calculated and saved in `obs``uniform_density` of the spatial Anndata.
INFO:root:rna count based density prior is calculated and saved in `obs``rna_count_based_density` of the spatial Anndata.
...Model prepared successfully
INFO:root:Allocate tensors for mapping.
INFO:root:Begin training with 832 genes and rna_count_based density_prior in cells mode...
INFO:root:Printing scores every 100 epochs.
Score: 0.814, KL reg: 0.002
Score: 0.911, KL reg: 0.000
Score: 0.929, KL reg: 0.000
Score: 0.934, KL reg: 0.000
Score: 0.935, KL reg: 0.000
INFO:root:Saving results..
AnnData object with n_obs × n_vars = 73260 × 4035
    obs: 'Age', 'BCELL_CLONE', 'BCELL_CLONE_SIZE', 'Donor', 'ID', 'IGH_MU_FREQ', 'ISOTYPE', 'LibraryID', 'Method', 'Population', 'PrelimCellType', 'Sample', 'Sex', 'Study', 'Tissue', 'barcode', 'batch', 'doublet_score', 'index', 'predicted_doublet', 'percent_mito', 'n_counts', 'n_genes', 'S_score', 'G2M_score', 'phase', 'VDJsum', 'cell_cycle_diff', 'PrelimCellType_new', 'leiden', 'leiden_1', 'leiden_2', 'leiden_3', 'leiden_4', 'CellType', 'CellType2', 'Subset', 'Subset_Broad', 'Subset_all', 'new_celltype', 'Subset_int', 'Subset_print'
    var: 'in_tissue', 'array_row', 'array_col', 'sample', 'uniform_density', 'rna_count_based_density'
    uns: 'train_genes_df', 'training_history'
INFO:root:spatial prediction dataframe is saved in `obsm` `tangram_ct_pred` of the spatial AnnData.
...Model train successfully
✓ Tangram cell2location is done
The cell2location result is saved in self.adata_cell2location
<omicverse.space._tangram.Tangram at 0x7f8bad6cbca0>

Step 3.3 Save Tangram model and results (<1 min)

Recommended artifacts to save:

  • Fitted mapping/probability matrices

  • Core hyperparameters and software versions (for reproducibility)

  • Result AnnData/DataFrame for downstream plotting
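The "hyperparameters and software versions" item above can be captured with the standard library alone. The sketch below is one possible approach, not an omicverse feature: `collect_run_metadata` is a hypothetical helper, the hyperparameter values are copied from the calls in this notebook, and the distribution name `tangram-sc` is an assumption about how Tangram was installed.

```python
import json
import platform
import sys
from importlib import metadata


def collect_run_metadata(packages, hyperparameters):
    """Gather interpreter, platform, package versions, and hyperparameters in one dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        "hyperparameters": hyperparameters,
    }


run_info = collect_run_metadata(
    packages=["omicverse", "scanpy", "tangram-sc"],  # distribution names are assumptions
    hyperparameters={"method": "Tangram", "mode": "cells", "num_epochs": 500, "n_HVGs": 3000},
)
print(json.dumps(run_info, indent=2))
```

Writing this dictionary to a JSON file next to the saved model keeps the settings and the artifact together.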

ov.utils.save(
    decov_obj.tangram_model,
    'result/model/tangram_model.pkl'
)
💾 Save Operation:
   Target path: result/model/tangram_model.pkl
   Object type: Tangram
   Using: pickle
   Pickle failed, switching to: cloudpickle
   ✅ Successfully saved using cloudpickle!
────────────────────────────────────────────────────────────
decov_obj.adata_sc.write("result/model/tangram_adata_sc.h5ad")
decov_obj.adata_sp.write("result/model/tangram_adata_sp.h5ad")

Step 3.4 Result object: cell locations

In omicverse, the deconvolution result is stored on the decov_obj container as decov_obj.adata_cell2location. Despite its name, this attribute also holds Tangram output, as the training log above indicates. Per-cell-type scores appear as one column per cell type in .obs, and the full prediction table is kept in .obsm['tangram_ct_pred'].
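The per-spot scores are not guaranteed to sum to 1, so if you want fractions that are comparable across spots (for example, before drawing pie charts), one option is to rescale each spot. The sketch below shows this for a single spot, using a plain dictionary in place of a row of the `.obs` columns; `to_proportions` is a hypothetical helper and the numbers are invented for illustration.

```python
def to_proportions(scores):
    """Rescale one spot's cell-type scores so they sum to 1 (negative values are clipped to 0)."""
    clipped = {cell_type: max(value, 0.0) for cell_type, value in scores.items()}
    total = sum(clipped.values())
    if total == 0:
        # No signal in this spot: return zeros rather than dividing by zero.
        return {cell_type: 0.0 for cell_type in clipped}
    return {cell_type: value / total for cell_type, value in clipped.items()}


# Invented scores for one spot; the keys are cell-type names from this dataset
spot_scores = {"B_naive": 6.0, "T_CD4+_naive": 3.0, "FDC": 1.0}
props = to_proportions(spot_scores)
dominant = max(props, key=props.get)
print(dominant, props)
```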

decov_obj.adata_cell2location
AnnData object with n_obs × n_vars = 4035 × 3000
    obs: 'in_tissue', 'array_row', 'array_col', 'sample', 'uniform_density', 'rna_count_based_density', 'T_CD4+_naive', 'DC_cDC1', 'T_TIM3+', 'Macrophages_M2', 'T_TfR', 'T_CD8+_naive', 'Endo', 'B_activated', 'T_CD8+_CD161+', 'T_CD4+_TfH', 'FDC', 'T_Treg', 'B_plasma', 'T_CD4+', 'B_GC_LZ', 'T_CD8+_cytotoxic', 'B_GC_DZ', 'VSMC', 'B_IFN', 'B_preGC', 'B_naive', 'Macrophages_M1', 'DC_CCR7+', 'B_mem', 'ILC', 'DC_cDC2', 'B_GC_prePB', 'Mast', 'DC_pDC', 'NKT', 'Monocytes', 'B_Cycling', 'T_CD4+_TfH_GC', 'NK'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'percent_cells', 'robust', 'means', 'variances', 'residual_variances', 'highly_variable_rank', 'highly_variable_features', 'space_variable_features', 'highly_variable', 'sparsity'
    uns: 'spatial', 'log1p', 'hvg', 'status', 'status_args', 'REFERENCE_MANU', 'training_genes', 'overlap_genes'
    obsm: 'spatial', 'tangram_ct_pred'
    layers: 'counts'

Step 3.5 Reuse: reload a saved Tangram model and impute

Purpose: skip retraining by loading a previously saved model/matrix and performing inference on the same or similar data.
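Before reloading, it can help to confirm that the artifacts from Step 3.3 are actually on disk. A minimal sketch using only `pathlib`; `missing_artifacts` is a hypothetical helper, and the paths are the ones used in this notebook.

```python
from pathlib import Path


def missing_artifacts(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [str(path) for path in paths if not Path(path).is_file()]


required = [
    "result/model/tangram_model.pkl",
    "result/model/tangram_adata_sc.h5ad",
    "result/model/tangram_adata_sp.h5ad",
]
missing = missing_artifacts(required)
if missing:
    print("Rerun Steps 3.2 and 3.3 first; missing files:", missing)
else:
    print("All saved artifacts found; reloading can proceed.")
```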

decov_obj=ov.space.Deconvolution(
    adata_sc=ov.read("result/model/tangram_adata_sc.h5ad"),
    adata_sp=ov.read("result/model/tangram_adata_sp.h5ad")
)
decov_obj.load_tangram_model(
    'result/model/tangram_model.pkl'
)
decov_obj.tangram_inference()
decov_obj.impute(method='Tangram')
decov_obj.adata_impute
✓ Existing 'counts' layer in scRNA-seq data
✓ Existing 'counts' layer in spatial transcriptomics data
✓ spatial transcriptomics data is log-normalized by 1e4
⚠️ 1e4 is the standardized target sum for `scanpy`
✓ scRNA-seq data is log-normalized by 50*1e4
✓ spatial transcriptomics data is log-normalized by 50*1e4
⚠️ 50*1e4 is the standardized target sum for `omicverse`
📂 Load Operation:
   Source path: result/model/tangram_model.pkl
   Using: pickle
   ✅ Successfully loaded!
   Loaded object type: Tangram
────────────────────────────────────────────────────────────
✓ Tangram model is loaded
The Tangram model is saved in self.tangram
✓ Tangram is done
The Tangram result is saved in self.adata_cell2location
✓ Tangram impute is done
The impute result is saved in self.adata_impute
AnnData object with n_obs × n_vars = 4035 × 3000
    obs: 'in_tissue', 'array_row', 'array_col', 'sample', 'uniform_density', 'rna_count_based_density'
    var: 'GeneID-2', 'GeneName-2', 'feature_types', 'feature_types-0', 'feature_types-1', 'gene_ids-1', 'gene_ids-4861STDY7135913-0', 'gene_ids-4861STDY7135914-0', 'gene_ids-4861STDY7208412-0', 'gene_ids-4861STDY7208413-0', 'gene_ids-Human_colon_16S7255677-0', 'gene_ids-Human_colon_16S7255678-0', 'gene_ids-Human_colon_16S8000484-0', 'gene_ids-Pan_T7935494-0', 'genome-1', 'n_cells', 'nonz_mean', 'mean_cov_effect_Subset_B_Cycling', 'mean_cov_effect_Subset_B_GC_DZ', 'mean_cov_effect_Subset_B_GC_LZ', 'mean_cov_effect_Subset_B_GC_prePB', 'mean_cov_effect_Subset_B_IFN', 'mean_cov_effect_Subset_B_activated', 'mean_cov_effect_Subset_B_mem', 'mean_cov_effect_Subset_B_naive', 'mean_cov_effect_Subset_B_plasma', 'mean_cov_effect_Subset_B_preGC', 'mean_cov_effect_Subset_DC_CCR7+', 'mean_cov_effect_Subset_DC_cDC1', 'mean_cov_effect_Subset_DC_cDC2', 'mean_cov_effect_Subset_DC_pDC', 'mean_cov_effect_Subset_Endo', 'mean_cov_effect_Subset_FDC', 'mean_cov_effect_Subset_ILC', 'mean_cov_effect_Subset_Macrophages_M1', 'mean_cov_effect_Subset_Macrophages_M2', 'mean_cov_effect_Subset_Mast', 'mean_cov_effect_Subset_Monocytes', 'mean_cov_effect_Subset_NK', 'mean_cov_effect_Subset_NKT', 'mean_cov_effect_Subset_T_CD4+', 'mean_cov_effect_Subset_T_CD4+_TfH', 'mean_cov_effect_Subset_T_CD4+_TfH_GC', 'mean_cov_effect_Subset_T_CD4+_naive', 'mean_cov_effect_Subset_T_CD8+_CD161+', 'mean_cov_effect_Subset_T_CD8+_cytotoxic', 'mean_cov_effect_Subset_T_CD8+_naive', 'mean_cov_effect_Subset_T_TIM3+', 'mean_cov_effect_Subset_T_TfR', 'mean_cov_effect_Subset_T_Treg', 'mean_cov_effect_Subset_VSMC', 'mean_sample_effectSample_4861STDY7135913', 'mean_sample_effectSample_4861STDY7135914', 'mean_sample_effectSample_4861STDY7208412', 'mean_sample_effectSample_4861STDY7208413', 'mean_sample_effectSample_4861STDY7462253', 'mean_sample_effectSample_4861STDY7462254', 'mean_sample_effectSample_4861STDY7462255', 'mean_sample_effectSample_4861STDY7462256', 'mean_sample_effectSample_4861STDY7528597', 
'mean_sample_effectSample_4861STDY7528598', 'mean_sample_effectSample_4861STDY7528599', 'mean_sample_effectSample_4861STDY7528600', 'mean_sample_effectSample_BCP002_Total', 'mean_sample_effectSample_BCP003_Total', 'mean_sample_effectSample_BCP004_Total', 'mean_sample_effectSample_BCP005_Total', 'mean_sample_effectSample_BCP006_Total', 'mean_sample_effectSample_BCP008_Total', 'mean_sample_effectSample_BCP009_Total', 'mean_sample_effectSample_Human_colon_16S7255677', 'mean_sample_effectSample_Human_colon_16S7255678', 'mean_sample_effectSample_Human_colon_16S8000484', 'mean_sample_effectSample_Pan_T7935494', 'percent_cells', 'robust', 'means', 'variances', 'residual_variances', 'highly_variable_rank', 'highly_variable_features', 'sparsity', 'is_training'
    uns: 'Age_colors', 'Donor_colors', 'LibraryID_colors', 'Method_colors', 'Study_colors', 'Subset_Broad_colors', 'Subset_colors', 'Tissue_colors', 'leiden', 'neighbors', 'pca', 'phase_colors', 'regression_mod', 'umap', 'log1p', 'hvg', 'status', 'status_args', 'REFERENCE_MANU', 'Subset_rank_genes_groups', 'training_genes', 'overlap_genes'
decov_obj.adata_impute.uns=decov_obj.adata_sp.uns.copy()
decov_obj.adata_impute.obsm=decov_obj.adata_sp.obsm.copy()
#fig = ov.plt.figure(figsize=(4, 4))
fig, axes = ov.plt.subplots(1,2,figsize=(8, 4))
sc.pl.spatial(
    decov_obj.adata_sp, 
    cmap='magma',
    color='MS4A1',
    ncols=4, size=1.3,ax=axes[0],
    img_key='hires',show=False,
)
axes[0].set_title('Raw: MS4A1')

sc.pl.spatial(
    decov_obj.adata_impute, 
    cmap='magma',
    color='ms4a1',  # gene names in adata_impute are lowercase, unlike in adata_sp
    ncols=4, size=1.3,ax=axes[1],
    img_key='hires',show=False,
)
axes[1].set_title('Impute: MS4A1')
Text(0.5, 1.0, 'Impute: MS4A1')
[Figure: spatial expression of MS4A1, raw (left) and Tangram-imputed (right)]

Step 5: Visualization (5–15 min)

We provide multiple views: single-target spatial heatmaps, multi-target overlays, and local pie charts. Start with global inspection, then zoom into biologically relevant regions for higher-resolution assessment.

5.1 Spatial value dotplot

5.1.1 Tangram

annotation_list=['B_Cycling', 'B_GC_LZ', 'T_CD4+_TfH_GC', 'FDC',
                'B_naive', 'T_CD4+_naive', 'B_plasma', 'Endo']
sc.pl.spatial(
    decov_obj.adata_cell2location, 
    cmap='magma',
    # show the eight cell types listed in annotation_list
    color=annotation_list,
    ncols=4, size=1.3,
    img_key='hires',
    # limit color scale at 99.2% quantile of cell abundance
    #vmin=0, vmax='p99.2'
)
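The commented-out `vmax='p99.2'` option caps the color scale at the 99.2th percentile so that a few extreme spots do not wash out the rest of the map. The sketch below illustrates the effect of such a cap in plain Python; `quantile` and `clip_to_quantile` are hypothetical helpers, the abundance values are invented, and a lower quantile is used so the effect is visible on six values.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence, with q in [0, 1]."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] * (1 - fraction) + ordered[upper] * fraction


def clip_to_quantile(values, q=0.992):
    """Cap every value at the q-th quantile; return the capped values and the cap."""
    cap = quantile(values, q)
    return [min(value, cap) for value in values], cap


# Invented abundances for six spots; the last one is an outlier
abundance = [0.1, 0.2, 0.15, 0.3, 0.25, 12.0]
clipped, cap = clip_to_quantile(abundance, q=0.8)
print(cap, clipped)
```

After capping, the outlier no longer dominates the color range, so differences among the remaining spots become visible.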

5.1.2 cell2location

Note: the cell below reads decov_obj.adata_cell2location, the same attribute that holds the Tangram result plotted in 5.1.1. It shows cell2location abundances only if a cell2location run (Step 4 in the workflow overview) has populated that attribute first.

annotation_list=['B_Cycling', 'B_GC_LZ', 'T_CD4+_TfH_GC', 'FDC',
                'B_naive', 'T_CD4+_naive', 'B_plasma', 'Endo']
sc.pl.spatial(
    decov_obj.adata_cell2location, 
    cmap='magma',
    # show the eight cell types listed in annotation_list
    color=annotation_list,
    ncols=4, size=1.3,
    img_key='hires',
    # limit color scale at 99.2% quantile of cell abundance
    #vmin=0, vmax='p99.2'
)

5.2 Multi-target overlay

color_dict=dict(zip(adata_sc.obs['Subset'].cat.categories,
                   adata_sc.uns['Subset_colors']))

5.2.1 Tangram

import matplotlib as mpl
clust_labels = annotation_list[:5]
clust_col = ['' + str(i) for i in clust_labels] # in case column names differ from labels

with mpl.rc_context({'figure.figsize': (6, 6),'axes.grid': False}):
    fig = ov.pl.plot_spatial(
        adata=decov_obj.adata_cell2location,
        # labels to show on a plot
        color=clust_col, labels=clust_labels,
        show_img=True,
        # 'fast' (white background) or 'dark_background'
        style='fast',
        # limit color scale at 99.2% quantile of cell abundance
        max_color_quantile=0.992,
        # size of locations (adjust depending on figure size)
        circle_diameter=4,
        reorder_cmap = [#0,
            1,2,3,4,6], #['yellow', 'orange', 'blue', 'green', 'purple', 'grey', 'white'],
        colorbar_position='right',
        palette=color_dict
    )
    

5.2.2 cell2location

import matplotlib as mpl
clust_labels = annotation_list[:5]
clust_col = ['' + str(i) for i in clust_labels] # in case column names differ from labels

with mpl.rc_context({'figure.figsize': (6, 6),'axes.grid': False}):
    fig = ov.pl.plot_spatial(
        adata=decov_obj.adata_cell2location,
        # labels to show on a plot
        color=clust_col, labels=clust_labels,
        show_img=True,
        # 'fast' (white background) or 'dark_background'
        style='fast',
        # limit color scale at 99.2% quantile of cell abundance
        max_color_quantile=0.992,
        # size of locations (adjust depending on figure size)
        circle_diameter=4,
        reorder_cmap = [#0,
            1,2,3,4,6], #['yellow', 'orange', 'blue', 'green', 'purple', 'grey', 'white'],
        colorbar_position='right',
        palette=color_dict
    )
    

5.3 Pie plot

We recommend cropping a region of interest before plotting to avoid overly dense pie charts on whole slides.
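Conceptually, cropping keeps only the spots whose coordinates fall inside a rectangular window. The sketch below shows that selection on plain coordinate tuples. It is an illustration only: `spots_in_window` is a hypothetical helper, and treating the window as (width, height) measured from an origin is an assumption that may not match how `crop_space_visium` interprets `crop_loc` and `crop_area`.

```python
def spots_in_window(coords, origin, size):
    """Return indices of spots whose (x, y) lies inside [x0, x0 + w) x [y0, y0 + h)."""
    x0, y0 = origin
    width, height = size
    return [
        index
        for index, (x, y) in enumerate(coords)
        if x0 <= x < x0 + width and y0 <= y < y0 + height
    ]


# Invented pixel coordinates for four spots
coords = [(120, 300), (480, 950), (510, 200), (250, 1200)]
keep = spots_in_window(coords, origin=(0, 0), size=(500, 1000))
print(keep)
```

Counting the retained spots before plotting is a quick way to check that the window is neither empty nor still too dense for readable pie charts.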

adata_s = ov.space.crop_space_visium(
    decov_obj.adata_cell2location, 
    crop_loc=(0, 0),      
    crop_area=(500, 1000), 
    library_id=list(decov_obj.adata_cell2location.uns['spatial'].keys())[0] , 
    scale=1
)
Adding image layer `image`
sc.pl.spatial(adata_s, cmap='magma',
                  # show only the first cell type in annotation_list
                  color=annotation_list[0],
                  ncols=4, size=1.3,
                  img_key='hires',
                  # limit color scale at 99.2% quantile of cell abundance
                  #vmin=0, vmax='p99.2'
                 )
color_dict=dict(zip(adata_sc.obs['Subset'].cat.categories,
                   adata_sc.uns['Subset_colors']))

5.3.1 Tangram

fig, ax = plt.subplots(figsize=(8, 4))
sc.pl.spatial(
    adata_s, 
    basis='spatial',
    color=None,  
    size=1.3,
    img_key='hires',
    ax=ax,      
    show=False
)

ov.pl.add_pie2spatial(
    adata_s,
    img_key='hires',
    cell_type_columns=annotation_list[:],
    ax=ax,
    colors=color_dict,
    pie_radius=10,
    remainder='gap',
    legend_loc=(0.5, -0.25),
    ncols=4,
    alpha=0.8
)

plt.show()

5.3.2 cell2location

fig, ax = plt.subplots(figsize=(8, 4))
sc.pl.spatial(
    adata_s, 
    basis='spatial',
    color=None,  
    size=1.3,
    img_key='hires',
    ax=ax,      
    show=False
)

ov.pl.add_pie2spatial(
    adata_s,
    img_key='hires',
    cell_type_columns=annotation_list[:],
    ax=ax,
    colors=color_dict,
    pie_radius=10,
    remainder='gap',
    legend_loc=(0.5, -0.25),
    ncols=4,
    alpha=0.8
)

plt.show()

Extensions and Further Reading

For finer control than the unified interface offers, the cell2location components bundled with omicverse can be imported directly:

from omicverse.external.space.cell2location.models import Cell2location, RegressionModel
from omicverse.external.space.cell2location.plt import plot_spatial
from omicverse.external.space.cell2location.utils import select_slide
from omicverse.external.space.cell2location.utils.filtering import filter_genes

Citations and Acknowledgements

Please cite:

  • OmicVerse toolkit (this notebook’s implementation)

  • Tangram: original publication and software

  • cell2location: original publication and software

  • The datasets used (scRNA-seq reference and spatial transcriptomics)

We thank the original tool authors and dataset providers for making their resources available to the community.

import tangram as tg

Troubleshooting

  • Gene ID mismatch:

    • Symptom: many NaNs or empty outputs; very few overlapping genes.

    • Fix: harmonize gene IDs between scRNA-seq and spatial data (ENSEMBL or symbols), drop non-overlapping genes, and record how many genes remain.

  • Reference coverage insufficient:

    • Symptom: expected cell types missing in known tissue regions.

    • Fix: augment the scRNA-seq reference with tissue/age/pathology-matched data; integrate multiple sources and correct batch effects.

  • Hyperparameters:

    • Tangram: pay attention to regularization and gene selection; a small grid search can help.

    • cell2location: prefer GPU; adjust training epochs/priors to dataset size; monitor convergence diagnostics.

  • Reproducibility:

    • Fix random seeds and package versions; save models and key intermediate artifacts; record environment details at the top of the notebook.
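For the seed-fixing advice above, a minimal sketch is given below. `set_seeds` is a hypothetical helper rather than an omicverse function; it seeds Python's built-in generator and, when they are installed, NumPy and PyTorch. Seeding alone may not make GPU training bit-for-bit reproducible.

```python
import random


def set_seeds(seed=0):
    """Seed Python's RNG, plus NumPy and PyTorch when available; return what was seeded."""
    random.seed(seed)
    seeded = ["random"]
    try:
        import numpy as np

        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)
        seeded.append("torch")
    except ImportError:
        pass  # PyTorch not installed; nothing to seed
    return seeded


# Re-seeding with the same value reproduces the same draw
set_seeds(42)
first = random.random()
set_seeds(42)
second = random.random()
print(first == second)
```

Calling such a helper once at the top of the notebook, before any preprocessing or training cell, keeps reruns comparable.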