Nicheformer — Foundation Model Tutorial

Nicheformer is a niche-aware spatial transformer that jointly models spatial coordinates and gene expression.

| Property | Value |
|---|---|
| Tasks | embed, integrate, spatial |
| Species | human, mouse |
| Gene IDs | symbol |
| GPU Required | Yes |
| Min VRAM | 16 GB |
| Embedding Dim | 512 |
| Repository | https://github.com/theislab/nicheformer |

Note: Nicheformer is spatially aware and works best with spatial transcriptomics data (Visium, MERFISH, Slide-seq). It can also embed dissociated scRNA-seq data, but its key strength is modeling spatial context.

This tutorial demonstrates how to use Nicheformer through the unified ov.fm API.

Cite: Zeng, Z. et al. (2024). OmicVerse: a framework for bridging and deepening insights across bulk and single-cell sequencing. Nature Communications, 15(1), 5983.

import omicverse as ov
import scanpy as sc
import os
import warnings
warnings.filterwarnings('ignore')

ov.plot_set()

When to use Nicheformer vs. standard models

| Scenario | Recommended Model |
|---|---|
| Dissociated scRNA-seq (no spatial info) | scGPT, Geneformer |
| Spatial transcriptomics (Visium, MERFISH) | Nicheformer |
| Cross-species spatial comparison | Nicheformer (human + mouse) |
| Niche/microenvironment analysis | Nicheformer |

Nicheformer jointly models gene expression and spatial coordinates, capturing tissue organization that dissociated models miss.
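
To make "spatial context" concrete, the sketch below computes a simple niche descriptor: for each cell, the fraction of each cell type among its k nearest spatial neighbors. This is an illustration of what a dissociated model cannot see, not a description of how Nicheformer works internally. The function name, the toy coordinates, and the labels are all made up for the example.

```python
import numpy as np

def neighborhood_composition(coords, labels, k=3):
    """Fraction of each label among every cell's k nearest spatial neighbors."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    classes, codes = np.unique(labels, return_inverse=True)
    # Dense pairwise Euclidean distances: fine for a toy example,
    # but a KD-tree or similar index is preferable for real tissue sections.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    comp = np.zeros((len(coords), len(classes)))
    for j in range(len(classes)):
        comp[:, j] = (codes[nn_idx] == j).mean(axis=1)
    return classes, comp

# Toy tissue (hypothetical): two spatially separated groups of cells.
coords = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = ["T", "T", "T", "B", "B", "B"]
classes, comp = neighborhood_composition(coords, labels, k=2)
print(classes)  # column order of `comp`
print(comp)     # one row per cell, rows sum to 1
```

Two cells with identical expression but different rows in `comp` sit in different microenvironments, which is the kind of signal a spatially aware model can exploit.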

Step 1: Inspect Model Specification

Use ov.fm.describe_model() to get the full spec for Nicheformer.

info = ov.fm.describe_model("nicheformer")

print("=== Model Info ===")
print(f"Name: {info['model']['name']}")
print(f"Version: {info['model']['version']}")
print(f"Tasks: {info['model']['tasks']}")
print(f"Species: {info['model']['species']}")
print(f"Embedding dim: {info['model']['embedding_dim']}")
print(f"Differentiator: {info['model']['differentiator']}")

print("\n=== Input Contract ===")
print(f"Gene ID scheme: {info['input_contract']['gene_id_scheme']}")
print(f"Preprocessing: {info['input_contract']['preprocessing']}")

print("\n=== Output Contract ===")
print(f"Embedding key: {info['output_contract']['embedding_key']}")
print(f"Embedding dim: {info['output_contract']['embedding_dim']}")
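
The direct indexing above raises a KeyError if a field is missing from the returned spec. A small defensive accessor avoids that. The sketch below is a minimal example: `get_nested` is a hypothetical helper, and `info_example` is a hand-written stand-in for the returned dict that only reuses key names printed above.

```python
def get_nested(spec, *keys, default="n/a"):
    """Walk nested dict keys, returning `default` if any level is missing."""
    node = spec
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

# Hypothetical stand-in for the dict returned by ov.fm.describe_model("nicheformer").
# Key names follow the prints above; the values are illustrative.
info_example = {
    "model": {"name": "nicheformer", "embedding_dim": 512},
    "input_contract": {"gene_id_scheme": "symbol"},
}

print(get_nested(info_example, "model", "embedding_dim"))            # 512
print(get_nested(info_example, "output_contract", "embedding_key"))  # n/a
```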

Step 2: Prepare Data

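Load a dataset and save it to disk, because the ov.fm functions used below take a file path. Most foundation models expect raw counts (non-negative values), so this example skips normalization and log-transformation.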

# Nicheformer excels with spatial transcriptomics data.
# For best results, use Visium / MERFISH / Slide-seq data with spatial coordinates.
# Here we demonstrate with PBMC3k (RNA-only) — the model still works for embedding.

adata = sc.datasets.pbmc3k()
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
print(f'Dataset: {adata.n_obs} cells x {adata.n_vars} genes')

# For spatial data, ensure adata.obsm['spatial'] exists:
# adata = sc.read_visium('path/to/visium/')

adata.write_h5ad('pbmc3k_nicheformer.h5ad')
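
Since raw counts are expected, a quick check before saving can catch an accidentally normalized matrix. The sketch below is a heuristic, not part of ov.fm: `looks_like_raw_counts` is a hypothetical helper that tests for non-negative, integer-valued entries and accepts either a dense array or a SciPy sparse matrix.

```python
import numpy as np

def looks_like_raw_counts(X):
    """Heuristic: raw counts are non-negative and integer-valued."""
    # SciPy sparse matrices expose `tocsr()`; their implicit zeros are valid
    # counts, so only the explicitly stored values need checking.
    values = X.tocsr().data if hasattr(X, "tocsr") else np.asarray(X).ravel()
    if values.size == 0:
        return True
    return bool(values.min() >= 0 and np.all(values == np.round(values)))

# Toy matrices (hypothetical): raw counts vs. a log-normalized version.
counts = np.array([[0, 3, 1], [2, 0, 0]])
normalized = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)

print(looks_like_raw_counts(counts))      # True
print(looks_like_raw_counts(normalized))  # False
```

On the tutorial data, the same check could be run as `looks_like_raw_counts(adata.X)` before calling `adata.write_h5ad(...)`.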

Step 3: Profile Data & Validate Compatibility

Check whether your data is compatible with Nicheformer before running inference.

profile = ov.fm.profile_data("pbmc3k_nicheformer.h5ad")

print("=== Data Profile ===")
print(f"Species: {profile['species']}")
print(f"Gene scheme: {profile['gene_scheme']}")
print(f"Modality: {profile['modality']}")
print(f"Cells: {profile['n_cells']:,}")
print(f"Genes: {profile['n_genes']:,}")

# Validate compatibility
validation = ov.fm.preprocess_validate("pbmc3k_nicheformer.h5ad", "nicheformer", "embed")
print(f"\n=== Validation: {validation['status']} ===")
for d in validation.get("diagnostics", []):
    print(f"  [{d['severity']}] {d['message']}")
if validation.get("auto_fixes"):
    print("\nSuggested fixes:")
    for fix in validation["auto_fixes"]:
        print(f"  - {fix}")
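
In a scripted pipeline it can be handy to reduce the diagnostics list to a go/no-go decision. The sketch below assumes only the structure used in the loop above (a "diagnostics" list of dicts with "severity" and "message"). The helper name, the severity labels, the status string, and the messages in the example payload are invented for illustration and may not match what ov.fm actually returns.

```python
from collections import Counter

def summarize_diagnostics(validation, blocking=("error",)):
    """Count diagnostics by severity and flag whether any is blocking."""
    diagnostics = validation.get("diagnostics", [])
    counts = Counter(d.get("severity", "unknown") for d in diagnostics)
    has_blocking = any(counts[level] > 0 for level in blocking)
    return dict(counts), has_blocking

# Hypothetical validation payload with made-up severities and messages.
validation_example = {
    "status": "needs_preprocessing",
    "diagnostics": [
        {"severity": "warning", "message": "No spatial coordinates found"},
        {"severity": "info", "message": "Gene symbols detected"},
    ],
}

severity_counts, has_blocking = summarize_diagnostics(validation_example)
print(severity_counts)  # {'warning': 1, 'info': 1}
print(has_blocking)     # False
```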

Step 4: Run Nicheformer Inference

Execute Nicheformer through ov.fm.run(). The function handles preprocessing, model loading, inference, and output writing.

result = ov.fm.run(
    task="embed",
    model_name="nicheformer",
    adata_path="pbmc3k_nicheformer.h5ad",
    output_path="pbmc3k_nicheformer_out.h5ad",
    device="auto",
)

if "error" in result:
    print(f"Error: {result['error']}")
    if "suggestion" in result:
        print(f"Suggestion: {result['suggestion']}")
else:
    print(f"Status: {result['status']}")
    print(f"Output keys: {result.get('output_keys', [])}")
    print(f"Cells processed: {result.get('n_cells', 0)}")
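
Printing the error is fine in a notebook, but a batch script usually should stop when inference fails. The sketch below wraps the same "error"/"suggestion" check in a hypothetical helper, `require_success`, that raises instead of printing. The example dicts are hand-written stand-ins, not real ov.fm output.

```python
def require_success(result):
    """Return `result` unchanged, or raise RuntimeError if it has an "error" key."""
    if "error" in result:
        message = str(result["error"])
        if "suggestion" in result:
            message += f" (suggestion: {result['suggestion']})"
        raise RuntimeError(message)
    return result

# Hypothetical result payloads for illustration.
ok = require_success({"status": "success", "n_cells": 100})
print(ok["status"])

try:
    require_success({"error": "model weights not found",
                     "suggestion": "check installation"})
except RuntimeError as exc:
    print(f"Run failed: {exc}")
```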

Step 5: Visualize & Interpret Results

Load the output, compute a UMAP from the Nicheformer embeddings, and evaluate embedding quality.

if os.path.exists("pbmc3k_nicheformer_out.h5ad"):
    adata_out = sc.read_h5ad("pbmc3k_nicheformer_out.h5ad")
    emb_key = "X_nicheformer"
    
    if emb_key in adata_out.obsm:
        print(f"Embedding shape: {adata_out.obsm[emb_key].shape}")
        
        # UMAP visualization
        sc.pp.neighbors(adata_out, use_rep=emb_key)
        sc.tl.umap(adata_out)
        sc.tl.leiden(adata_out, resolution=0.5)
        sc.pl.umap(adata_out, color=["leiden"],
                   title="Nicheformer Embedding (PBMC 3k)")
        
        # QA metrics: report the dimension and, if present, the silhouette score
        interpretation = ov.fm.interpret_results("pbmc3k_nicheformer_out.h5ad", task="embed")
        # .get() avoids a KeyError if no metrics were computed
        embedding_metrics = interpretation.get("metrics", {}).get("embeddings", {})
        for k, v in embedding_metrics.items():
            print(f"\n{k}: dim={v['dim']}", end="")
            if "silhouette" in v:
                print(f", silhouette={v['silhouette']:.4f}", end="")
            print()
    else:
        print(f"Embedding key {emb_key} not found.")
        print(f"Available keys: {list(adata_out.obsm.keys())}")
else:
    print("Output file not found — check model installation and adapter status.")
    print("See the Guide page for installation instructions.")
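
For readers unfamiliar with the silhouette score printed above, here is a minimal NumPy sketch of the standard definition. For each point, a is the mean distance to the other members of its cluster, b is the lowest mean distance to any other cluster, and the score is (b - a) / max(a, b). How `ov.fm.interpret_results` computes its value internally is not documented here, so treat this as a reference implementation of the metric, not of that function. The name `mean_silhouette` and the toy points are made up for the example.

```python
import numpy as np

def mean_silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distances (needs >= 2 clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette requires at least two clusters")
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters score 0 by convention
        a = dist[i, same].sum() / (n_same - 1)  # self-distance is 0, so exclude it from the count
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())

# Toy embedding (hypothetical): two tight, well-separated groups.
points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels match the geometry
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
print(f"matching labels: {good:.3f}, mismatched labels: {bad:.3f}")
```

Values near 1 indicate compact, well-separated clusters. Values near 0 or below suggest that the clustering does not match the structure of the embedding.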

Summary

| Step | Function | What it does |
|---|---|---|
| 1 | ov.fm.describe_model("nicheformer") | Inspect model spec and I/O contract |
| 2 | sc.datasets.pbmc3k() | Prepare input data |
| 3 | ov.fm.profile_data() + preprocess_validate() | Check compatibility |
| 4 | ov.fm.run() | Execute Nicheformer inference |
| 5 | ov.fm.interpret_results() | Evaluate embedding quality |

For the full model catalog, see ov.fm.list_models() or the ov.fm API Overview. For detailed Nicheformer specifications, see the Nicheformer Guide.