{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# FlashDeconv: Fast Spatial Deconvolution via Structure-Preserving Sketching\n", "\n", "This notebook demonstrates how to use FlashDeconv for cell type deconvolution of spatial transcriptomics data through the unified `omicverse.space.Deconvolution` API.\n", "\n", "## Why FlashDeconv?\n", "\n", "- **Scalability**: Handles millions of spots (Visium HD, Slide-seq) without requiring a GPU\n", "- **Speed**: Uses randomized sketching for O(n) complexity instead of O(n²)\n", "- **Spatial awareness**: Incorporates graph Laplacian regularization for spatially smooth results\n", "- **Integration**: scanpy-style API that works seamlessly with AnnData objects\n", "\n", "## Inputs and Outputs\n", "\n", "- **Inputs**:\n", "  - Spatial transcriptomics data (10x Visium, Visium HD, Slide-seq, etc.)\n", "  - Single-cell reference with cell type annotations\n", "- **Outputs**:\n", "  - Cell type proportions per spot (stored in `adata_sp.obsm['flashdeconv']`)\n", "  - Dominant cell type per spot (stored in `adata_sp.obs['flashdeconv_dominant']`)\n", "  - Compatible `adata_cell2location` object for downstream analysis\n", "\n", "## Workflow Overview\n", "\n", "1. Load scRNA-seq reference and spatial data (~1 min)\n", "2. Run FlashDeconv deconvolution (~2-5 min for standard Visium)\n", "3. 
Visualize results (~5 min)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔬 Starting plot initialization...\n", "Using already downloaded Arial font from: /tmp/omicverse_arial.ttf\n", "Registered as: Arial\n", "🧬 Detecting GPU devices…\n", "✅ NVIDIA CUDA GPUs detected: 1\n", " • [CUDA 0] NVIDIA H100 80GB HBM3\n", " Memory: 79.1 GB | Compute: 9.0\n", "\n", " ____ _ _ __ \n", " / __ \\____ ___ (_)___| | / /__ _____________ \n", " / / / / __ `__ \\/ / ___/ | / / _ \\/ ___/ ___/ _ \\ \n", "/ /_/ / / / / / / / /__ | |/ / __/ / (__ ) __/ \n", "\\____/_/ /_/ /_/_/\\___/ |___/\\___/_/ /____/\\___/ \n", "\n", "🔖 Version: 1.7.9rc1 📚 Tutorials: https://omicverse.readthedocs.io/\n", "✅ plot_set complete.\n", "\n" ] } ], "source": [ "import omicverse as ov\n", "import scanpy as sc\n", "import matplotlib.pyplot as plt\n", "\n", "ov.plot_set(font_path='Arial')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Load Data\n", "\n", "### 1.1 Load scRNA-seq reference\n", "\n", "The reference should contain cell type annotations in `.obs`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Subset\n", "B_mem 13476\n", "B_naive 8924\n", "T_CD4+_naive 6012\n", "B_Cycling 4765\n", "T_CD4+_TfH 4690\n", "T_CD8+_cytotoxic 3890\n", "T_CD4+_TfH_GC 3653\n", "B_activated 3575\n", "B_GC_LZ 3298\n", "T_CD4+ 3059\n", "T_Treg 2958\n", "B_GC_DZ 2500\n", "T_CD8+_CD161+ 2294\n", "T_CD8+_naive 2253\n", "NK 1372\n", "B_plasma 1094\n", "T_TfR 1065\n", "NKT 896\n", "Endo 622\n", "ILC 617\n", "B_preGC 404\n", "T_TIM3+ 357\n", "Monocytes 306\n", "DC_pDC 226\n", "B_IFN 199\n", "DC_cDC2 173\n", "Macrophages_M1 121\n", "Macrophages_M2 110\n", "DC_cDC1 101\n", "FDC 76\n", "B_GC_prePB 74\n", "DC_CCR7+ 42\n", "VSMC 40\n", "Mast 18\n", "Name: count, dtype: int64\n" ] } ], "source": [ "# Load your scRNA-seq reference\n", "# Example: Human lymph node reference\n", "adata_sc = ov.datasets.sc_ref_Lymph_Node()\n", "\n", "# Check cell type annotations\n", "print(adata_sc.obs['Subset'].value_counts())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2 Load spatial transcriptomics data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "reading /scratch/users/steorra/analysis/omic_test/data/V1_Human_Lymph_Node/filtered_feature_bc_matrix.h5\n", " (0:00:00)\n", "Spatial data: 4035 spots, 36601 genes\n" ] } ], "source": [ "# Load spatial data (example: Visium human lymph node)\n", "adata_sp = sc.datasets.visium_sge(sample_id=\"V1_Human_Lymph_Node\")\n", "adata_sp.obs['sample'] = list(adata_sp.uns['spatial'].keys())[0]\n", "adata_sp.var_names_make_unique()\n", "\n", "print(f\"Spatial data: {adata_sp.n_obs} spots, {adata_sp.n_vars} genes\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Run FlashDeconv Deconvolution\n", "\n", "FlashDeconv is integrated into the `omicverse.space.Deconvolution` class. 
Simply set `method='FlashDeconv'`.\n", "\n", "### Key Parameters\n", "\n", "- `sketch_dim`: Dimension of the sketched space (default: 512). Higher values preserve more information.\n", "- `lambda_spatial`: Spatial regularization strength (default: 5000). Higher values encourage smoother spatial patterns.\n", "- `n_hvg`: Number of highly variable genes to use (default: 2000).\n", "- `n_markers_per_type`: Number of marker genes per cell type (default: 50)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Initialize the Deconvolution object\n", "decov_obj = ov.space.Deconvolution(\n", "    adata_sc=adata_sc,\n", "    adata_sp=adata_sp\n", ")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running FlashDeconv with parameters: {'sketch_dim': 512, 'lambda_spatial': 10.0, 'n_hvg': 3000, 'n_markers_per_type': 50}\n", "\u001b[92m✓ FlashDeconv deconvolution is done\u001b[0m\n", "The deconvolution result is saved in self.adata_cell2location\n", "Cell type proportions are also stored in self.adata_sp.obsm['flashdeconv']\n" ] } ], "source": [ "# Run FlashDeconv deconvolution\n", "decov_obj.deconvolution(\n", "    method='FlashDeconv',\n", "    celltype_key_sc='Subset', # Column in adata_sc.obs containing cell type annotations\n", "    flashdeconv_kwargs={\n", "        'sketch_dim': 512, # Sketch dimension (default: 512)\n", "        'lambda_spatial': 10.0, # Spatial regularization strength (default: 5000)\n", "        'n_hvg': 3000, # Number of highly variable genes (default: 2000)\n", "        'n_markers_per_type': 50, # Marker genes per cell type (default: 50)\n", "    }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Access Results\n", "\n", "Results are stored in multiple locations for compatibility:\n", "- `decov_obj.adata_cell2location`: AnnData with cell type proportions in `.X`\n", "- `decov_obj.adata_sp.obsm['flashdeconv']`: DataFrame of cell type proportions\n", "- `decov_obj.adata_sp.obs['flashdeconv_dominant']`: Dominant cell type per spot" ] }, { "cell_type": "code", 
"execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 4035 × 34\n", " obs: 'in_tissue', 'array_row', 'array_col', 'sample', 'flashdeconv_B_Cycling', 'flashdeconv_B_GC_DZ', 'flashdeconv_B_GC_LZ', 'flashdeconv_B_GC_prePB', 'flashdeconv_B_IFN', 'flashdeconv_B_activated', 'flashdeconv_B_mem', 'flashdeconv_B_naive', 'flashdeconv_B_plasma', 'flashdeconv_B_preGC', 'flashdeconv_DC_CCR7+', 'flashdeconv_DC_cDC1', 'flashdeconv_DC_cDC2', 'flashdeconv_DC_pDC', 'flashdeconv_Endo', 'flashdeconv_FDC', 'flashdeconv_ILC', 'flashdeconv_Macrophages_M1', 'flashdeconv_Macrophages_M2', 'flashdeconv_Mast', 'flashdeconv_Monocytes', 'flashdeconv_NK', 'flashdeconv_NKT', 'flashdeconv_T_CD4+', 'flashdeconv_T_CD4+_TfH', 'flashdeconv_T_CD4+_TfH_GC', 'flashdeconv_T_CD4+_naive', 'flashdeconv_T_CD8+_CD161+', 'flashdeconv_T_CD8+_cytotoxic', 'flashdeconv_T_CD8+_naive', 'flashdeconv_T_TIM3+', 'flashdeconv_T_TfR', 'flashdeconv_T_Treg', 'flashdeconv_VSMC', 'flashdeconv_dominant'\n", " uns: 'spatial', 'flashdeconv_params'\n", " obsm: 'spatial', 'flashdeconv'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# View the result object\n", "decov_obj.adata_cell2location" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | B_Cycling | B_GC_DZ | B_GC_LZ | B_GC_prePB | B_IFN | B_activated | B_mem | B_naive | B_plasma | B_preGC | ... | T_CD4+_TfH | T_CD4+_TfH_GC | T_CD4+_naive | T_CD8+_CD161+ | T_CD8+_cytotoxic | T_CD8+_naive | T_TIM3+ | T_TfR | T_Treg | VSMC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AAACAAGTATCTCCCA-1 | 0.000000 | 0.142163 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.085327 | 0.390464 | 0.0 | ... | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.00000 | 0.0 | 0.040701 |
| AAACAATCTACTAGCA-1 | 0.000000 | 0.163051 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.116814 | 0.0 | ... | 0.0 | 0.049438 | 0.176642 | 0.0 | 0.0 | 0.005135 | 0.000000 | 0.15238 | 0.0 | 0.026961 |
| AAACACCAATAACTGC-1 | 0.016557 | 0.130161 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.046586 | 0.232831 | 0.0 | ... | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.013086 | 0.00000 | 0.0 | 0.054721 |
| AAACAGAGCGACTCCT-1 | 0.000000 | 0.172508 | 0.0 | 0.0 | 0.225865 | 0.0 | 0.0 | 0.000000 | 0.078707 | 0.0 | ... | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.126967 | 0.00000 | 0.0 | 0.000000 |
| AAACAGCTTTCAGAAG-1 | 0.000000 | 0.150919 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.403057 | 0.133602 | 0.0 | ... | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.00000 | 0.0 | 0.000000 |

5 rows × 34 columns
\n", "