{ "cells": [ { "cell_type": "markdown", "id": "1d171114-ad09-4986-9f3b-adf40fc61f46", "metadata": {}, "source": [ "# Spatial deconvolution without reference scRNA-seq\n", "\n", "This tutorial is adapted from the [official Starfysh tutorial](https://starfysh.readthedocs.io/en/latest/notebooks/Starfysh_tutorial_real.html), which analyzes a real Spatial Transcriptomics (ST) dataset (CID44971_TNBC) from Wu et al., 2021. Here, the same workflow is applied to the 10x Visium human lymph node sample (V1_Human_Lymph_Node).\n", "\n", "\n", "Starfysh performs cell-type deconvolution followed by various downstream analyses to discover spatial interactions in the tumor microenvironment. Specifically, Starfysh looks for anchor spots (presumably those with the highest proportion of a given cell type), informed by user-provided gene signatures ([see example](https://drive.google.com/file/d/1AXWQy_mwzFEKNjAdrJjXuegB3onxJoOM/view?usp=share_link)), and uses them as priors to guide the deconvolution inference. This in turn enables downstream analyses such as sample integration, spatial hub characterization, and cell-cell interaction analysis. This tutorial focuses on the deconvolution task. Overall, Starfysh provides the following options:\n", "\n", "**Base feature**:\n", "\n", "- Spot-level deconvolution with expected cell types and corresponding annotated signature gene sets (default)\n", "\n", "In omicverse, we have made the following improvements:\n", "- Easier visualization: you can use omicverse's unified visualization functions for scientific plotting.\n", "- Fewer installation dependency errors: we optimized the automatic selection of packages, so you don't need to install many extra packages that could cause conflicts.\n", "\n", "\n", "He, S., Jin, Y., Nazaret, A. 
et al.\n", "Starfysh integrates spatial transcriptomic and histologic data to reveal heterogeneous tumor–immune hubs.\n", "Nat Biotechnol (2024).\n", "https://doi.org/10.1038/s41587-024-02173-8" ] }, { "cell_type": "code", "execution_count": 1, "id": "6f20b97a-567e-4f72-9c62-ac2dd4423f5e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting plot initialization...\n", "Using already downloaded Arial font from: /tmp/omicverse_arial.ttf\n", "Registered as: Arial\n", "🧬 Detecting GPU devices…\n", "NVIDIA CUDA GPUs detected: 1\n", " • [CUDA 0] NVIDIA H100 80GB HBM3\n", " Memory: 79.1 GB | Compute: 9.0\n", "\n", " ____ _ _ __ \n", " / __ \\____ ___ (_)___| | / /__ _____________ \n", " / / / / __ `__ \\/ / ___/ | / / _ \\/ ___/ ___/ _ \\ \n", "/ /_/ / / / / / / / /__ | |/ / __/ / (__ ) __/ \n", "\\____/_/ /_/ /_/_/\\___/ |___/\\___/_/ /____/\\___/ \n", "\n", "Version: 1.7.9rc1 Tutorials: https://omicverse.readthedocs.io/\n", "plot_set complete.\n", "\n" ] } ], "source": [ "import cv2\n", "import scanpy as sc\n", "import omicverse as ov\n", "ov.style(font_path='Arial')" ] }, { "cell_type": "markdown", "id": "3f4bf20e-adfd-4f48-a3e0-25d2a4d2011f", "metadata": {}, "source": [ "## Step 1: Prepare spatial transcriptomics data (1 min)\n", "\n", "Purpose: load 10x Visium data (Space Ranger outputs) or a similar format to obtain a coordinate-aware spatial `AnnData` object (`adata_sp`).\n", "\n", "- Inputs: Visium count matrix and spatial coordinates (from the `spatial` folder)\n", "- Outputs: `AnnData` object (`adata_sp`) with spot coordinates and counts\n", "- Key points:\n", " - Ensure maximal gene overlap with the gene signatures (and with the scRNA-seq reference, if one is used to derive them); map gene IDs if necessary.\n", " - For multiple samples, keep batch labels explicit to support merging and visualization.\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "49053051-660b-4b2e-a657-a8fdabb986ab", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "reading 
/scratch/users/steorra/analysis/omic_test/data/V1_Human_Lymph_Node/filtered_feature_bc_matrix.h5\n", " (0:00:00)\n" ] } ], "source": [ "adata_sp = sc.datasets.visium_sge(sample_id=\"V1_Human_Lymph_Node\")\n", "adata_sp.obs['sample'] = list(adata_sp.uns['spatial'].keys())[0]\n", "adata_sp.var_names_make_unique()" ] }, { "cell_type": "markdown", "id": "2fb4246d-76aa-4821-8918-adb36ab18935", "metadata": {}, "source": [ "## Step 2: Prepare the marker gene signatures\n", "\n", "`gene_sig` is a DataFrame in which each column lists the marker genes of one cell type. If you don't have one, you can calculate it with `ov.space.calculate_gene_signature`; in this example, the signatures are derived from a lymph node scRNA-seq reference." ] }, { "cell_type": "code", "execution_count": 3, "id": "a3096d05-ddc1-4320-af3e-e0f45da7d3f6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[95m🧬 Loading SC reference data for Lymph Node\u001b[0m\n", "\u001b[94mDownloading data to ./data/sc_ref_Lymph_Node.h5ad\u001b[0m\n", "\u001b[93m⚠️ File ./data/sc_ref_Lymph_Node.h5ad already exists\u001b[0m\n", "\u001b[96m Loading data from ./data/sc_ref_Lymph_Node.h5ad\u001b[0m\n", "\u001b[92mSuccessfully loaded: 73260 cells × 10237 genes\u001b[0m\n", "...get cell type marker\n", "ranking genes\n", "WARNING: It seems you use rank_genes_groups on the raw count data. Please logarithmize your data before calling rank_genes_groups.\n", " finished: added to `.uns['rank_genes_groups']`\n", " 'names', sorted np.recarray to be indexed by group ids\n", " 'scores', sorted np.recarray to be indexed by group ids\n", " 'logfoldchanges', sorted np.recarray to be indexed by group ids\n", " 'pvals', sorted np.recarray to be indexed by group ids\n", " 'pvals_adj', sorted np.recarray to be indexed by group ids (0:01:00)\n" ] }, { "data": { "text/html": [ "
| \n", " | B_Cycling | \n", "B_GC_DZ | \n", "B_GC_LZ | \n", "B_GC_prePB | \n", "B_IFN | \n", "B_activated | \n", "B_mem | \n", "B_naive | \n", "B_plasma | \n", "B_preGC | \n", "... | \n", "T_CD4+_TfH | \n", "T_CD4+_TfH_GC | \n", "T_CD4+_naive | \n", "T_CD8+_CD161+ | \n", "T_CD8+_cytotoxic | \n", "T_CD8+_naive | \n", "T_TIM3+ | \n", "T_TfR | \n", "T_Treg | \n", "VSMC | \n", "
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | \n", "HMGN2 | \n", "CD79B | \n", "CD22 | \n", "BIK | \n", "HLA-DPA1 | \n", "HLA-DPA1 | \n", "HLA-DPA1 | \n", "HLA-DPA1 | \n", "SDF2L1 | \n", "HLA-DPA1 | \n", "... | \n", "TRAC | \n", "TCF7 | \n", "RPS15A | \n", "ZFP36 | \n", "CD3E | \n", "NOSIP | \n", "TMSB4X | \n", "CD4 | \n", "TRAC | \n", "TAGLN | \n", "
| 1 | \n", "DEK | \n", "TCEA1 | \n", "PRPSAP2 | \n", "PRPSAP2 | \n", "ISG15 | \n", "HVCN1 | \n", "CD52 | \n", "HVCN1 | \n", "SSR3 | \n", "CD72 | \n", "... | \n", "CD69 | \n", "SRGN | \n", "NOSIP | \n", "SRGN | \n", "CST7 | \n", "TCF7 | \n", "LCK | \n", "SRGN | \n", "SRGN | \n", "LGALS3 | \n", "
| 2 | \n", "HMGN1 | \n", "GAPDH | \n", "HMGN1 | \n", "VPREB3 | \n", "STAT1 | \n", "CD74 | \n", "SMIM14 | \n", "CD79B | \n", "PPIB | \n", "NME2 | \n", "... | \n", "SRGN | \n", "PASK | \n", "TCF7 | \n", "RGCC | \n", "GZMK | \n", "RGCC | \n", "SRGN | \n", "CD3E | \n", "RGCC | \n", "CST3 | \n", "
| 3 | \n", "GAPDH | \n", "SUGCT | \n", "GAPDH | \n", "HMCES | \n", "CD74 | \n", "TCL1A | \n", "CD74 | \n", "CD72 | \n", "SEC61G | \n", "CD74 | \n", "... | \n", "CD3E | \n", "ITM2A | \n", "RPS14 | \n", "CD8A | \n", "NKG7 | \n", "DNAJB1 | \n", "CD3E | \n", "ZAP70 | \n", "DNAJB1 | \n", "LAPTM4A | \n", "
| 4 | \n", "HMGB1 | \n", "EZR | \n", "CD74 | \n", "MEF2B | \n", "IFIT3 | \n", "HLA-DPB1 | \n", "VPREB3 | \n", "CD74 | \n", "HERPUD1 | \n", "HSP90AB1 | \n", "... | \n", "SLC2A3 | \n", "CD3E | \n", "LEF1 | \n", "CCL5 | \n", "CTSW | \n", "LEF1 | \n", "CD27 | \n", "TCRA_VDJsum | \n", "FOXP3 | \n", "IGFBP5 | \n", "
5 rows × 34 columns
\n", "