{ "cells": [ { "cell_type": "markdown", "id": "ca2a4297-3823-4db9-bf2a-164226c28260", "metadata": {}, "source": [ "# Spatial clustering and denoising expressions\n", "\n", "Spatial clustering, which is analogous to single-cell clustering, has expanded the scope of tissue physiology studies from cell-centroid to structure-centroid with spatially resolved transcriptomics (SRT) data.\n", "\n", "Here, we present five spatial clustering methods in OmicVerse.\n", "\n", "We made four improvements when integrating the `GraphST`, `BINARY`, `Banksy`, `CAST` and `STAGATE` algorithms into OmicVerse:\n", "- We removed the preprocessing that comes with `GraphST` and use the same preprocessing as for all other SRT methods in OmicVerse.\n", "- We optimised the dimensional display of `GraphST` and treat PCA as a self-contained computational step.\n", "- We implemented `mclust` in Python, removing the R language dependency.\n", "- We provide a unified interface, `ov.space.cluster`, so that all of these methods can be run through a single function.\n", "\n", "If you find this tutorial helpful, please cite `GraphST`, `BINARY`, `CAST`, `Banksy`, `STAGATE` and `OmicVerse`:\n", "\n", "- Long, Y., Ang, K.S., Li, M. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat Commun 14, 1155 (2023). https://doi.org/10.1038/s41467-023-36796-3\n", "- Lin, S., Cui, Y., Zhao, F. et al. Complete spatially resolved gene expression is not necessary for identifying spatial domains. Cell Genomics 4, 100565 (2024).\n", "- Tang, Z., Luo, S., Zeng, H. et al. Search and match across spatial omics samples at single-cell resolution. Nat Methods 21, 1818–1829 (2024). https://doi.org/10.1038/s41592-024-02410-7\n", "- Singhal, V., Chou, N., Lee, J. et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat Genet 56, 431–441 (2024).\n", "- Dong, K., Zhang, S.
Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat Commun 13, 1739 (2022). https://doi.org/10.1038/s41467-022-29439-6\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "711f609f-84b4-4d5a-ae2b-3d14c9ae6494", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔬 Starting plot initialization...\n", "Using already downloaded Arial font from: /tmp/omicverse_arial.ttf\n", "Registered as: Arial\n", "🧬 Detecting GPU devices…\n", "✅ NVIDIA CUDA GPUs detected: 1\n", " • [CUDA 0] NVIDIA H100 80GB HBM3\n", " Memory: 79.1 GB | Compute: 9.0\n", "\n", " ____ _ _ __ \n", " / __ \\____ ___ (_)___| | / /__ _____________ \n", " / / / / __ `__ \\/ / ___/ | / / _ \\/ ___/ ___/ _ \\ \n", "/ /_/ / / / / / / / /__ | |/ / __/ / (__ ) __/ \n", "\\____/_/ /_/ /_/_/\\___/ |___/\\___/_/ /____/\\___/ \n", "\n", "🔖 Version: 1.7.9rc1 📚 Tutorials: https://omicverse.readthedocs.io/\n", "✅ plot_set complete.\n", "\n" ] } ], "source": [ "import omicverse as ov\n", "#print(f\"omicverse version: {ov.__version__}\")\n", "import scanpy as sc\n", "#print(f\"scanpy version: {sc.__version__}\")\n", "ov.style(font_path='Arial')\n", "\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "id": "5960e503-e6a2-4d3a-b797-92d2635e6ebb", "metadata": {}, "source": [ "## Preprocess data\n", "\n", "Here we present our re-analysis of sample 151676 of the dorsolateral prefrontal cortex (DLPFC) dataset. Maynard et al. manually annotated the DLPFC layers and white matter (WM) based on morphological features and gene markers.\n", "\n", "This tutorial demonstrates how to identify spatial domains on 10x Visium data using STAGATE. The processed data are available at https://github.com/LieberInstitute/spatialLIBD. 
We downloaded the manual annotation from the spatialLIBD package and provide it at https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing." ] }, { "cell_type": "code", "execution_count": 2, "id": "b3026bbf-328a-49c7-acc4-c23b10ca6389", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "reading data/151676/151676_filtered_feature_bc_matrix.h5\n", " (0:00:00)\n" ] } ], "source": [ "adata = sc.read_visium(\n", " path='data/151676', \n", " count_file='151676_filtered_feature_bc_matrix.h5'\n", ")\n", "adata.var_names_make_unique()" ] }, { "cell_type": "markdown", "id": "8ff7be34-bdb7-4f15-885d-8050efc6b850", "metadata": {}, "source": [ "
Note
\n", "\n", " In OmicVerse versions later than `1.6.0`, we introduced `prost`, a module for computing spatially variable genes (SVGs), to replace scanpy's HVGs. If you want to use scanpy's HVGs instead, set `mode='scanpy'` in `ov.space.svg` or use the following code.\n", "
| | gene_ids | feature_types | genome | n_cells_by_counts | mean_counts | log1p_mean_counts | pct_dropout_by_counts | total_counts | log1p_total_counts | n_cells | SEP | SIG | PI | Moran_I | Geary_C | p_norm | p_rand | fdr_norm | fdr_rand | space_variable_features |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MBP | ENSG00000197971 | Gene Expression | GRCh38 | 3411 | 15.419075 | 2.798444 | 1.416185 | 53350.0 | 10.884648 | 3411 | 0.823299 | 0.214148 | 1.000000 | 0.910362 | 0.092733 | 0.0 | 0.0 | 0.0 | 0.0 | True |
| GFAP | ENSG00000131095 | Gene Expression | GRCh38 | 2938 | 3.930347 | 1.595409 | 15.086705 | 13599.0 | 9.517825 | 2938 | 0.694169 | 0.129941 | 0.587889 | 0.743831 | 0.255528 | 0.0 | 0.0 | 0.0 | 0.0 | True |
| PLP1 | ENSG00000123560 | Gene Expression | GRCh38 | 3214 | 9.255780 | 2.327842 | 7.109827 | 32025.0 | 10.374304 | 3214 | 0.668771 | 0.099919 | 0.478698 | 0.737326 | 0.264750 | 0.0 | 0.0 | 0.0 | 0.0 | True |
| MT-ND1 | ENSG00000198888 | Gene Expression | GRCh38 | 3460 | 74.200577 | 4.320159 | 0.000000 | 256734.0 | 12.455800 | 3460 | 0.362000 | 0.163292 | 0.359299 | 0.740392 | 0.262487 | 0.0 | 0.0 | 0.0 | 0.0 | True |
| MT-CO1 | ENSG00000198804 | Gene Expression | GRCh38 | 3460 | 115.025436 | 4.753809 | 0.000000 | 397988.0 | 12.894179 | 3460 | 0.472005 | 0.100106 | 0.338241 | 0.755924 | 0.246897 | 0.0 | 0.0 | 0.0 | 0.0 | True |
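
The `Moran_I` and `Geary_C` columns in the table above are spatial autocorrelation statistics. As a rough illustration of what they measure, the sketch below computes the textbook definitions of Moran's I and Geary's C on a synthetic grid with NumPy. It is not the code that `prost` or OmicVerse runs: the helper names (`knn_weights`, `morans_i`, `gearys_c`), the binary k-nearest-neighbour weights, and the toy data are all assumptions made for this example.

```python
import numpy as np


def knn_weights(coords, k=4):
    """Binary weight matrix: w[i, j] = 1 if spot j is among the k nearest to spot i."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a spot is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    weights = np.zeros(dist.shape)
    weights[np.arange(len(coords))[:, None], nearest] = 1.0
    return weights


def morans_i(x, w):
    """Moran's I: near 1 for smooth spatial patterns, near 0 for random ones."""
    z = x - x.mean()
    return len(x) / w.sum() * (z @ w @ z) / (z @ z)


def gearys_c(x, w):
    """Geary's C: near 0 for smooth spatial patterns, near 1 for random ones."""
    z = x - x.mean()
    sq_diff = (x[:, None] - x[None, :]) ** 2
    return (len(x) - 1) / (2 * w.sum()) * (w * sq_diff).sum() / (z @ z)


# Toy data (hypothetical): a 20 x 20 grid of spots with a noisy gradient along x.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(20), np.arange(20))
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
smooth = coords[:, 0] + rng.normal(scale=0.5, size=len(coords))
shuffled = rng.permutation(smooth)  # same values, spatial structure destroyed

w = knn_weights(coords, k=4)
i_smooth, c_smooth = morans_i(smooth, w), gearys_c(smooth, w)
i_shuffled, c_shuffled = morans_i(shuffled, w), gearys_c(shuffled, w)
print(f"smooth:   I={i_smooth:.3f}  C={c_smooth:.3f}")
print(f"shuffled: I={i_shuffled:.3f}  C={c_shuffled:.3f}")
```

Under these definitions a spatially structured gene has a high Moran's I and a low Geary's C, which matches the pattern in the rows above (for example `MBP`, with `Moran_I` 0.910362 and `Geary_C` 0.092733).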
| | decay | lambda_param | num_pcs | resolution | num_labels | labels | adata |
|---|---|---|---|---|---|---|---|
| scaled_gaussian_pc20_nc0.20_r0.80 | scaled_gaussian | 0.2 | 20 | 0.8 | 11 | Label object:\nNumber of labels: 11, number of... | [[[View of AnnData object with n_obs × n_vars ... |
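
To give some intuition for the `lambda_param` column in the table above, the sketch below builds a simplified neighbour-augmented feature matrix in the spirit of BANKSY: each spot's own expression is concatenated with the mean expression of its spatial neighbours, weighted by `sqrt(1 - lambda)` and `sqrt(lambda)` respectively. This is an illustration under stated assumptions, not the `Banksy` or OmicVerse implementation. The function name `neighbour_augment` is hypothetical, the neighbours are weighted uniformly rather than with the distance-decay kernel named in the `decay` column, and any further terms or scaling used by the published method are omitted.

```python
import numpy as np


def neighbour_augment(expr, coords, lambda_param=0.2, k=6):
    """Concatenate own expression with the mean expression of the k nearest spots.

    expr:   (n_spots, n_genes) expression matrix
    coords: (n_spots, 2) spatial coordinates
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude each spot from its own neighbourhood
    nearest = np.argsort(dist, axis=1)[:, :k]   # (n_spots, k)
    nbr_mean = expr[nearest].mean(axis=1)       # (n_spots, n_genes)
    own_part = np.sqrt(1.0 - lambda_param) * expr
    nbr_part = np.sqrt(lambda_param) * nbr_mean
    return np.hstack([own_part, nbr_part])


# Toy data (hypothetical): 50 spots, 5 genes.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 5))
coords = rng.uniform(size=(50, 2))

augmented = neighbour_augment(expr, coords, lambda_param=0.2, k=6)
cell_only = neighbour_augment(expr, coords, lambda_param=0.0, k=6)
print(augmented.shape)
```

With `lambda_param=0.0` the neighbourhood half is all zeros, so clustering sees only each spot's own expression; larger values shift the weight towards the neighbourhood. In this reading, the value 0.2 in the table corresponds to a mostly expression-driven embedding with a modest spatial contribution.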