{ "cells": [ { "cell_type": "markdown", "id": "879ade2f-bd73-4a5c-b038-98c916867812", "metadata": {}, "source": [ "# Bulk RNA-seq to single-cell RNA-seq\n", "\n", "Bulk2Single is used for bulk RNA-seq deconvolution. We extracted the beta-VAE component of the Bulk2Space algorithm and built a method that deconvolves bulk RNA-seq data into single-cell RNA-seq data. We also redesigned the data input and output to be more compatible with the analysis conventions of the Python ecosystem.\n", "\n", "Paper: [De novo analysis of bulk RNA-seq data at spatially resolved single-cell resolution](https://www.nature.com/articles/s41467-022-34271-z)\n", "\n", "Code: https://github.com/ZJUFanLab/bulk2space\n", "\n", "Colab reproducibility: https://colab.research.google.com/drive/1He71hAyeAv1DHQyXUlxtoJ4QvwZwW7I0?usp=sharing\n", "\n", "This tutorial walks through how to read the data, set up the model, and train it from bulk RNA-seq and reference scRNA-seq data. We use the dentate gyrus datasets as an example." ] }, { "cell_type": "code", "execution_count": 1, "id": "c41d1f20-93fd-4aa6-81b2-115d8ce10d17", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " ____ _ _ __ \n", " / __ \____ ___ (_)___| | / /__ _____________ \n", " / / / / __ `__ \\/ / ___/ | / / _ \\/ ___/ ___/ _ \\ \n", "/ /_/ / / / / / / / /__ | |/ / __/ / (__ ) __/ \n", "\\____/_/ /_/ /_/_/\\___/ |___/\\___/_/ /____/\\___/ \n", "\n", "Version: 1.6.3, Tutorials: https://omicverse.readthedocs.io/\n" ] } ], "source": [ "import scanpy as sc\n", "import omicverse as ov\n", "import matplotlib.pyplot as plt\n", "ov.plot_set()" ] }, { "cell_type": "markdown", "id": "956f8ded-20bf-4275-baa0-7e13675af1c2", "metadata": {}, "source": [ "## Loading data\n", "\n", "For illustration, we apply Bulk2Single to dentate gyrus neurogenesis data, which comprise multiple heterogeneous subpopulations.\n", "\n", "We used single-cell RNA-seq data (GEO accession: 
GSE95753) obtained from the dentate gyrus of the hippocampus in mice, along with bulk RNA-seq data (GEO accession: GSE74985)." ] }, { "cell_type": "code", "execution_count": 2, "id": "95a084f1-c9f8-4afa-b3b7-218bbab34e76", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
| | dg_d_1 | dg_d_2 | dg_d_3 | dg_v_1 | dg_v_2 | dg_v_3 | ca4_1 | ca4_2 | ca4_3 | ca3_d_1 | ... | ca3_v_3 | ca2_1 | ca2_2 | ca2_3 | ca1_d_1 | ca1_d_2 | ca1_d_3 | ca1_v_1 | ca1_v_2 | ca1_v_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gm12150 | 0 | 2 | 0 | 11 | 0 | 9 | 72 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Mir219a-2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Hspd1 | 1418 | 685 | 1404 | 3073 | 2316 | 1945 | 7724 | 8255 | 6802 | 4956 | ... | 8154 | 7104 | 5854 | 7508 | 5322 | 6172 | 5199 | 1865 | 1253 | 2298 |
| Crhbp | 0 | 0 | 0 | 31 | 17 | 32 | 0 | 0 | 0 | 29 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Gm11735 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 24 columns
| | Astrocytes | Cajal Retzius | Cck-Tox | Endothelial | GABA | Granule immature | Granule mature | Microglia | Mossy | Neuroblast | OL | OPC | Radial Glia-like | nIPC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| dg_d_1 | 0.004780 | 0.003839 | 0.004187 | 0.002460 | 0.005536 | 0.527208 | 0.393742 | 0.005203 | 0.028935 | 0.004639 | 0.007397 | 0.005216 | 0.002961 | 0.003898 |
| dg_d_2 | 0.005013 | 0.002877 | 0.003001 | 0.002407 | 0.004481 | 0.508747 | 0.413222 | 0.004478 | 0.032327 | 0.006355 | 0.007488 | 0.004283 | 0.002102 | 0.003218 |
| dg_d_3 | 0.003915 | 0.002676 | 0.002945 | 0.002558 | 0.005772 | 0.479360 | 0.446842 | 0.004949 | 0.026702 | 0.006624 | 0.008542 | 0.004052 | 0.002157 | 0.002908 |
| dg_v_1 | 0.003247 | 0.002842 | 0.003309 | 0.001613 | 0.010134 | 0.539566 | 0.347792 | 0.002481 | 0.063813 | 0.006122 | 0.008335 | 0.005785 | 0.002116 | 0.002846 |
| dg_v_2 | 0.004015 | 0.003188 | 0.003747 | 0.002137 | 0.010382 | 0.523644 | 0.362331 | 0.002693 | 0.056484 | 0.009367 | 0.008487 | 0.007403 | 0.002478 | 0.003644 |
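As a quick sanity check on a table like the one above, the following minimal sketch assumes each row holds the estimated cell-type fractions for one bulk sample. It uses plain Python and values copied from the `dg_d_1` and `dg_d_2` rows; the names `fractions` and `summarize` are hypothetical and are not part of omicverse.

```python
# Cell-type labels, in the column order of the table above.
cell_types = [
    "Astrocytes", "Cajal Retzius", "Cck-Tox", "Endothelial", "GABA",
    "Granule immature", "Granule mature", "Microglia", "Mossy",
    "Neuroblast", "OL", "OPC", "Radial Glia-like", "nIPC",
]

# Values copied from the dg_d_1 and dg_d_2 rows of the table above.
fractions = {
    "dg_d_1": [0.004780, 0.003839, 0.004187, 0.002460, 0.005536, 0.527208,
               0.393742, 0.005203, 0.028935, 0.004639, 0.007397, 0.005216,
               0.002961, 0.003898],
    "dg_d_2": [0.005013, 0.002877, 0.003001, 0.002407, 0.004481, 0.508747,
               0.413222, 0.004478, 0.032327, 0.006355, 0.007488, 0.004283,
               0.002102, 0.003218],
}


def summarize(values, labels, top_n=2):
    """Return the row total and the top_n (label, value) pairs, largest first."""
    ranked = sorted(zip(labels, values), key=lambda pair: pair[1], reverse=True)
    return sum(values), ranked[:top_n]


for sample, values in fractions.items():
    total, top = summarize(values, cell_types)
    print(f"{sample}: total={total:.4f}, top={top}")
```

Both rows sum to approximately 1, and the two granule cell populations account for most of each sample.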
Note
The default maximum number of epochs is 3500, but in practice Bulk2Single stops early once the model converges, so it rarely needs that many, especially for large datasets. (Set `patience` to control when early stopping is triggered.)
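To make the role of `patience` concrete, here is a minimal, generic sketch of patience-based early stopping. It is not Bulk2Single's internal implementation: the function name `stop_epoch`, the `min_delta` parameter, and the loss values are hypothetical, and the stopping rule (stop after `patience` consecutive epochs without improvement) is the common convention, assumed here for illustration.

```python
def stop_epoch(losses, patience, max_epochs=3500, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops after `patience` consecutive epochs in which the loss
    did not improve on the best value seen so far by more than `min_delta`,
    or when the losses or `max_epochs` run out, whichever comes first.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return min(len(losses), max_epochs)


# Hypothetical loss curve: improves for four epochs, then plateaus.
losses = [1.0, 0.8, 0.7, 0.69, 0.70, 0.71, 0.72, 0.73]
print(stop_epoch(losses, patience=3))
```

With this rule, a larger `patience` tolerates longer plateaus before stopping, at the cost of extra training time.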