{ "cells": [ { "cell_type": "markdown", "id": "77430c68-6439-4e72-9b6d-8dee371212f4", "metadata": { "scrolled": true }, "source": [ "# Bulk deconvolution with reference scRNA-seq\n", "\n", "Cell type deconvolution is a computational framework for inferring the composition of cell populations within a heterogeneous bulk tissue sample. Bulk deconvolution approaches can be divided into linear-regression-based methods, enrichment-based methods, non-linear deep-learning-based methods, and others.\n", "\n", "Here, we provide `BayesPrism` and `Scaden` through the class `omicverse.bulk.Deconvolution` to infer cell type compositions using scRNA-seq data as the reference. This makes it easy to run bulk deconvolution with omicverse in a Python environment. We combined `InstaPrism` and `pybayesprime` to accelerate the calculation.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "4f545721-dfa4-4f30-b051-7b430f059822", "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "id": "44a4a606-7d95-4429-be23-9c1518408613", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting plot initialization...\n", "Detecting GPU devices…\n", "Apple Silicon MPS detected\n", " • [MPS] Apple Silicon GPU - Metal Performance Shaders available\n", "\n", " ____ _ _ __ \n", " / __ \\____ ___ (_)___| | / /__ _____________ \n", " / / / / __ `__ \\/ / ___/ | / / _ \\/ ___/ ___/ _ \\ \n", "/ /_/ / / / / / / / /__ | |/ / __/ / (__ ) __/ \n", "\\____/_/ /_/ /_/_/\\___/ |___/\\___/_/ /____/\\___/ \n", "\n", "Version: 1.7.8rc1 Tutorials: https://omicverse.readthedocs.io/\n", "plot_set complete.\n", "\n" ] } ], "source": [ "import omicverse as ov\n", "ov.plot_set()" ] }, { "cell_type": "markdown", "id": "837a3e99-207f-410e-bb60-006c0e4f76c3", "metadata": {}, "source": [ "## 1. 
Data preparation\n", "\n", "To demonstrate the accuracy of the integrated BayesPrism and Scaden tools, this tutorial uses bulk RNA-seq data from COVID-19 patients.\n", "\n", "- Bulk RNA-seq: Bulk RNA-seq data for COVID-19 are available under accession GSE152418, which includes 17 healthy controls, 16 COVID-19 patients, and 1 convalescent COVID-19 patient.\n", "- scRNA-seq: The scRNA-seq reference can be obtained directly from [cellxgene](https://cellxgene.cziscience.com/collections/a72afd53-ab92-4511-88da-252fb0e26b9a); it includes `healthy` and `COVID` groups, so the cell type compositions of both groups are known.\n", "\n", "Alternatively, you can download the preprocessed data directly from figshare ([bulk RNA-seq](https://figshare.com/ndownloader/files/59192924) and [scRNA-seq](https://figshare.com/ndownloader/files/59192927)).\n" ] }, { "cell_type": "code", "execution_count": 123, "id": "9885f998-eb6f-4c9a-803f-c8249431ce78", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[95mLoading COVID-19 PBMC bulk data\u001b[0m\n", "\u001b[94mDownloading data to ./data/COVID_PBMC_bulk.h5ad\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[92mDownloading\u001b[0m: 100%|\u001b[32m██████████\u001b[0m| 4.50M/4.50M [00:02<00:00, 1.51MB/s]\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[92mDownload completed\u001b[0m\n", "\u001b[96m Loading data from ./data/COVID_PBMC_bulk.h5ad\u001b[0m\n", "\u001b[92mSuccessfully loaded: 34 cells × 60683 genes\u001b[0m\n" ] }, { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 34 × 60683\n", " obs: 'days_post_symptom_onset', 'gender', 'disease_state', 'severity', 'location', 'source'" ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" 
} ], "source": [ "bulk_ad=ov.datasets.decov_bulk_covid_bulk()\n", "bulk_ad" ] }, { "cell_type": "markdown", "id": "e2809964-bd1a-4604-b31d-b7d4a92eadb9", "metadata": {}, "source": [ "
Note
\n", "\n", " \"The `obs` annotation columns can be left empty. If you have a bulk RNA-seq count matrix, you can use `ov.AnnData(count)` to convert it to AnnData format. Just make sure that the `obs` index holds the sample names and the `var` index holds the gene names.\"\n", "
\n", "Note
\n", "\n", " \"Note that all data used here are raw counts. If your single-cell data have been log1p-transformed, use the `omicverse.pp.recover_counts` function to restore the raw count matrix.\"\n", "
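The idea behind restoring counts can be sketched in plain Python. This is only an illustration of the arithmetic, not the implementation of `omicverse.pp.recover_counts`: it assumes the data were scaled to a known total (the hypothetical `target_sum`) and then transformed with the natural-log log1p, and that the original library size of the cell (the hypothetical `library_size`) is known. In real data these values usually have to be estimated.

```python
import math

def log1p_normalize(counts, target_sum=1e4):
    # Scale one cell to target_sum total counts, then apply log(1 + x).
    total = sum(counts)
    return [math.log1p(c / total * target_sum) for c in counts]

def undo_log1p_normalization(log_values, library_size, target_sum=1e4):
    # Hypothetical helper: invert log1p with expm1, then undo the scaling.
    # Rounding removes the tiny floating-point error of the round trip.
    return [round(math.expm1(v) / target_sum * library_size) for v in log_values]

raw = [0, 3, 17, 980]                      # toy raw counts for one cell
logged = log1p_normalize(raw)
recovered = undo_log1p_normalization(logged, library_size=sum(raw))
print(recovered)
```

The round trip only works because the sketch knows both the scaling target and the library size; without them the inverse is not unique, which is why a dedicated recovery function is needed for real datasets.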
\n", "Note
\n", "\n", " \"The `celltype_key` and `cellstate_key` parameters can be set to the same key. `cellstate_key` is only used by the BayesPrism method; the other methods do not involve minor cell types.\"\n", "
\n", "Note
\n", "\n", " Note that setting the `fast_mode` parameter to True actually invokes InstaPrism rather than BayesPrism. The results of the two are very close, with a similarity as high as 0.99. To run BayesPrism itself, simply set `fast_mode=False`.\n", "
\n", "| Sample | B | CD4 T | CD8 T | CD14 Monocyte | CD16 Monocyte | DC | Granulocyte | NK | PB | Platelet | RBC | gd T | pDC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S145_nCOV001_C | 0.007152 | 0.045643 | 0.028839 | 0.350588 | 0.358540 | 0.063780 | 0.005991 | 0.102774 | 0.003153 | 0.019414 | 2.157114e-05 | 0.000909 | 0.013197 |
| S147_nCoV001EUHM-Draw-1 | 0.042290 | 0.064573 | 0.145169 | 0.207204 | 0.209637 | 0.064921 | 0.014626 | 0.176877 | 0.046780 | 0.014637 | 1.785132e-10 | 0.003598 | 0.009689 |
| S149_nCoV002EUHM-Draw-2 | 0.044652 | 0.002136 | 0.026601 | 0.435803 | 0.268785 | 0.032620 | 0.017837 | 0.083378 | 0.048856 | 0.034941 | 1.004674e-09 | 0.000243 | 0.004149 |
| S150_nCoV003EUHM-Draw-1 | 0.016705 | 0.004011 | 0.105450 | 0.345644 | 0.154792 | 0.024709 | 0.012880 | 0.160400 | 0.152966 | 0.018139 | 5.689293e-04 | 0.000129 | 0.003605 |
| S151_nCoV004EUHM-Draw-1 | 0.029047 | 0.000164 | 0.037455 | 0.610666 | 0.127016 | 0.009614 | 0.009160 | 0.023490 | 0.087138 | 0.062375 | 2.029203e-03 | 0.000016 | 0.001830 | \n", "
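A quick sanity check on fractions like those in the table is that each sample row sums to about 1. The sketch below copies the first two rows of the table by hand into plain Python lists; the variable names are illustrative and not part of omicverse.

```python
cell_types = ['B', 'CD4 T', 'CD8 T', 'CD14 Monocyte', 'CD16 Monocyte', 'DC',
              'Granulocyte', 'NK', 'PB', 'Platelet', 'RBC', 'gd T', 'pDC']

# First two rows of the fraction table, copied as printed (rounded values).
fractions = {
    'S145_nCOV001_C': [0.007152, 0.045643, 0.028839, 0.350588, 0.358540,
                       0.063780, 0.005991, 0.102774, 0.003153, 0.019414,
                       2.157114e-05, 0.000909, 0.013197],
    'S147_nCoV001EUHM-Draw-1': [0.042290, 0.064573, 0.145169, 0.207204,
                                0.209637, 0.064921, 0.014626, 0.176877,
                                0.046780, 0.014637, 1.785132e-10, 0.003598,
                                0.009689],
}

summary = {}
for sample, values in fractions.items():
    total = sum(values)                              # should be close to 1
    top = cell_types[values.index(max(values))]      # largest fraction
    summary[sample] = (total, top)
    print(f'{sample}: sum={total:.4f}, largest fraction={top}')
```

Because the printed values are rounded to six decimals, the sums match 1 only up to a small tolerance.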
Note
\n", "\n", " Because Scaden is a deep-learning-based approach, we need to choose the device on which torch runs: NVIDIA CUDA, Apple MPS, or the CPU.\n", "
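One way to choose among these devices is sketched below. The helper name `pick_torch_device` is hypothetical, and how the resulting string is passed to omicverse is not shown in this section, so treat this as a generic PyTorch pattern rather than the tutorial's own code. It falls back to the CPU when torch is not installed.

```python
def pick_torch_device():
    # Hypothetical helper: prefer CUDA, then Apple MPS, otherwise the CPU.
    try:
        import torch
    except ImportError:
        return 'cpu'          # torch is missing, so only the CPU label makes sense
    if torch.cuda.is_available():
        return 'cuda'
    # Older torch builds may not ship the MPS backend, so look it up defensively.
    mps = getattr(torch.backends, 'mps', None)
    if mps is not None and mps.is_available():
        return 'mps'
    return 'cpu'

device = pick_torch_device()
print(device)
```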
\n", "Note
\n", "\n", " Although the original implementation of Scaden supports TPM, RPKM, and similar normalized units, we strongly recommend using raw count data to ensure consistency across algorithms.\n", "
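To see why the choice of unit matters, the sketch below converts toy counts to TPM using the standard definition (counts divided by gene length, then rescaled to one million). The counts and gene lengths are invented for illustration, and the function is not part of Scaden or omicverse.

```python
def counts_to_tpm(counts, lengths_kb):
    # Reads per kilobase for each gene, then rescale so the sample sums to 1e6.
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]

counts = [100, 100, 800]        # toy raw counts for three genes
lengths_kb = [1.0, 4.0, 2.0]    # toy gene lengths in kilobases
tpm = counts_to_tpm(counts, lengths_kb)

for c, t in zip(counts, tpm):
    print(f'count={c:4d}  TPM={t:10.1f}')
```

The first two genes have the same raw count but different TPM values because of their lengths, so a model given counts in one dataset and TPM in another would see systematically different inputs.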
\n", "