{ "cells": [ { "cell_type": "markdown", "id": "b44a8308-9153-4b31-bdf8-5e94134f5ad0", "metadata": {}, "source": [ "# Preprocessing scRNA-seq data with omicverse [CPU-GPU-mixed]\n", "\n", "The count table, a numeric matrix of genes × cells, is the basic input data structure in the analysis of single-cell RNA-sequencing data. A common preprocessing step is to adjust the counts for variable sampling efficiency and to transform them so that the variance is similar across the dynamic range.\n", "\n", "Choosing suitable methods to preprocess scRNA-seq data is important. Here, we introduce several preprocessing steps that make downstream analysis easier.\n", "\n", "Users can compare this tutorial with the [scanpy tutorial](https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html) to learn how to use omicverse well.\n", "\n", "Colab_Reproducibility: https://colab.research.google.com/drive/1DXLSls_ppgJmAaZTUvqazNC_E7EDCxUe?usp=sharing" ] }, { "cell_type": "code", "execution_count": 1, "id": "c2d6c1db-2fc7-4c8e-abd9-af9c126cbca3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔬 Starting plot initialization...\n", "Using already downloaded Arial font from: /tmp/omicverse_arial.ttf\n", "Registered as: Arial\n", "🧬 Detecting GPU devices…\n", "NVIDIA CUDA GPUs detected: 1\n", " • [CUDA 0] NVIDIA L40S\n", " Memory: 44.5 GB | Compute: 8.9\n", "\n", " ____ _ _ __ \n", " / __ \\____ ___ (_)___| | / /__ _____________ \n", " / / / / __ `__ \\/ / ___/ | / / _ \\/ ___/ ___/ _ \\ \n", "/ /_/ / / / / / / / /__ | |/ / __/ / (__ ) __/ \n", "\\____/_/ /_/ /_/_/\\___/ |___/\\___/_/ /____/\\___/ \n", "\n", "Version: 1.7.9 Tutorials: https://omicverse.readthedocs.io/\n", "plot_set complete.\n", "\n" ] } ], "source": [ "import scanpy as sc\n", "import omicverse as ov\n", "ov.plot_set(font_path='Arial')\n", "\n", "# Enable auto-reload for development\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", 
"id": "ba8de1ce-2b12-4210-b86f-15b4d782c8d4", "metadata": {}, "source": [ "
Note
\n", "\n", " OmicVerse versions later than 1.7.0 support CPU-GPU mixed acceleration without requiring `rapids_singlecell` as a dependency, so you can enjoy faster single-cell analysis!\n", "\n", "
\n", "Note
\n", "\n", " If your `omicverse` version is later than `1.6.4`, `doublets_method` can be set to either `scrublet` or `sccomposite`.\n", "
\n", "Note
\n", "\n", " If your `omicverse` version is earlier than `1.4.13`, the mode can only be set to `scanpy` or `pearson`.\n", "
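The three version notes above can be condensed into a small lookup. The sketch below is only an illustration: `parse_version` and `available_options` are hypothetical helpers, not part of omicverse, and they assume a purely numeric dotted version string such as '1.7.9'. Where the notes say nothing about a version range, the helper returns `None` rather than guessing.

```python
def parse_version(text):
    # Turn '1.7.9' into (1, 7, 9); assumes purely numeric dotted versions.
    return tuple(int(part) for part in text.split('.'))


def available_options(version_text):
    # Hypothetical helper that restates the three notes for one version.
    v = parse_version(version_text)
    return {
        # Note 1: CPU-GPU mixed acceleration needs a version later than 1.7.0.
        'cpu_gpu_mixed': v > (1, 7, 0),
        # Note 2: the doublets_method choices are stated only for versions
        # later than 1.6.4; None means the notes do not cover this version.
        'doublets_methods': ['scrublet', 'sccomposite'] if v > (1, 6, 4) else None,
        # Note 3: before 1.4.13 the mode is limited to 'scanpy' or 'pearson'.
        'mode_limited_to_scanpy_or_pearson': v < (1, 4, 13),
    }


print(available_options('1.7.9'))
```

In a live session the installed version can usually be read from `ov.__version__`; treat that attribute as an assumption and check it in your environment.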
\n", "
|  | AAACATACAACCAC-1 | AAACATTGAGCTAC-1 | AAACATTGATCAGC-1 | AAACCGTGCTTCCG-1 | AAACCGTGTATGCG-1 | AAACGCACTGGTAC-1 | AAACGCTGACCAGT-1 | AAACGCTGGTTCTT-1 | AAACGCTGTAGCCA-1 | AAACGCTGTTTCTG-1 | ... | TTTCAGTGTCACGA-1 | TTTCAGTGTCTATC-1 | TTTCAGTGTGCAGT-1 | TTTCCAGAGGTGAG-1 | TTTCGAACACCTGA-1 | TTTCGAACTCTCAT-1 | TTTCTACTGAGGCA-1 | TTTCTACTTCCTCG-1 | TTTGCATGAGAGGC-1 | TTTGCATGCCTCAC-1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CD3D | 6.718757 | 0.0 | 7.371373 | 0.0 | 0.0 | 5.447429 | 6.132899 | 6.499361 | 5.974209 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 6.532146 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.224622 |

1 rows × 2661 columns
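The introduction describes adjusting counts for sampling efficiency and then transforming them so that the variance is more even; the normalized CD3D values in the table above are the kind of output such a step produces. A minimal NumPy sketch of one common choice, library-size scaling followed by `log1p` (the shifted logarithm), is given here. The function name, the toy matrix and `target_sum=1e4` are illustrative assumptions, not omicverse's API or its defaults, and the matrix is laid out as cells × genes, the orientation AnnData uses for `adata.X`.

```python
import numpy as np


def shifted_log_normalize(counts, target_sum=1e4):
    # counts: cells x genes array of raw counts (assumed layout).
    counts = np.asarray(counts, dtype=float)
    # Per-cell library size, kept as a column for broadcasting.
    totals = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for empty cells; their rows stay all zero.
    totals[totals == 0] = 1.0
    # Scale every cell to the same total, then compress the dynamic range.
    scaled = counts / totals * target_sum
    return np.log1p(scaled)


# Toy data: three cells, three genes; the last cell is empty.
toy = np.array([[4, 0, 6],
                [1, 1, 8],
                [0, 0, 0]])
norm = shifted_log_normalize(toy)
print(norm.round(3))
```

Undoing the transform with `np.expm1` returns the scaled counts, so each non-empty cell sums to `target_sum` again, which is a quick sanity check on any normalized matrix.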
\n", "
|  | AAACATACAACCAC-1 | AAACATTGAGCTAC-1 | AAACATTGATCAGC-1 | AAACCGTGCTTCCG-1 | AAACCGTGTATGCG-1 | AAACGCACTGGTAC-1 | AAACGCTGACCAGT-1 | AAACGCTGGTTCTT-1 | AAACGCTGTAGCCA-1 | AAACGCTGTTTCTG-1 | ... | TTTCAGTGTCACGA-1 | TTTCAGTGTCTATC-1 | TTTCAGTGTGCAGT-1 | TTTCCAGAGGTGAG-1 | TTTCGAACACCTGA-1 | TTTCGAACTCTCAT-1 | TTTCTACTGAGGCA-1 | TTTCTACTTCCTCG-1 | TTTGCATGAGAGGC-1 | TTTGCATGCCTCAC-1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CD3D | 4.0 | 0.0 | 10.0 | 0.0 | 0.0 | 1.0 | 2.0 | 3.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |

1 rows × 2667 columns
\n", "
|  | AAACATACAACCAC-1 | AAACATTGAGCTAC-1 | AAACATTGATCAGC-1 | AAACCGTGCTTCCG-1 | AAACCGTGTATGCG-1 | AAACGCACTGGTAC-1 | AAACGCTGACCAGT-1 | AAACGCTGGTTCTT-1 | AAACGCTGTAGCCA-1 | AAACGCTGTTTCTG-1 | ... | TTTCAGTGTCACGA-1 | TTTCAGTGTCTATC-1 | TTTCAGTGTGCAGT-1 | TTTCCAGAGGTGAG-1 | TTTCGAACACCTGA-1 | TTTCGAACTCTCAT-1 | TTTCTACTGAGGCA-1 | TTTCTACTTCCTCG-1 | TTTGCATGAGAGGC-1 | TTTGCATGCCTCAC-1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CD3D | 4 | 0 | 9 | 0 | 0 | 1 | 2 | 2 | 1 | 0 | ... | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |

1 rows × 2654 columns
\n", "
|  | group | rank | names | scores | logfoldchanges | pvals | pvals_adj | pct_group | pct_rest |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | LYZ | 30.139215 | 8.308955 | 1.485183e-199 | 2.036780e-195 | 1.000000 | 0.536060 |
| 1 | 0 | 2 | S100A9 | 29.859339 | 10.341062 | 6.640705e-196 | 4.553531e-192 | 0.997416 | 0.245822 |
| 2 | 0 | 3 | S100A8 | 28.556858 | 10.218678 | 2.308892e-179 | 1.055472e-175 | 0.968992 | 0.156552 |
| 3 | 0 | 4 | FCN1 | 28.287178 | 8.907052 | 4.969369e-176 | 1.703748e-172 | 0.976744 | 0.176781 |
| 4 | 0 | 5 | TYROBP | 28.184223 | 8.633690 | 9.127943e-175 | 2.503612e-171 | 1.000000 | 0.292436 |
| 5 | 0 | 6 | CST3 | 28.036358 | 8.869659 | 5.858849e-173 | 1.339138e-169 | 1.000000 | 0.293316 |
| 6 | 0 | 7 | LGALS2 | 27.883018 | 9.479619 | 4.287074e-171 | 8.398990e-168 | 0.937984 | 0.088830 |
| 7 | 0 | 8 | S100A6 | 27.122936 | 4.239765 | 5.282608e-162 | 8.049520e-159 | 1.000000 | 0.743184 |
| 8 | 0 | 9 | GSTP1 | 26.817543 | 6.501383 | 2.017550e-158 | 2.766869e-155 | 0.976744 | 0.367194 |
| 9 | 0 | 10 | FTL | 26.692188 | 3.368342 | 5.800019e-157 | 7.231042e-154 | 1.000000 | 0.985488 |
\n", "
|  | group | rank | names | scores | logfoldchanges | pvals | pvals_adj | pct_group | pct_rest |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | LYZ | 30.139215 | 8.308955 | 1.485183e-199 | 2.036780e-195 | 1.000000 | 0.536060 |
| 1 | 0 | 2 | S100A9 | 29.859339 | 10.341062 | 6.640705e-196 | 4.553531e-192 | 0.997416 | 0.245822 |
| 2 | 0 | 3 | S100A8 | 28.556858 | 10.218678 | 2.308892e-179 | 1.055472e-175 | 0.968992 | 0.156552 |
| 3 | 0 | 4 | FCN1 | 28.287178 | 8.907052 | 4.969369e-176 | 1.703748e-172 | 0.976744 | 0.176781 |
| 4 | 0 | 5 | TYROBP | 28.184223 | 8.633690 | 9.127943e-175 | 2.503612e-171 | 1.000000 | 0.292436 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 9 | 16 | RPS27 | 3.653990 | 0.227837 | 2.581968e-04 | 1.744291e-02 | 0.984252 | 0.994475 |
| 196 | 9 | 17 | MT-CYB | 3.345087 | -0.051800 | 8.225686e-04 | 4.283594e-02 | 0.874016 | 0.935280 |
| 197 | 9 | 18 | RPS3 | 3.297153 | 0.106617 | 9.767012e-04 | 4.870720e-02 | 0.984252 | 0.995264 |
| 198 | 9 | 19 | RPS16 | 3.260641 | 0.137249 | 1.111605e-03 | 5.405871e-02 | 0.976378 | 0.985793 |
| 199 | 9 | 20 | RPL31 | 3.222946 | 0.402909 | 1.268796e-03 | 5.918458e-02 | 0.984252 | 0.971192 |

200 rows × 9 columns
\n", "