Reference-free automated single-cell cell type annotation¶

By 2025, algorithms for automated cell type annotation have proliferated. Omicverse aims to reduce the discrepancies between these algorithms, so we group automated annotation methods into two categories: those that use a single-cell reference and those that do not. Each category has its own advantages and disadvantages. This tutorial covers usage only and does not compare the algorithms.

This chapter focuses on approaches that need no single-cell reference, meaning cell type annotation can be performed without downloading existing single-cell datasets.

In [1]:

import scanpy as sc
import omicverse as ov
ov.plot_set(font_path='Arial')

# Enable auto-reload for development
%load_ext autoreload
%autoreload 2

🔬 Starting plot initialization...
Using already downloaded Arial font from: /tmp/omicverse_arial.ttf

/home/groups/xiaojie/steorra/env/omicverse/lib/python3.10/site-packages/IPython/core/pylabtools.py:77: DeprecationWarning: backend2gui is deprecated since IPython 8.24, backends are managed in matplotlib and can be externally registered.
  warnings.warn(

Registered as: Arial
🧬 Detecting GPU devices…
✅ NVIDIA CUDA GPUs detected: 1
    • [CUDA 0] NVIDIA H100 80GB HBM3
      Memory: 79.1 GB | Compute: 9.0

   ____            _     _    __                  
  / __ \____ ___  (_)___| |  / /__  _____________ 
 / / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \ 
/ /_/ / / / / / / / /__ | |/ /  __/ /  (__  )  __/ 
\____/_/ /_/ /_/_/\___/ |___/\___/_/  /____/\___/                                              

🔖 Version: 1.7.8rc2   📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.

Data preprocess¶

Load Dataset¶

To demonstrate reference-free cell type annotation quickly, we use the classic pbmc3k dataset. You can load it directly with omicverse.datasets.pbmc3k or download it from https://falexwolf.de/data/pbmc3k_raw.h5ad.

In [2]:

adata=ov.datasets.pbmc3k()
adata

 Loading PBMC 3k dataset (raw)
🔍 Downloading data to ./data/pbmc3k_raw.h5ad

Downloading: 100%|█████████▉| 5.85M/5.86M [00:02<00:00, 2.90MB/s]

✅ Download completed
 Loading data from ./data/pbmc3k_raw.h5ad
✅ Successfully loaded: 2700 cells × 32738 genes

Out[2]:

AnnData object with n_obs × n_vars = 2700 × 32738
    var: 'gene_ids'

Lazy Preprocess¶

Because this is a single dataset with no batch effects, we apply the default omicverse preprocessing workflow directly.

In [3]:

# Quality control
adata=ov.pp.qc(adata,
              tresh={'mito_perc': 0.05, 'nUMIs': 500, 'detected_genes': 250})
# Normalize and select highly variable genes (HVGs)
adata=ov.pp.preprocess(adata,mode='shiftlog|pearson',n_HVGs=2000,target_sum=1e4)

# Keep all genes in adata.raw, then subset to the HVGs
adata.raw = adata
adata = adata[:, adata.var.highly_variable_features]

# Scale adata.X (stored in the 'scaled' layer)
ov.pp.scale(adata)

# Dimensionality reduction (PCA on the scaled layer)
ov.pp.pca(adata,layer='scaled',n_pcs=50)

# Neighbourhood graph construction
ov.pp.neighbors(adata, n_neighbors=15, n_pcs=50,
               use_rep='scaled|original|X_pca')

# Leiden clustering
ov.pp.leiden(adata)

# Dimensionality reduction for visualization (UMAP)
ov.pp.umap(adata)
adata

🖥️ Using CPU mode for QC...

📊 Step 1: Calculating QC Metrics

   ✓ Gene Family Detection:
   ┌──────────────────────────────┬────────────────────┬────────────────────┐
   │ Gene Family                  │ Genes Found        │ Detection Method   │
   ├──────────────────────────────┼────────────────────┼────────────────────┤
   │ Mitochondrial                │ 13                 │ Auto (MT-)         │
   ├──────────────────────────────┼────────────────────┼────────────────────┤
   │ Ribosomal                    │ 106                │ Auto (RPS/RPL)     │
   ├──────────────────────────────┼────────────────────┼────────────────────┤
   │ Hemoglobin                   │ 13                 │ Auto (regex)       │
   └──────────────────────────────┴────────────────────┴────────────────────┘

   ✓ QC Metrics Summary:
   ┌─────────────────────────┬────────────────────┬─────────────────────────┐
   │ Metric                  │ Mean               │ Range (Min - Max)       │
   ├─────────────────────────┼────────────────────┼─────────────────────────┤
   │ nUMIs                   │ 2367               │ 548 - 15844             │
   ├─────────────────────────┼────────────────────┼─────────────────────────┤
   │ Detected Genes          │ 847                │ 212 - 3422              │
   ├─────────────────────────┼────────────────────┼─────────────────────────┤
   │ Mitochondrial %         │ 2.2%               │ 0.0% - 22.6%            │
   ├─────────────────────────┼────────────────────┼─────────────────────────┤
   │ Ribosomal %             │ 34.9%              │ 1.1% - 59.4%            │
   ├─────────────────────────┼────────────────────┼─────────────────────────┤
   │ Hemoglobin %            │ 0.0%               │ 0.0% - 1.4%             │
   └─────────────────────────┴────────────────────┴─────────────────────────┘

   📈 Original cell count: 2,700

🔧 Step 2: Quality Filtering (SEURAT)
   Thresholds: mito≤0.05, nUMIs≥500, genes≥250
   📊 Seurat Filter Results:
     • nUMIs filter (≥500): 0 cells failed (0.0%)
     • Genes filter (≥250): 3 cells failed (0.1%)
     • Mitochondrial filter (≤0.05): 57 cells failed (2.1%)
   ✓ Filters applied successfully
   ✓ Combined QC filters: 60 cells removed (2.2%)

🎯 Step 3: Final Filtering
   Parameters: min_genes=200, min_cells=3
   Ratios: max_genes_ratio=1, max_cells_ratio=1
filtered out 19041 genes that are detected in less than 3 cells
   ✓ Final filtering: 0 cells, 19,041 genes removed

🔍 Step 4: Doublet Detection
   ⚠️  Note: 'scrublet' detection is too old and may not work properly
   💡 Consider using 'doublets_method=sccomposite' for better results
   🔍 Running scrublet doublet detection...

🔍 Running Scrublet Doublet Detection:
   Mode: cpu
   Computing doublet prediction using Scrublet algorithm
   🔍 Filtering genes and cells...
   🔍 Normalizing data and selecting highly variable genes...

🔍 Count Normalization:
   Target sum: median
   Exclude highly expressed: False

✅ Count Normalization Completed Successfully!
   ✓ Processed: 2,640 cells × 13,697 genes
   ✓ Runtime: 0.00s

🔍 Highly Variable Genes Selection:
   Method: seurat
   ⚠️ Gene indices [7846] fell into a single bin: normalized dispersion set to 1
   💡 Consider decreasing `n_bins` to avoid this effect

✅ HVG Selection Completed Successfully!
   ✓ Selected: 1,738 highly variable genes out of 13,697 total (12.7%)
   ✓ Results added to AnnData object:
     • 'highly_variable': Boolean vector (adata.var)
     • 'means': Float vector (adata.var)
     • 'dispersions': Float vector (adata.var)
     • 'dispersions_norm': Float vector (adata.var)
   🔍 Simulating synthetic doublets...
   🔍 Normalizing observed and simulated data...

🔍 Count Normalization:
   Target sum: 1000000.0
   Exclude highly expressed: False

✅ Count Normalization Completed Successfully!
   ✓ Processed: 2,640 cells × 1,738 genes
   ✓ Runtime: 0.00s

🔍 Count Normalization:
   Target sum: 1000000.0
   Exclude highly expressed: False

✅ Count Normalization Completed Successfully!
   ✓ Processed: 5,280 cells × 1,738 genes
   ✓ Runtime: 0.01s
   🔍 Embedding transcriptomes using PCA...
   🔍 Calculating doublet scores...
    using data matrix X directly
   🔍 Calling doublets with threshold detection...
   📊 Automatic threshold: 0.239
   📈 Detected doublet rate: 1.6%
   🔍 Detectable doublet fraction: 39.5%
   📊 Overall doublet rate comparison:
     • Expected: 5.0%
     • Estimated: 4.0%

✅ Scrublet Analysis Completed Successfully!
   ✓ Results added to AnnData object:
     • 'doublet_score': Doublet scores (adata.obs)
     • 'predicted_doublet': Boolean predictions (adata.obs)
     • 'scrublet': Parameters and metadata (adata.uns)
   ✓ Scrublet completed: 42 doublets removed (1.6%)
🔍 [2025-11-03 14:00:25] Running preprocessing in 'cpu' mode...
Begin robust gene identification
    After filtration, 13697/13697 genes are kept.
    Among 13697 genes, 13697 genes are robust.
✅ Robust gene identification completed successfully.
Begin size normalization: shiftlog and HVGs selection pearson

🔍 Count Normalization:
   Target sum: 10000.0
   Exclude highly expressed: True
   Max fraction threshold: 0.2
   ⚠️ Excluding 0 highly-expressed genes from normalization computation
   Excluded genes: []

✅ Count Normalization Completed Successfully!
   ✓ Processed: 2,598 cells × 13,697 genes
   ✓ Runtime: 0.11s

🔍 Highly Variable Genes Selection (Experimental):
   Method: pearson_residuals
   Target genes: 2,000
   Theta (overdispersion): 100

✅ Experimental HVG Selection Completed Successfully!
   ✓ Selected: 2,000 highly variable genes out of 13,697 total (14.6%)
   ✓ Results added to AnnData object:
     • 'highly_variable': Boolean vector (adata.var)
     • 'highly_variable_rank': Float vector (adata.var)
     • 'highly_variable_nbatches': Int vector (adata.var)
     • 'highly_variable_intersection': Boolean vector (adata.var)
     • 'means': Float vector (adata.var)
     • 'variances': Float vector (adata.var)
     • 'residual_variances': Float vector (adata.var)
    Time to analyze data in cpu: 1.18 seconds.
✅ Preprocessing completed successfully.
    Added:
        'highly_variable_features', boolean vector (adata.var)
        'means', float vector (adata.var)
        'variances', float vector (adata.var)
        'residual_variances', float vector (adata.var)
        'counts', raw counts layer (adata.layers)
    End of size normalization: shiftlog and HVGs selection pearson
computing PCA🔍
    with n_comps=50
   🖥️ Using sklearn PCA for CPU computation
   🖥️ sklearn PCA backend: CPU computation
    finished✅ (0:00:00)
🖥️ Using Scanpy CPU to calculate neighbors...

🔍 K-Nearest Neighbors Graph Construction:
   Mode: cpu
   Neighbors: 15
   Method: umap
   Metric: euclidean
   Representation: scaled|original|X_pca
   PCs used: 50
computing neighbors
   🔍 Computing neighbor distances...
   🔍 Computing connectivity matrix...
   💡 Using UMAP-style connectivity
   ✓ Graph is fully connected

✅ KNN Graph Construction Completed Successfully!
   ✓ Processed: 2,598 cells with 15 neighbors each
   ✓ Results added to AnnData object:
     • 'neighbors': Neighbors metadata (adata.uns)
     • 'distances': Distance matrix (adata.obsp)
     • 'connectivities': Connectivity matrix (adata.obsp)
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:02)
running Leiden clustering
    finished: found 9 clusters and added
    'leiden', the cluster labels (adata.obs, categorical) (0:00:00)
🔍 [2025-11-03 14:00:30] Running UMAP in 'cpu' mode...
🖥️ Using Scanpy CPU UMAP...

🔍 UMAP Dimensionality Reduction:
   Mode: cpu
   Method: umap
   Components: 2
   Min distance: 0.5
{'n_neighbors': 15, 'method': 'umap', 'random_state': 0, 'metric': 'euclidean', 'use_rep': 'scaled|original|X_pca', 'n_pcs': 50}
   🔍 Computing UMAP parameters...
   🔍 Computing UMAP embedding (classic method)...

✅ UMAP Dimensionality Reduction Completed Successfully!
   ✓ Embedding shape: 2,598 cells × 2 dimensions
   ✓ Results added to AnnData object:
     • 'X_umap': UMAP coordinates (adata.obsm)
     • 'umap': UMAP parameters (adata.uns)
✅ UMAP completed successfully.

Out[3]:

AnnData object with n_obs × n_vars = 2598 × 2000
    obs: 'nUMIs', 'mito_perc', 'ribo_perc', 'hb_perc', 'detected_genes', 'cell_complexity', 'passing_mt', 'passing_nUMIs', 'passing_ngenes', 'n_genes', 'doublet_score', 'predicted_doublet', 'leiden'
    var: 'gene_ids', 'mt', 'ribo', 'hb', 'n_cells', 'percent_cells', 'robust', 'means', 'variances', 'residual_variances', 'highly_variable_rank', 'highly_variable_features'
    uns: 'scrublet', 'status', 'status_args', 'REFERENCE_MANU', 'log1p', 'hvg', 'pca', 'scaled|original|pca_var_ratios', 'scaled|original|cum_sum_eigenvalues', 'neighbors', 'leiden', 'umap'
    obsm: 'X_pca', 'scaled|original|X_pca', 'X_umap'
    varm: 'PCs', 'scaled|original|pca_loadings'
    layers: 'counts', 'scaled'
    obsp: 'distances', 'connectivities'
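
Judging from the log above, the `shiftlog` part of `mode='shiftlog|pearson'` rescales each cell to `target_sum` counts and then applies a log transform. The following is a minimal pure-Python sketch of that arithmetic on made-up counts. The helper name `shiftlog` is hypothetical and this is not omicverse's implementation.

```python
import math

def shiftlog(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p.

    Hypothetical helper for illustration; not part of omicverse.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Cells with zero total counts are left as zeros to avoid dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized

# Toy matrix: 2 cells x 3 genes (made-up counts, not pbmc3k values).
toy_counts = [[1, 3, 6], [0, 5, 5]]
toy_norm = shiftlog(toy_counts)

# Undoing the log transform recovers rows that sum to target_sum.
row_sums = [sum(math.expm1(v) for v in row) for row in toy_norm]
print(row_sums)
```

Zero counts stay at zero after the transform, which is why this normalization preserves sparsity.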

In [5]:

ov.pl.umap(
    adata,
    color='leiden'
)

Automated Annotation¶

We have unified all automatic annotation algorithms into the omicverse.single.Annotation class.

In [7]:

obj=ov.single.Annotation(adata)

Celltypist Automated Annotation¶

Here, we introduce the first algorithm, CellTypist, published in Cell and Science and integrated into the automatic annotation module of omicverse. To help select the most suitable pre-trained model, the query step uses an LLM agent.

In [8]:

res=obj.query_reference(
    source='celltypist',
    data_desc='pbmc of human',
    llm_model='gpt-5-mini',
    llm_api_key='sk-*',
    llm_provider='openai',
    llm_base_url='https://api.openai.com/v1',
)
res.head()

CellTypist model table saved to self.celltypist_models_df
✓ LLM-selected CellTypist models:
  - Immune_All_Low.pkl: Immune_All_Low.pkl
  - Immune_All_High.pkl: Immune_All_High.pkl
  - Healthy_COVID19_PBMC.pkl: Healthy_COVID19_PBMC.pkl
  - Adult_COVID19_PBMC.pkl: Adult_COVID19_PBMC.pkl
  - PaediatricAdult_COVID19_PBMC.pkl: PaediatricAdult_COVID19_PBMC.pkl

Out[8]:

	model	description	version	No_celltypes	source	date	default	llm_reason
0	Immune_All_Low.pkl	immune sub-populations combined from 20 tissue...	v2	98	https://doi.org/10.1126/science.abl5197	2022-07-16 00:20:42.927778	True	High-resolution immune reference (98 immune su...
1	Immune_All_High.pkl	immune populations combined from 20 tissues of...	v2	32	https://doi.org/10.1126/science.abl5197	2022-07-16 08:53:00.959521	NaN	Compact immune reference (32 broad immune popu...
2	Healthy_COVID19_PBMC.pkl	peripheral blood mononuclear cell types from h...	v1	51	https://doi.org/10.1038/s41591-021-01329-2	2022-03-10 05:08:08.224597	NaN	PBMC-specific reference derived from healthy a...
3	Adult_COVID19_PBMC.pkl	peripheral blood mononuclear cell types from C...	v1	20	https://doi.org/10.1038/s41591-020-0944-y	2024-06-24 19:37:48.634397	NaN	PBMC reference from adult human donors (COVID-...
4	PaediatricAdult_COVID19_PBMC.pkl	peripheral blood mononuclear cell types of pae...	v1	42	https://doi.org/10.1038/s41586-021-04345-x	2025-10-15 00:51:41.857714	NaN	PBMC reference spanning pediatric and adult hu...

Based on the LLM's recommendation, Immune_All_Low.pkl is the model best suited to our data. We then use the download_reference_pkl function to download it.

In [9]:

!pwd

/scratch/users/steorra/analysis/omic_test

In [12]:

obj.download_reference_pkl(
    'Immune_All_Low.pkl',
    save_path="/scratch/users/steorra/analysis/omic_test/models/Immune_All_Low.pkl",
    #force_download=True
)

🔍 Downloading data to /scratch/users/steorra/analysis/omic_test/models/Immune_All_Low.pkl

Downloading: 100%|█████████▉| 2.82M/2.82M [00:01<00:00, 1.90MB/s]

✅ Download completed
https://celltypist.cog.sanger.ac.uk/models/Pan_Immune_CellTypist/v2/Immune_All_Low.pkl
✓ Model saved to /scratch/users/steorra/analysis/omic_test/models/Immune_All_Low.pkl

Out[12]:

'/scratch/users/steorra/analysis/omic_test/models/Immune_All_Low.pkl'
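
The absolute path above is specific to the author's machine. One portable alternative is to build the path with `pathlib` and create the folder beforehand. Whether `download_reference_pkl` creates missing directories itself is not stated here, so the sketch below does it explicitly. The `models` folder name mirrors the call above, and `model_save_path` is a hypothetical helper, not an omicverse function.

```python
import tempfile
from pathlib import Path

def model_save_path(model_name, base_dir=None):
    """Return <base_dir>/models/<model_name>, creating the folder if needed.

    Hypothetical helper for illustration; not part of omicverse.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    models_dir = base / "models"
    models_dir.mkdir(parents=True, exist_ok=True)
    return str(models_dir / model_name)

# A temporary directory keeps this demo free of side effects; in the notebook,
# omitting base_dir would place the folder under the current working directory.
demo_dir = tempfile.mkdtemp()
save_path = model_save_path("Immune_All_Low.pkl", base_dir=demo_dir)
print(save_path)
```

The returned string could then be passed as `save_path=` to `obj.download_reference_pkl` and reused when loading the model.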

After downloading the model, we need to load it into our Annotation object.

In [14]:

obj.add_reference_pkl('/scratch/users/steorra/analysis/omic_test/models/Immune_All_Low.pkl')

In [16]:

obj.model.cell_types[:5]

Out[16]:

array(['Age-associated B cells', 'Alveolar macrophages', 'B cells',
       'CD16+ NK cells', 'CD16- NK cells'], dtype=object)
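
CellTypist expects log1p-normalized expression scaled to 10,000 counts per cell over all genes, and it warns when the input does not look like that. A quick pre-flight check is to invert the log transform and confirm that each cell sums to roughly 10,000. The sketch below uses toy values and a hypothetical helper, `looks_log1p_10k`. It approximates the idea and is not CellTypist's own validation code.

```python
import math

def looks_log1p_10k(rows, target_sum=1e4, tol=1.0):
    """Return True if every row, after expm1, sums to about `target_sum`.

    Hypothetical helper for illustration; not part of CellTypist or omicverse.
    """
    for row in rows:
        total = sum(math.expm1(value) for value in row)
        if abs(total - target_sum) > tol:
            return False
    return True

# One toy cell whose normalized counts (2500 + 7500 + 0) sum to 10,000.
full = [[math.log1p(2500.0), math.log1p(7500.0), 0.0]]
# The same cell restricted to a subset of genes no longer sums to 10,000.
subset = [row[:1] for row in full]

print(looks_log1p_10k(full))    # True
print(looks_log1p_10k(subset))  # False
```

A matrix restricted to highly variable genes would typically fail this kind of check, because the dropped genes carry part of each cell's counts.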

In [17]:

obj.annotate(
    method='celltypist'
)

WARNING:celltypist.logger:⚠️ Warning: invalid expression matrix, expect ALL genes and log1p normalized expression to 10000 counts per cell. The prediction result may not be accurate
running Leiden clustering
    finished: found 55 clusters and added
    'over_clustering', the cluster labels (adata.obs, categorical) (0:00:00)
Celltypist prediction saved to adata.obs['celltypist_prediction']
Celltypist decision matrix saved to adata.obsm['celltypist_decision_matrix']
Celltypist probability matrix saved to adata.obsm['celltypist_probability_matrix']
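
The log shows an over-clustering step (55 clusters). In CellTypist this is the basis for majority voting: each cell's label is replaced by the most common per-cell prediction within its over-cluster. The following is a minimal sketch of that idea in plain Python, using made-up assignments and a hypothetical `majority_vote` helper. CellTypist's real implementation has further details, such as a minimum-proportion rule, that are omitted here.

```python
from collections import Counter, defaultdict

def majority_vote(predictions, clusters):
    """Replace each cell's label with the most common label in its cluster.

    Hypothetical helper for illustration; ties go to the label seen first.
    """
    labels_by_cluster = defaultdict(list)
    for label, cluster in zip(predictions, clusters):
        labels_by_cluster[cluster].append(label)
    winner = {
        cluster: Counter(labels).most_common(1)[0][0]
        for cluster, labels in labels_by_cluster.items()
    }
    return [winner[cluster] for cluster in clusters]

# Made-up per-cell predictions and over-cluster assignments for six cells.
per_cell = ["B cells", "B cells", "CD16+ NK cells",
            "CD16+ NK cells", "CD16+ NK cells", "B cells"]
over_clusters = ["0", "0", "0", "1", "1", "1"]

voted = majority_vote(per_cell, over_clusters)
print(voted)
```

Here the single NK call in cluster 0 and the single B cell call in cluster 1 are each overruled by their neighbours.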

In [18]:

ov.pl.embedding(
    obj.adata,
    basis='X_umap',
    color='celltypist_prediction'
)

gpt4celltype Automated Annotation¶

We also provide gpt4celltype, which annotates cell types automatically with an LLM.

In [20]:

import os
os.environ['AGI_API_KEY'] = 'sk-*'  # Replace with your actual API key

obj=ov.single.Annotation(adata)
result = obj.annotate(
    method='gpt4celltype',
    tissuename='PBMC', speciename='human',
    model='gpt-5-mini', provider='openai',
    topgenenumber=5 
)

...get cell type marker
ranking genes
    finished: added to `.uns['rank_genes_groups']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:00)
Note: AGI API key found: returning the cell type annotations.
Note: It is always recommended to check the results returned by GPT-4 in case of AI hallucination, before going to downstream analysis.
GPT4celltype prediction saved to adata.obs['gpt4celltype_prediction']
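
With `topgenenumber=5`, only the five top-ranked marker genes per cluster are passed to the model. The sketch below shows one way such a prompt could be assembled. The marker lists are illustrative placeholders rather than results from this dataset, `build_prompt` is a hypothetical helper, and the wording is not the exact prompt used by omicverse.

```python
def build_prompt(markers_by_cluster, tissue, species, top_n=5):
    """Assemble a text prompt from the top-ranked marker genes of each cluster.

    Hypothetical helper for illustration; not the prompt used by omicverse.
    """
    lines = [
        f"Identify cell types of {species} {tissue} cells using the following markers.",
        "Give one cell type name per row, in the same order.",
    ]
    for cluster, genes in markers_by_cluster.items():
        # Keep only the first top_n genes; the lists are assumed to be ranked.
        lines.append(f"{cluster}: {', '.join(genes[:top_n])}")
    return "\n".join(lines)

# Placeholder marker lists (not output from this tutorial's analysis).
example_markers = {
    "0": ["IL7R", "CCR7", "LDHB", "CD3D", "CD3E", "NOSIP"],
    "1": ["CD79A", "MS4A1", "CD79B", "CD74", "HLA-DRA", "CD37"],
}
prompt = build_prompt(example_markers, tissue="PBMC", species="human")
print(prompt)
```

A smaller `topgenenumber` keeps the prompt short, at the cost of giving the model less evidence per cluster.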

In [21]:

ov.pl.embedding(
    obj.adata,
    basis='X_umap',
    color='gpt4celltype_prediction'
)

SCSA Automated Annotation¶

A detailed SCSA tutorial is available at https://omicverse.readthedocs.io/en/latest/Tutorials-single/t_cellanno/

Here, we provide only a short example to demonstrate what the Annotation class can do.

In [ ]:

obj=ov.single.Annotation(adata)

To perform SCSA automated annotation, we first need to download the database.

In [22]:

obj.download_scsa_db(
    'temp/pySCSA_2024_v1_plus.db'
)

🔍 Downloading data to temp/pySCSA_2024_v1_plus.db

Downloading: 100%|█████████▉| 14.8M/14.8M [00:14<00:00, 1.03MB/s]

✅ Download completed
SCSA database saved to temp/pySCSA_2024_v1_plus.db

Out[22]:

'temp/pySCSA_2024_v1_plus.db'

In [23]:

obj.add_reference_scsa_db(
    'temp/pySCSA_2024_v1_plus.db'
)

In [24]:

obj.annotate(
    method='scsa',
    cluster_key='leiden',
    foldchange=1.5,
    pvalue=0.01,
    celltype='normal',
    target='cellmarker',
    tissue='All',  
)

ranking genes
    finished (0:00:00)
...Auto annotate cell
🔍 Version V2.2 [2024/12/18]
📊 DB load: GO_items:47347, Human_GO:3, Mouse_GO:3,
           CellMarkers:82887, CancerSEA:1574, PanglaoDB:24223
           Ensembl_HGNC:61541, Ensembl_Mouse:55414
<omicverse.single._SCSA.Annotator object at 0x7f65c53f6bc0>
🔍 Version V2.2 [2024/12/18]
📊 DB load: GO_items:47347, Human_GO:3, Mouse_GO:3,
           CellMarkers:82887, CancerSEA:1574, PanglaoDB:24223
           Ensembl_HGNC:61541, Ensembl_Mouse:55414
📦 Load markers: 70276

============================================================
🔬 Analyzing 9 clusters...
============================================================

[1/9]      Cluster 0    │ 48   genes │ 988  other genes
[2/9]      Cluster 1    │ 29   genes │ 1006 other genes
[3/9]      Cluster 2    │ 346  genes │ 930  other genes
[4/9]      Cluster 3    │ 118  genes │ 946  other genes
[5/9]      Cluster 4    │ 51   genes │ 1011 other genes
[6/9]      Cluster 5    │ 160  genes │ 934  other genes
[7/9]      Cluster 6    │ 429  genes │ 865  other genes
[8/9]      Cluster 7    │ 274  genes │ 890  other genes
[9/9]      Cluster 8    │ 144  genes │ 944  other genes

============================================================
✅ Cluster analysis completed! (9/9 processed)
============================================================


================================================================================
📋 Cell Type Annotation Results
================================================================================

Cluster  Type     Cell Type                   Score                                   Times
-------------------------------------------------------------------------------------------
0        ⚠️ ?     T cell|CD4+ T cell          9.945870198596303|5.360011326945353     1.86
1        ⚠️ ?     T cell|Naive CD8+ T cell    5.451241689383974|4.768292656196209     1.14
2        ⚠️ ?     Monocyte|Macrophage         14.354365574078278|8.528970464022539    1.68
3        ✅ Good  B cell                      13.78474042389334                       4.02
4        ⚠️ ?     Natural killer cell|T cell  8.221301039648155|6.61215702423662      1.24
5        ✅ Good  Natural killer cell         15.26958201763484                       3.82
6        ⚠️ ?     Monocyte|Macrophage         10.86288578272659|8.672538874890472     1.25
7        ⚠️ ?     Dendritic cell|Monocyte     9.461295981098543|5.912723904106833     1.60
8        ✅ Good  Megakaryocyte               10.05563188309469                       2.01
================================================================================

...cell type added to scsa_prediction on obs of anndata
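
In the results table, rows with two candidate cell types list both scores, and the `Times` column matches the ratio of the first score to the second. All rows flagged `Good` have a ratio of at least 2, which suggests a threshold of 2. That threshold is an inference from this output, not something stated in the tutorial. The following is a small sketch with a hypothetical `confidence` helper, using scores copied from the table.

```python
def confidence(top_score, second_score, threshold=2.0):
    """Return (ratio rounded to 2 decimals, label) for the two best scores.

    Hypothetical helper; the threshold of 2 is inferred from the table above.
    """
    ratio = top_score / second_score
    label = "Good" if ratio >= threshold else "?"
    return round(ratio, 2), label

# (top score, second score) pairs copied from clusters 0, 2 and 7.
scsa_scores = {
    "0": (9.945870198596303, 5.360011326945353),
    "2": (14.354365574078278, 8.528970464022539),
    "7": (9.461295981098543, 5.912723904106833),
}
for cluster, (top, second) in scsa_scores.items():
    print(cluster, confidence(top, second))
```

Clusters labelled `?` are the ones worth checking by hand, for example by plotting known marker genes for the two candidate cell types.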

In [25]:

ov.pl.embedding(
    obj.adata,
    basis='X_umap',
    color='scsa_prediction'
)