Cell Ontology (CL) (https://obofoundry.org/ontology/cl.html) is an ontology designed to classify and describe cell types across different organisms. It serves as a resource for model organism and bioinformatics databases. The ontology covers a broad range of cell types in animal cells, with over 2700 cell type classes, and provides high-level cell type classes as mapping points for cell type classes in ontologies representing other species, such as the Plant Ontology or Drosophila Anatomy Ontology. Integration with other ontologies such as Uberon, GO, CHEBI, PR, and PATO enables linking cell types to anatomical structures, biological processes, and other relevant concepts.
Cell Taxonomy (https://ngdc.cncb.ac.cn/celltaxonomy), a comprehensive and curated repository of cell types and associated cell markers encompassing a wide range of species, tissues and conditions. Combined with literature curation and data integration, the current version of Cell Taxonomy establishes a well-structured taxonomy for 3,143 cell types and houses a comprehensive collection of 26,613 associated cell markers in 257 conditions and 387 tissues across 34 species. Based on 4,299 publications and single-cell transcriptomic profiles of ∼3.5 million cells, Cell Taxonomy features multifaceted characterization for cell types and cell markers, involving quality assessment of cell markers and cell clusters, cross-species comparison, cell composition of tissues and cellular similarity based on markers.
Here we provide several functions that convert the cell names you annotated to their corresponding Cell Ontology/Taxonomy names and IDs.
All analysis are performed with omicverse.single.CellOntologyMapper class.
CellOntologyMapper: Zeng, Z., Wang, X., & Du, H. (2025). CellOntologyMapper: Consensus mapping of cell type annotation. bioRxiv, 2025-06.
But we have also provided a function named omicverse.single.download_cl() to download it automatically. The benefit of this function is that it can choose an appropriate source to download the file even if you encounter a network error.
There are some alternative links to download the file manual. :
Because the CellOntologyMapper rely on the NLP embedding model of SentenceTransformer. So we need to choose a NLP embedding from huggingface. Here are some recommendation.
📊 Using 8 unique cell names from column 'cell_label'
🔄 Loading model sentence-transformers/all-MiniLM-L6-v2...
🌐 Checking network connectivity...
✓ Network connection available
🇨🇳 Using HF-Mirror (hf-mirror.com) for faster downloads in China
📁 Models will be saved to: ./my_models
🪞 Downloading model from HF-Mirror: sentence-transformers/all-MiniLM-L6-v2
✓ Model loaded successfully from HF-Mirror!
🎯 Mapping 8 cell names...
📝 Applying mapping results to AnnData...
✓ Mapping completed: 7/8 cell names have high confidence mapping
In addition, we often use abbreviations to name our cell types, such as TA and TA.Early in our data sets. Calculating similarity to the Cell Ontology directly can be confusing, so we use an LLM to expand these abbreviated cell names.
To do so, specify the following arguments:
api_type: openai, anthropic, ollama, or any other API that follows the OpenAI format
tissue_context: the tissue source of the single-cell data set
species: the species from which the data set was derived
study_context: any additional information that may help the model expand the cell-type name
api_key: the apikey of your model.
mapper.setup_llm_expansion(api_type="openai",model='gpt-4o-2024-11-20',tissue_context="gut",# 组织上下文species="mouse",# 物种信息study_context="Epithelial cells from the small intestine and organoids of mice. Some of the cells were also subject to Salmonella or Heligmosomoides polygyrus infection",api_key="sk-*")
✓ Loaded 25 cached abbreviation expansions
✓ LLM expansion functionality setup complete (Type: openai, Model: gpt-4o-2024-11-20)
🧬 Tissue context: gut
🔬 Study context: Epithelial cells from the small intestine and organoids of mice. Some of the cells were also subject to Salmonella or Heligmosomoides polygyrus infection
🐭 Species: mouse
True
You can choose any other model api from the alternative provider, such as ohmygpt. But the format of openai should observe the rule of openai.
📊 Loading Cell Taxonomy resource from: new_ontology/Cell_Taxonomy_resource.txt
✓ Loaded 226222 taxonomy entries
🐭 Filtered by species ['Homo sapiens', 'Mus musculus']: 224736/226222 entries
🔄 Loading model sentence-transformers/all-MiniLM-L6-v2...
🌐 Checking network connectivity...
✓ Network connection available
🇨🇳 Using HF-Mirror (hf-mirror.com) for faster downloads in China
📁 Models will be saved to: ./my_models
🪞 Downloading model from HF-Mirror: sentence-transformers/all-MiniLM-L6-v2
✓ Model loaded successfully from HF-Mirror!
🧠 Creating embeddings for 2540 taxonomy cell types...
✓ Created taxonomy embeddings for 2540 cell types
📈 Species distribution:
🐭 Mus musculus: 141727 entries
🐭 Homo sapiens: 83009 entries
🧬 Unique cell types: 2540
🎯 Unique markers: 25818
True
Similiarly, we can use map_adata_with_taxonomy to perform the mapping.
🧬 Taxonomy cell types most similar to 'T helper cell':
1. Helper T cell (Similarity: 0.966)
🐭 Species: Mus musculus
🎯 Marker: Tigit
🆔 CT ID: CT:00000919
2. T-helper 1 cell (Similarity: 0.926)
🐭 Species: Homo sapiens
🎯 Marker: CXCR6
🆔 CT ID: CT:00000502
mapper.get_cell_info("regulatory T cell")
ℹ️ === regulatory T cell ===
🆔 Ontology ID: http://purl.obolibrary.org/obo/CL_0000815
📝 Description: regulatory T cell: A T cell which regulates overall immune responses as well as the responses of other T cell subsets through direct cell-cell contact and cytokine release. This cell type may express FoxP3 and CD25 and secretes IL-10 and TGF-beta.
{'name': 'regulatory T cell',
'description': 'regulatory T cell: A T cell which regulates overall immune responses as well as the responses of other T cell subsets through direct cell-cell contact and cytokine release. This cell type may express FoxP3 and CD25 and secretes IL-10 and TGF-beta.',
'ontology_id': 'http://purl.obolibrary.org/obo/CL_0000815'}
# 获取详细的taxonomy信息info_list=mapper.get_cell_info_taxonomy("Helper T cell",species="Mus musculus")
mapper.get_cell_info("regulatory T cells")
✗ Cell type not found: regulatory T cells
🔍 Found 0 cell types containing 'regulatory t cells':
📊 === Ontology Statistics ===
📝 Total cell types: 16841
📏 Average name length: 31.7 characters
📏 Shortest name length: 3 characters
📏 Longest name length: 144 characters
🔤 Most common words:
of: 5473 times
cell: 3857 times
regulation: 3168 times
negative: 1009 times
positive: 1003 times
process: 980 times
development: 875 times
differentiation: 727 times
muscle: 639 times
in: 571 times