Consensus annotation with CellVote
CellVote combines the results of several cell type annotation methods using a simple majority vote. Aggregating their predictions resolves disagreements between methods and assigns the most plausible identity to each cluster.
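The sketch below makes the voting idea concrete with a toy per-cluster majority vote in pandas. It is an illustration, not CellVote's implementation, and it rests on a few assumptions. Each annotation column casts one vote per cluster, using the label it assigns most often there. A tie between methods goes to the tied label voted for by the earliest column. A plain `DataFrame` stands in for `adata.obs`. The function name `majority_vote` is hypothetical.

```python
from collections import Counter

import pandas as pd


def majority_vote(obs: pd.DataFrame, clusters_key: str, celltype_keys: list) -> pd.Series:
    """Toy per-cluster majority vote over annotation columns (not CellVote's code)."""
    consensus = {}
    for cluster, group in obs.groupby(clusters_key, observed=True):
        # One vote per method: its most frequent label in this cluster
        # (mode() returns ties sorted, so the alphabetically first label wins).
        votes = [group[key].mode().iloc[0] for key in celltype_keys]
        # Counter.most_common orders equal counts by first appearance, so a tie
        # goes to the tied label voted for by the earliest column in celltype_keys.
        consensus[cluster] = Counter(votes).most_common(1)[0][0]
    # Broadcast each cluster's label back to all of its cells.
    return obs[clusters_key].map(consensus)


obs = pd.DataFrame({
    "leiden": ["0", "0", "0", "1", "1"],
    "scsa_annotation": ["T cell", "T cell", "B cell", "B cell", "B cell"],
    "scMulan_anno": ["T cell", "NK cell", "T cell", "B cell", "Plasma cell"],
    "gpt_celltype": ["T cell", "T cell", "T cell", "Plasma cell", "Plasma cell"],
})
result = majority_vote(obs, "leiden", ["scsa_annotation", "scMulan_anno", "gpt_celltype"])
print(result.tolist())
```

In this toy data, cluster `1` ends up as `B cell` because two of the three methods favour it, even though the GPT-style column says `Plasma cell`.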
Prerequisites
- Clustering results must be stored in `adata.obs` (e.g. the `leiden` column).
- At least two independent cell type annotation results must be stored as columns in `adata.obs`. Typical sources include SCSA (`scsa_annotation`), scMulan (`scMulan_anno`) or a GPT-based annotation such as `gpt_celltype`.
- A dictionary of marker genes for each cluster is required. You can generate it with `ov.single.get_celltype_marker`.
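You can check these requirements before calling `vote`. The sketch below is a hypothetical helper; `check_cellvote_inputs` is not part of omicverse. It assumes the marker dictionary maps cluster IDs to lists of gene names and that cluster IDs can be compared as strings. A plain `DataFrame` stands in for `adata.obs`.

```python
import pandas as pd


def check_cellvote_inputs(obs, clusters_key, celltype_keys, markers):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    if clusters_key not in obs.columns:
        problems.append(f"missing cluster column {clusters_key!r}")
    if len(celltype_keys) < 2:
        problems.append("need at least two annotation columns")
    for key in celltype_keys:
        if key not in obs.columns:
            problems.append(f"missing annotation column {key!r}")
    if clusters_key in obs.columns:
        # Assumption: marker keys are cluster IDs; compare everything as strings.
        clusters = set(obs[clusters_key].astype(str))
        missing = sorted(clusters - {str(k) for k in markers})
        if missing:
            problems.append(f"no markers for clusters {missing}")
    return problems


obs = pd.DataFrame({"leiden": ["0", "0", "1"], "scsa_annotation": ["T cell", "T cell", "B cell"]})
markers = {"0": ["CD3D", "CD3E"]}
problems = check_cellvote_inputs(obs, "leiden", ["scsa_annotation", "scMulan_anno"], markers)
print(problems)
```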
Basic usage
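```python
import omicverse as ov

# adata is an AnnData object with clustering results in adata.obs["leiden"]
# and per-method annotations in adata.obs["scsa_annotation"] and adata.obs["scMulan_anno"]
cv = ov.single.CellVote(adata)

# Marker genes for each cluster, used to help resolve disagreements between methods
markers = ov.single.get_celltype_marker(adata)

cv.vote(
    clusters_key="leiden",
    cluster_markers=markers,
    celltype_keys=["scsa_annotation", "scMulan_anno"],
)

# Number of cells assigned to each consensus cell type
print(adata.obs["CellVote_celltype"].value_counts())
```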
The final consensus label is stored in adata.obs['CellVote_celltype'].
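The consensus is assigned per cluster, so a quick sanity check is to tabulate each cluster's label. The sketch below is a minimal pandas version that assumes the default column names `leiden` and `CellVote_celltype`. The helper `cluster_label_table` is hypothetical, and a plain `DataFrame` stands in for `adata.obs`. If the vote is cluster-level as described above, every row should show `n_labels` equal to 1.

```python
import pandas as pd


def cluster_label_table(obs, clusters_key="leiden", result_key="CellVote_celltype"):
    """Hypothetical helper: one row per cluster with its consensus label and cell count."""
    return (
        obs.astype({clusters_key: str, result_key: str})
        .groupby(clusters_key)[result_key]
        # label: first consensus label seen; n_labels: distinct labels in the cluster
        .agg(label="first", n_labels="nunique", n_cells="count")
    )


obs = pd.DataFrame({
    "leiden": ["0", "0", "1"],
    "CellVote_celltype": ["T cell", "T cell", "B cell"],
})
table = cluster_label_table(obs)
print(table)
```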
Advanced options
The `vote` method accepts a few additional arguments:
- `model`, `base_url` and `provider` specify the large language model to use when a GPT-based annotation is one of the voting sources.
- `result_key` sets the name of the output column (default `CellVote_celltype`).
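```python
# Use a GPT-based annotation as one of the voting sources and write the
# consensus to adata.obs["vote_label"] instead of the default column
cv.vote(
    clusters_key="leiden",
    cluster_markers=markers,
    celltype_keys=["scsa_annotation", "gpt_celltype"],
    model="gpt-3.5-turbo",  # choose any model supported by your provider
    provider="openai",
    result_key="vote_label",
)
```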
Tips
- Make sure the marker dictionary contains biologically meaningful genes, since it is used to help resolve disagreements between annotation methods.
- Inspect `adata.obs[['scsa_annotation', 'scMulan_anno', 'CellVote_celltype']]` to compare the individual predictions with the final vote.
- `celltype_keys` accepts any number of annotation columns (at least two, as listed in the prerequisites).
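One way to quantify the comparison suggested above is to measure the fraction of cells on which each method agrees with the consensus. The sketch below is a minimal pandas version. The helper `agreement_with_consensus` is hypothetical, and a plain `DataFrame` stands in for `adata.obs`. Columns are cast to `str` because pandas raises an error when comparing categorical columns with different categories.

```python
import pandas as pd


def agreement_with_consensus(obs, celltype_keys, result_key="CellVote_celltype"):
    """Hypothetical helper: share of cells where each method matches the consensus."""
    consensus = obs[result_key].astype(str)
    # Cast to str: comparing categoricals with different categories raises TypeError.
    return pd.Series(
        {key: float((obs[key].astype(str) == consensus).mean()) for key in celltype_keys},
        name="agreement",
    )


obs = pd.DataFrame({
    "scsa_annotation": ["T cell", "B cell", "B cell", "NK cell"],
    "scMulan_anno": ["T cell", "T cell", "B cell", "NK cell"],
    "CellVote_celltype": ["T cell", "B cell", "B cell", "T cell"],
})
agreement = agreement_with_consensus(obs, ["scsa_annotation", "scMulan_anno"])
print(agreement)
```

This uses exact string matching. If methods name the same type differently (e.g. `T cell` vs `T cells`), the score understates their real agreement.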