Consensus annotation with CellVote
CellVote combines results from multiple cell type annotation approaches using a simple majority vote. By aggregating predictions, inconsistent labels can be resolved and the most plausible identity can be assigned to each cluster.
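The majority-vote idea can be illustrated without any single-cell tooling. The following is a minimal sketch, not CellVote's actual implementation: the function name `majority_vote_per_cluster`, the toy labels, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def majority_vote_per_cluster(clusters, annotations):
    """Return {cluster: most common label} pooled over all annotation sources.

    clusters    -- sequence of cluster ids, one per cell
    annotations -- dict mapping a source name to a sequence of labels,
                   aligned with `clusters`
    """
    tallies = defaultdict(Counter)
    for labels in annotations.values():
        for cluster, label in zip(clusters, labels):
            tallies[cluster][label] += 1
    # Counter.most_common keeps tied labels in first-seen order, so a tie
    # goes to the label encountered first (an arbitrary choice for this sketch).
    return {cluster: tally.most_common(1)[0][0] for cluster, tally in tallies.items()}


# Toy data: four cells, two clusters, two hypothetical annotation sources.
clusters = ["0", "0", "1", "1"]
annotations = {
    "scsa_annotation": ["T cell", "T cell", "B cell", "NK cell"],
    "scMulan_anno": ["T cell", "NK cell", "B cell", "B cell"],
}
consensus = majority_vote_per_cluster(clusters, annotations)
print(consensus)  # {'0': 'T cell', '1': 'B cell'}
```

In each cluster the minority label ("NK cell") is outvoted three to one, which is the kind of inconsistency a consensus step is meant to resolve.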
Prerequisites
Clustering results should be available in adata.obs (e.g. the leiden field).
At least two independent cell type annotation results need to be stored in adata.obs. Typical columns are scsa_anno, scMulan_anno, or GPT-based annotations such as gpt_celltype.
A dictionary of marker genes for each cluster is required. You can generate this with ov.single.get_celltype_marker.
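Before voting, it can help to confirm that the required columns exist. The helper below is a hypothetical convenience function, not part of omicverse. It only inspects a list of column names, so it runs without an AnnData object.

```python
def missing_obs_columns(obs_columns, clusters_key, celltype_keys):
    """Return the required column names that are absent from obs_columns."""
    if len(celltype_keys) < 2:
        raise ValueError("provide at least two annotation columns")
    available = set(obs_columns)
    required = [clusters_key, *celltype_keys]
    return [name for name in required if name not in available]


# With a real dataset, pass adata.obs.columns as the first argument.
missing = missing_obs_columns(
    ["leiden", "scsa_annotation"],
    clusters_key="leiden",
    celltype_keys=["scsa_annotation", "scMulan_anno"],
)
print(missing)  # ['scMulan_anno']
```

An empty list means every required column is present.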
Basic usage
import omicverse as ov
# adata contains clustering results in "leiden"
cv = ov.single.CellVote(adata)
markers = ov.single.get_celltype_marker(adata)
cv.vote(
    clusters_key="leiden",
    cluster_markers=markers,
    celltype_keys=["scsa_annotation", "scMulan_anno"],
)
print(adata.obs["CellVote_celltype"].value_counts())
The final consensus label is stored in adata.obs['CellVote_celltype'].
Example Notebook (PBMC3k)
For a complete, step-by-step walkthrough on PBMC3k, see the Jupyter notebook:
Tutorials-single/t_cellvote_pbmc3k.ipynb
It covers preprocessing, clustering, marker selection, simulated multi-method annotations, and both offline and optional online CellVote arbitration.
Advanced options
The vote method exposes a few additional arguments:
model, base_url and provider allow you to specify a large language model when using GPT-based annotation as one of the voting sources.
result_key changes the output column name.
cv.vote(
    clusters_key="leiden",
    cluster_markers=markers,
    celltype_keys=["scsa_annotation", "gpt_celltype"],
    model="gpt-3.5-turbo",  # choose any model supported by your provider
    provider="openai",
    result_key="vote_label",
)
Tips
Ensure that the marker dictionary contains biologically meaningful genes to help resolve disagreements between annotation methods.
You can inspect adata.obs[['scsa_annotation','scMulan_anno','CellVote_celltype']] to compare individual predictions with the final vote.
Any number of annotation columns (at least two) can be provided in celltype_keys.
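One way to compare individual predictions with the final vote is to compute how often each method agrees with the consensus. This is a small sketch on toy labels: `agreement_rate` is a hypothetical helper, not an omicverse function, and the label lists are invented for illustration.

```python
def agreement_rate(predictions, consensus):
    """Fraction of cells where a method's label equals the consensus label."""
    pairs = list(zip(predictions, consensus))
    if not pairs:
        raise ValueError("no cells to compare")
    return sum(pred == final for pred, final in pairs) / len(pairs)


# Toy per-cell labels; with real data, pass columns of adata.obs instead.
consensus = ["T cell", "T cell", "B cell", "B cell"]
methods = {
    "scsa_annotation": ["T cell", "T cell", "B cell", "NK cell"],
    "scMulan_anno": ["T cell", "NK cell", "NK cell", "B cell"],
}
rates = {name: agreement_rate(labels, consensus) for name, labels in methods.items()}
print(rates)  # {'scsa_annotation': 0.75, 'scMulan_anno': 0.5}
```

A method with a low agreement rate is a candidate for closer inspection against the marker genes of the clusters where it disagrees.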