omicverse.single.find_markers

omicverse.single.find_markers(adata, groupby, method='cosg', n_genes=50, key_added=None, use_raw=None, layer=None, groups='all', reference='rest', corr_method='benjamini-hochberg', rankby_abs=False, tie_correct=False, pts=True, **kwargs)

Find marker genes for each cluster / group in single-cell data.

A unified wrapper supporting multiple algorithms. For the statistical methods ('t-test', 'wilcoxon', 'logreg') the implementation is ported directly from scanpy, so scanpy is not required at runtime. Results are stored in adata.uns[key_added] using the same structured-array format as sc.tl.rank_genes_groups, so downstream tools (including omicverse.single.get_markers() and omicverse.pl.markers_dotplot()) work without conversion.
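The structured-array layout mentioned above can be illustrated with a small mock. This is only a sketch of the general sc.tl.rank_genes_groups convention (one field per group, one row per rank); the group names, gene names, scores, and the helper top_genes are hypothetical and are not produced by omicverse.

```python
import numpy as np

# Hypothetical stand-in for adata.uns["rank_genes_groups"].
# Each field is a group; row i holds the i-th ranked gene for that group.
names = np.array(
    [("CD3D", "MS4A1"), ("CD3E", "CD79A"), ("IL7R", "CD79B")],
    dtype=[("T_cells", "U10"), ("B_cells", "U10")],
)
scores = np.array(
    [(0.91, 0.95), (0.85, 0.88), (0.70, 0.61)],
    dtype=[("T_cells", "f8"), ("B_cells", "f8")],
)
result = {"names": names, "scores": scores}


def top_genes(result, group, n=2):
    """Return the first n (gene, score) pairs for one group."""
    genes = result["names"][group][:n]
    vals = result["scores"][group][:n]
    return [(str(g), float(s)) for g, s in zip(genes, vals)]


top = top_genes(result, "B_cells")
group_names = result["names"].dtype.names
```

Indexing by field name pulls out one group's ranked column, which is why a single adata.uns entry can hold results for every group at once.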

Parameters:

adata – Annotated data matrix. Data must be log-normalised for statistical tests; raw counts are expected for method='cosg'.
groupby – Key in adata.obs to group cells by (e.g. 'leiden').
method – Algorithm. One of:
- 'cosg': cosine-similarity-based, fast, recommended for large datasets.
- 't-test': Welch's t-test.
- 't-test_overestim_var': t-test with per-group variance overestimation (conservative).
- 'wilcoxon': Wilcoxon rank-sum / Mann-Whitney U test.
- 'logreg': logistic regression (requires scikit-learn).
Default: 'cosg'.
n_genes – Top marker genes per group to keep. Default: 50.
key_added – Key in adata.uns to write results to. Default: 'rank_genes_groups'.
use_raw – Use adata.raw for expression values. None (default) means use adata.raw if it exists (matching scanpy behaviour).
layer – Layer to use instead of adata.X. Default: None.
groups – Groups to compute markers for: 'all' or a list of names. Default: 'all'.
reference – Reference group. 'rest' (default) compares each group against the union of all other cells; a group name restricts the comparison to that group only.
corr_method – Multiple-testing correction. 'benjamini-hochberg' (default) or 'bonferroni'. Ignored for 'cosg' and 'logreg'.
rankby_abs – Rank genes by absolute score instead of raw score. Default: False.
tie_correct – Apply tie correction for 'wilcoxon'. Default: False.
pts – Compute fraction of cells expressing each gene (stored as adata.uns[key_added]['pts']). Default: True.
**kwargs – Forwarded to the underlying method (e.g. mu for cosg, or sklearn parameters for logreg).
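To make the two corr_method options concrete, here is a minimal pure-Python sketch of the textbook Bonferroni and Benjamini-Hochberg adjustments. The function names bonferroni and benjamini_hochberg and the sample p-values are illustrative assumptions; this is not the code omicverse runs internally.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvals)
    return [min(p * n, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Step-up FDR adjustment: p_(i) * n / i, made monotone from the top rank down."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for four genes
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
```

On the same inputs Bonferroni never yields a smaller adjusted value than Benjamini-Hochberg, which is why it is the more conservative of the two choices.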

Return type:

None

Returns:

None. Results are written to adata.uns[key_added].
Examples:

>>> import omicverse as ov
>>> ov.single.find_markers(adata, groupby='leiden', method='cosg')
>>> df = ov.single.get_markers(adata, n_genes=5)
>>> ov.pl.markers_dotplot(adata, groupby='leiden', n_genes=5)