omicverse.single.pyTOSICA

omicverse.single.pyTOSICA(adata: AnnData, project_path: str, gmt_path=None, label_name: str = 'Celltype', mask_ratio: float = 0.015, max_g: int = 300, max_gs: int = 300, n_unannotated: int = 1, embed_dim: int = 48, depth: int = 1, num_heads: int = 4, batch_size: int = 8, device: str = 'cuda:0') → None

TOSICA wrapper for pathway-informed transformer-based cell-type annotation.

Parameters:
  • adata (anndata.AnnData) – Training/reference AnnData with labels.

  • project_path (str) – Output directory for TOSICA checkpoints and logs.

  • gmt_path (str|None, optional, default=None) – Pathway GMT file path. If None, default gene-set resources are used.

  • label_name (str, optional, default='Celltype') – Label column in adata.obs.

  • mask_ratio (float, optional, default=0.015) – Ratio of masked genes/tokens used for training regularization.

  • max_g (int, optional, default=300) – Maximum number of genes used per pathway/tokenization unit.

  • max_gs (int, optional, default=300) – Maximum number of gene sets used in the model.

  • n_unannotated (int, optional, default=1) – Number of unlabeled classes reserved during training.

  • embed_dim (int, optional, default=48) – Transformer embedding dimension.

  • depth (int, optional, default=1) – Number of transformer encoder layers.

  • num_heads (int, optional, default=4) – Number of attention heads.

  • batch_size (int, optional, default=8) – Mini-batch size used during training/inference.

  • device (str, optional, default='cuda:0') – Device used for model training/inference.
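The hyperparameters above interact in ways that can be checked before constructing the object. The following is a minimal sketch, not part of omicverse: `check_tosica_args` is a hypothetical helper. It assumes that attention follows the standard multi-head layout (so `embed_dim` should be divisible by `num_heads`), that `mask_ratio` is a fraction in [0, 1), and that a `'cpu'` device string is an acceptable fallback when CUDA is unavailable. Whether pyTOSICA performs any of these checks itself is not stated here.

```python
def check_tosica_args(embed_dim=48, num_heads=4, mask_ratio=0.015, device="cuda:0"):
    """Hypothetical pre-flight check for pyTOSICA-style arguments.

    Returns the per-head dimension and a device string that falls back
    to 'cpu' when CUDA cannot be used.
    """
    # Standard multi-head attention splits the embedding evenly across heads.
    if embed_dim % num_heads != 0:
        raise ValueError(
            f"embed_dim ({embed_dim}) should be divisible by num_heads ({num_heads})"
        )

    # Assumption: mask_ratio is a fraction of genes/tokens, so it lies in [0, 1).
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError(f"mask_ratio ({mask_ratio}) should be in [0, 1)")

    # PyTorch is optional for this check; without it, assume no CUDA.
    try:
        import torch
        cuda_ok = torch.cuda.is_available()
    except ImportError:
        cuda_ok = False

    if device.startswith("cuda") and not cuda_ok:
        device = "cpu"

    return {"head_dim": embed_dim // num_heads, "device": device}
```

With the documented defaults (`embed_dim=48`, `num_heads=4`), each head works on 12 dimensions. The returned `device` string could then be passed as the `device` argument.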

Returns:

Initializes TOSICA model configuration and training resources.

Return type:

None

Examples

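>>> import omicverse as ov
>>> # ref_adata: a labeled reference AnnData with a 'Celltype' column in .obs
>>> tosica_obj = ov.single.pyTOSICA(adata=ref_adata, project_path="./tosica")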
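When supplying a custom file through `gmt_path`, it can help to inspect it first. The sketch below is not part of omicverse: `read_gmt` and `summarize_gmt` are hypothetical helpers. It relies only on the usual GMT layout (one gene set per line, tab-separated: set name, description, then gene symbols) and reports how the file compares with `max_g` and `max_gs`. How pyTOSICA actually applies those two caps is not described here, so the summary is informational only.

```python
from pathlib import Path


def read_gmt(path):
    """Parse a GMT file into {gene_set_name: [gene, ...]}."""
    gene_sets = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split("\t")
        # A valid GMT row has a name, a description, and at least one gene.
        if len(fields) < 3:
            continue
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


def summarize_gmt(gene_sets, max_g=300, max_gs=300):
    """Compare gene-set counts and sizes against the max_g / max_gs values."""
    sizes = [len(genes) for genes in gene_sets.values()]
    return {
        "n_gene_sets": len(sizes),
        "largest": max(sizes, default=0),
        # Gene sets containing more genes than max_g.
        "sets_over_max_g": sum(size > max_g for size in sizes),
        # True when the file holds more gene sets than max_gs.
        "exceeds_max_gs": len(sizes) > max_gs,
    }
```

For example, `summarize_gmt(read_gmt("my_pathways.gmt"))` (the file name is a placeholder) returns a small dictionary that can be checked before passing the same path as `gmt_path`.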