omicverse.pp.scrublet
- omicverse.pp.scrublet(adata, adata_sim=None, *, batch_key=None, sim_doublet_ratio=2.0, expected_doublet_rate=0.05, stdev_doublet_rate=0.02, synthetic_doublet_umi_subsampling=1.0, knn_dist_metric='euclidean', normalize_variance=True, log_transform=False, mean_center=True, n_prin_comps=30, use_approx_neighbors=None, get_doublet_neighbor_parents=False, n_neighbors=None, threshold=None, verbose=True, copy=False, random_state=0, use_gpu=False)
Predict cell doublets using Scrublet with optional GPU acceleration.
- Parameters:
adata (AnnData) – Observed count matrix (cells x genes) for doublet calling.
adata_sim (AnnData, optional) – Optional pre-simulated doublet AnnData. If None, synthetic doublets are generated from adata.
batch_key (str, optional) – Batch column in adata.obs for per-batch doublet calling.
sim_doublet_ratio (float, default=2.0) – Number of synthetic doublets relative to observed cells.
expected_doublet_rate (float, default=0.05) – Prior expected doublet fraction for the experiment.
stdev_doublet_rate (float, default=0.02) – Standard deviation around the expected doublet rate prior.
synthetic_doublet_umi_subsampling (float, default=1.0) – UMI subsampling fraction used during synthetic doublet simulation.
knn_dist_metric (str or callable, default="euclidean") – Distance metric used for nearest-neighbor graph construction.
normalize_variance (bool, default=True) – Whether to variance-normalize features before PCA.
log_transform (bool, default=False) – Whether to log-transform values before PCA.
mean_center (bool, default=True) – Whether to center features before dimensionality reduction.
n_prin_comps (int, default=30) – Number of principal components used for neighbor graph building.
use_approx_neighbors (bool, optional) – Whether to use approximate nearest neighbors.
get_doublet_neighbor_parents (bool, default=False) – Whether to store parent cell identities of doublet neighbors.
n_neighbors (int, optional) – Number of neighbors for the classifier graph.
threshold (float, optional) – Manual threshold for calling predicted doublets.
verbose (bool, default=True) – Whether to print progress logs.
copy (bool, default=False) – Return a copied AnnData instead of modifying in place.
random_state (int or RandomState, default=0) – Random seed/state for reproducibility.
use_gpu (bool, default=False) – Whether to use the GPU-accelerated PCA path when available.
- Returns:
Returns modified AnnData when copy=True; otherwise updates in place and returns None.
- Return type:
AnnData or None
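The roles of sim_doublet_ratio and synthetic_doublet_umi_subsampling can be illustrated with a small NumPy sketch. It is not omicverse's implementation: the helper name simulate_doublets, the toy Poisson count matrix, and the use of binomial thinning for UMI subsampling are all assumptions made for illustration. The general idea, that a synthetic doublet is built by summing the counts of two randomly chosen observed cells, follows the Scrublet approach.

```python
import numpy as np


def simulate_doublets(counts, sim_doublet_ratio=2.0, umi_subsampling=1.0, random_state=0):
    """Hypothetical helper: build synthetic doublets from a cells x genes count matrix."""
    rng = np.random.default_rng(random_state)
    n_obs = counts.shape[0]
    # Number of synthetic doublets is expressed relative to the observed cells.
    n_sim = int(n_obs * sim_doublet_ratio)
    # Pick two random parent cells for every synthetic doublet.
    parents = rng.integers(0, n_obs, size=(n_sim, 2))
    # A doublet's profile is the sum of its parents' counts.
    doublets = counts[parents[:, 0]] + counts[parents[:, 1]]
    if umi_subsampling < 1.0:
        # Assumption: subsampling keeps each UMI independently with this probability.
        doublets = rng.binomial(doublets, umi_subsampling)
    return doublets, parents


# Toy data: 100 cells x 50 genes of Poisson counts (illustrative only).
observed = np.random.default_rng(42).poisson(1.0, size=(100, 50))

doublets, parents = simulate_doublets(observed)
thinned, _ = simulate_doublets(observed, umi_subsampling=0.5)

print(doublets.shape)  # twice as many synthetic doublets as observed cells
```

With the default sim_doublet_ratio=2.0, the sketch produces two synthetic doublets per observed cell. Setting the subsampling fraction below 1.0 reduces the total counts of each simulated doublet, which is the behaviour the synthetic_doublet_umi_subsampling description refers to.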