User API
Import OmicVerse as:
import omicverse as ov
Data IO
Read common omics file formats into AnnData or pandas DataFrame. |
|
Read an |
|
Read an |
|
Read a 10x Genomics HDF5 matrix file. |
|
Read a 10x Genomics Matrix Market directory. |
|
Read Nanostring formatted dataset. |
|
Read 10x Visium HD outputs with a single entry point. |
Preprocessing (pp)
Quality control & filtering
Perform quality control on a dictionary of AnnData objects. |
|
Filter cell outliers based on counts and numbers of genes expressed. |
|
Filter genes based on number of cells or counts. |
|
Predict cell doublets using Scrublet with optional GPU acceleration. |
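The count-based filters listed above can be illustrated with a small plain-Python sketch. It is not OmicVerse's implementation: the function name `filter_cells_and_genes` and the thresholds `min_counts`, `min_genes`, and `min_cells` are hypothetical, chosen only to show the idea of dropping low-quality cells first and rarely detected genes second.

```python
# Illustrative only: a plain-Python stand-in for count-based QC filtering.
# `filter_cells_and_genes` and its parameters are hypothetical names,
# not part of the OmicVerse API.

def filter_cells_and_genes(counts, min_counts=3, min_genes=2, min_cells=2):
    """Filter a cells x genes count matrix given as a list of rows.

    A cell is kept when its total count is >= min_counts and it expresses
    at least min_genes genes. A gene is then kept when it is expressed in
    at least min_cells of the remaining cells.
    Returns (filtered_matrix, kept_cell_indices, kept_gene_indices).
    """
    kept_cells = [
        i for i, row in enumerate(counts)
        if sum(row) >= min_counts and sum(1 for v in row if v > 0) >= min_genes
    ]
    n_genes = len(counts[0]) if counts else 0
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]
    filtered = [[counts[i][j] for j in kept_genes] for i in kept_cells]
    return filtered, kept_cells, kept_genes


counts = [
    [5, 0, 3, 0],  # cell 0: 8 counts across 2 genes
    [0, 0, 1, 0],  # cell 1: 1 count in 1 gene, fails both cell thresholds
    [2, 1, 4, 0],  # cell 2: 7 counts across 3 genes
    [1, 0, 2, 0],  # cell 3: 3 counts across 2 genes
]
filtered, cells, genes = filter_cells_and_genes(counts)
```

Filtering cells before genes means that the gene-level threshold is evaluated only on cells that passed QC; real pipelines may apply the two steps in a different order or with additional metrics such as mitochondrial fraction.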
Normalisation & feature selection
Normalize counts per cell. |
|
Log-transform expression values with |
|
Run HVG detection and write flags/statistics into |
|
Select highly variable features (HVF/HVG) for downstream modeling. |
|
Normalize count matrix using Pearson residuals. |
|
Given log-normalized gene expression data, recover the raw read/UMI counts by inferring the unknown size factors. |
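As a rough illustration of the first two entries above (per-cell normalisation followed by a log transform), the following sketch uses only the standard library. The helper names `normalize_total` and `log1p_transform` and the `target_sum` parameter are assumptions made for this example and should not be read as the signatures of the OmicVerse functions.

```python
import math

# Illustrative only: hypothetical helpers, not the OmicVerse API.

def normalize_total(counts, target_sum=1e4):
    """Scale each cell (row) so that its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        # All-zero cells stay all-zero; this avoids dividing by zero.
        factor = target_sum / total if total > 0 else 0.0
        normalized.append([v * factor for v in row])
    return normalized


def log1p_transform(matrix):
    """Apply log(1 + x) element-wise, which keeps zeros at zero."""
    return [[math.log1p(v) for v in row] for row in matrix]


counts = [[1, 3, 0], [10, 0, 10]]
norm = normalize_total(counts, target_sum=100)
logged = log1p_transform(norm)
```

After normalisation both cells have the same total, so the log-transformed values are comparable across cells of different sequencing depth. This is also why recovering raw counts from log-normalised data, as in the last entry above, requires inferring the unknown per-cell size factor.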
Dimensionality reduction & graph
Perform principal component analysis (PCA) on the data stored in an AnnData object. |
|
Compute a neighborhood graph of observations [McInnes18]. |
|
Run UMAP on AnnData, choosing the implementation based on settings.mode. The arguments are documented in scanpy.tl.umap. |
|
Compute t-SNE coordinates for cells. |
|
Run MDE (Minimum Distortion Embedding) from a latent representation. |
Clustering
Run Leiden clustering. |
|
Run Louvain clustering on the precomputed kNN graph. |
Batch correction & scaling
Scale the input AnnData object. |
|
Regress out technical covariates from each gene. |
|
Scale the regressed layer and store it as a new analysis layer. |
|
Remove cell-cycle-correlated genes from |
|
Score cell cycle phases using predefined or custom gene sets. |
Single-cell (single)
Annotation
Automated cell-type annotation using SCSA marker-enrichment scoring. |
|
MetaTiME wrapper for tumor microenvironment cell-state annotation. |
|
Ensemble cell-type annotation manager with multiple backends. |
|
Annotate cluster cell types with a remote LLM service. |
|
Annotate cell types with a local instruction-tuned LLM. |
|
Cell ontology mapping class using NLP. |
|
Unified single-cell annotation manager for cell-type labeling. |
|
Reference-based label transfer helper for single-cell annotation. |
Trajectory & cell fate
Trajectory inference class for single-cell data analysis. |
|
RNA velocity analysis wrapper for directional cell-state transition inference. |
|
Adaptive ridge-regression framework for pseudotime-associated gene discovery. |
|
Predict developmental potency with CytoTRACE2. |
Cell structure
SEACells-based metacell construction workflow. |
|
Differential gene-expression testing wrapper for single-cell datasets. |
|
Calculate gene signature enrichment scores using AUCell algorithm. |
|
Calculate the AUCell score for a given gene set. |
|
Run CellPhoneDB statistical analysis with proper file handling. |
|
Predict drug sensitivity from single-cell transcriptomes using CaDRReS models. |
Batch correction & integration
Run MultiMAP to correct batch effect within a single AnnData object. |
|
SIMBA wrapper for single-cell batch integration and graph-embedding construction. |
|
Run MultiMAP to integrate a number of AnnData objects from various multi-omics experiments into a single joint dimensionally reduced space. |
Multi-omics
Train MOFA models for latent factor discovery across multiple omics layers. |
|
Load pretrained MOFA models for downstream factor interpretation. |
|
Pair RNA and ATAC cells using GLUE latent embeddings and neighbor matching. |
|
TOSICA wrapper for pathway-informed transformer-based cell-type annotation. |
Topic modelling
Consensus NMF workflow wrapper for robust gene-program discovery. |
Bulk RNA-seq (bulk)
Differential-expression analysis helper for bulk RNA-seq count tables. |
|
Gene Set Enrichment Analysis (GSEA) wrapper for ranked gene lists. |
|
Protein-protein interaction (PPI) analysis wrapper based on STRING. |
|
TCGA (The Cancer Genome Atlas) data analysis module. |
|
Bulk RNA-seq deconvolution class for inferring cell-type fractions from single-cell references. |
|
Map gene IDs in the input data to gene symbols using a reference table. |
|
Perform batch effect correction using ComBat algorithm. |
|
Perform pathway enrichment analysis using Enrichr-compatible gene-set libraries. |
Spatial transcriptomics (space)
Perform clustering analysis on spatial transcriptomics data using multiple methods. |
|
Spatial deconvolution pipeline that aligns scRNA-seq references with spatial transcriptomics. |
|
A class representing the PyTorch implementation of STAGATE (Spatial Transcriptomics Analysis using Graph Attention autoEncoder). |
|
STAligner for spatial transcriptomics data integration. |
|
SpaceFlow spatial flow analysis class. |
|
Tangram spatial deconvolution class for cell type mapping. |
|
Spatial Transition Tensor (STT) analysis class. |
|
GASTON spatial depth estimation and clustering. |
|
Construct spatial neighbor networks for spatial integration. |
|
Build a spatial neighborhood graph from coordinates stored in |
|
Compute Moran's I spatial autocorrelation for gene expression. |
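For readers unfamiliar with the statistic named in the last entry above, here is a minimal sketch of Moran's I in plain Python. It assumes a dense spatial weight matrix and a single gene's expression vector; the function name `morans_i` and its arguments are hypothetical and do not reflect how OmicVerse computes or stores the result.

```python
# Illustrative only: a dense, plain-Python Moran's I.
# `morans_i` is a hypothetical helper, not the OmicVerse API.

def morans_i(values, weights):
    """Moran's I for `values` given a dense spatial weight matrix.

    I = (N / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2
    where m is the mean of the values and W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    denominator = sum(d * d for d in dev)
    if w_total == 0 or denominator == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    numerator = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / w_total) * (numerator / denominator)


# Four spots on a line; each spot is a neighbour of the adjacent ones.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], chain)          # gradient: positive autocorrelation
alternating = morans_i([1, -1, 1, -1], chain)   # checkerboard: negative autocorrelation
```

Values near +1 indicate that neighbouring spots have similar expression, values near -1 indicate alternating patterns, and values near 0 indicate no spatial structure. In practice the weight matrix would come from a spatial neighbourhood graph and would be sparse.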
Bulk-to-Single (bulk2single)
Integrate bulk and single-cell information to infer transitional cell-state trajectories. |
|
VAE-based bulk-to-single framework for reconstructing pseudo single cells from bulk RNA-seq. |
|
Deep-learning mapper that projects single-cell profiles onto spatial coordinates. |
Foundation Models (fm)
Execute a foundation model task. |
|
List available single-cell foundation models. |
|
Get the global model registry singleton. |
|
Get detailed specification for a foundation model. |
|
Select the best foundation model for a task and dataset. |
|
Validate data compatibility with a model and suggest preprocessing. |
|
Profile an AnnData file to detect species, gene scheme, and modality. |
|
Generate QA metrics and visualizations for model results. |
|
Complete specification for a foundation model. |
|
Registry of available single-cell foundation models. |
Plotting (pl)
Embedding & dimensionality
Scatter plot for user specified embedding basis (e.g. umap, pca, etc). |
|
Plot an embedding colored by cell type with OmicVerse. |
|
Plot cluster-specific density on an existing embedding. |
|
Create embedding scatter plots for multi-modal data (MuData) or single-cell data. |
|
Render large-scale embeddings with Datashader. |
|
Plot PCA embedding. |
|
Plot UMAP embedding. |
|
Plot t-SNE embedding. |
Differential expression
Create a volcano plot for differential expression analysis. |
|
Create a dot plot heatmap showing marker gene expression using PyComplexHeatmap. |
|
Create a dot plot from rank_genes_groups results. |
|
Make a dot plot of the expression values of var_names. |
|
Dot plot of marker genes — clean drop-in for |
Cell proportion & composition
Plot cell proportion of each cell type in each visual cluster. |
|
Plot the cell type percentage in each groupby category. |
|
Create a Venn diagram to visualize set overlaps. |
|
Create a combined bar-and-dot summary plot by groups. |
Distribution
Enhanced violin plot compatible with omicverse's interface. |
|
Create a boxplot with jittered points to visualize data distribution across categories. |
|
Grouped boxplot visualization. |
Spatial
Scatter plot in spatial coordinates, aligned with |
|
Create spatial plot from Visium data with color gradient and interpolation. |
|
Mark a rectangular region on a spatial plot. |
Cell communication
Create a dot heatmap of CellPhoneDB interaction counts between cell types. |
|
Create a circular network plot of CellPhoneDB cell-cell interactions. |
|
Create a chord diagram visualization of CellPhoneDB interactions. |
|
Visualization helper for CellPhoneDB cell-cell communication outputs. |
Colours & palettes
Built-in mutable sequence. |
|
Built-in mutable sequence. |
|
Built-in mutable sequence. |
|
Forbidden City traditional-color palette utility. |
|
Optimized palette for plotting |
|
Returns a colormap palette. |
Datasets
Load PBMC 3k dataset from URL. |
|
Zebrafish dataset from Saunders et al. (2019). |
|
Pancreatic endocrinogenesis. |
|
The Dentate Gyrus dataset used in https://github.com/velocyto-team/velocyto-notebooks/blob/master/python/DentateGyrus.ipynb. |
|
Create a mock single-cell dataset for testing statistical functions. |
|
Built-in Python dictionary (dict). |