# Release Notes

## v 1.0.0
- First public release.

## v 1.1.7
### bulk module:
- Added Deseq2, including `pyDEseq` functions: `deseq2_normalize`, `estimateSizeFactors`, `estimateDispersions`, `Matrix_ID_mapping`.
- Included TCGA with `TCGA`.
- Introduced Enrichment with functions `geneset_enrichment`, `geneset_plot`.

### single module:
- Integrated scdrug with functions `autoResolution`, `writeGEP`, `Drug_Response`.
- Added cpdb with functions `cpdb_network_cal`, `cpdb_plot_network`, `cpdb_plot_interaction`, `cpdb_interaction_filtered`.
- Included scgsea with functions `geneset_aucell`, `pathway_aucell`, `pathway_aucell_enrichment`, `pathway_enrichment`, `pathway_enrichment_plot`.

## v 1.1.8
### single module:
- Addressed errors in cpdb, including import errors and color issues in `cpdb_plot_network`.
- Introduced `cpdb_submeans_exacted` in cpdb for easy sub-network extraction.

## v 1.1.9
### bulk2single module:
- Added the `bulk2single` module.
- Fixed model load error from bulk2space.
- Resolved early stop issues from bulk2space.
- Included more user-friendly input methods and visualizations.
- Added loss history visualization.

### utils module:
- Introduced `pyomic_palette` in the plot module.

## v 1.1.10
- Updated all code references.

### single module:
- Fixed invalid parameters in the `single.mofa.mofa_run` function.
- Added raw counts as a layer in the `single.scanpy_lazy` function.
- Introduced `utils.plot_boxplot` for plotting box plots with jittered points.
- Added `bulk.pyDEseq.plot_boxplot` for plotting box plots with jittered points for specific genes.

## v 1.2.0
### bulk module:
- Fixed invalid `cutoff` parameter in `bulk.geneset_enrichment`.
- Added modules: `pyPPI`, `pyGSEA`, `pyWGCNA`, `pyTCGA`, `pyDEG`.

### bulk2single module:
- Introduced `bulk2single.save` for manual model saving.

## v 1.2.1-4
### single module:
- Added `pySCSA` module with functions: `cell_anno`, `cell_anno_print`, `cell_auto_anno`, `get_model_tissue`.
- Implemented doublet cell filtering in `single.scanpy_lazy`.
- Added `single.scanpy_cellanno_from_dict` for easier annotation.
- Updated SCSA database from [CellMarker2.0](http://bio-bigdata.hrbmu.edu.cn/CellMarker/).
- Fixed errors in SCSA database keys: `Ensembl_HGNC` and `Ensembl_Mouse`.

## v 1.2.5
### single module:
- Added `pyVIA` module with functions: `run`, `plot_piechart_graph`, `plot_stream`, `plot_trajectory_gams`, `plot_lineage_probability`, `plot_gene_trend`, `plot_gene_trend_heatmap`, `plot_clustergraph`.
- Fixed warning error in `utils.pyomic_plot_set`.
- Updated requirements, including `pybind11`, `hnswlib`, `termcolor`, `pygam`, `pillow`, `gdown`.

## v 1.2.6
### single module:
- Added `pyVIA.get_piechart_dict` and `pyVIA.get_pseudotime`.

## v 1.2.7
### bulk2single module:
- Added `Single2Spatial` module with functions: `load`, `save`, `train`, `spot_assess`.
- Fixed installation errors for packages in pip.

## v 1.2.8
- Fixed pip installation errors.

### bulk2single module:
- Replaced `deep-forest` in `Single2Spatial` with `Neural Network` for classification tasks.
- Accelerated the entire Single2Spatial inference process using GPU and batch-level estimation by modifying the `predicted_size` setting.

## v 1.2.9
### bulk module:
- Fixed duplicates_index mapping in `Matrix_ID_mapping`.
- Resolved hub genes plot issues in `pyWGCNA.plot_sub_network`.
- Fixed backupgene in `pyGSEA.geneset_enrichment` to support rare species.
- Added matrix plot module in `pyWGCNA.plot_matrix`.

### single module:
- Added `rank_genes_groups` check in `pySCSA`.

### bulk2single module:
- Fixed import error of `deepforest`.

## v 1.2.10
- Renamed the package to `omicverse`.

### single module:
- Fixed argument error in `pySCSA`.

### bulk2single module:
- Updated plot arguments in `bulk2single`.

## v 1.2.11
### bulk module:
- Fixed `wilcoxon` method in `pyDEG.deg_analysis`.
- Added parameter setting for treatment and control group names in `pyDEG.plot_boxplot`.
- Fixed figure display issues in `pyWGCNA.plot_matrix`.
- Fixed the category correlation failure caused by one-hot encoding in `pyWGCNA.analysis_meta_correlation`.
- Fixed network display issues in `pyWGCNA.plot_sub_network` and updated `utils.plot_network` to avoid errors.

## v 1.3.0
### bulk module:
- Added `DEseq2` method to `pyDEG.deg_analysis`.
- Introduced `pyGSEA` module in `bulk`.
- Renamed the original `pyGSEA` to `pyGSE` in `bulk`.
- Added `get_gene_annotation` in `utils` for gene name transformation.

## v 1.3.1
### single module:
- Added `get_celltype_marker` method.

### single module:
- Added `GLUE_pair`, `pyMOFA`, `pyMOFAART` modules.
- Added tutorials for `Multi omics analysis by MOFA and GLUE`.
- Updated tutorial for `Multi omics analysis by MOFA`.

## v 1.4.0
### bulk2single module:
- Added `BulkTrajBlend` method.

### single module:
- Fixed errors in `scnocd` model.
- Added `save`, `load`, and `get_pair_dict` in `scnocd` model.

### utils module:
- Added `mde` method.
- Added `gz` format support for `utils.read`.

## v 1.4.1
### preprocess module:
- Added `pp` (preprocess) module with `qc` (quality control), `hvg` (highly variable features), `pca`.
- Added `data_files` for cell cycle calculation from [Cellula](https://github.com/andrecossa5/Cellula/) and [pegasus](https://github.com/lilab-bcb/pegasus/).

## v 1.4.3
### preprocess module:
- Fixed sparse preprocess error in `pp`.
- Fixed trajectory import error in `via`.
- Added gene correlation analysis of trajectory.

## v 1.4.4
### single module:
- Added `panglaodb` database to `pySCSA` module.
- Fixed errors in `pySCSA.cell_auto_anno` when some cell types are not found in clusters.
- Fixed errors in `pySCSA.cell_anno` when `rank_genes_groups` are not consistent with clusters.
- Added `pySIMBA` module in single for batch correction.

### preprocess module:
- Added `store_layers` and `retrieve_layers` in `ov.utils`.
- Added `plot_embedding_celltype` and `plot_cellproportion` in `ov.utils`.
## v 1.4.5
### single module:
- Added `MetaTiME` module to perform cell type annotation automatically in TME.

## v 1.4.12
- Updated `conda install omicverse -c conda-forge`.

### single module:
- Added `pyTOSICA` module to transfer cell types from reference scRNA-seq data using a Transformer model.
- Added `atac_concat_get_index`, `atac_concat_inner`, `atac_concat_outer` functions to merge/concatenate scATAC data.
- Fixed `MetaTime.predicted` when an Unknown cell type appears.

### preprocess module:
- Added `plot_embedding` in `ov.utils` to plot UMAP with a custom color dictionary.

## v 1.4.13
### bulk module:
- Added `mad_filtered` to filter robust genes when calculating the network in `ov.bulk.pyWGCNA` module.
- Fixed `string_interaction` in `ov.bulk.pyPPI` for string-db updates.

### preprocess module:
- Changed `mode` argument of `pp.preprocess` to control preprocessing steps.
- Added `ov.utils.embedding`, `ov.utils.neighbors`, and `ov.utils.stacking_vol`.

## v 1.4.14
### preprocess module:
- Added `batch_key` in `pp.preprocess` and `pp.qc`.

### utils module:
- Added `plot_ConvexHull` to visualize the boundary of clusters.
- Added `weighted_knn_trainer` and `weighted_knn_transfer` for multi-adata integration.

### single module:
- Fixed import errors in `mofa`.

## v 1.4.17
### bulk module:
- Fixed compatibility issues with `pydeseq2` version `0.4.0`.
- Added `bulk.batch_correction` for multi-bulk RNA-seq/microarray samples.

### single module:
- Added `single.batch_correction` for multi-single cell datasets.

### preprocess module:
- Added parameter `layers_add` in `pp.scale`.

## v 1.5.0
### single module:
- Added `cellfategenie` to calculate timing-associated genes/genesets.
- Fixed the name error in `atac_concat_outer`.
- Added more kwargs for `batch_correction`.

### utils module:
- Added `plot_heatmap` to visualize the heatmap of pseudotime.
- Fixed `embedding` when the version of `mpl` is larger than `3.7.0`.
- Added `geneset_wordcloud` to visualize geneset heatmaps of pseudotime.

## v 1.5.1
### single module:
- Added `scLTNN` to infer cell trajectory.

### bulk2single module:
- Updated cell fraction prediction with `TAPE` in bulk2single.
- Fixed group and normalization issues in bulk2single.

### utils module:
- Added `Ro/e` calculation (by: Haihao Zhang).
- Added `cal_paga` and `plot_paga` to visualize the state transfer matrix.
- Fixed the `read` function.

## v 1.5.2
### bulk2single Module:
- Resolved a matrix error occurring when gene symbols are not unique.
- Addressed the `interpolation` issue in `BulkTrajBlend` when target cells do not exist.
- Corrected the `generate` function in `BulkTrajBlend`.
- Rectified the argument for `vae_configure` in `BulkTrajBlend` when `cell_target_num` is set to None.
- Introduced the parameter `max_single_cells` for input in `BulkTrajBlend`.
- Defaulted to using `scaden` for deconvolution in Bulk RNA-seq.

### single Module:
- Fixed an error in `pyVIA` when the root is set to None.
- Added the `TrajInfer` module for inferring cell trajectories.
- Integrated `Palantir` and `Diffusion_map` into the `TrajInfer` module.
- Corrected the parameter error in `batch_correction`.

### utils Module:
- Introduced `plot_pca_variance_ratio` for visualizing the ratio of PCA variance.
- Added the `cluster` and `filtered` modules for clustering cells.
- Integrated `MiRA` to calculate LDA topics.

## v 1.5.3
### single Module:
- Added `scVI` and `MIRA` to remove batch effects.

### space Module:
- Added `STAGATE` to cluster and denoise spatial RNA-seq data.

### pp Module:
- Added `doublets` argument of `ov.pp.qc` to control doublet filtering (default: `True`).

## v 1.5.4
### bulk Module:
- Fixed an error in `pyDEG.deg_analysis` when `n_cpus` cannot be set in `pyDeseq2(v0.4.3)`.

### single Module:
- Fixed an argument error in `single.batch_correction` of combat.

### utils Module:
- Added `venn4` plot for visualization.
- Fixed the label visualization of `plot_network`.
- Added `ondisk` argument of `LDA_topic`.

### space Module:
- Added `Tangram` to map scRNA-seq to stRNA-seq.

## v 1.5.5
### pp Module:
- Added `max_cells_ratio` and `max_genes_ratio` to control the maximum thresholds in scRNA-seq QC.

### single Module:
- Added `SEACells` model to calculate metacells from scRNA-seq.

### space Module:
- Added `STAligner` to integrate multiple stRNA-seq datasets.

## v 1.5.6
### pp Module
- Added `mt_startswith` argument to control the `qc` in mouse or other species.
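
The `mt_startswith` argument matters because mitochondrial gene symbols conventionally start with `MT-` in human and `mt-` in mouse. The following is a minimal sketch of prefix-based flagging on a plain list of gene symbols. `flag_mito_genes` is a hypothetical helper, not part of OmicVerse, and the `MT-` default is an assumption made for illustration.

```python
def flag_mito_genes(gene_names, mt_startswith="MT-"):
    """Return one boolean per gene: True if the symbol starts with the prefix.

    The default prefix "MT-" is an assumption (human naming convention).
    """
    return [name.startswith(mt_startswith) for name in gene_names]


human_genes = ["MT-CO1", "ACTB", "MT-ND1"]
mouse_genes = ["mt-Co1", "Actb", "mt-Nd1"]

# The human prefix finds nothing in mouse data because matching is case-sensitive.
human_flags = flag_mito_genes(human_genes)
mouse_flags_wrong_prefix = flag_mito_genes(mouse_genes)
mouse_flags = flag_mito_genes(mouse_genes, mt_startswith="mt-")
```

With the wrong prefix, no mitochondrial genes are flagged and the mitochondrial fraction used in QC would silently be zero, which is the situation a species-specific prefix avoids.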

### utils Module
- Added `schist` method to cluster single-cell RNA-seq data.

### single Module
- Fixed the import error of `palantir` in SEACells.
- Added `CEFCON` model to identify the driver regulators of cell fate decisions.

### bulk2single Module
- Added `use_rep` and `neighbor_rep` arguments to configure the nocd.

### space Module
- Added `SpaceFlow` to identify the pseudo-spatial map.

## v 1.5.8
### pp Module
- Added `score_genes_cell_cycle` function to calculate the cell cycle.

### bulk Module
- Fixed `dds.plot_volcano` text plot error when the version of `adjustText` is larger than `0.9`.

### single Module
- Optimised `MetaCell.load` model loading logic.
- Fixed an error when loading the model using `MetaCell.load`.
- Added tutorials of `Metacells`.

### pl Module
Added `pl` as a unified drawing prefix for the next release, to replace the drawing functionality in the original utils, while retaining the drawing in the original utils.

- Added `embedding` to plot the embedding of scRNA-seq using `ov.pl.embedding`.
- Added `optim_palette` to provide a spatially constrained approach that generates discriminate color assignments for visualizing single-cell spatial data in various scenarios.
- Added `cellproportion` to plot cell proportions of scRNA-seq as a stacked bar chart.
- Added `embedding_celltype` to plot both the cell type proportion and the embedding.
- Added `ConvexHull` to plot the ConvexHull around the target cells.
- Added `embedding_adjust` to adjust the text of the cell type legend in the embedding.
- Added `embedding_density` to plot the category density in the cells.
- Added `bardotplot` to plot the bardotplot between different groups.
- Added `add_palue` to plot the p-threshold between different groups.
- Added `embedding_multi` to support the `mudata` object.
- Added `purple_color` to visualize the purple palette.
- Added `venn` to plot Venn diagrams for 2 to 4 sets.
- Added `boxplot` to visualize the boxdotplot.
- Added `volcano` to visualize the result of differentially expressed genes.

## v 1.5.9
### single Module
- Added `slingshot` in `single.TrajInfer`.
- Fixed some errors in `scLTNN`.
- Added `GPU` mode to preprocess the data.
- Added `cNMF` to calculate the NMF.

### space Module
- Added `Spatrio` to map scRNA-seq to stRNA-seq.

## v 1.6.0
Moved `CEFCON`,`GNTD`,`mofapy2`,`spaceflow`,`spatrio`,`STAligner`,`tosica` from root to the external module.

### space Module
- Added `STT` in `omicverse.space` to calculate the spatial transition tensor.
- Added `scSLAT` in `omicverse.external` to align different spatial slices.
- Added `PROST` in `omicverse.external` and `svg` in `omicverse.space` to identify spatially variable genes and domains.

### single Module
- Added `get_results_rfc` in `omicverse.single.cNMF` to predict the precise cluster in complex scRNA-seq/stRNA-seq.
- Added `get_results_rfc` in `omicverse.utils.LDA_topic` to predict the precise cluster in complex scRNA-seq/stRNA-seq.
- Added `gptcelltype` in `omicverse.single` to annotate cell types using a large language model #82.

### pl Module
- Added `plot_spatial` in `omicverse.pl` to visualize the cell proportions of spots after deconvolution.

## v 1.6.2
Added support for the native Windows platform.

- Added `mde` in `omicverse.pp` to accelerate the UMAP calculation.

## v 1.6.3
- Added `ov.setting.cpu_init` to change the environment to CPU.
- Moved modules `tape`,`SEACells` and `palantir` to `external`.

### Single Module
- Added `CytoTrace2` to predict cellular potency categories and absolute developmental potential from single-cell RNA-sequencing data.
- Added `cpdb_exact_target` and `cpdb_exact_source` to extract the means of specific ligands/receptors.
- Added `gptcelltype_local` to identify cell types using a local LLM #96 #99

### Bulk Module
- Added `MaxBaseMean` columns in dds.result to help people ignore the empty samples.

### Space Module
- Added `**kwargs` in `STT.compute_pathway`.
- Added `GraphST` to identify the spatial domain.

### pl Module
- Added `cpdb_network`, `cpdb_chord`, `cpdb_heatmap`, `cpdb_interacting_network`,`cpdb_interacting_heatmap` and `cpdb_group_heatmap` to visualize the result of CellPhoneDB.

### utils Module
- Added `mclust_py` to identify the Gaussian Mixture cluster.
- Added `mclust` method in `cluster` function.

## v 1.6.4
### Bulk Module
- Optimised pyGSEA's `geneset_plot` visualisation of coordinate effects.
- Fixed an error of `pyTCGA.survival_analysis` when the matrix is sparse. #62, #68, #95
- Added tqdm to visualize the progress of `pyTCGA.survial_analysis_all`.
- Fixed an error in `data_drop_duplicates_index` so that removing duplicate indexes retains only the highest expressed genes #45
- Added `geneset_plot_multi` in `ov.bulk` to visualize multiple enrichment results. #103

### Single Module
- Added `mellon_density` to calculate the cell density. #103

### PP Module
- Fixed an error of `ov.pp.pca` when the number of PCs is smaller than 13. #102
- Added `COMPOSITE` as a method in `ov.pp.qc` to predict doublet cells. #103
- Added `species` argument in `score_genes_cell_cycle` to calculate the cell phase without manual gene input.

## v 1.6.6
### Pl Module
- Fixed the 'celltyep_key' error of `ov.pl.cpdb_group_heatmap` #109
- Fixed an error in `ov.utils.roe` when some expected frequencies are less than the expected value.
- Added `cellstackarea` to visualize the percent stacked area chart of cell types in samples.

### Single Module
- Fixed the bug of `ov.single.cytotrace2` when adata.X is not sparse data. #115, #116
- Fixed the groupby error in `ov.single.get_obs_value` of SEACells.
- Fixed the error of cNMF #107, #85
- Fixed the plot error when `Pycomplexheatmap` version > 1.7 #136

### Bulk Module
- Fixed a key error in `ov.bulk.Matrix_ID_mapping`.
- Added `enrichment_multi_concat` in `ov.bulk` to concatenate enrichment results.
- Fixed the pandas version error in gseapy #137

### Bulk2Single Module
- Added `adata.var_names_make_unique()` to avoid a matrix shape error if genes are not unique. #100

### Space Module
- Fixed an error in `construct_landscape` of `ov.space.STT`.
- Fixed an error of `get_image_idx_1D` in `ov.space.svg` #117
- Added `COMMOT` to calculate the cell-cell interaction of spatial RNA-seq.
- Added `starfysh` to deconvolute spatial transcriptomics without scRNA-seq (#108)

### PP Module
- Fixed the constraint error of `ov.pp.mde` #129
- Fixed type error of `float128` #134

## v 1.6.7
### Space Module
- Added `n_jobs` argument to set the number of threads in `extenel.STT.pl.plot_tensor_single`.
- Fixed an error in `extenel.STT.tl.construct_landscape`.
- Updated the tutorial of `COMMOT` and `Flowsig`.

### Pl Module
- Added `legend_awargs` to adjust the legend settings in `pl.cellstackarea` and `pl.cellproportion`.

### Single Module
- Fixed the error of `get_results` and `get_results_rfc` in `cNMF` module. (#143) (#139)
- Added `sccaf` to obtain the best clusters.
- Fixed the `.str` error in cytotrace2 (#146)

### Bulk Module
- Fixed the import error of `gseapy` in `bulk.geneset_enrichment`.
- Optimized code logic for offline enrichment analysis, added background parameter.
- Added the `pyWGCNA` package to replace the original calculation in pyWGCNA (#162)

### Bulk2Single Module
- Removed `_stat_axis` in `bulk2single_data_prepare` and used `index` instead (#160).

### PP Module
- Fixed a return bug in `pp.regress_and_scale` (#156)
- Fixed a scanpy version error when using `ov.pp.pca` (#154)

## v 1.6.8
### Bulk Module
- Fixed the error of log_init in gsea_obj.enrichment (#184)
- Added an `ax` argument for `geneset_plot` visualization.

### Space Module
- Added CAST to integrate multiple slices.
- Added `crop_space_visium` in `omicverse.tl` to crop a sub-area of spatial data.

### Pl Module
- Added `legend` argument for `cpdb_heatmap` visualization.
- Added `text_show` argument for `cellstackarea` visualization.
- Added `ForbiddenCity` color system.

## v 1.6.9
### PP Module
- Added `recover_counts` to recover `counts` after `ov.pp.preprocess`.
- Removed the lognorm layers added in `ov.pp.pca`.

### Single Module
- Added `MultiMap` module to integrate multiple species.
- Added `CellVote` to vote the best cells.
- Added `CellANOVA` to integrate samples and correct the batch effect.
- Added `StaVia` to calculate the pseudotime and infer trajectory.

### Space Module
- Added `ov.space.cluster` to identify the spatial domain.
- Added `Binary` for spatial clustering.
- Added `Spateo` to calculate the SVG.

## v 1.7.0
Added `cpu-gpu-mixed` to accelerate the analysis of scRNA-seq using GPU.

Changed the logo presentation of OmicVerse to `ov.plot_set`.

### Bulk Module
- Added `limma`, `edgeR` for differential gene expression analysis. (#238)
- Fixed the version error of `DEseq2` analysis.

### Single Module
- Added `lazy` function to run all functions of scRNA-seq analysis (#291)
- Added `generate_scRNA_report` and `generate_reference_table` to generate the report and reference (#291) (#292)
- Fixed `geneset_prepare` not being able to read GMT files not split by `\t\t` (#235) (#238)
- Added `geneset_aucell_tmp`,`pathway_aucell_tmp`,`pathway_aucell_enrichment_tmp` to test the chunk_size (#238)
- Added data enhancement of `Fate`.
- Added `plot_atlas_view_ov` in VIA.
- Fixed an error when the matrix is too large in `recover_counts`.
- Added `forceatlas2` to calculate the `X_force_directed`.
- Added `milo` and `scCODA` to analyze differential cell type abundance.
- Added `memento` to analyze differential gene expression.

### Space Module
- Added `GASTON` to learn a topographic map of a tissue slice from spatially resolved transcriptomics (SRT) data (#238)
- Added super kwargs in `plot_tensor_single` of STT.
- Updated `COMMOT` with GPU acceleration.

### Plot Module
- Added `dotplot_doublegroup` to visualize the genes in doublegroup.
- Added `transpose` argument of `cpdb_interacting_heatmap` to transpose the figure.
- Added `calculate_gene_density` to plot the gene's density.

## v 1.7.1
### Single Module
- Fixed some errors in `ov.single.lazy`.
- Fixed the format of `ov.single.generate_scRNA_report`.
- Updated some functions of `palantir`.
- Added `CellOntologyMapper` to map cell names.

## v 1.7.2
### Pl Module
- Optimized the plot effect of `ov.pl.box_plot`.
- Optimized the plot effect of `ov.pl.volcano`.
- Optimized the plot effect of `ov.pl.violin`.
- Added a more polished dotplot than scanpy's (#318)
- Added visualization functions similar to those of CellChat. (#313)

### Space Module
- Added 3D cell-cell interaction analysis in `COMMOT` (#315)

### Single Module
- Fixed the error of pathway_enrichment. (#184)
- Added SCENIC module with GPU acceleration. (#331)

### utils Module
- Added scICE to calculate the best cluster (#329)

## v 1.7.6
### LLM Module
- Added `Geneformer`, `scGPT`, `scFoundation`, `UCE`, `CellPLM` to call directly in OmicVerse.

### Pl Module
- Optimized the visualization effect of embedding.
- Added `ov.pl.umap`, `ov.pl.pca`, `ov.pl.mde`, and `ov.pl.tsne`.

## v 1.7.8
Implemented lazy loading system that reduces `import omicverse` time by **40%** (from ~7.8s to ~4.7s).

Added GPU-accelerated PCA support for Apple Silicon (MLX) and CUDA (TorchDR) devices.

Introduced Smart Agent System with natural language processing for 50+ AI models from 8 providers.

Added and fixed `anndata-rs` support for million-scale datasets (#336)

### PP Module
- Added GPU-accelerated PCA in `ov.pp.pca()` with MLX support for Apple Silicon MPS devices
- Added TorchDR-based PCA acceleration in `ov.pp.pca()` for NVIDIA CUDA devices
- Added smart device detection and automatic backend selection in `init_pca()` and `pca()` functions
- Added graceful fallback to CPU implementation when GPU acceleration fails
- Added enhanced verbose output with device selection information and emoji indicators
- Added optimal component determination based on variance contribution thresholds in `init_pca()`
- Added GPU-accelerated SUDE dimensionality reduction in `ov.pp.sude()` with MLX/CUDA support
- Optimized `ov.pp.qc` and added ribosomal and hemoglobin (hb) gene metrics to give more information about data quality.

### Datasets Module
- Complete elimination of scanpy dependencies for faster loading
- Added dynamo-style dataset framework with comprehensive collection
- Added robust download system with progress tracking and caching
- Added enhanced mock data generation with realistic structure
- Added support for h5ad, loom, xlsx, and compressed formats

### Agent Module
- Added multi-provider LLM support (OpenAI, Anthropic, Google, DeepSeek, Qwen, Moonshot, Grok, Zhipu AI)
- Added natural language processing for both English and Chinese
- Added code generation architecture with local execution
- Added function registry system with multi-language aliases
- Added smart API key management and provider-specific configuration

### Bulk Module
- Added `BayesPrime` and `Scaden` to deconvolve cell type proportions from bulk RNA-seq.
- Added `alignment` to align FASTQ files into counts.

### Single Module
- Added `ov.single.Annotation` and `ov.single.AnnotationRef` to annotate the cell type automatically.
- Added `ov.alignment.single` to align scRNA-seq reads into counts directly.
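
The lazy loading introduced in this release defers imports until an attribute is first used. These notes do not show the implementation, so the following is only a minimal sketch of the general technique using a module-level `__getattr__` and `__dir__` (PEP 562), not OmicVerse's actual code. `make_lazy_package`, `demo_pkg`, and the alias mapping are hypothetical names chosen for illustration, and standard-library modules stand in for heavy submodules.

```python
import importlib
import types


def make_lazy_package(name, submodules):
    """Build a module object that imports the listed targets on first access.

    `submodules` maps an attribute name to the dotted path of the module
    that should be imported when that attribute is first requested.
    """
    pkg = types.ModuleType(name)
    loaded = []  # records which attributes triggered a real import

    def __getattr__(attr):
        # Only called when normal attribute lookup on the module fails.
        if attr not in submodules:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module = importlib.import_module(submodules[attr])
        setattr(pkg, attr, module)  # cache so later lookups skip __getattr__
        loaded.append(attr)
        return module

    def __dir__():
        # Advertise lazy attributes so tab completion still lists them.
        return sorted(set(pkg.__dict__) | set(submodules))

    pkg.__getattr__ = __getattr__
    pkg.__dir__ = __dir__
    pkg._loaded = loaded
    return pkg


# Hypothetical package with two lazily imported attributes.
demo = make_lazy_package("demo_pkg", {"js": "json", "stats": "statistics"})
```

Nothing is imported when `demo` is created. The first `demo.js` access triggers the import and caches the result on the module, so repeated access does not go through `__getattr__` again.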

## v 1.7.9
Implemented **smart lazy loading system** that dramatically reduces `import omicverse` time by **85.6x** (from ~16.57s to ~0.19s). Enhanced RNA-seq alignment workflow with comprehensive toolkit for FASTQ processing and counting. Optimized dataset management with nested directory creation for better organization.

### Performance Optimization
**Lazy Loading System**:
- Implemented module-level lazy loading using `__getattr__` mechanism for all major modules
- Added attribute-level lazy loading for frequently-used functions (read, palette, Agent, etc.)
- Introduced intelligent caching system to ensure instant access after first load
- Reduced initial import time from **16.57 seconds to 0.19 seconds** (85.6x speedup)
- Maintained full backward compatibility - all existing code works without modification
- Preserved complete IDE support with tab completion via `__dir__()` implementation
- Fixed circular import issues by delaying settings module initialization
- **MkDocs API documentation generation fully compatible** with lazy loading

**Benefits for Users**:
- ⚡ Instant startup for Jupyter notebooks and scripts
- 🎯 Load only what you use - modules imported on first access
- 💾 Reduced memory footprint for simple tasks
- 🔄 Second access is cached and instant (< 0.001s)

### Alignment Module
**New Comprehensive RNA-seq Alignment Toolkit**: Added complete end-to-end workflow for processing raw sequencing data:
- **`ov.alignment.prefetch`**: Download SRA datasets from NCBI with built-in retry logic
- **`ov.alignment.fqdump`**: Convert SRA to FASTQ format with parallel processing support
- **`ov.alignment.parallel_fastq_dump`**: High-performance parallel FASTQ extraction
- **`ov.alignment.fastp`**: Quality control and adapter trimming for FASTQ files
- **`ov.alignment.STAR`**: RNA-seq alignment using STAR aligner with customizable parameters
- **`ov.alignment.featureCount`**: Gene-level read counting (renamed from `count` to avoid conflicts)
- **`ov.alignment.single`**: One-command scRNA-seq alignment with kb-python (kallisto|bustools)
- **`ov.alignment.ref`**: Build kallisto|bustools reference index for alignment
- **`ov.alignment.count`**: Quantify gene expression from aligned reads

**Key Features**:
- Unified API for both bulk RNA-seq (STAR + featureCount) and scRNA-seq (kb-python) workflows
- Built-in support for RNA velocity analysis with kb-python
- Parallel processing capabilities for faster data conversion
- Automatic handling of paired-end and single-end reads
- Technology-specific filtering for bulk vs single-cell data
- Integration with SRA toolkit for seamless data download

**Example Workflow**:
```python
import omicverse as ov

# Download and process bulk RNA-seq
ov.alignment.prefetch('SRR1234567', output_dir='./data')
ov.alignment.fqdump('SRR1234567', output_dir='./fastq')
ov.alignment.fastp('sample_1.fastq.gz', 'sample_2.fastq.gz', output_prefix='clean')
ov.alignment.STAR(fastq1='clean_1.fastq.gz', fastq2='clean_2.fastq.gz',
                  genome_dir='./genome', output_prefix='aligned')
ov.alignment.featureCount(bam='aligned.bam', annotation='genes.gtf', output='counts.txt')

# Or use one-command scRNA-seq alignment
ov.alignment.single(
    fastq=['read1.fastq.gz', 'read2.fastq.gz'],
    index='./kb_index',
    output_dir='./kb_output',
    technology='10xv3'
)
```

### PP Module
- Fixed an HVG (Highly Variable Genes) selection issue in `ov.pp.preprocess`
- Improved preprocessing pipeline stability and accuracy
- Refactored PCA implementation to utilize `torch_pca` for GPU acceleration (replacing TorchDR)
- Enhanced support for sparse matrices in PCA computation
- Updated PCA embedding basis from `X_pca` to `PCA` for clarity and consistency
- Improved error handling with try-except blocks in PCA computation
- Fixed PCA GPU mode support with sparse matrices to avoid memory errors

### Single Module
- Added `CONCORD` method to `ov.single.batch_correction` for single-cell data integration
- Enhanced batch correction capabilities with state-of-the-art algorithm
- **Fixed critical performance issue in pySCENIC**: Reverted inefficient correlation calculation optimization that caused memory issues and slowdowns in scRNA-seq data
- Removed misleading warnings about dropout genes in SCENIC correlation calculations
- Restored memory-efficient pairwise correlation computation (prevents OOM with >20k genes)
- SCENIC now uses original approach: calculate correlations only for specific TF-target pairs instead of creating full gene×gene matrices
- Added `ov.single.find_markers` for unified marker gene identification supporting five methods: `cosg`, `t-test`, `t-test_overestim_var`, `wilcoxon`, and `logreg`; statistical methods are natively ported from scanpy with no scanpy runtime dependency and numerically consistent results (rtol=1e-4)
- Added `ov.single.get_markers` to extract top marker genes from results as a `DataFrame` or `dict`, with support for single/multiple cluster filtering and optional filtering by `min_logfoldchange`, `min_score`, and `min_pval_adj`; output includes `pct_group` and `pct_rest` columns showing cell expression proportions within and outside each cluster

### Space Module
- Added `FlashDeconv` for fast, GPU-free deconvolution in Visium spatial transcriptomics
- Added `Banksy` clustering method for spatial domain identification
- Updated spatial analysis documentation with new clustering approaches

### Web Module
- Launched `Omicverse-Notebook` for browser-based interactive analysis without local installation
- Launched `Omicverse-Web` for web-based data analysis without coding requirements
- Democratized bioinformatics analysis for researchers without programming background

### Agent Module
- Enhanced `ov.Agent` with improved natural language processing for data analysis
- Expanded LLM provider support and model selection
- Optimized code generation and execution pipeline

### Pl Module
- Enhanced categorical legend handling for scatterplot embeddings
- Added `legend_loc='on data'` option for direct annotation on plots
- Improved visualization clarity for complex datasets
- Added `ov.pl.markers_dotplot` as a cleaner drop-in for `rank_genes_groups_dotplot` with improved defaults (`standard_scale='var'`, `cmap='Spectral_r'`, `dendrogram=False`)
- Fixed `KeyError` in `rank_genes_groups_df` when cluster names are numeric strings (e.g., leiden `'0'`, `'1'`); now correctly handles structured arrays, DataFrames, and plain 2D arrays from all marker methods

### Datasets Module
- Added comprehensive dataset URLs for easier data access
- Expanded data downloading utilities with progress tracking
- **Fixed dataset download to create nested target directories automatically**
- Improved dataset utilities with better error handling
- Refreshed download behaviors for more reliable data fetching

### Docs
- Strengthened data handling documentation in dotplot and DEG analysis tutorials
- Updated the scTour clustering tutorial with latest best practices
- Added comprehensive release notes for v1.7.9
- Enhanced alignment module documentation with end-to-end workflows

### Bug Fixes
- Resolved circular import issues between `_settings` and `utils` modules
- Fixed compatibility issues with latest package versions (zarr, pandas, etc.)
- Improved error handling in parallel processing functions

### Single Module
**Enhanced DEG Analysis with Expression Percentages**: Added cell expression percentage information to differential expression results
- Added `pct_ctrl` column showing percentage of cells expressing each gene in control group (0-100%)
- Added `pct_test` column showing percentage of cells expressing each gene in test group (0-100%)
- Added `pct_diff` column showing the difference in expression percentage (pct_test - pct_ctrl)
- Works with all DEG methods: `wilcoxon`, `t-test`, and `memento-de`
- Enables better marker gene identification by filtering genes based on expression prevalence
- Similar to dotplot circle size information, helps identify genes with widespread vs. sparse expression patterns

**Example Usage**:
```python
import omicverse as ov

# `adata` is an existing AnnData object with a 'condition' column in .obs
deg_obj = ov.single.DEG(adata, condition='condition', ctrl_group='Control', test_group='Treatment')
deg_obj.run(celltype_key='cell_label', celltype_group=['T_cells'])
results = deg_obj.get_results()
# Now includes pct_ctrl, pct_test, pct_diff columns
```

### Compatibility
**NumPy 2.0 Compatibility**: Fixed all NPY201 compatibility issues to ensure seamless support for both NumPy 1.x and 2.x

**Fixed Issues (31 total)**:

1. **`np.in1d` → `np.isin`** (9 instances)
   - `omicverse/bulk/_dynamicTree.py`: 3 instances (lines 697, 741)
   - `omicverse/single/_cosg.py`: 1 instance (line 77)
   - `omicverse/external/GNTD/_preprocessing.py`: 2 instances
   - `omicverse/external/scdiffusion/guided_diffusion/cell_datasets_WOT.py`: 1 instance
   - Other external modules: 2 instances
2. **`np.row_stack` → `np.vstack`** (13 instances)
   - `omicverse/external/CAST/CAST_Projection.py`: 2 instances
   - `omicverse/external/CAST/visualize.py`: 2 instances
   - `omicverse/external/scSLAT/viz/multi_dataset.py`: multiple instances
   - `omicverse/single/_mdic3.py`: 1 instance
3. **`np.product` → `np.prod`** (4 instances)
   - `omicverse/external/umap_pytorch/model.py`: 2 instances
   - `omicverse/external/umap_pytorch/modules.py`: 2 instances
4. **`np.trapz` compatibility wrapper** (2 instances)
   - Added compatibility wrapper in:
     - `omicverse/external/VIA/plotting_via.py`
     - `omicverse/external/VIA/plotting_via_ov.py`
   - Uses `numpy.trapezoid` (NumPy 2.0+) with fallback to `numpy.trapz` (NumPy 1.x)

**Backward Compatibility**:
- ✅ All changes maintain full backward compatibility with NumPy 1.x (1.13+)
- ✅ `np.isin` available since NumPy 1.13
- ✅ `np.vstack` available in all NumPy versions
- ✅ `np.prod` available in all NumPy versions
- ✅ Custom compatibility wrapper handles `trapz`/`trapezoid` transition

## v 1.7.10
### Scope
- This release note summarizes changes from commit `cd3d151` (version set to `1.7.10rc1`) to current `HEAD`.
- Total code delta in this window: `252 files changed`, `+46,992 / -9,752`.

### Agent & Runtime
- Upgraded `ov.Agent` architecture to modern agentic tool-calling workflows with subagent delegation (v4/v5 evolution).
- Improved GPT-5.2 robustness, response parsing, and backend error handling.
- Added harness runtime components for execution contracts, tool catalog, runtime state, tracing, and cleanup policies.
- Strengthened sandbox behavior with restricted import controls for internal modules.
- Added web bridge and session-level execution improvements for agent workflows.

### New Modules
- Added `omicverse.biocontext` for biomedical knowledge queries via BioContext MCP tooling.
- Added `omicverse.fm` (foundation-model adapters, routing, registry, and API).
- Added structured `omicverse.io` namespaces for general/single/bulk/spatial I/O paths.
- Added `omicverse.jarvis` multi-channel bot framework (Feishu/QQ/Telegram) with bridge support.

### Core OmicVerse Improvements
- Continued enhancements across `pp`, `pl`, `single`, `space`, and `utils` modules.
- Fixed circular import between preprocessing utility internals (`_utils.py` and `_scale.py` path).
- Added/updated function-level metadata and documentation quality in key analysis modules (preprocessing, annotation, trajectory, spatial, datasets, bulk).
- Extended dataset utilities with new signature resources and improved loading pathways.

### Registry & Help System
- Improved registry behavior and module import exposure in package entrypoints.
- Enhanced function/class registration metadata coverage for agent discoverability.
- Registry help generation now better aligns with class constructor documentation in class-based tools.

### Web & UI
- Single-cell analysis UI received iterative upgrades:
  - Better code cell management and undo behavior
  - Improved AnnData slot detail retrieval and display
  - Better DataFrame rendering and integration
  - Plot density/point style control refinements
  - i18n and UX polish for analysis panels
- `omicverse_web` service layer expanded with session-oriented agent service support.

### Developer Experience & Testing
- Added FM test suite and multiple harness/ovagent test modules.
- Removed obsolete legacy-priority and complexity-classifier test paths.
- Added workflow and harness documentation pages for runtime contracts and operational guidance.

### Documentation
- Updated and expanded agent architecture and streaming API docs.
- Updated `t_preprocess_cpu.ipynb` to match latest GPU/version detection behavior.
- Added bilingual and deployment-oriented guidance for Jarvis and agent-related workflows.