{ "cells": [ { "cell_type": "markdown", "id": "a79fd05b", "metadata": { "tags": [] }, "source": [ "# ov.fm — Unified Foundation Model API\n", "\n", "The `ov.fm` module provides a **unified, model-agnostic API** for working with 22 single-cell foundation models. Instead of learning each model's unique interface, you can use the same 6-step workflow for any model:\n", "\n", "1. **Discover** — Browse available models and their capabilities\n", "2. **Profile** — Automatically analyze your dataset\n", "3. **Select** — Let the system recommend the best model\n", "4. **Validate** — Check data-model compatibility before running\n", "5. **Run** — Execute inference with a single function call\n", "6. **Interpret** — Generate QA metrics and visualizations\n", "\n", "**Supported models include:** scGPT, Geneformer, UCE, scFoundation, CellPLM, scBERT, GeneCompass, Nicheformer, scMulan, and 13 more.\n", "\n", "**Cite:** Zeng, Z. et al. (2024). OmicVerse: a framework for bridging and deepening insights across bulk and single-cell sequencing. *Nature Communications*, 15(1), 5983." ] }, { "cell_type": "code", "execution_count": 1, "id": "0213f2f7", "metadata": { "tags": [] }, "outputs": [], "source": [ "import omicverse as ov\n", "import scanpy as sc\n", "import numpy as np\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "sc.settings.set_figure_params(dpi=80, facecolor='white')\n", "sc.settings.figdir = './figures/'" ] }, { "cell_type": "markdown", "id": "012c41cc", "metadata": { "tags": [] }, "source": [ "## Step 1: Discover Available Models\n", "\n", "Use `ov.fm.list_models()` to browse all registered foundation models. You can filter by task type to find models that support your specific analysis." ] }, { "cell_type": "code", "execution_count": 2, "id": "78eeabdd", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total models available: 22\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namestatustasksspecieszero_shotgpu_requiredmin_vram_gb
0scgptready[embed, integrate][human, mouse]TrueTrue8
1geneformerready[embed, integrate][human]TrueFalse4
2uceready[embed, integrate][human, mouse, zebrafish, mouse_lemur, macaque...TrueTrue16
3scfoundationready[embed, integrate][human]TrueTrue16
4scbertpartial[embed, integrate][human]TrueTrue8
5genecompasspartial[embed, integrate][human, mouse]TrueTrue16
6cellplmready[embed, integrate][human]TrueTrue8
7nicheformerpartial[embed, integrate, spatial][human, mouse]TrueTrue16
8scmulanpartial[embed, integrate][human]TrueTrue16
9tgptpartial[embed, integrate][human]TrueTrue16
10cellfmpartial[embed, integrate][human]TrueTrue16
11sccellopartial[embed, integrate, annotate][human]TrueTrue16
12scprintpartial[embed, integrate][human]TrueTrue16
13aidocellpartial[embed, integrate][human]TrueTrue16
14pulsarpartial[embed, integrate][human]TrueTrue16
15atacformerpartial[embed, integrate][human]TrueTrue16
16scplantllmpartial[embed, integrate][plant]TrueTrue16
17langcellpartial[embed, integrate][human]TrueTrue16
18cell2sentencepartial[embed][human]FalseTrue16
19geneptpartial[embed][human]TrueFalse0
20chatcellpartial[embed, annotate][human]TrueTrue16
21tabulapartial[embed, annotate, integrate, perturb][human]TrueTrue8
\n", "
" ], "text/plain": [ " name status tasks \\\n", "0 scgpt ready [embed, integrate] \n", "1 geneformer ready [embed, integrate] \n", "2 uce ready [embed, integrate] \n", "3 scfoundation ready [embed, integrate] \n", "4 scbert partial [embed, integrate] \n", "5 genecompass partial [embed, integrate] \n", "6 cellplm ready [embed, integrate] \n", "7 nicheformer partial [embed, integrate, spatial] \n", "8 scmulan partial [embed, integrate] \n", "9 tgpt partial [embed, integrate] \n", "10 cellfm partial [embed, integrate] \n", "11 sccello partial [embed, integrate, annotate] \n", "12 scprint partial [embed, integrate] \n", "13 aidocell partial [embed, integrate] \n", "14 pulsar partial [embed, integrate] \n", "15 atacformer partial [embed, integrate] \n", "16 scplantllm partial [embed, integrate] \n", "17 langcell partial [embed, integrate] \n", "18 cell2sentence partial [embed] \n", "19 genept partial [embed] \n", "20 chatcell partial [embed, annotate] \n", "21 tabula partial [embed, annotate, integrate, perturb] \n", "\n", " species zero_shot \\\n", "0 [human, mouse] True \n", "1 [human] True \n", "2 [human, mouse, zebrafish, mouse_lemur, macaque... 
True \n", "3 [human] True \n", "4 [human] True \n", "5 [human, mouse] True \n", "6 [human] True \n", "7 [human, mouse] True \n", "8 [human] True \n", "9 [human] True \n", "10 [human] True \n", "11 [human] True \n", "12 [human] True \n", "13 [human] True \n", "14 [human] True \n", "15 [human] True \n", "16 [plant] True \n", "17 [human] True \n", "18 [human] False \n", "19 [human] True \n", "20 [human] True \n", "21 [human] True \n", "\n", " gpu_required min_vram_gb \n", "0 True 8 \n", "1 False 4 \n", "2 True 16 \n", "3 True 16 \n", "4 True 8 \n", "5 True 16 \n", "6 True 8 \n", "7 True 16 \n", "8 True 16 \n", "9 True 16 \n", "10 True 16 \n", "11 True 16 \n", "12 True 16 \n", "13 True 16 \n", "14 True 16 \n", "15 True 16 \n", "16 True 16 \n", "17 True 16 \n", "18 True 16 \n", "19 False 0 \n", "20 True 16 \n", "21 True 8 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# List all available models\n", "all_models = ov.fm.list_models()\n", "print(f\"Total models available: {all_models['count']}\")\n", "\n", "# Display as a table\n", "import pandas as pd\n", "df = pd.DataFrame(all_models['models'])\n", "df[['name', 'status', 'tasks', 'species', 'zero_shot', 'gpu_required', 'min_vram_gb']]" ] }, { "cell_type": "code", "execution_count": 3, "id": "313117f7", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Models supporting embedding: 22\n", " - scgpt species=['human', 'mouse'] zero_shot=True\n", " - geneformer species=['human'] zero_shot=True\n", " - uce species=['human', 'mouse', 'zebrafish', 'mouse_lemur', 'macaque', 'frog', 'pig'] zero_shot=True\n", " - scfoundation species=['human'] zero_shot=True\n", " - scbert species=['human'] zero_shot=True\n", " - genecompass species=['human', 'mouse'] zero_shot=True\n", " - cellplm species=['human'] zero_shot=True\n", " - nicheformer species=['human', 'mouse'] zero_shot=True\n", " - scmulan species=['human'] zero_shot=True\n", " - tgpt 
species=['human'] zero_shot=True\n", " - cellfm species=['human'] zero_shot=True\n", " - sccello species=['human'] zero_shot=True\n", " - scprint species=['human'] zero_shot=True\n", " - aidocell species=['human'] zero_shot=True\n", " - pulsar species=['human'] zero_shot=True\n", " - atacformer species=['human'] zero_shot=True\n", " - scplantllm species=['plant'] zero_shot=True\n", " - langcell species=['human'] zero_shot=True\n", " - cell2sentence species=['human'] zero_shot=False\n", " - genept species=['human'] zero_shot=True\n", " - chatcell species=['human'] zero_shot=True\n", " - tabula species=['human'] zero_shot=True\n" ] } ], "source": [ "# Filter by task: only models that support embedding\n", "embed_models = ov.fm.list_models(task=\"embed\")\n", "print(f\"Models supporting embedding: {embed_models['count']}\")\n", "for m in embed_models['models']:\n", " print(f\" - {m['name']:15s} species={m['species']} zero_shot={m['zero_shot']}\")" ] }, { "cell_type": "markdown", "id": "e57706d3", "metadata": { "tags": [] }, "source": [ "### Get detailed model information\n", "\n", "Use `ov.fm.describe_model()` to get full specifications for any model, including input/output contracts, hardware requirements, and documentation links." ] }, { "cell_type": "code", "execution_count": 4, "id": "0676b514", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== Model Info ===\n", "Name: scgpt\n", "Version: whole-human-2024\n", "Tasks: ['embed', 'integrate']\n", "Species: ['human', 'mouse']\n", "Embedding dim: 512\n", "Differentiator: Multi-modal transformer (RNA+ATAC+Spatial), attention-based gene interaction modeling\n", "\n", "=== Input Contract ===\n", "Gene ID scheme: symbol\n", "Gene ID notes: Uses HGNC gene symbols. 
Convert Ensembl IDs to symbols if needed.\n", "Preprocessing: Normalize to 1e4 via sc.pp.normalize_total, then bin into 51 expression bins.\n", "\n", "=== Output Contract ===\n", "Embedding key: obsm['X_scGPT']\n", "Embedding dim: 512\n" ] } ], "source": [ "# Get detailed information about scGPT\n", "info = ov.fm.describe_model(\"scgpt\")\n", "\n", "print(\"=== Model Info ===\")\n", "print(f\"Name: {info['model']['name']}\")\n", "print(f\"Version: {info['model']['version']}\")\n", "print(f\"Tasks: {info['model']['tasks']}\")\n", "print(f\"Species: {info['model']['species']}\")\n", "print(f\"Embedding dim: {info['model']['embedding_dim']}\")\n", "print(f\"Differentiator: {info['model']['differentiator']}\")\n", "\n", "print(\"\\n=== Input Contract ===\")\n", "print(f\"Gene ID scheme: {info['input_contract']['gene_id_scheme']}\")\n", "print(f\"Gene ID notes: {info['input_contract']['gene_id_notes']}\")\n", "print(f\"Preprocessing: {info['input_contract']['preprocessing']}\")\n", "\n", "print(\"\\n=== Output Contract ===\")\n", "print(f\"Embedding key: {info['output_contract']['embedding_key']}\")\n", "print(f\"Embedding dim: {info['output_contract']['embedding_dim']}\")" ] }, { "cell_type": "code", "execution_count": 5, "id": "80b855ac", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ModelEmbedding DimGene IDsSpeciesZero-shotGPU RequiredMin VRAM (GB)
0scgpt512symbolhuman, mouseTrueTrue8
1geneformer512ensemblhumanTrueFalse4
2uce1280symbolhuman, mouse, zebrafish, mouse_lemur, macaque,...TrueTrue16
3scfoundation512customhumanTrueTrue16
4cellplm512symbolhumanTrueTrue8
\n", "
" ], "text/plain": [ " Model Embedding Dim Gene IDs \\\n", "0 scgpt 512 symbol \n", "1 geneformer 512 ensembl \n", "2 uce 1280 symbol \n", "3 scfoundation 512 custom \n", "4 cellplm 512 symbol \n", "\n", " Species Zero-shot GPU Required \\\n", "0 human, mouse True True \n", "1 human True False \n", "2 human, mouse, zebrafish, mouse_lemur, macaque,... True True \n", "3 human True True \n", "4 human True True \n", "\n", " Min VRAM (GB) \n", "0 8 \n", "1 4 \n", "2 16 \n", "3 16 \n", "4 8 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Compare multiple models side by side\n", "models_to_compare = [\"scgpt\", \"geneformer\", \"uce\", \"scfoundation\", \"cellplm\"]\n", "comparison = []\n", "for name in models_to_compare:\n", " info = ov.fm.describe_model(name)\n", " m = info['model']\n", " comparison.append({\n", " 'Model': m['name'],\n", " 'Embedding Dim': m['embedding_dim'],\n", " 'Gene IDs': info['input_contract']['gene_id_scheme'],\n", " 'Species': ', '.join(m['species']),\n", " 'Zero-shot': m['zero_shot_embedding'],\n", " 'GPU Required': m['hardware']['gpu_required'],\n", " 'Min VRAM (GB)': m['hardware']['min_vram_gb'],\n", " })\n", "pd.DataFrame(comparison)" ] }, { "cell_type": "markdown", "id": "5e172352", "metadata": { "tags": [] }, "source": [ "## Step 2: Profile Your Data\n", "\n", "`ov.fm.profile_data()` automatically detects your dataset's species, gene identifier scheme, modality, and checks compatibility with all registered models.\n", "\n", "First, let's prepare a test dataset. **Important:** Most foundation models expect raw counts (non-negative values). We use `pbmc3k()` (unprocessed) rather than `pbmc3k_processed()` which contains scaled/negative values." 
] }, { "cell_type": "code", "execution_count": 6, "id": "3e42e567", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: 2700 cells x 13714 genes\n", "Gene names (first 5): ['AL627309.1', 'AP006222.2', 'RP11-206L10.2', 'RP11-206L10.9', 'LINC00115']\n", "X range: [0.0, 419.0]\n" ] } ], "source": [ "# Load example PBMC dataset (raw counts)\n", "adata = sc.datasets.pbmc3k()\n", "sc.pp.filter_cells(adata, min_genes=200)\n", "sc.pp.filter_genes(adata, min_cells=3)\n", "print(f\"Dataset: {adata.n_obs} cells x {adata.n_vars} genes\")\n", "print(f\"Gene names (first 5): {adata.var_names[:5].tolist()}\")\n", "print(f\"X range: [{adata.X.min():.1f}, {adata.X.max():.1f}]\")\n", "\n", "# Save to h5ad for ov.fm workflow\n", "adata.write_h5ad(\"pbmc3k.h5ad\")" ] }, { "cell_type": "code", "execution_count": 7, "id": "bbd51edc", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== Data Profile ===\n", "Cells: 2,700\n", "Genes: 13,714\n", "Species: human (inferred)\n", "Gene scheme: symbol\n", "Modality: RNA\n", "Has raw counts: False\n", "Layers: []\n", "Batch columns: []\n", "Cell type columns: []\n" ] } ], "source": [ "# Profile the dataset\n", "profile = ov.fm.profile_data(\"pbmc3k.h5ad\")\n", "\n", "print(\"=== Data Profile ===\")\n", "print(f\"Cells: {profile['n_cells']:,}\")\n", "print(f\"Genes: {profile['n_genes']:,}\")\n", "print(f\"Species: {profile['species']}\")\n", "print(f\"Gene scheme: {profile['gene_scheme']}\")\n", "print(f\"Modality: {profile['modality']}\")\n", "print(f\"Has raw counts: {profile['has_raw']}\")\n", "print(f\"Layers: {profile['layers']}\")\n", "print(f\"Batch columns: {profile['batch_columns']}\")\n", "print(f\"Cell type columns: {profile['celltype_columns']}\")" ] }, { "cell_type": "code", "execution_count": 8, "id": "c53bba05", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " geneformer: ['Model 
requires Ensembl IDs']\n", " atacformer: [\"Modality 'RNA' not supported\"]\n", " scplantllm: [\"Species 'human' not supported\"]\n", "\n", "Compatible models (19): ['scgpt', 'uce', 'scfoundation', 'scbert', 'genecompass', 'cellplm', 'nicheformer', 'scmulan', 'tgpt', 'cellfm', 'sccello', 'scprint', 'aidocell', 'pulsar', 'langcell', 'cell2sentence', 'genept', 'chatcell', 'tabula']\n" ] } ], "source": [ "# Check compatibility with specific models\n", "compatible_models = []\n", "for name, compat in profile['model_compatibility'].items():\n", " if compat['compatible']:\n", " compatible_models.append(name)\n", " elif compat['issues']:\n", " print(f\" {name}: {compat['issues']}\")\n", "\n", "print(f\"\\nCompatible models ({len(compatible_models)}): {compatible_models}\")" ] }, { "cell_type": "markdown", "id": "7d87cb40", "metadata": { "tags": [] }, "source": [ "## Step 3: Automatic Model Selection\n", "\n", "`ov.fm.select_model()` analyzes your data and recommends the best model based on:\n", "- Species and gene ID compatibility\n", "- Task support and zero-shot capability\n", "- Hardware requirements\n", "- Adapter implementation readiness" ] }, { "cell_type": "code", "execution_count": 9, "id": "ea4adb89", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== Recommended ===\n", "Model: scgpt\n", "Rationale: fully implemented adapter; matches gene symbols; supports human; zero-shot embedding (no fine-tuning needed); CPU fallback available\n", "\n", "=== Fallback Options ===\n", " - cellplm: fully implemented adapter; matches gene symbols; supports human; zero-shot embedding (no fine-tuning needed); CPU fallback available\n", " - uce: fully implemented adapter; matches gene symbols; supports human; zero-shot embedding (no fine-tuning needed)\n", "\n", "Preprocessing: Normalize to 1e4 via sc.pp.normalize_total, then bin into 51 expression bins.\n" ] } ], "source": [ "# Auto-select the best model for embedding\n", "selection 
= ov.fm.select_model(\n", " \"pbmc3k.h5ad\",\n", " task=\"embed\",\n", " prefer_zero_shot=True,\n", ")\n", "\n", "print(\"=== Recommended ===\")\n", "print(f\"Model: {selection['recommended']['name']}\")\n", "print(f\"Rationale: {selection['recommended']['rationale']}\")\n", "\n", "print(\"\\n=== Fallback Options ===\")\n", "for fb in selection['fallbacks']:\n", " print(f\" - {fb['name']}: {fb['rationale']}\")\n", "\n", "print(f\"\\nPreprocessing: {selection['preprocessing_notes']}\")" ] }, { "cell_type": "code", "execution_count": 10, "id": "c7500456", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Best model for 8GB VRAM: scgpt\n", "Rationale: fully implemented adapter; matches gene symbols; supports human; zero-shot embedding (no fine-tuning needed); CPU fallback available\n" ] } ], "source": [ "# Select with VRAM constraint (e.g., 8 GB GPU)\n", "selection_8gb = ov.fm.select_model(\n", " \"pbmc3k.h5ad\",\n", " task=\"embed\",\n", " max_vram_gb=8,\n", ")\n", "print(f\"Best model for 8GB VRAM: {selection_8gb['recommended']['name']}\")\n", "print(f\"Rationale: {selection_8gb['recommended']['rationale']}\")" ] }, { "cell_type": "markdown", "id": "0634cc62", "metadata": { "tags": [] }, "source": [ "## Step 4: Validate Data-Model Compatibility\n", "\n", "Before running inference, use `ov.fm.preprocess_validate()` to check for potential issues and get auto-fix suggestions." ] }, { "cell_type": "code", "execution_count": 11, "id": "5fe7fa98", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Status: ready\n", "\n", "Diagnostics:\n", " [info] No raw counts found. 
Some models require unnormalized counts in .raw or layers['counts'].\n", "\n", "No preprocessing needed — data is ready!\n" ] } ], "source": [ "# Validate data compatibility with scGPT\n", "validation = ov.fm.preprocess_validate(\n", " \"pbmc3k.h5ad\",\n", " model_name=\"scgpt\",\n", " task=\"embed\",\n", ")\n", "\n", "print(f\"Status: {validation['status']}\")\n", "print(f\"\\nDiagnostics:\")\n", "for d in validation['diagnostics']:\n", " print(f\" [{d['severity']}] {d['message']}\")\n", "\n", "if validation['auto_fixes']:\n", " print(f\"\\nSuggested fixes:\")\n", " for fix in validation['auto_fixes']:\n", " print(f\" - {fix}\")\n", "else:\n", " print(\"\\nNo preprocessing needed — data is ready!\")" ] }, { "cell_type": "code", "execution_count": 12, "id": "6357081c", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Status: needs_preprocessing\n", " [warning] Data has gene symbols but model requires Ensembl IDs\n", " [info] No raw counts found. Some models require unnormalized counts in .raw or layers['counts'].\n" ] } ], "source": [ "# Validate with a model that requires Ensembl IDs (Geneformer)\n", "validation_gf = ov.fm.preprocess_validate(\n", " \"pbmc3k.h5ad\",\n", " model_name=\"geneformer\",\n", " task=\"embed\",\n", ")\n", "\n", "print(f\"Status: {validation_gf['status']}\")\n", "for d in validation_gf['diagnostics']:\n", " print(f\" [{d['severity']}] {d['message']}\")\n", "# Note: Geneformer requires Ensembl IDs; the diagnostic will flag this\n", "# if your data uses gene symbols" ] }, { "cell_type": "markdown", "id": "39953381", "metadata": { "tags": [] }, "source": [ "## Step 5: Run Foundation Model Inference\n", "\n", "`ov.fm.run()` is the core execution function. It:\n", "1. Validates data-model compatibility\n", "2. Loads the model and checkpoint\n", "3. Runs inference\n", "4. Writes results (embeddings/annotations) back to AnnData\n", "5. Records provenance metadata\n", "\n", "### 5a. 
Using the high-level `ov.fm.run()` API" ] }, { "cell_type": "code", "execution_count": 13, "id": "6285a8d2", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Loaded] Loaded vocabulary: 60,697 genes\n", "[Loaded] Loaded model config from args.json\n", "[ℹ️Info] Key Parameters Model Information:\n", " embsize: 512\n", " nheads: 8\n", " d_hid: 512\n", " nlayers: 12\n", " n_layers_cls: 3\n", "[Preprocessing] Analyzing model checkpoint for n_cls inference...\n", "[Warning] No classifier layers found in checkpoint\n", "[ℹ️Info] Using default n_cls=50\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[Warning] Loading compatible weights only\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[Loaded] Compatible weights loaded: 135/163\n", "[Warning] Some weights incompatible (28)\n", "[ℹ️Info] Model classes: 50\n", "[Loaded] Model ready on cuda\n", "[Preprocessing] Filtering genes by vocabulary\n", "[ℹ️Info] Matched 12300/13714 genes\n", "[Loaded] Retained 12300 genes\n", "[Loaded] Preprocessor initialized\n", " n_bins: 51, normalize: 10000.0\n", "[ℹ️Info] Data inspection - Mean: 2279.3, Range: [0.000, 419.000]\n", " [ℹ️Info] Auto-detected: raw counts\n", " [Loaded] Decision: applying normalization\n", " [Loaded] Will apply normalization\n", "[Preprocessing] Applying preprocessing pipeline\n", "Normalizing total counts ...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Log1p transforming ...\n", "Binning data ...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[Loaded] Preprocessing completed\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[Loaded] Binned data: (2700, 12300), 51 unique values\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[🔬Cells] Data Summary:\n", " Cells: 2,700\n", " Genes: 12,300\n", "[Embedding] Starting get_embeddings...\n", " cells: 2,700\n", " genes: 12,300\n", "[Preprocessing] Filtering genes by 
vocabulary\n", "[ℹ️Info] Matched 12300/12300 genes\n", "[Loaded] Retained 12300 genes\n", "[ℹ️Info] Data already preprocessed, skipping\n", "[ℹ️Info] Using existing preprocessed data\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " Data shape: (2700, 12300)\n", " Data range: [0.000, 50.000]\n", " Gene IDs: 12300 genes mapped\n", " [Preprocessing] Tokenizing data...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " Tokenized: 2700 cells x 1200 tokens\n", " Created dataloader: 43 batches (batch_size=64)\n", " [Predicting] Running model inference...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r", "[scGPT] Prediction batches: 0%| | 0/43 [00:00" ] }, "metadata": { "image/png": { "height": 296, "width": 325 } }, "output_type": "display_data" } ], "source": [ "# Load results and visualize\n", "adata_scgpt = sc.read_h5ad(\"pbmc3k_scgpt.h5ad\")\n", "print(f\"Embedding key: X_scGPT\")\n", "print(f\"Embedding shape: {adata_scgpt.obsm['X_scGPT'].shape}\")\n", "\n", "# Compute UMAP from scGPT embeddings\n", "sc.pp.neighbors(adata_scgpt, use_rep='X_scGPT')\n", "sc.tl.umap(adata_scgpt)\n", "\n", "# Cluster for visualization\n", "sc.tl.leiden(adata_scgpt, resolution=0.5)\n", "sc.pl.umap(adata_scgpt, color=['leiden'], title='scGPT Embedding (PBMC 3k)')" ] }, { "cell_type": "markdown", "id": "32a6350b", "metadata": { "tags": [] }, "source": [ "### 5b. Using the low-level `ov.llm.SCLLMManager` API\n", "\n", "For more fine-grained control (fine-tuning, cell type annotation, custom preprocessing), you can use the model-specific `SCLLMManager` interface directly." 
] }, { "cell_type": "code", "execution_count": 15, "id": "69258567", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "See ov.llm.SCLLMManager documentation for low-level API usage.\n" ] } ], "source": [ "# Low-level API: direct model access via ov.llm.SCLLMManager\n", "# This gives you finer-grained control over the model pipeline.\n", "#\n", "# manager = ov.llm.SCLLMManager(\n", "# model_type=\"scgpt\",\n", "# model_path=\"path/to/scgpt/checkpoint\",\n", "# )\n", "#\n", "# # Get embeddings with full control\n", "# adata = sc.read_h5ad(\"pbmc3k.h5ad\")\n", "# embeddings = manager.get_embeddings(adata, batch_size=64)\n", "# adata.obsm['X_scGPT'] = embeddings\n", "#\n", "# # Fine-tune on reference data\n", "# ref_adata = adata[adata.obs['celltype'].isin(['CD4 T', 'CD8 T', 'B'])]\n", "# manager.fine_tune(train_adata=ref_adata, task=\"annotation\", epochs=5)\n", "#\n", "# # Predict cell types\n", "# predictions = manager.predict_celltypes(adata)\n", "\n", "print(\"See ov.llm.SCLLMManager documentation for low-level API usage.\")" ] }, { "cell_type": "markdown", "id": "882d14ad", "metadata": { "tags": [] }, "source": [ "## Step 6: Interpret Results\n", "\n", "`ov.fm.interpret_results()` generates QA metrics for model outputs, including embedding dimensionality, silhouette scores, and provenance tracking." 
] }, { "cell_type": "code", "execution_count": 16, "id": "f6e9817e", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== QA Metrics ===\n", "Cells: 2,700\n", "Genes: 12,300\n", "\n", "Embedding 'X_scGPT':\n", " Dimensions: 512\n", " Cells: 2700\n" ] } ], "source": [ "# Interpret results from the scGPT run\n", "interpretation = ov.fm.interpret_results(\n", " \"pbmc3k_scgpt.h5ad\",\n", " task=\"embed\",\n", ")\n", "\n", "print(\"=== QA Metrics ===\")\n", "print(f\"Cells: {interpretation['metrics']['n_cells']:,}\")\n", "print(f\"Genes: {interpretation['metrics']['n_genes']:,}\")\n", "\n", "if 'embeddings' in interpretation['metrics']:\n", " for key, info in interpretation['metrics']['embeddings'].items():\n", " print(f\"\\nEmbedding '{key}':\")\n", " print(f\" Dimensions: {info['dim']}\")\n", " print(f\" Cells: {info['n_cells']}\")\n", " if 'silhouette' in info:\n", " print(f\" Silhouette score: {info['silhouette']:.4f}\")\n", "\n", "if 'provenance' in interpretation['metrics']:\n", " print(f\"\\nProvenance: {interpretation['metrics']['provenance']}\")" ] }, { "cell_type": "markdown", "id": "10e96c7a", "metadata": { "tags": [] }, "source": [ "## Multi-Model Comparison\n", "\n", "One of the key strengths of `ov.fm` is the ability to run multiple models on the same dataset and compare results. The example below demonstrates this with models that have adapters installed. Models that are not installed will return a graceful error message instead of crashing.\n", "\n", "> **Note:** Each model requires its own dependencies and checkpoints. Install models following their respective documentation before running." 
] }, { "cell_type": "code", "execution_count": 17, "id": "7ff3dbde", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "==================================================\n", "Running scgpt...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[Loaded] Loaded vocabulary: 60,697 genes\n", "[Loaded] Loaded model config from args.json\n", "[ℹ️Info] Key Parameters Model Information:\n", " embsize: 512\n", " nheads: 8\n", " d_hid: 512\n", " nlayers: 12\n", " n_layers_cls: 3\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[Preprocessing] Analyzing model checkpoint for n_cls inference...\n", "[Warning] No classifier layers found in checkpoint\n", "[ℹ️Info] Using default n_cls=50\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[Warning] Loading compatible weights only\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[Loaded] Compatible weights loaded: 135/163\n", "[Warning] Some weights incompatible (28)\n", "[ℹ️Info] Model classes: 50\n", "[Loaded] Model ready on cuda\n", "[Preprocessing] Filtering genes by vocabulary\n", "[ℹ️Info] Matched 12300/13714 genes\n", "[Loaded] Retained 12300 genes\n", "[Loaded] Preprocessor initialized\n", " n_bins: 51, normalize: 10000.0\n", "[ℹ️Info] Data inspection - Mean: 2279.3, Range: [0.000, 419.000]\n", " [ℹ️Info] Auto-detected: raw counts\n", " [Loaded] Decision: applying normalization\n", " [Loaded] Will apply normalization\n", "[Preprocessing] Applying preprocessing pipeline\n", "Normalizing total counts ...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Log1p transforming ...\n", "Binning data ...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[Loaded] Preprocessing completed\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[Loaded] Binned data: (2700, 12300), 51 unique values\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[🔬Cells] Data Summary:\n", " Cells: 
2,700\n", " Genes: 12,300\n", "[Embedding] Starting get_embeddings...\n", " cells: 2,700\n", " genes: 12,300\n", "[Preprocessing] Filtering genes by vocabulary\n", "[ℹ️Info] Matched 12300/12300 genes\n", "[Loaded] Retained 12300 genes\n", "[ℹ️Info] Data already preprocessed, skipping\n", "[ℹ️Info] Using existing preprocessed data\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " Data shape: (2700, 12300)\n", " Data range: [0.000, 50.000]\n", " Gene IDs: 12300 genes mapped\n", " [Preprocessing] Tokenizing data...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " Tokenized: 2700 cells x 1200 tokens\n", " Created dataloader: 43 batches (batch_size=64)\n", " [Predicting] Running model inference...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r", "[scGPT] Prediction batches:   0%|          | 0/43 [00:00<?, ?it/s]" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Exception ignored in: <_io.BytesIO object at 0x783ee17aaf20>\n", "Traceback (most recent call last):\n", "  File \"/home/kblueleaf/micromamba/lib/python3.13/site-packages/dill/_dill.py\", line 131, in <genexpr>\n", "    return any((c.__module__, c.__name__) == ('numpy', 'ufunc') for c in obj_type.__mro__)\n", "BufferError: Existing exports of data: object cannot be re-sized\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Exception ignored in: <_io.BytesIO object at 0x783ee17a9cb0>\n", "Traceback (most recent call last):\n", "  File \"/home/kblueleaf/micromamba/lib/python3.13/pickle.py\", line 1008, in _batch_setitems\n", "    tmp = list(islice(it, self._BATCHSIZE))\n", "BufferError: Existing exports of data: object cannot be re-sized\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Exception ignored in: <_io.BytesIO object at 0x783ee1ab2160>\n", "Traceback (most recent call last):\n", "  File \"/home/kblueleaf/micromamba/lib/python3.13/site-packages/dill/_dill.py\", line 375, in save\n", "    def save(self, obj, save_persistent_id=True):\n", "BufferError: Existing exports of data: object cannot be re-sized\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Exception ignored in: <_io.BytesIO object at 0x783eefbee700>\n", "Traceback (most recent call last):\n", "  File \"/home/kblueleaf/micromamba/lib/python3.13/site-packages/torch/nn/modules/module.py\", line 515, in __init__\n", "    super().__setattr__(\"_forward_pre_hooks\", OrderedDict())\n", "BufferError: Existing exports of data: object cannot be re-sized\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[Training] Extracting embeddings...\n", " [Loaded] Using all 2700 cells (preserving order)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at /home/kblueleaf/.cache/omicverse/models/geneformer and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.bias', 'classifier.weight']\n", "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d3b154fffde74115868983e3647cc4c6", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%|          | 0/14 [00:00<?, ?it/s]" ] }, "metadata": { "image/png": { "height": 381, "width": 941 } }, "output_type": "display_data" } ], "source": [ "# Compare embeddings visually (only for successfully run models)\n", "import matplotlib.pyplot as plt\n", "import os\n", "\n", "successful_models = [m for m in models_to_run if os.path.exists(f\"pbmc3k_{m}.h5ad\") and 'error' not in results.get(m, {})]\n", "\n", "if successful_models:\n", "    fig, axes = plt.subplots(1, len(successful_models), figsize=(6*len(successful_models), 5))\n", "    if len(successful_models) == 1:\n", "        axes = [axes]\n", "\n", "    for i, model_name in enumerate(successful_models):\n", "        adata_m = sc.read_h5ad(f\"pbmc3k_{model_name}.h5ad\")\n", "        info = ov.fm.describe_model(model_name)\n", "        emb_key = info['output_contract']['embedding_key'].split(\"'\")[1]\n", "\n", "        sc.pp.neighbors(adata_m, use_rep=emb_key)\n", "        sc.tl.umap(adata_m)\n", "        sc.tl.leiden(adata_m, resolution=0.5)\n", "        sc.pl.umap(adata_m, color='leiden', ax=axes[i],\n", "                   title=f\"{model_name} ({adata_m.obsm[emb_key].shape[1]}d)\",\n", "                   show=False)\n", "\n", "    plt.tight_layout()\n", "    plt.show()\n", "else:\n", "    print(\"No models completed successfully for comparison.\")" ] }, { "cell_type": "code", "execution_count": 19, "id": "0c494b73", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Model</th>\n", "      <th>Embedding Dim</th>\n", "      <th>Silhouette Score</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>scgpt</td>\n", "      <td>512</td>\n", "      <td>0.1540</td>\n", "    </tr>\n", "    <tr>\n", "      <th>1</th>\n", "      <td>uce</td>\n", "      <td>1280</td>\n", "      <td>0.1691</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ "   Model  Embedding Dim  Silhouette Score\n", "0  scgpt            512            0.1540\n", "1    uce           1280            0.1691" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Quantitative comparison: silhouette scores\n", "import pandas as pd\n", "from sklearn.metrics import silhouette_score\n", "\n", "comparison_results = []\n", "for model_name in successful_models:\n", "    adata_m = sc.read_h5ad(f\"pbmc3k_{model_name}.h5ad\")\n", "    info = ov.fm.describe_model(model_name)\n", "    emb_key = info['output_contract']['embedding_key'].split(\"'\")[1]\n", "    emb = adata_m.obsm[emb_key]\n", "\n", "    # Use leiden clusters for silhouette score\n", "    if 'leiden' not in adata_m.obs:\n", "        sc.pp.neighbors(adata_m, use_rep=emb_key)\n", "        sc.tl.leiden(adata_m, resolution=0.5)\n", "\n", "    sil = silhouette_score(emb, adata_m.obs['leiden'])\n", "    comparison_results.append({\n", "        'Model': model_name,\n", "        'Embedding Dim': emb.shape[1],\n", "        'Silhouette Score': round(sil, 4),\n", "    })\n", "\n", "pd.DataFrame(comparison_results)" ] }, { "cell_type": "markdown", "id": "9f1fd1e7", "metadata": { "tags": [] }, "source": [ "## Advanced: Custom Checkpoint Paths\n", "\n", "By default, `ov.fm` looks for checkpoints in `~/.cache/omicverse/models/<model_name>/`. 
You can override this via:\n", "- The `checkpoint_dir` parameter in `ov.fm.run()`\n", "- Environment variables: `OV_FM_CHECKPOINT_DIR_SCGPT`, `OV_FM_CHECKPOINT_DIR_GENEFORMER`, etc.\n", "- A global base directory: `OV_FM_CHECKPOINT_DIR` with model-named subfolders" ] }, { "cell_type": "code", "execution_count": 20, "id": "fdb22f30", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Adjust checkpoint_dir to your local path before running.\n" ] } ], "source": [ "# Using a custom checkpoint directory (example - adjust path for your setup)\n", "# result = ov.fm.run(\n", "# task=\"embed\",\n", "# model_name=\"scgpt\",\n", "# adata_path=\"pbmc3k.h5ad\",\n", "# output_path=\"pbmc3k_scgpt_custom.h5ad\",\n", "# checkpoint_dir=\"/path/to/my/scgpt/checkpoint\",\n", "# )\n", "print(\"Adjust checkpoint_dir to your local path before running.\")" ] }, { "cell_type": "markdown", "id": "86c91507", "metadata": { "tags": [] }, "source": [ "## Advanced: Conda Subprocess Isolation\n", "\n", "Some models have conflicting dependencies. `ov.fm` supports running models in isolated conda environments via subprocess. 
If a conda env named `scfm-<model_name>` exists (e.g., `scfm-scgpt`), `ov.fm.run()` will automatically use it.\n", "\n", "```bash\n", "# Create an isolated environment for a model\n", "conda create -n scfm-scgpt python=3.10\n", "conda activate scfm-scgpt\n", "pip install omicverse scgpt\n", "```\n", "\n", "To disable conda subprocess execution:\n", "```python\n", "import os\n", "os.environ['OV_FM_DISABLE_CONDA_SUBPROCESS'] = '1'\n", "```" ] }, { "cell_type": "markdown", "id": "3491412f", "metadata": { "tags": [] }, "source": [ "## Advanced: Plugin System\n", "\n", "You can register custom models with `ov.fm` via the plugin system.\n", "\n", "### Entry-point plugins (pip packages)\n", "\n", "In your package's `pyproject.toml`:\n", "```toml\n", "[project.entry-points.\"omicverse.fm\"]\n", "my_model = \"my_package.fm_plugin:register\"\n", "```\n", "\n", "### Local plugins\n", "\n", "Place a Python file in `~/.omicverse/plugins/fm/`:" ] }, { "cell_type": "code", "execution_count": 21, "id": "a5d34bb1", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Registered custom models: ['my_custom_model']\n" ] } ], "source": [ "# Example: registering a custom model plugin\n", "from omicverse.fm.registry import ModelSpec, TaskType, Modality, SkillReadyStatus, OutputKeys\n", "\n", "# Define model spec\n", "my_spec = ModelSpec(\n", "    name=\"my_custom_model\",\n", "    version=\"v1.0\",\n", "    skill_ready=SkillReadyStatus.READY,\n", "    tasks=[TaskType.EMBED],\n", "    modalities=[Modality.RNA],\n", "    species=[\"human\"],\n", "    output_keys=OutputKeys(embedding_key=\"X_my_model\"),\n", "    embedding_dim=256,\n", ")\n", "\n", "# Register it\n", "registry = ov.fm.get_registry()\n", "registry.register(my_spec, source=\"user\")\n", "\n", "# Now it appears in list_models\n", "custom = [m['name'] for m in ov.fm.list_models()['models'] if 'custom' in m['name']]\n", "print(f\"Registered custom models: {custom}\")" ] }, { "cell_type": "markdown", "id": "1ad2a0c2", 
"metadata": { "tags": [] }, "source": [ "## Model Quick Reference\n", "\n", "| Model | Dim | Gene IDs | Species | Key Strength |\n", "|-------|-----|----------|---------|-------------|\n", "| **scGPT** | 512 | Symbol | human, mouse | Multi-modal (RNA+ATAC+Spatial), attention maps |\n", "| **Geneformer** | 512 | Ensembl | human | CPU-capable, rank-value encoding, network biology |\n", "| **UCE** | 1280 | Symbol | 7 species | Broadest species support, protein structure embeddings |\n", "| **scFoundation** | 512 | Custom | human | Perturbation/drug response, xTrimoGene architecture |\n", "| **CellPLM** | 512 | Symbol | human | Fastest inference, cell-centric (not gene-centric) |\n", "| **scBERT** | 200 | Symbol | human | Lightest model, 200-dim compact embeddings |\n", "| **Nicheformer** | 512 | Symbol | human, mouse | Spatial-aware, niche modeling |\n", "| **scMulan** | 512 | Symbol | human | Native multi-omics (RNA+ATAC+Protein) |\n", "\n", "For the full list of 22 models, run `ov.fm.list_models()`." 
] }, { "cell_type": "markdown", "id": "70b6c616", "metadata": { "tags": [] }, "source": [ "## API Reference Summary\n", "\n", "| Function | Purpose |\n", "|----------|--------|\n", "| `ov.fm.list_models(task=)` | Browse available models, filter by task |\n", "| `ov.fm.describe_model(name)` | Get full model spec and I/O contract |\n", "| `ov.fm.profile_data(path)` | Auto-detect species, gene scheme, modality |\n", "| `ov.fm.select_model(path, task=)` | Recommend best model for your data |\n", "| `ov.fm.preprocess_validate(path, model, task)` | Check compatibility, get fix suggestions |\n", "| `ov.fm.run(task=, model_name=, adata_path=)` | Execute inference |\n", "| `ov.fm.interpret_results(path, task=)` | Generate QA metrics and visualizations |" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.12" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "07441f60db6345d9a47d85545e6b7a51": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_3f80395546dc4a9dbc1ee4dd00f22927", "placeholder": "​", "style": "IPY_MODEL_7c3a66c474bc4192a0019e65a5cc70d8", "tabbable": null, "tooltip": null, "value": " 0/14 [00:00<?, ?it/s]" } }, "1ddddc0dd1484ab5a07964914ae57b82": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { 
"_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "24dfd6753ad741b0ac1464cc72240000": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null } }, "2af5b64aedc74d4cb6d61ea1cf7cfd33": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null } }, 
"3f80395546dc4a9dbc1ee4dd00f22927": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "45e40b9504d04f039eb6a98d6fa8d912": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": 
null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "52b52d76e8d042279f9e4389cbe2cb1e": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "6f7294b7a611440cb8df12e8e91f8da9": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "ProgressView", "bar_style": "danger", "description": "", 
"description_allow_html": false, "layout": "IPY_MODEL_e57428dc35c84c438c101952219b1a0e", "max": 14.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_e66b4ccebc0f4e9dad45815290620643", "tabbable": null, "tooltip": null, "value": 0.0 } }, "7012f8a9abe04034b109c6fda95fe3ae": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_72dbda6b33444dcf8c66033849bf741f", "placeholder": "​", "style": "IPY_MODEL_2af5b64aedc74d4cb6d61ea1cf7cfd33", "tabbable": null, "tooltip": null, "value": "UCE inference: 100%" } }, "72dbda6b33444dcf8c66033849bf741f": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, 
"padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "7c3a66c474bc4192a0019e65a5cc70d8": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null } }, "7eb2c8e19544419e9b465be35c6a0991": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "8ec52279837345b1bdf2dbe2e6828c1e": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": 
null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "93f144baa64f414aa12648c9281376ca": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_52b52d76e8d042279f9e4389cbe2cb1e", "max": 108.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_7eb2c8e19544419e9b465be35c6a0991", "tabbable": null, "tooltip": null, "value": 108.0 } }, "adbf88b9ee754a7c82bb1480d9272135": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, 
"padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "cd6312ae8fa2456888247e6f300c3abd": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_7012f8a9abe04034b109c6fda95fe3ae", "IPY_MODEL_93f144baa64f414aa12648c9281376ca", "IPY_MODEL_f7b943e6cd1b42339243f0f0190f99ff" ], "layout": "IPY_MODEL_1ddddc0dd1484ab5a07964914ae57b82", "tabbable": null, "tooltip": null } }, "d3b154fffde74115868983e3647cc4c6": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_f5526bec13d84e4bbae830e033522edc", "IPY_MODEL_6f7294b7a611440cb8df12e8e91f8da9", "IPY_MODEL_07441f60db6345d9a47d85545e6b7a51" ], "layout": "IPY_MODEL_8ec52279837345b1bdf2dbe2e6828c1e", "tabbable": null, "tooltip": null } }, "e57428dc35c84c438c101952219b1a0e": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, 
"flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "e66b4ccebc0f4e9dad45815290620643": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "f5526bec13d84e4bbae830e033522edc": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_adbf88b9ee754a7c82bb1480d9272135", "placeholder": "​", "style": "IPY_MODEL_24dfd6753ad741b0ac1464cc72240000", "tabbable": null, "tooltip": null, "value": "  0%" } }, "f7b943e6cd1b42339243f0f0190f99ff": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": 
null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_45e40b9504d04f039eb6a98d6fa8d912", "placeholder": "​", "style": "IPY_MODEL_f95f06009e5f43d4b14cb1cca9c9da22", "tabbable": null, "tooltip": null, "value": " 108/108 [00:57<00:00,  2.00it/s]" } }, "f95f06009e5f43d4b14cb1cca9c9da22": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null } } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }