# Tabula

⚠️ **Status:** partial | **Version:** federated-v1

---

## Overview

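Tabula combines privacy-preserving federated learning with a tabular transformer. It tokenizes cells against a 60,697-gene vocabulary, represents expression as quantile bins, and uses FlashAttention for memory-efficient attention.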
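
In federated learning, raw data stays at each participating site and only model updates are shared. As a generic illustration, a server can combine per-site parameters by weighting them by local cell counts (FedAvg-style averaging). This is an assumption for explanatory purposes, not necessarily the aggregation protocol Tabula uses, and `fedavg` is a hypothetical helper:

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Weighted average of client parameter vectors (FedAvg-style aggregation).

    client_weights: list of 1-D arrays with identical shapes, one per client.
    client_sizes: number of local training cells per client, used as weights.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)   # (n_clients, n_params)
    coeffs = sizes / sizes.sum()          # each client's share of the total data
    return coeffs @ stacked               # (n_params,)


# Hypothetical example: three sites that never share their expression matrices
site_params = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
site_cells = [100, 100, 200]
global_params = fedavg(site_params, site_cells)
print(global_params)  # → [3.5 4.5]
```

Only the parameter vectors leave each site. Weighting by cell count keeps a small site from pulling the global model as strongly as a large one.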

!!! tip "When to choose Tabula"

    Choose Tabula when you need privacy-preserving analysis, embeddings from a federated-trained model, or perturbation prediction using a tabular modeling approach.

---

## Specifications

| Property | Value |
|----------|-------|
| **Model** | Tabula |
| **Version** | federated-v1 |
| **Tasks** | `embed`, `annotate`, `integrate`, `perturb` |
| **Modalities** | RNA |
| **Species** | human |
| **Gene IDs** | custom (60,697 gene vocabulary) |
| **Embedding Dim** | 192 |
| **GPU Required** | Yes |
| **Min VRAM** | 8 GB |
| **Recommended VRAM** | 16 GB |
| **CPU Fallback** | No |
| **Adapter Status** | ⚠️ partial |
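
Because Tabula requires a GPU and has no CPU fallback, a quick pre-flight check can compare the local device against the 8 GB minimum and 16 GB recommendation listed above. This is a sketch: `vram_status` is a hypothetical helper, not part of `omicverse`, and the device query assumes PyTorch is installed.

```python
# Thresholds taken from the Specifications table (Min / Recommended VRAM).
MIN_VRAM_GB = 8
RECOMMENDED_VRAM_GB = 16


def vram_status(total_bytes: int) -> str:
    """Classify a GPU's total memory against Tabula's documented VRAM requirements."""
    # Decimal GB (10**9 bytes): a card sold as "8 GB" often reports slightly
    # under 8 GiB, and this conversion still counts it as meeting the minimum.
    total_gb = total_bytes / 1e9
    if total_gb < MIN_VRAM_GB:
        return "insufficient"
    if total_gb < RECOMMENDED_VRAM_GB:
        return "minimum"
    return "recommended"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(props.name, vram_status(props.total_memory))
    else:
        # Tabula lists no CPU fallback, so inference needs a GPU.
        print("No CUDA GPU detected")
```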

---

## Quick Start

```python
import omicverse as ov

# 1. Check model spec
info = ov.fm.describe_model("tabula")

# 2. Profile your data
profile = ov.fm.profile_data("your_data.h5ad")

# 3. Validate compatibility
check = ov.fm.preprocess_validate("your_data.h5ad", "tabula", "embed")

# 4. Run inference
result = ov.fm.run(
    task="embed",
    model_name="tabula",
    adata_path="your_data.h5ad",
    output_path="output_tabula.h5ad",
    device="auto",
)

# 5. Interpret results
metrics = ov.fm.interpret_results("output_tabula.h5ad", task="embed")
```

---

## Input Requirements

| Requirement | Detail |
|-------------|--------|
| **Gene ID scheme** | custom (60,697 gene vocabulary) |
| **Preprocessing** | Expression values are quantile-binned, and genes are tokenized against the model's own 60,697-gene vocabulary. |
| **Data format** | AnnData (`.h5ad`) |
| **Batch key** | `.obs` column for batch integration (optional) |
| **Label key** | `.obs` column for cell type labels (optional) |
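
The two preprocessing ideas above, vocabulary-based gene tokenization and quantile binning, can be illustrated with a small NumPy sketch. It assumes per-cell binning of non-zero values and uses a toy dictionary in place of the real 60,697-gene vocabulary. `tokenize_cell`, its `n_bins` default, and the binning details are hypothetical and are not the model's own preprocessing code.

```python
import numpy as np


def tokenize_cell(expr, gene_names, vocab, n_bins=10):
    """Map one cell to (gene_id, bin) tokens.

    expr: 1-D array of expression values for the cell.
    gene_names: gene identifiers aligned with expr.
    vocab: dict gene_name -> integer id (stand-in for the model vocabulary).
    n_bins: number of quantile bins for non-zero values (illustrative choice).
    """
    expr = np.asarray(expr, dtype=float)
    # Keep genes that are in the vocabulary and expressed in this cell.
    keep = np.array([g in vocab for g in gene_names]) & (expr > 0)
    ids = np.array([vocab[g] for g, k in zip(gene_names, keep) if k], dtype=int)
    values = expr[keep]
    if values.size == 0:
        return ids, np.zeros(0, dtype=int)
    # Per-cell quantile edges: bins reflect rank within the cell, not raw scale.
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1))
    # Digitizing against the interior edges yields 0..n_bins-1; shift to
    # 1..n_bins so that 0 stays free to mean "not expressed".
    bins = np.digitize(values, edges[1:-1], right=True) + 1
    return ids, bins


vocab = {"CD3E": 0, "MS4A1": 1, "LYZ": 2, "NKG7": 3}
ids, bins = tokenize_cell(
    [5.0, 0.0, 1.0, 3.0], ["CD3E", "MS4A1", "LYZ", "GENE_X"], vocab, n_bins=2
)
print(ids, bins)  # → [0 2] [2 1]
```

`MS4A1` is dropped because it is not expressed, and the hypothetical `GENE_X` is dropped because it is outside the vocabulary.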

---

## Output Keys

After `ov.fm.run()` completes, results are stored in the AnnData object written to `output_path`:

| Key | Location | Description |
|-----|----------|-------------|
| `X_tabula` | `adata.obsm` | Cell embeddings (192-dim) |
| `tabula_pred` | `adata.obs` | Predicted cell type labels |

```python
import scanpy as sc

adata = sc.read_h5ad("output_tabula.h5ad")
embeddings = adata.obsm["X_tabula"]  # shape: (n_cells, 192)

# Downstream analysis
sc.pp.neighbors(adata, use_rep="X_tabula")
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=0.5)
sc.pl.umap(adata, color=["leiden"])
```
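
`sc.pp.neighbors(adata, use_rep="X_tabula")` builds its graph from nearest neighbours in the 192-dimensional embedding space rather than from raw expression. The brute-force sketch below illustrates that neighbour search conceptually. It is not scanpy's implementation, which may use approximate search on large datasets and converts neighbours into weighted connectivities. `knn_indices` is a hypothetical helper, and random data stands in for real embeddings.

```python
import numpy as np


def knn_indices(emb, k=15):
    """Brute-force k nearest neighbours (Euclidean) for every row of emb, excluding self."""
    emb = np.asarray(emb, dtype=float)
    sq = (emb ** 2).sum(axis=1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    np.fill_diagonal(d2, np.inf)          # a cell is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]  # (n_cells, k)


rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 192))  # stand-in for adata.obsm["X_tabula"]
nbrs = knn_indices(emb, k=15)      # k=15 mirrors scanpy's default n_neighbors
print(nbrs.shape)  # → (100, 15)
```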

---

## Resources

- **Repository / Checkpoint:** [https://github.com/aristoteleo/tabula](https://github.com/aristoteleo/tabula)
- **License:** See the upstream `LICENSE` file in the repository.

---

## Hands-On Tutorial

For a step-by-step walkthrough with code, see the [Tabula Tutorial Notebook](t_fm_tabula.ipynb).