GenePT

⚠️ Status: partial | Version: v1.0


Overview

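GenePT provides 1536-dimensional gene embeddings generated through the OpenAI API with GPT-3.5, so no local GPU is required. The embeddings are gene-level rather than cell-level; the per-cell vectors written by `ov.fm.run()` are derived from these gene embeddings.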
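
GenePT itself produces one vector per gene. The GenePT paper describes building cell embeddings as an expression-weighted average of gene embeddings (its "GenePT-w" variant); whether the omicverse adapter uses exactly this scheme is an assumption here. A minimal NumPy sketch of the idea, with a hypothetical `gene_emb` dictionary standing in for the real 1536-dim lookup:

```python
import numpy as np


def weighted_cell_embeddings(expr, gene_names, gene_emb):
    """Expression-weighted average of gene embeddings, one row per cell.

    expr:       (n_cells, n_genes) non-negative expression matrix
    gene_names: list of n_genes gene symbols (the columns of expr)
    gene_emb:   dict mapping gene symbol -> 1-D embedding (hypothetical lookup)
    """
    # Keep only genes that have an embedding; the rest are ignored.
    keep = [i for i, g in enumerate(gene_names) if g in gene_emb]
    if not keep:
        raise ValueError("no genes in expr have an embedding")
    E = np.stack([gene_emb[gene_names[i]] for i in keep])  # (n_kept, dim)
    W = np.asarray(expr, dtype=float)[:, keep]            # (n_cells, n_kept)

    # Normalise weights per cell; cells with no covered expression get a zero vector.
    totals = W.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0
    return (W / totals) @ E                               # (n_cells, dim)
```

Whether to L2-normalise the resulting rows afterwards is a separate design choice; the sketch leaves them unnormalised.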

!!! tip "When to choose GenePT"

    Choose GenePT when you want gene-level rather than cell-level embeddings, have no local GPU, or prefer API-based OpenAI embeddings.
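
Because the embeddings are gene-level, a typical use is finding genes whose vectors lie close to a query gene. The sketch below does this with cosine similarity in NumPy; `gene_emb` is a hypothetical symbol-to-vector dictionary, and cosine similarity is a chosen metric here, not necessarily what omicverse uses internally.

```python
import numpy as np


def most_similar_genes(query, gene_emb, k=5):
    """Return the k genes whose embeddings are most cosine-similar to `query`.

    gene_emb: dict mapping gene symbol -> 1-D embedding vector (hypothetical
              lookup; real GenePT vectors are 1536-dim).
    """
    if query not in gene_emb:
        raise KeyError(f"no embedding for gene {query!r}")
    names = [g for g in gene_emb if g != query]
    M = np.stack([gene_emb[g] for g in names]).astype(float)  # (n_genes, dim)
    q = np.asarray(gene_emb[query], dtype=float)              # (dim,)

    # Cosine similarity = dot product divided by the product of the norms.
    sims = (M @ q) / (np.linalg.norm(M, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:k]                             # highest first
    return [(names[i], float(sims[i])) for i in order]
```

For example, `most_similar_genes("CD3E", gene_emb, k=5)` would list the five nearest genes to CD3E in embedding space, given a populated `gene_emb`.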

Specifications

| Property | Value |
|---|---|
| Model | GenePT |
| Version | v1.0 |
| Tasks | embed |
| Modalities | RNA |
| Species | human |
| Gene IDs | symbol |
| Embedding Dim | 1536 |
| GPU Required | No |
| Min VRAM | 0 GB |
| Recommended VRAM | 0 GB |
| CPU Fallback | Yes |
| Adapter Status | ⚠️ partial |


Quick Start

```python
import omicverse as ov

# 1. Check model spec
info = ov.fm.describe_model("genept")

# 2. Profile your data
profile = ov.fm.profile_data("your_data.h5ad")

# 3. Validate compatibility
check = ov.fm.preprocess_validate("your_data.h5ad", "genept", "embed")

# 4. Run inference
result = ov.fm.run(
    task="embed",
    model_name="genept",
    adata_path="your_data.h5ad",
    output_path="output_genept.h5ad",
    device="auto",
)

# 5. Interpret results
metrics = ov.fm.interpret_results("output_genept.h5ad", task="embed")
```

Input Requirements

| Requirement | Detail |
|---|---|
| Gene ID scheme | symbol |
| Preprocessing | No local preprocessing is needed. An OpenAI API key is required for embedding generation. |
| Data format | AnnData (.h5ad) |
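
Two things commonly block a GenePT run: a missing API key and gene identifiers that are not symbols. The preflight sketch below checks both before calling `ov.fm.run()`. It assumes the key lives in `OPENAI_API_KEY` (the variable the official OpenAI Python client reads by default; how omicverse picks up the key is not documented here) and that `known_symbols` is a hypothetical set of genes that have a GenePT embedding.

```python
import os


def preflight(var_names, known_symbols, env_var="OPENAI_API_KEY"):
    """Pre-run checks for GenePT input (a sketch; not part of omicverse).

    var_names:     gene identifiers, e.g. adata.var_names
    known_symbols: set of gene symbols with a GenePT embedding (hypothetical)
    env_var:       environment variable assumed to hold the OpenAI API key
    Returns (coverage, problems): the fraction of genes with an embedding and
    a list of human-readable issues.
    """
    problems = []
    if not os.environ.get(env_var):
        problems.append(f"{env_var} is not set")

    var_names = [str(g) for g in var_names]
    if not var_names:
        problems.append("no genes in input")
        return 0.0, problems

    covered = sum(1 for g in var_names if g in known_symbols)
    coverage = covered / len(var_names)

    # Human Ensembl gene IDs start with "ENSG"; GenePT expects symbols instead.
    if any(g.startswith("ENSG") for g in var_names):
        problems.append("var_names look like Ensembl IDs; GenePT expects gene symbols")
    return coverage, problems
```

Typical use: `coverage, problems = preflight(adata.var_names, known_symbols)`, then stop if `problems` is non-empty or `coverage` is unexpectedly low.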


Output Keys

After `ov.fm.run()` completes, the results are stored in the AnnData file written to `output_path`:

| Key | Location | Description |
|---|---|---|
| `X_genept` | `adata.obsm` | Cell embeddings (1536-dim), derived from GenePT gene embeddings |

```python
import scanpy as sc

adata = sc.read_h5ad("output_genept.h5ad")
embeddings = adata.obsm["X_genept"]  # shape: (n_cells, 1536)

# Downstream analysis
sc.pp.neighbors(adata, use_rep="X_genept")
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=0.5)
sc.pl.umap(adata, color=["leiden"])
```
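
Before building the neighbour graph, it can help to confirm that `X_genept` has the shape listed in the Specifications table and contains no unusable rows. The NumPy check below is a sketch, not an omicverse function; all-zero rows are flagged because they may indicate cells for which no gene embedding could be used, which is an assumption about their cause.

```python
import numpy as np

EXPECTED_DIM = 1536  # embedding dimension from the Specifications table


def check_embeddings(emb, n_cells, dim=EXPECTED_DIM):
    """Validate a GenePT embedding matrix before neighbour-graph construction."""
    emb = np.asarray(emb)
    if emb.shape != (n_cells, dim):
        raise ValueError(f"expected shape {(n_cells, dim)}, got {emb.shape}")
    nonfinite = ~np.isfinite(emb).all(axis=1)                    # rows with NaN or inf
    zero = ~nonfinite & (np.abs(emb).sum(axis=1) == 0)          # all-zero rows
    return {"nonfinite_rows": int(nonfinite.sum()), "zero_rows": int(zero.sum())}
```

Typical use: `check_embeddings(adata.obsm["X_genept"], adata.n_obs)` right after reading the output file.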

Resources


Hands-On Tutorial

For a step-by-step walkthrough with code, see the GenePT Tutorial Notebook.