Velocity Basic Calculation¶

Identifying unspliced and spliced reads allows formulating a dynamical model describing splicing kinetics and inferring the corresponding model weights based on single cell data. The change in spliced RNA described by the model is called RNA velocity. Current models of RNA velocity assume the gene-specific model

$$ \begin{aligned} \frac{du_g}{dt} &= \alpha_g - \beta_g u_g\\ \frac{ds_g}{dt} &= \beta_g u_g - \gamma_g s_g, \end{aligned} $$

with transcription rate $\alpha_g$, splicing rate $\beta_g$, and degradation rate $\gamma_g$ of spliced RNA. While the kinetics of each gene are modelled independent of each other, we will drop the index $g$ for notational simplicity. Even though the field of parameter estimation in dynamical systems is well studied, inference algorithms require the time associated with each observation to be known. Consequently, these traditional methods cannot be applied to infer RNA velocity and its model parameters in the context of scRNA-seq data.

In [1]:

Copied!





import scanpy as sc
import scvelo as scv
import omicverse as ov
ov.plot_set(font_path='Arial')

%load_ext autoreload
%autoreload 2
import scanpy as sc
import scvelo as scv
import omicverse as ov
ov.plot_set(font_path='Arial')

%load_ext autoreload
%autoreload 2

🔬 Starting plot initialization...
Using already downloaded Arial font from: /tmp/omicverse_arial.ttf

/home/groups/xiaojie/steorra/env/omicverse/lib/python3.10/site-packages/IPython/core/pylabtools.py:77: DeprecationWarning: backend2gui is deprecated since IPython 8.24, backends are managed in matplotlib and can be externally registered.
  warnings.warn(

Registered as: Arial
🧬 Detecting GPU devices…
✅ NVIDIA CUDA GPUs detected: 1
    • [CUDA 0] NVIDIA H100 80GB HBM3
      Memory: 79.1 GB | Compute: 9.0

   ____            _     _    __                  
  / __ \____ ___  (_)___| |  / /__  _____________ 
 / / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \ 
/ /_/ / / / / / / / /__ | |/ /  __/ /  (__  )  __/ 
\____/_/ /_/ /_/_/\___/ |___/\___/_/  /____/\___/                                              

🔖 Version: 1.7.8rc1   📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.

scVelo¶

scVelo is a scalable toolkit for RNA velocity analysis in single cells; RNA velocity enables the recovery of directed dynamic information by leveraging splicing kinetics [Manno et al., 2018]. scVelo collects different methods for inferring RNA velocity using an expectation-maximization framework [Bergen et al., 2020] or metabolically labeled transcripts [Weiler et al., 2024].

In [54]:

Copied!

adata = scv.datasets.dentategyrus()
adata
adata = scv.datasets.dentategyrus()
adata

Out[54]:

AnnData object with n_obs × n_vars = 2930 × 13913
    obs: 'clusters', 'age(days)', 'clusters_enlarged'
    uns: 'clusters_colors'
    obsm: 'X_umap'
    layers: 'ambiguous', 'spliced', 'unspliced'

In [55]:

Copied!

velo_obj=ov.single.Velo(adata)
velo_obj=ov.single.Velo(adata)

In Velo module, you should keep all genes' expression not normalized.

In [56]:

Copied!

velo_obj.filter_genes(min_shared_counts=20)
velo_obj.preprocess()
velo_obj.filter_genes(min_shared_counts=20)
velo_obj.preprocess()

Filtered out 10340 genes that are detected 20 counts (shared).
|-----> Running monocle preprocessing pipeline...
|-----------> filtered out 13 outlier cells
|-----------> filtered out 797 outlier genes
|-----> PCA dimension reduction
|-----> <insert> X_pca to obsm in AnnData Object.
|-----> computing cell phase...
|-----> [Cell Phase Estimation] completed [1717.6541s]
|-----> [Cell Cycle Scores Estimation] completed [0.1534s]
|-----> [Preprocessor-monocle] completed [1.1099s]
🖥️ Using Scanpy CPU to calculate neighbors...

🔍 K-Nearest Neighbors Graph Construction:
   Mode: cpu
   Neighbors: 30
   Method: umap
   Metric: euclidean
   Representation: X_pca
   PCs used: 30
computing neighbors
   🔍 Computing neighbor distances...
   🔍 Computing connectivity matrix...
   💡 Using UMAP-style connectivity
   ✓ Graph is fully connected

✅ KNN Graph Construction Completed Successfully!
   ✓ Processed: 2,917 cells with 30 neighbors each
   ✓ Results added to AnnData object:
     • 'neighbors': Neighbors metadata (adata.uns)
     • 'distances': Distance matrix (adata.obsp)
     • 'connectivities': Connectivity matrix (adata.obsp)
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:00)

In [ ]:

Copied!

velo_obj.moments(backend='scvelo')
velo_obj.dynamics(backend='scvelo',n_jobs=8)
velo_obj.moments(backend='scvelo')
velo_obj.dynamics(backend='scvelo',n_jobs=8)

In [58]:

Copied!

velo_obj.cal_velocity(method='scvelo')
velo_obj.cal_velocity(method='scvelo')

computing velocities
    finished (0:00:00) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)

/home/groups/xiaojie/steorra/env/omicverse/lib/python3.10/site-packages/scvelo/tools/optimization.py:184: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
  gamma[i] = np.linalg.pinv(A.T.dot(A)).dot(A.T.dot(y[:, i]))

In [ ]:

Copied!





velo_obj.velocity_graph(
    vkey='velocity_S',
    xkey='Ms',
    n_jobs=8,
)
velo_obj.velocity_graph(
    vkey='velocity_S',
    xkey='Ms',
    n_jobs=8,
)

In [60]:

Copied!





velo_obj.velocity_embedding(
    basis='umap',
    vkey='velocity_S',
)
velo_obj.velocity_embedding(
    basis='umap',
    vkey='velocity_S',
)

computing velocity embedding
    finished (0:00:00) --> added
    'velocity_S_umap', embedded velocity vectors (adata.obsm)

In [61]:

Copied!





fig = ov.plt.figure(figsize=(4, 4))
ax = ov.plt.subplot(1, 1, 1)
ov.pl.embedding(
    adata,
    basis='X_umap',
    color='clusters',
    ax=ax,
    show=False,
    size=100,
    alpha=0.3
)
ov.pl.add_streamplot(
    adata,
    basis='X_umap',
    velocity_key='velocity_S_umap',
    ax=ax,
)
ov.plt.title('Velocity: scVelo')
fig = ov.plt.figure(figsize=(4, 4))
ax = ov.plt.subplot(1, 1, 1)
ov.pl.embedding(
    adata,
    basis='X_umap',
    color='clusters',
    ax=ax,
    show=False,
    size=100,
    alpha=0.3
)
ov.pl.add_streamplot(
    adata,
    basis='X_umap',
    velocity_key='velocity_S_umap',
    ax=ax,
)
ov.plt.title('Velocity: scVelo')

Out[61]:

Text(0.5, 1.0, 'Velocity: scVelo')

In [62]:

Copied!

velo_obj.adata.write('data/velo/den_scvelo.h5ad')
velo_obj.adata.write('data/velo/den_scvelo.h5ad')

Dynamo¶

Single-cell (sc)RNA-seq, together with RNA velocity and metabolic labeling, reveals cellular states and transitions at unprecedented resolution. Fully exploiting these data, however, requires kinetic models capable of unveiling governing regulatory functions. Here, we introduce an analytical framework dynamo, which infers absolute RNA velocity, reconstructs continuous vector fields that predict cell fates, employs differential geometry to extract underlying regulations, and ultimately predicts optimal reprogramming paths and perturbation outcomes. We highlight dynamo’s power to overcome fundamental limitations of conventional splicing-based RNA velocity analyses to enable accurate velocity estimations on a metabolically labeled human hematopoiesis scRNA-seq dataset. Furthermore, differential geometry analyses reveal mechanisms driving early megakaryocyte appearance and elucidate asymmetrical regulation within the PU.1-GATA1 circuit. Leveraging the least-action-path method, dynamo accurately predicts drivers of numerous hematopoietic transitions. Finally, in silico perturbations predict cell-fate diversions induced by gene perturbations. Dynamo, thus, represents an important step in advancing quantitative and predictive theories of cell-state transitions.

Cite: Qiu, X., Zhang, Y., Martin-Rufino, J. D., Weng, C., Hosseinzadeh, S., Yang, D., ... & Weissman, J. S. (2022). Mapping transcriptomic vector fields of single cells. Cell, 185(4), 690-711.

In [63]:

Copied!





adata = scv.datasets.dentategyrus()
velo_obj=ov.single.Velo(adata)
velo_obj.filter_genes(min_shared_counts=20)
velo_obj.preprocess()
velo_obj.moments(backend='dynamo')
velo_obj.dynamics(backend='dynamo')
adata = scv.datasets.dentategyrus()
velo_obj=ov.single.Velo(adata)
velo_obj.filter_genes(min_shared_counts=20)
velo_obj.preprocess()
velo_obj.moments(backend='dynamo')
velo_obj.dynamics(backend='dynamo')

In Velo module, you should keep all genes' expression not normalized.
Filtered out 10340 genes that are detected 20 counts (shared).
|-----> Running monocle preprocessing pipeline...
|-----------> filtered out 13 outlier cells
|-----------> filtered out 797 outlier genes
|-----> PCA dimension reduction
|-----> <insert> X_pca to obsm in AnnData Object.
|-----> computing cell phase...
|-----> [Cell Phase Estimation] completed [102.7593s]
|-----> [Cell Cycle Scores Estimation] completed [0.1182s]
|-----> [Preprocessor-monocle] completed [1.0272s]
🖥️ Using Scanpy CPU to calculate neighbors...

🔍 K-Nearest Neighbors Graph Construction:
   Mode: cpu
   Neighbors: 30
   Method: umap
   Metric: euclidean
   Representation: X_pca
   PCs used: 30
computing neighbors
   🔍 Computing neighbor distances...
   🔍 Computing connectivity matrix...
   💡 Using UMAP-style connectivity
   ✓ Graph is fully connected

✅ KNN Graph Construction Completed Successfully!
   ✓ Processed: 2,917 cells with 30 neighbors each
   ✓ Results added to AnnData object:
     • 'neighbors': Neighbors metadata (adata.uns)
     • 'distances': Distance matrix (adata.obsp)
     • 'connectivities': Connectivity matrix (adata.obsp)
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:00)
|-----> calculating first/second moments...
|-----> [moments calculation] completed [8.0277s]
|-----> dynamics_del_2nd_moments_key is None. Using default value from DynamoAdataConfig: dynamics_del_2nd_moments_key=False

stimating gamma: 100%|██████████| 2000/2000 [00:55<00:00, 36.24it/s]

In [64]:

Copied!

velo_obj.cal_velocity(method='dynamo')
velo_obj.cal_velocity(method='dynamo')

|-----> [calculating transition matrix via pearson kernel with sqrt transform.] completed [1.5918s]
|-----> [projecting velocity vector to low dimensional embedding] completed [0.3863s]
|-----> method arg is None, choosing methods automatically...
|-----------> method kd_tree selected

In [65]:

Copied!





fig = ov.plt.figure(figsize=(4, 4))
ax = ov.plt.subplot(1, 1, 1)
ov.pl.embedding(
    adata,
    basis='X_umap',
    color='clusters',
    ax=ax,
    show=False,
    size=100,
    alpha=0.3
)
ov.pl.add_streamplot(
    adata,
    basis='X_umap',
    velocity_key='velocity_umap',
    ax=ax,
)
ov.plt.title('Velocity: Dynamo')
fig = ov.plt.figure(figsize=(4, 4))
ax = ov.plt.subplot(1, 1, 1)
ov.pl.embedding(
    adata,
    basis='X_umap',
    color='clusters',
    ax=ax,
    show=False,
    size=100,
    alpha=0.3
)
ov.pl.add_streamplot(
    adata,
    basis='X_umap',
    velocity_key='velocity_umap',
    ax=ax,
)
ov.plt.title('Velocity: Dynamo')

Out[65]:

Text(0.5, 1.0, 'Velocity: Dynamo')

In [66]:

Copied!

velo_obj.adata.write('data/velo/den_dynamo.h5ad')
velo_obj.adata.write('data/velo/den_dynamo.h5ad')

LatentVelo¶

Gene expression dynamics provide directional information for trajectory inference from single-cell RNA-sequencing data. Traditional approaches compute local RNA velocity using strict assumptions about the equations describing transcription and splicing of RNA. Not surprisingly, these approaches fail where these assumptions are violated, such as in multiple lineages with distinct gene dynamics or time-dependent kinetic rates of transcription and splicing. In this work we present “LatentVelo”, a novel approach to compute a low-dimensional representation of gene dynamics with deep learning. Our approach embeds cells into a latent space with a variational auto-encoder, and describes differentiation dynamics on this latent space with neural ordinary differential equations.

Cite: Farrell, S., Mani, M., & Goyal, S. (2023). Inferring single-cell transcriptomic dynamics with structured latent gene expression dynamics. Cell Reports Methods, 3(9).

In [18]:

Copied!

adata = scv.datasets.dentategyrus()
velo_obj=ov.single.Velo(adata)
velo_obj.filter_genes(min_shared_counts=20)
adata = scv.datasets.dentategyrus()
velo_obj=ov.single.Velo(adata)
velo_obj.filter_genes(min_shared_counts=20)

In Velo module, you should keep all genes' expression not normalized.
Filtered out 10340 genes that are detected 20 counts (shared).

In [19]:

Copied!





velo_obj.cal_velocity(
    method='latentvelo',device='cuda',
    velocity_key='velo_latentvelo',
    batch_size = 100, learning_rate=1e-2,
    epochs=50, param_name_key='result/dentate_gyrus', 
    grad_clip=100,
)
velo_obj.cal_velocity(
    method='latentvelo',device='cuda',
    velocity_key='velo_latentvelo',
    batch_size = 100, learning_rate=1e-2,
    epochs=50, param_name_key='result/dentate_gyrus', 
    grad_clip=100,
)

LatentVelo trainer using device: cuda
LatentVelo trainer(ANVI) using device: cuda
LatentVelo trainer(ATAC) using device: cuda
LatentVelo output using device: cuda
LatentVelo utils using device: cuda
Extracted 2000 highly variable genes.
Choosing top 2000 genes
computing PCA🔍
    with n_comps=50
   🖥️ Using sklearn PCA for CPU computation
   🖥️ sklearn PCA backend: CPU computation
    finished✅ (0:00:01)
🖥️ Using Scanpy CPU to calculate neighbors...

🔍 K-Nearest Neighbors Graph Construction:
   Mode: cpu
   Neighbors: 30
   Method: umap
   Metric: euclidean
   PCs used: 30
computing neighbors
   🔍 Computing neighbor distances...
    using 'X_pca' with n_pcs = 30
   🔍 Computing connectivity matrix...
   💡 Using UMAP-style connectivity
   ✓ Graph is fully connected

✅ KNN Graph Construction Completed Successfully!
   ✓ Processed: 2,930 cells with 30 neighbors each
   ✓ Results added to AnnData object:
     • 'neighbors': Neighbors metadata (adata.uns)
     • 'distances': Distance matrix (adata.obsp)
     • 'connectivities': Connectivity matrix (adata.obsp)
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:00)
computing moments based on connectivities
    finished (0:00:00) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
Warning, folder already exists. This may overwrite a previous fit.
860 velocity genes used

Loading best model at 49 epochs.

In [ ]:

Copied!





velo_obj.velocity_graph(
    vkey='velo_latentvelo',
    xkey='Ms',
    n_jobs=8,
)
velo_obj.velocity_graph(
    vkey='velo_latentvelo',
    xkey='Ms',
    n_jobs=8,
)

In [21]:

Copied!





velo_obj.velocity_embedding(
    basis='umap',
    vkey='velo_latentvelo',
)
velo_obj.velocity_embedding(
    basis='umap',
    vkey='velo_latentvelo',
)

computing velocity embedding
    finished (0:00:00) --> added
    'velo_latentvelo_umap', embedded velocity vectors (adata.obsm)

In [22]:

Copied!





fig = ov.plt.figure(figsize=(4, 4))
ax = ov.plt.subplot(1, 1, 1)
ov.pl.embedding(
    velo_obj.adata,
    basis='X_umap',
    color='clusters',
    ax=ax,
    show=False,
    size=100,
    alpha=0.3
)
ov.pl.add_streamplot(
    velo_obj.adata,
    basis='X_umap',
    velocity_key='velo_latentvelo_umap',
    ax=ax,
)
ov.plt.title('Velocity: LatentVelo')
fig = ov.plt.figure(figsize=(4, 4))
ax = ov.plt.subplot(1, 1, 1)
ov.pl.embedding(
    velo_obj.adata,
    basis='X_umap',
    color='clusters',
    ax=ax,
    show=False,
    size=100,
    alpha=0.3
)
ov.pl.add_streamplot(
    velo_obj.adata,
    basis='X_umap',
    velocity_key='velo_latentvelo_umap',
    ax=ax,
)
ov.plt.title('Velocity: LatentVelo')

Out[22]:

Text(0.5, 1.0, 'Velocity: LatentVelo')

In [23]:

Copied!

velo_obj.adata.write('data/velo/den_latentvelo.h5ad')
velo_obj.adata.write('data/velo/den_latentvelo.h5ad')

GraphVelo¶

RNA velocities and generalizations emerge as powerful approaches for extracting time-resolved information from high-throughput snapshot single-cell data. Yet, several inherent limitations restrict applying the approaches to genes not suitable for RNA velocity inference due to complex transcriptional dynamics, low expression, or lacking splicing dynamics, or data of non-transcriptomic modality. Here, we present GraphVelo, a graph-based machine learning procedure that uses as input the RNA velocities inferred from existing methods and infers velocity vectors lying in the tangent space of the low-dimensional manifold formed by the single cell data. GraphVelo preserves vector magnitude and direction information during transformations across different data representations. Tests on synthetic and experimental single-cell data, including viral-host interactome, multi-omics, and spatial genomics datasets demonstrate that GraphVelo, together with downstream generalized dynamo analyses, extends RNA velocities to multi-modal data and reveals quantitative nonlinear regulation relations between genes, virus, and host cells, and different layers of gene regulation.

Cite: Chen, Y., Zhang, Y., Gan, J. et al. GraphVelo allows for accurate inference of multimodal velocities and molecular mechanisms for single cells. Nat Commun 16, 7831 (2025). https://doi.org/10.1038/s41467-025-62784-w

In [2]:

Copied!





adata = scv.datasets.dentategyrus()
velo_obj=ov.single.Velo(adata)
velo_obj.filter_genes(min_shared_counts=20)
velo_obj.preprocess()
velo_obj.moments(backend='dynamo')
velo_obj.dynamics(backend='dynamo')
adata = scv.datasets.dentategyrus()
velo_obj=ov.single.Velo(adata)
velo_obj.filter_genes(min_shared_counts=20)
velo_obj.preprocess()
velo_obj.moments(backend='dynamo')
velo_obj.dynamics(backend='dynamo')

In Velo module, you should keep all genes' expression not normalized.
Filtered out 10340 genes that are detected 20 counts (shared).
|-----> Running monocle preprocessing pipeline...
|-----------> filtered out 13 outlier cells
|-----------> filtered out 797 outlier genes
|-----> PCA dimension reduction
|-----> <insert> X_pca to obsm in AnnData Object.
|-----> computing cell phase...
|-----> [Cell Phase Estimation] completed [1.4094s]
|-----> [Cell Cycle Scores Estimation] completed [0.1576s]
|-----> [Preprocessor-monocle] completed [1.1387s]
🖥️ Using Scanpy CPU to calculate neighbors...

🔍 K-Nearest Neighbors Graph Construction:
   Mode: cpu
   Neighbors: 30
   Method: umap
   Metric: euclidean
   Representation: X_pca
   PCs used: 30
computing neighbors
   🔍 Computing neighbor distances...
   🔍 Computing connectivity matrix...
   💡 Using UMAP-style connectivity
   ✓ Graph is fully connected

✅ KNN Graph Construction Completed Successfully!
   ✓ Processed: 2,917 cells with 30 neighbors each
   ✓ Results added to AnnData object:
     • 'neighbors': Neighbors metadata (adata.uns)
     • 'distances': Distance matrix (adata.obsp)
     • 'connectivities': Connectivity matrix (adata.obsp)
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:01)
|-----> calculating first/second moments...
|-----> [moments calculation] completed [11.2909s]
|-----> dynamics_del_2nd_moments_key is None. Using default value from DynamoAdataConfig: dynamics_del_2nd_moments_key=False

stimating gamma: 100%|██████████| 2000/2000 [00:55<00:00, 36.14it/s]

In [4]:

Copied!





velo_obj.cal_velocity(
    method='graphvelo',
    velocity_key='velo_graphvelo',
    n_jobs=1,
)
velo_obj.cal_velocity(
    method='graphvelo',
    velocity_key='velo_graphvelo',
    n_jobs=1,
)

|-----> Start computing neighbor graph...
|-----------> X_data is None, fetching or recomputing...
|-----> fetching X data from layer:None, basis:pca
|-----> method arg is None, choosing methods automatically...
|-----------> method ball_tree selected
Using existing pearson_transition_matrix found in .obsp.
|-----> [projecting velocity vector to low dimensional embedding] completed [0.3904s]
|-----> method arg is None, choosing methods automatically...
|-----------> method kd_tree selected




rojecting velocity vector to low dimensional embedding: 100%|██████████| 2917/2917 [00:00<00:00, 11122.38it/s]

In [ ]:

Copied!





velo_obj.velocity_graph(
    vkey='velo_graphvelo',
    xkey='Ms',
    n_jobs=8,
)
velo_obj.velocity_graph(
    vkey='velo_graphvelo',
    xkey='Ms',
    n_jobs=8,
)

In [6]:

Copied!





velo_obj.velocity_embedding(
    basis='umap',
    vkey='velo_graphvelo',
)
velo_obj.velocity_embedding(
    basis='umap',
    vkey='velo_graphvelo',
)

computing velocity embedding
    finished (0:00:00) --> added
    'velo_graphvelo_umap', embedded velocity vectors (adata.obsm)

/home/groups/xiaojie/steorra/env/omicverse/lib/python3.10/site-packages/IPython/core/pylabtools.py:77: DeprecationWarning: backend2gui is deprecated since IPython 8.24, backends are managed in matplotlib and can be externally registered.
  warnings.warn(

In [11]:

Copied!





fig = ov.plt.figure(figsize=(4, 4))
ax = ov.plt.subplot(1, 1, 1)
ov.pl.embedding(
    velo_obj.adata,
    basis='X_umap',
    color='clusters',
    ax=ax,
    show=False,
    size=100,
    alpha=0.3
)
ov.pl.add_streamplot(
    velo_obj.adata,
    basis='X_umap',
    velocity_key='velo_graphvelo_umap',
    ax=ax,
)
ov.plt.title('Velocity: GraphVelo')
fig = ov.plt.figure(figsize=(4, 4))
ax = ov.plt.subplot(1, 1, 1)
ov.pl.embedding(
    velo_obj.adata,
    basis='X_umap',
    color='clusters',
    ax=ax,
    show=False,
    size=100,
    alpha=0.3
)
ov.pl.add_streamplot(
    velo_obj.adata,
    basis='X_umap',
    velocity_key='velo_graphvelo_umap',
    ax=ax,
)
ov.plt.title('Velocity: GraphVelo')

Out[11]:

Text(0.5, 1.0, 'Velocity: GraphVelo')

In [12]:

Copied!

velo_obj.adata.write('data/velo/den_graphvelo.h5ad')
velo_obj.adata.write('data/velo/den_graphvelo.h5ad')

In [ ]: