TxPert: using multiple knowledge graphs for prediction of transcriptomic perturbation effects

DOI: 10.1038/s41587-026-03113-4 (2026)

AI Summary

Accurately predicting cellular responses to genetic perturbations is essential for understanding disease mechanisms and designing effective therapies. Yet, exhaustively exploring the space of possible perturbations (for example, multigene perturbations or across tissues and cell types) is prohibitively expensive, motivating methods that can generalize to unseen conditions. We present TxPert, a latent-transfer-based deep learning method that uses multiple knowledge graphs of gene (product)–gene (product) relationships to predict transcriptomic perturbation effects. Different knowledge graphs encode complementary information and we show that a combination of graphs derived from biological databases and high-throughput perturbation screens yields the best performance. For predictions of single unseen perturbations, TxPert approaches the performance of split-half experimental reproducibility. For double unseen perturbations and single perturbations in a different cell line, its predictions increase Pearson Δ for unseen single perturbations by 8–25% over existing methods.
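The Pearson Δ metric named in the summary is commonly computed as the Pearson correlation between predicted and observed expression changes relative to control. The sketch below assumes that definition; the function names and toy values are illustrative and are not taken from the TxPert code, and the paper's exact aggregation (for example across batches or gene subsets) may differ.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)

def pearson_delta(pred_mean, obs_mean, control_mean):
    """Correlate predicted and observed shifts from the control profile.

    All three arguments are per-gene mean expression vectors (assumed layout).
    """
    pred_delta = [p - c for p, c in zip(pred_mean, control_mean)]
    obs_delta = [o - c for o, c in zip(obs_mean, control_mean)]
    return pearson(pred_delta, obs_delta)

# Toy example with four genes (values are made up for illustration).
control = [1.0, 2.0, 3.0, 4.0]
observed = [1.5, 1.0, 3.0, 5.0]
predicted = [1.4, 1.2, 3.1, 4.8]
score = pearson_delta(predicted, observed, control)
```

Subtracting the control profile first matters: raw expression profiles are dominated by baseline gene abundance, so correlating them directly would reward a model that predicts no perturbation effect at all.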


Extraction v3 · google/gemini-3.1-flash-lite · 5/15/2026

Executive Summary

TxPert is a deep learning framework for predicting cellular responses to genetic perturbations across diverse out-of-distribution (OOD) contexts. Using a latent-transfer mechanism, it combines a basal state representation with perturbation-specific embeddings learned through graph neural networks (GNNs). Integrating multiple knowledge graphs, including curated biological databases and internal proprietary screen data (PxMap, TxMap), lets the model draw on complementary biological knowledge for robust generalization. The framework excels in three primary OOD tasks: predicting unseen single perturbations in known cell lines, forecasting effects for novel combinatorial (double) perturbations, and generalizing to entirely new biological contexts (cell lines) not seen during training. Systematic benchmarking against previous models such as GEARS and scLAMBDA, as well as a nonlearned general baseline, shows that TxPert consistently achieves superior predictive performance. Notably, its accuracy on single unseen perturbations approaches the ceiling set by split-half experimental reproducibility. Beyond model performance, this work contributes a modular framework and best practices for transcriptomic perturbation analysis, including batch-appropriate control matching and evaluation using retrieval-based metrics. Although the model performs robustly overall, analysis reveals a specific failure mode: it does not accurately predict the downregulation of perturbation targets. Overall, TxPert provides a strong foundation for future research, moving the field toward more reliable, scalable virtual assays that could significantly accelerate therapeutic discovery and personalized medicine.
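The retrieval-based evaluation mentioned in the summary can be illustrated with a small sketch. It assumes one common formulation: for each perturbation, rank all observed effect profiles by similarity to the prediction and score how highly the true profile ranks. The function names, the use of Pearson similarity, and the toy data are assumptions for illustration; the paper's exact metric may be defined differently.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)

def retrieval_scores(pred, obs):
    """Score each perturbation by how well its prediction retrieves the true profile.

    pred, obs: dicts mapping perturbation name -> effect vector (delta vs. control).
    Returns a dict of scores in [0, 1]; 1.0 means no other observed profile
    is more similar to the prediction than the true one.
    """
    names = list(obs)
    if len(names) < 2:
        raise ValueError("need at least two perturbations to rank")
    scores = {}
    for p in names:
        sims = {q: pearson(pred[p], obs[q]) for q in names}
        # Count competitors that match the prediction better than the truth does.
        better = sum(1 for q in names if q != p and sims[q] > sims[p])
        scores[p] = 1.0 - better / (len(names) - 1)
    return scores

# Toy observed effects for three hypothetical perturbations.
obs = {
    "A": [1.0, 0.0, -1.0, 0.5],
    "B": [-1.0, 1.0, 0.0, 0.2],
    "C": [0.0, -1.0, 1.0, -0.3],
}
perfect = retrieval_scores(obs, obs)
# A predictor that confuses A and B is penalized on both.
swapped = retrieval_scores({"A": obs["B"], "B": obs["A"], "C": obs["C"]}, obs)
```

A metric of this kind rewards perturbation-specific predictions: a model that outputs the same average effect for every perturbation cannot rank the true profile above its competitors.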

Authors (15)

Frederik Wenkel (First Author)

Valence Labs, Montréal, Quebec, Canada

frederik@valencelabs.com

Wilson Tu

Valence Labs, Montréal, Quebec, Canada

ali@valencelabs.com

Cassandra Masschelein

Valence Labs, Montréal, Quebec, Canada


Fields of Study

Transcriptomic Perturbation Prediction, Computational Biology, Deep Learning, Functional Genomics, Drug Discovery, Systems Biology, Bioinformatics, Artificial Intelligence, Genomics, Biomedical Research

Key Findings (20)

1. TxPert effectively integrates multiple biological knowledge graphs (KGs) to improve perturbation effect prediction.

2. The model uses a latent-transfer-based deep learning architecture.

3. TxPert outperforms established baselines such as GEARS and scLAMBDA across various out-of-distribution (OOD) tasks.
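The latent-transfer idea in the findings above can be sketched in a few lines. This is a toy illustration, not the TxPert architecture: it assumes that perturbation embeddings are smoothed by one round of mean-neighbor message passing per graph, that embeddings from multiple graphs are averaged, and that the result is added to the basal latent state before decoding. All names, the additive combination, and the fixed (untrained) values are hypothetical.

```python
def propagate(embeddings, graph):
    """One round of message passing: each gene averages itself with its neighbors.

    embeddings: dict gene -> vector; graph: dict gene -> list of neighbor genes.
    """
    out = {}
    for gene, vec in embeddings.items():
        group = [vec] + [embeddings[n] for n in graph.get(gene, [])]
        out[gene] = [sum(vals) / len(group) for vals in zip(*group)]
    return out

def perturbation_embedding(gene, embeddings, graphs):
    """Average the gene's propagated embedding over several knowledge graphs."""
    per_graph = [propagate(embeddings, g)[gene] for g in graphs]
    return [sum(vals) / len(per_graph) for vals in zip(*per_graph)]

def predict(basal_latent, pert_genes, embeddings, graphs, decode):
    """Shift the basal latent state by each perturbation embedding, then decode."""
    z = list(basal_latent)
    for gene in pert_genes:
        z_p = perturbation_embedding(gene, embeddings, graphs)
        z = [a + b for a, b in zip(z, z_p)]
    return decode(z)

# G3 stands in for an unseen perturbation: its own embedding is zero, so any
# signal comes from its neighbors in the two toy graphs.
embeddings = {"G1": [1.0, 0.0], "G2": [0.0, 1.0], "G3": [0.0, 0.0]}
graph_a = {"G3": ["G1"]}  # e.g. a curated-database graph (hypothetical)
graph_b = {"G3": ["G2"]}  # e.g. a screen-derived graph (hypothetical)

identity = lambda z: z  # placeholder for a learned decoder
single = predict([1.0, 1.0], ["G3"], embeddings, [graph_a, graph_b], identity)
double = predict([1.0, 1.0], ["G1", "G3"], embeddings, [graph_a, graph_b], identity)
print(single)  # [1.25, 1.25]
print(double)  # [2.25, 1.25]
```

The example shows why graph structure can help with unseen perturbations: G3 receives a nonzero embedding only through its neighbors, and each graph contributes a different neighbor, which mirrors the claim that different knowledge graphs carry complementary information.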

Discussion & Future Directions

The discussion emphasizes the necessity of rigorous benchmarking in the field of transcriptomics-focused foundation models, as many prior models failed to outperform basic baselines. TxPert demonstrates the success of integrating diverse curated databases with large-scale high-throughput screening data via advanced graph modeling. Future directions include leveraging newly released single-cell datasets, extending to few-shot or active learning settings, and improving model capabilities for human primary tissues beyond immortalized cell lines. A crucial next step is the adoption of metrics that specifically evaluate the conditionality and specificity of perturbation effects in novel, unseen contexts.

References (47)

[1] Adduri, A. K. et al. Predicting cellular responses to perturbation across diverse contexts with State. Preprint at bioRxiv https://doi.org/10.1101/2025.06.26.661135 (2025).
[2] Ahlmann-Eltze, C., Huber, W. & Anders, S. Deep learning-based predictions of gene perturbation effects do not yet outperform simple linear methods. Nat. Methods 22, 1657–1661 (2025).
[3] Bendidi, I. et al. Benchmarking transcriptomics foundation models for perturbation analysis: one PCA still rules them all. Preprint at arXiv https://doi.org/10.48550/arXiv.2410.13956 (2024).