Accurately predicting cellular responses to genetic perturbations is essential for understanding disease mechanisms and designing effective therapies. Yet, exhaustively exploring the space of possible perturbations (for example, multigene perturbations or across tissues and cell types) is prohibitively expensive, motivating methods that can generalize to unseen conditions. We present TxPert, a latent-transfer-based deep learning method that uses multiple knowledge graphs of gene (product)–gene (product) relationships to predict transcriptomic perturbation effects. Different knowledge graphs encode complementary information and we show that a combination of graphs derived from biological databases and high-throughput perturbation screens yields the best performance. For predictions of single unseen perturbations, TxPert approaches the performance of split-half experimental reproducibility. For double unseen perturbations and single perturbations in a different cell line, its predictions increase Pearson Δ for unseen single perturbations by 8–25% over existing methods.
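As a point of reference for the Pearson Δ figure above, the following is a minimal sketch of how such a metric is commonly computed in perturbation benchmarks: the Pearson correlation between predicted and observed expression changes relative to a control mean. The function name, the toy vectors, and the choice to use all genes are assumptions made for illustration; the exact variant used by TxPert (gene selection, control matching, aggregation) is not specified here.

```python
import numpy as np


def pearson_delta(predicted, observed, control_mean):
    """Pearson correlation between predicted and observed expression changes.

    All inputs are 1-D arrays over genes (hypothetical pseudobulk means).
    The change ("delta") is taken relative to the control mean.
    """
    pred_delta = np.asarray(predicted, dtype=float) - control_mean
    obs_delta = np.asarray(observed, dtype=float) - control_mean
    return float(np.corrcoef(pred_delta, obs_delta)[0, 1])


# Toy data (assumed values, four genes).
control_mean = np.array([1.0, 2.0, 3.0, 4.0])
observed = np.array([1.5, 1.0, 3.0, 4.5])

# A prediction identical to the observation correlates perfectly.
perfect = observed.copy()
# A prediction with every change mirrored around control is anti-correlated.
inverted = 2 * control_mean - observed

score_perfect = pearson_delta(perfect, observed, control_mean)
score_inverted = pearson_delta(inverted, observed, control_mean)
```

Because the correlation is taken on changes rather than raw expression, a model that simply reproduces the control profile gets no credit for the large baseline similarity between perturbed and unperturbed cells.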
TxPert is a deep learning framework for predicting cellular responses to genetic perturbations across diverse out-of-distribution (OOD) contexts. Using a latent-transfer mechanism, TxPert combines a basal state representation with perturbation-specific embeddings learned through Graph Neural Networks (GNNs). Integrating multiple knowledge graphs, including curated biological databases and internal proprietary screen data (PxMap, TxMap), lets the model draw on complementary biological knowledge for robust generalization. The framework is evaluated on three primary OOD tasks: predicting unseen single perturbations in known cell lines, predicting effects of novel combinatorial (double) perturbations, and generalizing to entirely new biological contexts (cell lines) not seen during training. Systematic benchmarking against previous models such as GEARS and scLAMBDA, as well as a nonlearned general baseline, shows that TxPert consistently achieves superior predictive performance. Notably, its accuracy on single unseen perturbations approaches the ceiling set by split-half experimental reproducibility. Beyond model performance, this work contributes a modular framework and best practices for transcriptomic perturbation analysis, including batch-appropriate control matching and evaluation using retrieval-based metrics. Although the model performs robustly overall, analysis reveals a specific failure mode: it does not accurately predict the downregulation of perturbation targets. Overall, TxPert provides a strong foundation for future research, moving the field toward more reliable, scalable virtual assays that could significantly accelerate therapeutic discovery and personalized medicine.
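The latent-transfer idea described above can be illustrated with a small NumPy sketch: encode control expression into a basal latent state, derive perturbation embeddings by propagating node features over a gene graph, shift the basal state by the embedding of the perturbed gene(s), and decode. Everything here is an assumption made for illustration and not the TxPert implementation: the single toy graph, the one-layer GCN-style propagation, the additive combination, the summation over genes for double perturbations, and the random untrained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, latent_dim = 6, 4

# Hypothetical knowledge graph: an undirected chain over six genes.
adjacency = np.zeros((n_genes, n_genes))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adjacency[i, j] = adjacency[j, i] = 1.0


def normalize_adjacency(adj):
    """Symmetric normalization with self-loops, as in a GCN-style layer."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


# Untrained, randomly initialized parameters (assumed shapes).
w_encoder = rng.normal(size=(n_genes, latent_dim)) * 0.1
w_decoder = rng.normal(size=(latent_dim, n_genes)) * 0.1
w_gnn = rng.normal(size=(latent_dim, latent_dim)) * 0.1
node_features = rng.normal(size=(n_genes, latent_dim))


def perturbation_embeddings():
    """One round of graph propagation followed by a ReLU."""
    a_norm = normalize_adjacency(adjacency)
    return np.maximum(a_norm @ node_features @ w_gnn, 0.0)


def predict(control_expr, perturbed_genes):
    """Shift the basal latent state by the perturbation embedding(s), then decode."""
    idx = np.asarray(perturbed_genes, dtype=int)
    z_basal = control_expr @ w_encoder
    z_pert = perturbation_embeddings()[idx].sum(axis=0)  # zeros if idx is empty
    return (z_basal + z_pert) @ w_decoder


control_expr = rng.normal(size=n_genes)
single = predict(control_expr, [2])
double = predict(control_expr, [1, 3])
unperturbed = predict(control_expr, [])
```

In this sketch an unseen perturbation still receives an embedding because its graph neighbors contribute to it, which is the intuition behind using knowledge graphs for generalization. With a linear decoder and summed embeddings, the double-perturbation shift is exactly the sum of the two single shifts; a trained model would typically use nonlinear components to capture interactions.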
1. TxPert effectively integrates multiple biological knowledge graphs (KGs) to improve perturbation effect prediction.
2. The model utilizes a latent-transfer-based deep learning architecture.
3. TxPert outperforms established baselines such as GEARS and scLAMBDA across various out-of-distribution (OOD) tasks.
The discussion emphasizes the necessity of rigorous benchmarking in the field of transcriptomics-focused foundation models, as many prior models failed to outperform basic baselines. TxPert demonstrates the success of integrating diverse curated databases with large-scale high-throughput screening data via advanced graph modeling. Future directions include leveraging newly released single-cell datasets, extending to few-shot or active learning settings, and improving model capabilities for human primary tissues beyond immortalized cell lines. A crucial next step is the adoption of metrics that specifically evaluate the conditionality and specificity of perturbation effects in novel, unseen contexts.