Inference and validation of large-scale causal gene regulatory networks from transcriptomic data
Presenter
September 29, 2016
Abstract
It is now established that complex biological phenotypes are not governed by single genes but instead by networks of interacting genes and gene products. As a consequence, deciphering the structure of the gene regulatory network (GRN) is crucial to further our understanding of fundamental processes in human cells. However, the mapping of molecular interactions in the intracellular realm remains a major bottleneck in the pipeline to produce biological knowledge from high-throughput biological data.
Multiple methods exist to infer undirected large-scale regulatory networks from collections of transcriptomic data. However, very few network inference methods can infer the directionality of predicted gene interactions, even though directionality is key to interpreting GRNs. Another challenge when inferring large-scale GRNs is quantitatively assessing their validity. Popular but weak validation procedures include (i) simulation; (ii) using incomplete ‘gold standard’ datasets, such as known transcription factors and their targets, which only partially recapitulate the interactions that can be inferred from transcriptomic data; and (iii) using low-throughput laboratory experiments to validate a few predicted interactions, which represent only a very small and potentially biased part of the inferred GRN.
To address these issues, we have developed mRMRe, an ensemble approach for network and causality inference with integration of priors. We applied our new method to a large collection of nearly 500,000 shRNA experiments with gene expression profiles of cancer cell lines before and after knockdown of 3,500 genes. This unique dataset allowed us to infer regulatory networks for 978 landmark genes in multiple cell types and to quantitatively assess their quality. Our results suggest that the complexity of the underlying biology, together with the noise present in the shRNA experiments and gene expression profiling, makes it very challenging to infer meaningful gene-gene interactions. Our study not only highlights the need for quantitatively assessing the predictive value of regulatory networks, but also provides evidence that a very large sample size does not necessarily yield high-quality networks. These results may open new avenues of research for integrative analysis of multiple data types.