Ivan Iossifov • Research

Research Interests

I work in computational biology, applying advanced machine learning and statistical modeling techniques to massive amounts of biomedical data. In my Ph.D. work I focused on building and refining models of molecular networks and on applying these models to biological and medical problems.

Using the Molecular Network to Analyze Common Hereditary Disorders

Common complex hereditary disorders are believed to be both multifactorial and heterogeneous. Traditional methods of genetic analysis work well on "simple" Mendelian disorders, in which variation at a single genomic locus almost deterministically influences whether the individual will get the disease, but fail when applied to complex (multifactorial and heterogeneous) diseases. We propose an extension of the classical linkage formalism based on an intuitive assumption: the loci that contribute to the risk of a given common disorder are functionally related and, therefore, the corresponding genes and proteins form a tight cluster in a molecular network.

Ivan Iossifov, Tian Zheng, Miron Baron, T. Conrad Gilliam, and Andrey Rzhetsky. Genetic-linkage Mapping of Complex Hereditary Disorders to a Whole-genome Molecular-interaction Network. (2008) Genome Research PubMed | Read the paper
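The clustering assumption above can be made concrete with a small sketch. This is not the linkage formalism from the paper; it only illustrates one possible way to quantify how "tight" a set of genes is in a network, using the mean pairwise shortest-path distance. The toy graph, the gene names, and the function names are all hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_pairwise_distance(graph, genes):
    """Average shortest-path distance over all pairs in `genes`.

    Smaller values mean a tighter cluster. Returns infinity if any
    pair is disconnected, and 0.0 for fewer than two genes.
    """
    total, pairs = 0, 0
    for a, b in combinations(list(genes), 2):
        d = shortest_path_lengths(graph, a).get(b)
        if d is None:
            return float("inf")
        total += d
        pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical undirected toy network (adjacency lists), not real data.
TOY_NETWORK = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}

tight = mean_pairwise_distance(TOY_NETWORK, ["A", "B", "C"])
spread = mean_pairwise_distance(TOY_NETWORK, ["A", "D", "F"])
print(tight, spread)
```

Under the assumption, a candidate set of risk loci whose genes score like the first set would be more plausible than one that scores like the second.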

Molecular Interaction Data Integration

There are multiple sources of information about molecular interactions, with the scientific literature and large-scale yeast two-hybrid assays being among the most utilized today. These two sources describe the same biological phenomena but have distinct biases and error patterns. For example, biologists tend to publish studies on areas of the network that have been popular at some time (e.g., neighborhoods of genes implicated in certain human diseases), while the high-throughput yeast two-hybrid screen is not equally easy to apply across all proteins (e.g., the yeast two-hybrid method is known not to work for membrane proteins). Hence, it is reasonable to hypothesize that describing and analyzing the two data sources in the context of a unified probabilistic model will lead to better coverage and potentially higher quality networks.

We built a composite model with separate probabilistic components describing the discovery and publication of statements about molecular interactions in the literature, the process of generating results from high-throughput experiments, and a model of protein-protein interactions. The protein-protein interaction component connects the whole model and is based on features of the proteins. Fitting the joint model to the available data allows us to estimate the error rates associated with the text-mined and yeast two-hybrid datasets and to predict novel protein-protein interactions.

Ivan Iossifov, Michael Krauthammer, Carol Friedman, Vasileios Hatzivassiloglou, Joel S. Bader, Kevin P. White, and Andrey Rzhetsky. Probabilistic pathway inference from noisy data sources. (2004) Bioinformatics 22: 1205–13 PubMed | Read the paper
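To illustrate why combining two noisy sources can help, here is a minimal sketch of the idea. It is much simpler than the composite model described above: it assumes the two sources are conditionally independent given the true interaction status, and it takes the error rates as known inputs rather than estimating them by fitting a joint model. The rates, the prior, and the function name are illustrative assumptions, not values from the paper.

```python
def posterior_interaction(prior, observations, rates):
    """Posterior probability that two proteins interact.

    prior:        assumed prior probability of an interaction
    observations: dict mapping source name -> True if that source
                  reports the interaction, False if it does not
    rates:        dict mapping source name -> (true_positive_rate,
                  false_positive_rate), both assumed known here
    """
    like_true = prior          # joint weight for "they interact"
    like_false = 1.0 - prior   # joint weight for "they do not interact"
    for source, reported in observations.items():
        tpr, fpr = rates[source]
        like_true *= tpr if reported else (1.0 - tpr)
        like_false *= fpr if reported else (1.0 - fpr)
    return like_true / (like_true + like_false)


# Hypothetical error rates, chosen only for illustration.
RATES = {
    "literature": (0.6, 0.05),
    "y2h": (0.3, 0.02),
}

both = posterior_interaction(0.01, {"literature": True, "y2h": True}, RATES)
lit_only = posterior_interaction(0.01, {"literature": True, "y2h": False}, RATES)
print(both, lit_only)
```

With these assumed rates, agreement between the two sources raises the posterior well above what either report gives alone, which is the intuition behind analyzing both sources within one probabilistic model.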

Biomedical Text-mining

The biomedical literature is a rich source of unstructured data about molecular interactions. Both the tremendous growth of the scientific literature and the need to look at biological systems at a global, system-wide level call for automated approaches to extracting information locked in text form. In our group we have developed the GeneWays system, which addresses this need.

Andrey Rzhetsky, Ivan Iossifov, Tomohiro Koike, Michael Krauthammer, Pauline Kra, Mitzi Morris, Hong Yu, Pablo A. Duboué, Wubin Weng, W. John Wilbur, Vasileios Hatzivassiloglou, Carol Friedman. GeneWays: a system for extracting, analyzing, visualizing, and integrating molecular pathway data. (2004) J Biomed Inform. 37: 43–53 PubMed | Read the paper

Any text-mining system is bound to make mistakes. We have developed a statistical approach to automate the identification of facts incorrectly extracted by the rule-based GeneWays system. This ability can direct future improvement of the system and, more importantly, can substantially improve its precision.

Raul Rodriguez-Esteban, Ivan Iossifov, and Andrey Rzhetsky. Imitating manual curation of text-mined facts in biomedicine. (2006) PLoS Comput Biol. 2: e118 PubMed | Read the paper

A more serious problem is that not all published statements are correct. Two observations indicate that such errors exist: some papers are eventually retracted, and statements from different articles are sometimes inconsistent. In particular, in our automatically extracted data, we observe sets of statements about a particular interaction containing both positive (claiming that the two entities interact) and negative (claiming that the two entities do not interact) assertions. We developed global statistical models describing the database of all extracted facts, which allowed us to resolve such inconsistencies and also to uncover curious trends in the collaborative dynamics of a scientific community. For example, we observed with high statistical confidence that already published statements do influence the interpretation of current experimental results.

Andrey Rzhetsky, Ivan Iossifov, Ji-Meng Loh, and Kevin P. White. Microparadigms: chains of collective reasoning in publications about molecular interactions. (2006) PNAS 103: 4940–5 PubMed | Read the paper

Murat Cokol, Ivan Iossifov, Chani Weinreb, and Andrey Rzhetsky. Emergent behavior of growing knowledge about molecular interactions. (2005) Nat Biotechnol. 23: 1243–7 PubMed | Read the paper

Murat Cokol, Ivan Iossifov, Raul Rodriguez-Esteban, and Andrey Rzhetsky. How many scientific papers should be retracted? (2007) EMBO reports 8: 422–3 PubMed | Read the paper