Research Article Volume 1 Issue 2
Department of Biostatistics, University of Alabama at Birmingham, USA
Correspondence: Xiang-Yang Lou, Department of Biostatistics, University of Alabama at Birmingham, 1665 University Boulevard, RPHB 327, Birmingham, Alabama 35294-0022, USA, Tel 205-975-9145, Fax 205-975-2541
Received: September 18, 2014 | Published: October 8, 2014
Citation: Xiang-Yang L. Gene-gene and gene-environment interactions underlying complex traits and their detection. Biom Biostat Int J. 2014;1(2):30-32. DOI: 10.15406/bbij.2014.01.00007
No genes or environmental factors are isolated from the interactive genomic and epigenomic networks that shape a biological phenotype. Non-intuitive behavior and nonlinearity are natural properties of the network's architecture.1–4 Consequently, interactions among genes, called gene-gene (also known as epistatic) interactions, and between genes and environmental factors (broadly defined as all non-genetic exposures), called gene-environment (GE) interactions, are the norm rather than the exception.5–8 Several converging lines of evidence point to the dominant role of interactions in inherited traits;6–9 in particular, epistatic and GE interactions are considered among the primary culprits for missing heritability,10,11 a term referring to the majority of genetic variation that has not yet been identified by more than a decade of genome-wide association studies.12–14 Identifying background-specific factors among genes, in combination with lifestyle and environmental exposures, is an important scientific topic in genetics, breeding, and genetic epidemiology.
A high degree of context dependence in genetic architecture likely results in relatively weak marginal genotype-phenotype correlations for complex traits, making traditional univariate approaches that test for association one factor at a time futile.5,11 Multifactorial strategies are thus critical in hunting for highly mutually dependent factors underlying a trait. However, such a search faces a significant obstacle called “the curse of dimensionality”, a problem caused by the exponential increase in the number of possible interactions as the number of factors under consideration grows.15 Conventional regression methods, developed as extensions of single-factor approaches, are poorly suited to tackling ubiquitous yet elusive interactions because of several problems: heavy computational burden (often computational intractability), increased Type I and Type II errors, and reduced robustness and potential bias resulting from highly sparse data in a multifactorial model.16 Diverse novel approaches, such as data mining and machine learning, have recently been explored for various kinds of phenotypes,17–19 including Bayesian belief networks,20,21 tree-based algorithms such as multivariate adaptive regression splines (MARS),22 classification and regression trees (CART) or recursive partitioning methods23–25 and the random forests approach,26,27 pattern recognition approaches including neural network strategies such as the parameter decreasing method (PDM)28 and genetic programming optimized neural networks (GPNN),29 genetic algorithm strategies,30 the cellular automata (CA) approach,31 support vector machines (SVM),32 penalized regression,33 and Bayesian methods.34,35
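To make the scale of the problem concrete, the short Python sketch below counts the SNP combinations in an exhaustive search of a given order and the multilocus genotype cells each combination creates. The 500,000 SNPs and 2,000 subjects are hypothetical figures chosen only for illustration, three genotypes per biallelic SNP is an assumption, and the function name `interaction_search_size` is our own.

```python
from math import comb


def interaction_search_size(n_snps, order, n_subjects):
    """Size of an exhaustive search over SNP combinations of one order.

    Assumes biallelic SNPs with three genotypes each, so a k-SNP
    combination partitions subjects into 3**k multilocus genotype cells.
    """
    n_combinations = comb(n_snps, order)          # number of k-SNP subsets
    n_cells = 3 ** order                          # genotype cells per subset
    mean_subjects_per_cell = n_subjects / n_cells
    return n_combinations, n_cells, mean_subjects_per_cell


# Hypothetical study: 500,000 SNPs genotyped on 2,000 subjects.
for k in (1, 2, 3, 5):
    combos, cells, per_cell = interaction_search_size(500_000, k, 2_000)
    print(f"order {k}: {combos:.2e} combinations, {cells} cells, "
          f"~{per_cell:.1f} subjects per cell")
```

For a fixed order the number of combinations grows polynomially in the number of SNPs, but it rises steeply with the order, while the number of cells grows as 3^k, so the average cell quickly becomes too sparse to estimate interaction parameters reliably.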
Among these recently emerged methods, data reduction approaches (a constructive induction strategy), such as the multifactor dimensionality reduction (MDR) method,36,37 the combinatorial partitioning method,38 and the restricted partition method,39 are promising for addressing the multidimensionality problem. Rather than modeling the interaction term per se, as regression methods do, a data reduction strategy seeks a pattern in a combination of factors/attributes of interest that maximizes the phenotypic variation it explains. It treats the joint action as a whole, consistent with the original notion of epistasis coined by Bateson,40 and thereby avoids the decomposition used in regression methods, where the number of interaction parameters grows exponentially as each new variable is added. It also corresponds directly to the concept of the phenotypic landscape, which unifies biological, statistical-genetic, and evolutionary theories.41–45 Notably, the pioneering MDR method has remained popular for detecting interactions since its introduction.46
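A minimal sketch of the core MDR data-reduction step may help. For a chosen set of SNPs, each multilocus genotype cell is labeled high-risk if its case:control ratio meets or exceeds the overall ratio, which collapses the k-dimensional genotype table into a single binary attribute whose balanced accuracy can be compared across combinations. This is a simplified illustration under stated assumptions (genotypes coded 0/1/2, binary status coded 0/1, the overall case:control ratio as the threshold); the function names and simulated data are hypothetical, and it omits the cross-validation and permutation testing that a full MDR analysis uses.

```python
from itertools import combinations

import numpy as np


def mdr_high_risk(genotypes, status, snp_idx):
    """Label each subject high-risk (True) or low-risk (False) by pooling
    the multilocus genotype cells of the chosen SNPs (MDR-style)."""
    sub = genotypes[:, list(snp_idx)]
    # Encode each multilocus genotype (0/1/2 per SNP) as one base-3 cell id.
    cells = sub @ (3 ** np.arange(len(snp_idx)))
    n_cases = status.sum()
    threshold = n_cases / (len(status) - n_cases)  # overall case:control ratio
    high = np.zeros(len(status), dtype=bool)
    for c in np.unique(cells):
        in_cell = cells == c
        cases = status[in_cell].sum()
        controls = in_cell.sum() - cases
        # High-risk if the cell's case:control ratio meets the threshold
        # (written multiplicatively so cells with zero controls are handled).
        if cases >= threshold * controls:
            high[in_cell] = True
    return high


def balanced_accuracy(high, status):
    """Mean of sensitivity and specificity of the high/low-risk labels."""
    sensitivity = high[status == 1].mean()
    specificity = (~high[status == 0]).mean()
    return (sensitivity + specificity) / 2


def mdr_search(genotypes, status, order):
    """Exhaustively score every SNP combination of the given order."""
    best_combo, best_score = None, -1.0
    for combo in combinations(range(genotypes.shape[1]), order):
        score = balanced_accuracy(mdr_high_risk(genotypes, status, combo), status)
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score


# Hypothetical simulated data: 6 SNPs; risk depends only on whether
# SNP 0 and SNP 1 have an odd genotype sum (no marginal effect of either).
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.5, size=(2000, 6))
odd = (genotypes[:, 0] + genotypes[:, 1]) % 2 == 1
status = (rng.random(2000) < np.where(odd, 0.8, 0.2)).astype(int)

best, score = mdr_search(genotypes, status, order=2)
print(best, round(score, 3))
```

Because the simulated risk depends only on the parity of the two-locus genotype sum, neither SNP shows a marginal association, yet pooling the nine two-locus cells recovers the pair. Without cross-validation the training accuracy is optimistic, especially at higher orders where cells are sparse.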
Several extensions of MDR have been developed for analyzing different types of traits (e.g., binary, count, continuous, polytomous, ordinal, time-to-onset, multivariate, and others, as well as combinations of these) and for accommodating various study designs, including homogeneous and admixed samples of unrelated subjects, family samples, and mixtures of them.47 Such extensions include the inclusion of covariates,48,49 continuous traits,49 survival data,50,51 multivariate phenotypes,52,53 multi-categorical or ordinal phenotypes,47,54 case-control studies in structured populations,55,56 family studies,57,58 and unified analysis of both unrelated and related samples.59 With these extensions, MDR-type methods offer a powerful tool for handling a broad range of data types and for addressing statistical issues associated with study design and sampling scheme.
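For covariates and continuous traits, one way to keep the same pooling idea is to replace case:control counts with per-subject scores from a covariate-only null model. The sketch below fits an ordinary least-squares null model, uses the residuals as scores, and labels a cell high-risk when its mean residual is positive. It is a simplified illustration loosely in the spirit of score-based extensions such as GMDR, not the published GMDR algorithm; the helper names, the squared-correlation evaluation statistic, and the simulated age covariate are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def residual_scores(y, covariates):
    """Residuals of y from a covariate-only least-squares fit (the null model)."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def score_high_risk(genotypes, scores, snp_idx):
    """Label a cell high-risk when its mean residual score is positive."""
    cells = genotypes[:, list(snp_idx)] @ (3 ** np.arange(len(snp_idx)))
    high = np.zeros(len(scores), dtype=bool)
    for c in np.unique(cells):
        in_cell = cells == c
        if scores[in_cell].mean() > 0:
            high[in_cell] = True
    return high


def explained_fraction(scores, high):
    """Squared correlation between the scores and the high/low split
    (an illustrative statistic, assumed here for simplicity)."""
    if high.all() or not high.any():
        return 0.0
    return np.corrcoef(scores, high.astype(float))[0, 1] ** 2


# Hypothetical continuous trait: an age effect plus a pure SNP0 x SNP1 interaction.
rng = np.random.default_rng(1)
n = 1500
genotypes = rng.binomial(2, 0.5, size=(n, 5))
age = rng.uniform(20, 70, size=n)
odd = (genotypes[:, 0] + genotypes[:, 1]) % 2
y = 0.05 * age + 1.0 * odd + rng.normal(size=n)

scores = residual_scores(y, age)
results = {
    combo: explained_fraction(scores, score_high_risk(genotypes, scores, combo))
    for combo in combinations(range(genotypes.shape[1]), 2)
}
best = max(results, key=results.get)
print(best, round(results[best], 3))
```

Adjusting for the covariate before pooling means the cells are ranked on the part of the trait the null model leaves unexplained, so the same search loop serves binary and continuous phenotypes once an appropriate null model supplies the scores.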
Despite methodological progress in detecting multifactor interactions, difficult computational challenges and multiple hypothesis testing problems remain in practice, especially when detecting high-order interactions in large-scale data such as whole-genome data. Further theoretical and computational work is required to identify the interacting factors underlying complex traits effectively.
The author thanks Guo-Bo Chen, Hai-Ming Xu, Xi-Wei Sun, and Lei Yan for their contributions to the development of GMDR. This project was supported in part by NIH Grant DA025095 to X.-Y.L.
The author declares no conflict of interest.
None.
©2014 Xiang-Yang. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.