Workshop on Structure Adapting Methods - Abstract

Krämer, Nicole

Regularized estimation of large-scale gene association networks using graphical Gaussian models

(joint work with Juliane Schäfer and Anne-Laure Boulesteix)

Graphical Gaussian models are popular tools for estimating the dependency structure of a (possibly) large set of variables. Typically, little is known about the underlying structure apart from the assumption that the graph representing the dependencies is sparse. The reliable reconstruction of the graph structure in high-dimensional scenarios therefore remains a difficult task. In this talk, we study a general framework that combines regularized regression methods with the estimation of graphical Gaussian models. This framework includes existing methods (based on the Lasso and Partial Least Squares (PLS)) as well as two new approaches (based on Ridge Regression and the adaptive Lasso). We investigate the ability of the various methods to extract structural information from data, both in an extensive simulation study and in an application to six diverse real data sets from systems biology. All proposed algorithms are implemented in the R package parcor. Our findings are as follows.

Performance: In the simulations, the non-sparse regression methods (Ridge Regression and PLS) exhibit rather low power when combined with (local) false discovery rate multiple testing. Both sparse and non-sparse methods can deal with cluster topologies in the network. For sparse networks, we confirm the Lasso's well-known tendency to select too many edges, whereas the two-stage adaptive Lasso is an interesting alternative that provides sparser solutions. For PLS, we observe both a high mean squared error in the simulations and a high percentage of selected edges on some of the real data, which indicates that PLS may not be well suited for the reconstruction of network structures. On the six real data sets, the results obtained with sparse and non-sparse methods also differ clearly. For data that violate the assumption of uncorrelated observations (due to replications), the Lasso and the adaptive Lasso yield very complex structures, indicating that they might not be suited to these conditions, as they seem unable to adapt automatically to the underlying data structure.

Stability: All regression-based methods are less stable over different subsamples of the data than shrinkage-based matrix approaches, but there is no clear difference between sparse and non-sparse methods. Moreover, the Lasso and the adaptive Lasso seem to be unstable with respect to violations of the i.i.d. assumption on the samples.

Runtime: The computational load for the Lasso, and in particular for the adaptive Lasso, is considerable; for very high-dimensional data, this can constitute a severe limitation. While PLS and Ridge Regression are slower than shrinkage-based matrix approaches, both are fairly fast to compute, as they allow a kernel representation, i.e., most of the computation scales in the number of samples rather than in the number of variables.
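
To make the regression framework concrete, the following is a minimal sketch in R, not the parcor implementation: each variable is regressed on all others with a regularized method (here Ridge Regression with a hand-fixed penalty lambda), and the two regression coefficients per pair of variables are combined into a partial correlation via rho_ij = sign(b_ij) * sqrt(b_ij * b_ji). The function name, the fixed lambda, and the plain threshold on |rho| are simplifying assumptions; in practice the penalty is chosen by cross-validation and edges are selected by (local) fdr multiple testing.

    ## sketch: partial correlations from p node-wise ridge regressions
    ridge.partial.cor <- function(X, lambda = 1) {
      p <- ncol(X)
      X <- scale(X)
      B <- matrix(0, p, p)   # B[i, j]: coefficient of X_j when regressing X_i on the rest
      for (i in 1:p) {
        y  <- X[, i]
        Xi <- X[, -i, drop = FALSE]
        ## ridge estimate: (Xi'Xi + lambda I)^{-1} Xi'y
        beta <- solve(crossprod(Xi) + lambda * diag(p - 1), crossprod(Xi, y))
        B[i, -i] <- beta
      }
      ## combine the two regressions per pair:
      ## rho_ij = sign(b_ij) * sqrt(b_ij * b_ji), set to 0 if the signs disagree
      P <- sign(B) * sqrt(pmax(B * t(B), 0))
      diag(P) <- 1
      P
    }

    ## toy usage: n = 50 samples, p = 10 variables
    set.seed(1)
    X <- matrix(rnorm(50 * 10), 50, 10)
    P <- ridge.partial.cor(X, lambda = 10)
    ## crude edge selection by thresholding |rho|
    ## (a stand-in for the fdr-based multiple testing used in the talk)
    edges <- which(abs(P) > 0.2 & upper.tri(P), arr.ind = TRUE)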
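
The kernel representation mentioned for Ridge Regression rests on the standard identity (X'X + lambda I_p)^{-1} X'y = X'(XX' + lambda I_n)^{-1} y, so the dominant linear solve involves an n x n rather than a p x p matrix, which is what makes the method fast when p is much larger than n. A small numerical check of the identity (the dimensions below are hypothetical):

    set.seed(1)
    n <- 20; p <- 500; lambda <- 2
    X <- matrix(rnorm(n * p), n, p)
    y <- rnorm(n)
    ## primal form: solve a p x p system
    beta.primal <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
    ## kernel (dual) form: solve an n x n system only
    beta.kernel <- crossprod(X, solve(tcrossprod(X) + lambda * diag(n), y))
    max(abs(beta.primal - beta.kernel))   # agrees up to numerical error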