AG DANK Autumn Meeting 2016
Program details
13:30-14:00 | Registration
14:00-14:10 | Hans-Joachim Mucha (WIAS Berlin): Welcome and opening
Session 1 (Chair: Christian Hennig)
14:10-14:50 | Karsten Tabelow (WIAS Berlin): "Functional Magnetic Resonance Imaging: Processing Large Datasets". Functional Magnetic Resonance Imaging (fMRI) is a versatile imaging technique for observing the human brain at work. Besides its scientific value for understanding the principles of the human mind, fMRI data analysis is now standard in clinical applications as well. In this talk we give a (surely incomplete) survey of fMRI analysis and data processing.
14:50-15:30 | Willi Sauerbrei (Universität Freiburg): "Regression model-building with continuous variables: the multivariable fractional polynomial (MFP) approach"
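The fractional polynomial (FP) transformations underlying MFP can be sketched as follows. This is a minimal illustration assuming the conventional power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, with x^0 read as log(x), and the usual rule that a repeated power p in an FP2 model contributes x^p and x^p log(x). The function names are hypothetical, and the MFP selection procedure itself (combining backward elimination with tests over FP degrees) is not shown.

```python
import math

# Conventional FP power set; a power of 0 is interpreted as log(x)
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, p):
    """Single fractional-polynomial basis term; requires x > 0."""
    if x <= 0:
        raise ValueError("FP terms need positive x; shift the variable first")
    return math.log(x) if p == 0 else x ** p


def fp2_basis(x, p1, p2):
    """Two basis columns of an FP2 model with powers (p1, p2).
    A repeated power p yields x**p and x**p * log(x)."""
    first = fp_term(x, p1)
    second = first * math.log(x) if p1 == p2 else fp_term(x, p2)
    return first, second


def all_fp2_pairs():
    """All FP2 power pairs with p1 <= p2 (36 for the 8-element power set)."""
    return [(a, b) for i, a in enumerate(FP_POWERS) for b in FP_POWERS[i:]]
```

In the usual formulation, each candidate pair's two columns replace x in the regression, and the best-fitting FP2 is the pair with the smallest deviance among the 36 candidates.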
15:30-16:00 | Coffee break
Session 2 (Chair: Ulrich Müller-Funk)
16:00-16:30 | Thorsten Dickhaus (Universität Bremen): "COMBI: Combining high-dimensional classification and multiple hypotheses testing for the analysis of big data in genetics". The standard approach to the analysis of genome-wide association studies (GWAS) is to test each position on the human genome individually for statistical significance of its association with the (binary) phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes into account, in a mathematically well-controlled manner, the correlation structures within the considered set of genetic markers, in our case single nucleotide polymorphisms (SNPs). Our novel two-stage algorithm, COMBI, first learns a high-dimensional classification model by training a support vector machine to determine a subset of candidate SNPs. Then, in a second stage of data analysis, a multiple hypotheses test is carried out for these candidate SNPs, employing a resampling-based p-value threshold correction that guarantees type I error control for the entire two-stage method. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published in the period 2008-2015, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI achieves higher power and precision than the examined alternatives, yielding fewer false (i.e., non-replicated) and more true (i.e., replicated) discoveries when its results are validated on later GWAS. More than 80% of the discoveries made by COMBI on the WTCCC data have been validated by independent studies. These findings are confirmed by computer simulations using semi-synthetic data. The presentation is based on Mieth et al. (2016).
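The two-stage logic, and in particular why the threshold is calibrated by re-running the whole pipeline on permuted labels, can be sketched in a toy, standard-library-only form. This is not the authors' COMBI implementation: stage 1 here replaces the support vector machine with a simple mean-difference screen (an assumption made to keep the sketch self-contained), stage 2 uses a 1-df allelic chi-square test, and all function names are hypothetical.

```python
import math
import random


def allelic_chi2_pvalue(genotypes, labels):
    """1-df allelic chi-square test. genotypes: minor-allele counts (0/1/2);
    labels: 0 = control, 1 = case."""
    # 2x2 table: rows = control/case, columns = minor/major allele counts
    table = [[0, 0], [0, 0]]
    for g, y in zip(genotypes, labels):
        table[y][0] += g
        table[y][1] += 2 - g
    total = sum(map(sum, table))
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = sum(table[r]) * (table[0][c] + table[1][c]) / total
            if expected > 0:
                chi2 += (table[r][c] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def screen(snps, labels, k):
    """Stage 1 STAND-IN (not an SVM): keep the k SNPs with the largest
    absolute difference in mean genotype between cases and controls."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case

    def score(snp):
        case_mean = sum(g for g, y in zip(snp, labels) if y == 1) / n_case
        ctrl_mean = sum(g for g, y in zip(snp, labels) if y == 0) / n_ctrl
        return abs(case_mean - ctrl_mean)

    ranked = sorted(range(len(snps)), key=lambda j: score(snps[j]), reverse=True)
    return ranked[:k]


def two_stage_pvalues(snps, labels, k):
    """Stage 2: test only the candidates selected in stage 1."""
    return {j: allelic_chi2_pvalue(snps[j], labels) for j in screen(snps, labels, k)}


def permutation_threshold(snps, labels, k, alpha=0.05, n_perm=200, seed=0):
    """Re-run BOTH stages on label-permuted data and take a lower quantile of
    the smallest p-value, so the threshold reflects the selection step."""
    rng = random.Random(seed)
    minima = []
    for _ in range(n_perm):
        permuted = labels[:]
        rng.shuffle(permuted)
        minima.append(min(two_stage_pvalues(snps, permuted, k).values()))
    minima.sort()
    # At most alpha * n_perm permuted minima lie strictly below this value
    return minima[int(alpha * n_perm)]


def combi_like(snps, labels, k, alpha=0.05, n_perm=200, seed=0):
    """Hypothetical driver: returns (indices of SNPs declared significant, threshold)."""
    t_star = permutation_threshold(snps, labels, k, alpha, n_perm, seed)
    observed = two_stage_pvalues(snps, labels, k)
    hits = sorted(j for j, p in observed.items() if p < t_star)
    return hits, t_star
```

Because the permuted data pass through the same screening step as the real data, the threshold accounts for the selection; this is the mechanism by which such resampling-based thresholds aim to control the family-wise error rate of the whole procedure. A real analysis would use far more permutations than the default here.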
16:30-17:00 | Marcus Weber & Konstantin Fackeldey (ZIB Berlin): "GenPCCA: Markov State Models for Non-Equilibrium Steady States". For equilibrium systems, Markov State Models (MSM) are a powerful tool for grouping states according to a metastability criterion. Given a reversible Markov chain, MSM exploit the eigenvalue structure of the underlying chain to detect metastable sets, so that the dynamics of a system in a high-dimensional space can be described by the entries of a small transition probability matrix. For Non-Equilibrium Steady States, however, the underlying Markov chain is no longer reversible, and the eigenvalue structure that forms the backbone of MSM can no longer be exploited. To overcome this, we present a novel MSM method, GenPCCA, which finds a low-dimensional description even of non-reversible Markov processes by using a Schur decomposition instead of eigenvectors. We demonstrate the performance of GenPCCA on gene expression networks.
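GenPCCA itself is not reproduced here, but the premise of the abstract can be checked numerically. A chain is reversible when detailed balance pi_i P_ij = pi_j P_ji holds, and a reversible transition matrix has a real spectrum, whereas a driven (non-reversible) chain can have complex eigenvalues. The two 3-state matrices below are made-up examples and the helper names are hypothetical; an actual Schur-based analysis would rely on a library routine such as SciPy's `scipy.linalg.schur`.

```python
import cmath


def stationary(P, iters=10_000, tol=1e-12):
    """Stationary distribution of an ergodic chain by power iteration (pi <- pi P)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def is_reversible(P, tol=1e-9):
    """Detailed balance check: pi_i * P_ij == pi_j * P_ji for all i, j."""
    pi = stationary(P)
    n = len(P)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < tol
               for i in range(n) for j in range(n))


def circulant_eigenvalues(first_row):
    """Eigenvalues of a circulant matrix: lambda_k = sum_m c_m * w**(m*k), w = exp(2*pi*i/n)."""
    n = len(first_row)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(c * w ** (m * k) for m, c in enumerate(first_row)) for k in range(n)]


# A reversible birth-death chain and a non-reversible cyclic ("driven") chain
P_rev = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
P_cyc = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]
```

The cyclic chain is doubly stochastic, so its stationary distribution is uniform, yet probability flows preferentially around the cycle and its non-trivial eigenvalues are complex. A real Schur decomposition still yields real invariant subspaces for such a matrix, which is the property the abstract relies on.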
17:00-17:40 | Andreas Geyer-Schulz (Universität Karlsruhe): "Recommender Systems for (Scientific) Libraries"
17:40-17:50 | Competition dataset: presentation of results (Chairs: C. Hennig / A. Mucha)
17:50-18:00 | Presentation by Gero Szepannek (Universität Stralsund)
18:00-18:10 | Presentation by Gunter Ritter (Universität Passau)
18:10-18:20 | Presentation by Marcus Weber (ZIB Berlin)
18:20-18:30 | Presentation by Reinhard Schachtner (Infineon AG)
19:30 | Workshop dinner at Restaurant Mutter Hoppe. Participants pay for their own meals.
Session 3 (Chair: Hans-Joachim Mucha)
09:00-09:30 | Christian Hennig (University College London): "Preprocessing, Distances and Football". Using a dataset of football player performance data, we discuss, by way of example, the decisions a user has to make when defining dissimilarities and clustering, namely representation, transformation, standardisation and variable weighting.
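Two of the decisions named above, standardisation and variable weighting, can be illustrated with a small sketch. The player variables and values are invented for illustration, range standardisation and a weighted Euclidean distance are only two of many possible choices, and the function names are hypothetical.

```python
import math
import statistics


def standardise(columns, method="range"):
    """Rescale each variable (dict: name -> values) so that no variable
    dominates a distance merely because of its measurement unit."""
    scaled = {}
    for name, values in columns.items():
        if method == "range":
            centre, spread = min(values), max(values) - min(values)
        elif method == "sd":
            centre, spread = statistics.mean(values), statistics.stdev(values)
        else:
            raise ValueError(f"unknown method: {method}")
        if spread == 0:
            raise ValueError(f"variable {name!r} is constant")
        scaled[name] = [(v - centre) / spread for v in values]
    return scaled


def weighted_euclidean(data, i, j, weights):
    """Dissimilarity between observations i and j; weights encode variable importance."""
    return math.sqrt(sum(w * (data[name][i] - data[name][j]) ** 2
                         for name, w in weights.items()))


# Hypothetical player data: goals and completed passes per season
raw = {"goals": [0, 10, 5], "passes": [1000, 1200, 3000]}
z = standardise(raw)
```

On the raw scale, the pass counts account for almost all of the distance between players 0 and 1; after range standardisation both variables contribute, and the weights turn their relative importance into an explicit user decision rather than a by-product of measurement units.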
09:30-10:00 | Gero Szepannek (Fachhochschule Stralsund): "On the Practical Relevance of Modern Machine Learning Algorithms for Credit Scoring Applications". Although many new algorithms, e.g. support vector machines, boosting, random forests and neural networks, have been proposed in the recent past, logistic regression still represents the gold standard in industrial practice. Benchmarking studies show the general superiority of flexible learning techniques that are able to detect complex structures. However, these studies typically restrict themselves to evaluating one or several performance measures (such as the misclassification rate) and ignore further aspects of practical feasibility. This paper critically investigates the pros and cons of modern machine learning techniques with respect to business requirements and their practical relevance. An illustrative case study applies random forests to credit scoring.
10:00-10:30 | Gunter Ritter (Universität Passau): "Probabilistic Variable Selection in Cluster Analysis"
10:30-11:00 | Coffee break
Session 4 (Chair: Berthold Lausen)
11:00-11:30 | Andreas Geyer-Schulz (Universität Karlsruhe): "On the Analysis of Irrational Behavior in Car Configuration Data"
11:30-12:00 | Adalbert Wilhelm (Jacobs University Bremen): "Predicting military conflicts by data-driven techniques"
12:00-12:30 | Bernd Fischer (DKFZ Heidelberg): "Inferring Directional Genetic Interactions from Combinatorial, Multi-parametric, Replicated Data". Genes display epistatic (genetic) interactions, whereby the presence of one genetic variant can mask, alleviate or amplify the phenotypic effect of other variants. We have developed computational and statistical methods for the analysis of large-scale, image-based genetic interaction screens. In the presented screen (Fischer et al., eLife, 2015), all pairwise gene knockdowns of 1367 × 72 genes were performed. This work presents the preprocessing, normalization and quality control for large-scale, image-based genetic interaction screens. Furthermore, we developed a new feature selection method that aims to separate biologically relevant information from technical noise; it uses information from replicated experiments. In the downstream analysis, a key problem is to estimate directional genetic interactions. Such a directional relationship is present, for instance, if one gene product positively or negatively regulates the activity of the other, if its function temporally precedes that of the other, or if its function is a necessary requirement for the action of the other. We developed a new method to detect directional interactions that requires multi-parametric data. The approach has been shown to recover known biological processes as well as a novel protein complex that reverses the effect of a signaling pathway in cancer.
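The directional method is not reproduced here, but the quantity such analyses build on, a pairwise interaction score measured against a null model of independent effects, can be sketched. This assumes the commonly used multiplicative null model (additive on the log scale), averaging of log values over replicates, and invented feature names and measurements; it is not the normalization, feature selection or directional estimation of Fischer et al., and the function names are hypothetical.

```python
import math
import statistics


def mean_log(replicates):
    """Average of log-transformed replicate measurements (values must be positive)."""
    return statistics.mean(math.log(x) for x in replicates)


def interaction_score(double, single_a, single_b, control):
    """Deviation of the double knockdown from the multiplicative null model:
    log(ab) - log(a) - log(b) + log(control), each a replicate mean of logs.
    0 means no interaction; negative means the double knockdown is lower
    than the null model predicts, positive means higher."""
    return mean_log(double) - mean_log(single_a) - mean_log(single_b) + mean_log(control)


def multi_feature_scores(features):
    """Score each image feature separately.
    features: name -> (double, single_a, single_b, control) replicate lists."""
    return {name: interaction_score(*reps) for name, reps in features.items()}


# Hypothetical single-replicate measurements for one gene pair
example = {
    "cell_count": ([10.0], [50.0], [80.0], [100.0]),
    "nuclear_area": ([40.0], [50.0], [80.0], [100.0]),
}
scores = multi_feature_scores(example)
```

Here "nuclear_area" matches the null model exactly (100 × 0.5 × 0.8 = 40), while "cell_count" falls well below it. The per-feature vector of scores is the kind of multi-parametric information the abstract says the directional method requires.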
12:30-13:00 | Hans-Joachim Mucha (WIAS Berlin) & Tatjana Mirjam Gluhak (Universität Mainz): "Finding Groups in Compositional Data". The talk is concerned with finding groups (clusters) in compositional data, that is, nonnegative data whose row sums (or column sums, respectively) equal a constant, usually 1 for proportions or 100 for percentages. Without loss of generality, the cluster analysis of observations (row points) of compositional data is considered here, where each row profile contains the parts of some whole. Special distance functions between the profiles are proposed. Finally, applications to archaeometry are presented.
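Closure (rescaling each row to a constant sum) and one possible distance between row profiles can be sketched as follows. The chi-square distance used here is the metric of correspondence analysis and is only an assumption chosen for illustration; the abstract does not say which special distance functions the talk proposes. The function names and the example table are hypothetical.

```python
import math


def close(row, total=1.0):
    """Closure: rescale a nonnegative row so that its parts sum to `total` (1 or 100)."""
    s = sum(row)
    if s <= 0:
        raise ValueError("row must have a positive sum")
    return [total * v / s for v in row]


def chi2_profile_distances(table):
    """Chi-square distances between row profiles, weighting each part by the
    inverse of its column mass (the metric used in correspondence analysis)."""
    grand = sum(map(sum, table))
    n_cols = len(table[0])
    col_mass = [sum(row[j] for row in table) / grand for j in range(n_cols)]
    profiles = [close(row) for row in table]
    n = len(table)
    return [[math.sqrt(sum((a - b) ** 2 / m
                           for a, b, m in zip(profiles[i], profiles[k], col_mass)
                           if m > 0))
             for k in range(n)]
            for i in range(n)]


# Hypothetical compositions of three samples (e.g. oxide amounts)
table = [[10, 20, 70], [1, 2, 7], [50, 30, 20]]
dist = chi2_profile_distances(table)
```

Rows 0 and 1 have the same composition at different totals, so their distance is zero: the distance depends only on the profiles, which is what one wants when the row sums carry no information.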