Program at a glance

Conference Brochure

Conference Mobile App

We created an event for the conference on EventBase with all details about the program. You can browse it by downloading the EventBase app from the app store and searching for “ECML PKDD 2017”.

Journal Special Issues

Pierre-Philippe Mathieu

ESA/ESRIN, EO Science, Applications and New Technologies

Enabling a smarter planet with Earth Observation

Frank Hutter

University of Freiburg

Towards end-to-end learning & optimization

John Quackenbush

Dana-Farber Cancer Institute
and Harvard TH Chan School of Public Health

Using Networks to Link Genotype to Phenotype

Alex Graves

Google DeepMind

Frontiers in Recurrent Neural Network Research

Cordelia Schmid

INRIA

Automatic Understanding of the Visual World

Inderjit Dhillon

University of Texas at Austin

Multi-Target Prediction via Low-Rank Embeddings

View article

Conference Track

September 19 @ 10:00

CONGRESS HALL 1

Arbitrated Ensemble for Time Series Forecasting

Vitor Cerqueira, Luis Torgo, Fábio Pinto, Carlos Soares

Best Student ML paper

Reproducible Research

This paper proposes an ensemble method for time series forecasting tasks. Combining different forecasting models is a common approach to tackle these tasks. State-of-the-art methods track the loss of the available models and adapt their weights accordingly. Metalearning strategies such as stacking are also used in these tasks. We propose a metalearning approach for adaptively combining forecasting models that specializes them across the time series. Our assumption is that different forecasting models have different areas of expertise and a varying relative performance. Moreover, many time series show recurring structures due to factors such as seasonality. Therefore, the ability of a method to deal with changes in relative performance of models as well as recurrent changes in the data distribution can be very useful in dynamic environments. Our approach is based on an ensemble of heterogeneous forecasters, arbitrated by a metalearning model. This strategy is designed to cope with the different dynamics of time series and quickly adapt the ensemble to regime changes. We validate our proposal using time series from several real world domains. Empirical results show the competitiveness of the method in comparison to state-of-the-art approaches for combining forecasters.

Download article

September 19 @ 11:00

CONGRESS HALL 1

FCNNs: Fourier Convolutional Neural Networks

Harry Pratt, Bryan Williams, Frans Coenen, Yalin Zheng

session: Neural Networks and Deep Learning I

Reproducible Research

The Fourier domain is used in computer vision and machine learning because image analysis tasks in the Fourier domain are analogous to spatial domain methods but are achieved using different operations. Convolutional Neural Networks (CNNs) use machine learning to achieve state-of-the-art results with respect to many computer vision tasks. One of the main limiting aspects of CNNs is the computational cost of updating a large number of convolution parameters. Further, in the spatial domain, larger images take exponentially longer than smaller images to train on CNNs due to the operations involved in convolution methods. Consequently, CNNs are often not a viable solution for large image computer vision tasks. In this paper a Fourier Convolution Neural Network (FCNN) is proposed whereby training is conducted entirely within the Fourier domain. The advantage offered is that there is a significant speed up in training time without loss of effectiveness. Using the proposed approach larger images can therefore be processed within viable computation time. The FCNN is fully described and evaluated. The evaluation was conducted using the benchmark CIFAR-10 and MNIST datasets, and a bespoke fundus retina image dataset. The results demonstrate that convolution in the Fourier domain gives a significant speed up without adversely affecting accuracy. For simplicity the proposed FCNN concept is presented in the context of a basic CNN architecture; however, the FCNN concept has the potential to improve the speed of any neural network system involving convolution.
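The abstract above rests on the convolution theorem: a circular convolution in the spatial domain equals an element-wise product in the Fourier domain. The following is a minimal sketch of that identity only, not the authors' FCNN. The helper names (`dft`, `circular_convolve_direct`, `circular_convolve_fourier`) are hypothetical, and the naive transform used here is quadratic, so it shows the equivalence rather than any speed-up; in practice a fast Fourier transform would take its place.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); illustration only."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(
            values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t in range(n)
        )
        # The inverse transform carries the 1/n normalisation.
        result.append(total / n if inverse else total)
    return result


def circular_convolve_direct(signal, kernel):
    """Circular convolution computed directly in the spatial domain."""
    n = len(signal)
    return [
        sum(signal[j] * kernel[(i - j) % n] for j in range(n))
        for i in range(n)
    ]


def circular_convolve_fourier(signal, kernel):
    """Same convolution via an element-wise product of the two spectra."""
    product = [a * b for a, b in zip(dft(signal), dft(kernel))]
    return [value.real for value in dft(product, inverse=True)]


# Toy 1-D "image" and a kernel zero-padded to the same length.
signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.25, 0.0, 0.0]

direct = circular_convolve_direct(signal, kernel)
fourier = circular_convolve_fourier(signal, kernel)
```

Both routes give the same values up to floating-point rounding, which is why a network can, in principle, perform its convolutions as products of spectra.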

Download article

September 19 @ 11:00

CONGRESS HALL 2

Cost-Sensitive Perceptron Decision Trees for Imbalanced Drifting Data Streams

Bartosz Krawczyk, Przemyslaw Skryjomski

session: Time Series and Streams I

Mining streaming and drifting data is among the most popular contemporary applications of machine learning methods. Due to the potentially unbounded number of instances arriving rapidly, evolving concepts and limitations imposed on utilized computational resources, there is a need to develop efficient and adaptive algorithms that can handle such problems. These learning difficulties can be further augmented by the appearance of skewed distributions during the stream progress. Class imbalance in non-stationary scenarios is highly challenging, as not only the imbalance ratio but also the class relationships may change over time. In this paper we propose an efficient and fast cost-sensitive decision tree learning scheme for handling online class imbalance. In each leaf of the tree we train a perceptron with output adaptation to compensate for skewed class distributions, while McDiarmid's bound is used for controlling the splitting attribute selection. The cost matrix automatically adapts itself to the current imbalance ratio in the stream, allowing for a smooth compensation of evolving class relationships. Furthermore, we analyze the characteristics of minority class instances and incorporate this information during the training process. This allows our classifier to focus on the most difficult instances, while a sliding window keeps track of the changes in class structures. Experimental analysis carried out on a number of binary and multi-class imbalanced data streams indicates the usefulness of the proposed approach.

Download article

September 19 @ 11:20

CONGRESS HALL 1

Multimodal Classification for Analysing Social Media

~~Chi Thang Duong, Remi Lebret, Karl Aberer~~

session: Neural Networks and Deep Learning I

This paper was accepted for presentation. However, it was not presented at the conference and is thus not published in the conference proceedings.

September 19 @ 11:20

CONGRESS HALL 2

Learning TSK Fuzzy Rules from Data Streams

Ammar Shaker, Waleri Heldt, Eyke Huellermeier

session: Time Series and Streams I

Learning from data streams has received increasing attention in recent years, not only in the machine learning community but also in other research fields, such as computational intelligence and fuzzy systems. In particular, several rule-based methods for the incremental induction of regression models have been proposed. In this paper, we develop a method that combines the strengths of two existing approaches rooted in different learning paradigms. Our method induces a set of fuzzy rules, which, compared to conventional rules with Boolean antecedents, has the advantage of producing smooth regression functions. To do so, it makes use of an induction technique inspired by AMRules, a very efficient and effective learning algorithm that can be seen as the state of the art in machine learning. We conduct a comprehensive experimental study showing that a combination of the expressiveness of fuzzy rules with the algorithmic concepts of AMRules yields a learning system with superb performance.

Download article

September 19 @ 11:40

CONGRESS HALL 1

Sequence Generation with Target Attention

Yingce Xia, Fei Tian, Tao Qin, Nenghai Yu, Tie-Yan Liu

session: Neural Networks and Deep Learning I

The source-target attention mechanism (briefly, source attention) has become one of the key components in a wide range of sequence generation tasks, such as neural machine translation, image captioning, and open-domain dialogue generation. In these tasks, the attention mechanism, which typically controls the information flow from the encoder to the decoder, enables every component of the target sequence to be generated by relying on different source components. While the source attention mechanism has attracted much research interest, little of that work has asked whether the generation of the target sequence can additionally benefit from attending back to itself, an idea that is intuitively motivated by the nature of attention. To investigate this question, we propose a new target-target attention mechanism (briefly, target attention). As the target sequence is generated, the target attention mechanism takes into account the relationship between the component to be generated and its preceding context within the target sequence, so that it can better maintain coherence and improve the readability of the generated sequence. Furthermore, it complements the information from source attention so as to further enhance semantic adequacy. After designing an effective approach to incorporate target attention into the encoder-decoder framework, we conduct extensive experiments on both neural machine translation and image captioning. Experimental results clearly demonstrate the effectiveness of our design of integrating both source and target attention for sequence generation tasks.

Download article

September 19 @ 11:40

CONGRESS HALL 2

Non-Parametric Online AUC Maximization

Balazs Szorenyi, Snir Cohen, Shie Mannor

session: Time Series and Streams I

We consider the problems of online and one-pass maximization of the area under the ROC curve (AUC). AUC maximization is hard even in the offline setting and thus solutions often make some compromises. Existing results for the online problem typically optimize for some proxy defined via surrogate losses instead of maximizing the real AUC. This approach is confirmed by results showing that the optimum of these proxies, over the set of all (measurable) functions, maximizes the AUC. The problem is that, in order to meet the strong requirements for per-round run time complexity, online methods typically work with restricted hypothesis classes and this, as we show, corrupts the above compatibility and causes the methods to converge to suboptimal solutions even in some simple stochastic cases. To remedy this, we propose a different approach and show that it leads to asymptotic optimality. Our theoretical claims and considerations are tested by experiments on real datasets, which provide empirical justification for them.
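For reference, the "real AUC" that the abstract above contrasts with surrogate losses can be computed on a finite sample as the fraction of positive-negative pairs that the scores rank correctly. The sketch below is an offline, quadratic-time illustration of that quantity only; it is not the online method proposed in the paper, and the function name `empirical_auc` and the toy data are hypothetical.

```python
def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    correct = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                correct += 1.0
            elif pos_score == neg_score:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example: three positives, two negatives.
scores = [0.9, 0.8, 0.35, 0.6, 0.1]
labels = [1, 0, 1, 1, 0]
auc = empirical_auc(scores, labels)  # 4 of the 6 pairs are ordered correctly
```

Because the pairwise indicator is discontinuous in the scores, methods usually replace it with a smooth surrogate, which is the compromise the abstract examines.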

Download article

September 19 @ 11:40

CONGRESS HALL 3

Dynamic Ensemble Selection with Probabilistic Classifier Chains

Anil Narassiguin, Haytham Elghazel, Alexandre Aussem

session: Ensembles and Meta Learning

Reproducible Research

Dynamic ensemble selection (DES) is the problem of finding, given an input x, a subset of models among the ensemble that achieves the best possible prediction accuracy. Recent studies have reformulated the DES problem as a multi-label classification problem and promising performance gains have been reported. However, their approaches may converge to an incorrect, and hence suboptimal, solution as they don't optimize the true (but non-standard) loss function directly. In this paper, we show that the label dependencies have to be captured explicitly and propose a DES method based on Probabilistic Classifier Chains. Experimental results on 20 benchmark data sets show the effectiveness of the proposed method against competitive alternatives, including the aforementioned multi-label approaches. Keywords: Dynamic ensemble selection, Multi-label learning, Probabilistic Classifier Chains

Download article

September 19 @ 12:00

CONGRESS HALL 3

Nikolaos Tziortziotis, Christos Dimitrakakis

session: Probabilistic Models and Methods I

This paper proposes a fully Bayesian approach for Least-Squares Temporal Differences (LSTD), resulting in fully probabilistic inference of value functions that avoids the overfitting commonly experienced with classical LSTD when the number of features is larger than the number of samples. Sparse Bayesian learning provides an elegant solution through the introduction of a prior over value function parameters. This gives us the advantages of probabilistic predictions, a sparse model, and good generalisation capabilities, as irrelevant parameters are marginalised out. The algorithm efficiently approximates the posterior distribution through variational inference. We demonstrate the ability of the algorithm in avoiding overfitting experimentally.

Download article

September 19 @ 14:20

CONGRESS HALL 3

Behavioral Constraint Template-Based Sequence Classification

Johannes De Smedt, Galina Deeva, Jochen De Weerdt

session: Pattern and Sequence Mining

Reproducible Research

In this paper we present the interesting Behavioral Constraint Miner (iBCM), a new approach towards classifying sequences. The prevalence of sequential data, i.e., a collection of ordered items such as text, website navigation patterns, traffic management, and so on, has incited a surge in research interest towards sequence classification. Existing approaches mainly focus on retrieving sequences of itemsets and checking their presence in labeled data streams to obtain a classifier. The proposed iBCM approach, rather than focusing on plain sequences, is template-based and draws its inspiration from behavioral patterns used for software verification. These patterns have a broad range of characteristics and go beyond the typical sequence mining representation, allowing for a more precise and concise way of capturing sequential information in a database. Furthermore, it is possible to also mine for negative information, i.e., sequences that do not occur. The technique is benchmarked against other state-of-the-art approaches and exhibits a strong potential towards sequence classification.

Download article

September 19 @ 14:40

CONGRESS HALL 2

K-clique-graphs for Dense Subgraph Discovery

Giannis Nikolentzos, Polykarpos Meladianos, Yannis Stavrakas, Michalis Vazirgiannis

session: Networks and Graphs I

Reproducible Research

Finding dense subgraphs in a graph is a fundamental graph mining task, with applications in several fields. Algorithms for identifying dense subgraphs are used in biology, in finance, in spam detection, etc. Standard formulations of this problem such as the problem of finding the maximum clique of a graph are hard to solve. However, some tractable formulations of the problem have also been proposed, focusing mainly on optimizing some density function, such as the degree density and the triangle density. However, maximization of degree density usually leads to large subgraphs with small density, while maximization of triangle density does not necessarily lead to subgraphs that are close to being cliques. In this paper, we introduce the k-clique-graph densest subgraph problem, k ≥ 3, a novel formulation for the discovery of dense subgraphs. Given an input graph, its k-clique-graph is a new graph created from the input graph where each vertex of the new graph corresponds to a k-clique of the input graph and two vertices are connected with an edge if they share a common (k - 1)-clique. We define a simple density function, the k-clique-graph density, which gives compact and at the same time dense subgraphs, and we project its resulting subgraphs back to the input graph. In this paper we focus on the triangle-graph densest subgraph problem obtained for k = 3. To optimize the proposed function, we provide an exact algorithm. Furthermore, we present an efficient greedy approximation algorithm that scales well to larger graphs. We evaluate the proposed algorithms on real datasets and compare them with other algorithms in terms of the size and the density of the extracted subgraphs. The results verify the ability of the proposed algorithms to find high-quality subgraphs in terms of size and density. Finally, we apply the proposed method to the important problem of keyword extraction from textual documents.
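The k-clique-graph construction defined in the abstract above can be sketched directly from its definition. This is a brute-force illustration for small graphs, assuming an undirected edge list with sortable node labels. The function name `k_clique_graph` is hypothetical, and the sketch covers neither the density function nor the exact and greedy algorithms of the paper.

```python
from itertools import combinations


def k_clique_graph(edges, k=3):
    """Build the k-clique-graph of an undirected graph by brute force.

    Returns (cliques, clique_edges): each clique is a frozenset of k nodes,
    and two cliques are linked when they share a common (k - 1)-clique.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    nodes = sorted(adjacency)
    # A k-subset is a k-clique when every pair of its nodes is adjacent.
    cliques = [
        frozenset(candidate)
        for candidate in combinations(nodes, k)
        if all(b in adjacency[a] for a, b in combinations(candidate, 2))
    ]
    # Two distinct k-cliques sharing k - 1 nodes share a (k - 1)-clique.
    clique_edges = [
        (first, second)
        for first, second in combinations(cliques, 2)
        if len(first & second) == k - 1
    ]
    return cliques, clique_edges


# Toy input: a 4-clique on nodes 1..4 plus a pendant edge (4, 5).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
triangles, triangle_edges = k_clique_graph(edges, k=3)
```

On the toy input, node 5 belongs to no triangle and therefore does not appear in the triangle-graph, which hints at how the construction filters out sparsely connected vertices.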

Download article

September 19 @ 14:40

CONGRESS HALL 3

Efficient Sequence Regression by Learning Linear Models in All-Subsequence Space

Severin Gsponer, Barry Smyth, Georgiana Ifrim

session: Pattern and Sequence Mining

Reproducible Research

We present a new approach for learning a sequence regression function, i.e., a mapping from sequential observations to a numeric score. Our learning algorithm employs coordinate gradient descent with Gauss-Southwell optimization in the feature space of all subsequences. We give a tight upper bound for the coordinate wise gradients of squared error loss which enables efficient Gauss-Southwell selection. The proposed bound is built by separating the positive and the negative gradients of the loss function and exploits the structure of the feature space. Extensive experiments on simulated as well as real-world sequence regression benchmarks show that the bound is effective and our proposed learning algorithm is efficient and accurate. The resulting linear regression model provides the user with a list of the most predictive features selected during the learning stage, adding to the interpretability of the method.

Download article

September 19 @ 14:40

CONGRESS HALL 1

Discovery of Causal Models that Contain Latent Variables through Bayesian Scoring of Independence Constraints

Fattaneh Jabbari, Gregory Cooper, Joseph Ramsey, Peter Spirtes

session: Probabilistic Models and Methods I

Reproducible Research

Discovering causal structure from observational data in the presence of latent variables remains an active research area. Constraint-based causal discovery algorithms are relatively efficient at discovering such causal models from data using independence tests. Typically, however, they derive and output only one such model. In contrast, Bayesian methods can generate and probabilistically score multiple models, outputting the most probable one; however, they are often computationally infeasible to apply when modeling latent variables. We introduce a hybrid method that derives a Bayesian probability that the set of independence tests associated with a given causal model are jointly correct. Using this constraint-based scoring method, we are able to score multiple causal models, which possibly contain latent variables, and output the most probable one. The structure-discovery performance of the proposed method is compared to an existing constraint-based method (RFCI) using data generated from several previously published Bayesian networks. The structural Hamming distances of the output models improved when using the proposed method compared to RFCI, especially for small sample sizes.

Download article

September 19 @ 15:20

CONGRESS HALL 2

Local Lanczos Spectral Approximation for Membership Identification

Pan Shi, Kun He, David Bindel, John Hopcroft

session: Networks and Graphs I

Reproducible Research

We propose a novel approach called the Local Lanczos Spectral Approximation (LLSA) for identifying all latent members of a local community from very few seed members. To reduce the computational complexity, we first apply a fast heat kernel diffusion to sample a comparatively small subgraph covering almost all possible community members around the seeds. Then, starting from a normalized indicator vector of the seeds and applying a few steps of Lanczos iteration on the sampled subgraph, we obtain a local eigenvector that approximates the eigenvector of the transition matrix with the largest eigenvalue. The elements of this local eigenvector are a relaxed indicator for the affiliation probability of the corresponding nodes to the target community. We conduct extensive experiments on real-world datasets in various domains as well as synthetic datasets. Results show that the proposed method outperforms state-of-the-art local community detection algorithms. To the best of our knowledge, this is the first work to adapt the Lanczos method for local community detection, which is natural and potentially effective. We also make the first attempt to use the heat kernel as a sampling method instead of detecting communities directly, which is shown empirically to be very efficient and effective.

Download article

September 19 @ 15:20

CONGRESS HALL 3

Subjectively Interesting Connecting Trees

Florian Adriaens, Jefrey Lijffijt, Tijl De Bie

session: Pattern and Sequence Mining

Reproducible Research

Consider a large network, and a user-provided set of query nodes between which the user wishes to explore relations. For example, a researcher may want to connect research papers in a citation network, an analyst may wish to connect organized crime suspects in a communication network, or an internet user may want to organize their bookmarks given their location in the world wide web. A natural way to show how query nodes are related is in the form of a tree in the network that connects them. However, in sufficiently dense networks, most such trees will be large or somehow trivial (e.g. involving high degree nodes) and thus not insightful. In this paper, we define and investigate the new problem of mining subjectively interesting trees connecting a set of query nodes in a network, i.e., trees that are highly surprising to the specific user at hand. Using information theoretic principles, we formalize the notion of interestingness of such trees mathematically, taking into account any prior beliefs the user has specified about the network. We then propose heuristic algorithms to find the best trees efficiently, given a specified prior belief model. Modeling the user's prior belief state is however not necessarily computationally tractable. Yet, we show how a highly generic class of prior beliefs, namely about individual node degrees in combination with the density of particular sub-networks, can be dealt with in a tractable manner. Such types of beliefs can be used to model knowledge of a partial or total order of the network nodes, e.g. where the nodes represent events in time (such as papers in a citation network). An empirical validation of our methods on a large real network evaluates the different heuristics and validates the interestingness of the given trees.

Download article

September 19 @ 16:00

CONGRESS HALL 3

Adaptive Skip-Train Structured Regression for Temporal Networks

Martin Pavlovski, Fang Zhou, Ivan Stojkovic, Ljupco Kocarev, Zoran Obradovic

session: Regression

Reproducible Research

A broad range of high impact applications involve learning a predictive model in a temporal network environment. In weather forecasting, predicting effectiveness of treatments, outcomes in healthcare and in many other domains, networks are often large, while intervals between consecutive time moments are brief. Therefore, models are required to forecast in a more scalable and efficient way, without compromising accuracy. The Gaussian Conditional Random Field (GCRF) is a widely used graphical model for performing structured regression on networks. However, GCRF is not applicable to large networks and it cannot capture different network substructures (communities) since it considers the entire network while learning. In this study, we present a novel model, Adaptive Skip-Train Structured Ensemble (AST-SE), which is a sampling-based structured regression ensemble for prediction on top of temporal networks. AST-SE takes advantage of the scheme of ensemble methods to allow multiple GCRFs to learn from several subnetworks. The proposed model is able to automatically skip the entire training or some phases of the training process. The prediction accuracy and efficiency of AST-SE were assessed and compared against alternatives on synthetic temporal networks and the H3N2 Virus Influenza network. The obtained results provide evidence that (1) AST-SE is ~140 times faster than GCRF as it skips retraining quite frequently; (2) It still captures the original network structure more accurately than GCRF while operating solely on partial views of the network; (3) It outperforms both unweighted and weighted GCRF ensembles which also operate on subnetworks but require retraining at each timestep.

Download article

September 19 @ 16:20

CONGRESS HALL 2

Generalized Inverse Reinforcement Learning on Linearly Solvable MDP

Masahiro Kohjima, Tatsushi Matsubayashi, Hiroshi Sawada

session: Reinforcement Learning

In this paper, we consider a generalized variant of inverse reinforcement learning (IRL) that estimates both a cost (negative reward) function and a transition probability from observed optimal behavior. In theoretical studies of standard IRL, which estimates only the cost function, it is well known that IRL involves a non-identifiable problem, i.e., the cost function cannot be determined uniquely. This problem has been solved by using a new class of Markov decision process (MDP) called a linearly solvable MDP (LMDP). In this paper, we investigate whether a non-identifiable problem occurs in the generalized variant of IRL (gIRL) using the framework of LMDP and construct a new gIRL method. The contributions of this study are summarized as follows: (i) We point out that gIRL with LMDP suffers from a non-identifiable problem. (ii) We propose a Bayesian method to escape the non-identifiable problem. (iii) We validate the proposed method by performing an experiment on synthetic data and real car probe data.

Download article

September 19 @ 16:20

CONGRESS HALL 4

Malware Detection by Analysing Encrypted Network Traffic with Neural Networks

Paul Prasse, Lukas Machlika, Tomas Pevny, Jiri Havelka, Tobias Scheffer

session: Privacy and Security

We study the problem of detecting malware on client computers based on the analysis of HTTPS traffic. Here, malware has to be detected based on the host address, timestamps, and data volume information of the computer's network traffic. We develop a scalable protocol that allows us to collect network flows of known malicious and benign applications as training data and derive a malware-detection method based on a neural embedding of domain names and a long short-term memory network that processes network flows. We study the method's ability to detect new malware in a large-scale empirical study.

Download article

September 19 @ 16:20

CONGRESS HALL 3

ALADIN: A New Approach for Drug–Target Interaction Prediction

Krisztian Buza, Ladislav Peska

session: Regression

Reproducible Research

Due to its pharmaceutical applications, one of the most prominent machine learning challenges in bioinformatics is the prediction of drug–target interactions. State-of-the-art approaches are based on various techniques, such as matrix factorization, restricted Boltzmann machines, network-based inference and bipartite local models (BLM). In this paper, we extend BLM by the incorporation of a hubness-aware regression technique coupled with an enhanced representation of drugs and targets in a multi-modal similarity space. Additionally, we propose to build a projection-based ensemble. Our Advanced Local Drug–Target Interaction Prediction technique (ALADIN) is evaluated on publicly available real-world drug–target interaction datasets. The results show that our approach statistically significantly outperforms BLM-NII, a recent version of BLM, as well as NetLapRLS and WNN-GIP.

Download article

September 19 @ 16:20

CONGRESS HALL 1

Multi-view Generative Adversarial Networks

Mickael Chen, Ludovic Denoyer

session: Probabilistic Models and Methods II

Learning over multi-view data is a challenging problem with strong practical applications. Most related studies focus on the classification point of view and assume that all the views are available at any time. We consider an extension of this framework in two directions. First, based on the BiGAN model, the Multi-view BiGAN (MV-BiGAN) is able to perform density estimation from multi-view inputs. Second, it can deal with missing views and is able to update its prediction when additional views are provided. We illustrate these properties on a set of experiments over different datasets.

Download article

September 19 @ 16:40

CONGRESS HALL 3

Co-Regularised Support Vector Regression

Katrin Ullrich, Michael Kamp, Thomas Gärtner, Martin Vogt, Stefan Wrobel

Sylvain Lamprier, Thibault Gisselbrecht, Patrick Gallinari

session: Reinforcement Learning

In this paper, we introduce a novel non-stationary bandit setting, called relational recurrent bandit, where reward expectations at successive time steps are interdependent. The aim is to discover temporal and structural dependencies between arms in order to maximize the cumulative collected reward. Two algorithms are proposed: the first directly models temporal dependencies between arms, while the second assumes the existence of hidden states of the system to explain the rewards of arms. For both approaches, we develop a Variational Thompson Sampling method, which approximates distributions on every hidden variable via variational inference, and uses the estimated distributions to sample reward expectations at each iteration of the process. Experiments conducted on both synthetic and real data demonstrate the effectiveness of our approaches.

Download article

September 19 @ 17:20

CONGRESS HALL 1

Semi-supervised Bayesian Deep Multi-modal Emotion Recognition

~~Changde Du, Changying Du, Jinpeng Li, Wei-long Zheng, Baoliang Lv, Huiguang He~~

session: Probabilistic Models and Methods II

This paper was accepted for presentation. However, it was not presented at the conference and is thus not published in the conference proceedings.

September 20 @ 11:00

CONGRESS HALL 3

Deep Discrete Hashing with Self-supervised Labels

Jingkuan Song, Tao He, Hangbo Fan, Lianli Gao

session: Feature Selection and Extraction

Reproducible Research

Hashing methods have been widely used for applications of large-scale image retrieval and classification. Non-deep hashing methods using handcrafted features have been significantly outperformed by deep hashing methods due to their better feature representation and end-to-end learning framework. However, the most striking successes in deep hashing have mostly involved discriminative models, which require labels. In this paper, we propose a novel unsupervised deep hashing method, named Deep Discrete Hashing (DDH), for large-scale image retrieval and classification. In the proposed framework, we address two main problems: 1) how to directly learn discrete binary codes? 2) how to equip the binary representation with the ability of accurate image retrieval and classification in an unsupervised way? We resolve these problems by introducing an intermediate variable and a loss function steering the learning process, which is based on the neighborhood structure in the original space. Experiments on real datasets show that our method can significantly outperform other unsupervised methods to achieve the state-of-the-art performance for image retrieval and object recognition.

Download article

September 20 @ 11:20

CONGRESS HALL 2

Yochai Blau, Tomer Michaeli

session: Feature Selection and Extraction

Spectral dimensionality reduction algorithms are widely used in numerous domains, including for recognition, segmentation, tracking and visualization. However, despite their popularity, these algorithms suffer from a major limitation known as the “repeated Eigen-directions” phenomenon. That is, many of the embedding coordinates they produce typically capture the same direction along the data manifold. This leads to redundant and inefficient representations that do not reveal the true intrinsic dimensionality of the data. In this paper, we propose a general method for avoiding redundancy in spectral algorithms. Our approach relies on replacing the orthogonality constraints underlying those methods by unpredictability constraints. Specifically, we require that each embedding coordinate be unpredictable (in the statistical sense) from all previous ones. We prove that these constraints necessarily prevent redundancy, and provide a simple technique to incorporate them into existing methods. As we illustrate on challenging high-dimensional scenarios, our approach produces significantly more informative and compact representations, which improve visualization and classification tasks.

Download article

September 20 @ 12:00

CONGRESS HALL 1

Speeding up Hyper-parameter Optimization by Extrapolation of Learning Curves using Previous Builds

Akshay Chandrashekaran, Ian Lane

session: Learning and Optimization I

Reproducible Research

Recent work has shown that extrapolating learning curves to decide when to terminate a training build is effective in reducing the number of training epochs required to find a well-performing hyper-parameter configuration. However, the current technique uses the information only from the current build to make the prediction. We propose a simple regression-based extrapolation model that uses the trajectories from previous builds to make predictions for new builds. This can be used to terminate poorly performing builds and thus speed up hyper-parameter search with performance comparable to non-augmented hyper-parameter optimization techniques. We compare the predictions made by our model against those of the existing extrapolation technique in different tasks. We incorporate our approach into a pre-existing termination criterion, and incorporate this termination criterion into an existing hyper-parameter optimization toolkit. We analyze the performance of our approach and contrast it against a baseline in terms of quality of prediction in three different tasks. We show that our approach yields builds with performance comparable to the non-augmented version with fewer epochs, and outperforms an existing parametric extrapolation technique for two out of three tasks in terms of number of required epochs.
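The idea of predicting a new build's outcome from previous builds can be illustrated with a minimal sketch. It is not the authors' model: it assumes each finished build is stored as a list of per-epoch validation scores of equal length, fits a plain least-squares line from "score at the current epoch" to "final score", and stops a build whose predicted final score falls below the best one seen so far. The names `fit_line`, `should_terminate` and `margin` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of ys ~ a * xs + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # All previous builds look identical at this epoch: predict the mean.
        return 0.0, my
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def should_terminate(previous_curves, partial_curve, best_final, margin=0.0):
    """Decide whether to stop the running build.

    previous_curves: complete score curves of finished builds (assumed format)
    partial_curve:   scores of the running build up to the current epoch
    best_final:      best final score observed so far
    Returns (stop, predicted_final_score).
    """
    t = len(partial_curve) - 1
    xs = [curve[t] for curve in previous_curves]   # score at the same epoch
    ys = [curve[-1] for curve in previous_curves]  # final score of that build
    a, b = fit_line(xs, ys)
    predicted = a * partial_curve[t] + b
    return predicted < best_final - margin, predicted
```

For example, `should_terminate([[0.2, 0.4, 0.6], [0.3, 0.5, 0.7], [0.1, 0.3, 0.5]], [0.15, 0.35], best_final=0.7)` predicts a final score of about 0.55 and recommends stopping.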

Download article

September 20 @ 12:00

CONGRESS HALL 3

Rethinking Unsupervised Feature Selection: From Pseudo Labels to Pseudo Must-links

Xiaokai Wei, Sihong Xie, Bokai Cao, Philip Yu

session: Feature Selection and Extraction

Reproducible Research

High-dimensional data are prevalent in various machine learning applications. Feature selection is a useful technique for alleviating the curse of dimensionality. The unsupervised feature selection problem tends to be more challenging than its supervised counterpart due to the lack of class labels. State-of-the-art approaches usually use the concept of pseudo labels to select discriminative features by their regression coefficients, but the pseudo labels derived from clustering are usually inaccurate. In this paper, we propose a new perspective for unsupervised feature selection by Discriminatively Exploiting Similarity (DES). Through forming similar and dissimilar data pairs, implicit discriminative information can be exploited. The similar/dissimilar relationship of data pairs can be used as guidance for feature selection. Based on this idea, we propose hypothesis testing based and classification based methods as instantiations of the DES framework. We evaluate the proposed approaches extensively using six real-world datasets. Experimental results demonstrate that our approaches significantly outperform the state-of-the-art unsupervised methods. More surprisingly, our unsupervised method even achieves performance comparable to a supervised feature selection method.

Download article

September 20 @ 12:20

CONGRESS HALL 1

Adrian Perez-Suay, Valero Laparra, Gonzalo Mateo-García, Jordi Muñoz-Marí, Luis Gómez-Chova, Gustau Camps-Valls

session: Kernel Methods I

New social and economic activities massively exploit big data and machine learning algorithms to do inference on people's lives. Applications include automatic curricula evaluation, wage determination, and risk assessment for credits and loans. Recently, many governments and institutions have raised concerns about the lack of fairness, equity and ethics in machine learning to treat these problems. It has been shown that not including sensitive features that bias fairness, such as gender or race, is not enough to mitigate the discrimination when other related features are included. Instead, including fairness in the objective function has been shown to be more efficient. We present novel fair regression and dimensionality reduction methods built on a previously proposed fair classification framework. Both methods rely on using the Hilbert-Schmidt independence criterion as the fairness term. Unlike previous approaches, this allows us to simplify the problem and to use multiple sensitive variables simultaneously. Replacing the linear formulation by kernel functions allows the methods to deal with nonlinear problems. For both linear and nonlinear formulations the solution reduces to solving simple matrix inversions or generalized eigenvalue problems. This simplifies the evaluation of the solutions for different trade-off values between the predictive error and fairness terms. We illustrate the usefulness of the proposed methods in toy examples, and evaluate their performance on real world datasets to predict income using gender and/or race discrimination as sensitive variables, and contraceptive method prediction under demographic and socio-economic sensitive descriptors.
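As a rough illustration of the fairness term, the sketch below computes a biased empirical estimate of the Hilbert-Schmidt independence criterion, trace(K H L H) / n^2, between predictions and a sensitive variable. This is a generic textbook estimator, not the authors' code: the 1/n^2 normalization is one common convention, the kernels are assumed symmetric, and the names `centered_gram`, `hsic` and `linear` are hypothetical.

```python
def centered_gram(values, kernel):
    """Return H K H, where K is the Gram matrix and H = I - (1/n) 1 1^T."""
    n = len(values)
    k = [[kernel(a, b) for b in values] for a in values]
    row_mean = [sum(row) / n for row in k]
    total_mean = sum(row_mean) / n
    # K is symmetric (assumed), so column means equal row means.
    return [
        [k[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys, kernel_x, kernel_y):
    """Biased empirical HSIC: trace(K H L H) / n^2."""
    n = len(xs)
    kc = centered_gram(xs, kernel_x)                     # H K H
    l = [[kernel_y(a, b) for b in ys] for a in ys]       # L
    # trace(H K H L) = sum_ij (H K H)_ij * L_ji
    return sum(kc[i][j] * l[j][i] for i in range(n) for j in range(n)) / n ** 2


def linear(a, b):
    """Linear kernel on scalars."""
    return a * b
```

With linear kernels the estimate reduces to the squared (biased) covariance, so `hsic([1, 2, 3, 4], [2, 4, 6, 8], linear, linear)` is about 6.25, while a constant second argument gives 0. A value near 0 indicates that the predictions carry little linear information about the sensitive variable.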

Download article

September 20 @ 14:00

CONGRESS HALL 1

CON-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec

Tanay Kumar Saha, Shafiq Joty, Mohammad Al Hasan

session: Neural Networks and Deep Learning II

Reproducible Research

We present a novel approach to learning distributed representations of sentences from unlabeled data by modeling the content and context of a sentence. The content model learns sentence representation by predicting its words. The context model comprises a neighbor prediction component and a regularizer to model distributional and proximity hypotheses, respectively. We propose an online algorithm to train the model components jointly. For the first time, we evaluate the models in a setup where contextual information is available to infer the sentence vectors. The experimental results on tasks involving classifying, clustering, and ranking sentences show that our model outperforms the best existing models by a wide margin across multiple datasets.

Download article

September 20 @ 14:20

CONGRESS HALL 1

Antti Ukkonen, Vladimir Dzyuba, Matthijs Van Leeuwen

session: Subgroup Discovery

We propose a novel approach to finding explanations of deviating subsets, often called subgroups. Existing approaches for subgroup discovery rely on various quality measures that nonetheless often fail to find subgroup sets that are diverse, of high quality, and most importantly, provide good explanations of the deviations that occur in the data. To tackle this issue we introduce explanation networks, which provide a holistic view on all candidate subgroups and how they relate to each other, offering elegant ways to select high-quality yet diverse subgroup sets. Explanation networks are constructed by representing subgroups by nodes and having weighted edges represent the extent to which one subgroup explains another. Explanatory strength is defined by extending ideas from database causality, in which interventions are used to quantify the effect of one query on another. Given an explanatory network, existing network analysis techniques can be used for subgroup discovery. In particular, we study the use of PageRank for pattern ranking and seed selection (from influence maximization) for pattern set selection. Experiments on synthetic and real data show that the proposed approach finds subgroup sets that are more likely to capture the generative processes of the data than other methods.
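To make the ranking step concrete, here is a minimal weighted PageRank by power iteration on a small directed graph whose nodes stand for subgroups. It is a sketch under assumptions: the abstract does not say how edge directions and weights enter the ranking, so the convention `weights[u][v]` for an edge from u to v, the damping factor 0.85 and the node names are all hypothetical.

```python
def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    weights[u][v] is the weight of the directed edge u -> v.
    Returns a dict mapping each node to its score (scores sum to 1).
    """
    nodes = sorted(set(weights) | {v for out in weights.values() for v in out})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = weights.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: spread its mass uniformly over all nodes.
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                for v, w in out.items():
                    new[v] += damping * rank[u] * w / total
        rank = new
    return rank
```

For instance, `pagerank({"A": {"C": 1.0}, "B": {"C": 1.0}, "C": {}})` assigns the highest score to "C", the node that both others point to.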

Download article

September 20 @ 16:00

CONGRESS HALL 2

Bayesian Nonlinear Support Vector Machines for Big Data

Florian Wenzel, Theo Galy-Fajou, Matthäus Deutsch, Marius Kloft

session: Kernel Methods II

Reproducible Research

We propose a fast inference method for Bayesian nonlinear support vector machines that leverages stochastic variational inference and inducing points. Our experiments show that the proposed method is faster than competing Bayesian approaches and scales easily to millions of data points. It provides additional features over frequentist competitors such as accurate predictive uncertainty estimates and automatic hyperparameter search.

Download article

September 20 @ 16:20

CONGRESS HALL 2

Entropic Trace Estimation for Log Determinants

Jack Fitzsimons, Diego Granziol, Kurt Cutajar, Michael Osborne, Maurizio Filippone, Stephen Roberts

session: Kernel Methods II

Reproducible Research

The scalable calculation of matrix determinants has been a bottleneck to the widespread application of many machine learning methods such as determinantal point processes, Gaussian processes, generalised Markov random fields, graph models and many others. In this work, we estimate log determinants under the framework of maximum entropy, given information in the form of moment constraints from stochastic trace estimation. The estimates demonstrate a significant improvement on state-of-the-art alternative methods, as shown on a wide variety of matrices from the SuiteSparse Matrix Collection. By taking the example of a general Markov random field, we also demonstrate how this approach can significantly accelerate inference in large-scale learning methods involving the log determinant.
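The moment constraints mentioned above can be obtained with stochastic trace estimation. The sketch below uses Hutchinson's estimator with Rademacher probe vectors to approximate tr(A^k) using only matrix-vector products. It covers only this first stage; the maximum entropy step that turns moments into a log-determinant estimate is not shown. The function names and the dense list-of-lists matrix format are assumptions made for illustration.

```python
import random


def matvec(a, v):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def estimate_moments(a, max_power, num_probes=30, seed=0):
    """Estimate tr(A^k) for k = 1..max_power with Hutchinson's estimator.

    Each probe z has independent +/-1 entries, so E[z^T A^k z] = tr(A^k).
    """
    rng = random.Random(seed)
    n = len(a)
    sums = [0.0] * max_power
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        v = z
        for k in range(max_power):
            v = matvec(a, v)  # v now holds A^(k+1) z
            sums[k] += sum(z_i * v_i for z_i, v_i in zip(z, v))
    return [s / num_probes for s in sums]
```

For a diagonal matrix the estimator is exact, which gives a quick sanity check: with diagonal entries 0.5, 0.25 and 1.0 the first two moments are 1.75 and 1.3125.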

Download article

September 20 @ 16:20

CONGRESS HALL 4

Flash points: Discovering exceptional pairwise behaviors in vote or rating data

Adnene Belfodil, Sylvie Cazalens, Philippe Lamarre, Marc Plantevit

session: Subgroup Discovery

Reproducible Research

We address the problem of discovering contexts that lead well-distinguished collections of individuals to change their pairwise agreement w.r.t. their usual one. For instance, in the European parliament, while a strong overall disagreement is observed between deputies of the far-right French party Front National and deputies of the left party Front de Gauche, a strong agreement is observed between these deputies in votes related to the theme "External relations with the union". We devise the method DSC (Discovering Similarities Changes), which relies on exceptional model mining to uncover three-set patterns that identify contexts and two collections of individuals where an unexpected strengthening or weakening of pairwise agreement is observed. To efficiently explore the search space, we define some closure operators and pruning techniques using upper bounds on the quality measure. In addition to handling usual attributes (e.g. numerical, nominal), we propose a novel pattern domain which involves hierarchical multi-tag attributes that are present in many datasets. A thorough empirical study on two real-world datasets (i.e., European parliament votes and collaborative movie reviews) demonstrates the efficiency and the effectiveness of our approach as well as the interest and the actionability of the patterns.

Download article

September 20 @ 16:20

CONGRESS HALL 1

The network-untangling problem: From interactions to activity timelines

Polina Rozenshtein, Nikolaj Tatti, Aristides Gionis

session: Networks and Graphs II

Reproducible Research

In this paper we study the problem of determining when entities are active based on their interactions with each other. More formally, we consider a set of entities V and a sequence of time-stamped edges E among the entities. Each edge (u,v,t) in E denotes an interaction between entities u and v that takes place at time t. We view this input as a temporal network. We then assume a simple activity model in which each entity is active during a short time interval. An interaction (u,v,t) can be explained if at least one of u or v is active at time t. Our goal is to reconstruct the activity intervals, for all entities in the network, so as to explain the observed interactions. This problem, which we refer to as the network-untangling problem, can be applied to discover timelines of events from complex interactions among entities. We provide two formulations for the network-untangling problem: (i) minimizing the total interval length over all entities, and (ii) minimizing the maximum interval length. We show that the sum problem is NP-hard, while, surprisingly, the max problem can be solved optimally in linear time, using a mapping to 2-SAT. For the sum problem we provide efficient and effective algorithms based on realistic assumptions. Furthermore, we complement our study with an extensive evaluation on synthetic and real-world datasets, which demonstrates the validity of our concepts and the good performance of our algorithms.
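The activity model and the two objectives can be written down in a few lines. The sketch below is only a feasibility checker and objective evaluator, not one of the solvers from the paper; it assumes closed intervals given as `(start, end)` pairs, and the function names are hypothetical.

```python
def explains(intervals, edges):
    """Check that every interaction (u, v, t) is explained.

    intervals: {entity: (start, end)}, closed activity interval per entity
    edges:     iterable of (u, v, t) time-stamped interactions
    An interaction is explained if u or v is active at time t.
    """
    def active(entity, t):
        start, end = intervals[entity]
        return start <= t <= end

    return all(active(u, t) or active(v, t) for u, v, t in edges)


def sum_length(intervals):
    """Objective (i): total interval length over all entities."""
    return sum(end - start for start, end in intervals.values())


def max_length(intervals):
    """Objective (ii): maximum interval length."""
    return max(end - start for start, end in intervals.values())
```

For the edges `[("a", "b", 1), ("b", "c", 2), ("a", "c", 5)]`, the assignment `{"a": (5, 5), "b": (1, 2), "c": (3, 3)}` explains all interactions with total length 1 and maximum length 1.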

Download article

September 20 @ 16:20

CONGRESS HALL 3

Distributed Stochastic Optimization of the Regularized Risk via Saddle-point Problem

Shin Matsushima, Hyokun Yun, Xinhua Zhang, S.V.N. Vishwanathan

session: Learning and Optimization II

Many machine learning algorithms minimize a regularized risk, and stochastic optimization is widely used for this task. When working with massive data, it is desirable to perform stochastic optimization in parallel. Unfortunately, many existing stochastic optimization algorithms cannot be parallelized efficiently. In this paper we show that one can rewrite the regularized risk minimization problem as an equivalent saddle-point problem, and propose an efficient distributed stochastic optimization (DSO) algorithm. We prove the algorithm's rate of convergence; remarkably, our analysis shows that the algorithm scales almost linearly with the number of processors. We also verify with empirical evaluations that the proposed algorithm is competitive with other parallel, general purpose stochastic and batch optimization algorithms for regularized risk minimization.
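The rewriting step can be sketched with a standard conjugate-duality argument. The notation below is an assumption, not taken from the abstract: m training examples x_i, a regularizer Ω with weight λ, and closed convex losses ℓ_i with Fenchel conjugates ℓ_i^*.

```latex
% Regularized risk over m training examples
\min_{w}\; P(w) \;=\; \lambda\,\Omega(w) \;+\; \frac{1}{m}\sum_{i=1}^{m} \ell_i\big(\langle w, x_i\rangle\big)

% Fenchel-Moreau for each closed convex loss:
%   \ell_i(u) = \sup_{\alpha_i}\,\{\alpha_i u - \ell_i^{*}(\alpha_i)\}
% Substituting gives an equivalent saddle-point problem:
\min_{w}\;\sup_{\alpha}\; f(w,\alpha) \;=\; \lambda\,\Omega(w) \;+\; \frac{1}{m}\sum_{i=1}^{m}\Big(\alpha_i\,\langle w, x_i\rangle \;-\; \ell_i^{*}(\alpha_i)\Big)
```

If Ω is also separable across coordinates, the only term coupling w and α is the sum of α_i w_j x_{ij} over example-feature pairs, so stochastic updates on disjoint blocks of examples and features touch disjoint variables. This is one plausible reading of why such a formulation lends itself to distributed optimization.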

Download article

September 20 @ 16:40

CONGRESS HALL 1

TransT: Type-based Multiple Embedding Representations for Knowledge Graph Completion

Shiheng Ma, Jianhui Ding, Weijia Jia, Kun Wang, Minyi Guo

session: Networks and Graphs II

Reproducible Research

Knowledge graph completion with representation learning predicts new entity-relation triples from the existing knowledge graphs by embedding entities and relations into a vector space. Most existing methods focus on the structured information of triples and maximize the likelihood of them. However, they neglect semantic information contained in most knowledge graphs and the prior knowledge indicated by the semantic information. To overcome this drawback, we propose an approach that integrates the structured information and entity types which describe the categories of entities. Our approach constructs relation types from entity types and utilizes type-based semantic similarity of the related entities and relations to capture prior distributions of entities and relations. With the type-based prior distributions, our approach generates multiple embedding representations of each entity in different contexts and estimates the posterior probability of entity and relation prediction. Extensive experiments show that our approach outperforms previous semantics-based methods.

Download article

September 20 @ 16:40

CONGRESS HALL 2

Nystrom sketching

Daniel Perry, Braxton Osting, Ross Whitaker

session: Kernel Methods II

Despite prolific success, kernel methods become difficult to use in many large-scale unsupervised problems because of the evaluation and storage of the full Gram matrix. Here we overcome this difficulty by proposing a novel approach: compute the optimal small, out-of-sample Nystrom sketch which allows for fast approximation of the Gram matrix via the Nystrom method. We demonstrate and compare several methods for computing the optimal Nystrom sketch and show how this approach outperforms previous state-of-the-art Nystrom subset-based methods of similar size.

Download article

September 21 @ 10:00

CONGRESS HALL 1

Learning and Scaling Directed Networks via Graph Embedding

Mikhail Drobyshevskiy, Anton Korshunov, Denis Turdakov

Best Student KDD paper

Reliable evaluation of network mining tools implies significance and scalability testing. This is usually achieved by picking several graphs of various sizes from different domains. However, graph properties and thus evaluation results could be dramatically different from one domain to another. Hence, results need to be aggregated over a multitude of graphs within each domain. The paper introduces an approach to automatically learn features of a directed graph from any domain and generate similar graphs while scaling input graph size with a real-valued factor. Generating multiple graphs with similar size allows significance testing, while scaling graph size makes scalability evaluation possible. The proposed method relies on embedding an input graph into low-dimensional space, thus encoding graph features in a set of node vectors. Edge weights and node communities could be imitated as well in optional steps. We demonstrate that the embedding-based approach ensures variability of synthetic graphs while keeping degree and subgraph distributions close to the original graphs. Therefore, the method could make significance and scalability testing of network algorithms more reliable without the need to collect additional data. We also show that the embedding-based approach preserves various features in generated graphs which can't be achieved by other generators imitating a given graph.

Download article

September 21 @ 11:00

CONGRESS HALL 3

Concentration Free Outlier Detection

Fabrizio Angiulli

session: Anomaly Detection

We present a novel notion of outlier, called Concentration Free Outlier Factor (CFOF), which has the distinctive property of resisting the concentration phenomena that affect other scores when the dimensionality of the feature space increases. Indeed, we formally prove that CFOF does not concentrate in intrinsically high-dimensional spaces. Moreover, CFOF is adaptive to different local density levels and it does not require the computation of exact neighbors in order to be reliably computed. We present a very efficient technique, named fast-CFOF, for detecting outliers in very large high-dimensional datasets. The technique is efficiently parallelizable, and we provide a MIMD-SIMD implementation. Experimental results confirm the scalability and effectiveness of the technique and show that CFOF achieves state-of-the-art detection performance.

Download article

September 21 @ 11:00

CONGRESS HALL 2

An Exponential Family Framework For Learning to Predict Unseen Classes

Vinay Verma, Wenlin Wang, Piyush Rai

session: Unsupervised and Semisupervised Learning II

Reproducible Research

We present a simple generative framework for learning to predict previously unseen classes, based on estimating class-attribute-gated class-conditional distributions. We model each class-conditional distribution as an exponential family distribution and the parameters of the distribution of each seen/unseen class are defined as functions of the respective observed class attributes. These functions can be learned using only the seen class data and can be used to predict the parameters of the class-conditional distribution of each unseen class. Unlike most existing methods for zero-shot learning that represent classes as fixed embeddings in some vector space, our generative model naturally represents each class as a probability distribution. It is simple to implement and also allows leveraging additional unlabeled data from unseen classes to improve the estimates of their class-conditional distributions using transductive/semi-supervised learning. Moreover, it extends seamlessly to few-shot learning by easily updating these distributions when provided with a small number of additional labelled examples from unseen classes. Through a comprehensive set of experiments on several benchmark data sets, we demonstrate the efficacy of our framework.

Download article

September 21 @ 11:20

CONGRESS HALL 3

Harsh Dani, Jundong Li, Huan Liu

session: Anomaly Detection

Cyberbullying is a phenomenon that negatively affects individuals; victims suffer from various mental issues, ranging from depression, loneliness, and anxiety to low self-esteem. In parallel with the pervasive use of social media, cyberbullying is becoming more and more prevalent. Traditional mechanisms to fight against cyberbullying include the use of standards and guidelines, human moderators, and blacklists of profane words. However, these mechanisms fall short in social media and cannot scale well. Therefore, it is necessary to develop a principled learning framework to automatically detect cyberbullying behaviors. However, it is a challenging task due to short, noisy and unstructured content information and intentional obfuscation of the abusive words or phrases by social media users. Motivated by sociological and psychological findings on bullying behaviors and the correlation with emotions, we propose to leverage sentiment information to detect cyberbullying behaviors in social media by proposing a sentiment informed cyberbullying detection framework. Experimental results on two real-world, publicly available social media datasets show the superiority of the proposed framework. Further studies validate the effectiveness of leveraging sentiment information for cyberbullying detection.

Download article

September 21 @ 12:00

CONGRESS HALL 2

Local PurTree Subspace Spectral Clustering for Customer Transaction Data

~~Xiaojun Chen, JianZhe Zhang, Wenya Sun, Joshua Huang, Qingyao Wu~~

session: Unsupervised and Semisupervised Learning II

This paper was accepted for presentation. However, it was not presented at the conference and is thus not published in the conference proceedings.

September 21 @ 12:20

CONGRESS HALL 2

Romain Tavenard, Simon Malinowski, Laetitia Chapel, Adeline Bailly, Heider Sanchez, Benjamin Bustos

session: Time Series and Streams II

Reproducible Research

In the time-series classification context, the majority of the most accurate core methods are based on the Bag-of-Words framework, in which sets of local features are first extracted from time series. A dictionary of words is then learned and each time series is finally represented by a histogram of word occurrences. This representation induces a loss of information due to the quantization of features into words, as all the time series are represented using the same fixed dictionary. In order to overcome this issue, we introduce in this paper a kernel operating directly on sets of features. Then, we extend it to a time-compliant kernel that allows one to take into account the temporal information. We apply this kernel in the time series classification context. The proposed kernel has a complexity that is quadratic in the size of the input feature sets, which is problematic when dealing with long time series. However, we show that kernel approximation techniques can be used to define a good trade-off between accuracy and complexity. We experimentally demonstrate that the proposed kernel can significantly improve the performance of time series classification algorithms based on Bag-of-Words.
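To illustrate what a kernel on sets of features looks like, and why its cost is quadratic, here is a generic normalized sum-match kernel with an RBF base kernel. The abstract does not specify the authors' exact kernel, so this is a stand-in. The time-aware variant simply appends a scaled relative position to each local feature, assuming feature i was extracted at position i; `set_kernel`, `with_time`, `gamma` and `weight` are hypothetical names.

```python
import math


def rbf(x, y, gamma):
    """RBF base kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def set_kernel(features_x, features_y, gamma=1.0):
    """Normalized sum-match kernel: mean RBF value over all cross pairs.

    The double loop makes the cost quadratic in the set sizes.
    """
    total = sum(rbf(x, y, gamma) for x in features_x for y in features_y)
    return total / (len(features_x) * len(features_y))


def with_time(features, weight=1.0):
    """Append a scaled relative position in [0, 1] to each local feature."""
    last = max(len(features) - 1, 1)
    return [tuple(f) + (weight * i / last,) for i, f in enumerate(features)]
```

Without the time component, a series and its reversal yield the same feature set and therefore the same kernel value. With `with_time`, the reversed series becomes less similar to the original than the original is to itself.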

Download article

September 21 @ 14:40

CONGRESS HALL 2

Etienne Auclair, Nathalie Peyrard, Régis Sabbadin

session: Probabilistic Models and Methods III

Learning interactions between dynamical processes is a widespread but difficult problem in ecological or human sciences. Unlike in other domains (bioinformatics, for example), data is often scarce, but expert knowledge is available. We consider the case where knowledge is about a limited number of interactions that drive the process dynamics, and about a community structure in the interaction network. We propose an original framework, based on Dynamic Bayesian Networks with labeled-edge structure and parsimonious parameterization, and a Stochastic Block Model prior, to integrate this knowledge. Then we propose a restoration-estimation algorithm, based on 0-1 Linear Programming, that improves network learning when these two types of expert knowledge are available. The approach is illustrated on a problem of ecological interaction network learning.

Download article

September 21 @ 16:20

CONGRESS HALL 2

Comparative Study of Inference Methods for Bayesian Nonnegative Matrix Factorisation

Thomas Brouwer, Jes Frellsen, Pietro Lio

session: Matrix and Tensor Factorization

Reproducible Research

In this paper, we study the trade-offs of different inference approaches for Bayesian matrix factorisation methods, which are commonly used for predicting missing values, and for finding patterns in the data. In particular, we consider Bayesian nonnegative variants of matrix factorisation and tri-factorisation, and compare non-probabilistic inference, Gibbs sampling, variational Bayesian inference, and a maximum-a-posteriori approach. The variational approach is new for the Bayesian nonnegative models. We compare their convergence, and robustness to noise and sparsity of the data, on both synthetic and real-world datasets. Furthermore, we extend the models with the Bayesian automatic relevance determination prior, allowing the models to perform automatic model selection, and demonstrate its efficiency.

Download article

September 21 @ 16:20

CONGRESS HALL 4

Early Active Learning with Pairwise Constraint for Person Re-identification

Wenhe Liu, Xiaojun Chang, Ling Chen, Yi Yang, Alexander Hauptmann

session: Computer Vision

Research on person re-identification (re-id) has attracted much attention in the machine learning field in recent years. With sufficient labeled training data, supervised re-id algorithms can obtain promising performance. However, producing labeled data for training supervised re-id models is an extremely challenging and time-consuming task because it requires every pair of images across non-overlapping camera views to be labeled. Moreover, in the early stage of experiments, when labor resources are limited, only a small amount of data can be labeled. Thus, it is essential to design an effective algorithm to select the most representative samples. This is referred to as the early active learning or early stage experimental design problem. The pairwise relationship plays a vital role in the re-id problem, but most of the existing early active learning algorithms fail to consider this relationship. To overcome this limitation, we propose a novel and efficient early active learning algorithm with a pairwise constraint for person re-identification in this paper. By introducing the pairwise constraint, the closeness of similar representations of instances is enforced in active learning. This benefits the performance of active learning for re-id. Extensive experimental results on four benchmark datasets confirm the superiority of the proposed algorithm.

Download article

September 21 @ 16:20

CONGRESS HALL 3

Distributed Multi-task Learning for Sensor Network

Jiyi Li, Tomohiro Arai, Yukino Baba, Hisashi Kashima, Shotaro Miwa

session: Transfer and Multi-Task Learning II

A sensor in a sensor network is expected to make predictions or decisions using models learned from the data observed on that sensor. However, in the early stage of using a sensor, there may be little data available to train its model. A solution is to leverage the observation data from other sensors that have similar conditions and models to the given sensor. We thus propose a novel distributed multi-task learning approach that incorporates neighborhood relations among sensors to learn multiple models simultaneously, with each sensor corresponding to one task. Three constraints apply: it may not be cheap for each sensor to transfer observation data from other sensors; broadcasting the observation data of a sensor across the entire network is unacceptable for privacy reasons; and each sensor is expected to make real-time predictions independently of neighbor sensors. Therefore, this approach shares the model parameters as regularization terms in the objective function, assuming that neighbor sensors have similar model parameters. We conduct experiments on two real datasets, predicting temperature with regression. They verify that our approach is effective, especially when the bias of an independent model that does not use data from other sensors is high, such as when little training data is available.

Download article

September 21 @ 16:40

CONGRESS HALL 1

Ivan Stojkovic, Mohamed Ghalwash, Zoran Obradovic

session: Transfer and Multi-Task Learning II

Scoring functions are an important tool for quantifying properties of interest in many domains; for example, in healthcare, disease severity scores are used to diagnose the patient's condition and to decide on further treatment. Scoring functions might be obtained based on domain knowledge or learned from data by using classification, regression or ranking techniques, depending on the type of supervised information. Although learning scoring functions from collected data is beneficial, it can be challenging when limited data are available. Therefore, learning multiple distinct, but related, scoring functions together can increase their quality as shared regularities may be easier to identify. We propose a multitask formulation for ranking-based learning of scoring functions, where the model is trained from pairwise comparisons. The approach uses mixed-norm regularization to impose structural regularities among the tasks. The proposed regularized objective function is convex; therefore, we developed an optimization approach based on alternating minimization and proximal gradient algorithms to solve the problem. The increased predictive accuracy of the presented approach, in comparison to several baselines, is demonstrated on synthetic data and two different real-world applications: predicting exam scores and predicting a tolerance-to-infections score.

Download article

September 21 @ 17:20

CONGRESS HALL 2

Structurally Regularized Non-negative Tensor Factorization for Spatio-temporal Pattern Discoveries

Koh Takeuchi, Yoshinobu Kawahara, Tomoharu Iwata

session: Matrix and Tensor Factorization

Reproducible Research

Understanding spatio-temporal activities in a city is a typical problem of spatio-temporal data analysis. For this analysis, tensor factorization methods have been widely applied for extracting a few essential patterns into latent factors. Non-negative Tensor Factorization (NTF) is popular because of its capability of learning interpretable factors from non-negative data, its simple computation procedures, and its ability to deal with missing observations. However, since existing NTF methods are not fully aware of spatial and temporal dependencies, they often fall short of learning latent factors when a large portion of the observations is missing. In this paper, we present a novel NTF method for extracting smooth and flat latent factors by leveraging various kinds of spatial and temporal structures. Our method incorporates a unified structured regularizer into NTF that can represent various kinds of auxiliary information, such as an order of timestamps, a daily and weekly periodicity, distances between sensor locations, and areas of locations. For the estimation of the factors for our model, we present a simple and efficient optimization procedure based on the alternating direction method of multipliers. In missing value interpolation experiments on traffic flow data and bike-sharing system data, we demonstrate that our proposed method improves interpolation performance over existing NTF methods, especially when a large portion of the values is missing.

Download article

September 21 @ 17:20

CONGRESS HALL 4

Scatteract: Automated extraction of data from scatter plots

Mathieu Cliche, David Rosenberg, Connie Yee, Dhruv Madeka

session: Computer Vision

Charts are an excellent way to convey patterns and trends in data, but they do not facilitate further modeling of the data or close inspection of individual data points. We present a fully automated system for extracting the numerical values of data points from images of scatter plots. We use deep learning techniques to identify the key components of the chart, and optical character recognition together with robust regression to map from pixels to the coordinate system of the chart. We focus on scatter plots with linear scales, which already have several interesting challenges. Previous work has done fully automatic extraction for other types of charts, but to our knowledge this is the first approach that is fully automatic for scatter plots. Our method performs well, achieving successful data extraction on 89% of the plots in our test set.
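The pixel-to-coordinate step can be illustrated for one axis with linear scale. The abstract only says "robust regression"; the sketch below swaps in the Theil-Sen estimator (median of pairwise slopes) as one simple robust choice, which is an assumption and not necessarily what the authors use. Inputs are assumed to be tick positions in pixels and the numeric tick labels read by OCR; the function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(pixels, values):
    """Robust line fit values ~ slope * pixels + intercept.

    The slope is the median of all pairwise slopes, so a minority of
    misread tick labels has no influence on the fit.
    """
    slopes = [
        (v2 - v1) / (p2 - p1)
        for (p1, v1), (p2, v2) in combinations(zip(pixels, values), 2)
        if p2 != p1
    ]
    slope = median(slopes)
    intercept = median(v - slope * p for p, v in zip(pixels, values))
    return slope, intercept


def pixel_to_value(pixel, slope, intercept):
    """Map a pixel position to chart coordinates along one axis."""
    return slope * pixel + intercept
```

With ticks at pixels 100, 200, 300, 400, 500 and OCR readings 0, 10, 20, 30, 400 (the last one misread), the fit recovers a slope of about 0.1 and an intercept of about -10, so a point at pixel 250 maps to roughly 15.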

Download article

September 21 @ 18:00

CONGRESS HALL 4

Unsupervised Diverse Colorization via Generative Adversarial Networks

Yun Cao, Zhiming Zhou, Weinan Zhang, Yong Yu

session: Computer Vision

Mohammad Tayebi, Uwe Glässer, Patricia Brantingham, Hamed Yaghoubi Shahir

Yifeng Gao, Qingzhe Li, Xiaosheng Li, Jessica Lin, Huzefa Rangwala, Ranjeev Mittu

Ivica Dimitrovski, Dragi Kocev, Suzana Loskovska, Sašo Džeroski

session: Nectar II

In this work, we summarize our research on using the predictive clustering framework for image analysis. More specifically, we used predictive clustering trees to generate image representations that can then be used to perform image retrieval and/or image annotation. We evaluated the proposed method for performing image retrieval on general purpose images, and annotation of general purpose images, medical images and diatom images.

Download article

Monday, September 18th, 2017

MLSA 2017 - Machine Learning and Data Mining for Sports Analytics

Abstract:
Sports Analytics has been a steadily growing and rapidly evolving area over the last decade, both in US professional sports leagues and in European football leagues. The majority of techniques used in the field so far are statistical. However, there has been growing interest in the Machine Learning and Data Mining community about this topic as this setting is interesting, challenging and offers new sources of data. The workshop concerns all aspects of applying machine learning and data mining techniques for sports problems such as match strategy, tactics, and analysis; player acquisition, player valuation, and team spending; injury prediction and prevention; match outcome and league table prediction; and tournament design and scheduling among others.

Organizers:
Jesse Davis, KU Leuven, Belgium
Mehdi Kaytoue, INSA Lyon, France
Albrecht Zimmermann, University of Caen, France

Workshop web page

PAP 2017 – 1st International Workshop of Personal Analytics and Privacy

Abstract:
In the era of Big Data, every single user of our hyper-connected world leaves behind a myriad of digital breadcrumbs while performing her daily activities. In this context, personal data analytics and individual privacy protection are the key elements for evolving today's services into a new type of system. The availability of personal analytics tools able to extract hidden knowledge from individual data while protecting the right to privacy can help society move from organization-centric systems to user-centric systems, where the user is the owner of her personal data and is able to manage, understand, exploit, control and share her own data and the knowledge derivable from them in a completely safe way.

Organizers:
Serge Abiteboul, Inria, ENS Paris, France
Riccardo Guidotti, KDDLab, ISTI-CNR Pisa, Italy
Anna Monreale, University of Pisa, Italy
Dino Pedreschi, University of Pisa, Italy

Workshop web page

SoGood 2017 – Data Science for Social Good

Abstract:
This workshop aims to attract papers that present applications of Data Science to Social Good, or that take into account social aspects of Data Science methods and techniques. Application domains should be as varied as possible. The novelty of the application and its social impact will be major selection criteria.

Organizers:
Ricard Gavaldà, UPC BarcelonaTech, Spain
Irena Koprinska, University of Sydney, Australia
Stefan Kramer, JGU Mainz, Germany

Workshop web page

SURL – Scaling-Up Reinforcement Learning

Abstract:
Reinforcement Learning (RL) has achieved many successes over the years in training autonomous agents to perform simple tasks. However, one of the major remaining challenges in RL is scaling it to high-dimensional, real-world applications.

Although many works have already focused on strategies to scale up RL techniques and to find solutions for more complex problems with reasonable success, many issues still exist. This workshop encourages discussion of diverse approaches to accelerate and generalize RL, such as the use of approximations, abstractions, hierarchical approaches, and Transfer Learning.

Scaling-up RL methods has major implications on the research and practice of complex learning problems and will eventually lead to successful implementations in real-world applications.

This workshop intends to bridge the gap between conventional and scalable RL approaches. We aim to bring together researchers working on different approaches to scale up RL with the goal of solving more complex or larger scale problems. We intend to make this an exciting event for researchers worldwide, not only for the presentation of top quality papers, but also to spark the discussion of opportunities and challenges for future research directions.

Organizers:
Felipe Leno da Silva, University of São Paulo, Brazil
Ruben Glatt, University of São Paulo, Brazil

Workshop web page

MIDAS – 2nd Workshop on MIning DAta for financial applicationS

Abstract:
Like the famous King Midas, popularly remembered in Greek mythology for his ability to turn everything he touched with his hand into gold, we believe that the wealth of data generated by modern technologies, with widespread presence of computers, users and media connected by Internet, is a goldmine for tackling a variety of problems in the financial domain.

The MIDAS workshop is aimed at discussing challenges, potentialities, and applications of leveraging data-mining tasks to tackle problems in the financial domain. The workshop provides a premier forum for sharing findings, knowledge, insights, experience and lessons learned from mining data generated in various application domains.

Organizers:
Ilaria Bordino, UniCredit, R&D Dept., Italy
Guido Caldarelli, IMT Institute for Advanced Studies Lucca, Italy
Fabio Fumarola, UniCredit, R&D Dept., Italy
Francesco Gullo, UniCredit, R&D Dept., Italy
Tiziano Squartini, IMT Institute for Advanced Studies Lucca, Italy

Workshop web page

DyNo 2017 – 3rd International Workshop on Dynamics in and of Networks

Abstract:
Network science, network analysis, and network mining are new scientific topics that emerged in recent years and are growing quickly. Instead of studying the properties of entities, network science focuses on the interactions between these entities. The tremendous quantity of relational data that has become available (Online Social Networks, cell phones, the Internet and the Web, trip datasets, etc.) encourages new research on the topic.

In the last years, we witnessed a shift from static network analysis to dynamic ones, i.e., the study of networks whose structure changes over time. As time goes by, all the perturbations which occur in the network topology due to the rise and fall of nodes and edges have repercussions on the network phenomena we are used to observing. As an example, evolution over time of social interactions in a network can play an important role in the diffusion of an infectious disease.

Nowadays, one of the most fascinating challenges is to analyze the structural dynamics of real world networks and how they impact the processes which occur on them, e.g. the spreading of social influence and the diffusion of innovations. Results in this field will enable a better understanding of important aspects of human behavior as well as a more detailed characterization of the complex interconnected society we inhabit. Over the last decades, diffusive and spreading phenomena have been facilitated by the enormous popularity of the Internet and the evolution of social media, which enable an unprecedented exchange of information. For this reason, understanding how social relationships unravel in these rapidly evolving contexts represents one of the most interesting fields of research. The purpose of the third edition of this workshop is to encourage research that will lead to the advancement of the social science in time-evolving networks.

Organizers:
Giulio Rossetti, KDD Laboratory, ISTI-CNR Pisa, Italy
Rémy Cazabet, LIP6, CNRS, Sorbonne Universités, France
Letizia Milli, Computer Science Department - University of Pisa, Italy

Workshop web page

TDLSG - Advances in Mining Large-Scale Time-Dependent Graphs

Abstract:
The aim of this workshop, called Large-Scale Time Dependent Graphs (TD-LSG), is to bring together active scholars and practitioners of dynamic graphs. Graph models and algorithms are ubiquitous in a large number of application domains, ranging from transportation to social networks, semantic web, or data mining. However, many applications require graph models that are time dependent. For example, applications related to urban mobility analysis employ a graph structure of the underlying road network. The nature of such networks is spatiotemporal: the time a moving object takes to cross a path segment typically depends on the starting instant of time. We therefore call graphs that have this spatiotemporal feature time-dependent graphs.
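To make the notion of a time-dependent graph concrete, here is a minimal sketch in which each edge carries a function that returns the travel time for a given departure instant, together with a Dijkstra-style earliest-arrival search. It is only an illustration, not material from the workshop: the names `earliest_arrival` and `rush_hour`, the toy road network, and the assumption that edges are FIFO (departing later never lets you arrive earlier) are all ours.

```python
import heapq


def earliest_arrival(graph, source, target, start_time):
    """Earliest arrival time at target when leaving source at start_time.

    graph maps a node to a list of (neighbor, travel_time) pairs, where
    travel_time(t) is the crossing time when entering the edge at instant t.
    Assumes FIFO edges; returns None if target is unreachable.
    """
    best = {source: start_time}
    heap = [(start_time, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            return t
        if t > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, travel_time in graph.get(u, []):
            arrival = t + travel_time(t)
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                heapq.heappush(heap, (arrival, v))
    return None


def rush_hour(t):
    """Hypothetical segment: 10 minutes before t = 8, 30 minutes afterwards."""
    return 10 if t < 8 else 30


# Toy road network: the direct segment A -> B is congested from t = 8 on,
# while the detour through C always takes 5 + 10 minutes.
roads = {
    "A": [("B", rush_hour), ("C", lambda t: 5)],
    "C": [("B", lambda t: 10)],
}

early = earliest_arrival(roads, "A", "B", start_time=0)  # direct route: 10
late = earliest_arrival(roads, "A", "B", start_time=9)   # detour via C: 24
```

The best route changes with the departure instant, which is exactly the property that a static weighted graph cannot express.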

In this workshop, we aim to discuss the problem of mining large-scale time-dependent graphs, since many real world applications deal with large volumes of spatio-temporal data (e.g. moving objects’ trajectories). Managing and analysing large-scale time-dependent graphs is very challenging since it requires sophisticated methods and techniques for creating, storing, accessing and processing such graphs in a distributed environment, because centralized approaches do not scale in a Big Data scenario. Contributions should clearly address at least one of these challenges, focusing on large-scale graphs.

Organizers:
Sabeur Aridhi, University of Lorraine, France
José Fernandes de Macedo, Universidade Federal do Ceará, Fortaleza, Brazil
Engelbert Mephu Nguifo, LIMOS, Blaise Pascal University, France
Karine Zeitouni, DAVID, Université de Versailles Saint-Quentin, France

Workshop web page

Combined Workshops with Tutorials

IoT Large Scale Learning from Data Streams

Abstract:
The volume of data is rapidly increasing due to the development of information and communication technology. This data comes mostly in the form of streams. Learning from this ever-growing amount of data requires flexible learning models that self-adapt over time. In addition, these models must take into account many constraints: (pseudo) real-time processing, high velocity, and dynamic multi-form change such as concept drift and novelty. This workshop welcomes novel research about learning from data streams in evolving environments. It will provide researchers and participants with a forum for exchanging ideas, presenting recent advances and discussing challenges related to data stream processing. It solicits original work, already completed or in progress. Position papers are also considered. This workshop is combined with a tutorial on the same topic, which will be presented on the same day.

Organizers:
Moamar Sayed-Mouchaweh, Computer Science and Automatic Control Labs, High Engineering School of Mines, Douai
Albert Bifet, Telecom-ParisTech; Paris, France
Hamid Bouchachia, Department of Computing & Informatics, University of Bournemouth, Bournemouth, UK
João Gama, Laboratory of Artificial Intelligence and Decision Support, University of Porto, Porto, Portugal
Rita Ribeiro, Laboratory of Artificial Intelligence and Decision Support, University of Porto, Porto, Portugal

Workshop and tutorial web page

Interactive Adaptive Learning

Abstract:
This workshop on interactive adaptive learning aims at discussing techniques and approaches for optimising the whole learning process, including the interaction with human supervisors and processing systems. It covers adaptive, active, semi-supervised, and transfer learning techniques, and combinations thereof, in interactive and adaptive machine learning systems.

AutoML – Automatic selection, configuration, and composition of machine learning algorithms

Workshop abstract:
This workshop will provide a platform for discussing the recent developments in the area of algorithm selection and configuration, which arises in many diverse domains, such as machine learning, data mining, optimization and automated reasoning. Algorithm selection and configuration are increasingly relevant today. Researchers and practitioners from all areas of science and technology face a large choice of parameterized machine learning algorithms, with little guidance as to which techniques to use in a given application context. Moreover, data mining challenges frequently remind us that algorithm selection and configuration are crucial in order to achieve cutting-edge performance, and drive industrial applications.

Meta-learning leverages knowledge of past algorithm applications to select the best techniques for future applications, and offers effective techniques that are superior to humans both in terms of the end result and especially in the time required to achieve it. In this workshop, we will discuss different ways of exploiting meta-learning techniques to identify the potentially best algorithm(s) for a new task, based on meta-level information, prior experiments on both past datasets and the current one. Many contemporary problems require the use of workflows that consist of several processes or operations. Constructing such complex workflows requires extensive expertise, and could be greatly facilitated by leveraging planning, meta-learning and intelligent system design. This task is inherently interdisciplinary, as it builds on expertise in various areas of AI.

Workshop web page

Tutorial abstract:
This tutorial will introduce and discuss state of the art methods in meta-learning, algorithm selection, and algorithm configuration, which are increasingly relevant today. Researchers and practitioners from all areas of science and technology face a large choice of parameterized machine learning algorithms, with little guidance as to when and how to use which technique. Data mining challenges frequently remind us that algorithm selection and configuration play a crucial role in achieving cutting-edge performance, and are indispensable in industrial applications.

Meta-learning leverages knowledge of past applications of algorithms to learn how to select the best techniques for future applications, and offers effective techniques that are superior to humans both in terms of the quality of the end result and even more so in the time required to achieve it. Recent approaches also include (preferably very fast) partial probing runs on a given problem with the aim of determining the best strategy to be used from there onwards. This may include further probing or recommending an algorithm to be used to solve the given problem. A recent trend is to incorporate such techniques into software platforms. This synergy leads to new advances that recommend combinations of algorithms and hyperparameter settings simultaneously, and that speed up algorithm configuration by learning which parameter settings are likely most useful for dealing with the data at hand.

After motivating and introducing the concepts of algorithm selection and configuration, we elucidate how they arise in machine learning and data mining, but also in other domains, such as optimization. We demonstrate how meta-learning techniques can be effectively used in this context, exploiting information gleaned from past experiments as well as by probing the data at hand. Moreover, many current applications require the use of machine learning or data mining workflows that consist of several different processes or operations. Constructing such complex systems or workflows requires extensive expertise, as well as existing meta-data and software, and can be greatly facilitated by leveraging the methodologies presented at this tutorial.

Tutorial web page

Organizers:
Pavel Brazdil, LIAAD Inesc Tec., Portugal
Joaquin Vanschoren, Eindhoven University of Technology, Netherlands
Holger H. Hoos, Universiteit Leiden, Netherlands
Frank Hutter, University of Freiburg, Germany

Monday, September 18th, 2017

Core Decomposition of Networks: Concepts, Algorithms, and Applications

Abstract:
Graph mining is an important research area with a plethora of practical applications. Core decomposition of networks is a fundamental operation strongly related to more complex mining tasks such as community detection, dense subgraph discovery, identification of influential nodes, network visualization, text mining, just to name a few. In this tutorial, we will present in detail the concept and properties of core decomposition in graphs, the associated algorithms for its efficient computation and important cross-disciplinary applications that benefit from it.
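For readers unfamiliar with the concept, the core number of a node is the largest k such that the node belongs to a subgraph in which every node has at least k neighbours. The sketch below computes core numbers with the standard peeling idea (repeatedly remove a node of minimum remaining degree). It is a deliberately simple, quadratic-time illustration rather than the linear-time bucket-based algorithm, and the function name `core_numbers` and the example graph are our own, not taken from the tutorial.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off a node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        # Core numbers never decrease along the peeling order.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbor in adj[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


# A triangle a-b-c with a pendant node d attached to c.
example = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
cores = core_numbers(example)
# The triangle forms the 2-core; d only belongs to the 1-core.
```

Selecting the nodes whose core number is at least k yields the k-core, which is the building block for the dense subgraph and influential node applications mentioned in the abstract.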

Organizers:
Fragkiskos D. Malliaros, UC San Diego La Jolla, USA
Apostolos N. Papadopoulos, Aristotle University of Thessaloniki, Thessaloniki, Greece
Michalis Vazirgiannis, Ecole Polytechnique Palaiseau, France

Tutorial web page

Combined Workshops with Tutorials

IoT Large Scale Learning from Data Streams

Workshop and tutorial web page

Interactive Adaptive Learning

Workshop and tutorial web page

Friday, September 22nd, 2017

Machine learning with fossil data: analyzing environmental and climate change

Abstract:
Global fossil databases have been growing rapidly in the last decade. They aggregate and accumulate findings and knowledge that palaeobiologists acquired over many years. These datasets are big data in their essence: they are compiled from different sources, are to an extent subjective, include specific biases and uncertainties, and vary in sparseness and quality over time and space. In addition, to understand relations between organisms and climate, high-volume and high-velocity satellite observations come into play, which require scalable computing. Databases of this kind offer an excellent ground for interdisciplinary machine learning research. This tutorial will outline research questions that could be addressed using computational methods, discuss characteristics of fossil data and computational tasks for machine learning and data mining, overview existing computational approaches, and discuss what more could be done from the machine learning and data mining perspective.

Organization:
Indrė Žliobaitė, University of Helsinki, Finland

Tutorial web page

Deep Learning for Computer Vision Applications: Robotics and Driving

Abstract:
Deep Learning methods have become ubiquitous for computer vision tasks. This tutorial will focus on recent advances in deep learning for vision applications in robotics and autonomous vehicles. The tutorial will start with basic Deep Learning techniques and will highlight state-of-the-art methods in the three major topics in computer vision: classification, detection and segmentation. Then the tutorial will continue with more concrete methods and their applications, e.g. in scene understanding, 3D analysis, perception for robotics and autonomous driving. The goal of the tutorial is to focus on relevant techniques, which are of significant impact to real-world applications, and which will benefit the broader Machine Learning community.

Organization:
Anelia Angelova, Google Research / Google Brain
Sanja Fidler, University of Toronto

Tutorial web page

Combined Workshops with Tutorials

AutoML – Automatic selection, configuration, and composition of machine learning algorithms

Workshop web page

Tutorial web page

Johannes Fürnkranz

TU Darmstadt, Germany

50 Ways to Tweak your Paper

Very often, we see papers with good research ideas rejected because the quality of the write-up does not quite live up to the quality of the idea. Whether you like it or not, the presentation of your work can often make the difference that tilts a borderline paper one way or the other. In particular for conference-style reviewing, where reviewers have to make recommendations for multiple papers in a very narrow time frame, they are often influenced by the writing and the appearance of the paper (whether they like it or not). In this talk, targeted towards junior Ph.D. students, we will make a few suggestions on how to improve the presentation of your work. Most of them are obvious, but we nevertheless often see them violated in practice.

presentation slides (22.09.2017)

Johannes Fürnkranz is a full Professor for Knowledge Engineering at TU Darmstadt, Germany. His main research interests are machine learning and data mining, in particular inductive rule learning, learning of interpretable models, multi-label classification and preference learning, and their applications in game playing, web mining, and scientific data mining. Since 2015, he has served as the editor-in-chief of Data Mining and Knowledge Discovery, the most traditional and renowned journal in this area. He is also a long-time action editor for Machine Learning, a current or past editorial board member of several other well-known journals, and a regular PC or senior PC member of premier conferences in the areas of machine learning, data mining, information retrieval, and artificial intelligence. He was nominated “best reviewer” at two Machine Learning conferences, “outstanding PC member” at the AAAI conference, and “outstanding editor” of the Machine Learning journal. He served as the program co-chair of the 6th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (Berlin 2006), the 27th International Conference on Machine Learning (Haifa 2010), the 16th International Conference on Discovery Science (Singapore 2013), and the 40th German Conference on Artificial Intelligence (Dortmund, September 25-29th, 2017).

Tias Guns

Vrije Universiteit Brussel (VUB), Belgium

Tips for a successful PhD, and how to win an award with it

The talk will dive into some of the everyday challenges a PhD student faces: working with your promoter, writing a paper, visiting conferences, getting your work more widely known, and even winning an award with your thesis. The talk will have anecdotes based on my or colleagues' experiences and indispensable references to PhD comics, XKCD and other highly valuable sources.

presentation slides (22.09.2017)

Tias Guns is Assistant Professor at the Vrije Universiteit Brussel (VUB), Belgium, in the Business, Technology and Operations lab of the faculty of Economic and Social Sciences & Solvay Business School. His research lies on the border between data mining and constraint programming, and his main interest is in integrating domain expertise and user constraints into data analytics tasks. As part of his PhD, he developed the CP4IM framework, which showed for the first time the potential of using constraint programming for pattern mining. He is an active member of the community and has organized a number of workshops and a special issue on the topic of combining constraint programming with machine learning and data mining. His PhD received both the constraint programming dissertation award and the ECCAI artificial intelligence dissertation award.

Discovery Challenges

In these first 3 years of Horizon 2020, a total of 2,648 proposals were submitted to the FET-Open programme, covering a wide range of disciplines: from Physics to Life Sciences, from Information Sciences and Engineering to Chemistry. Most proposals indeed show a high degree of interdisciplinarity.

During my presentation I will focus on Research and Innovation-Actions (RIA) and the so-called "gatekeepers" that every excellent proposal should address. I will then present the evaluation process that allows the selection of the best proposals, resulting in a continuously growing portfolio of high quality interdisciplinary projects. I will conclude providing some statistics in terms of country and organization participations, scientific fields covered and interdisciplinarity.

presentation slides (18.09.2017)

Salvatore Spinello started his Ph.D. in Computer Science in 1997 at the University of Catania (Italy). He moved to Germany in 1999, where he finished his Ph.D. in collaboration with the University of Erlangen-Nuremberg. He then moved to London for his first Post-Doc (UCL) and after one year to Bordeaux (France) for his second Post-Doc (Inria). In 2004 he joined a small company, the French leader in the distribution of Virtual Reality products. He held the position of Director of the R&D Department, managing two European Projects.

In 2006 he joined Inria, the French Institute for Research in Computer Science and Control. He was in charge of identifying knowledge and technologies from research teams which were transferable to the external world (both industry and academic partners), enabling the transfer typically through licensing or joint R&D projects, and protecting the underlying intellectual property whenever appropriate. He was also in charge of activities which aimed to facilitate the participation of researchers in National and European collaborative projects.

In 2013 he took the position of Project Officer at the Aquitaine Regional Council (France). He negotiated grant agreements; he monitored his portfolio from administrative, financial and technical aspects; he assessed technological progress and the fulfilment of contractual obligations; he managed the correct use of resources allocated to the projects, ensuring that the work was being carried out as planned; he monitored the overall performance (technical, dissemination, exploitation) and the strategic impact of projects.

In 2015 he joined the Research Executive Agency as Research Programme Officer, where he participates in the evaluation and selection of proposals submitted to the FET Open Programme and monitors several funded projects.

Richard Wheeler

Edinburgh Scientific

Hints on how to write a successful project proposal

EU Funding has never been more important, nor harder to get. In this very practical talk, Richard Wheeler will provide an insider's view to securing European Union funding, including tips and tricks on good proposal writing, what really happens in EU review meetings, why most proposals fail, good project management methods, and more. Attendees will have the opportunity to ask questions and discuss their ideas in an informal workshop environment.

Topics will include:

What makes a good consortium
Why proposals fail
The EU review process
What makes good proposal writing
Getting started
Writing: Science and Technology
Writing: Knowledge Transfer
Writing: Exchange Programmes
Writing: Management and Implementation
Writing: Impact
Writing: Exploitation
Writing: Dissemination

presentation slides (18.09.2017)

Richard Wheeler is a specialist in artificial intelligence and computer science who has worked for the World Health Organisation in Geneva and The University of Edinburgh, and been a research manager at laboratories in Brussels and Vienna. He currently runs a private scientific consultancy (Edinburgh Scientific) serving academic and industrial clients across Europe. He is active in the field of renewable energy and scientific management, acts as a chair in a number of EU funding schemes, and recently completed a book, “Success with EU Proposals”.