Data-driven modelling

Problem statement and challenges

In several applications it is difficult and/or time-consuming to construct models that are based on first principles. A black-box modelling approach is then a viable alternative, given the considerable progress that has been made in areas such as machine learning, system identification, pattern recognition and statistics in relation to optimization. Commonly used models include support vector machines and kernel-based learning, graphical models and Bayesian networks, neural networks and others. However, new technologies pose increasing challenges, e.g. the generation of massive data sets, the high dimensionality of the data spaces, the reliability of predictions and the need for interpretability of the estimated models.

The main focus of the working group is data-driven modelling. Different tasks are studied such as regression, classification, clustering, dimensionality reduction, data visualization, predictive modelling, feature selection, structure detection, data fusion, ranking and survival analysis. Major aims are the development of reliable and generic methodologies, convex formulations and convex relaxations, regularization mechanisms and incorporation of prior knowledge, handling different model structures, high dimensionality and large data sets.

Objectives and methodology

  1. Achieving sparseness. Objectives of our work include: to study block-wise penalties [1] in both parametric and nonparametric settings, in relation e.g. to current work on L1 regularization; to link the optimality properties of solutions with their statistical relevance; and to establish test procedures, based on duality arguments, aimed at identifying all the relevant (groups of) covariates. Among the studied applications will be the use of interpretable models in survival analysis [2].

  2. Regularization mechanisms and prior knowledge incorporation. In nonlinear system identification, kernel-based models and support vector machines have been best established for general black-box models [3]. An open problem is to incorporate prior knowledge about the system within the optimization formulations. So far this has only been achieved for a limited class of systems such as Hammerstein systems [4]. Further systematic approaches will be investigated. Improved black-box modelling schemes will also be investigated for the analysis of magnetic resonance spectroscopy with semi-parametric models [5,6] and the incorporation of spatial constraints.

  3. Optimization-based clustering. Related to spectral clustering [7], clustering over time and the incorporation of prior knowledge will be studied. The methodology will be based on existing links between spectral clustering, kernel methods and least squares support vector machines [8]. In this setting, the underlying models have a feature map representation in the primal and a kernel-based representation in the dual. Applications will be studied in the analysis of network data (e.g. power grid networks, literature networks).

  4. Large data sets. In most (bio)chemical companies, process optimization and control is limited to data archiving, with database sizes in the terabyte range. Identifying a black-box process model on these data sets requires efficient data clean-up, measurement selection and estimation procedures capable of dealing with these massive amounts of data. The applicability of both parametric and non-parametric models in this context, including the application of convex optimization methods, is challenging [9]. Kernel-based models and support vector machines have performed well on a wide variety of (smaller scale) problems [10,11]. Further research is needed on their applicability to massive data sets, including estimation in the primal with fixed-size methods.

  5. Modelling for control. Black-box modelling approaches will be studied for use in control applications: for model-based predictive control this involves the study of fast on-line updating of parameters and hyper-parameters; for the use of black-box models in statistical process control, new multiple-objective optimization problems need to be explored.
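
As an illustration of the block-wise penalties mentioned under objective 1, the following minimal sketch shows how a group (block) penalty can set entire groups of coefficients to zero at once, whereas a plain L1 penalty acts on coefficients individually. It implements the standard block soft-thresholding step (the proximal operator of a sum of Euclidean norms over groups). This is not taken from [1]; the function name `block_soft_threshold`, the grouping and the numbers are assumptions chosen for illustration only.

```python
import math


def block_soft_threshold(weights, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 (a group-lasso style penalty).

    weights: list of floats (hypothetical coefficient vector)
    groups:  list of index lists, one per block of covariates (assumed disjoint)
    lam:     non-negative threshold
    """
    result = list(weights)
    for group in groups:
        # Euclidean norm of the coefficients in this block.
        norm = math.sqrt(sum(weights[i] ** 2 for i in group))
        # The whole block is set to zero when its norm does not exceed lam;
        # otherwise the block is shrunk towards zero by a common factor.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for i in group:
            result[i] = scale * weights[i]
    return result


# Illustrative values only: one strong block and one weak block.
weights = [3.0, 4.0, 0.3, -0.4]
groups = [[0, 1], [2, 3]]
shrunk = block_soft_threshold(weights, groups, lam=1.0)
print(shrunk)
```

In this example the first block (norm 5) is shrunk but kept, while the second block (norm 0.5, below the threshold) is removed as a whole, which is the sense in which such penalties select groups of covariates rather than single ones.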



Newsflash

Johan Suykens has been awarded an ERC Advanced Grant. 

The ERC project is entitled "A-DATADRIVE-B: Advanced Data-Driven Black-box modelling" and will, over the coming 5 years, considerably reinforce the research of OPTEC's working group 2 on Data-Driven Modelling, which is led by Johan Suykens. More info can be found at
http://www.kuleuven.be/research/erc/suykens.html
