
Regular decomposition for data mining


VTT has developed a new, generic method for separating structure from randomness in complex data.

Large data sets often contain significant structures that remain hidden from an observer without computational methods. Clustering methods typically divide the data into groups whose members are similar to each other according to some metric. The Regular Decomposition (RD) method developed by VTT aims instead to perform the grouping in the information-theoretically best way, producing a highly compressed representation of the data.
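To make the idea of an information-theoretically good grouping concrete, the following minimal Python sketch scores a grouping of network nodes by the approximate number of bits needed to describe the network once the grouping is known. It is an illustration only, not VTT's implementation: the function names `description_length` and `two_cliques` are hypothetical, and the cost model (one entropy term per pair of groups plus a membership cost per node, ignoring the cost of storing the densities themselves) is a simplifying assumption.

```python
import math
from itertools import combinations_with_replacement


def binary_entropy(p):
    """Entropy in bits of a 0/1 variable that is 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def description_length(adj, labels):
    """Approximate bits needed to encode a symmetric 0/1 adjacency matrix
    given a grouping of its nodes (hypothetical, simplified cost model)."""
    groups = {}
    for node, group in enumerate(labels):
        groups.setdefault(group, []).append(node)
    keys = sorted(groups)

    bits = 0.0
    for a, b in combinations_with_replacement(keys, 2):
        if a == b:
            # Node pairs inside one group, each unordered pair counted once.
            members = groups[a]
            pairs = [(i, j) for k, i in enumerate(members) for j in members[k + 1:]]
        else:
            pairs = [(i, j) for i in groups[a] for j in groups[b]]
        if not pairs:
            continue
        density = sum(adj[i][j] for i, j in pairs) / len(pairs)
        # Links in a block are treated as random with the block's density.
        bits += len(pairs) * binary_entropy(density)

    # Cost of stating which group each node belongs to.
    if len(keys) > 1:
        bits += len(labels) * math.log2(len(keys))
    return bits


def two_cliques(size):
    """Toy network: two fully connected groups with no links between them."""
    n = 2 * size
    return [[1 if i != j and (i < size) == (j < size) else 0 for j in range(n)]
            for i in range(n)]


adj = two_cliques(3)
print(description_length(adj, [0, 0, 0, 1, 1, 1]))  # grouping that matches the structure
print(description_length(adj, [0, 0, 0, 0, 0, 0]))  # everything in one group
```

On this toy network the grouping that matches the two cliques yields the shorter description, which is the sense in which a grouping can be preferred on compression grounds rather than on a similarity metric.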

The idea behind the method is related to a result in combinatorics known as the Szemerédi regularity lemma: every sufficiently large network can be approximately divided into relatively few parts (a so-called regular decomposition) such that the structure below the level of this decomposition is approximately random. This classic result has since been generalised to structures richer than simple networks, such as matrices and hypergraphs.
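For readers who want the precise statement, the standard textbook form of the lemma for simple networks (graphs) can be written as follows. The notation ($e$, $d$, $\varepsilon$, $V_i$) is the conventional one from the combinatorics literature and is not taken from the VTT text.

```latex
% Link density between disjoint node sets A and B,
% where e(A,B) is the number of links between them:
\[
  d(A,B) = \frac{e(A,B)}{|A|\,|B|}
\]

% The pair (A,B) is \varepsilon-regular if every pair of large enough
% subsets has almost the same density as the whole pair:
\[
  \bigl| d(X,Y) - d(A,B) \bigr| \le \varepsilon
  \quad \text{for all } X \subseteq A,\ Y \subseteq B
  \text{ with } |X| \ge \varepsilon |A|,\ |Y| \ge \varepsilon |B| .
\]

% Regularity lemma: for every \varepsilon > 0 and integer m there is an
% M = M(\varepsilon, m) such that every graph with at least m nodes has a
% partition V_1, \dots, V_k into parts of nearly equal size with
\[
  m \le k \le M
  \quad \text{and all but at most } \varepsilon k^2
  \text{ of the pairs } (V_i, V_j) \text{ being } \varepsilon\text{-regular}.
\]
```

The key point is that the bound $M$ depends only on $\varepsilon$ and $m$, not on the size of the network, which is why the number of parts stays relatively small however large the network grows.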

VTT has found that regular decompositions of this type exist in very different large data sets: peer-to-peer traffic, metabolic networks, household electricity use, etc. Significantly, the decomposition can be found in a computationally efficient way, and no advance guesses about the structure need to be made. The method has been partially developed in cooperation with Hungarian researchers.

The method is described in the article:

Hannu Reittu, Fülöp Bazsó, Robert Weiss. Regular Decomposition of Multivariate Time Series and Other Matrices. Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR 2014). Lecture Notes in Computer Science 8621, 2014, pp. 424-433.

A basic tool of the future for analysing massive data?

The regular decomposition method is a general method for finding large-scale structures in data, and it has a deeper theoretical justification than traditional clustering (e.g. k-means). VTT offers regular decomposition as a new tool in its data mining tool set. VTT is also interested in cooperation on developing new applications of the method.