You are here

Statistical physics based heuristic clustering algorithms with an application to econophysics

Download pdf | Full Screen View

Date Issued:
2003
Summary:
Three new approaches to the clustering of data sets are presented. They are heuristic methods and represent forms of unsupervised (non-parametric) clustering. Applied to an unknown set of data these methods automatically determine the number of clusters and their location using no a priori assumptions. All are based on analogies with different physical phenomena. The first technique, named the Percolation Clustering Algorithm, embodies a novel variation on the nearest-neighbor algorithm focusing on the connectivity between sample points. Exploiting the equivalence with a percolation process, this algorithm considers data points to be surrounded by expanding hyperspheres, which bond when they touch each other. Once a sequence of joined spheres spans an entire cluster, percolation occurs and the cluster size remains constant until it merges with a neighboring cluster. The second procedure, named Nucleation and Growth Clustering, exploits the analogy with nucleation and growth which occurs in island formation during epitaxial growth of solids. The original data points are nucleation centers, around which aggregation will occur. Additional "ad-data" that are introduced into the sample space, interact with the data points and stick if located within a threshold distance. These "ad-data" are used as a tool to facilitate the detection of clusters. The third method, named Discrete Deposition Clustering Algorithm, constrains deposition to occur on a grid, which has the advantage of computational efficiency as opposed to the continuous deposition used in the previous method. The original data form the vertexes of a sparse graph and the deposition sites are defined to be the middle points of this graphs edges. Ad-data are introduced on the deposition site and the system is allowed to evolve in a self-organizing regime. This allows the simulation of a phase transition and by monitoring the specific heat capacity of the system one can mark out a "natural" criterion for validating the partition. All of these techniques are competitive with existing algorithms and offer possible advantages for certain types of data distributions. A practical application is presented using the Percolation Clustering Algorithm to determine the taxonomy of the Dow Jones Industrial Average portfolio. The statistical properties of the correlation coefficients between DJIA components are studied along with the eigenvalues of the correlation matrix between the DJIA components.
Title: Statistical physics based heuristic clustering algorithms with an application to econophysics.
108 views
24 downloads
Name(s): Baldwin, Lucia Liliana
Florida Atlantic University, Degree Grantor
Wille, Luc T., Thesis Advisor
Type of Resource: text
Genre: Electronic Thesis Or Dissertation
Issuance: monographic
Date Issued: 2003
Publisher: Florida Atlantic University
Place of Publication: Boca Raton, Fla.
Physical Form: application/pdf
Extent: 225 p.
Language(s): English
Summary: Three new approaches to the clustering of data sets are presented. They are heuristic methods and represent forms of unsupervised (non-parametric) clustering. Applied to an unknown set of data these methods automatically determine the number of clusters and their location using no a priori assumptions. All are based on analogies with different physical phenomena. The first technique, named the Percolation Clustering Algorithm, embodies a novel variation on the nearest-neighbor algorithm focusing on the connectivity between sample points. Exploiting the equivalence with a percolation process, this algorithm considers data points to be surrounded by expanding hyperspheres, which bond when they touch each other. Once a sequence of joined spheres spans an entire cluster, percolation occurs and the cluster size remains constant until it merges with a neighboring cluster. The second procedure, named Nucleation and Growth Clustering, exploits the analogy with nucleation and growth which occurs in island formation during epitaxial growth of solids. The original data points are nucleation centers, around which aggregation will occur. Additional "ad-data" that are introduced into the sample space, interact with the data points and stick if located within a threshold distance. These "ad-data" are used as a tool to facilitate the detection of clusters. The third method, named Discrete Deposition Clustering Algorithm, constrains deposition to occur on a grid, which has the advantage of computational efficiency as opposed to the continuous deposition used in the previous method. The original data form the vertexes of a sparse graph and the deposition sites are defined to be the middle points of this graphs edges. Ad-data are introduced on the deposition site and the system is allowed to evolve in a self-organizing regime. This allows the simulation of a phase transition and by monitoring the specific heat capacity of the system one can mark out a "natural" criterion for validating the partition. All of these techniques are competitive with existing algorithms and offer possible advantages for certain types of data distributions. A practical application is presented using the Percolation Clustering Algorithm to determine the taxonomy of the Dow Jones Industrial Average portfolio. The statistical properties of the correlation coefficients between DJIA components are studied along with the eigenvalues of the correlation matrix between the DJIA components.
Identifier: 9780496295555 (isbn), 12032 (digitool), FADT12032 (IID), fau:8947 (fedora)
Note(s): Thesis (Ph.D.)--Florida Atlantic University, 2003.
Subject(s): Cluster analysis
Statistical physics
Percolation (Statistical physics)
Algorithms
Held by: Florida Atlantic University Libraries
Persistent Link to This Record: http://purl.flvc.org/fcla/dt/12032
Sublocation: Digital Library
Use and Reproduction: Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Use and Reproduction: http://rightsstatements.org/vocab/InC/1.0/
Host Institution: FAU
Is Part of Series: Florida Atlantic University Digital Library Collections.