Current Search: Cluster analysis
- Title
- Generalized Feature Embedding Learning for Clustering and Classification.
- Creator
- Golinko, Eric David, Zhu, Xingquan, Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Data comes in many different shapes and sizes. In real-life applications, the data we study commonly has features of varied data types, which may include numerical, categorical, and text. To model this data with machine learning algorithms, the data typically must be in numeric form. Therefore, data that is not originally numerical must be transformed before it can be used as input to these algorithms. Along with this transformation, it is common that the data we study has many features relative to the number of samples. It is often desirable to reduce the number of features being trained in a model to eliminate noise and reduce training time. This problem of high dimensionality can be approached through feature selection, feature extraction, or feature embedding. Feature selection seeks to identify the most essential variables in a dataset that will lead to a parsimonious model and high-performing results, while feature extraction and embedding are techniques that apply a mathematical transformation of the data into a new representation space. As a byproduct of using a new representation, we are able to reduce the dimension greatly without sacrificing performance. Oftentimes, by using embedded features we observe a gain in performance. Though extraction and embedding methods may be powerful for isolated machine learning problems, they do not always generalize well. Therefore, we are motivated to illustrate a methodology that can be applied to any data type with little pre-processing. The methods we develop can be applied in unsupervised, supervised, incremental, and deep learning contexts. Using 28 benchmark datasets that include different data types as examples, we construct a framework that can be applied to general machine learning tasks. The techniques we develop contribute to the field of dimension reduction and feature embedding.
Using this framework, we make additional contributions to eigendecomposition by creating an objective matrix that includes three main components. The first is a class-partitioned row and feature product representation of one-hot encoded data. The second is the derivation of a weighted adjacency matrix based on class label relationships. Finally, by taking the inner product of these two components, we are able to condition the one-hot encoded data generated from the original data prior to eigenvector decomposition. The use of class partitioning and adjacency enables subsequent projections of the data to be trained more effectively when compared side by side with baseline algorithm performance. Along with this improved performance, we can adjust the dimension of the subsequent data arbitrarily. In addition, we show how these dense vectors may be used in applications to order the features of generic data for deep learning. In this dissertation, we examine a general approach to dimension reduction and feature embedding that utilizes a class-partitioned row and feature representation, a weighted approach to instance similarity, and an adjacency representation. This general approach has application to unsupervised, supervised, online, and deep learning. In our experiments on 28 benchmark datasets, we show significant performance gains in clustering, classification, and training time.
- Date Issued
- 2018
- PURL
- http://purl.flvc.org/fau/fd/FA00013063
- Subject Headings
- Eigenvectors--Data processing., Algorithms., Cluster analysis.
- Format
- Document (PDF)
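
The record above mentions one-hot encoding of non-numeric data followed by eigenvector decomposition and projection to an adjustable dimension. The following is a minimal sketch of that general idea only, assuming plain covariance-based eigendecomposition (essentially PCA on one-hot rows). It does not reproduce the dissertation's class-partitioned objective matrix or weighted adjacency matrix, and the function names `one_hot` and `eigen_embed` are hypothetical.

```python
import numpy as np

def one_hot(rows):
    """One-hot encode a list of equal-length tuples of categorical values."""
    n_cols = len(rows[0])
    # One indicator column per (feature index, category value) pair.
    vocab = sorted({(j, row[j]) for row in rows for j in range(n_cols)})
    index = {key: k for k, key in enumerate(vocab)}
    encoded = np.zeros((len(rows), len(vocab)))
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            encoded[i, index[(j, value)]] = 1.0
    return encoded

def eigen_embed(encoded, dim):
    """Project centered rows onto the top `dim` eigenvectors of their covariance.

    Assumption: a plain covariance matrix stands in for the objective matrix
    described in the abstract.
    """
    centered = encoded - encoded.mean(axis=0)
    cov = centered.T @ centered / max(len(encoded) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:dim]]
    return centered @ top

# Toy categorical data: two features, two categories each.
rows = [("red", "small"), ("red", "large"), ("blue", "small"), ("blue", "large")]
X = one_hot(rows)            # 4 samples, 4 indicator columns
Z = eigen_embed(X, dim=2)    # 4 samples, 2 embedded dimensions
```

The `dim` argument plays the role of the adjustable output dimension; any value up to the number of indicator columns is accepted.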
- Title
- THE TAXONOMIC AND NON-TAXONOMIC CLUSTERING AND RECALL OF YOUNG CHILDREN IN A SORTING AND DELAYED RECALL TASK.
- Creator
- ZAKEN, FLORA JANE, Florida Atlantic University
- Abstract/Description
-
Several studies have found recall and clustering performance of young children to be greater with non-taxonomic (NT) than with taxonomic (T) materials, while other studies have found the reverse. The present experiment has tried to resolve this discrepancy by introducing the variable of criterion vs single sorting prior to recall. A comparison of Immediate and Delayed recall between child-generated T and child-generated NT categories under criterion (two consecutive identical sorts) and single sorting conditions was used to assess the differences in these T and NT grouping patterns as a basis for organizing recall. Although there were no significant interactions with delay, when subjects sorted only once, recall performance was greater with T related materials. However, when subjects sorted to a stable criterion of two consecutive identical sorts, recall performance with NT related materials was greater than performance with T related materials. These results suggest that under single sorting conditions, the use of T categories may have resulted in a better fit with the child's semantic memory structure than NT groupings. However, with stable sorting, both T and NT grouping patterns were equally consolidated into the memory structure, making them both equally retrievable.
- Date Issued
- 1980
- PURL
- http://purl.flvc.org/fcla/dt/14009
- Subject Headings
- Memory in children, Cluster analysis, Recollection (Psychology)
- Format
- Document (PDF)
- Title
- Image retrieval using visual attention.
- Creator
- Mayron, Liam M., College of Engineering and Computer Science, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
The retrieval of digital images is hindered by the semantic gap. The semantic gap is the disparity between a user's high-level interpretation of an image and the information that can be extracted from an image's physical properties. Content-based image retrieval systems are particularly vulnerable to the semantic gap due to their reliance on low-level visual features for describing image content. The semantic gap can be narrowed by including high-level, user-generated information. High-level descriptions of images are more capable of capturing the semantic meaning of image content, but it is not always practical to collect this information. Thus, both content-based and human-generated information are considered in this work. A content-based method of retrieving images using a computational model of visual attention was proposed, implemented, and evaluated. This work is based on a study of contemporary research in the field of vision science, particularly computational models of bottom-up visual attention. The use of computational models of visual attention to detect salient-by-design regions of interest in images is investigated. The method is then refined to detect objects of interest in broad image databases that are not necessarily salient by design. An interface for image retrieval, organization, and annotation that is compatible with the attention-based retrieval method has also been implemented. It incorporates the ability to simultaneously execute querying by image content, keyword, and collaborative filtering. The user is central to the design and evaluation of the system. A game was developed to evaluate the entire system, which includes the user, the user interface, and retrieval methods.
- Date Issued
- 2008
- PURL
- http://purl.flvc.org/fcla/flaent/EN00154040/68_1/98p0137i.pdf, http://purl.flvc.org/FAU/58006
- Subject Headings
- Image processing, Digital techniques, Database systems, Cluster analysis, Multimedia systems
- Format
- Document (PDF)
- Title
- Statistical physics based heuristic clustering algorithms with an application to econophysics.
- Creator
- Baldwin, Lucia Liliana, Florida Atlantic University, Wille, Luc T.
- Abstract/Description
-
Three new approaches to the clustering of data sets are presented. They are heuristic methods and represent forms of unsupervised (non-parametric) clustering. Applied to an unknown set of data, these methods automatically determine the number of clusters and their location using no a priori assumptions. All are based on analogies with different physical phenomena. The first technique, named the Percolation Clustering Algorithm, embodies a novel variation on the nearest-neighbor algorithm focusing on the connectivity between sample points. Exploiting the equivalence with a percolation process, this algorithm considers data points to be surrounded by expanding hyperspheres, which bond when they touch each other. Once a sequence of joined spheres spans an entire cluster, percolation occurs and the cluster size remains constant until it merges with a neighboring cluster. The second procedure, named Nucleation and Growth Clustering, exploits the analogy with nucleation and growth, which occurs in island formation during epitaxial growth of solids. The original data points are nucleation centers, around which aggregation will occur. Additional "ad-data" that are introduced into the sample space interact with the data points and stick if located within a threshold distance. These "ad-data" are used as a tool to facilitate the detection of clusters. The third method, named Discrete Deposition Clustering Algorithm, constrains deposition to occur on a grid, which has the advantage of computational efficiency as opposed to the continuous deposition used in the previous method. The original data form the vertices of a sparse graph and the deposition sites are defined to be the middle points of this graph's edges. Ad-data are introduced on the deposition site and the system is allowed to evolve in a self-organizing regime.
This allows the simulation of a phase transition and by monitoring the specific heat capacity of the system one can mark out a "natural" criterion for validating the partition. All of these techniques are competitive with existing algorithms and offer possible advantages for certain types of data distributions. A practical application is presented using the Percolation Clustering Algorithm to determine the taxonomy of the Dow Jones Industrial Average portfolio. The statistical properties of the correlation coefficients between DJIA components are studied along with the eigenvalues of the correlation matrix between the DJIA components.
- Date Issued
- 2003
- PURL
- http://purl.flvc.org/fcla/dt/12032
- Subject Headings
- Cluster analysis, Statistical physics, Percolation (Statistical physics), Algorithms
- Format
- Document (PDF)
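
The record above describes data points surrounded by hyperspheres that bond when they touch. A minimal sketch of that bonding step, assuming a single fixed sphere radius rather than the expanding radius and automatic cluster-count detection of the dissertation, is to join any two points whose distance is at most twice the radius and read off the connected components. The function name `percolation_clusters` and the union-find bookkeeping are illustrative choices, not the author's implementation.

```python
import math

def percolation_clusters(points, radius):
    """Label points so that points whose spheres touch share a cluster label.

    Two spheres of equal radius touch when their centers are at most
    2 * radius apart. Connected groups are tracked with union-find.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= 2 * radius:
                parent[find(i)] = find(j)   # bond the two groups

    # Relabel the union-find roots as 0, 1, 2, ... in order of first appearance.
    remap = {}
    return [remap.setdefault(find(i), len(remap)) for i in range(len(points))]

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
small = percolation_clusters(points, radius=0.1)   # no spheres touch
medium = percolation_clusters(points, radius=0.6)  # neighbors at distance 1 bond
large = percolation_clusters(points, radius=5.0)   # everything merges
```

Sweeping `radius` upward and watching where the number of distinct labels stays constant is one way to mimic the plateau the abstract describes, although the stopping criterion here is left to the caller.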
- Title
- Analysis of a cluster-based architecture for hypercube multicomputers.
- Creator
- Obeng, Morrison Stephen., Florida Atlantic University, Mahgoub, Imad, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
In this dissertation, we propose and analyze a cluster-based hypercube architecture in which each node of the hypercube is furnished with a cluster of n processors connected through a small crossbar switch with n memory modules. Topological analysis of the cluster-based hypercube architecture shows that it reduces the complexity of the basic hypercube architecture by reducing the diameter, the degree of a node and the number of links in the hypercube. The proposed architecture uses the higher processing power furnished by the cluster of execution processors in each node to address the needs of computation-intensive parallel application programs. It provides a smaller dimension hypercube with the same number of execution processors as a higher dimension conventional hypercube architecture. This scheme can be extended to meshes and other architectures. Mathematical analysis of the parallel simplex and parallel Gaussian elimination algorithms executing on the cluster-based hypercube shows the order of complexity of executing an n x n matrix problem on the cluster-based hypercube using the parallel simplex algorithm to be O(n^2) and that of the parallel Gaussian elimination algorithm to be O(n^3). The timing analysis derived from the mathematical analysis results indicates that, for the same number of processors in the cluster-based hypercube system as in the conventional hypercube system, the computation-to-communication ratio of the cluster-based hypercube executing a matrix problem by the parallel simplex algorithm increases when the number of nodes of the cluster-based hypercube is decreased. Self-driven simulations were developed to run parallel simplex and parallel Gaussian elimination algorithms on the proposed cluster-based hypercube architecture and on the Intel Personal Supercomputer (iPSC/860), which is a conventional hypercube. The simulation results show a response time performance improvement of up to 30% in favor of the cluster-based hypercube.
We also observe that for increased link delays, the performance gap increases significantly in favor of the cluster-based hypercube architecture when both the cluster-based hypercube and the Intel iPSC/860, a conventional hypercube, execute the same parallel simplex and Gaussian elimination algorithms.
- Date Issued
- 1995
- PURL
- http://purl.flvc.org/fcla/dt/12435
- Subject Headings
- Computer architecture, Cluster analysis--Computer programs, Hypercube networks (Computer networks), Parallel computers
- Format
- Document (PDF)
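
The record above states that placing a cluster of processors at each hypercube node reduces the diameter, node degree, and link count for the same number of processors. The arithmetic can be sketched with the standard properties of a binary d-dimensional hypercube (2^d nodes, degree d, diameter d, d * 2^(d-1) links). The function names are hypothetical, and the sketch counts only inter-node hypercube links, ignoring the crossbar switch inside each cluster.

```python
def hypercube_stats(dimension):
    """Return node count, node degree, diameter, and link count of a binary hypercube."""
    nodes = 2 ** dimension
    return {
        "nodes": nodes,
        "degree": dimension,
        "diameter": dimension,
        # Each node has `dimension` links and every link is shared by two nodes.
        "links": dimension * nodes // 2,
    }

def clustered_hypercube_stats(processors, cluster_size):
    """Hypercube statistics when `cluster_size` processors share each node."""
    if processors <= 0 or cluster_size <= 0 or processors % cluster_size != 0:
        raise ValueError("processors must be a positive multiple of cluster_size")
    nodes = processors // cluster_size
    if (nodes & (nodes - 1)) != 0:
        raise ValueError("node count must be a power of two")
    return hypercube_stats(nodes.bit_length() - 1)

# 64 processors: one per node versus clusters of four per node.
conventional = hypercube_stats(6)
clustered = clustered_hypercube_stats(64, 4)
```

For 64 processors, the conventional layout is a 6-dimensional hypercube with 192 links, while clusters of four give a 4-dimensional hypercube with 32 inter-node links.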
- Title
- Performance analysis of K-means algorithm and Kohonen networks.
- Creator
- Syed, Afzal A., Florida Atlantic University, Pandya, Abhijit S., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
The K-means algorithm and Kohonen networks possess self-organizing characteristics and are currently widely used in different fields. The factors that influence the behavior of K-means are the choice of initial cluster centers, the number of cluster centers, and the geometric properties of the input data. Kohonen networks have the ability of self-organization without any prior input about the number of clusters to be formed. This thesis looks into the performance of these algorithms and provides a unique way of combining them for better clustering. A series of benchmark problem sets is developed and run to obtain the performance analysis of the K-means algorithm and Kohonen networks. We have attempted to obtain the better of these two self-organizing algorithms by providing the same problem sets and extracting the best results based on the user's needs. A user-friendly toolbox, written in C++ and VC++, is developed for applications on both images and feature data sets. The tool contains K-means algorithm and Kohonen network code for clustering and pattern classification.
- Date Issued
- 2004
- PURL
- http://purl.flvc.org/fcla/dt/13112
- Subject Headings
- Self-organizing maps, Neural networks (Computer science), Cluster analysis--Computer programs, Computer algorithms
- Format
- Document (PDF)
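
The record above names the initial cluster centers and the number of centers as factors that influence K-means. A minimal sketch of the standard K-means iteration (Lloyd's algorithm) in NumPy makes both factors explicit inputs. This is not the thesis toolbox, which is described as written in C++ and VC++; the function name `kmeans` and the toy data are assumptions for illustration.

```python
import numpy as np

def kmeans(points, init_centers, max_iter=100):
    """Run Lloyd's algorithm from explicitly supplied initial centers.

    The number of clusters is the number of rows in `init_centers`.
    Returns the final centers and the cluster label of every point.
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its members.
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = points[labels == k]
            if len(members):               # an empty cluster keeps its old center
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated groups of four points each.
data = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
centers, labels = kmeans(data, init_centers=[[0, 0], [10, 10]])
```

Passing different `init_centers`, or a different number of them, is the simplest way to probe the sensitivity the abstract mentions.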
- Title
- Simulation analysis of cluster-based multiprocessor systems.
- Creator
- De Armas, Mario Ernesto., Florida Atlantic University, Mahgoub, Imad
- Abstract/Description
-
Multiprocessor systems have demonstrated great potential for meeting the ever-increasing demand for higher performance. In this thesis, we develop simulation models with fewer and more realistic assumptions to evaluate the performance of the circuit-switched cluster-based multiprocessor system. We then introduce a packet-switched variation of the cluster-based architecture and develop simulation models to evaluate its performance. The analysis of the cluster-based systems is performed for both uniform and non-uniform memory reference models. We conducted a similar analysis for the crossbar and multiple-bus systems. Finally, the results of the cluster-based systems are compared to those obtained for the crossbar and the multiple-bus systems.
- Date Issued
- 1993
- PURL
- http://purl.flvc.org/fcla/dt/14969
- Subject Headings
- Multiprocessors, Cluster analysis, Packet switching (Data transmission), Computer architecture, Computer simulation
- Format
- Document (PDF)