Current Search: Combinatorial group theory (x)
View All Items
- Title
- Techniques in Lattice Basis Reduction.
- Creator
- Khadka, Bal K., Magliveras, Spyros S., Florida Atlantic University, Charles E. Schmidt College of Science, Department of Mathematical Sciences
- Abstract/Description
-
The mathematical theory of nding a basis of shortest possible vectors in a given lattice L is known as reduction theory and goes back to the work of Lagrange, Gauss, Hermite, Korkin, Zolotarev, and Minkowski. Modern reduction theory is voluminous and includes the work of A. Lenstra, H. Lenstra and L. Lovasz who created the well known LLL algorithm, and many other researchers such as L. Babai and C. P. Schnorr who created signi cant new variants of basis reduction algorithms. The shortest...
Show moreThe mathematical theory of nding a basis of shortest possible vectors in a given lattice L is known as reduction theory and goes back to the work of Lagrange, Gauss, Hermite, Korkin, Zolotarev, and Minkowski. Modern reduction theory is voluminous and includes the work of A. Lenstra, H. Lenstra and L. Lovasz who created the well known LLL algorithm, and many other researchers such as L. Babai and C. P. Schnorr who created signi cant new variants of basis reduction algorithms. The shortest vector (SVP) and closest vector (CVP) problems, presently considered intractable, are algorithmic tasks that lie at the core of many number theoretic problems, integer programming, nding irreducible factors of polynomials, minimal polynomials of algebraic numbers, and simultaneous diophantine approximation. Lattice basis reduction also has deep and extensive connections with modern cryptography, and cryptanalysis particularly in the post-quantum era. In this dissertation we study and compare current systems LLL and BKZ, and point out their strengths and drawbacks. In addition, we propose and investigate the e cacy of new optimization techniques, to be used along with LLL, such as hill climbing, random walks in groups, our lattice di usion-sub lattice fusion, and multistage hybrid LDSF-HC technique. The rst two methods rely on the sensitivity of LLL to permutations of the input basis B, and optimization ideas over the symmetric group Sm viewed as a metric space. The third technique relies on partitioning the lattice into sublattices, performing basis reduction in the partition sublattice blocks, fusing the sublattices, and repeating. We also point out places where parallel computation can reduce runtimes achieving almost linear speedup. The multistage hybrid technique relies on the lattice di usion and sublattice fusion and hill climbing algorithms. Unlike traditional methods, our approach brings in better results in terms of basis reduction towards nding shortest vectors and minimal weight bases. Using these techniques we have published the competitive lattice vectors of ideal lattice challenge on the lattice hall of fame. Toward the end of the dissertation we also discuss applications to the multidimensional knapsack problem that resulted in the discovery of new large sets of geometric designs still considered very rare. The research introduces innovative techniques in lattice basis reduction theory and provides some space for future researchers to contemplate lattices from a new viewpoint.
Show less - Date Issued
- 2016
- PURL
- http://purl.flvc.org/fau/fd/FA00004678
- Subject Headings
- Cryptography., Combinatorial analysis., Group theory.
- Format
- Document (PDF)
- Title
- Computing automorphism groups of projective planes.
- Creator
- Adamski, Jesse Victor, Magliveras, Spyros S., Charles E. Schmidt College of Science, Department of Mathematical Sciences
- Abstract/Description
-
The main objective of this thesis was to find the full automorphism groups of finite Desarguesian planes. A set of homologies were used to generate the automorphism group when the order of the plane was prime. When the order was a prime power Pa,a ≠ 1 the Frobenius automorphism was added to the set of homologies, and then the full automorphism group was generated. The Frobenius automorphism was found by using the planar ternary ring derived from a coordinatization of the plane.
- Date Issued
- 2013
- PURL
- http://purl.flvc.org/fau/fd/FA0004000
- Subject Headings
- Combinatorial group theory, Finite geometrics, Geometry, Projective
- Format
- Document (PDF)
- Title
- The existence of minimal logarithmic signatures for classical groups.
- Creator
- Singhi, Nikhil., Charles E. Schmidt College of Science, Department of Mathematical Sciences
- Abstract/Description
-
A logarithmic signature (LS) for a nite group G is an ordered tuple = [A1;A2; : : : ;An] of subsets Ai of G, such that every element g 2 G can be expressed uniquely as a product g = a1a2 : : : ; an, where ai 2 Ai. Logarithmic signatures were dened by Magliveras in the late 1970's for arbitrary nite groups in the context of cryptography. They were also studied for abelian groups by Hajos in the 1930's. The length of an LS is defined to be `() = Pn i=1 jAij. It can be easily seen that for a...
Show moreA logarithmic signature (LS) for a nite group G is an ordered tuple = [A1;A2; : : : ;An] of subsets Ai of G, such that every element g 2 G can be expressed uniquely as a product g = a1a2 : : : ; an, where ai 2 Ai. Logarithmic signatures were dened by Magliveras in the late 1970's for arbitrary nite groups in the context of cryptography. They were also studied for abelian groups by Hajos in the 1930's. The length of an LS is defined to be `() = Pn i=1 jAij. It can be easily seen that for a group G of order Qk j=1 pj mj , the length of any LS for G satises `() Pk j=1mjpj . An LS for which this lower bound is achieved is called a minimal logarithmic signature (MLS). The MLS conjecture states that every finite simple group has an MLS. If the conjecture is true then every finite group will have an MLS. The conjecture was shown to be true by a number of researchers for a few classes of finite simple groups. However, the problem is still wide open. This dissertation addresses the MLS conjecture for the classical simple groups. In particular, it is shown that MLS's exist for the symplectic groups Sp2n(q), the orthogonal groups O 2n(q0) and the corresponding simple groups PSp2n(q) and 2n(q0) for all n 2 N, prime power q and even prime power q0. The existence of an MLS is also shown for all unitary groups GUn(q) for all odd n and q = 2s under the assumption that an MLS exists for GUn 1(q). The methods used are very general and algorithmic in nature and may be useful for studying all nite simple groups of Lie type and possibly also the sporadic groups. The blocks of logarithmic signatures constructed in this dissertation have cyclic structure and provide a sort of cyclic decomposition for these classical groups.
Show less - Date Issued
- 2011
- PURL
- http://purl.flvc.org/FAU/3172943
- Subject Headings
- Finite groups, Abelian groups, Number theory, Combinatorial group theory, Mathematical recreations, Linear algebraic groups, Lie groups
- Format
- Document (PDF)
- Title
- On the minimal logarithmic signature conjecture.
- Creator
- Singhi, Nidhi., Charles E. Schmidt College of Science, Department of Mathematical Sciences
- Abstract/Description
-
The minimal logarithmic signature conjecture states that in any finite simple group there are subsets Ai, 1 i s such that the size jAij of each Ai is a prime or 4 and each element of the group has a unique expression as a product Qs i=1 ai of elements ai 2 Ai. Logarithmic signatures have been used in the construction of several cryptographic primitives since the late 1970's [3, 15, 17, 19, 16]. The conjecture is shown to be true for various families of simple groups including cyclic groups,...
Show moreThe minimal logarithmic signature conjecture states that in any finite simple group there are subsets Ai, 1 i s such that the size jAij of each Ai is a prime or 4 and each element of the group has a unique expression as a product Qs i=1 ai of elements ai 2 Ai. Logarithmic signatures have been used in the construction of several cryptographic primitives since the late 1970's [3, 15, 17, 19, 16]. The conjecture is shown to be true for various families of simple groups including cyclic groups, An, PSLn(q) when gcd(n; q 1) is 1, 4 or a prime and several sporadic groups [10, 9, 12, 14, 18]. This dissertation is devoted to proving that the conjecture is true for a large class of simple groups of Lie type called classical groups. The methods developed use the structure of these groups as isometry groups of bilinear or quadratic forms. A large part of the construction is also based on the Bruhat and Levi decompositions of parabolic subgroups of these groups. In this dissertation the conjecture is shown to be true for the following families of simple groups: the projective special linear groups PSLn(q), the projective symplectic groups PSp2n(q) for all n and q a prime power, and the projective orthogonal groups of positive type + 2n(q) for all n and q an even prime power. During the process, the existence of minimal logarithmic signatures (MLS's) is also proven for the linear groups: GLn(q), PGLn(q), SLn(q), the symplectic groups: Sp2n(q) for all n and q a prime power, and for the orthogonal groups of plus type O+ 2n(q) for all n and q an even prime power. The constructions in most of these cases provide cyclic MLS's. Using the relationship between nite groups of Lie type and groups with a split BN-pair, it is also shown that every nite group of Lie type can be expressed as a disjoint union of sets, each of which has an MLS.
Show less - Date Issued
- 2011
- PURL
- http://purl.flvc.org/FAU/3172946
- Subject Headings
- Finite groups, Abelian groups, Number theory, Combinatorial group theory, Mathematical recreations, Linear algebraic groups, Lie groups
- Format
- Document (PDF)
- Title
- Alleviating class imbalance using data sampling: Examining the effects on classification algorithms.
- Creator
- Napolitano, Amri E., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Imbalanced class distributions typically cause poor classifier performance on the minority class, which also tends to be the class with the highest cost of mis-classification. Data sampling is a common solution to this problem, and numerous sampling techniques have been proposed to address it. Prior research examining the performance of these techniques has been narrow and limited. This work uses thorough empirical experimentation to compare the performance of seven existing data sampling...
Show moreImbalanced class distributions typically cause poor classifier performance on the minority class, which also tends to be the class with the highest cost of mis-classification. Data sampling is a common solution to this problem, and numerous sampling techniques have been proposed to address it. Prior research examining the performance of these techniques has been narrow and limited. This work uses thorough empirical experimentation to compare the performance of seven existing data sampling techniques using five different classifiers and four different datasets. The work addresses which sampling techniques produce the best performance in the presence of class unbalance, which classifiers are most robust to the problem, as well as which sampling techniques perform better or worse with each classifier. Extensive statistical analysis of these results is provided, in addition to an examination of the qualitative effects of the sampling techniques on the types of predictions made by the C4.5 classifier.
Show less - Date Issued
- 2006
- PURL
- http://purl.flvc.org/fcla/dt/13413
- Subject Headings
- Combinatorial group theory, Data mining, Decision trees, Machine learning
- Format
- Document (PDF)
- Title
- New Geometric Large Sets.
- Creator
- Hurley, Michael Robert, Magliveras, Spyros S., Florida Atlantic University, Charles E. Schmidt College of Science, Department of Mathematical Sciences
- Abstract/Description
-
Let V be an n-dimensional vector space over the field of q elements. By a geometric t-[q^n, k, λ] design we mean a collection D of k-dimensional subspaces of V, called blocks, such that every t-dimensional subspace T of V appears in exactly λ blocks in D. A large set, LS [N] [t, k, q^n], of geometric designs is a collection on N disjoint t-[q^n, k, λ] designs that partitions [V K], the collection of k-dimensional subspaces of V. In this work we construct non-isomorphic large sets using...
Show moreLet V be an n-dimensional vector space over the field of q elements. By a geometric t-[q^n, k, λ] design we mean a collection D of k-dimensional subspaces of V, called blocks, such that every t-dimensional subspace T of V appears in exactly λ blocks in D. A large set, LS [N] [t, k, q^n], of geometric designs is a collection on N disjoint t-[q^n, k, λ] designs that partitions [V K], the collection of k-dimensional subspaces of V. In this work we construct non-isomorphic large sets using methods based on incidence structures known as the Kramer-Mesner matrices. These structures are induced by particular group actions on the collection of subspaces of the vector space V. Subsequently, we discuss and use computational techniques for solving certain linear problems of the form AX = B, where A is a large integral matrix and X is a {0,1} solution. These techniques involve (i) lattice basis-reduction, including variants of the LLL algorithm, and (ii) linear programming. Inspiration came from the 2013 work of Braun, Kohnert, Ostergard, and Wassermann, [17], who produced the first nontrivial large set of geometric designs with t ≥ 2. Bal Khadka and Michael Epstein provided the know-how for using the LLL and linear programming algorithms that we implemented to construct the large sets.
Show less - Date Issued
- 2016
- PURL
- http://purl.flvc.org/fau/fd/FA00004732, http://purl.flvc.org/fau/fd/FA00004732
- Subject Headings
- Group theory., Finite groups., Factorial experiment designs., Irregularities of distribution (Number theory), Combinatorial analysis.
- Format
- Document (PDF)
- Title
- Avoiding abelian squares in infinite partial words.
- Creator
- Severa, William., Harriet L. Wilkes Honors College
- Abstract/Description
-
Famous mathematician Paul Erdèos conjectured the existence of infinite sequences of symbols where no two adjacent subsequences are permutations of one another. It can easily be checked that no such sequence can be constructed using only three symbols, but as few as four symbols are sufficient. Here, we expand this concept to include sequences that may contain 'do not know'' characters, called holes. These holes make the undesired subsequences more common. We explore both finite and infinite...
Show moreFamous mathematician Paul Erdèos conjectured the existence of infinite sequences of symbols where no two adjacent subsequences are permutations of one another. It can easily be checked that no such sequence can be constructed using only three symbols, but as few as four symbols are sufficient. Here, we expand this concept to include sequences that may contain 'do not know'' characters, called holes. These holes make the undesired subsequences more common. We explore both finite and infinite sequences. For infinite sequences, we use iterating morphisms to construct the non-repetitive sequences with either a finite number of holes or infinitely many holes. We also discuss the problem of using the minimum number of different symbols.
Show less - Date Issued
- 2010
- PURL
- http://purl.flvc.org/FAU/3335460
- Subject Headings
- Abelian groups, Mathematics, Study and teaching (Higher), Combinatorial analysis, Combinatorial set theory, Probabilities
- Format
- Document (PDF)
- Title
- New Results in Group Theoretic Cryptology.
- Creator
- Sramka, Michal, Florida Atlantic University, Magliveras, Spyros S., Charles E. Schmidt College of Science, Department of Mathematical Sciences
- Abstract/Description
-
With the publication of Shor's quantum algorithm for solving discrete logarithms in finite cyclic groups, a need for new cryptographic primitives arose; namely, for more secure primitives that would prevail in the post-quantum era. The aim of this dissertation is to exploit some hard problems arising from group theory for use in cryptography. Over the years, there have been many such proposals. We first look at two recently proposed schemes based on some form of a generalization of the...
Show moreWith the publication of Shor's quantum algorithm for solving discrete logarithms in finite cyclic groups, a need for new cryptographic primitives arose; namely, for more secure primitives that would prevail in the post-quantum era. The aim of this dissertation is to exploit some hard problems arising from group theory for use in cryptography. Over the years, there have been many such proposals. We first look at two recently proposed schemes based on some form of a generalization of the discrete logari thm problem (DLP), identify their weaknesses, and cryptanalyze them. By applying the exper tise gained from the above cryptanalyses, we define our own generalization of the DLP to arbitrary finite groups. We show that such a definition leads to the design of signature schemes and pseudo-random number generators with provable security under a security assumption based on a group theoretic problem. In particular, our security assumption is based on the hardness of factorizing elements of the projective special linear group over a finite field in some representations. We construct a one-way function based on this group theoretic assumption and provide a security proof.
Show less - Date Issued
- 2006
- PURL
- http://purl.flvc.org/fau/fd/FA00000878
- Subject Headings
- Group theory, Mathematical statistics, Cryptography, Combinatorial designs and configurations, Data encryption (Computer science), Coding theory
- Format
- Document (PDF)
- Title
- Low rank transitive representations, primitive extensions, and the collision problem in PSL (2, q).
- Creator
- Thapa Magar, Krishna B., Magliveras, Spyros S., Florida Atlantic University, Charles E. Schmidt College of Science, Department of Mathematical Sciences
- Abstract/Description
-
Every transitive permutation representation of a finite group is the representation of the group in its action on the cosets of a particular subgroup of the group. The group has a certain rank for each of these representations. We first find almost all rank-3 and rank-4 transitive representations of the projective special linear group P SL(2, q) where q = pm and p is an odd prime. We also determine the rank of P SL (2, p) in terms of p on the cosets of particular given subgroups. We then...
Show moreEvery transitive permutation representation of a finite group is the representation of the group in its action on the cosets of a particular subgroup of the group. The group has a certain rank for each of these representations. We first find almost all rank-3 and rank-4 transitive representations of the projective special linear group P SL(2, q) where q = pm and p is an odd prime. We also determine the rank of P SL (2, p) in terms of p on the cosets of particular given subgroups. We then investigate the construction of rank-3 transitive and primitive extensions of a simple group, such that the extension group formed is also simple. In the latter context we present a new, group theoretic construction of the famous Hoffman-Singleton graph as a rank-3 graph.
Show less - Date Issued
- 2015
- PURL
- http://purl.flvc.org/fau/fd/FA00004471, http://purl.flvc.org/fau/fd/FA00004471
- Subject Headings
- Combinatorial designs and configurations, Cryptography, Data encryption (Computer science), Finite geometries, Finite groups, Group theory, Permutation groups
- Format
- Document (PDF)
- Title
- Collabortive filtering using machine learning and statistical techniques.
- Creator
- Su, Xiaoyuan., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Collaborative filtering (CF), a very successful recommender system, is one of the applications of data mining for incomplete data. The main objective of CF is to make accurate recommendations from highly sparse user rating data. My contributions to this research topic include proposing the frameworks of imputation-boosted collaborative filtering (IBCF) and imputed neighborhood based collaborative filtering (INCF). We also proposed a model-based CF technique, TAN-ELR CF, and two hybrid CF...
Show moreCollaborative filtering (CF), a very successful recommender system, is one of the applications of data mining for incomplete data. The main objective of CF is to make accurate recommendations from highly sparse user rating data. My contributions to this research topic include proposing the frameworks of imputation-boosted collaborative filtering (IBCF) and imputed neighborhood based collaborative filtering (INCF). We also proposed a model-based CF technique, TAN-ELR CF, and two hybrid CF algorithms, sequential mixture CF and joint mixture CF. Empirical results show that our proposed CF algorithms have very good predictive performances. In the investigation of applying imputation techniques in mining incomplete data, we proposed imputation-helped classifiers, and VCI predictors (voting on classifications from imputed learning sets), both of which resulted in significant improvement in classification performance for incomplete data over conventional machine learned classifiers, including kNN, neural network, one rule, decision table, SVM, logistic regression, decision tree (C4.5), random forest, and decision list (PART), and the well known Bagging predictors. The main imputation techniques involved in these algorithms include EM (expectation maximization) and BMI (Bayesian multiple imputation).
Show less - Date Issued
- 2008
- PURL
- http://purl.flvc.org/FAU/186301
- Subject Headings
- Filters (Mathematics), Machine learning, Data mining, Technological innovations, Database management, Combinatorial group theory
- Format
- Document (PDF)
- Title
- Classification techniques for noisy and imbalanced data.
- Creator
- Napolitano, Amri E., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Machine learning techniques allow useful insight to be distilled from the increasingly massive repositories of data being stored. As these data mining techniques can only learn patterns actually present in the data, it is important that the desired knowledge be faithfully and discernibly contained therein. Two common data quality issues that often affect important real life classification applications are class noise and class imbalance. Class noise, where dependent attribute values are...
Show moreMachine learning techniques allow useful insight to be distilled from the increasingly massive repositories of data being stored. As these data mining techniques can only learn patterns actually present in the data, it is important that the desired knowledge be faithfully and discernibly contained therein. Two common data quality issues that often affect important real life classification applications are class noise and class imbalance. Class noise, where dependent attribute values are recorded erroneously, misleads a classifier and reduces predictive performance. Class imbalance occurs when one class represents only a small portion of the examples in a dataset, and, in such cases, classifiers often display poor accuracy on the minority class. The reduction in classification performance becomes even worse when the two issues occur simultaneously. To address the magnified difficulty caused by this interaction, this dissertation performs thorough empirical investigations of several techniques for dealing with class noise and imbalanced data. Comprehensive experiments are performed to assess the effects of the classification techniques on classifier performance, as well as how the level of class imbalance, level of class noise, and distribution of class noise among the classes affects results. An empirical analysis of classifier based noise detection efficiency appears first. Subsequently, an intelligent data sampling technique, based on noise detection, is proposed and tested. Several hybrid classifier ensemble techniques for addressing class noise and imbalance are introduced. Finally, a detailed empirical investigation of classification filtering is performed to determine best practices.
Show less - Date Issued
- 2009
- PURL
- http://purl.flvc.org/FAU/369201
- Subject Headings
- Combinatorial group theory, Data mining, Technological innovations, Decision trees, Machine learning, Filters (Mathematics)
- Format
- Document (PDF)
- Title
- Feature selection techniques and applications in bioinformatics.
- Creator
- Dittman, David, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Possibly the largest problem when working in bioinformatics is the large amount of data to sift through to find useful information. This thesis shows that the use of feature selection (a method of removing irrelevant and redundant information from the dataset) is a useful and even necessary technique to use in these large datasets. This thesis also presents a new method in comparing classes to each other through the use of their features. It also provides a thorough analysis of the use of...
Show morePossibly the largest problem when working in bioinformatics is the large amount of data to sift through to find useful information. This thesis shows that the use of feature selection (a method of removing irrelevant and redundant information from the dataset) is a useful and even necessary technique to use in these large datasets. This thesis also presents a new method in comparing classes to each other through the use of their features. It also provides a thorough analysis of the use of various feature selection techniques and classifier in different scenarios from bioinformatics. Overall, this thesis shows the importance of the use of feature selection in bioinformatics.
Show less - Date Issued
- 2011
- PURL
- http://purl.flvc.org/FAU/3175016
- Subject Headings
- Bioinformatifcs, Data mining, Technological innovations, Computational biology, Combinatorial group theory, Filters (Mathematics), Ranking and selection (Statistics)
- Format
- Document (PDF)
- Title
- The discrete logarithm problem in non-abelian groups.
- Creator
- Iliâc, Ivana., Charles E. Schmidt College of Science, Department of Mathematical Sciences
- Abstract/Description
-
This dissertation contains results of the candidate's research on the generalized discrete logarithm problem (GDLP) and its applications to cryptology, in non-abelian groups. The projective special linear groups PSL(2; p), where p is a prime, represented by matrices over the eld of order p, are investigated as potential candidates for implementation of the GDLP. Our results show that the GDLP with respect to specic pairs of PSL(2; p) generators is weak. In such cases the groups PSL(2; p) are...
Show moreThis dissertation contains results of the candidate's research on the generalized discrete logarithm problem (GDLP) and its applications to cryptology, in non-abelian groups. The projective special linear groups PSL(2; p), where p is a prime, represented by matrices over the eld of order p, are investigated as potential candidates for implementation of the GDLP. Our results show that the GDLP with respect to specic pairs of PSL(2; p) generators is weak. In such cases the groups PSL(2; p) are not good candidates for cryptographic applications which rely on the hardness of the GDLP. Results are presented on generalizing existing cryptographic primitives and protocols based on the hardness of the GDLP in non-abelian groups. A special instance of a cryptographic primitive dened over the groups SL(2; 2n), the Tillich-Zemor hash function, has been cryptanalyzed. In particular, an algorithm for constructing collisions of short length for any input parameter is presented. A series of mathematical results are developed to support the algorithm and to prove existence of short collisions.
Show less - Date Issued
- 2010
- PURL
- http://purl.flvc.org/FAU/3356783
- Subject Headings
- Data encryption (Computer science), Computer security, Cryptography, Combinatorial group theory, Data processing, Mapping (Mathematics)
- Format
- Document (PDF)
- Title
- Stability analysis of feature selection approaches with low quality data.
- Creator
- Altidor, Wilker., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
One of the greatest challenges to data mining is erroneous or noisy data. Several studies have noted the weak performance of classification models trained from low quality data. This dissertation shows that low quality data can also impact the effectiveness of feature selection, and considers the effect of class noise on various feature ranking techniques. It presents a novel approach to feature ranking based on ensemble learning and assesses these ensemble feature selection techniques in...
Show moreOne of the greatest challenges to data mining is erroneous or noisy data. Several studies have noted the weak performance of classification models trained from low quality data. This dissertation shows that low quality data can also impact the effectiveness of feature selection, and considers the effect of class noise on various feature ranking techniques. It presents a novel approach to feature ranking based on ensemble learning and assesses these ensemble feature selection techniques in terms of their robustness to class noise. It presents a noise-based stability analysis that measures the degree of agreement between a feature ranking techniques output on a clean dataset versus its outputs on the same dataset but corrupted with different combinations of noise level and noise distribution. It then considers classification performances from models built with a subset of the original features obtained after applying feature ranking techniques on noisy data. It proposes the focused ensemble feature ranking as a noise-tolerant approach to feature selection and compares focused ensembles with general ensembles in terms of the ability of the selected features to withstand the impact of class noise when used to build classification models. Finally, it explores three approaches for addressing the combined problem of high dimensionality and class imbalance. Collectively, this research shows the importance of considering class noise when performing feature selection.
Show less - Date Issued
- 2011
- PURL
- http://purl.flvc.org/FAU/3174501
- Subject Headings
- Data mining, Technological innovations, Combinatorial group theory, Filters (Mathematics), Ranking and selection (Statistics)
- Format
- Document (PDF)
- Title
- Evolutionary Methods for Mining Data with Class Imbalance.
- Creator
- Drown, Dennis J., Khoshgoftaar, Taghi M., Florida Atlantic University
- Abstract/Description
-
Class imbalance tends to cause inferior performance in data mining learners, particularly with regard to predicting the minority class, which generally imposes a higher misclassification cost. This work explores the benefits of using genetic algorithms (GA) to develop classification models which are better able to deal with the problems encountered when mining datasets which suffer from class imbalance. Using GA we evolve configuration parameters suited for skewed datasets for three different...
Show moreClass imbalance tends to cause inferior performance in data mining learners, particularly with regard to predicting the minority class, which generally imposes a higher misclassification cost. This work explores the benefits of using genetic algorithms (GA) to develop classification models which are better able to deal with the problems encountered when mining datasets which suffer from class imbalance. Using GA we evolve configuration parameters suited for skewed datasets for three different learners: artificial neural networks, 0 4.5 decision trees, and RIPPER. We also propose a novel technique called evolutionary sampling which works to remove noisy and unnecessary duplicate instances so that the sampled training data will produce a superior classifier for the imbalanced dataset. Our GA fitness function uses metrics appropriate for dealing with class imbalance, in particular the area under the ROC curve. We perform extensive empirical testing on these techniques and compare the results with seven exist ing sampling methods.
Show less - Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00012515
- Subject Headings
- Combinatorial group theory, Data mining, Machine learning, Data structure (Computer science)
- Format
- Document (PDF)