Current Search: Combinatorial group theory (x)
View All Items
 Title
 Techniques in Lattice Basis Reduction.
 Creator
 Khadka, Bal K., Magliveras, Spyros S., Florida Atlantic University, Charles E. Schmidt College of Science, Department of Mathematical Sciences
 Abstract/Description

The mathematical theory of nding a basis of shortest possible vectors in a given lattice L is known as reduction theory and goes back to the work of Lagrange, Gauss, Hermite, Korkin, Zolotarev, and Minkowski. Modern reduction theory is voluminous and includes the work of A. Lenstra, H. Lenstra and L. Lovasz who created the well known LLL algorithm, and many other researchers such as L. Babai and C. P. Schnorr who created signi cant new variants of basis reduction algorithms. The shortest...
Show moreThe mathematical theory of nding a basis of shortest possible vectors in a given lattice L is known as reduction theory and goes back to the work of Lagrange, Gauss, Hermite, Korkin, Zolotarev, and Minkowski. Modern reduction theory is voluminous and includes the work of A. Lenstra, H. Lenstra and L. Lovasz who created the well known LLL algorithm, and many other researchers such as L. Babai and C. P. Schnorr who created signi cant new variants of basis reduction algorithms. The shortest vector (SVP) and closest vector (CVP) problems, presently considered intractable, are algorithmic tasks that lie at the core of many number theoretic problems, integer programming, nding irreducible factors of polynomials, minimal polynomials of algebraic numbers, and simultaneous diophantine approximation. Lattice basis reduction also has deep and extensive connections with modern cryptography, and cryptanalysis particularly in the postquantum era. In this dissertation we study and compare current systems LLL and BKZ, and point out their strengths and drawbacks. In addition, we propose and investigate the e cacy of new optimization techniques, to be used along with LLL, such as hill climbing, random walks in groups, our lattice di usionsub lattice fusion, and multistage hybrid LDSFHC technique. The rst two methods rely on the sensitivity of LLL to permutations of the input basis B, and optimization ideas over the symmetric group Sm viewed as a metric space. The third technique relies on partitioning the lattice into sublattices, performing basis reduction in the partition sublattice blocks, fusing the sublattices, and repeating. We also point out places where parallel computation can reduce runtimes achieving almost linear speedup. The multistage hybrid technique relies on the lattice di usion and sublattice fusion and hill climbing algorithms. Unlike traditional methods, our approach brings in better results in terms of basis reduction towards nding shortest vectors and minimal weight bases. Using these techniques we have published the competitive lattice vectors of ideal lattice challenge on the lattice hall of fame. Toward the end of the dissertation we also discuss applications to the multidimensional knapsack problem that resulted in the discovery of new large sets of geometric designs still considered very rare. The research introduces innovative techniques in lattice basis reduction theory and provides some space for future researchers to contemplate lattices from a new viewpoint.
Show less  Date Issued
 2016
 PURL
 http://purl.flvc.org/fau/fd/FA00004678
 Subject Headings
 Cryptography., Combinatorial analysis., Group theory.
 Format
 Document (PDF)
 Title
 Computing automorphism groups of projective planes.
 Creator
 Adamski, Jesse Victor, Magliveras, Spyros S., Charles E. Schmidt College of Science, Department of Mathematical Sciences
 Abstract/Description

The main objective of this thesis was to find the full automorphism groups of finite Desarguesian planes. A set of homologies were used to generate the automorphism group when the order of the plane was prime. When the order was a prime power Pa,a ≠ 1 the Frobenius automorphism was added to the set of homologies, and then the full automorphism group was generated. The Frobenius automorphism was found by using the planar ternary ring derived from a coordinatization of the plane.
 Date Issued
 2013
 PURL
 http://purl.flvc.org/fau/fd/FA0004000
 Subject Headings
 Combinatorial group theory, Finite geometrics, Geometry, Projective
 Format
 Document (PDF)
 Title
 The existence of minimal logarithmic signatures for classical groups.
 Creator
 Singhi, Nikhil., Charles E. Schmidt College of Science, Department of Mathematical Sciences
 Abstract/Description

A logarithmic signature (LS) for a nite group G is an ordered tuple = [A1;A2; : : : ;An] of subsets Ai of G, such that every element g 2 G can be expressed uniquely as a product g = a1a2 : : : ; an, where ai 2 Ai. Logarithmic signatures were dened by Magliveras in the late 1970's for arbitrary nite groups in the context of cryptography. They were also studied for abelian groups by Hajos in the 1930's. The length of an LS is defined to be `() = Pn i=1 jAij. It can be easily seen that for a...
Show moreA logarithmic signature (LS) for a nite group G is an ordered tuple = [A1;A2; : : : ;An] of subsets Ai of G, such that every element g 2 G can be expressed uniquely as a product g = a1a2 : : : ; an, where ai 2 Ai. Logarithmic signatures were dened by Magliveras in the late 1970's for arbitrary nite groups in the context of cryptography. They were also studied for abelian groups by Hajos in the 1930's. The length of an LS is defined to be `() = Pn i=1 jAij. It can be easily seen that for a group G of order Qk j=1 pj mj , the length of any LS for G satises `() Pk j=1mjpj . An LS for which this lower bound is achieved is called a minimal logarithmic signature (MLS). The MLS conjecture states that every finite simple group has an MLS. If the conjecture is true then every finite group will have an MLS. The conjecture was shown to be true by a number of researchers for a few classes of finite simple groups. However, the problem is still wide open. This dissertation addresses the MLS conjecture for the classical simple groups. In particular, it is shown that MLS's exist for the symplectic groups Sp2n(q), the orthogonal groups O 2n(q0) and the corresponding simple groups PSp2n(q) and 2n(q0) for all n 2 N, prime power q and even prime power q0. The existence of an MLS is also shown for all unitary groups GUn(q) for all odd n and q = 2s under the assumption that an MLS exists for GUn 1(q). The methods used are very general and algorithmic in nature and may be useful for studying all nite simple groups of Lie type and possibly also the sporadic groups. The blocks of logarithmic signatures constructed in this dissertation have cyclic structure and provide a sort of cyclic decomposition for these classical groups.
Show less  Date Issued
 2011
 PURL
 http://purl.flvc.org/FAU/3172943
 Subject Headings
 Finite groups, Abelian groups, Number theory, Combinatorial group theory, Mathematical recreations, Linear algebraic groups, Lie groups
 Format
 Document (PDF)
 Title
 On the minimal logarithmic signature conjecture.
 Creator
 Singhi, Nidhi., Charles E. Schmidt College of Science, Department of Mathematical Sciences
 Abstract/Description

The minimal logarithmic signature conjecture states that in any finite simple group there are subsets Ai, 1 i s such that the size jAij of each Ai is a prime or 4 and each element of the group has a unique expression as a product Qs i=1 ai of elements ai 2 Ai. Logarithmic signatures have been used in the construction of several cryptographic primitives since the late 1970's [3, 15, 17, 19, 16]. The conjecture is shown to be true for various families of simple groups including cyclic groups,...
Show moreThe minimal logarithmic signature conjecture states that in any finite simple group there are subsets Ai, 1 i s such that the size jAij of each Ai is a prime or 4 and each element of the group has a unique expression as a product Qs i=1 ai of elements ai 2 Ai. Logarithmic signatures have been used in the construction of several cryptographic primitives since the late 1970's [3, 15, 17, 19, 16]. The conjecture is shown to be true for various families of simple groups including cyclic groups, An, PSLn(q) when gcd(n; q 1) is 1, 4 or a prime and several sporadic groups [10, 9, 12, 14, 18]. This dissertation is devoted to proving that the conjecture is true for a large class of simple groups of Lie type called classical groups. The methods developed use the structure of these groups as isometry groups of bilinear or quadratic forms. A large part of the construction is also based on the Bruhat and Levi decompositions of parabolic subgroups of these groups. In this dissertation the conjecture is shown to be true for the following families of simple groups: the projective special linear groups PSLn(q), the projective symplectic groups PSp2n(q) for all n and q a prime power, and the projective orthogonal groups of positive type + 2n(q) for all n and q an even prime power. During the process, the existence of minimal logarithmic signatures (MLS's) is also proven for the linear groups: GLn(q), PGLn(q), SLn(q), the symplectic groups: Sp2n(q) for all n and q a prime power, and for the orthogonal groups of plus type O+ 2n(q) for all n and q an even prime power. The constructions in most of these cases provide cyclic MLS's. Using the relationship between nite groups of Lie type and groups with a split BNpair, it is also shown that every nite group of Lie type can be expressed as a disjoint union of sets, each of which has an MLS.
Show less  Date Issued
 2011
 PURL
 http://purl.flvc.org/FAU/3172946
 Subject Headings
 Finite groups, Abelian groups, Number theory, Combinatorial group theory, Mathematical recreations, Linear algebraic groups, Lie groups
 Format
 Document (PDF)
 Title
 Alleviating class imbalance using data sampling: Examining the effects on classification algorithms.
 Creator
 Napolitano, Amri E., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
 Abstract/Description

Imbalanced class distributions typically cause poor classifier performance on the minority class, which also tends to be the class with the highest cost of misclassification. Data sampling is a common solution to this problem, and numerous sampling techniques have been proposed to address it. Prior research examining the performance of these techniques has been narrow and limited. This work uses thorough empirical experimentation to compare the performance of seven existing data sampling...
Show moreImbalanced class distributions typically cause poor classifier performance on the minority class, which also tends to be the class with the highest cost of misclassification. Data sampling is a common solution to this problem, and numerous sampling techniques have been proposed to address it. Prior research examining the performance of these techniques has been narrow and limited. This work uses thorough empirical experimentation to compare the performance of seven existing data sampling techniques using five different classifiers and four different datasets. The work addresses which sampling techniques produce the best performance in the presence of class unbalance, which classifiers are most robust to the problem, as well as which sampling techniques perform better or worse with each classifier. Extensive statistical analysis of these results is provided, in addition to an examination of the qualitative effects of the sampling techniques on the types of predictions made by the C4.5 classifier.
Show less  Date Issued
 2006
 PURL
 http://purl.flvc.org/fcla/dt/13413
 Subject Headings
 Combinatorial group theory, Data mining, Decision trees, Machine learning
 Format
 Document (PDF)
 Title
 New Geometric Large Sets.
 Creator
 Hurley, Michael Robert, Magliveras, Spyros S., Florida Atlantic University, Charles E. Schmidt College of Science, Department of Mathematical Sciences
 Abstract/Description

Let V be an ndimensional vector space over the field of q elements. By a geometric t[q^n, k, λ] design we mean a collection D of kdimensional subspaces of V, called blocks, such that every tdimensional subspace T of V appears in exactly λ blocks in D. A large set, LS [N] [t, k, q^n], of geometric designs is a collection on N disjoint t[q^n, k, λ] designs that partitions [V K], the collection of kdimensional subspaces of V. In this work we construct nonisomorphic large sets using...
Show moreLet V be an ndimensional vector space over the field of q elements. By a geometric t[q^n, k, λ] design we mean a collection D of kdimensional subspaces of V, called blocks, such that every tdimensional subspace T of V appears in exactly λ blocks in D. A large set, LS [N] [t, k, q^n], of geometric designs is a collection on N disjoint t[q^n, k, λ] designs that partitions [V K], the collection of kdimensional subspaces of V. In this work we construct nonisomorphic large sets using methods based on incidence structures known as the KramerMesner matrices. These structures are induced by particular group actions on the collection of subspaces of the vector space V. Subsequently, we discuss and use computational techniques for solving certain linear problems of the form AX = B, where A is a large integral matrix and X is a {0,1} solution. These techniques involve (i) lattice basisreduction, including variants of the LLL algorithm, and (ii) linear programming. Inspiration came from the 2013 work of Braun, Kohnert, Ostergard, and Wassermann, [17], who produced the first nontrivial large set of geometric designs with t ≥ 2. Bal Khadka and Michael Epstein provided the knowhow for using the LLL and linear programming algorithms that we implemented to construct the large sets.
Show less  Date Issued
 2016
 PURL
 http://purl.flvc.org/fau/fd/FA00004732, http://purl.flvc.org/fau/fd/FA00004732
 Subject Headings
 Group theory., Finite groups., Factorial experiment designs., Irregularities of distribution (Number theory), Combinatorial analysis.
 Format
 Document (PDF)
 Title
 Avoiding abelian squares in infinite partial words.
 Creator
 Severa, William., Harriet L. Wilkes Honors College
 Abstract/Description

Famous mathematician Paul Erdèos conjectured the existence of infinite sequences of symbols where no two adjacent subsequences are permutations of one another. It can easily be checked that no such sequence can be constructed using only three symbols, but as few as four symbols are sufficient. Here, we expand this concept to include sequences that may contain 'do not know'' characters, called holes. These holes make the undesired subsequences more common. We explore both finite and infinite...
Show moreFamous mathematician Paul Erdèos conjectured the existence of infinite sequences of symbols where no two adjacent subsequences are permutations of one another. It can easily be checked that no such sequence can be constructed using only three symbols, but as few as four symbols are sufficient. Here, we expand this concept to include sequences that may contain 'do not know'' characters, called holes. These holes make the undesired subsequences more common. We explore both finite and infinite sequences. For infinite sequences, we use iterating morphisms to construct the nonrepetitive sequences with either a finite number of holes or infinitely many holes. We also discuss the problem of using the minimum number of different symbols.
Show less  Date Issued
 2010
 PURL
 http://purl.flvc.org/FAU/3335460
 Subject Headings
 Abelian groups, Mathematics, Study and teaching (Higher), Combinatorial analysis, Combinatorial set theory, Probabilities
 Format
 Document (PDF)
 Title
 New Results in Group Theoretic Cryptology.
 Creator
 Sramka, Michal, Florida Atlantic University, Magliveras, Spyros S., Charles E. Schmidt College of Science, Department of Mathematical Sciences
 Abstract/Description

With the publication of Shor's quantum algorithm for solving discrete logarithms in finite cyclic groups, a need for new cryptographic primitives arose; namely, for more secure primitives that would prevail in the postquantum era. The aim of this dissertation is to exploit some hard problems arising from group theory for use in cryptography. Over the years, there have been many such proposals. We first look at two recently proposed schemes based on some form of a generalization of the...
Show moreWith the publication of Shor's quantum algorithm for solving discrete logarithms in finite cyclic groups, a need for new cryptographic primitives arose; namely, for more secure primitives that would prevail in the postquantum era. The aim of this dissertation is to exploit some hard problems arising from group theory for use in cryptography. Over the years, there have been many such proposals. We first look at two recently proposed schemes based on some form of a generalization of the discrete logari thm problem (DLP), identify their weaknesses, and cryptanalyze them. By applying the exper tise gained from the above cryptanalyses, we define our own generalization of the DLP to arbitrary finite groups. We show that such a definition leads to the design of signature schemes and pseudorandom number generators with provable security under a security assumption based on a group theoretic problem. In particular, our security assumption is based on the hardness of factorizing elements of the projective special linear group over a finite field in some representations. We construct a oneway function based on this group theoretic assumption and provide a security proof.
Show less  Date Issued
 2006
 PURL
 http://purl.flvc.org/fau/fd/FA00000878
 Subject Headings
 Group theory, Mathematical statistics, Cryptography, Combinatorial designs and configurations, Data encryption (Computer science), Coding theory
 Format
 Document (PDF)
 Title
 Low rank transitive representations, primitive extensions, and the collision problem in PSL (2, q).
 Creator
 Thapa Magar, Krishna B., Magliveras, Spyros S., Florida Atlantic University, Charles E. Schmidt College of Science, Department of Mathematical Sciences
 Abstract/Description

Every transitive permutation representation of a finite group is the representation of the group in its action on the cosets of a particular subgroup of the group. The group has a certain rank for each of these representations. We first find almost all rank3 and rank4 transitive representations of the projective special linear group P SL(2, q) where q = pm and p is an odd prime. We also determine the rank of P SL (2, p) in terms of p on the cosets of particular given subgroups. We then...
Show moreEvery transitive permutation representation of a finite group is the representation of the group in its action on the cosets of a particular subgroup of the group. The group has a certain rank for each of these representations. We first find almost all rank3 and rank4 transitive representations of the projective special linear group P SL(2, q) where q = pm and p is an odd prime. We also determine the rank of P SL (2, p) in terms of p on the cosets of particular given subgroups. We then investigate the construction of rank3 transitive and primitive extensions of a simple group, such that the extension group formed is also simple. In the latter context we present a new, group theoretic construction of the famous HoffmanSingleton graph as a rank3 graph.
Show less  Date Issued
 2015
 PURL
 http://purl.flvc.org/fau/fd/FA00004471, http://purl.flvc.org/fau/fd/FA00004471
 Subject Headings
 Combinatorial designs and configurations, Cryptography, Data encryption (Computer science), Finite geometries, Finite groups, Group theory, Permutation groups
 Format
 Document (PDF)
 Title
 Collabortive filtering using machine learning and statistical techniques.
 Creator
 Su, Xiaoyuan., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
 Abstract/Description

Collaborative filtering (CF), a very successful recommender system, is one of the applications of data mining for incomplete data. The main objective of CF is to make accurate recommendations from highly sparse user rating data. My contributions to this research topic include proposing the frameworks of imputationboosted collaborative filtering (IBCF) and imputed neighborhood based collaborative filtering (INCF). We also proposed a modelbased CF technique, TANELR CF, and two hybrid CF...
Show moreCollaborative filtering (CF), a very successful recommender system, is one of the applications of data mining for incomplete data. The main objective of CF is to make accurate recommendations from highly sparse user rating data. My contributions to this research topic include proposing the frameworks of imputationboosted collaborative filtering (IBCF) and imputed neighborhood based collaborative filtering (INCF). We also proposed a modelbased CF technique, TANELR CF, and two hybrid CF algorithms, sequential mixture CF and joint mixture CF. Empirical results show that our proposed CF algorithms have very good predictive performances. In the investigation of applying imputation techniques in mining incomplete data, we proposed imputationhelped classifiers, and VCI predictors (voting on classifications from imputed learning sets), both of which resulted in significant improvement in classification performance for incomplete data over conventional machine learned classifiers, including kNN, neural network, one rule, decision table, SVM, logistic regression, decision tree (C4.5), random forest, and decision list (PART), and the well known Bagging predictors. The main imputation techniques involved in these algorithms include EM (expectation maximization) and BMI (Bayesian multiple imputation).
Show less  Date Issued
 2008
 PURL
 http://purl.flvc.org/FAU/186301
 Subject Headings
 Filters (Mathematics), Machine learning, Data mining, Technological innovations, Database management, Combinatorial group theory
 Format
 Document (PDF)
 Title
 Classification techniques for noisy and imbalanced data.
 Creator
 Napolitano, Amri E., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
 Abstract/Description

Machine learning techniques allow useful insight to be distilled from the increasingly massive repositories of data being stored. As these data mining techniques can only learn patterns actually present in the data, it is important that the desired knowledge be faithfully and discernibly contained therein. Two common data quality issues that often affect important real life classification applications are class noise and class imbalance. Class noise, where dependent attribute values are...
Show moreMachine learning techniques allow useful insight to be distilled from the increasingly massive repositories of data being stored. As these data mining techniques can only learn patterns actually present in the data, it is important that the desired knowledge be faithfully and discernibly contained therein. Two common data quality issues that often affect important real life classification applications are class noise and class imbalance. Class noise, where dependent attribute values are recorded erroneously, misleads a classifier and reduces predictive performance. Class imbalance occurs when one class represents only a small portion of the examples in a dataset, and, in such cases, classifiers often display poor accuracy on the minority class. The reduction in classification performance becomes even worse when the two issues occur simultaneously. To address the magnified difficulty caused by this interaction, this dissertation performs thorough empirical investigations of several techniques for dealing with class noise and imbalanced data. Comprehensive experiments are performed to assess the effects of the classification techniques on classifier performance, as well as how the level of class imbalance, level of class noise, and distribution of class noise among the classes affects results. An empirical analysis of classifier based noise detection efficiency appears first. Subsequently, an intelligent data sampling technique, based on noise detection, is proposed and tested. Several hybrid classifier ensemble techniques for addressing class noise and imbalance are introduced. Finally, a detailed empirical investigation of classification filtering is performed to determine best practices.
Show less  Date Issued
 2009
 PURL
 http://purl.flvc.org/FAU/369201
 Subject Headings
 Combinatorial group theory, Data mining, Technological innovations, Decision trees, Machine learning, Filters (Mathematics)
 Format
 Document (PDF)
 Title
 Feature selection techniques and applications in bioinformatics.
 Creator
 Dittman, David, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
 Abstract/Description

Possibly the largest problem when working in bioinformatics is the large amount of data to sift through to find useful information. This thesis shows that the use of feature selection (a method of removing irrelevant and redundant information from the dataset) is a useful and even necessary technique to use in these large datasets. This thesis also presents a new method in comparing classes to each other through the use of their features. It also provides a thorough analysis of the use of...
Show morePossibly the largest problem when working in bioinformatics is the large amount of data to sift through to find useful information. This thesis shows that the use of feature selection (a method of removing irrelevant and redundant information from the dataset) is a useful and even necessary technique to use in these large datasets. This thesis also presents a new method in comparing classes to each other through the use of their features. It also provides a thorough analysis of the use of various feature selection techniques and classifier in different scenarios from bioinformatics. Overall, this thesis shows the importance of the use of feature selection in bioinformatics.
Show less  Date Issued
 2011
 PURL
 http://purl.flvc.org/FAU/3175016
 Subject Headings
 Bioinformatifcs, Data mining, Technological innovations, Computational biology, Combinatorial group theory, Filters (Mathematics), Ranking and selection (Statistics)
 Format
 Document (PDF)
 Title
 The discrete logarithm problem in nonabelian groups.
 Creator
 Iliâc, Ivana., Charles E. Schmidt College of Science, Department of Mathematical Sciences
 Abstract/Description

This dissertation contains results of the candidate's research on the generalized discrete logarithm problem (GDLP) and its applications to cryptology, in nonabelian groups. The projective special linear groups PSL(2; p), where p is a prime, represented by matrices over the eld of order p, are investigated as potential candidates for implementation of the GDLP. Our results show that the GDLP with respect to specic pairs of PSL(2; p) generators is weak. In such cases the groups PSL(2; p) are...
Show moreThis dissertation contains results of the candidate's research on the generalized discrete logarithm problem (GDLP) and its applications to cryptology, in nonabelian groups. The projective special linear groups PSL(2; p), where p is a prime, represented by matrices over the eld of order p, are investigated as potential candidates for implementation of the GDLP. Our results show that the GDLP with respect to specic pairs of PSL(2; p) generators is weak. In such cases the groups PSL(2; p) are not good candidates for cryptographic applications which rely on the hardness of the GDLP. Results are presented on generalizing existing cryptographic primitives and protocols based on the hardness of the GDLP in nonabelian groups. A special instance of a cryptographic primitive dened over the groups SL(2; 2n), the TillichZemor hash function, has been cryptanalyzed. In particular, an algorithm for constructing collisions of short length for any input parameter is presented. A series of mathematical results are developed to support the algorithm and to prove existence of short collisions.
Show less  Date Issued
 2010
 PURL
 http://purl.flvc.org/FAU/3356783
 Subject Headings
 Data encryption (Computer science), Computer security, Cryptography, Combinatorial group theory, Data processing, Mapping (Mathematics)
 Format
 Document (PDF)
 Title
 Stability analysis of feature selection approaches with low quality data.
 Creator
 Altidor, Wilker., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
 Abstract/Description

One of the greatest challenges to data mining is erroneous or noisy data. Several studies have noted the weak performance of classification models trained from low quality data. This dissertation shows that low quality data can also impact the effectiveness of feature selection, and considers the effect of class noise on various feature ranking techniques. It presents a novel approach to feature ranking based on ensemble learning and assesses these ensemble feature selection techniques in...
Show moreOne of the greatest challenges to data mining is erroneous or noisy data. Several studies have noted the weak performance of classification models trained from low quality data. This dissertation shows that low quality data can also impact the effectiveness of feature selection, and considers the effect of class noise on various feature ranking techniques. It presents a novel approach to feature ranking based on ensemble learning and assesses these ensemble feature selection techniques in terms of their robustness to class noise. It presents a noisebased stability analysis that measures the degree of agreement between a feature ranking techniques output on a clean dataset versus its outputs on the same dataset but corrupted with different combinations of noise level and noise distribution. It then considers classification performances from models built with a subset of the original features obtained after applying feature ranking techniques on noisy data. It proposes the focused ensemble feature ranking as a noisetolerant approach to feature selection and compares focused ensembles with general ensembles in terms of the ability of the selected features to withstand the impact of class noise when used to build classification models. Finally, it explores three approaches for addressing the combined problem of high dimensionality and class imbalance. Collectively, this research shows the importance of considering class noise when performing feature selection.
Show less  Date Issued
 2011
 PURL
 http://purl.flvc.org/FAU/3174501
 Subject Headings
 Data mining, Technological innovations, Combinatorial group theory, Filters (Mathematics), Ranking and selection (Statistics)
 Format
 Document (PDF)
 Title
 Evolutionary Methods for Mining Data with Class Imbalance.
 Creator
 Drown, Dennis J., Khoshgoftaar, Taghi M., Florida Atlantic University
 Abstract/Description

Class imbalance tends to cause inferior performance in data mining learners, particularly with regard to predicting the minority class, which generally imposes a higher misclassification cost. This work explores the benefits of using genetic algorithms (GA) to develop classification models which are better able to deal with the problems encountered when mining datasets which suffer from class imbalance. Using GA we evolve configuration parameters suited for skewed datasets for three different...
Show moreClass imbalance tends to cause inferior performance in data mining learners, particularly with regard to predicting the minority class, which generally imposes a higher misclassification cost. This work explores the benefits of using genetic algorithms (GA) to develop classification models which are better able to deal with the problems encountered when mining datasets which suffer from class imbalance. Using GA we evolve configuration parameters suited for skewed datasets for three different learners: artificial neural networks, 0 4.5 decision trees, and RIPPER. We also propose a novel technique called evolutionary sampling which works to remove noisy and unnecessary duplicate instances so that the sampled training data will produce a superior classifier for the imbalanced dataset. Our GA fitness function uses metrics appropriate for dealing with class imbalance, in particular the area under the ROC curve. We perform extensive empirical testing on these techniques and compare the results with seven exist ing sampling methods.
Show less  Date Issued
 2007
 PURL
 http://purl.flvc.org/fau/fd/FA00012515
 Subject Headings
 Combinatorial group theory, Data mining, Machine learning, Data structure (Computer science)
 Format
 Document (PDF)