MACHINE LEARNING ALGORITHMS FOR THE DETECTION AND ANALYSIS OF WEB ATTACKS
- Date Issued:
- 2021
- Abstract/Description:
- The Internet has provided humanity with many great benefits, but it has also introduced new risks and dangers. E-commerce and other web portals have become large industries with big data. Criminals and other bad actors constantly seek to exploit these web properties through web attacks. Being able to properly detect these web attacks is a crucial component of the overall cybersecurity landscape. Machine learning is one tool that can assist in detecting web attacks. However, using machine learning properly to detect web attacks comes with challenges. Classification algorithms can have difficulty with severe levels of class imbalance. Class imbalance occurs when one class label disproportionately outnumbers another class label. For example, in cybersecurity, it is common for the negative (normal) label to severely outnumber the positive (attack) label. Another difficulty encountered in machine learning is that models can be complex, making it difficult for even subject matter experts to truly understand a model’s detection process. Moreover, it is important for practitioners to determine which input features to include or exclude in their models for optimal detection performance. This dissertation studies machine learning algorithms for detecting web attacks with big data. Severe class imbalance is a common problem in cybersecurity, and mainstream machine learning research does not sufficiently consider it in the context of web attacks. Our research first investigates the problems associated with severe class imbalance and rarity. Rarity is an extreme form of class imbalance in which the positive class has so few instances that classifiers struggle to discriminate between the classes. To reduce imbalance, we demonstrate that random undersampling can effectively mitigate the class imbalance and rarity problems associated with web attacks.
Furthermore, our research introduces a novel feature popularity technique, which produces easier-to-understand models by including only the few most popular features. Feature popularity granted us new insights into the web attack detection process, even though we had already studied it intensely. Even so, we proceed cautiously in selecting the best input features, as we determined that the “most important” Destination Port feature might be contaminated by lopsided traffic distributions.
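
The random undersampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name `random_undersample`, the `ratio` parameter, the 0/1 label encoding, and the toy data are all assumptions made for illustration. The idea shown is only the general technique: keep every positive (attack) instance and randomly discard negative (normal) instances until a target negative-to-positive ratio is reached.

```python
import random


def random_undersample(rows, labels, ratio=1.0, seed=0):
    """Randomly discard majority-class rows (label 0, assumed "normal").

    Keeps every minority-class row (label 1, assumed "attack") and at most
    ratio * n_positive randomly chosen majority-class rows.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    # Never ask for more negatives than actually exist.
    n_keep = min(len(neg), round(ratio * len(pos)))
    # Sample without replacement, then restore the original row order.
    kept = sorted(pos + rng.sample(neg, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (made up): 20 attack rows among 10,000 total.
labels = [1] * 20 + [0] * 9_980
rows = [[float(i)] for i in range(len(labels))]

rows_bal, labels_bal = random_undersample(rows, labels, ratio=1.0)
print(sum(labels_bal), len(labels_bal))
```

With `ratio=1.0` the two classes end up equally sized; a larger `ratio` keeps more of the negative class. Undersampling would normally be applied to training data only, so that evaluation still reflects the original class distribution.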
Title: | MACHINE LEARNING ALGORITHMS FOR THE DETECTION AND ANALYSIS OF WEB ATTACKS. | |
---|---|---|
Name(s): |
Zuech, Richard, author; Khoshgoftaar, Taghi M., Thesis advisor; Florida Atlantic University, Degree grantor; Department of Computer and Electrical Engineering and Computer Science; College of Engineering and Computer Science |
|
Type of Resource: | text | |
Genre: | Electronic Thesis Or Dissertation | |
Date Created: | 2021 | |
Date Issued: | 2021 | |
Publisher: | Florida Atlantic University | |
Place of Publication: | Boca Raton, Fla. | |
Physical Form: | application/pdf | |
Extent: | 164 p. | |
Language(s): | English | |
Identifier: | FA00013823 (IID) | |
Degree granted: | Dissertation (Ph.D.)--Florida Atlantic University, 2021. | |
Collection: | FAU Electronic Theses and Dissertations Collection | |
Note(s): | Includes bibliography. | |
Subject(s): |
Machine learning; Computer security; Algorithms; Cybersecurity |
|
Persistent Link to This Record: | http://purl.flvc.org/fau/fd/FA00013823 | |
Use and Reproduction: | Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. | |
Use and Reproduction: | http://rightsstatements.org/vocab/InC/1.0/ | |
Host Institution: | FAU | |
Is Part of Series: | Florida Atlantic University Digital Library Collections. |