AI and Machine Learning in Phishing Detection

By: Nicko Cajes; Northern Bukidnon State College, Philippines

Abstract

Phishing attacks, which deceive victims into disclosing sensitive information, remain one of the most critical cyber threats. Conventional security mechanisms struggle to keep up with and detect novel smishing techniques, which has led to growing reliance on AI and ML techniques to strengthen detection in cybersecurity. This article explores the effectiveness of AI and ML techniques, specifically ensemble methods such as bagging, boosting, and stacking, in improving the accuracy of prediction models. By combining multiple classifiers, an ensemble achieves high reliability and adaptability. Despite this effectiveness, challenges such as dataset issues must be addressed before model performance can be considered reliable.

Introduction

For many years, phishing emails have been a common method of attack for cybercriminals [1]. Phishing, a type of fraud in which victims are tricked into disclosing sensitive data, puts customers and web-based organizations at serious risk [2]. It is one of the most concerning problems of the digital era, and it keeps evolving. As online activity has increased, phishing has emerged as a novel form of identity theft. This kind of cyberattack exploits both technological and human weaknesses to compromise security and steal confidential data [3]. Examples of a legitimate email and a phishing email are shown in Figure 1. Security mechanisms have been put in place, but they are still not enough: attackers constantly change their methods, while traditional security systems rely on rule-based approaches and therefore cannot detect new types of phishing attacks. AI and ML models have recently drawn attention as a response to the growing phishing problem because they can lower modeling variability, improve prediction accuracy, and generalize well [4, 5]. This article highlights the effectiveness of ensemble methods in improving the accuracy of phishing detection models.

Figure 1: Example of Legitimate (https://www.thegoodgeeks.com) and Phishing Emails (https://global-uploads.webflow.com)

Ensemble Methods for Phishing Detection

The advent of ensemble approaches marked an important breakthrough in phishing detection [6]. Ensemble learning (EL) combines several machine learning methods and has shown notable progress in a number of research areas, such as fraud identification and phishing website detection [7]. Compared with ordinary machine learning models, ensemble learning that integrates many algorithms typically has better predictive potential [1]. Combining predictions or decisions from several separate models frequently yields better accuracy, robustness, and adaptation to new data than any one model alone [8]. One reason ensemble techniques detect phishing more accurately is that they draw on a variety of features, including the number of words, the presence of keywords or phrases, and email size [1]. The fundamental premise of ensemble learning is that combining different models compensates for the shortcomings of each, leading to better performance than any single model [8]. A comparison of single and ensemble models is shown in Table 1.

Characteristics	Single Learning	Ensemble Learning
Model Quantity	One Classifier	Multiple Classifiers
Prediction Method	Direct Prediction	Combined Prediction
Efficiency	Depends on Single Model	Depends on Multiple Models
Utilized Techniques	Decision Trees, SVM, Naïve Bayes, etc.	Bagging, Boosting, Stacking
Output	Based on Single Classifier	Integration of Multiple Classifiers

Table 1: Brief Comparison of Single and Ensemble Learning Approaches
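To make the "combined prediction" row of Table 1 concrete, the following minimal sketch extracts the three features mentioned above (word count, keyword presence, and email size) and lets three simple rules vote. The keyword list, the thresholds, and all function names are assumptions made for illustration; they do not come from the cited studies, and real systems would use trained classifiers rather than hand-written rules.

```python
from collections import Counter

# Hypothetical keyword list, chosen only for illustration.
SUSPICIOUS_KEYWORDS = {"verify", "urgent", "password", "suspended"}


def extract_features(email_text):
    """Compute three simple features for one email body."""
    words = [w.strip(".,!?:;").lower() for w in email_text.split()]
    return {
        "word_count": len(words),
        "keyword_hits": sum(1 for w in words if w in SUSPICIOUS_KEYWORDS),
        "size_bytes": len(email_text.encode("utf-8")),
    }


# Three deliberately crude rules (1 = phishing, 0 = legitimate).
# The thresholds are illustrative assumptions, not tuned values.
def keyword_rule(features):
    return 1 if features["keyword_hits"] >= 2 else 0


def short_message_rule(features):
    return 1 if features["word_count"] < 20 else 0


def small_size_rule(features):
    return 1 if features["size_bytes"] < 200 else 0


CLASSIFIERS = (keyword_rule, short_message_rule, small_size_rule)


def classify(email_text, classifiers=CLASSIFIERS):
    """Combine the individual votes by simple majority."""
    features = extract_features(email_text)
    votes = Counter(clf(features) for clf in classifiers)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify("URGENT: verify your password now or your account is suspended!"))
```

With an odd number of binary voters there are no ties. Each rule on its own is easy to fool (a short legitimate note trips two of them), which is why practical ensembles combine trained models instead.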

Popular Ensemble Techniques

Because ensemble learning is widely used and effective, many researchers have adopted it. The three most popular ensemble methods are bagging, boosting, and stacking. The workflow of the ensemble method is shown in Figure 2.

Bagging: Bagging uses different bootstrap samples drawn from the initial dataset to train several base classifiers, usually decision trees. The predictions of these base classifiers are then combined to determine the final classification, which improves generalization and reduces overfitting by introducing variation among the base models [9]. The key idea is to average or vote over the predictions in order to reduce variance and make predictions more stable. Several instances of the same model are trained on different subsets of the training data [10].
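The bagging procedure can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the base learner is a one-split decision stump rather than a full decision tree, and the toy dataset, feature meanings, and function names are invented for the example.

```python
import random
from collections import Counter

# Toy data (assumed): (suspicious-keyword count, link count) -> 1 = phishing, 0 = legitimate.
TRAIN = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((1, 1), 0),
         ((5, 3), 1), ((6, 4), 1), ((7, 3), 1), ((8, 5), 1)]


def train_stump(samples):
    """Fit a one-split stump: (feature index, threshold, label predicted above the threshold)."""
    best = None
    for j in range(len(samples[0][0])):
        for x, _ in samples:
            thr = x[j]
            for above in (0, 1):
                errors = sum(
                    1 for xi, yi in samples
                    if (above if xi[j] > thr else 1 - above) != yi
                )
                if best is None or errors < best[0]:
                    best = (errors, j, thr, above)
    return best[1:]


def stump_predict(stump, x):
    j, thr, above = stump
    return above if x[j] > thr else 1 - above


def bagging_fit(samples, n_models=15, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_stump(bootstrap))
    return models


def bagging_predict(models, x):
    """Final class = majority vote over the base models."""
    votes = Counter(stump_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    models = bagging_fit(TRAIN)
    print(bagging_predict(models, (10, 10)), bagging_predict(models, (0, 0)))
```

Because every stump sees a slightly different resample, their individual thresholds differ, and the vote smooths out that variation.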

Boosting: Boosting is a sequential method in which every new model aims to reduce the mistakes of the previous models [11]. It builds a series of weak models one after another, each focusing on the errors of its predecessor. By giving misclassified examples more weight, bias is decreased and overall model accuracy is increased [10]. In gradient-based boosting, boosting is viewed as an optimization problem that reduces the classifier's loss function by adding one weak learner at a time [12].
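The reweighting idea can be illustrated with an AdaBoost-style sketch. AdaBoost is used here only as one well-known instance of boosting; the article does not name a specific algorithm, and the one-dimensional toy data (a hypothetical "risk score" with labels +1 for phishing and -1 for legitimate) is an assumption chosen so that no single stump can classify it correctly.

```python
import math

# Toy 1-D data (assumed): no single threshold separates the two classes.
X = [(float(i),) for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def train_weighted_stump(X, y, w):
    """Pick the stump with the smallest weighted error: (error, feature, threshold, sign)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(
                    wi for xi, yi, wi in zip(X, y, w)
                    if (sign if xi[j] > thr else -sign) != yi
                )
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, x):
    _, j, thr, sign = stump
    return sign if x[j] > thr else -sign


def adaboost_fit(X, y, n_rounds=3):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_weighted_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)     # vote strength of this stump
        ensemble.append((alpha, stump))
        # Misclassified examples get heavier, correctly classified ones lighter.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    for rounds in (1, 3):
        model = adaboost_fit(X, y, n_rounds=rounds)
        correct = sum(adaboost_predict(model, xi) == yi for xi, yi in zip(X, y))
        print(rounds, "round(s):", correct, "of", len(X), "correct")
```

On this toy set a single stump misclassifies three examples, while three boosted stumps classify all ten training examples correctly, which shows how each round concentrates on what the previous ones got wrong.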

Stacking: Stacking is a particularly effective ensemble learning technique that combines the predictions of multiple separate models to produce an overall prediction that is more precise and reliable [8]. It integrates the models by sending their outputs to a “meta-learner,” which is frequently a classifier or a linear regressor. The meta-learner builds on the strengths of each model and makes up for its shortcomings, creating a more reliable model as a whole that combines the perspectives of multiple models in order to identify a greater variety of patterns in the data [4, 10].
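A minimal stacking sketch follows, under several assumptions: each base model is a simple threshold on one feature, the meta-learner is a small logistic regression trained by gradient descent, and the meta-learner is fitted on a separate holdout split (a simplification sometimes called blending; stacking more typically uses out-of-fold predictions). The data and all names are invented for illustration.

```python
import math

# Toy data (assumed): feature 0 is a reliable signal, feature 1 is sometimes misleading.
BASE_X = [(0, 1), (1, 0), (2, 1), (1, 2), (8, 9), (9, 8), (7, 9), (8, 7)]
BASE_Y = [0, 0, 0, 0, 1, 1, 1, 1]
META_X = [(1, 1), (2, 8), (8, 8), (9, 2), (0, 2), (7, 9)]
META_Y = [0, 0, 1, 1, 0, 1]


def fit_threshold_model(values, labels):
    """Base learner: threshold halfway between the two class means of one feature."""
    pos = [v for v, t in zip(values, labels) if t == 1]
    neg = [v for v, t in zip(values, labels) if t == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda v: 1.0 if v > threshold else 0.0


def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression trained with batch gradient descent."""
    w = [0.0] * len(Z[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for z, t in zip(Z, y):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))
            for k, zk in enumerate(z):
                grad_w[k] += (p - t) * zk
            grad_b += p - t
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(Z)
    return w, b


def stacking_fit(X_base, y_base, X_meta, y_meta):
    # Level 0: one base model per feature, trained on the base split.
    base_models = [
        fit_threshold_model([x[j] for x in X_base], y_base)
        for j in range(len(X_base[0]))
    ]
    # Level 1: the meta-learner sees only the base models' outputs on the holdout split.
    Z = [[m(x[j]) for j, m in enumerate(base_models)] for x in X_meta]
    w, b = fit_logistic(Z, y_meta)
    return base_models, w, b


def stacking_predict(model, x):
    base_models, w, b = model
    z = [m(x[j]) for j, m in enumerate(base_models)]
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else 0


if __name__ == "__main__":
    model = stacking_fit(BASE_X, BASE_Y, META_X, META_Y)
    print("meta-learner weights:", model[1])
    print(stacking_predict(model, (9, 1)), stacking_predict(model, (1, 9)))
```

In this toy setup the meta-learner ends up trusting the base model built on the reliable feature far more than the other one, which is the sense in which stacking compensates for the weaknesses of individual models.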

Figure 2: Ensemble Learning Workflow

Challenges and Considerations

Ensemble learning's ability to distinguish phishing emails from legitimate ones is a major advance for the cybersecurity field. On the development side, however, certain challenges strongly affect the performance of an ensemble model. Evaluation criteria are often insufficient, and there are dataset-related issues such as a lack of data from study sources; these shortcomings make existing work less useful in real-life situations [1]. To address these challenges, [8] proposed combining a Decision Tree-Recursive Feature Elimination with Cross-Validation (DT-RFECV) wrapper method with SMOTE oversampling to handle class imbalance and feature selection [13]. DT-RFECV determines the significance of features and uses cross-validation to avoid overfitting. Because the dataset has a huge impact on the model, particularly in the training and testing phases, this problem must be handled carefully so that the model's efficiency and accuracy remain reliable.
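The oversampling half of that approach can be sketched as follows. This is a simplified, SMOTE-style interpolation written from the general idea of the technique (new minority samples are placed on the line between a minority sample and one of its nearest minority neighbours); it is not the implementation used in the cited study, the toy data and names are assumptions, and the DT-RFECV feature-selection step is not shown.

```python
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for idx, p in enumerate(minority) if idx != i]
        # Sort the remaining minority samples by squared Euclidean distance to base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()                  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy imbalanced data (assumed): 8 legitimate feature vectors, 3 phishing ones.
LEGITIMATE = [(0.1 * i, 1.0) for i in range(8)]
PHISHING = [(4.0, 6.0), (5.0, 7.0), (6.0, 6.5)]

new_points = smote_oversample(PHISHING, n_new=len(LEGITIMATE) - len(PHISHING))
balanced_phishing = PHISHING + new_points

if __name__ == "__main__":
    print(len(LEGITIMATE), "legitimate vs", len(balanced_phishing), "phishing samples")
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used for evaluation.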

Conclusion

Combining multiple classification models to distinguish legitimate from fraudulent email has enhanced the reliability of security systems against phishing attacks. These techniques are called ensemble methods, and the most popular are bagging, boosting, and stacking. They compensate for the weaknesses of the individual models and deliver excellent results. Dataset imbalance is one of the challenges that strongly affects the performance of an ensemble model. By handling such issues carefully, ensemble models can be used to their full potential in combating the ever-evolving cyber threat of phishing.

References

[1] Salah, Z., Owida, H. A., Elsoud, E. A., Alhenawi, E., Abuowaida, S., & Alshdaifat, N. (2024). An effective ensemble approach for preventing and detecting phishing attacks in textual form. Future Internet, 16(11), 414. https://doi.org/10.3390/fi16110414
[2] Innab, N., Osman, A. A. F., Ataelfadiel, M. A. M., Abu-Zanona, M., Elzaghmouri, B. M., Zawaideh, F. H., & Alawneh, M. F. (2024). Phishing Attacks Detection Using Ensemble Machine Learning Algorithms. Computers, Materials & Continua, 80(1).
[3] Jawad, S. K., & Alnajjar, S. H. (2024, May). Enhancing Phishing Detection Through Ensemble Learning and Cross-Validation. In 2024 International Conference on Smart Applications, Communications and Networking (SmartNets) (pp. 1-7). IEEE.
[4] Omolara, A. E., & Alawida, M. (2025). DaE2: Unmasking malicious URLs by leveraging diverse and efficient ensemble machine learning for online security. Computers & Security, 148, 104170.
[5] Rahaman, M., Pappachan, P., Orozco, S. M., Bansal, S., & Arya, V. (2024). AI Safety and Security. In Challenges in Large Language Model Development and AI Ethics (pp. 354-383). IGI Global.
[6] Ahmadi, C., & Chen, J. L. (2024, June). Enhancing Phishing Detection: A Multi-Layer Ensemble Approach Integrating Machine Learning for Robust Cybersecurity. In 2024 IEEE Symposium on Computers and Communications (ISCC) (pp. 1-6). IEEE.
[7] Opara, C., Chen, Y., & Wei, B. (2024). Look before you leap: Detecting phishing web pages by exploiting raw URL and HTML characteristics. Expert Systems with Applications, 236, 121183.
[8] Ramaiah, M., Chandrasekaran, V., Chand, V., Vasudevan, A., & Ibrahim, S. (2024). Enhanced Phishing Detection: An Ensemble Stacking Model with DT-RFECV and SMOTE. Appl. Math, 18(6), 1481-1493.
[9] Mienye, I. D., & Sun, Y. (2022). A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access, 10, 99129-99149.
[10] Sankaranarayanan, S., Sivachandran, A. T., Mohd Khairuddin, A. S., Hasikin, K., & Wahab Sait, A. R. (2024). An ensemble classification method based on machine learning models for malicious Uniform Resource Locators (URL). PLOS ONE, 19(5), e0302196.
[11] Fatima, R., Fareed, M. M. S., Ullah, S., Ahmad, G., & Mahmood, S. (2024). An Optimized Approach for Detection and Classification of Spam Email’s Using Ensemble Methods. Wireless Personal Communications, 1-27.
[12] Vajrobol, V., Saxena, G. J., Pundir, A., Singh, S., Gupta, B. B., Gaurav, A., & Rahaman, M. (2024). Identify spoofing attacks in Internet of Things (IoT) environments using machine learning algorithms. Journal of High Speed Networks, 09266801241295886.
[13] Lv, L., Wu, Z., Zhang, L., Gupta, B. B., & Tian, Z. (2022). An edge-AI based forecasting approach for improving smart microgrid efficiency. IEEE Transactions on Industrial Informatics, 18(11), 7946-7954.
[14] Rahaman, M. (2025). The Anatomy of a Smishing Attack: Common Techniques and Tactics Used by Cybercriminals. Insights2Techinfo, pp.1

Cite As

Cajes N. (2025) AI and Machine Learning in Phishing Detection: Using Ensemble Methods for Improved Accuracy, Insights2Techinfo, pp.1


