By: Nicko Cajes; Northern Bukidnon State College, Philippines
Abstract
Phishing attacks, which deceive victims into disclosing sensitive information, remain one of the most serious cyber threats. Conventional security mechanisms struggle to keep up with and detect novel phishing techniques, which has led to growing reliance on AI and ML techniques to improve detection in cybersecurity. This article explores the effectiveness of AI and ML techniques, specifically ensemble methods such as bagging, boosting, and stacking, in improving the accuracy of prediction models. By combining multiple classifiers, ensemble methods achieve high reliability and adaptability. Despite this effectiveness, challenges such as dataset issues must be addressed to obtain reliable model performance.
Introduction
For many years, phishing emails have been a common attack method used by cybercriminals [1]. Customers and web-based organizations are at serious risk from phishing, a type of fraud in which victims are tricked into disclosing sensitive data [2]. Phishing is one of the most concerning problems of the digital era, and it is constantly evolving. As online activity has increased, phishing has emerged as a novel form of identity theft. This kind of cyberattack exploits both technological and human weaknesses to compromise security and steal confidential data [3]. Examples of a legitimate and a phishing email are shown in Figure 1. Many countermeasures, such as security mechanisms, have been implemented, but they are still not enough: phishing attack techniques evolve constantly, while traditional security systems rely on rule-based approaches that cannot detect new types of phishing attacks. The advantages of AI and ML models, such as their ability to lower modeling variability, improve prediction accuracy, and generalize well, have recently drawn attention in relation to the rising problem of phishing attacks [4, 5]. This article highlights the effectiveness of ensemble methods in improving the accuracy of phishing detection models.
Figure 1: Example of Legitimate (https://www.thegoodgeeks.com) and Phishing Emails (https://global-uploads.webflow.com)
Ensemble Methods for Phishing Detection
An important breakthrough in phishing detection was made with the advent of ensemble approaches [6]. Ensemble learning (EL) combines several machine learning methods and has demonstrated notable progress in a number of research areas, such as fraud identification and phishing website detection [7]. Compared with ordinary machine learning models, ensemble learning that integrates many algorithms typically has better predictive potential [1]. Combining multiple separate models produces a prediction model that is more reliable and accurate than any single model, and combining the predictions or decisions of several models frequently yields better accuracy, robustness, and adaptation to new data [8]. One reason ensemble techniques achieve higher detection accuracy is that they draw on a variety of features, including the number of words, the presence of keywords or phrases, and the email size [1]. The fundamental premise of ensemble learning is that combining different models compensates for the shortcomings of each, leading to better performance than any single model [8]. A comparison of single and ensemble models is shown in Table 1.
Characteristics | Single Learning | Ensemble Learning |
---|---|---|
Model Quantity | One Classifier | Multiple Classifiers |
Prediction Method | Direct Prediction | Combined Prediction |
Efficiency | Depends on Single Model | Depends on Multiple Models |
Utilized Techniques | Decision Trees, SVM, Naïve Bayes, etc. | Bagging, Boosting, Stacking |
Output | Based on Single Classifier | Integration of Multiple Classifiers |
Table 1: Brief Comparison of Single and Ensemble Learning Approach
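The email features mentioned above (number of words, presence of keywords or phrases, and email size) can be illustrated with a minimal sketch. The keyword list and the function name `extract_features` are assumptions made for this example only; they are not taken from the cited studies.

```python
import re

# Hypothetical keyword list, chosen only for illustration.
SUSPICIOUS_KEYWORDS = ("verify", "urgent", "password", "suspended", "click here")

def extract_features(email_text: str) -> dict:
    """Turn a raw email body into a small numeric feature dictionary."""
    lowered = email_text.lower()
    words = re.findall(r"[a-z0-9']+", lowered)
    return {
        "word_count": len(words),                                  # number of words
        "keyword_hits": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),  # keywords present
        "size_bytes": len(email_text.encode("utf-8")),            # email size
    }

sample = "URGENT: your account is suspended. Click here to verify your password."
features = extract_features(sample)
print(features)
```

Each classifier in an ensemble could consume a feature vector of this kind; real systems would likely use far richer features than these three.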
Popular Ensemble Techniques
Because ensemble learning is widely used and effective, many researchers have adopted it, and three methods have emerged as the most popular: bagging, boosting, and stacking. The workflow of the ensemble method is shown in Figure 2.
Bagging: Bagging is a technique that uses different bootstrap samples of the initial dataset to train several base classifiers, usually decision trees. The predictions of these base classifiers are combined to determine the final classification, which improves generalization and reduces overfitting by introducing variation among the base models [9]. The key concept is to average or vote over the predictions in order to reduce variance and make predictions more stable. Several instances of a particular model are trained on different parts of the training data [10].
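As a rough illustration of bagging, the sketch below trains one-split decision stumps on bootstrap samples and combines them by majority vote. The toy dataset, the feature meanings, and all function names are assumptions for this example; in practice a library implementation such as scikit-learn's `BaggingClassifier` would more likely be used.

```python
import random
from collections import Counter

def fit_stump(rows, labels):
    """Fit a one-split decision stump with the fewest errors on the given sample."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for above in (0, 1):  # label predicted when the feature value exceeds t
                errors = sum((above if r[f] > t else 1 - above) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return lambda r: above if r[f] > t else 1 - above

def fit_bagging(rows, labels, n_models=15, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models

def predict_bagging(models, row):
    """Combine the base predictions by majority vote."""
    return Counter(m(row) for m in models).most_common(1)[0][0]

# Toy rows (hypothetical): [suspicious keyword count, link count]; 1 = phishing, 0 = legitimate.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [4, 3], [5, 2], [3, 4], [6, 5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_bagging(rows, labels)
print(predict_bagging(models, [7, 6]), predict_bagging(models, [0, 0]))
```

Because every stump sees a slightly different resample, individual stumps may pick different thresholds; the vote smooths out those differences, which is the variance reduction described above.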
Boosting: Boosting is a sequential method in which every new model aims to reduce the mistakes of the previous models [11]. It builds a series of weak models one after another, each focusing on the errors of its predecessor. By giving incorrectly classified examples more weight, bias is decreased and the overall accuracy of the model is increased [10]. In gradient-based boosting, boosting is viewed as an optimization problem that reduces the classifier's loss function by adding one weak learner at a time [12].
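The reweighting idea can be sketched with a small AdaBoost-style loop over decision stumps. This is one well-known boosting variant, used here as an assumption for illustration rather than as the method of the cited works; the one-feature toy data (a hypothetical "suspicion score" from 0 to 9) is chosen so that no single stump can classify it perfectly.

```python
import math

def fit_weighted_stump(rows, labels, weights):
    """Return (weighted_error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for polarity in (1, -1):  # label predicted when the feature value exceeds t
                err = sum(w for r, y, w in zip(rows, labels, weights)
                          if (polarity if r[f] > t else -polarity) != y)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best

def fit_adaboost(rows, labels, n_rounds=3):
    n = len(rows)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, feature, threshold, polarity)
    for _ in range(n_rounds):
        err, f, t, pol = fit_weighted_stump(rows, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this weak learner
        ensemble.append((alpha, f, t, pol))
        # Misclassified rows gain weight, correctly classified rows lose weight.
        weights = [w * math.exp(-alpha * y * (pol if r[f] > t else -pol))
                   for r, y, w in zip(rows, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict_adaboost(ensemble, row):
    score = sum(alpha * (pol if row[f] > t else -pol) for alpha, f, t, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): +1 = phishing, -1 = legitimate.
rows = [[x] for x in range(10)]
labels = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = fit_adaboost(rows, labels)
predictions = [predict_adaboost(ensemble, r) for r in rows]
print(predictions == labels)
```

On this toy data the best single stump misclassifies three of the ten rows, while three boosted stumps together fit all ten, which illustrates how weak learners that focus on earlier mistakes can reduce bias.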
Stacking: One particularly effective ensemble learning technique is stacking, which combines the predictions made by multiple separate models to produce an overall prediction that is more precise and reliable [8]. It integrates many models by sending their outputs to a “meta-learner,” which is frequently a classifier or a linear regressor. The meta-learner then reinforces the advantages of each model and makes up for its shortcomings, creating a more reliable model as a whole and combining the perspectives of multiple models in order to identify a greater variety of patterns in the data [4, 10].
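A minimal stacking sketch follows, assuming two simple threshold-based base learners (one per feature) and a logistic-regression meta-learner trained on out-of-fold base predictions. The base learners, the fold scheme, and the toy data are illustrative assumptions, not the configuration of any cited study; scikit-learn's `StackingClassifier` offers a production-grade equivalent.

```python
import math

def fit_threshold_model(rows, labels, feature):
    """Base learner: split one feature at the midpoint between the two class means."""
    pos = [r[feature] for r, y in zip(rows, labels) if y == 1]
    neg = [r[feature] for r, y in zip(rows, labels) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda r: 1.0 if r[feature] > cut else 0.0

def fit_logistic(rows, labels, steps=500, lr=0.5):
    """Meta-learner: logistic regression trained by batch gradient descent."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for r, y in zip(rows, labels):
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, r)) + b)))
            for j, xj in enumerate(r):
                grad_w[j] += (p - y) * xj
            grad_b += p - y
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def fit_stacking(rows, labels, n_folds=4):
    n_features = len(rows[0])
    # Level 0: collect out-of-fold predictions so the meta-learner never sees a
    # base prediction for a row that the base model was trained on.
    meta_rows = [None] * len(rows)
    for k in range(n_folds):
        train = [i for i in range(len(rows)) if i % n_folds != k]
        held_out = [i for i in range(len(rows)) if i % n_folds == k]
        bases = [fit_threshold_model([rows[i] for i in train],
                                     [labels[i] for i in train], f)
                 for f in range(n_features)]
        for i in held_out:
            meta_rows[i] = [m(rows[i]) for m in bases]
    # Level 1: train the meta-learner, then refit the base models on all data.
    meta = fit_logistic(meta_rows, labels)
    bases = [fit_threshold_model(rows, labels, f) for f in range(n_features)]
    return bases, meta

def predict_stacking(model, row):
    bases, (w, b) = model
    z = sum(wi * m(row) for wi, m in zip(w, bases)) + b
    return 1 if z > 0 else 0

# Toy rows (hypothetical): [suspicious keyword count, link count]; 1 = phishing, 0 = legitimate.
rows = [[0, 1], [5, 6], [1, 0], [6, 4], [0, 0], [4, 5], [1, 1], [7, 6]]
labels = [0, 1, 0, 1, 0, 1, 0, 1]
model = fit_stacking(rows, labels)
print(predict_stacking(model, [6, 5]), predict_stacking(model, [0, 0]))
```

The meta-learner only ever sees the base models' outputs, so its learned weights express how much each base model should be trusted when their opinions are combined.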

Figure 2: Ensemble Learning Workflow
Challenges and Considerations
The effectiveness of ensemble learning in distinguishing phishing emails from legitimate ones is a major advance for the cybersecurity field. However, on the development side, certain challenges remain that strongly affect the performance of ensemble models. Evaluation criteria are insufficient and there are dataset-related issues, such as a lack of data from study sources; these shortcomings make existing works less useful in real-life situations [1]. To address these challenges, [8] proposed a solution that uses a Decision Tree-Recursive Feature Elimination with Cross-Validation (DT-RFECV) wrapper method in conjunction with SMOTE oversampling to address class imbalance and feature selection [13]. DT-RFECV uses cross-validation to avoid overfitting and to determine the significance of features. Because the dataset has a huge impact on the model, particularly in the training and testing phases, it is very important to handle these problems carefully so that the model's efficiency and accuracy remain reliable.
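To make the oversampling step concrete, here is a simplified sketch of the core SMOTE idea: new minority-class rows are created by interpolating between an existing minority row and one of its nearest minority neighbours. It covers only the oversampling part, not DT-RFECV, and the function name, parameters, and toy data are assumptions for illustration; the imbalanced-learn library provides a full `SMOTE` implementation.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding the row itself
        neighbours = sorted((r for r in minority if r is not base),
                            key=lambda r: math.dist(r, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Toy imbalance (hypothetical): 9 legitimate rows versus 3 phishing rows.
majority_count = 9
minority = [[4.0, 3.0], [5.0, 2.0], [3.0, 4.0]]
synthetic = smote(minority, n_new=majority_count - len(minority))
print(len(minority) + len(synthetic), "phishing rows after oversampling")
```

Every synthetic row lies on a line segment between two real minority rows, so the minority class is enlarged without simply duplicating existing examples.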
Conclusion
Combining multiple classification models to distinguish legitimate from fraudulent emails has enhanced the reliability of security systems against phishing attacks. These techniques are called ensemble methods, and the most popular are bagging, boosting, and stacking. They compensate for the weaknesses of individual models and deliver excellent results. Dataset imbalance is one of the challenges that strongly affects the performance of ensemble models; by handling such issues carefully, ensemble models can be used to their full potential and help combat the ever-evolving cyber threat of phishing.
References
- Salah, Z., Owida, H. A., Elsoud, E. A., Alhenawi, E., Abuowaida, S., & Alshdaifat, N. (2024). An effective ensemble approach for preventing and detecting phishing attacks in textual form. Future Internet, 16(11), 414. https://doi.org/10.3390/fi16110414
- Innab, N., Osman, A. A. F., Ataelfadiel, M. A. M., Abu-Zanona, M., Elzaghmouri, B. M., Zawaideh, F. H., & Alawneh, M. F. (2024). Phishing Attacks Detection Using Ensemble Machine Learning Algorithms. Computers, Materials & Continua, 80(1).
- Jawad, S. K., & Alnajjar, S. H. (2024, May). Enhancing Phishing Detection Through Ensemble Learning and Cross-Validation. In 2024 International Conference on Smart Applications, Communications and Networking (SmartNets) (pp. 1-7). IEEE.
- Omolara, A. E., & Alawida, M. (2025). DaE2: Unmasking malicious URLs by leveraging diverse and efficient ensemble machine learning for online security. Computers & Security, 148, 104170.
- Rahaman, M., Pappachan, P., Orozco, S. M., Bansal, S., & Arya, V. (2024). AI Safety and Security. In Challenges in Large Language Model Development and AI Ethics (pp. 354-383). IGI Global.
- Ahmadi, C., & Chen, J. L. (2024, June). Enhancing Phishing Detection: A Multi-Layer Ensemble Approach Integrating Machine Learning for Robust Cybersecurity. In 2024 IEEE Symposium on Computers and Communications (ISCC) (pp. 1-6). IEEE.
- Opara, C., Chen, Y., & Wei, B. (2024). Look before you leap: Detecting phishing web pages by exploiting raw URL and HTML characteristics. Expert Systems with Applications, 236, 121183.
- Ramaiah, M., Chandrasekaran, V., Chand, V., Vasudevan, A., & Ibrahim, S. (2024). Enhanced Phishing Detection: An Ensemble Stacking Model with DT-RFECV and SMOTE. Appl. Math, 18(6), 1481-1493.
- Mienye, I. D., & Sun, Y. (2022). A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access, 10, 99129–99149.
- Sankaranarayanan, S., Sivachandran, A. T., Mohd Khairuddin, A. S., Hasikin, K., & Wahab Sait, A. R. (2024). An ensemble classification method based on machine learning models for malicious Uniform Resource Locators (URL). Plos one, 19(5), e0302196.
- Fatima, R., Fareed, M. M. S., Ullah, S., Ahmad, G., & Mahmood, S. (2024). An Optimized Approach for Detection and Classification of Spam Email’s Using Ensemble Methods. Wireless Personal Communications, 1-27.
- Vajrobol, V., Saxena, G. J., Pundir, A., Singh, S., Gupta, B. B., Gaurav, A., & Rahaman, M. (2024). Identify spoofing attacks in Internet of Things (IoT) environments using machine learning algorithms. Journal of High Speed Networks, 09266801241295886.
- Lv, L., Wu, Z., Zhang, L., Gupta, B. B., & Tian, Z. (2022). An edge-AI based forecasting approach for improving smart microgrid efficiency. IEEE Transactions on Industrial Informatics, 18(11), 7946-7954.
- Rahaman M. (2025) The Anatomy of a Smishing Attack: Common Techniques and Tactics Used by Cybercriminals, Insights2Techinfo, pp.1
Cite As
Cajes N. (2025) AI and Machine Learning in Phishing Detection: Using Ensemble Methods for Improved Accuracy, Insights2Techinfo, pp.1