By: Jampula Navaneeth¹
¹Vel Tech University, Chennai, India
²International Center for AI and Cyber Security Research and Innovations, Asia University, Taiwan
Email: navaneethjampula@gmail.com
Abstract
Phishing remains one of the most common threats in cybersecurity: the attacker tries to lure victims into submitting sensitive information. Machine learning (ML) models have become vital in countering these attacks because they provide efficient, automated detection. This article discusses machine learning approaches and how they perform when trained on datasets containing features that distinguish phishing websites from legitimate (safe) ones. The ability to tell these sites apart is important for modern internet browsing.
Keywords: Phishing, Cyber Security, Machine Learning
Introduction
Phishing has become a prevalent and constantly evolving threat in the information technology landscape, affecting individual users and companies alike. In a typical phishing attack, the attacker poses as a trustworthy party to trick targets into revealing personal information such as passwords or financial details [1]. Machine learning (ML) offers an efficient and flexible approach to phishing detection [2]. This paper discusses some of the best ML models for detecting phishing attacks and highlights their strengths for the cybersecurity community.
ML Models for Detecting Phishing Attacks
- Random Forest: Random Forest is an ensemble learning model that builds many decision trees and combines their predictions to improve accuracy. For feature analysis in phishing detection, Random Forest can examine URL length, domain age, and whether the URL contains special characters, which makes it easier to distinguish a genuine website from a phishing one. The model is simple, scales to large datasets, and resists overfitting to a given training set [3].
- Support Vector Machines: A Support Vector Machine (SVM) is a strong classifier that selects the hyperplane that best separates the classes of data, namely the one with the largest margin between them. In phishing detection, SVMs are especially useful when the data has many features, such as URL contents or the body text of email messages. Because kernel functions implicitly map the data into a higher-dimensional space, SVMs can separate phishing from non-phishing instances even in complicated cases where the classes are not linearly separable in the original features [3].
- Neural Networks: Neural networks, particularly deep learning models, have been highly successful at identifying phishing attacks. Loosely inspired by the human brain, these models can capture complex patterns and relationships in data. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been applied to analyze phishing URLs, emails, and web content. Because they learn features automatically and improve as more data becomes available, these algorithms are effective tools against evolving phishing strategies [4].
- Gradient Boosting Machines: Gradient Boosting Machines (GBMs), of which XGBoost and LightGBM are among the best-known implementations, build a sequence of models in which each new model corrects the errors made by the previous ones. GBMs are very effective for phishing detection because they cope well with imbalanced datasets and because each successive stage concentrates on the difficult cases that earlier stages got wrong. They are useful for real-time phishing detection, where both speed and accuracy matter [5].
- Naive Bayes: Naive Bayes is a probabilistic classifier based on Bayes' theorem, with the "naive" assumption that features are independent of each other given the class. Despite this simplification, Naive Bayes has been used successfully in phishing detection, especially in email filters [3]. Combined with feature extraction, Naive Bayes can classify a given URL or email as phishing or legitimate with reasonable accuracy in reasonable time, which is why it can be used in real-time applications [6].
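The URL features named in the Random Forest description can be computed with Python's standard library alone. The following is a minimal sketch; the function name `url_features`, the feature names, and the set of "special" characters are illustrative assumptions, and domain age is left out because it requires an external WHOIS or registry lookup.

```python
import ipaddress
from urllib.parse import urlparse

# Characters treated as "special" here are an illustrative assumption.
SPECIAL_CHARS = set("@-_~%")


def url_features(url: str) -> dict:
    """Extract a few simple lexical features from a URL (hypothetical feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Phishing URLs sometimes use a raw IP address instead of a domain name.
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return {
        "url_length": len(url),
        "num_special_chars": sum(ch in SPECIAL_CHARS for ch in url),
        "has_at_symbol": int("@" in url),
        "num_host_dots": host.count("."),  # rough proxy for subdomain depth
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": host_is_ip,
    }
```

Each URL becomes one numeric row, which is the tabular input a Random Forest (or any of the other models above) would be trained on.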
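The ensemble idea behind Random Forest (bootstrap samples, a random subset of features per tree, majority vote) can be sketched in plain Python. To stay short, each "tree" here is a single-split decision stump, so this is a deliberately simplified forest rather than a full implementation. The toy features (URL length, number of special characters, an IP-address host flag), the labels (1 = phishing, 0 = legitimate), and all function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Return (feature, threshold, left_label, right_label) with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted(set(row[f] for row in X)):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0] if left else 0
            right_label = Counter(right).most_common(1)[0][0] if right else 0
            errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        features = rng.sample(range(d), min(max_features, d))  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = [ll if row[f] <= t else rl for f, t, ll, rl in forest]
    return Counter(votes).most_common(1)[0][0]
```

Because each stump makes only one split, drawing the feature subset once per tree is equivalent to drawing it at each split, as a full Random Forest does. An odd `n_trees` avoids tied votes for two classes.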
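The hyperplane idea behind SVMs can be illustrated with a minimal linear SVM trained by stochastic subgradient descent on the L2-regularized hinge loss (the Pegasos algorithm), in plain Python. This sketch is linear only: it uses no kernel, so it does not perform the higher-dimensional mapping described above. The function names, the regularization strength `lam`, the epoch count, and the two standardized toy features are assumptions for illustration.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos-style SGD for a linear SVM.

    Labels must be +1 (phishing) or -1 (legitimate). A constant 1.0 feature is
    appended so the last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights: gradient step on the L2 regularizer.
            w = [(1 - eta * lam) * wi for wi in w]
            # If the example is inside the margin, step toward classifying it correctly.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A non-linear boundary would need a kernel SVM; this linear version only shows how the separating hyperplane is learned from margin violations.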
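The CNN and RNN models mentioned above usually take a URL as a fixed-length sequence of character indices rather than hand-crafted features. The sketch below shows only that preprocessing step, not a network; the alphabet, the maximum length of 32, the reserved indices (0 for padding, 1 for unknown characters), and the name `encode_url` are assumptions chosen for illustration.

```python
import string

# Hypothetical alphabet: lowercase letters, digits, and common URL punctuation.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
PAD, UNK = 0, 1  # reserved indices
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode_url(url, max_len=32):
    """Map a URL to a fixed-length list of character indices (truncate, then pad)."""
    ids = [CHAR_TO_INDEX.get(ch, UNK) for ch in url.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))
```

The resulting integer sequences would typically feed an embedding layer followed by convolutional or recurrent layers in a deep learning framework.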
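The sequential error-correcting idea behind GBMs can be made concrete with a minimal from-scratch sketch of gradient boosting for binary classification, written in plain Python rather than with XGBoost or LightGBM. Each round fits a one-split regression stump to the residuals y - p (the negative gradient of the log loss) and adds a damped copy of it to the running score. The function names, the toy features (URL length, number of special characters), the 50 rounds, and the learning rate of 0.3 are illustrative assumptions; production libraries add deeper trees, second-order updates, and regularization.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_regression_stump(X, r):
    """Least-squares stump: the split whose two leaf means best fit the residuals r."""
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [ri for row, ri in zip(X, r) if row[f] <= t]  # never empty
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            lv = sum(left) / len(left)
            rv = sum(right) / len(right) if right else 0.0
            sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lv, rv)
    return best[1:]


def fit_gbm(X, y, n_rounds=50, learning_rate=0.3):
    """Labels are 0/1 and both classes must be present."""
    base_rate = sum(y) / len(y)
    f0 = math.log(base_rate / (1 - base_rate))  # start from the prior log-odds
    F = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are largest where the current ensemble is most wrong.
        residuals = [yi - sigmoid(Fi) for yi, Fi in zip(y, F)]
        f, t, lv, rv = fit_regression_stump(X, residuals)
        stumps.append((f, t, lv, rv))
        F = [Fi + learning_rate * (lv if row[f] <= t else rv) for Fi, row in zip(F, X)]
    return f0, learning_rate, stumps


def gbm_predict_proba(model, row):
    """Estimated probability that the row is phishing."""
    f0, lr, stumps = model
    score = f0 + sum(lr * (lv if row[f] <= t else rv) for f, t, lv, rv in stumps)
    return sigmoid(score)
```

Starting from the prior log-odds means a skewed class ratio is reflected in the initial score, and later rounds only have to correct what that baseline gets wrong.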
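A multinomial Naive Bayes classifier for email text, with Laplace (add-one) smoothing, fits in a few lines of plain Python. This is a minimal sketch under stated assumptions: the tokenizer, the label names, the function names, and the tiny training set are hypothetical, and a real filter would be trained on a large labelled corpus. Log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def train_nb(docs, labels, alpha=1.0):
    """Count words per class; alpha is the additive (Laplace) smoothing constant."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = set()
    for c in classes:
        vocab.update(word_counts[c])
    log_priors = {c: math.log(doc_counts[c] / len(docs)) for c in classes}
    return classes, word_counts, vocab, log_priors, alpha


def predict_nb(model, doc):
    """Pick the class with the highest log P(class) + sum of log P(word | class)."""
    classes, word_counts, vocab, log_priors, alpha = model
    scores = {}
    for c in classes:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        score = log_priors[c]
        for w in tokenize(doc):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + alpha) / denom)
        scores[c] = score
    return max(scores, key=scores.get)
```

For example, trained on two phishing-style messages ("verify your account password now", "urgent: confirm your bank login") and two ordinary ones, the message "please verify your bank password" is assigned to the phishing class because its words are far more frequent there.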
Conclusion
Machine learning models are highly useful for identifying phishing attacks because they automate detection and can learn to recognize new threats. Each model has its own strengths, and the choice of model is usually determined by the nature of the problem, the need for real-time detection, and the hardware available. Because phishing tactics change constantly, continued research and development of more advanced ML models will remain essential to protecting sensitive data.
References
- M. Rahaman, S. S. Bakkireddygari, S. Chattopadhyay, A. L. Gomez, V. Arya, and S. Bansal, “Infrastructure and Network Security,” in Metaverse Security Paradigms, IGI Global, 2024, pp. 108–144. doi: 10.4018/979-8-3693-3824-7.ch005.
- E. Gandotra and D. Gupta, “An Efficient Approach for Phishing Detection using Machine Learning,” in Multimedia Security: Algorithm Development, Analysis and Applications, K. J. Giri, S. A. Parah, R. Bashir, and K. Muhammad, Eds., Singapore: Springer, 2021, pp. 239–253. doi: 10.1007/978-981-15-8711-5_12.
- S. Hossain, D. Sarma, and R. Joyti, “Machine Learning-Based Phishing Attack Detection,” IJACSA, vol. 11, no. 9, 2020, doi: 10.14569/IJACSA.2020.0110945.
- O. K. Sahingoz, S. Işılay Baykal, and D. Bulut, “Phishing Detection from URLs by Using Neural Networks,” in Computer Science & Information Technology (CS & IT), AIRCC Publication Corporation, Dec. 2018, pp. 41–54. doi: 10.5121/csit.2018.81705.
- K. Omari, “Phishing Detection using Gradient Boosting Classifier,” Procedia Computer Science, vol. 230, pp. 120–127, Jan. 2023, doi: 10.1016/j.procs.2023.12.067.
- L. Triyono, R. Gernowo, P. Prayitno, M. Rahaman, and T. R. Yudantoro, “Fake News Detection in Indonesian Popular News Portal Using Machine Learning For Visual Impairment,” JOIV : International Journal on Informatics Visualization, vol. 7, no. 3, pp. 726–732, Sep. 2023, doi: 10.30630/joiv.7.3.1243.
- K. C. Li, B. B. Gupta, and D. P. Agrawal, Eds., Recent Advances in Security, Privacy, and Trust for Internet of Things (IoT) and Cyber-Physical Systems (CPS), 2020.
- P. Chaudhary, B. B. Gupta, C. Choi, and K. T. Chui, “XSSPro: XSS Attack Detection Proxy to Defend Social Networking Platforms,” in Computational Data and Social Networks: 9th International Conference, CSoNet 2020, Dallas, TX, USA, December 11–13, 2020, Proceedings, Springer International Publishing, 2020, pp. 411–422.
- B. B. Gupta, A. Gaurav, V. Arya, W. Alhalabi, D. Alsalman, and P. Vijayakumar, “Enhancing user prompt confidentiality in Large Language Models through advanced differential encryption,” Computers and Electrical Engineering, vol. 116, p. 109215, 2024.
Cite As
Navaneeth J. (2024) Machine Learning Models that Excel in Detecting Phishing Attacks, Insights2Techinfo, pp.1