By: Jampula Navaneeth1
1 Vel Tech University, Chennai, India
2 International Center for AI and Cyber Security Research and Innovations, Asia University, Taiwan
Email: navaneethjampula@gmail.com
Abstract
Today's generation is habituated to the internet. Many of us use our mobile phones from morning to evening, and we order food, clothes, and utilities from home over the internet every day. This convenience also puts us at risk and makes the phisher's job easier. Phishing is a type of trap that aims to obtain a trusting user's personal data, such as bank details or personal images and videos, by luring the user into clicking a fake link or visiting a fake website. Machine learning techniques can be used to detect such fake links. In this article we look at how machine learning is used for phishing detection and highlight the various machine learning techniques involved.
Keywords: Machine Learning, Fake Websites, Links, Techniques
Introduction
Machine learning is a sub-branch of artificial intelligence. Phishing is a type of cyberattack that uses fraudulent websites to obtain sensitive user data such as credit card numbers, account details, login IDs, and more. Accuracy is central to any automated, machine-learning-based system for detecting phishing. Most anti-phishing researchers focus on proposing new features or improving classification methods, while a concise definition of effective feature analysis and selection procedures is not among their main priorities [1].
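Many of the features used by such detectors are lexical properties of the URL itself. The sketch below turns a URL into a handful of numeric features using only Python's standard library. The function name `extract_url_features` and the particular features are illustrative assumptions, not the feature set used in [1].

```python
import re
from urllib.parse import urlparse

# Hypothetical feature set for illustration; real detectors use many more features.
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def extract_url_features(url: str) -> dict:
    """Turn a raw URL into a small dictionary of numeric features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # lower-cased host, with any "user@" prefix removed
    return {
        "url_length": len(url),                 # long URLs can hide the real domain
        "num_dots_in_host": host.count("."),    # many subdomains, e.g. bank.com.evil.example
        "has_at_symbol": int("@" in url),       # text before '@' in the authority is not the host
        "has_ip_host": int(bool(IPV4_HOST.match(host))),  # raw IPv4 address instead of a domain
        "uses_https": int(parsed.scheme == "https"),
        "hyphen_in_host": int("-" in host),     # e.g. secure-login-bank.example
    }

# The visible "paypal.com" is only userinfo; the real host is the IP address.
features = extract_url_features("http://paypal.com@203.0.113.7/login")
# features["has_at_symbol"] == 1 and features["has_ip_host"] == 1
```

Each dictionary can then be converted into a fixed-order vector and passed to a classifier.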
Background
Phishing email is among the Internet's most rapidly growing kinds of cybercrime. Although it may look like a warning message from someone you know or a company you are affiliated with, its main purpose is deception and, above all, identity theft: it prompts the user to reveal personal details. For cybercriminals, phishing has turned into a viable commercial enterprise. Phishing scams are dangerous, and when they succeed the victim loses money or has personal details stolen. Phishing also affects modern and emerging technologies such as the Internet of Things (IoT) and cloud computing [2].
Machine Learning
A machine learning algorithm is a procedure that is fed input data and works out the best way of arriving at a given output without being explicitly programmed to do so. Such algorithms are, in a sense, "soft coded": they adjust themselves, improving their internal parameters or structure, the more often they perform the task they are set. The learning process is called training, in which samples of input data are supplied together with the expected results [3].
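To make the idea of training concrete, here is a minimal sketch of logistic regression trained by stochastic gradient descent in plain Python. The model is given input samples together with their expected labels and repeatedly adjusts its own weights to reduce its error. The three features and the toy labels are invented for illustration; a practical detector would use a library such as scikit-learn and far more data.

```python
import math

def sigmoid(z: float) -> float:
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Estimated probability that feature vector x is phishing (label 1)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_logistic_regression(samples, labels, lr=0.5, epochs=500):
    """Fit weights from (input, expected output) pairs by stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict_proba(weights, bias, x) - y  # gradient of log-loss w.r.t. the score
            # Move each parameter against the gradient: the model "modifies itself".
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(weights, bias, x, threshold=0.5):
    """Hard 0/1 decision from the estimated probability."""
    return int(predict_proba(weights, bias, x) >= threshold)

# Toy training set (hypothetical features: [raw IP host, '@' in URL, uses HTTPS]).
samples = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
labels = [1, 1, 1, 0, 0]  # 1 = phishing, 0 = legitimate
weights, bias = train_logistic_regression(samples, labels)
print([predict(weights, bias, x) for x in samples])  # prints [1, 1, 1, 0, 0]
```

No rule such as "an IP host means phishing" is written into the code; the positive weights for those features are learned from the labeled examples.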
Phishing Attacks
The literature currently lacks a clear definition of a phishing attack, because the phishing problem covers many different scenarios. For instance, PhishTank describes phishing as "an online scam, most of the time involving e-mail, with the aim of trying to get your identity details."
Reduced to its fundamental components, PhishTank's definition holds in a number of scenarios that, broadly, cover the majority of phishing incidents (although no quantitative research has yet been conducted to verify this claim) [4]. However, the definition restricts phishing attacks to stealing other people's private information, which may not always be the case [5].
Figure 1 shows the step-by-step process of phishing detection. Depending on the model used, individual steps of this detection process can be adapted.
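Figure 1 is not reproduced here, so the following sketch assumes a typical sequence of stages (extract features, split the data, train, evaluate) rather than transcribing the figure. The model is passed in as an argument, reflecting the point that the process changes with the chosen model. `NearestCentroid`, `extract`, and `run_pipeline` are hypothetical names, and their `fit`/`predict` interface (where `predict` takes a single example) is not scikit-learn's.

```python
from statistics import mean

class NearestCentroid:
    """A deliberately simple model: assign the class whose average feature vector is closest."""
    def fit(self, X, y):
        self.centroids = {
            label: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == label])]
            for label in set(y)
        }
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))

def extract(url):
    # Hypothetical 3-feature extractor: '@' present, '-' present, HTTPS scheme.
    return [int("@" in url), int("-" in url), int(url.startswith("https://"))]

def run_pipeline(labeled_urls, model, test_every=3):
    # Step 1: turn every labeled URL into a feature vector.
    X = [extract(u) for u, _ in labeled_urls]
    y = [label for _, label in labeled_urls]
    # Step 2: deterministic split; every k-th example is held out for testing.
    train = [i for i in range(len(X)) if i % test_every != 0]
    test = [i for i in range(len(X)) if i % test_every == 0]
    # Step 3: train the chosen model on the training portion.
    model.fit([X[i] for i in train], [y[i] for i in train])
    # Step 4: measure accuracy on the held-out examples.
    correct = sum(model.predict(X[i]) == y[i] for i in test)
    return model, correct / len(test)

data = [  # toy examples, 1 = phishing, 0 = legitimate
    ("http://login-bank.example@198.51.100.2/verify", 1),
    ("https://www.example.com/", 0),
    ("http://secure-update.example/account", 1),
    ("https://docs.python.org/3/", 0),
    ("http://pay@203.0.113.9/", 1),
    ("https://en.wikipedia.org/wiki/Phishing", 0),
]
model, accuracy = run_pipeline(data, NearestCentroid())
```

Because `run_pipeline` accepts any object with `fit` and `predict`, the nearest-centroid model can be swapped for another classifier without touching the other stages. Six examples are far too few for a meaningful accuracy figure; they only exercise the stages.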
Motive of Phishing
One of an attacker's main reasons for carrying out phishing attacks is to obtain a victim's banking credentials and steal their money. The main motives include:
- Financial gain: phishers steal banking credentials, on PCs and other devices alike, by disguising the request as a legitimate transaction, and then use them for their own financial benefit.
- Identity hiding: instead of using stolen personal data themselves, criminals may sell it to others who use it to hide their identities while engaging in illegal activities such as drug trafficking.
- Fame and notoriety: some phishers attack victims simply to gain recognition and notoriety among their peers [4].
Challenges
Applying machine learning to phishing identification comes with several challenges, including data quality and quantity, feature identification, and model training [1]. Collecting a large amount of labeled data is not easy, and imbalanced data, where one class (typically legitimate sites) heavily outnumbers the other, is commonly addressed by oversampling. Selecting which features to use is also important, as is keeping up to date with evolving phishing techniques [4]. Over-reliance on the model, choosing the right quality metrics, and processing time are further significant issues [6]. High false positive and false negative rates are a drawback because they affect both the usability and the security of the model, which therefore requires constant updates and fine-tuning [1]. Finally, legal and ethical aspects should be addressed so that sensitive data is handled appropriately and model training is fair.
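Two of these challenges, class imbalance and false positive/negative rates, can be illustrated with a short standard-library sketch. `random_oversample` duplicates randomly chosen examples of the smaller classes (plain random oversampling, not a synthetic method such as SMOTE), and `error_rates` computes confusion-matrix metrics with phishing as the positive class. Both function names are hypothetical.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate random rows of smaller classes until every class matches the largest one."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def error_rates(y_true, y_pred, positive=1):
    """Confusion-matrix metrics, treating phishing (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,  # legitimate sites flagged
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,  # phishing sites missed
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

X_bal, y_bal = random_oversample([[0], [1], [2], [3], [4]], [0, 0, 0, 0, 1])
rates = error_rates([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
```

Oversampling should be applied to the training split only; duplicating examples before splitting lets copies of the same example land in both training and test data and inflates the measured performance. The rates also show why accuracy alone can mislead: on a dataset that is 95% legitimate, a model that labels everything legitimate reaches 95% accuracy yet has a false negative rate of 1.0.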
Conclusion
Machine learning provides a strong weapon against cyber threats, and against phishing in particular. The procedures described in this guide help an absolute novice build a simple phishing detection model. Because machine-learning-based detection is an ongoing process, continual refinement and modification are essential to keep the model effective as phishers adopt new strategies.
References
- M. N. Alam, D. Sarma, F. F. Lima, I. Saha, R.-E.- Ulfath, and S. Hossain, “Phishing Attacks Detection using Machine Learning Approach,” in 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), Aug. 2020, pp. 1173–1179. doi: 10.1109/ICSSIT48917.2020.9214225.
- A. Alhogail and A. Alsabih, “Applying machine learning and natural language processing to detect phishing email,” Computers & Security, vol. 110, p. 102414, Nov. 2021, doi: 10.1016/j.cose.2021.102414.
- I. El Naqa and M. J. Murphy, “What Is Machine Learning?,” in Machine Learning in Radiation Oncology: Theory and Applications, I. El Naqa, R. Li, and M. J. Murphy, Eds., Cham: Springer International Publishing, 2015, pp. 3–11. doi: 10.1007/978-3-319-18305-3_1.
- M. Khonji, Y. Iraqi, and A. Jones, “Phishing Detection: A Literature Survey,” IEEE Communications Surveys & Tutorials, vol. 15, no. 4, pp. 2091–2121, 2013, doi: 10.1109/SURV.2013.032213.00009.
- M. Rahaman, V. Arya, S. M. Orozco, and P. Pappachan, “Secure Multi-Party Computation (SMPC) Protocols and Privacy,” in Innovations in Modern Cryptography, IGI Global, 2024, pp. 190–214. doi: 10.4018/979-8-3693-5330-1.ch008.
- L. Triyono, R. Gernowo, P. Prayitno, M. Rahaman, and T. R. Yudantoro, “Fake News Detection in Indonesian Popular News Portal Using Machine Learning For Visual Impairment,” JOIV : International Journal on Informatics Visualization, vol. 7, no. 3, pp. 726–732, Sep. 2023, doi: 10.30630/joiv.7.3.1243.
- B. B. Gupta and P. K. Panigrahi, "Analysis of the Role of Global Information Management in Advanced Decision Support Systems (DSS) for Sustainable Development," Journal of Global Information Management (JGIM), vol. 31, no. 2, pp. 1–13, 2022.
- B. B. Gupta and S. Narayan, "A key-based mutual authentication framework for mobile contactless payment system using authentication server," Journal of Organizational and End User Computing (JOEUC), vol. 33, no. 2, pp. 1–16, 2021.
Cite As
Navaneeth J. (2024) A Beginners Guide to Using Machine Learning for Phishing Detection, Insights2Techifo, pp.1