Machine Learning for Phishing Detection

By: Praneetha Neelapareddigari, Department of Computer Science & Engineering, Madanapalle Institute of Technology and Science, Angallu (517325), Andhra Pradesh. praneetha867reddy@gmail.com

Abstract

Phishing is a serious and growing security threat to individuals and organisations, in which attackers attempt to steal passwords, bank account details, and other sensitive information. Traditional techniques that rely mainly on fixed rules fail to keep pace with the new forms of deception that attackers devise. Machine learning offers an efficient and practical alternative for filtering out phishing threats, because it can draw on large amounts of data and complex mathematical models. By analysing the content of the email body, the behaviour of the sender, and the URL patterns of phishing links, machine learning models can detect phishing attempts in real time with high accuracy. This article focuses on identifying the machine learning methods that can be applied to phishing detection. It discusses the low detection rates of existing approaches, the main types of algorithms in use, the feature engineering process, and the problems of deploying effective detection systems.

Keywords: Machine Learning, Phishing Detection, Artificial Intelligence, Cybersecurity

Introduction

Phishing attacks are currently among the most frequent forms of cybercrime on the internet, affecting both individuals and companies. These attacks typically rely on deception: the attacker tricks the victim into disclosing information such as credit card numbers, passwords, or identity details. Although many conventional procedures exist to address phishing, the steadily increasing sophistication of these attacks calls for more effective and diverse solutions. This is where machine learning comes in: it offers powerful tools for analysing, predicting, and recognising phishing attacks.

Designing a reliable phishing detection solution is not easy, because the problem has many aspects. Blacklists and heuristic schemes, for instance, often fall short because they depend on prior knowledge of existing attacks, which skilled attackers can easily evade. Machine learning algorithms can address these problems because they learn from large quantities of data[1], recognise the distinguishing patterns in incoming phishing threats, and adapt as new kinds of phishing threats emerge.

With this motivation, machine learning for phishing detection focuses on models and algorithms that analyse the features found in emails, URLs, web pages, and similar sources. Such systems can greatly improve the detection and interception of phishing attempts through supervised learning, which uses labelled datasets, and unsupervised learning, which works on unlabelled data. Integrating machine learning into cybersecurity reduces the chances of successful phishing and helps build a robust defence as threats in cyberspace continue to evolve[2].

1. Introduction to Phishing

Phishing

Phishing is a type of cyber-attack in which criminals impersonate a well-established institution in order to trick people into providing their passwords, account numbers, or any other information the attacker wants[3]. The term is derived from ‘fishing’, and this style of attack emerged in the mid-1990s as a way of luring victims into giving up their details. Phishing has evolved considerably since then: attackers now take advantage of newer media such as social networking sites, instant messaging applications, and email to circulate phishing links.

1.1 Types of Phishing

Phishing takes many forms, each designed to exploit a particular weakness. Frequent types include whaling, which targets high-profile individuals such as managers, leaders, or government employees; clone phishing, in which genuine correspondence from an organisation is copied and reused to deliver the malicious content; and spear phishing, which is aimed at specific individuals or organisations[3]. Other variations show how the basic principles of phishing extend to other channels, such as smishing, which uses Short Message Service (SMS) messages, and vishing, which is voice phishing.

Figure 1: Types of Phishing Attacks

1.2 Impacts

Phishing is very dangerous to both individuals and firms because of its severe consequences. Attackers profit while their victims may suffer emotional distress, substantial financial loss, and identity theft[4]. Organisations also suffer severe consequences, including financial losses, damage to their reputation, and, frequently, theft of their information.

2. Basics of Machine Learning

A branch of artificial intelligence called machine learning (ML) is concerned with creating statistical models and algorithms that let computers carry out tasks without direct human guidance[5]. Rather, these algorithms acquire knowledge from data, identifying trends and reaching conclusions with little help from humans. This allows companies to build more adaptable and scalable applications in areas such as security, medical care, and insurance. The three broad categories of machine learning are supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, a model is trained on a labelled dataset, that is, a dataset in which every input has a known output. This approach is frequently used for regression and classification. Conversely, unsupervised learning works with unlabelled data and concentrates on finding underlying structures or hidden patterns in the data; it is commonly applied to tasks like dimensionality reduction and clustering. Finally, in reinforcement learning an agent learns to make decisions by acting and then receiving feedback in the form of rewards or penalties. This method is frequently applied in fields like gaming and robotics.
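To make the idea of supervised learning concrete, the following is a minimal sketch of a nearest-centroid classifier written with the Python standard library only. The feature values (number of links and number of urgent words in a message), the labels, and the function names are hypothetical and chosen purely for illustration; they are not taken from any dataset discussed in this article.

```python
from math import dist

# Toy labelled dataset (hypothetical values): each sample is
# (number of links, number of urgent words) together with its label.
labelled = [
    ((8.0, 5.0), "phishing"),
    ((7.0, 6.0), "phishing"),
    ((1.0, 0.0), "legitimate"),
    ((2.0, 1.0), "legitimate"),
]

def fit_centroids(samples):
    """Supervised step: average the feature vectors of each label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }

def predict(centroids, features):
    """Assign the label whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = fit_centroids(labelled)
print(predict(centroids, (6.0, 4.0)))
```

The labels are what make this supervised: the model only knows where the "phishing" and "legitimate" centroids lie because every training sample came with a known output.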

3. Role of Machine Learning in Phishing Detection

Machine learning approaches are adaptive and dynamic, and they can address the shortcomings of traditional phishing detection. A model can be trained on different aspects of an email, including its content, URL patterns, and metadata, to determine whether it is a phishing attempt. Unlike static methods, machine learning models can be retrained on new data and can therefore learn new types of phishing strategies. Supervised learning methods make it possible to train models on labelled datasets so that they identify and report phishing attempts accurately[6]. Machine learning can also uncover new phishing trends and examine large datasets that contain many anomalous examples[7]. By using machine learning, organisations can strengthen their counter-phishing capabilities within a constantly growing threat landscape[8].
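Before a model can learn from URL patterns, each URL has to be turned into numbers. The sketch below shows one possible way to do this with the Python standard library. The particular features (URL length, dots and hyphens in the host, presence of an "@" symbol, IP address as host, use of HTTPS) are illustrative assumptions, not a feature set prescribed by this article, and the function name `url_features` is hypothetical.

```python
import re
from urllib.parse import urlsplit

def url_features(url):
    """Turn a URL into a small dictionary of numeric features."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Simple dotted-quad check; it does not validate the numeric ranges.
    is_ip = re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": int("@" in url),
        "host_is_ip": int(is_ip),
        "uses_https": int(parts.scheme == "https"),
    }

for sample in ("https://www.example.com/",
               "http://example.com@192.0.2.7/login"):
    print(sample, url_features(sample))
```

Each dictionary can then be flattened into a feature vector and passed to whichever classifier is being trained. Features extracted from email content or metadata would follow the same pattern.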

4. Machine Learning Algorithms used for Phishing Detection

Phishing detection draws on several machine learning algorithms to identify fraudulent behaviour effectively. Decision Trees, Random Forests, and Support Vector Machines (SVMs) are widely used because they work well on different types of data and can achieve high accuracy. Ensemble methods, in which several models are combined, are also used to improve detection performance. More recently, deep learning algorithms such as Neural Networks, Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs) have become prominent, especially for analysing intricate data patterns such as those in emails and URLs[9].
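The ensemble idea can be illustrated with a deliberately simplified sketch: several one-level decision trees ("stumps"), each looking at a single feature, are combined by majority vote. This is a stand-in for the principle behind tree ensembles, not an implementation of Random Forests, and a real system would more likely rely on an established machine learning library. The training values and feature meanings below are hypothetical.

```python
def train_stump(samples, feature_index):
    """Pick the threshold on one feature that misclassifies the fewest samples.

    The stump predicts 1 (phishing) when the feature value exceeds the threshold.
    """
    best_threshold, best_errors = None, None
    candidates = sorted({features[feature_index] for features, _ in samples})
    for threshold in candidates:
        errors = sum(
            int(features[feature_index] > threshold) != label
            for features, label in samples
        )
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return feature_index, best_threshold

def stump_predict(stump, features):
    feature_index, threshold = stump
    return int(features[feature_index] > threshold)

def ensemble_predict(stumps, features):
    """Majority vote across all stumps (a tie counts as legitimate)."""
    votes = sum(stump_predict(s, features) for s in stumps)
    return int(votes * 2 > len(stumps))

# Hypothetical features: (URL length, dots in host, contains '@'); label 1 = phishing
training = [
    ((75, 4, 1), 1),
    ((62, 3, 0), 1),
    ((88, 5, 1), 1),
    ((23, 1, 0), 0),
    ((30, 2, 0), 0),
    ((18, 1, 0), 0),
]
stumps = [train_stump(training, i) for i in range(3)]
print(ensemble_predict(stumps, (70, 4, 0)))
```

In this toy data the "@" stump alone would miss the second phishing sample, but the vote of the other two stumps corrects it, which is the basic reason ensembles tend to be more robust than any single weak model.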

5. Model Training and Evaluation

Training a machine learning model entails feeding it labelled data so that it learns the distinguishing features of phishing attacks. Performance measures such as accuracy, recall, F1 score, and ROC-AUC are important for assessing the resulting models, because they show not only how often a model is correct but also whether it can identify true positives while minimising false positives. Cross-validation helps check how well the finished model performs on data it has not encountered: by repeatedly splitting the data into training and testing sets, it validates the model’s performance and reliability.

Figure 2: Process of Machine Learning in Phishing Detection
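The evaluation measures and the cross-validation split described in this section can be sketched in plain Python as follows. Precision is included because the F1 score is defined as the harmonic mean of precision and recall. ROC-AUC is omitted because it requires predicted scores rather than hard labels. The label vectors are made-up examples, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives; label 1 means phishing."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for non-empty label lists."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Real data should normally be shuffled first; that step is left out here.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(evaluate(y_true, y_pred))
for train, test in k_fold_indices(len(y_true), 3):
    print("train:", train, "test:", test)
```

In a full cross-validation run, a model would be trained on each `train` index set and scored with `evaluate` on the corresponding `test` set, and the per-fold scores would then be averaged.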

6. Challenges and Future

The use of machine learning for phishing detection faces several challenges. Phishing tactics evolve constantly, and attackers may outpace the algorithms developed to identify them. Attackers also operate inconspicuously, for example by concealing URLs and using similar evasion tactics, to avoid detection by traditional methods. As a result, the models used to counter these attacks must be continually updated and retrained to respond to new attack types. Future research is expected to focus on improving the flexibility of detection systems, applying more complex methods, combining multimodal data sources, and incorporating live threat intelligence. In addition, federated learning and privacy-preserving approaches are worth considering as ways to improve detection capabilities while preserving privacy.

Conclusion

In conclusion, machine learning is an efficient technique for preventing and detecting phishing attacks and related activities with a high success rate. Because it can handle nonlinear relationships and massive amounts of data, machine learning can identify patterns and anomalies that classic methods cannot, which makes individuals and organisations more secure. Continued improvement of machine learning algorithms in response to the ever-expanding activity of phishers will therefore be a crucial factor in building a solid foundation for a safer World Wide Web.

References

  1. B. Mahesh, “Machine Learning Algorithms - A Review,” vol. 9, 2019, doi: 10.21275/ART20203995.
  2. A. Handa, A. Sharma, and S. Shukla, “Machine learning in cybersecurity: A review,” Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 9, p. e1306, Feb. 2019, doi: 10.1002/widm.1306.
  3. K. L. Chiew, K. S. C. Yong, and C. L. Tan, “A survey of phishing attacks: Their types, vectors and technical approaches,” Expert Syst. Appl., vol. 106, pp. 1–20, Sep. 2018, doi: 10.1016/j.eswa.2018.03.050.
  4. M. Mohd Ali and N. Zaharon, “Phishing—A Cyber Fraud: The Types, Implications and Governance,” Int. J. Educ. Reform, vol. 33, p. 105678792210829, Mar. 2022, doi: 10.1177/10567879221082966.
  5. M. O. K. Mendonça, S. L. Netto, P. S. R. Diniz, and S. Theodoridis, “Chapter 13 – Machine learning: Review and trends,” in Signal Processing and Machine Learning Theory, P. S. R. Diniz, Ed., Academic Press, 2024, pp. 869–959. doi: 10.1016/B978-0-32-391772-8.00019-3.
  6. M. Rahaman, C.-Y. Lin, P. Pappachan, B. B. Gupta, and C.-H. Hsu, “Privacy-Centric AI and IoT Solutions for Smart Rural Farm Monitoring and Control,” Sensors, vol. 24, no. 13, Art. no. 13, Jan. 2024, doi: 10.3390/s24134157.
  7. T. Agboola, “Development of a Novel Approach to Phishing Detection using Machine Learning,” vol. 12, pp. 336–351, Jun. 2024.
  8. L. Triyono, Prayitno, M. Rahaman, Sukamto, and A. Yobioktabera, “Smartphone-based Indoor Navigation for Guidance in Finding Location Buildings Using Measured WiFi-RSSI,” JOIV Int. J. Inform. Vis., vol. 6, no. 4, pp. 829–834, Dec. 2022, doi: 10.30630/joiv.6.4.1528.
  9. J. Tanimu, S. Shiaeles, and M. Adda, “A Comparative Analysis of Feature Eliminator Methods to Improve Machine Learning Phishing Detection,” J. Data Sci. Intell. Syst., vol. 2, no. 2, Art. no. 2, 2024, doi: 10.47852/bonviewJDSIS32021736.
  10. V. Vajrobol, B. B. Gupta, and A. Gaurav, “Mutual information based logistic regression for phishing URL detection,” Cyber Security and Applications, vol. 2, p. 100044, 2024.
  11. B. B. Gupta, A. Gaurav, P. K. Panigrahi, and V. Arya, “Analysis of cutting-edge technologies for enterprise information system and management,” Enterprise Information Systems, vol. 17, no. 11, p. 2197406, 2023.
  12. B. B. Gupta, A. Gaurav, and P. K. Panigrahi, “Analysis of retail sector research evolution and trends during COVID-19,” Technological Forecasting and Social Change, vol. 194, p. 122671, 2023.

Cite As

 Neelapareddigari P. (2024) Machine Learning for Phishing detection, Insights2Techinfo, pp.1
