Detecting Phishing Scam Using Deep Learning: A Cutting-Edge Approach

By: Nicko Cajes; Northern Bukidnon State College, Philippines

Abstract

Phishing is a cyber threat that demands attention because it uses social engineering techniques to trick victims into disclosing important information. Deep Learning (DL) offers a sophisticated approach to detecting these fraudulent attempts. This study gives a general overview of what deep learning is and how it can be used to detect phishing. The paper also discusses the challenges of developing effective DL models and how they affect the quality of a model’s predictions.

Introduction

Phishing has emerged as a danger to networks all over the world. Harmful links can be disguised as legitimate URLs, and social engineering channels such as email and messaging applications are used to obtain valuable information. Feature engineering, which phishing detectors typically rely on, is one of the most important strategies for identifying or even stopping scams [1, 2]. The growing number of network-connected devices, such as mobile phones, desktop computers, and Internet of Things devices, has made it possible for people to carry out a variety of everyday tasks online, for example financial transactions, retail purchases, and messaging [3]. However, fraudsters exploit the anonymity of the Internet and the ease of carrying out attacks. As phishing and the number of fake websites increase, businesses and individuals around the globe have become more susceptible to attacks over the Internet, so advanced detection systems need to be developed to protect users [4]. Enhanced phishing detection is therefore necessary for better cyber security. The emergence of Artificial Intelligence [5], and specifically its subset Deep Learning (DL), has made this possible. This article discusses the use of DL as a cutting-edge approach to detecting phishing scams.

What is Deep Learning?

Deep learning is a subcategory of machine learning (ML). Deep artificial neural networks and deep reinforcement learning are the main examples, and in both the word “deep” refers to the number of layers the algorithm has [6]. Deep learning techniques are now widely used in numerous domains. Instead of relying on manual feature engineering, deep learning computes deep features directly from the target dataset [1, 7]. Two key aspects drive the functionality of DL: neural networks and layers [5, 8]. Figure 1 shows the simplified architecture of a DL model.

Neural Network: Neural networks are machine learning programs, or models, that make decisions in a manner loosely modeled on the human brain. They use processes that resemble how biological neurons work together to recognize phenomena, weigh options, and reach conclusions.

Layers: A neural network consists of layers of nodes: an input layer, one or more hidden layers, and an output layer. Each node has its own weights and threshold and is connected to other nodes. If a node’s output exceeds its threshold, the node is activated and sends data to the next layer of the network. Otherwise, no data is passed along to the next layer.
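
The layer behaviour described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production model: the weights, biases, and layer sizes are hand-picked, hypothetical values, and the bias plays the role of the (negative) threshold, so a hidden node stays silent until its weighted input exceeds that threshold.

```python
import math

def relu(x):
    """Pass the value on only if it is above zero; otherwise output 0."""
    return max(0.0, x)

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each node sums its weighted inputs,
    adds its bias, and applies the activation function."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

# Hypothetical, hand-picked parameters: 3 inputs -> 2 hidden nodes -> 1 output.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, -1.0]   # second node has an effective threshold of 1.0
output_w = [[1.0, -1.5]]
output_b = [0.2]

def forward(features):
    """Run the input through the hidden layer, then the output layer."""
    hidden = dense(features, hidden_w, hidden_b, relu)
    return dense(hidden, output_w, output_b, sigmoid)[0]

hidden_activations = dense([1.0, 0.5, 2.0], hidden_w, hidden_b, relu)
score = forward([1.0, 0.5, 2.0])
```

With this input the second hidden node does not clear its threshold and passes nothing on, so the output score is determined by the first hidden node alone.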

Figure 1: Simplified Architecture of Deep Learning Model

Deep Learning Techniques for Phishing Detection

Taking advantage of DL for phishing detection depends on several crucial techniques applied during development: data preparation, feature extraction, and training the deep learning model.

Feature Extraction: Feature extraction is an important part of developing a robust DL model. A good feature extraction technique avoids many problems during development and makes the resulting model more reliable. Features must be selected carefully from the given dataset. For example, in the URL-based DL detection model of [4], features are extracted from each URL to obtain information such as the length of the URL, which can help identify phishing sites.
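
As a rough illustration of URL feature extraction, the sketch below uses only the Python standard library. Apart from URL length, which the text mentions, the feature names are assumptions chosen for illustration and are not taken from [4].

```python
import re
from urllib.parse import urlparse

def extract_url_features(url):
    """Turn a URL string into a dictionary of simple lexical features.
    The feature set is illustrative, not the one used in any cited study."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        # Hosts written as a raw IPv4 address instead of a domain name
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "uses_https": parsed.scheme == "https",
    }

# Made-up example URL
sample_url = "http://192.168.10.5/secure-login/update.php?id=42"
features = extract_url_features(sample_url)
```

Each dictionary can then be converted to a numeric vector and used as one input row for a model.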

Data Preparation: Consider the URL scenario again. A URL is made up of a mix of letters and special symbols that delimit its different parts. During data preparation, words are systematically extracted from each URL and compared against a selection of online sites and a randomly selected sample. The main goals of this procedure are to find words that closely match brand names, to identify important keywords in the URL, and to identify random sequences of characters. Because these words serve as the foundation for distinguishing between authentic and potentially harmful web pages, the accuracy with which they are identified is essential for classifying URLs effectively [9].
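
The three goals above (brand look-alikes, keywords, random-looking strings) can be sketched as follows. The brand list, keyword list, and cutoff values are hypothetical, and the similarity and entropy measures are stand-in heuristics rather than the procedure used in [9].

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical lists, for illustration only.
BRANDS = ["paypal", "google", "amazon"]
KEYWORDS = {"login", "secure", "verify", "account", "update"}

def tokenize(url):
    """Split a URL into lowercase alphanumeric words at its delimiter symbols."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]

def closest_brand(token):
    """Return (brand, similarity) for the brand name most similar to the token."""
    return max(
        ((b, SequenceMatcher(None, token, b).ratio()) for b in BRANDS),
        key=lambda pair: pair[1],
    )

def entropy(token):
    """Shannon entropy in bits per character; high values suggest random strings."""
    counts = Counter(token)
    return -sum((c / len(token)) * math.log2(c / len(token)) for c in counts.values())

def analyze(url, brand_cutoff=0.8, entropy_cutoff=3.2):
    """Sort the words of a URL into brand-like, keyword, and random-like groups."""
    report = {"brand_like": [], "keywords": [], "random_like": []}
    for token in tokenize(url):
        brand, score = closest_brand(token)
        if score >= brand_cutoff:
            report["brand_like"].append((token, brand))
        if token in KEYWORDS:
            report["keywords"].append(token)
        # Crude heuristic: long tokens with many distinct characters
        if len(token) >= 8 and entropy(token) >= entropy_cutoff:
            report["random_like"].append(token)
    return report

# Made-up URL imitating a brand name with a digit substitution
report = analyze("http://paypa1-secure.example.com/login/x7qz9k2mfp")
```

The entropy test is deliberately simple and will misfire on some ordinary long words, so real systems usually combine it with dictionary lookups.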

Training the Model: An essential step in training a deep learning model is choosing an appropriate optimization algorithm [10]. The training phase is where the model learns from the data prepared in the previous steps. The large volume of examples in the dataset serves as the model’s training ground and plays a huge role in how well it can predict and detect phishing attacks during the testing phase.
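
To make the role of the optimization algorithm concrete, the sketch below trains a single sigmoid unit with plain stochastic gradient descent. It is a deliberate simplification: a real phishing detector would stack many layers and typically use a deep learning framework, the optimizer in [10] may differ, and the data here is a tiny made-up set with two assumed features.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability that sample x is phishing under the current parameters."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(samples, labels, epochs=200, learning_rate=0.5, seed=0):
    """Stochastic gradient descent on the log loss of a single sigmoid unit."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new order each epoch
        for i in order:
            # Gradient of the log loss with respect to the pre-activation
            error = predict(weights, bias, samples[i]) - labels[i]
            weights = [w - learning_rate * error * xi
                       for w, xi in zip(weights, samples[i])]
            bias -= learning_rate * error
    return weights, bias

# Toy, made-up data: [scaled URL length, host is an IP address]; 1 = phishing.
X = [[0.1, 0.0], [0.2, 0.0], [0.15, 0.0], [0.8, 1.0], [0.9, 0.0], [0.85, 1.0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train(X, y)
predictions = [round(predict(weights, bias, x)) for x in X]
```

Swapping the update rule inside the loop is what changing the optimization algorithm amounts to; the rest of the training loop stays the same.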

Figure 2: General Process Flow of Training a Deep Learning Model

Challenges and Considerations

Challenges in developing a robust DL method are inevitable, especially given the scarcity of publicly available datasets. Although a number of datasets can be accessed today, one problem remains hard to solve: dataset imbalance.

Because the model is trained on the dataset, acquiring a high-quality dataset should be the first priority for building an efficient and reliable model, as it greatly influences how well the model distinguishes phishing from legitimate emails. In a typical dataset of phishing websites, the majority of instances are genuine URLs. Training neural networks on such datasets can result in uneven accuracy, in which the class with the larger share of instances achieves excellent performance while the class with the minority of training examples performs poorly. This tendency restricts the model’s ability to generalize and hurts its overall performance [11]. Fortunately, techniques exist to balance the classes in a dataset. Two common ones are the Synthetic Minority Over-sampling Technique (SMOTE) and Generative Adversarial Networks (GANs).
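
The core idea of SMOTE can be sketched in plain Python: pick a minority sample, pick one of its nearest minority neighbours, and create a new point somewhere on the line between them. This is a minimal sketch with made-up feature vectors and an assumed `k` of 2. In practice a maintained library implementation would normally be preferred.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Made-up, imbalanced data: six legitimate samples, three phishing samples.
legitimate = [[0.1, 0.0], [0.2, 0.0], [0.15, 0.0], [0.3, 0.0], [0.25, 0.0], [0.12, 0.0]]
phishing = [[0.8, 1.0], [0.9, 0.0], [0.85, 1.0]]

new_samples = smote(phishing, n_new=len(legitimate) - len(phishing))
balanced_phishing = phishing + new_samples
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region the minority class already occupies.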

Another difficulty is avoiding overfitting, a crucial training problem that occurs when a model fits its training data too closely, leaving it unable to produce reliable predictions or inferences on new data [12]. Future research must treat this problem as a high priority, as it greatly affects how accurately the model detects cyber threats such as phishing.
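
One common way to spot overfitting is to track the loss on a held-out validation set alongside the training loss. The sketch below assumes such per-epoch loss values are available. The curves are made up, and the `patience` value is an arbitrary choice.

```python
def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)

def shows_overfitting(train_losses, val_losses, patience=2):
    """True if validation loss has not improved for `patience` epochs
    while training loss has kept falling."""
    best = best_epoch(val_losses)
    stalled = len(val_losses) - 1 - best >= patience
    still_fitting = train_losses[-1] < train_losses[best]
    return stalled and still_fitting

# Made-up loss curves: training loss keeps dropping, validation loss turns upward.
train_losses = [0.70, 0.50, 0.35, 0.25, 0.18, 0.12]
val_losses = [0.72, 0.55, 0.45, 0.44, 0.50, 0.58]

overfit = shows_overfitting(train_losses, val_losses)
stop_at = best_epoch(val_losses)
```

Stopping training at the epoch returned by `best_epoch` is a simple way to keep the model from memorizing its training data.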

Conclusion

Phishing attacks, which use social engineering techniques to fool their victims, are a growing threat. The emergence of DL models has transformed how cyber threats like phishing are detected, enabling a more robust and efficient approach. However, challenges in developing these models, such as data imbalance, remain unavoidable problems that need to be solved, although balancing techniques like SMOTE and GAN help address this issue. Future research should handle these challenges carefully, as they can have a huge impact on how the model performs.

References

[1] Zhu, E., Yuan, Q., Chen, Z., Li, X., & Fang, X. (2023). CCBLA: a lightweight phishing detection model based on CNN, BiLSTM, and attention mechanism. Cognitive Computation, 15(4), 1320-1333.
[2] Hussain, M., Cheng, C., Xu, R., & Afzal, M. (2023). CNN-Fusion: An effective and lightweight phishing detection method based on multi-variant ConvNet. Information Sciences, 631, 328-345. https://doi.org/10.1016/j.ins.2023.02.039
[3] Sahingoz, O. K., Buber, E., & Kugu, E. (2024). Dephides: Deep learning based phishing detection system. IEEE Access, 12, 8052-8070.
[4] Aldakheel, E. A., Zakariah, M., Gashgari, G. A., Almarshad, F. A., & Alzahrani, A. I. (2023). A Deep learning-based innovative technique for phishing detection in modern security with uniform resource locators. Sensors, 23(9), 4403.
[5] Rahaman, M., Pappachan, P., Orozco, S. M., Bansal, S., & Arya, V. (2024). AI Safety and Security. In Challenges in Large Language Model Development and AI Ethics (pp. 354-383). IGI Global.
[6] Deep Learning Based Phishing Detection System (DEPHIDES). Accessed: Dec. 3, 2023. [Online]. Available: https://codeocean.com/capsule/0874584/tree
[7] Pappachan, P., Adi, N. S., Firmansyah, G., & Rahaman, M. (2025). Deep Learning-Based Forensics and Anti-Forensics. In Digital Forensics and Cyber Crime Investigation (pp. 211-240). CRC Press.
[8] IBM. (2025, January 27). Neural network. https://www.ibm.com/think/topics/neural-networks
[9] Kamble, N., & Mishra, N. (2024, January). Securing Cyberspace: Unveiling Phishing Attacks Through Deep Neural Networks for Enhanced Detection. In 2024 International Conference on Advancements in Smart, Secure and Intelligent Computing (ASSIC) (pp. 1-5). IEEE.
[10] Somesha, M., & Pais, A. R. (2024). DeepEPhishNet: a deep learning framework for email phishing detection using word embedding algorithms. Sādhanā, 49(3), 212.
[11] Said, Y., Alsheikhy, A. A., Lahza, H., & Shawly, T. (2024). Detecting phishing websites through improving convolutional neural networks with Self-Attention mechanism. Ain Shams Engineering Journal, 15(4), 102643. https://doi.org/10.1016/j.asej.2024.102643
[12] Mutasa, S., Sun, S., & Ha, R. (2020). Understanding artificial intelligence based radiology studies: What is overfitting? Clinical Imaging, 65, 96-99.
[13] Sedik, A., Hammad, M., Abd El-Samie, F. E., Gupta, B. B., & Abd El-Latif, A. A. (2022). Efficient deep learning approach for augmented detection of Coronavirus disease. Neural Computing and Applications, 1-18.
[14] Mirsadeghi, F., Rafsanjani, M. K., & Gupta, B. B. (2021). A trust infrastructure based authentication method for clustered vehicular ad hoc networks. Peer-to-Peer Networking and Applications, 14, 2537-2553.
[15] Shaik, D. A. (2024). AI and Machine Learning in Cloud Security. Insights2Techinfo, pp. 1.

Cite As

Cajes N. (2025) Detecting Phishing Scam Using Deep Learning: A Cutting-Edge Approach, Insights2Techinfo, pp.1
