By: Nicko Cajes; Northern Bukidnon State College, Philippines
Abstract
Phishing is a cyber threat that demands attention because it uses social engineering techniques to trick victims into disclosing important information. Deep Learning (DL) offers a sophisticated approach to detecting these fraudulent attempts effectively. This study presents a general overview of what deep learning is and how it can detect phishing attempts. It also discusses the challenges of developing effective DL models, highlighting their impact on how well a model predicts.
Introduction
Phishing has emerged as a danger to networks all over the world. Harmful links can be disguised as legitimate URLs, and social engineering channels such as email and messaging applications are used to obtain valuable information. Feature engineering, which phishing detectors typically rely on, is possibly one of the most crucial strategies for identifying or even stopping scams [1, 2]. The growing number of network-connected devices, such as mobile phones, desktop computers, and Internet of Things devices, has made it possible for people to carry out a variety of everyday tasks online, for example financial transactions, retail purchases, and messaging [3]. However, fraudsters exploit the anonymous nature of the Internet and the ease of carrying out attacks, so advanced detection systems must be developed to protect users. As phishing and the number of fake websites increase, businesses and people around the globe have become more susceptible to attacks via the Internet [4]. Enhanced phishing detection is therefore necessary for better cyber security. The emergence of Artificial Intelligence [5] has made this possible, specifically through its subset, Deep Learning (DL). This article discusses the use of DL as a cutting-edge approach to detecting phishing scams effectively.
What is Deep Learning?
Deep learning is a subcategory of machine learning (ML). Deep artificial neural networks and deep reinforcement learning are the main examples, and both use the word “deep” to describe how many layers the algorithm has [6]. Techniques based on deep learning are now widely used in numerous domains. Instead of relying on feature engineering to obtain deep features, deep learning computes them directly from the target datasets [1, 7]. Two key aspects drive the functionality of DL: neural networks and layers [5, 8]. Figure 1 shows the simplified architecture of a DL model.
Neural Network: Neural networks are machine learning programs, or models, that use processes resembling how neurons in our brains work together to identify phenomena, weigh options, and reach conclusions. This allows them to make decisions in a way that is closer to how the human brain functions.
Layers: A neural network consists of layers of nodes: an input layer, one or more hidden layers, and an output layer. Each node has its own weights and threshold and is connected to nodes in the adjacent layers. A node becomes active and sends data to the next layer of the network if its output exceeds the designated threshold. Otherwise, no data is passed on to the next layer.
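The threshold behaviour of layers can be sketched in a few lines of Python. This is only a minimal illustration and is not taken from any of the cited works: the helper name layer_forward, the layer sizes, the weights, and the thresholds are all made-up assumptions.

```python
def layer_forward(inputs, weights, thresholds):
    """Compute one layer: each node sums its weighted inputs and
    passes the value on only if it exceeds that node's threshold."""
    outputs = []
    for node_weights, threshold in zip(weights, thresholds):
        total = sum(w * x for w, x in zip(node_weights, inputs))
        # A node "fires" (forwards its value) only above its threshold.
        outputs.append(total if total > threshold else 0.0)
    return outputs


# Illustrative values only: 3 inputs -> 2 hidden nodes -> 1 output node.
hidden_weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
hidden_thresholds = [0.2, 0.6]
output_weights = [[1.0, 1.0]]
output_thresholds = [0.0]

x = [1.0, 0.5, 2.0]
hidden = layer_forward(x, hidden_weights, hidden_thresholds)
output = layer_forward(hidden, output_weights, output_thresholds)
print(hidden, output)
```

In this toy run the second hidden node stays below its threshold and forwards nothing. Practical deep learning models usually replace the hard threshold with a smooth activation function such as ReLU or sigmoid, but the layer-by-layer flow is the same.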

Deep Learning Techniques for Phishing Detection
To take advantage of DL in detecting phishing attacks effectively, several crucial techniques must be employed during development. These include feature extraction, data preparation, and training the deep learning model.
Feature Extraction: Feature extraction is an important part of developing a robust DL model. With a good feature extraction technique, many problems can be avoided in the development phase, leading to a more reliable DL model. Features need to be selected carefully from the given dataset. For example, in the URL-based DL detection model developed in [4], features extracted from the URL, such as its length, can be used effectively to identify phishing sites.
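As a rough illustration of URL-based feature extraction, the sketch below computes a handful of simple lexical features with the Python standard library. The function name extract_url_features, the particular features chosen, and the example URL are assumptions made for illustration; they are not the feature set used in [4].

```python
from urllib.parse import urlparse


def extract_url_features(url):
    """Return a few simple lexical features of a URL as a dict."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
    }


# Hypothetical example URL, used only to show the shape of the output.
features = extract_url_features("http://secure-login.example.com/verify?id=12345")
print(features)
```

Each URL becomes a small numeric record that can be fed to a model. A deep learning model can also learn features directly from the raw characters of the URL, so hand-picked features like these are only one possible input.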
Data Preparation: To continue with the URL scenario, URLs are made up of a mix of letters and special symbols that delimit their different parts. In data preparation, words are methodically extracted from a URL and compared against a selection of online sites and a randomly selected sample. The main goals of this procedure are finding words that closely match brand names, identifying important keywords in the URL, and identifying random sequences of letters. The accuracy with which these words are identified is essential for classifying URLs effectively, because it serves as the foundation for distinguishing between authentic and possibly harmful web pages [9].
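The word-level checks described here can be sketched as follows. This is a simplified illustration under stated assumptions: the brand list, the similarity cut-off of 0.8, and the vowel-share heuristic for spotting random strings are all hypothetical choices, not the procedure used in [9].

```python
import re
from difflib import SequenceMatcher

# Hypothetical brand list; a real system would use a much larger one.
KNOWN_BRANDS = ["paypal", "google", "amazon"]


def tokenize_url(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def closest_brand(token, min_ratio=0.8):
    """Return the brand most similar to token, or None if none is close."""
    best, best_ratio = None, 0.0
    for brand in KNOWN_BRANDS:
        ratio = SequenceMatcher(None, token, brand).ratio()
        if ratio > best_ratio:
            best, best_ratio = brand, ratio
    return best if best_ratio >= min_ratio else None


def looks_random(token, min_length=8, max_vowel_share=0.2):
    """Crude heuristic: long tokens with very few vowels look random."""
    letters = [ch for ch in token if ch.isalpha()]
    if len(token) < min_length or not letters:
        return False
    vowels = sum(ch in "aeiou" for ch in letters)
    return vowels / len(letters) <= max_vowel_share


tokens = tokenize_url("http://paypa1.com/login/xkqzrtplmn")
for token in tokens:
    print(token, closest_brand(token), looks_random(token))
```

Here the token "paypa1" is flagged as a near match for a known brand, and the trailing path segment is flagged as a random-looking string. Both signals could then be passed on as inputs to the model.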
Training the Model: Training is the essential step in building a deep learning model, and it requires choosing an appropriate optimization algorithm [10]. The training phase is where the model learns from the data prepared in the previous steps. The large volume of content in the datasets serves as the training ground for the deep learning model and plays a major role in how well it can predict and detect phishing attacks during the testing phase.
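To make the role of the optimization algorithm concrete, the sketch below trains a single sigmoid neuron with plain stochastic gradient descent. It is a deliberately tiny stand-in for a deep network, not the training setup of [10]: the toy features, labels, learning rate, and epoch count are all made-up assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=200, learning_rate=0.5, seed=0):
    """Fit a single sigmoid neuron with stochastic gradient descent."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new order each epoch
        for i in order:
            z = sum(w * x for w, x in zip(weights, samples[i])) + bias
            error = sigmoid(z) - labels[i]  # gradient of log loss w.r.t. z
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, samples[i])]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, sample):
    z = sum(w * x for w, x in zip(weights, sample)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy, made-up features: [url_length / 100, has_at_symbol]; label 1 = phishing.
X = [[0.20, 0], [0.25, 0], [0.30, 0], [0.80, 1], [0.90, 0], [0.95, 1]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train(X, y)
print([predict(weights, bias, sample) for sample in X])
```

A real DL model has many layers and typically uses a library optimizer, but the loop has the same shape: predict, measure the error, and nudge the weights in the direction that reduces it.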

Challenges and Considerations
Challenges in developing a robust DL method are inevitable, especially given the scarcity of publicly available datasets. Although a number of datasets can be accessed today, one problem remains hard to solve: dataset imbalance.
Since the model is trained on the dataset, acquiring a high-quality dataset should be the first priority in building an efficient and reliable model, as it greatly influences how well the model distinguishes phishing from legitimate samples. In a dataset of phishing websites, the majority of instances are often genuine URLs. Training neural networks on such datasets can therefore result in uneven accuracy, in which the class with the larger proportion of instances achieves excellent performance while the class with the minority of training examples achieves poor performance. This tendency restricts the model’s ability to generalize and impacts its performance as a whole [11]. Fortunately, techniques exist to balance the instances in a dataset; two common ones are the Synthetic Minority Over-sampling Technique (SMOTE) and Generative Adversarial Networks (GANs).
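The core idea behind SMOTE, creating new minority samples by interpolating between existing ones, can be sketched in plain Python. This is a simplified illustration rather than the full algorithm or the approach of [11]: the function name smote_like_oversample, the neighbour count, and the feature vectors are made-up assumptions.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the line segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Made-up feature vectors: 6 legitimate URLs vs. 3 phishing URLs.
legitimate = [[0.20, 0.1], [0.25, 0.2], [0.30, 0.1],
              [0.22, 0.3], [0.28, 0.2], [0.35, 0.1]]
phishing = [[0.80, 0.9], [0.90, 0.8], [0.95, 0.7]]

extra = smote_like_oversample(phishing, n_new=len(legitimate) - len(phishing))
balanced_phishing = phishing + extra
print(len(legitimate), len(balanced_phishing))
```

After oversampling, both classes contribute the same number of training examples. In practice a maintained implementation, such as the SMOTE class in the imbalanced-learn library, would normally be preferred over a hand-written version like this one.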
Another difficulty is avoiding overfitting, a crucial training problem that occurs when an algorithm fits its training data too closely, leaving the trained model unable to produce reliable predictions or inferences on new data [12]. Future research must treat this problem as a high priority, as it greatly affects how accurately the model detects cyber threats such as phishing.
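One widely used safeguard against overfitting, which the article itself does not cover, is early stopping: training halts once the loss on held-out validation data stops improving. The sketch below shows the idea on made-up loss curves; the function name early_stopping_epoch and the patience value are illustrative assumptions.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose weights should be kept:
    stop once validation loss has not improved for `patience` epochs."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs in a row
    return best_epoch


# Made-up loss curves: training loss keeps falling, validation loss turns upward.
train_losses = [0.70, 0.50, 0.35, 0.25, 0.18, 0.12, 0.08]
val_losses = [0.72, 0.55, 0.45, 0.42, 0.44, 0.47, 0.52]

stop_at = early_stopping_epoch(val_losses)
print(stop_at, val_losses[stop_at])
```

The widening gap between the two curves after the selected epoch is the typical signature of overfitting: the model keeps improving on data it has seen while getting worse on data it has not.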
Conclusion
The growing threat of phishing attacks, which use social engineering techniques to fool victims, has drawn attention. The emergence of DL models has transformed how cyber threats like phishing are detected, enabling an advanced approach that is robust and efficient. However, challenges in developing these models, such as data imbalance, are inevitable problems that need to be solved; balancing techniques like SMOTE and GAN have helped to address this issue. Future research should handle these challenges carefully, as they can have a huge impact on how the model performs.
References
- Zhu, E., Yuan, Q., Chen, Z., Li, X., & Fang, X. (2023). CCBLA: a lightweight phishing detection model based on CNN, BiLSTM, and attention mechanism. Cognitive Computation, 15(4), 1320-1333.
- Hussain, M., Cheng, C., Xu, R., & Afzal, M. (2023). CNN-Fusion: An effective and lightweight phishing detection method based on multi-variant ConvNet. Information Sciences, 631, 328–345. https://doi.org/10.1016/j.ins.2023.02.039
- Sahingoz, O. K., Buber, E., & Kugu, E. (2024). Dephides: Deep learning based phishing detection system. IEEE Access, 12, 8052-8070.
- Aldakheel, E. A., Zakariah, M., Gashgari, G. A., Almarshad, F. A., & Alzahrani, A. I. (2023). A Deep learning-based innovative technique for phishing detection in modern security with uniform resource locators. Sensors, 23(9), 4403.
- Rahaman, M., Pappachan, P., Orozco, S. M., Bansal, S., & Arya, V. (2024). AI Safety and Security. In Challenges in Large Language Model Development and AI Ethics (pp. 354-383). IGI Global.
- Deep Learning Based Phishing Detection System (DEPHIDES). Accessed: Dec. 3, 2023. [Online]. Available: https://codeocean.com/capsule/0874584/tree
- Pappachan, P., Adi, N. S., Firmansyah, G., & Rahaman, M. (2025). Deep Learning-Based Forensics and Anti-Forensics. In Digital Forensics and Cyber Crime Investigation (pp. 211-240). CRC Press.
- IBM. (2025, January 27). Neural network. https://www.ibm.com/think/topics/neural-networks
- Kamble, N., & Mishra, N. (2024, January). Securing Cyberspace: Unveiling Phishing Attacks Through Deep Neural Networks for Enhanced Detection. In 2024 International Conference on Advancements in Smart, Secure and Intelligent Computing (ASSIC) (pp. 1-5). IEEE.
- Somesha, M., & Pais, A. R. (2024). DeepEPhishNet: a deep learning framework for email phishing detection using word embedding algorithms. Sādhanā, 49(3), 212.
- Said, Y., Alsheikhy, A. A., Lahza, H., & Shawly, T. (2024). Detecting phishing websites through improving convolutional neural networks with Self-Attention mechanism. Ain Shams Engineering Journal, 15(4), 102643. https://doi.org/10.1016/j.asej.2024.102643
- Mutasa, S., Sun, S., & Ha, R. (2020). Understanding artificial intelligence based radiology studies: What is overfitting?. Clinical imaging, 65, 96-99.
- Sedik, A., Hammad, M., Abd El-Samie, F. E., Gupta, B. B., & Abd El-Latif, A. A. (2022). Efficient deep learning approach for augmented detection of Coronavirus disease. Neural Computing and Applications, 1-18.
- Mirsadeghi, F., Rafsanjani, M. K., & Gupta, B. B. (2021). A trust infrastructure based authentication method for clustered vehicular ad hoc networks. Peer-to-Peer Networking and Applications, 14, 2537-2553.
- Shaik D.A. (2024) AI and Machine Learning in Cloud Security, Insights2Techinfo, pp.1
Cite As
Cajes N. (2025) Detecting Phishing Scam Using Deep Learning: A Cutting-Edge Approach, Insights2Techinfo, pp.1