By: Mosiur Rahaman, International Center for AI and Cyber Security Research and Innovations, Asia University, Taiwan
Abstract:
Phishing attacks are a major danger to anyone who uses the internet, so strong and flexible detection systems are needed to reduce these risks effectively. Conventional phishing detection systems often struggle to keep up with the rapidly changing strategies used by attackers. This research investigates adaptive phishing detection systems that use online learning techniques to continuously update their models with new data. By integrating online learning methods, these systems can improve their ability to detect new phishing attacks quickly and accurately. We provide a design for a dynamic phishing detection system, analyse each of its components, and assess its effectiveness using actual data from real-world scenarios.
Keywords: Phishing attack, Online Learning, Feature Extraction
Introduction:
Phishing is a widespread cyber threat that tricks users into revealing confidential information through deceptive emails, websites, and text messages. Conventional phishing detection systems depend on predetermined rules and static machine learning models that are updated only at regular intervals [1]. However, the ever-changing nature of phishing attempts requires a more flexible approach. Online learning offers a promising answer by enabling models to learn continually from fresh data, which improves their detection ability over time. This paper examines adaptive phishing detection systems that use online learning methods, with emphasis on their benefits, difficulties, and implementation approaches [2].
Proposed Method:
The proposed adaptive phishing detection system uses online learning techniques to keep the model continuously updated and to improve detection accuracy. The system architecture consists of four primary components: data collection, feature extraction, online learning, and model evaluation with a feedback loop.
1. Data Collection: The system continuously acquires data from multiple sources, such as email servers, web traffic, and user reports. This dataset includes both legitimate and phishing instances, offering a varied collection of data for training and assessing models.
2. Feature Extraction: The acquired data is analysed to identify and extract the specific attributes that distinguish phishing communications from authentic ones. These features may include URL patterns, email headers, text analysis, and network behaviour indicators. Feature extraction is essential to ensure that the online learning model can accurately distinguish between phishing and authentic cases [3].
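As a rough illustration of the URL-pattern features mentioned above, the following sketch turns a single URL into a small numeric feature dictionary. The function name `extract_url_features` and the particular features chosen are assumptions made for this example; the article does not specify a feature set, and a real system would also draw on email headers, message text, and network indicators.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Map one URL to a few illustrative numeric features (hypothetical set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Very long URLs are sometimes used to hide the real destination.
        "url_length": len(url),
        # Many dots can indicate deeply nested, deceptive subdomains.
        "num_dots_in_host": host.count("."),
        # A raw IPv4 address instead of a domain name is a common warning sign.
        "has_ip_host": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        # An "@" in a URL can disguise the host that is actually contacted.
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "num_hyphens_in_host": host.count("-"),
    }


features = extract_url_features("http://192.168.0.1/secure-login/update.php")
print(features)
```

Each value is numeric, so the dictionary can be converted directly into a feature vector for an incremental learner.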
3. Online Learning: The core of the adaptive system is its online learning component. Our approach uses an incremental learning technique that continuously adjusts the model parameters as new data instances arrive. This enables the system to adapt to emerging phishing tactics and trends in real time. Well-known online learning methods, such as stochastic gradient descent (SGD) and online support vector machines (SVM), are well suited to this purpose.
To illustrate how online learning works in the proposed system, consider the following example:
- Initial Model Training: The model is first trained on a dataset of labelled phishing and authentic emails. The purpose of this step is to establish a baseline model capable of detecting prevalent phishing patterns.
- Incremental Updates: As additional emails are received, the system classifies each email as either phishing or legitimate using the initial model. However, phishing techniques change over time, and the baseline model may fail to identify new patterns. This is where online learning becomes relevant.
- Stochastic Gradient Descent (SGD): SGD is an optimization technique used to train machine learning models incrementally. Upon receiving a new email, the model’s parameters are adjusted to reduce the classification error. The update rule for the parameters $\theta$ in SGD is:
$$\theta \leftarrow \theta - \eta \, \nabla_{\theta} L(\theta; x_i)$$
where $\eta$ is the learning rate, $\nabla_{\theta} L(\theta; x_i)$ is the gradient of the loss function $L$ with respect to the parameters $\theta$, and $x_i$ is the new email instance.
During real-time adaptation, the model parameters are updated for each new email, enabling the system to learn and adjust to novel phishing techniques as they appear. This ongoing updating improves the model’s capacity to identify previously unseen phishing patterns and helps maintain accurate detection over time.
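The per-email SGD update described above can be sketched with a small logistic regression model that is trained one example at a time. This is a minimal sketch under stated assumptions, not the article's implementation: the class `OnlineLogisticRegression`, the log-loss objective, the learning rate of 0.5, the three binary features, and the synthetic stream are all chosen for illustration. It also assumes that a confirmed label for each email becomes available after classification (for example from a user report).

```python
import math


class OnlineLogisticRegression:
    """Logistic regression trained one example at a time with SGD."""

    def __init__(self, n_features, learning_rate=0.5):
        self.theta = [0.0] * n_features  # model parameters (weights)
        self.bias = 0.0
        self.eta = learning_rate

    def predict_proba(self, x):
        """Return the estimated probability that x is phishing."""
        z = self.bias + sum(w * xi for w, xi in zip(self.theta, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One SGD step on the log loss for a single labelled email."""
        error = self.predict_proba(x) - y  # derivative of the log loss w.r.t. z
        self.theta = [w - self.eta * error * xi for w, xi in zip(self.theta, x)]
        self.bias -= self.eta * error


# Hypothetical feature order: [has_ip_host, has_at_symbol, uses_https].
# Label 1 = phishing, 0 = legitimate. Synthetic data, for illustration only.
stream = [
    ([1, 0, 0], 1),
    ([0, 1, 0], 1),
    ([0, 0, 1], 0),
    ([0, 0, 1], 0),
] * 50

model = OnlineLogisticRegression(n_features=3)
mistakes = 0
for x, y in stream:
    predicted = int(model.predict_proba(x) >= 0.5)  # classify first
    mistakes += int(predicted != y)
    model.update(x, y)  # then learn from the confirmed label

p_phish = model.predict_proba([1, 1, 0])
p_legit = model.predict_proba([0, 0, 1])
print(f"mistakes while streaming: {mistakes}")
print(f"P(phishing) for suspicious email: {p_phish:.3f}")
print(f"P(phishing) for benign email: {p_legit:.3f}")
```

Classifying each email before updating on it mirrors the "predict, then learn" flow of the incremental updates: the model is always evaluated on data it has not yet seen, and every confirmed label nudges the parameters by one gradient step.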
4. Model Evaluation and Feedback Loop: The model’s effectiveness is continuously assessed using metrics such as accuracy, precision, recall, and F1-score. A feedback loop identifies misclassified cases and feeds them back into the training process, improving the accuracy of the model [4-7]. Figure 1 shows the Adaptive Phishing Detection System framework.
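The evaluation metrics named above can be computed directly from the counts of true and false positives and negatives. The sketch below assumes binary labels (1 = phishing, 0 = legitimate); the helper names `evaluate` and `misclassified_indices` and the sample labels are hypothetical, introduced only to show how misclassified cases might be singled out for the feedback loop.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def misclassified_indices(y_true, y_pred):
    """Return positions where the prediction disagrees with the true label."""
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]


# Synthetic labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

metrics = evaluate(y_true, y_pred)
feedback_queue = misclassified_indices(y_true, y_pred)
print(metrics)
print("indices to feed back into training:", feedback_queue)
```

In this toy batch, one phishing email is missed and one legitimate email is flagged, so indices 1 and 4 would be queued for another training update.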
Conclusion:
Adaptive phishing detection systems that apply online learning are a major development in the fight against phishing. By continually updating their detection models with fresh data, these systems can react quickly to evolving phishing strategies, improving both accuracy and responsiveness. The proposed approach highlights the importance of data collection, feature extraction, and incremental learning in building strong adaptive detection systems. Future studies should concentrate on improving online learning algorithms and investigating how they interact with other cybersecurity measures to provide complete defences against phishing risks.
References:
[1] B. B. Gupta, N. Arachchilage, and K. Psannis, “Defending against Phishing Attacks: Taxonomy of Methods, Current Issues and Future Directions,” Telecommunication Systems, vol. 67, Feb. 2018, doi: 10.1007/s11235-017-0334-z.
[2] R. J. van Geest, G. Cascavilla, J. Hulstijn, and N. Zannone, “The applicability of a hybrid framework for automated phishing detection,” Computers & Security, vol. 139, p. 103736, Apr. 2024, doi: 10.1016/j.cose.2024.103736.
[3] L. Gallo, D. Gentile, S. Ruggiero, A. Botta, and G. Ventre, “The human factor in phishing: Collecting and analyzing user behavior when reading emails,” Computers & Security, vol. 139, p. 103671, Apr. 2024, doi: 10.1016/j.cose.2023.103671.
[4] S. A. Hicks et al., “On evaluation metrics for medical applications of artificial intelligence,” Sci Rep, vol. 12, p. 5979, Apr. 2022, doi: 10.1038/s41598-022-09954-8.
[5] A. K. Jain et al., “A survey of phishing attack techniques, defence mechanisms and open research challenges,” Enterprise Information Systems, vol. 16, no. 4, pp. 527-565, 2022.
[6] S. R. Sahoo et al., “Behavioral analysis to detect social spammer in online social networks (OSNs),” in Computational Data and Social Networks: 9th International Conference, CSoNet 2020, Dallas, TX, USA, December 11–13, 2020, Proceedings 9, Springer International Publishing, 2020, pp. 321-332.
[7] A. K. Jain et al., “Two-level authentication approach to protect from phishing attacks in real time,” Journal of Ambient Intelligence and Humanized Computing, vol. 9, no. 6, pp. 1783-1796, 2018.
Cite As
Rahaman M. (2024) Adaptive Phishing Detection Systems Using Online Learning Methods, Insights2Techinfo, pp.1