Integrating User Behaviour Analytics for Enhanced Phishing Detection

By: Mosiur Rahaman, International Center for AI and Cyber Security Research and Innovations, Asia University, Taiwan

Abstract: Phishing attacks remain a major information security problem, so robust detection methods are needed. This article investigates how User Behaviour Analytics (UBA) can make phishing detection models more accurate and reduce false positives. The approach uses patterns in user interactions to better distinguish legitimate from malicious activity.

Keywords: Phishing Detection, User Behaviour Analytics

Introduction: Phishing remains one of the most common forms of cyberattack because it exploits human weaknesses. Despite improvements in security measures, phishing attacks keep evolving, becoming more sophisticated and harder to spot. Traditional detection methods that rely on static rule-based systems struggle to keep up with attackers’ changing strategies, which leads to many false positives and weak protection. Newer adaptive phishing tactics continually change their appearance and behaviour to evade detection, which makes these older methods less effective [1].

In a phishing attack, the attacker typically pretends to be a legitimate organization in order to manipulate individuals into giving away private data such as login details or financial information. These attacks succeed by exploiting people, which is harder to prevent with technical methods alone. In response, more flexible and adaptable ways of detecting phishing emails have been developed [2].

User Behaviour Analytics (UBA) is a promising way to improve phishing detection because it examines how people interact with systems to find phishing-related patterns. In UBA, data on user behaviour, including login times, email interactions, and web browsing habits, is collected and analysed to establish a baseline of normal activity. Deviations from this baseline can then be flagged as possible threats. While standard methods focus on the content and features of the phishing attempt, UBA looks at how the user acts, adding an extra layer of security.
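The idea of a behavioural baseline can be illustrated with a minimal sketch. It assumes a single feature (login hour), a z-score test, and a threshold of 3; the function names, the sample history, and the threshold are all hypothetical and are not taken from the study. Treating the hour as a linear value is also a simplification.

```python
from statistics import mean, stdev

def build_baseline(login_hours):
    """Summarise a user's historical login hours as (mean, standard deviation)."""
    return mean(login_hours), stdev(login_hours)

def is_anomalous(hour, baseline, threshold=3.0):
    """Flag a login whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        # No variation in the history: anything different counts as a deviation.
        return hour != mu
    return abs(hour - mu) / sigma > threshold

# Hypothetical history of login hours (24-hour clock) for one user.
history = [9, 9, 10, 8, 9, 10, 9, 8]
baseline = build_baseline(history)

print(is_anomalous(9, baseline))  # a typical working-hours login
print(is_anomalous(3, baseline))  # a 03:00 login, far from the baseline
```

A production system would track many features per user and update the baseline over time; the sketch only shows the flagging step.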

This study examines the integration of User Behaviour Analytics into phishing detection models to improve accuracy and lower false positives. The method aims to better distinguish legitimate from malicious activity by using trends in how users interact with computer systems. Improving phishing detection models with UBA involves several steps: data collection, feature engineering, model building, and testing [3].

Methodology:

Data Collection: Data were collected from logs of user activity across different platforms, including email, web, and network logs. For training and testing, this dataset was augmented with known phishing and fraudulent activities.

Feature Engineering:

Login Patterns: How often login attempts occur and when they happen.

Email Interaction: How long users take to respond to and read emails, and how often they click on links in emails.

Browsing Behaviour: Patterns in users’ web browsing habits.

Network Activity: How much data is sent, how often unusual IP addresses are accessed, and which protocols are used.
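The feature groups above could be derived from raw session logs roughly as follows. This is a sketch under stated assumptions: the Event record, its field names, the event kinds, and the chosen aggregates are hypothetical, since the article does not describe its log schema.

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str            # "login", "email_open", "link_click", or "request" (assumed kinds)
    timestamp: float     # seconds since the session started
    bytes_sent: int = 0
    ip_known: bool = True

def extract_features(events):
    """Aggregate one session's events into a flat feature dictionary."""
    logins = [e for e in events if e.kind == "login"]
    opens = [e for e in events if e.kind == "email_open"]
    clicks = [e for e in events if e.kind == "link_click"]
    requests = [e for e in events if e.kind == "request"]
    return {
        "login_count": len(logins),
        "first_login_time": min((e.timestamp for e in logins), default=None),
        "link_clicks_per_open": len(clicks) / len(opens) if opens else 0.0,
        "bytes_sent": sum(e.bytes_sent for e in requests),
        "unknown_ip_requests": sum(1 for e in requests if not e.ip_known),
    }

# Hypothetical session: one login, one opened email, one click, two requests.
session = [
    Event("login", 0.0),
    Event("email_open", 12.0),
    Event("link_click", 15.5),
    Event("request", 16.0, bytes_sent=2048, ip_known=False),
    Event("request", 20.0, bytes_sent=512),
]
features = extract_features(session)
print(features)
```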

Model Integration:

A hybrid model was built that combines UBA with standard phishing detection methods.

The baseline layer is a machine learning model trained on features of URLs, emails, and their contents. The UBA layer uses a recurrent neural network (RNN) to analyse patterns of user behaviour and how they change over time within interaction sequences.
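Before an RNN can process interaction sequences, each session has to be turned into a fixed-length numeric sequence. The sketch below shows one common way to do this, assuming a small hypothetical action vocabulary and right-padding; the article does not state how its sequences were encoded.

```python
# Hypothetical vocabulary of user actions; id 0 is reserved for padding
# and id 1 for actions not seen in the vocabulary.
VOCAB = {"<pad>": 0, "<unk>": 1, "login": 2, "email_open": 3,
         "link_click": 4, "page_view": 5}

def encode_session(actions, max_len=6):
    """Map action names to integer ids, then truncate or right-pad to max_len."""
    ids = [VOCAB.get(action, VOCAB["<unk>"]) for action in actions][:max_len]
    return ids + [VOCAB["<pad>"]] * (max_len - len(ids))

encoded = encode_session(["login", "email_open", "link_click", "download"])
print(encoded)  # [2, 3, 4, 1, 0, 0]
```

The resulting integer sequences would typically be fed to an embedding layer followed by the recurrent layer.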

Training and Evaluation:

Supervised learning was used to train the model on a labelled dataset. Performance was improved through cross-validation and hyperparameter tuning. Precision, recall, and the F1-score were used to measure the improvement in detection accuracy and the reduction in false positives.
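For reference, the three metrics can be computed from confusion-matrix counts as in the sketch below. The counts in the usage example are hypothetical and are not results from this study.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1

# Hypothetical counts: 90 phishing cases caught, 10 false alarms, 10 missed.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```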

Simulation:

The dataset was split into a training set (70%) and a testing set (30%). The training set contained a balanced mix of phishing and legitimate samples. As a baseline model, a Random Forest classifier was trained on conventional features such as URL, text, and metadata. For the UBA layer, an RNN with Long Short-Term Memory (LSTM) units was trained on sequences of user interactions, where each sequence represented one session of user activity. To make the final prediction, a meta-classifier combined the outputs of the baseline model and the UBA layer.
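The final combination step can be sketched as a small logistic model over the two layer outputs. The article does not say which meta-classifier was used, so the logistic form, the weights, the bias, and the threshold below are illustrative placeholders rather than fitted values.

```python
import math

def meta_classifier(p_baseline, p_uba, weights=(2.5, 2.5), bias=-2.5):
    """Combine two phishing probabilities into one score with a logistic model.

    The weights and bias are hypothetical; in practice they would be learned
    from held-out predictions of the baseline model and the UBA layer.
    """
    z = weights[0] * p_baseline + weights[1] * p_uba + bias
    return 1.0 / (1.0 + math.exp(-z))

def predict(p_baseline, p_uba, threshold=0.5):
    """Return True when the combined score reaches the decision threshold."""
    return meta_classifier(p_baseline, p_uba) >= threshold

print(predict(0.9, 0.8))  # both layers find the activity suspicious
print(predict(0.6, 0.1))  # content looks odd, but behaviour looks normal
```

In the second call the behavioural signal overrides a moderately suspicious content score, which is the mechanism by which a behaviour layer could reduce false positives.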

Evaluation:

To make sure the results were reliable, 10-fold cross-validation was used. Model performance was judged by precision, recall, and F1-score. The reduction in false positives was measured by comparing the false-positive rates of the baseline and hybrid models.
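A minimal sketch of how 10-fold cross-validation partitions the data is given below. The helper name, the fixed seed, and the sample count are assumptions for illustration; the article does not describe its tooling.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold

splits = list(k_fold_indices(100, k=10))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

Each sample is used for testing exactly once, and the metrics are averaged over the ten folds.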

Results

The combined model showed substantial improvements in detection performance. Key results are:

  • Compared to the baseline model, the UBA layer cut false positives by 25%.
  • The combined model achieved an F1-score of 0.92, higher than the baseline score of 0.85, as shown in Figure 1.
  • The model can spot new phishing schemes, which suggests that the UBA method is adaptable.

Data Set:

  • The dataset consisted of 1,000,000 user interactions from enterprise settings, including 50,000 known cases of phishing.
  • Data sources included email logs, web server logs, and records of network traffic.

Training:

The dataset was split into 70% for training and 30% for testing. The training set had a balanced mix of phishing and legitimate examples. As the baseline model, a Random Forest classifier was trained on standard features such as URL, text, and metadata.
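One way to keep both classes represented in each subset is a stratified split, sketched below. Whether the study used stratification or some form of resampling is not stated, so this is an assumption; the labels are a scaled-down hypothetical sample with the same 5% phishing share as the dataset (50,000 of 1,000,000).

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test

# Hypothetical labels: 1 = phishing, 0 = legitimate.
labels = [1] * 50 + [0] * 950
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```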

An RNN with Long Short-Term Memory (LSTM) units was trained on the sequences of user interactions. Each sequence represents one session of user activity. To make the final prediction, the outputs of the baseline model and the UBA layer were combined using a meta-classifier.

10-fold cross-validation was performed to make sure that the results were reliable. To measure how well the model worked, precision, recall, and F1-score were calculated. The reduction in false positives was measured as the difference between the false-positive rates of the baseline and hybrid models.
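The false-positive comparison can be made concrete with a short sketch. The counts are hypothetical and chosen only so that the relative reduction comes out at 25%; the article does not state whether its 25% figure is a relative or an absolute change, and a relative reduction is assumed here.

```python
def false_positive_rate(fp, tn):
    """Share of legitimate samples that were wrongly flagged as phishing."""
    return fp / (fp + tn) if (fp + tn) else 0.0

def relative_reduction(baseline_rate, new_rate):
    """Fractional drop of new_rate relative to baseline_rate."""
    return (baseline_rate - new_rate) / baseline_rate

# Hypothetical counts over 1,000 legitimate samples for each model.
baseline_fpr = false_positive_rate(fp=160, tn=840)
hybrid_fpr = false_positive_rate(fp=120, tn=880)
print(round(relative_reduction(baseline_fpr, hybrid_fpr), 2))
```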

Figure 1: Performance of the baseline phishing detection model and the UBA-enhanced model.

Table 1: Comparison with Recent Research

| Study | Methodology | Data Source | Key Findings | F1 Score | False Positive Rate |
|---|---|---|---|---|---|
| Proposed UBA-Enhanced Model | Hybrid (UBA + Traditional) | Email, web, and network logs | 25% reduction in false positives; robust to evolving threats | 0.92 | 11% |
| [3] | Content-based analysis + URL inspection | Email metadata | High accuracy for known phishing; struggles with novel attacks | 0.87 | 18% |
| [4] | Machine learning with feature extraction | Email and URL data | Good performance with static features; high false positives | 0.84 | 20% |
| [5] | Heuristic-based detection | Email content | Fast detection; high false negatives and positives | 0.78 | 20% |
| [6] | Behavioural biometrics | Network and device usage logs | Effective for insider threats; limited phishing detection | 0.80 | 17% |

Conclusion:

Integrating User Behaviour Analytics into phishing detection models improved detection accuracy (an F1-score of 0.92 against a baseline of 0.85) and reduced false positives by 25%. The technique uses patterns in user interactions to offer a flexible and strong defence against phishing attacks. Future studies should investigate real-time integration of UBA-enhanced phishing detection and assess its scalability in large organizational settings. In addition, supplementary data sources, such as social media interactions, could give a more comprehensive picture of user activity.

References:

  1. A. Aleroud and L. Zhou, “Phishing environments, techniques, and countermeasures: A survey,” Computers & Security, vol. 68, pp. 160–196, Jul. 2017, doi: 10.1016/j.cose.2017.04.006.
  2. Z. Alkhalil, C. Hewage, L. Nawaf, and I. Khan, “Phishing Attacks: A Recent Comprehensive Study and a New Anatomy,” Front. Comput. Sci., vol. 3, Mar. 2021, doi: 10.3389/fcomp.2021.563060.
  3. L. Gallo, D. Gentile, S. Ruggiero, A. Botta, and G. Ventre, “The human factor in phishing: Collecting and analyzing user behavior when reading emails,” Computers & Security, vol. 139, p. 103671, Apr. 2024, doi: 10.1016/j.cose.2023.103671.
  4. S. Mohmmed et al., “A Two-Stage Hybrid Approach for Phishing Attack Detection Using URL and Content Analysis in IoT,” presented at the BIO Web of Conferences, Apr. 2024. doi: 10.1051/bioconf/20249700059.
  5. S. Dangwal and A.-N. Moldovan, “Feature Selection for Machine Learning-based Phishing Websites Detection,” in 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), Jun. 2021, pp. 1–6. doi: 10.1109/CyberSA52016.2021.9478242.
  6. C. M. R. da Silva, E. L. Feitosa, and V. C. Garcia, “Heuristic-based strategy for Phishing prediction: A survey of URL-based approach,” Computers & Security, vol. 88, p. 101613, Jan. 2020, doi: 10.1016/j.cose.2019.101613.
  7. R. Nasir, M. Afzal, R. Latif, and W. Iqbal, “Behavioral Based Insider Threat Detection Using Deep Learning,” IEEE Access, vol. PP, pp. 1–1, Oct. 2021, doi: 10.1109/ACCESS.2021.3118297.
  8. A. A. Abd El-Latif et al., Eds., Artificial Intelligence for Biometrics and Cybersecurity: Technology and Applications. IET, 2023.
  9. A. Almomani et al., “Phishing website detection with semantic features based on machine learning classifiers: a comparative study,” International Journal on Semantic Web and Information Systems (IJSWIS), vol. 18, no. 1, pp. 1–24, 2022.
  10. A. K. Jain and B. B. Gupta, “A survey of phishing attack techniques, defence mechanisms and open research challenges,” Enterprise Information Systems, vol. 16, no. 4, pp. 527–565, 2022.

Cite As

Rahaman M (2024) Integrating User Behaviour Analytics for Enhanced Phishing Detection, Insights2Techinfo, pp.1
