Leveraging Logistic Regression for Phishing Threat Identification

By: KUKUTLA TEJONATH REDDY, International Center for AI and Cyber Security Research and Innovations (CCRI), Asia University, Taiwan, tejonath45@gmail.com


Individuals and organisations are continually threatened by sophisticated phishing attacks, so cybersecurity defences need continuous improvement. Logistic regression is an efficient statistical method for binary classification, which makes it well suited to separating phishing patterns from legitimate activity. This article looks at how logistic regression can help detect phishing attacks, which have become pervasive over time, and discusses the method's advantages and drawbacks.


Modern phishing attacks have advanced greatly, making them a serious threat to any individual or organization. As cybercrime grows more sophisticated, it calls for large-scale, advanced security controls. Logistic regression, a popular statistical approach in data science, can help reveal increasingly advanced and targeted phishing. This article examines how logistic regression is applied to phishing detection to strengthen cybersecurity defences.

Understanding Logistic Regression:

Logistic regression is a statistical tool for solving binary classification problems. For phishing detection, the goal is to classify emails or web pages into two categories: legitimate or phishing [2]. Unlike linear regression, which predicts a continuous outcome, logistic regression passes a weighted sum of the input features through the logistic (sigmoid) function, which converts the output to a value between zero and one that denotes the probability of membership in the positive class (here, phishing) [3].

Figure 1: Working of Logistic Regression
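The mapping from a weighted feature sum to a probability can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: the two features, their weights, and the 0.5 decision threshold are assumptions chosen for the example.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability between 0 and 1."""
    # Numerically stable form: never calls exp() on a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, features):
    """Probability that one sample is phishing: sigmoid(w . x + b)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights for two made-up features (illustration only).
p = predict_proba([1.5, -2.0], 0.0, [1.0, 0.0])
label = "phishing" if p >= 0.5 else "legitimate"
```

A score of zero maps to a probability of exactly 0.5, so a 0.5 threshold corresponds to the sign of the weighted sum.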

Feature Selection and Extraction:

Successful phishing detection with logistic regression depends on the features on which the model is based. These features are extracted from the raw data, such as URLs and email messages. Several factors are relevant for phishing detection, such as URL structure, domain age, availability of emails, IP addresses, and email content. Careful feature selection and extraction is crucial for improving the model's performance and curbing overfitting.
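As a rough sketch of what URL-based feature extraction might look like, the function below turns a URL string into a small dictionary of numeric features using only the Python standard library. The feature names and the particular choice of features are assumptions made for illustration; domain age and email content would need external lookups or text processing that are not shown here.

```python
import re
from urllib.parse import urlparse

# Matches a dotted-quad host such as 192.168.10.5 (no range check on octets).
IPV4_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")

def extract_url_features(url: str) -> dict:
    """Return a few simple lexical features of a URL (hypothetical feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_ip_host": int(bool(IPV4_PATTERN.fullmatch(host))),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
    }

features = extract_url_features("http://192.168.10.5/secure-login/update.html")
```

Each value is numeric, so the dictionary can be converted directly into one row of the model's input matrix.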

Data Preprocessing:

The data must also be preprocessed to improve its quality and suitability for logistic regression. Typical steps include handling missing values, normalizing numerical features, encoding categorical variables, and resampling an imbalanced dataset. A well-preprocessed dataset improves the precision and generalization of the model.
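The four preprocessing steps can each be sketched as a small standard-library function. The specific choices here (mean imputation, min-max scaling, one-hot encoding, random oversampling of the minority class) are assumptions picked as common defaults, and the sample values are made up.

```python
import random
from statistics import mean

def impute_mean(column):
    """Replace None entries with the mean of the observed values.
    Assumes at least one value in the column is not None."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]

def min_max_scale(column):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def one_hot(column):
    """Encode a categorical column as 0/1 indicator vectors."""
    categories = sorted(set(column))
    return [[int(v == c) for c in categories] for v in column], categories

def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels

# Made-up example values.
domain_age_days = [3650, None, 12, 30]
scaled_age = min_max_scale(impute_mean(domain_age_days))
tld_onehot, tld_categories = one_hot(["com", "xyz", "com", "top"])
rows, labels = oversample_minority([[0.1], [0.2], [0.9]], [0, 0, 1])
```

In practice the imputation mean, the scaling range, and any resampling would be computed on the training split only, so that no information from the test data leaks into the model.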

Training the Logistic Regression Model:

Once the dataset is prepared, the next step is to train the logistic regression model. During training, the algorithm learns a weight for every feature so that its predictions match the labelled examples as closely as possible. Cross-validation is commonly used to check that the model's accuracy holds on data it was not trained on.
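One way to picture the training step is batch gradient descent on the log loss, sketched below in plain Python. The article does not say which optimizer is used, so this is an assumption; the learning rate, epoch count, toy dataset, and the contiguous k-fold splitter are likewise illustrative.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, xi)))
            error = p - yi  # derivative of the log loss w.r.t. the score
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias

def predict(weights, bias, xi, threshold=0.5):
    """Return 1 (phishing) if the predicted probability reaches the threshold."""
    p = sigmoid(bias + sum(w * x for w, x in zip(weights, xi)))
    return int(p >= threshold)

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds.
    Shuffle the data beforehand if it is ordered by class."""
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        val = list(range(start, start + size))
        train = [j for j in range(n_samples) if j < start or j >= start + size]
        yield train, val
        start += size

# Toy data (made up): [has_ip_host, uses_https]; label 1 = phishing.
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic_regression(X, y)
folds = list(k_fold_indices(len(X), 3))
```

For cross-validation, the model would be retrained on each `train` index set and scored on the matching `val` set, and the scores averaged across folds.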

Evaluating Model Performance:

Finally, the performance of the trained model must be assessed on an independent test dataset. Common metrics for binary classification are accuracy, precision, recall, and F1-score. These metrics show how accurately the model distinguishes phishing from legitimate activity.
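The four metrics follow directly from the counts of true and false positives and negatives. The sketch below computes them for a made-up set of labels and predictions, treating phishing (label 1) as the positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Precision reflects how many flagged items were truly phishing, while recall reflects how many phishing items were caught; on imbalanced data these are usually more informative than accuracy alone.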

Benefits of Logistic Regression in Phishing Detection:

Interpretability: Logistic regression makes the effect of each feature on the classification result clear. This matters because it is vital to understand what each indicator contributes when identifying a phishing attack.

Efficiency: Logistic regression is computationally efficient and thus suitable for real-time or near-real-time phishing detection systems.

Adaptation: The model is easy to update, so newly emerging phishing techniques can be incorporated quickly [6] [4].
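Two of these benefits can be made concrete with a short sketch. For interpretation, the exponential of a fitted weight is the factor by which the odds of phishing change when that feature increases by one unit, with the other features held fixed. For adaptation, one possible approach (an assumption, not something the article specifies) is to nudge the existing weights with a stochastic gradient step whenever a newly labelled example arrives. The feature names, weights, and learning rate below are hypothetical.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def score(weights, bias, xi):
    """Predicted probability of phishing for one sample."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, xi)))

def odds_ratios(feature_names, weights):
    """exp(weight): multiplicative change in the odds per unit of the feature."""
    return {name: math.exp(w) for name, w in zip(feature_names, weights)}

def online_update(weights, bias, xi, yi, lr=0.1):
    """One stochastic gradient step on the log loss for a single new example."""
    error = score(weights, bias, xi) - yi
    new_weights = [w - lr * error * x for w, x in zip(weights, xi)]
    return new_weights, bias - lr * error

# Hypothetical fitted model (illustration only).
feature_names = ["has_ip_host", "uses_https"]
weights, bias = [1.2, -0.7], -0.3
ratios = odds_ratios(feature_names, weights)

# A new kind of phishing page: served over HTTPS, no IP address in the host.
new_sample = [0.0, 1.0]
p_before = score(weights, bias, new_sample)
for _ in range(100):
    weights, bias = online_update(weights, bias, new_sample, 1)
p_after = score(weights, bias, new_sample)
```

An odds ratio above 1 means the feature pushes a sample towards the phishing class and a ratio below 1 pushes it away. Repeated updates on a single example, as in the loop above, are only for demonstration; a real system would mix new examples with existing data to avoid forgetting older patterns.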


Logistic regression has proven to be a valuable asset in phishing detection due to its simplicity and interpretability. By integrating this technique into an advanced cybersecurity strategy, organizations can enhance their security measures and stay one step ahead of cyber threats. As phishing attacks continue to evolve, logistic regression will play an increasingly important role in strengthening cybersecurity.


  1. Bapat, R., Mandya, A., Liu, X., Abraham, B., Brown, D. E., Kang, H., & Veeraraghavan, M. (2018, April). Identifying malicious botnet traffic using logistic regression. In 2018 systems and information engineering design symposium (SIEDS) (pp. 266-271). IEEE.
  2. Soumya, T. R., Ramesh, P., Rosy, N. A., Pughazendi, N., Padmapriya, S., & Khilar, R. (2022, September). Logistic Regression based Machine Learning Technique for Phishing Website Detection. In 2022 4th International Conference on Inventive Research in Computing Applications (ICIRCA) (pp. 683-686). IEEE.
  3. Chiramdasu, R., Srivastava, G., Bhattacharya, S., Reddy, P. K., & Gadekallu, T. R. (2021, August). Malicious url detection using logistic regression. In 2021 IEEE International Conference on Omni-Layer Intelligent Systems (COINS) (pp. 1-6). IEEE.
  4. Vanitha, N., & Vinodhini, V. (2019). Malicious-URL detection using logistic regression technique. International Journal of Engineering and Management Research, 9(6), 108-113.
  5. Rymarczyk, T., Kozłowski, E., Kłosowski, G., & Niderla, K. (2019). Logistic regression for machine learning in process tomography. Sensors, 19(15), 3400.
  6. Chui, K. T., Gupta, B. B., Jhaveri, R. H., Chi, H. R., Arya, V., Almomani, A., & Nauman, A. (2023). Multiround transfer learning and modified generative adversarial network for lung cancer detection. International Journal of Intelligent Systems, 2023, 1-14.
  7. Ahvanooey, M. T., Zhu, M. X., Li, Q., Mazurczyk, W., Choo, K. K. R., Gupta, B. B., & Conti, M. (2021). Modern authentication schemes in smartphones and IoT devices: An empirical survey. IEEE Internet of Things Journal, 9(10), 7639-7663.
  8. Mishra, A., Gupta, B. B., Peraković, D., Yamaguchi, S., & Hsu, C. H. (2021, January). Entropy based defensive mechanism against DDoS attack in SDN-Cloud enabled online social networks. In 2021 IEEE International Conference on Consumer Electronics (ICCE) (pp. 1-6). IEEE.
  9. Gupta, B. B., & Chaturvedi, C. (2019, July). Software defined networking (SDN) based secure integrated framework against distributed denial of service (DDoS) attack in cloud environment. In 2019 International Conference on Communication and Electronics Systems (ICCES) (pp. 1310-1315). IEEE.

Cite As

REDDY K.T (2023) Leveraging Logistic Regression for Phishing Threat Identification, Insights2Techinfo, pp.1
