By: KUKUTLA TEJONATH REDDY, International Center for AI and Cyber Security Research and Innovations (CCRI), Asia University, Taiwan, tejonath45@gmail.com
Abstract:
Phishing attacks are an ever-present threat in today’s digital landscape, requiring sophisticated detection techniques. This article examines the application of the k-Nearest Neighbors (k-NN) machine learning algorithm in phishing detection. Using the similarity between instances, k-NN classifies web pages based on features such as URL length, HTTPS usage, and special characters. The algorithm’s flexibility, real-time capability, and interpretability make it a valuable tool for cybersecurity. This article explores the application of k-NN in detail, discussing feature selection, data preprocessing, distance metrics, and the critical choice of ‘k’. While emphasizing benefits such as adaptability and real-time detection, the article also addresses challenges such as scalability and the importance of robust feature engineering. By understanding and tuning k-NN parameters, cybersecurity professionals can improve their ability to identify and mitigate phishing threats and help create more secure digital environments.
Introduction:
Phishing attacks continue to be a common threat in the digital landscape, targeting individuals and organizations. As cybercriminals become increasingly sophisticated, the need for robust and effective phishing detection methods has never been greater. In this article, we explore the use of k-Nearest Neighbors (k-NN), a machine learning technique, in phishing detection [1].
Understanding k-Nearest Neighbors:
k-Nearest Neighbors is a supervised machine learning algorithm for classification and regression tasks. For phishing detection, it relies on the principle that similar instances or data points tend to belong to the same category. In k-NN, ‘k’ refers to the number of nearest neighbors considered during prediction [2][3].
How k-NN Works in Phishing Detection:
Feature Selection:
- Phishing detection involves analysing a website’s features or attributes to assess its legitimacy.
- Attributes can include URL length, HTTPS usage, the number of dots in the URL, and the use of special characters.
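The attributes listed above can be turned into a numeric feature vector in a few lines. The sketch below is illustrative only: the function name `extract_features`, the dot count, and the set of characters treated as "special" are assumptions made for this example, not a standard feature set.

```python
from urllib.parse import urlparse

# Characters counted as "special" here are an illustrative choice, not a standard list.
SPECIAL_CHARS = set("@-_%=&?~")


def extract_features(url: str) -> list[float]:
    """Turn a URL into a small numeric feature vector (hypothetical feature set)."""
    parsed = urlparse(url)
    return [
        float(len(url)),                                # URL length
        1.0 if parsed.scheme == "https" else 0.0,       # HTTPS used or not
        float(url.count(".")),                          # number of dots
        float(sum(ch in SPECIAL_CHARS for ch in url)),  # special characters
    ]


print(extract_features("https://example.com/login"))
# → [25.0, 1.0, 1.0, 0.0]
```

A production system would typically use many more features, but a short fixed-length vector like this is enough for k-NN to compute distances between URLs.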
Data Preprocessing:
- Training the k-NN model requires a dataset with labelled examples of phishing and legitimate websites.
- To evaluate the model’s performance, the dataset is divided into training and testing sets.
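A minimal sketch of the split described above, using only the Python standard library. The helper name `train_test_split`, the 80/20 ratio, the fixed seed, and the toy data are assumptions for illustration.

```python
import random


def train_test_split(samples, labels, test_ratio=0.2, seed=42):
    """Shuffle labelled data and split it into training and testing sets."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_test = max(1, int(len(indices) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Hypothetical toy dataset: ten feature vectors with alternating labels.
samples = [[float(i), float(i % 2)] for i in range(10)]
labels = ["phishing" if i % 2 else "legitimate" for i in range(10)]

X_train, X_test, y_train, y_test = train_test_split(samples, labels)
print(len(X_train), len(X_test))
# → 8 2
```

The test set is held back and used only to measure how well the classifier generalizes to websites it has not seen.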
Distance Metric:
- k-NN relies on distance measures (e.g., Euclidean distance) to measure similarity between data points.
- The algorithm calculates the distance between the feature vector of a new instance and the feature vectors of the labelled instances in the dataset.
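For two feature vectors x and y, the Euclidean distance is the square root of the sum of squared differences between their components. The sketch below computes it with `math.dist` and also shows why features are commonly rescaled first: a feature with a large numeric range, such as URL length, would otherwise dominate the result. The feature vectors and the `min_max_scale` helper are hypothetical.

```python
import math


def min_max_scale(rows):
    """Rescale every feature column to [0, 1] so no single feature dominates."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical feature vectors: [url_length, uses_https, dot_count, special_chars]
a = [25.0, 1.0, 1.0, 0.0]
b = [30.0, 0.0, 1.0, 0.0]
c = [75.0, 1.0, 4.0, 5.0]

raw_ab = math.dist(a, b)  # dominated by the 5-character length difference
raw_ac = math.dist(a, c)

sa, sb, sc = min_max_scale([a, b, c])
scaled_ab = math.dist(sa, sb)  # the HTTPS difference now carries real weight
scaled_ac = math.dist(sa, sc)

print(round(raw_ab, 3), round(raw_ac, 3))
print(round(scaled_ab, 3), round(scaled_ac, 3))
```

Before scaling, the HTTPS mismatch between `a` and `b` contributes almost nothing to their distance; after scaling it is the largest component.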
Choosing ‘k’:
- The value of ‘k’ determines the number of nearest neighbors considered for classification.
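- The optimal ‘k’ value is usually determined through cross-validation, which helps avoid overfitting (‘k’ too small, so predictions follow noise) and underfitting (‘k’ too large, so distinct cases are blurred together).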
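One way to pick ‘k’ by cross-validation is sketched below in plain Python. The fold scheme, the candidate values, and the toy data (two well-separated clusters plus one deliberately mislabelled point) are assumptions chosen to keep the example small; they are not taken from a real phishing dataset.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority label among the k training points nearest to `query`."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, folds=5):
    """Average accuracy when each fold is held out once as the test set."""
    correct = 0
    for fold in range(folds):
        test_idx = set(range(fold, len(X), folds))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [lab for i, lab in enumerate(y) if i not in test_idx]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    return correct / len(X)


def choose_k(X, y, candidates=(1, 3, 5, 7), folds=5):
    """Return the candidate k with the best cross-validated accuracy."""
    scores = {k: cross_val_accuracy(X, y, k, folds) for k in candidates}
    best_k = max(scores, key=scores.get)  # ties go to the smallest candidate
    return best_k, scores


# Hypothetical scaled features: two clusters, interleaved so folds stay balanced.
legit = [[0.10 + 0.02 * i, 0.20 + 0.01 * i] for i in range(10)]
phish = [[0.70 + 0.02 * i, 0.80 + 0.01 * i] for i in range(10)]
X = [point for pair in zip(legit, phish) for point in pair]
y = ["legitimate", "phishing"] * 10

# One noisy label inside the phishing cluster, to show the effect of small k.
X.append([0.75, 0.83])
y.append("legitimate")

best_k, scores = choose_k(X, y)
print(best_k, scores)
```

On this toy data, k=1 is misled by the single noisy point, while larger odd values outvote it. Odd candidates are used so that a two-class vote cannot tie.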
Decision Rules:
- The majority class among the k nearest neighbors is assigned to the new instance, determining its classification as phishing or legitimate.
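The decision rule can be sketched as a simple majority vote. The function name `classify` and the training data below are hypothetical; the feature values are assumed to be already scaled to [0, 1].

```python
import math
from collections import Counter


def classify(train_X, train_y, query, k=3):
    """Return the majority label among the k nearest neighbors, plus the vote counts."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    label, _ = votes.most_common(1)[0]
    return label, dict(votes)


# Hypothetical, already-scaled feature vectors: [url_length, uses_https, dot_count]
train_X = [
    [0.10, 1.0, 0.0],
    [0.15, 1.0, 0.2],
    [0.20, 1.0, 0.0],
    [0.80, 0.0, 0.8],
    [0.90, 0.0, 1.0],
    [0.85, 1.0, 0.6],
]
train_y = ["legitimate", "legitimate", "legitimate",
           "phishing", "phishing", "phishing"]

label, votes = classify(train_X, train_y, [0.75, 0.0, 0.7], k=3)
print(label, votes)
# → phishing {'phishing': 3}
```

Returning the vote counts alongside the label lets an analyst see how decisive the neighborhood was for a given prediction.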
Advantages of k-NN in phishing detection:
Flexibility:
k-NN adapts to different feature sets and can handle both numerical and categorical data (the latter once encoded or paired with a suitable distance measure), making it versatile for phishing detection.
Real-time detection:
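Because k-NN has no separate training phase, newly labelled examples can be used immediately, and with a moderately sized dataset predictions are fast enough to support real-time phishing detection, which is important in combating immediate threats.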
Interpretability:
k-NN’s decision-making process is transparent: a prediction can be explained by pointing to the labelled examples closest to it. This makes results easy to understand and interpret, which is important for cybersecurity professionals.
Challenges and Considerations:
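Scalability:
k-NN may face scalability challenges with large datasets, because every prediction compares the new instance against the stored training examples, so memory use and prediction time grow with the size of the dataset.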
Feature Engineering:
The effectiveness of k-NN depends largely on the choice and engineering of features, which requires a thorough understanding of phishing characteristics.
Conclusion:
As phishing attacks become more sophisticated, the use of advanced technologies such as machine learning will become increasingly important to enhance detection. k-Nearest Neighbors, simple and flexible, is proving to be a valuable tool in the cybersecurity arsenal. By understanding its principles and tuning its parameters, security professionals can enhance their ability to identify and prevent phishing threats, ultimately contributing to a more secure digital environment.
References:
- Aswani, R., Ghrera, S. P., Kar, A. K., & Chandra, S. (2017). Identifying buzz in social media: a hybrid approach using artificial bee colony and k-nearest neighbors for outlier detection. Social Network Analysis and Mining, 7, 1-10.
- Zareapoor, M., & Shamsolmoali, P. (2015). Application of credit card fraud detection: Based on bagging ensemble classifier. Procedia computer science, 48(2015), 679-685.
- Mehbodniya, A., Alam, I., Pande, S., Neware, R., Rane, K. P., Shabaz, M., & Madhavan, M. V. (2021). Financial fraud detection in healthcare using machine learning and deep learning techniques. Security and Communication Networks, 2021, 1-8.
- Wang, X., Wang, X., & Wilkes, M. (2021). A k-nearest neighbour spectral clustering-based outlier detection technique. New Developments in Unsupervised Outlier Detection: Algorithms and Applications, 147-172.
- Wang, Y., Cao, X., & Li, Y. (2022). Unsupervised outlier detection for mixed-valued dataset based on the adaptive k-nearest neighbor global network. IEEE Access, 10, 32093-32103.
- Ogunsuyi Opeyemi, J., & Adebola, K. (2022). K-nearest neighbors bayesian approach to false news detection from text on social media. Int. J. Educ. Manag. Eng, 12, 22-32.
- Poonia, V., Goyal, M. K., Gupta, B. B., Gupta, A. K., Jha, S., & Das, J. (2021). Drought occurrence in different river basins of India and blockchain technology based framework for disaster management. Journal of Cleaner Production, 312, 127737.
- Gupta, B. B., & Sheng, Q. Z. (Eds.). (2019). Machine learning for computer and cyber security: principle, algorithms, and practices. CRC Press.
- Singh, A., & Gupta, B. B. (2022). Distributed denial-of-service (DDoS) attacks and defense mechanisms in various web-enabled computing platforms: issues, challenges, and future research directions. International Journal on Semantic Web and Information Systems (IJSWIS), 18(1), 1-43.
- Almomani, A., Alauthman, M., Shatnawi, M. T., Alweshah, M., Alrosan, A., Alomoush, W., & Gupta, B. B. (2022). Phishing website detection with semantic features based on machine learning classifiers: a comparative study. International Journal on Semantic Web and Information Systems (IJSWIS), 18(1), 1-24.
Cite As
REDDY K.T (2023) Unveiling the Power of k-Nearest Neighbors in Phishing Detection, Insights2Techinfo, pp.1