By: Reka Rius, CCRI, Asia University, Taiwan

Abstract

Phishing attacks that mimic legitimate web pages can be detected using several approaches. Page-based similarity is one, but it fails if the attacker uses images or embedded objects instead of HTML text. Image-based similarity is another, but it also has drawbacks: it requires images to have the same aspect ratio, and it is limited when segmenting pages with complex backgrounds. For the visual similarity classification method, experimental results show that the wavelet hashing (wHash) mechanism with a color histogram is more accurate than the commonly used perceptual hashing (pHash) mechanism. The accuracies of the SIFT technique are 98.14%, 98.61%, and 99.95% for the Microsoft, Dropbox, and Bank of America data, respectively. Additionally, a qualitative analysis of the successful cases of VisualPhishNet shows that the network identified easy phishing pages (highly similar to pages in training) and, more importantly, phishing pages that were partially copied, obfuscated, or unseen.

Keywords: Visual Similarity-Based, Phishing Detection

Introduction

Phishing pages impersonate legitimate websites without permission [7]. Attackers replicate authentic sites of financial services or social media (e.g., PayPal, Facebook), copying visual elements (e.g., logos and layouts) to trick users into revealing sensitive credentials [5].

Previously proposed approaches, such as page-based similarity and image-based similarity, have significant weaknesses. Page-based similarity approaches fail if attackers use images or embedded objects instead of HTML text [14]. They are also vulnerable to code obfuscation techniques, in which different code produces similar rendered images [4, 6].

Image-based similarity approaches have their own disadvantages: some require the images to have the same aspect ratio [6], and others are limited when segmenting pages with complex backgrounds [2]. Some of these approaches also assume a fixed location for the website logo, an assumption that attackers can bypass. This article discusses visual similarity-based phishing detection.

Method

This section covers the methods used in the visual similarity approach:

Visual Similarity Classification

In this method, the similarity levels are categorized into three groups: very similar, locally similar, and non-imitating. Figure 1 shows the visual similarity classification phishing detection process.
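
The three-way categorization can be sketched as a simple cascade. This is a minimal illustration, not the procedure from [3]: the ordering (whole-screenshot hash check first, then local keypoint matching), the function name `categorize_page`, and its inputs (a precomputed hash verdict and a count of matched keypoints) are all assumptions made for the example.

```python
def categorize_page(hash_says_similar, match_points, min_match_points):
    """Assign one of the three similarity categories.

    hash_says_similar: True if a whole-screenshot hash comparison passed
                       its thresholds (assumed to be computed elsewhere).
    match_points:      number of local keypoints matched against the
                       target page (assumed to be computed elsewhere).
    min_match_points:  hypothetical cut-off for declaring a local match.
    """
    if hash_says_similar:
        return "very similar"
    if match_points >= min_match_points:
        return "locally similar"
    return "non-imitating"
```

Running the cheap whole-page check first means the more expensive local matching is only needed for pages that are not near-copies.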

Very Similar Cases

Locality-Sensitive Hashing (LSH) is used to detect pages that have a high level of similarity. In cases of very high similarity, screenshots of the whole webpages are compared. To speed up the comparison without loss of accuracy, an LSH method, the wavelet hashing (wHash) mechanism combined with a color histogram, is proposed in [3]. Table 1 summarizes the similarity metrics and thresholds used for phishing detection.

Table 1: Summary of similarity metrics used for phishing detection

No.	Target Website	Similarity Type	Threshold	Mechanism	Number of Websites
1	Microsoft	Contour similarity α	≥ 0.85	wHash	1661
2	Microsoft	Color similarity β	≥ 0.78	wHash	1661
3	Microsoft	pHash similarity	≥ 0.65	pHash	1661
4	Dropbox	Color similarity β	≥ 0.94	wHash	1843
5	Dropbox	pHash similarity	≥ 0.78	pHash	1843
6	Bank of America	Contour similarity α	≥ 0.85	wHash	1867
7	Bank of America	Color similarity β	≥ 0.78	wHash	1867
8	Bank of America	pHash similarity	≥ 0.75	pHash	1867
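
To make the idea behind a wavelet hash and a color histogram concrete, the following is a minimal pure-Python sketch. It is not the implementation from [3]. Several things are assumptions for illustration: the contour similarity α is computed as the fraction of agreeing hash bits, the color similarity β as a histogram intersection, the two are combined with a logical AND, and the inputs are plain lists (a square grayscale grid whose side is the hash size times a power of two, and a list of RGB tuples) rather than real screenshots. The default thresholds are the Microsoft values from Table 1.

```python
from statistics import median


def haar_ll(grid):
    """One level of a 2-D Haar transform, keeping only the low-frequency
    (LL) band: each output cell is the mean of a 2x2 block of the input."""
    size = len(grid) // 2
    return [
        [
            (grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
             + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(size)
        ]
        for r in range(size)
    ]


def whash_bits(gray, hash_size=8):
    """Reduce a square grayscale grid to hash_size x hash_size LL
    coefficients and emit one bit per coefficient (1 if above the median).
    Assumes the grid side is hash_size times a power of two."""
    band = gray
    while len(band) > hash_size:
        band = haar_ll(band)
    flat = [value for row in band for value in row]
    mid = median(flat)
    return [1 if value > mid else 0 for value in flat]


def hash_similarity(bits_a, bits_b):
    """Fraction of agreeing bits (1 minus the normalized Hamming distance)."""
    same = sum(1 for a, b in zip(bits_a, bits_b) if a == b)
    return same / len(bits_a)


def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram of RGB pixels quantized to bins_per_channel**3 colors."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[idx] += 1.0
    return [count / len(pixels) for count in hist]


def histogram_intersection(hist_a, hist_b):
    """Overlap of two normalized histograms, in the range [0, 1]."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def is_very_similar(alpha, beta, alpha_min=0.85, beta_min=0.78):
    """Assumed decision rule: both similarities must reach their thresholds."""
    return alpha >= alpha_min and beta >= beta_min
```

Because the hash is only 64 bits and the histogram only 64 bins, comparing a visited page against a stored target is a handful of integer and float operations, which is what makes this stage fast enough to run before any heavier matching.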

Locally Similar Cases

Scale-Invariant Feature Transform (SIFT) is used to detect pages whose similarity level is categorized as locally similar.

The SIFT technique is based on features of local appearance at points of interest on an object. Such features are independent of the image's size and rotation [3]. Table 2 shows the SIFT performance on an unbalanced dataset.

Table 2: Overall performance using an unbalanced dataset

Target Webpage	Imitation Webpages	Match Points	Detected Webpages	Accuracy	Precision
Microsoft	393	3	363	98.14%	99.17%
Dropbox	207	3	180	98.61%	100.00%
Bank of America	152	11	151	99.95%	100.00%
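
The matching step that follows keypoint extraction can be sketched as below. This is a hedged illustration rather than the method of [3]: it assumes descriptors have already been extracted by a SIFT implementation (real SIFT descriptors are 128-dimensional; short vectors are used here for readability), it uses Lowe's ratio test with a commonly used ratio of 0.75 as the matching criterion, and it reads the "Match Points" column of Table 2 as the minimum number of matched keypoints. The function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def count_match_points(query_desc, target_desc, ratio=0.75):
    """Count query descriptors that pass the ratio test: the nearest target
    descriptor must be clearly closer than the second nearest."""
    matches = 0
    for q in query_desc:
        dists = sorted(euclidean(q, t) for t in target_desc)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def is_locally_similar(query_desc, target_desc, min_match_points=3):
    """Assumed rule: enough unambiguous keypoint matches means the page
    reuses part of the target, for example its logo."""
    return count_match_points(query_desc, target_desc) >= min_match_points
```

The ratio test discards ambiguous keypoints, such as points on a plain background that resemble many places on the target, so the match count reflects distinctive regions only.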

Similarity Learning Based on Deep Learning

Visual similarity-based phishing detection checks whether a visited web page bears a high visual resemblance to one of the trusted websites despite having a different domain; such a page is flagged as phishing. If the visited page is not sufficiently similar to any site in the trusted list, it is classified as a legitimate page with a valid identity [1]. Similarity learning based on deep learning uses a triplet network to detect visual similarity between websites. The dataset used is VisualPhish (155 websites with 9,363 screenshots) [1]. Table 3 shows the performance of VisualPhishNet compared to other methods.

Table 3: Performance of VisualPhishNet compared with prior methods and alternative baselines [1].

Method	Top-1 Match	ROC Area
VisualPhishNet	81.03%	0.9879
VGG16	51.32%	0.8134
ResNet50	52.21%	0.7008
ORB	24.9%	0.6922
HOG	27.61%	0.58
SURF	6.55%	0.488
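
The triplet objective and the trusted-list decision rule described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not VisualPhishNet itself: embeddings are plain lists of floats standing in for the network's output, the distance is Euclidean, and the margin, the threshold, the function names, and the example domains are all hypothetical.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the anchor is closer to the positive (same website)
    than to the negative (different website) by at least the margin."""
    return max(0.0, l2_distance(anchor, positive)
               - l2_distance(anchor, negative) + margin)


def classify_page(page_embedding, page_domain, trusted, threshold):
    """trusted is a list of (domain, embedding) pairs.

    A page is flagged as phishing when it is visually close to a trusted
    site but is served from a different domain; otherwise it is treated
    as legitimate."""
    best_domain, best_dist = min(
        ((domain, l2_distance(page_embedding, emb)) for domain, emb in trusted),
        key=lambda pair: pair[1],
    )
    if best_dist <= threshold and page_domain != best_domain:
        return "phishing", best_domain
    return "legitimate", None


# Usage with made-up two-dimensional embeddings.
trusted_list = [("paypal.com", [0.0, 0.0]), ("dropbox.com", [10.0, 10.0])]
verdict = classify_page([0.3, 0.4], "login-update.example", trusted_list, 1.0)
```

Training with the triplet loss is what makes distance in the embedding space meaningful, so that a single nearest-neighbor lookup against the trusted list is enough at detection time.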

Conclusion

Based on a qualitative analysis of the successful cases of VisualPhishNet, the network identified easy phishing pages (highly similar to pages in training) and, more importantly, phishing pages that were partially copied, obfuscated, or unseen [1]. In the ‘very similar’ case, the wHash mechanism with the color histogram has a higher accuracy than the pHash mechanism and is also more stable. In the ‘locally similar’ case, logo detection by the SIFT technique is a suitable choice. The study in [3] also adds a cache to reduce the detection time, increasing the detection speed by up to 4.6 times. In a complete test with imbalanced data, the accuracies for the Microsoft, Dropbox, and Bank of America data were 98.14%, 98.61%, and 99.95%, respectively. However, the performance difference is not obvious in a complete test with balanced data. Threshold setting and processing speed should be examined in future work.

References

[1] Sahar Abdelnabi, Katharina Krombholz, and Mario Fritz. VisualPhishNet: Zero-day phishing website detection by visual similarity. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 1681–1698, 2020.
[2] Ahmet Selman Bozkir and Ebru Akcapinar Sezer. Use of HOG descriptors in phishing detection. In 2016 4th International Symposium on Digital Forensic and Security (ISDFS), pages 148–153. IEEE, 2016.
[3] Jiann-Liang Chen, Yi-Wei Ma, and Kuan-Lung Huang. Intelligent visual similarity-based phishing websites detection. Symmetry, 12(10):1681, 2020.
[4] Anthony Y Fu, Liu Wenyin, and Xiaotie Deng. Detecting phishing web pages with visual similarity assessment based on earth mover’s distance (EMD). IEEE Transactions on Dependable and Secure Computing, 3(4):301–311, 2006.
[5] Fujiao Ji, Kiho Lee, Hyungjoon Koo, Wenhao You, Euijin Choo, Hyoungshick Kim, and Doowon Kim. Evaluating the effectiveness and robustness of visual similarity-based phishing detection models. arXiv preprint arXiv:2405.19598, 2024.
[6] Ieng-Fat Lam, Wei-Cheng Xiao, Szu-Chi Wang, and Kuan-Ta Chen. Counteracting phishing page polymorphism: An image layout analysis approach. In International Conference on Information Security and Assurance, pages 270–279. Springer, 2009.
[7] Colin Whittaker, Brian Ryner, and Marria Nazif. Large-scale automatic classification of phishing pages. In NDSS, volume 10, page 2010, 2010.
[8] Sedik, A., Maleh, Y., El Banby, G. M., Khalaf, A. A., Abd El-Samie, F. E., Gupta, B. B., … & Abd El-Latif, A. A. (2022). AI-enabled digital forgery analysis and crucial interactions monitoring in smart communities. Technological Forecasting and Social Change, 177, 121555.
[9] Agrawal, D. P., Gupta, B. B., Yamaguchi, S., & Psannis, K. E. (2018). Recent Advances in Mobile Cloud Computing. Wireless Communications and Mobile Computing, 2018.
[10] Goyal, S., Kumar, S., Singh, S. K., Sarin, S., Priyanshu, Gupta, B. B., … & Colace, F. (2024). Synergistic application of neuro-fuzzy mechanisms in advanced neural networks for real-time stream data flux mitigation. Soft Computing, 28(20), 12425-12437.
[11] Kulkarni, A. D., & Brown III, L. L. (2019). Phishing websites detection using machine learning.
[12] Kumar, J., Santhanavijayan, A., Janet, B., Rajendran, B., & Bindhumadhava, B. S. (2020, January). Phishing website classification and detection using machine learning. In 2020 International Conference on Computer Communication and Informatics (ICCCI) (pp. 1-6). IEEE.
[13] Zamir, A., Khan, H. U., Iqbal, T., Yousaf, N., Aslam, F., Anjum, A., & Hamdani, M. (2020). Phishing web site detection using diverse machine learning algorithms. The Electronic Library, 38(1), 65-80.

Cite As

Rius R. (2025) Visual Similarity-Based Phishing Websites Detection, Insights2Techinfo, pp.1


