Dimensionality Reduction: Feature Extraction and Feature Selection

There are two approaches to reducing the number of features, a process known as dimensionality reduction:

Feature Extraction
Feature Selection

1. Feature Extraction

Feature extraction transforms the features, generating new ones by combining the raw (provided) features. Its goal is to reduce the number of features in a dataset by deriving new features from the existing ones and then discarding the original features[1-2].
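To make the idea concrete, here is a minimal NumPy sketch of one feature-extraction method, Principal Component Analysis (PCA) computed via the singular value decomposition. The function name `extract_features` and the synthetic five-column dataset are illustrative assumptions, not part of the original post. Each new feature is a weighted combination of all original columns, and the originals are dropped afterwards.

```python
import numpy as np

def extract_features(X: np.ndarray, n_components: int) -> np.ndarray:
    """Hypothetical helper: derive n_components new features as linear
    combinations of the original columns (PCA via SVD) and return them
    in place of X."""
    X_centered = X - X.mean(axis=0)  # PCA works on zero-mean columns
    # Rows of Vt are the principal directions, ordered by explained variance
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = Vt[:n_components].T          # (n_features, n_components) weight matrix
    return X_centered @ W            # new features; the original columns are discarded

# Synthetic data (assumption): five raw features that are noisy mixtures
# of just two underlying signals
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + base[:, 1],
                     2 * base[:, 0], base[:, 1] - base[:, 0]])
X += 0.01 * rng.normal(size=X.shape)

Z = extract_features(X, n_components=2)
print(X.shape, "->", Z.shape)
```

Because the five raw columns carry only two independent signals, two extracted features retain almost all of the variance while the new columns are mutually uncorrelated.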

Regularization can help limit the risk of overfitting, but feature extraction techniques can offer a range of further benefits, such as:

Improved accuracy.

Reduced risk of overfitting.

Faster training.

Better data visualization.

Improved model explainability.

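Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Multidimensional Scaling (MDS) are the most commonly used methods. The corresponding scikit-learn imports in Python are as follows (TruncatedSVD is a related decomposition that, unlike PCA, also works directly on sparse data):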

from sklearn.decomposition import PCA
from sklearn.decomposition import TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import MDS
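As a hedged usage sketch, the classes above can be applied to scikit-learn's bundled Iris dataset (150 samples, 4 features, 3 classes). The choice of dataset, the standardization step, and `n_components=2` are assumptions made for illustration; note that MDS lives in `sklearn.manifold`, not `sklearn.decomposition`.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import MDS
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)  # put features on a common scale first

# PCA: unsupervised, keeps the directions of maximum variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
print("PCA explained variance ratio:", pca.explained_variance_ratio_.round(3))

# TruncatedSVD: like PCA but without centering, so it also accepts sparse input
X_svd = TruncatedSVD(n_components=2, random_state=0).fit_transform(X_std)

# LDA: supervised (uses y); at most n_classes - 1 = 2 components here
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_std, y)

# MDS: looks for a 2-D embedding that preserves pairwise distances
X_mds = MDS(n_components=2, random_state=0).fit_transform(X_std)

for name, Z in [("PCA", X_pca), ("TruncatedSVD", X_svd), ("LDA", X_lda), ("MDS", X_mds)]:
    print(f"{name}: {X.shape} -> {Z.shape}")
```

A key design difference: PCA, TruncatedSVD, and MDS ignore the labels, whereas LDA uses them to find directions that separate the classes, which is also why its output dimension is capped at one less than the number of classes.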

2. Feature Selection

Feature selection can be thought of as a pre-processing phase: it does not introduce any new features, but instead selects a subset of the raw features, which keeps the result more interpretable[3-4].
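For contrast with feature extraction, the following sketch uses scikit-learn's `SelectKBest` to keep a subset of the original Iris columns. Scoring with the ANOVA F-statistic (`f_classif`) and keeping `k=2` features are illustrative assumptions; other scoring functions are equally valid.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

data = load_iris()
X, y = data.data, data.target

# Keep the k original columns with the highest ANOVA F-score;
# no new features are created, columns are only kept or dropped
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

mask = selector.get_support()  # boolean mask over the original columns
kept = [name for name, keep in zip(data.feature_names, mask) if keep]
print("kept features:", kept)
```

On Iris this keeps the two petal measurements, and `X_selected` is simply those two original columns, unchanged, so each retained feature keeps its original meaning.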

Identifying the most valuable features among a large initial set can help extract useful information and uncover new knowledge.

In classification problems, the importance of a feature is judged by how well it helps discriminate between the different classes.

The term “feature relevance” refers to an evaluation of how useful each feature is for distinguishing between the classes.
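One common way to quantify this kind of relevance is the Fisher score: for each feature, the scatter of the class means around the overall mean divided by the scatter of samples within their own class. The sketch below is one possible relevance measure among many (mutual information and chi-squared are others); the function name `fisher_score` is a hypothetical helper, not a library API.

```python
import numpy as np
from sklearn.datasets import load_iris

def fisher_score(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical helper: per-feature ratio of between-class to
    within-class scatter. Higher scores mean the classes are better
    separated along that feature."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2  # class-mean spread
        within += n_c * Xc.var(axis=0)  # within-class spread (population variance)
    return between / within

data = load_iris()
scores = fisher_score(data.data, data.target)
for name, s in sorted(zip(data.feature_names, scores), key=lambda t: -t[1]):
    print(f"{name:20s} {s:8.3f}")
```

Up to the constant factor (n - k)/(k - 1), for n samples and k classes, this score equals the one-way ANOVA F-statistic, so it ranks features the same way as `f_classif`.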

Feature selection serves several goals:

It eliminates irrelevant and noisy features while retaining those with the least redundancy and the greatest relevance to the target variable.
It reduces the computational effort and complexity of training and testing a classifier, which yields more cost-effective models.
It improves the effectiveness of learning algorithms, helps prevent overfitting, and aids in building more general models.
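The first goal, maximum relevance with minimum redundancy, can be sketched as a greedy selection loop in the spirit of mRMR. This is a simplified variant chosen for illustration: relevance is estimated with scikit-learn's `mutual_info_classif`, while redundancy is approximated by the mean absolute Pearson correlation with already-selected features (classic mRMR uses mutual information for both terms). The helper `mrmr_select`, the breast-cancer dataset, and `k=5` are assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

def mrmr_select(X: np.ndarray, y: np.ndarray, k: int, random_state: int = 0) -> list[int]:
    """Hypothetical helper: greedily pick k feature indices, trading off
    relevance to y against redundancy with features already chosen."""
    relevance = mutual_info_classif(X, y, random_state=random_state)
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-feature redundancy proxy
    selected = [int(np.argmax(relevance))]       # start with the most relevant feature
    while len(selected) < k:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        # Score = relevance minus mean absolute correlation with the chosen set
        scores = [relevance[j] - corr[j, selected].mean() for j in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected

data = load_breast_cancer()
chosen = mrmr_select(data.data, data.target, k=5)
print([data.feature_names[j] for j in chosen])
```

The redundancy penalty steers the search away from near-duplicate measurements (for example, several size-related features that are highly correlated), so the selected set covers more distinct information than simply taking the top five features by relevance.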

References:

[1] Alweshah, M., et al. (2020). The monarch butterfly optimization algorithm for solving feature selection problems. Neural Computing and Applications, 1-15.
[2] Hammad, M., et al. (2021). Myocardial infarction detection based on deep neural network on imbalanced data. Multimedia Systems, 1-13.
[3] García-Peñalvo, et al. (2021). A survey on data mining classification approaches.
[4] Jain, A. K., et al. (2018). Rule-based framework for detection of smishing messages in mobile environment. Procedia Computer Science, 125, 617-623.
