Exploratory Data Analysis (EDA)

By: A. Mishra

EDA is a technique for analyzing data that makes use of visual techniques. By using statistical summary and graphical representations, it can be utilized to discover trends and patterns, as well as to test assumptions and hypotheses. As well as assisting in the identification of obvious errors, it can also aid in the better understanding of patterns within the data, the detection of outliers or unexpected events, and the finding of interesting relationships between the variables, among other things[1-2].

There are various types of EDA

Types of EDA are as follows [3]: 

• Univariate analysis

The study of univariate data, as a result, is the easiest sort of data analysis because the information is concerned with only one variable that is subject to variability. Unlike other types of analysis, this sort of analysis does not concern itself with causes or links; rather, its major objective is to characterize the data and discover patterns that can be found within it.

• Bivariate analysis

It consists of two separate variables that are analyzed together. The analysis of this type of data is concerned with causes and relationships, and the analysis is carried out in order to determine the relationship between the two variables under consideration.

• Multivariate analysis

When there are three or more variables in the data, the data is classified as multivariate.

Exploratory Data Analysis Tools

The language is an interpreted object-oriented programming language with dynamic semantics, and it can be used to develop EDAs[4].

Python as scripting or glue language to tie together previously produced components, it is particularly appealing because of its high-level, built-in data structures, as well as its dynamic typing and binding. It is also quite desirable for rapid application construction. Missing values in a data collection may be discovered by combining Python with EDA, which is important since it lets you choose how missing values are treated in machine learning applications, which is important.

R is an open-source programming language and free software environment for statistical computing and graphics, which is maintained by the R Foundation for Statistical Computing. R is used extensively in the field of statistical computing and graphics. R is a programming language that statisticians in the field of data science typically employ for the production of statistical observations and the analysis of vast volumes of data.

EDA Applied in:

Data Sciences

Data Analysics

Business Analysis

Data Mining

References:

[1] Sahoo, S. R. et al. Classification of various attacks and their defence mechanism in online social networks: a survey. Enterprise Information Systems13(6), 832-864.

[2] Nguyen, G. N. et. al (2021). Secure blockchain enabled Cyber–physical systems in healthcare using deep belief network with ResNet model. Journal of Parallel and Distributed Computing153, 150-160.

[3] García-Peñalvo, et. al. (2021). A Survey on Data mining classification approaches.

[4] Gudivada, A., et al. (2020). Developing concept enriched models for big data processing within the medical domain. International Journal of Software Science and Computational Intelligence (IJSSCI)12(3), 55-71.

43100cookie-checkExploratory Data Analysis (EDA)
Share this:

Leave a Reply

Your email address will not be published.