Predictive Analytics: Building and Evaluating Predictive Models for Business Intelligence

By: Vanshika Bhardwaj1, Adilah Widiasti2, Alfina Febiani2

1 Chandigarh College of Engineering & Technology, Chandigarh, India; vanshika7april@gmail.com

2 Department of Computer Science, Esa Unggul University, Indonesia; adilahwidiasti86@gmail.com; alfinafebiani@gmail.com

Abstract. This study investigates business analytics, the combination of knowledge, tools, and processes used to evaluate an organization’s data and performance. It focuses on predictive analytics, a subfield of business analytics that forecasts future trends or estimates the likelihood of specific events using machine learning, statistical techniques, and input data. The ultimate objective is to enhance overall corporate performance through well-informed decision-making. Although predictive analytics, which includes methods such as data mining and big data analytics, has a long history, it became very popular only in the late 20th century. This paper examines the use of a supervised classification tool, the decision tree methodology, in predictive business analytics for various business applications.

Keywords: Big data, prescriptive analysis, naive Bayes, pruning, machine learning, decision tree, business analytics, predictive analytics.

1 Introduction

Business analysis (BA) is a broad discipline that employs methods, tools, and applications to evaluate the data and operations of an organization in order to support data-driven choices about investments and future direction. In the big data era, data-driven analytics is essential for large businesses in manufacturing, IT, marketing, and logistics to gain a competitive edge and understand customer behavior. BA comprises descriptive, predictive, and prescriptive analytics. Descriptive analytics, which analyzes historical data to spot trends, is a key component of managing and improving business processes. It has been essential to understanding the impact of COVID-19, community food insecurity, weather-electricity correlations, and consumer behavior [1-3].

Prescriptive analytics uses computational and mathematical methods to improve real-time solutions, reduce bias in decision-making, and optimize results. It has shown promise in healthcare, sports planning, and stock market forecasting, where it has helped to lower medical expenses, minimize injuries, and maximize investment returns. Predictive analytics (PA) uses statistics to predict future trends and thereby enhance business performance. The nuances of PA are covered in more detail in the section that follows.

2 Business Analysis and Predictive Analytics Integration

2.1 Business Analysis

Business analysis (BA) is the combination of techniques, tools, and applications used to examine the data and performance of an organization in order to support data-driven decision-making about future business strategy and investment plans [4]. Corporations that embrace a data-driven approach treat their data as a valuable asset and actively seek opportunities to turn it into a competitive advantage over rivals. In the current big data age, data-driven analytics is crucial for large organizations in sectors including manufacturing, information technology, marketing, and logistics [5-6]. The focus lies on understanding consumer spending and behavior to optimize profits, marking a strategic shift toward data-driven decision-making in various industries.

Business analysis (BA) comprises three main categories of analytics: descriptive, predictive, and prescriptive [5]. Descriptive analytics identifies important trends and patterns by analyzing historical datasets over a given period of time. This involves a comprehensive examination of the available information to reveal insights such as failure modes, operational values, and event occurrences. At its core, descriptive analytics is a technique that uses real-time data to offer insights that help companies manage and improve their operational procedures without depending on specific references [6].

Descriptive analytics (DA) has proven to be a valuable tool for understanding the impact of COVID-19 and thereby improving future resilience. It has been used to examine how workspaces are organized, how consumer behavior has changed, and how global value chains, e-supply networks, marketing, and operations have been affected. DA has also been used to study household food insecurity among African Americans, taking into account factors such as food circumstances, attributes, and residents’ perspectives on their food environment. The information gathered and examined indicates that a sizable segment of the African American population regularly faces food insecurity, notably those from lower-class backgrounds, those without access to a car, and those reliant on food assistance programs.

Using descriptive analytics, the relationship between weather reports and power use in Melaka’s academic buildings was examined. According to the analysis, places with more rainfall typically have lower electricity usage. Descriptive analytics was also used to look into what influences Nepalese consumers’ intentions to buy smartphones. The research indicates that purchase intentions are primarily influenced by price, with brand personality and features having a less significant impact.

Prescriptive analytics uses mathematical or computational techniques to improve performance and achieve the best possible results in a particular situation. Beyond striving for the best outcomes, it examines opportunities within a decision, correlations between decisions, and the factors influencing those decisions. The ultimate objective is to find the best real-time solution [4].

Prescriptive analytics has been used to improve sports planning by reducing the cognitive biases of human experts that can result in training-related injuries. Similarly, it has been applied to the analysis of stock flow patterns for stock market prediction [5]. By reducing associated risks, this application helps stockbrokers make more profitable investments on the stock platform.

Predictive analytics (PA) utilizes statistical methods to anticipate future trends or outcomes based on current available data, aiming to enhance the overall performance of a corporation. The subsequent section will provide a more detailed explanation of PA.

2.2 Predictive Analysis

Predictive analytics (PA) is a specialized field that forecasts future trends or outcomes and predicts specific events using machine learning, statistical techniques, and input data [6]. This study explores PA’s ability to identify correlations between variables, evaluate risks, and assign scores or weights in order to predict future patterns. By using PA, businesses can interpret big data for strategic advantage [7].

The PA methodology enables businesses to take a proactive, forward-looking stance by projecting outcomes and behaviors based on data rather than conjecture. In addition to forecasts, it offers users practical guidance [8]. PA is intended to be incorporated into business applications, extending beyond its traditional role as the sole purview of statisticians and mathematicians. Businesses are driven to adopt PA by factors such as the increasing volume and variety of data, the need to use existing data for predictive insights, the availability of fast, affordable computers and intuitive software, and the need to gain a competitive advantage in difficult economic times [9].

The process of applying PA requires a number of careful steps, such as identifying the project, specifying deliverables, defining scope, coordinating with business goals, and choosing pertinent datasets for prediction [8]. A thorough picture of customer interactions must be created by correlating data from various sources, which is why the data collection phase is so important.

3 Decision Tree

The decision tree is a supervised classification tool used in decision-making that applies particular conditions to group data sets. It is a well-known and powerful tool that is frequently used in many domains, such as image processing, machine learning, data mining, and pattern recognition, and its results are quick to learn to interpret [10]. Compared to algorithms such as Naive Bayes, Random Forest, Support Vector Machine, and Logistic Regression, the decision tree is said to be the most easily interpretable supervised machine learning algorithm. This is attributed to its reliance on simple mathematics, which eliminates the need for complex formulas and statistical understanding.

The decision tree (DT) is a tree-based method in which every path from the root is defined by a series of data partitioning steps that end, at a leaf node, in a Boolean outcome (either true or false). Each level poses a question, and the split points derived from these questions can be discrete values, ranges, or probability distributions. DT adapts to a variety of datasets that combine numerical and/or categorical variables, and it handles missing data in a given column with ease [10]. Furthermore, because DT can be used with a variety of input data types, it is useful for identifying important variables in outcome prediction [11].
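As a rough illustration of a root-to-leaf path, the sketch below follows one record through a small hand-built tree. The `Node` class, the `income` and `debt` features, and the thresholds are all hypothetical and chosen only for this example; they do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical structure for illustration)."""
    feature: Optional[str] = None      # column tested at an internal node
    threshold: Optional[float] = None  # split point: go left if value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    outcome: Optional[bool] = None     # set only on leaf nodes


def predict(node: Node, record: dict) -> bool:
    """Follow one root-to-leaf path and return the leaf's Boolean outcome."""
    while node.outcome is None:        # internal node: ask its question
        if record[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.outcome


# Hypothetical tree: approve when income is high, or when income is low but debt is small.
tree = Node(
    feature="income", threshold=50_000,
    left=Node(feature="debt", threshold=5_000,
              left=Node(outcome=True), right=Node(outcome=False)),
    right=Node(outcome=True),
)

print(predict(tree, {"income": 40_000, "debt": 2_000}))  # True
print(predict(tree, {"income": 40_000, "debt": 9_000}))  # False
```

Each iteration of the loop corresponds to one question on the path, and the loop ends when a leaf is reached.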

The decision tree (DT) processes input data in row format: each row is referred to as a record, and its features are organized into columns. Each row carries a class label corresponding to its assigned target [12]. The DT structure is composed of nodes and branches [13]. Each node holds a count of the records it contains for each value of the target class, and the node displays first the target class with the greatest number of records in its distribution.
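The per-node class counts described above can be sketched in a few lines. The records, feature names, and labels below are made up for illustration, and `class_counts` and `majority_class` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical records: each row is (features, class label).
records = [
    ({"income": 40_000, "debt": 2_000}, "approve"),
    ({"income": 40_000, "debt": 9_000}, "reject"),
    ({"income": 80_000, "debt": 9_000}, "approve"),
    ({"income": 30_000, "debt": 7_000}, "reject"),
    ({"income": 90_000, "debt": 1_000}, "approve"),
]


def class_counts(rows):
    """Count how many records of each target class reach a node."""
    return Counter(label for _, label in rows)


def majority_class(rows):
    """Return the class with the most records at this node."""
    return class_counts(rows).most_common(1)[0][0]


print(dict(class_counts(records)))  # {'approve': 3, 'reject': 2}
print(majority_class(records))      # approve
```

At the root, all five records are present, so the counts describe the whole dataset; after a split, the same functions would be applied to each child's subset.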

Fig. 1. Decision Tree

The root node is the starting point at the top of the tree in DT’s node hierarchy and contains all current records. An extensive system of branches connected to internal nodes extends from this parent node [14].

Internal nodes, which originate from the parent node, are readily recognized because they are connected by branches [15]. Their branches link to leaf nodes or other internal nodes. Leaf nodes, in contrast, have branches going into them but none going out, so they mark the end of a path and represent the outcomes of decisions or events [16]. Branches represent a split of the dataset and are frequently associated with a question whose answer is given in the branch’s description. Decision tree (DT) splits, which combine several answers derived from each input feature, can take two possible formats: binary or range [17].

Decision trees (DTs) are built by a procedure that iteratively splits the input data into smaller groups according to the class label. To find the split at each node, the algorithm relies on a measure of data impurity [17-18]. Nodes can be split according to different impurity measures, such as Gini impurity, entropy, information gain, and classification error [19]. After the impurity of each child node is computed, the weighted average of the child nodes’ impurities gives the overall impurity of the split [20]. The process is repeated for every node to make the child nodes more homogeneous with respect to the outcome variable [21]. Once splitting is finished, a node is regarded as a leaf node.
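A minimal sketch of the impurity measures named above, using their standard textbook definitions, may make the weighted-average step concrete. The function names and the toy labels are assumptions for illustration; information gain is computed here as the parent's entropy minus the weighted entropy of the children.

```python
import math
from collections import Counter


def proportions(labels):
    """Share of each class among the labels at a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in proportions(labels))


def entropy(labels):
    """Entropy in bits: minus the sum of p * log2(p)."""
    return -sum(p * math.log2(p) for p in proportions(labels))


def classification_error(labels):
    """Classification error: 1 minus the largest class proportion."""
    return 1.0 - max(proportions(labels))


def weighted_impurity(children, impurity=gini):
    """Overall impurity of a split: child impurities weighted by child size."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * impurity(child) for child in children)


def information_gain(parent, children):
    """Reduction in entropy achieved by splitting parent into children."""
    return entropy(parent) - weighted_impurity(children, entropy)


# Toy split: a 5/5 parent divided into a mostly-yes and a mostly-no child.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

print(round(gini(parent), 3))                             # 0.5
print(round(weighted_impurity([left, right]), 3))         # 0.32
print(round(information_gain(parent, [left, right]), 3))  # 0.278
```

The split lowers the Gini impurity from 0.5 to 0.32, which is the kind of reduction a tree-building algorithm looks for when comparing candidate splits.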

Split stopping is used to guard against overfitting and ensure the DT’s reliability. It keeps the tree from growing too large, which would reduce its generalizability [22-23]. Stopping parameters include the maximum depth (number of steps) of any leaf from the root node and the minimum number of records required in a leaf and in a node prior to splitting. Pruning is another tactic for preventing overfitting [24-25]. In pruning, a large DT is first grown, and nodes that contribute little new information are then removed [26]. Techniques for choosing the optimal sub-tree include using a validation dataset and assessing prediction errors. Pruning has two subcategories, pre-pruning and post-pruning, also referred to as forward and backward pruning [27-28].
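The stopping rules can be sketched as a small recursive builder. This is an illustrative implementation under stated assumptions, not the method of any particular library: it uses Gini impurity, binary threshold splits, and two hypothetical parameters, `max_depth` and `min_records`. Post-pruning with a validation set is not shown.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature_index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build(rows, labels, depth=0, max_depth=2, min_records=2):
    """Grow a tree, stopping early on depth, node size, or lack of improvement."""
    majority = Counter(labels).most_common(1)[0][0]
    split = None
    if depth < max_depth and len(labels) >= min_records:
        split = best_split(rows, labels)
    if split is None:  # stopping rule hit: this node becomes a leaf
        return {"leaf": majority}
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build([r for r, _ in left], [y for _, y in left],
                      depth + 1, max_depth, min_records),
        "right": build([r for r, _ in right], [y for _, y in right],
                       depth + 1, max_depth, min_records),
    }


def depth_of(tree):
    """Number of splits on the longest root-to-leaf path."""
    if "leaf" in tree:
        return 0
    return 1 + max(depth_of(tree["left"]), depth_of(tree["right"]))


# Hypothetical data: (income, debt) in thousands, with a made-up label.
rows = [(40, 2), (40, 9), (80, 9), (30, 7), (90, 1), (35, 3)]
labels = ["approve", "reject", "approve", "reject", "approve", "approve"]

shallow = build(rows, labels, max_depth=1)
deep = build(rows, labels, max_depth=3)
print(depth_of(shallow), depth_of(deep))  # 1 2
```

With `max_depth=1` the tree stops after a single split and leaves one impure leaf; with `max_depth=3` it keeps splitting until every leaf is pure, which on this toy data takes two levels.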

4 Conclusion

To sum up, business analytics, and predictive analytics in particular, is a tried-and-true process for anticipating outcomes and obtaining important insights. This primer provides a foundational understanding of predictive analytics and is especially helpful for decision tree users. To maximize their synergy, decision trees (DT) and predictive analytics (PA) will be integrated into new industries such as manufacturing, transportation, healthcare, and supply chain. Because of its approachable nature, DT has the potential to be a very powerful PA tool even for those without extensive statistical knowledge. One of the most notable drawbacks of DT is that predictive analytics cannot be conducted without first knowing the target (predicted data) and input data. More research is required to take into account the most recent findings and developments in business analytics.

References

  1. Gupta, A., Singh, S. K., Gupta, B. B., Chopra, M., & Gill, S. S. (2023). Evaluating the Sustainable COVID-19 Vaccination Framework of India Using Recurrent Neural Networks. Wireless Personal Communications, 1-19.
  2. Gupta, A., Singh, S. K., Chopra, M., & Gill, S. S. (2022). An inquisitive prospect on the shift toward online media, before, during, and after the COVID-19 pandemic: a technological analysis. In Advances in Data Computing, Communication and Security: Proceedings of I3CS2021 (pp. 229-238). Singapore: Springer Nature Singapore.
  3. Aggarwal, K., Singh, S. K., Chopra, M., & Kumar, S. (2022). Role of social media in the COVID-19 pandemic: A literature review. Data mining approaches for big data and sentiment analysis in social media, 91-115.
  4. Balkan, S., & Goul, M. (2010). Advances in predictive modeling: how in-database analytics will evolve to change the game. Business Intelligence Journal, 15(2), 17-25.
  5. Chopra, M., Singh, S. K., Gupta, A., Aggarwal, K., Gupta, B. B., & Colace, F. (2022). Analysis & prognosis of sustainable development goals using big data-based approach during COVID-19 pandemic. Sustainable Technology and Entrepreneurship, 1(2), 100012.
  6. Zhou, Q. (2022). A Study on Human Transiting Based on Big Data and Web Semantics: Distinguishment and Detection. International Journal on Semantic Web and Information Systems (IJSWIS), 18(1), 1-18.
  7. Singh, S. K., Singh, R. K., & Bhatia, M. P. S. (2012, December). Design flow of reconfigurable embedded system architecture using LUTs/PLAs. In 2012 2nd IEEE International Conference on Parallel, Distributed and Grid Computing (pp. 385-390). IEEE.
  8. Sharda, R., Delen, D., Turban, E., Aronson, J., & Liang, T. (2014). Business intelligence and analytics. System for Decision Support, 398, 2014.
  9. Lee, C. S., Cheang, P. Y. S., & Moslehpour, M. (2022). Predictive analytics in business analytics: decision tree. Advances in Decision Sciences, 26(1), 1-29.
  10. Yeo, B., & Grant, D. (2018). Predicting service industry performance using decision tree analysis. International Journal of Information Management, 38(1), 288-300.
  11. Song, Y. Y., & Lu, Y. (2015). Decision tree methods: applications for classification and prediction. Shanghai Archives of Psychiatry, 27(2), 130.
  12. Bhardwaj, A., & Kaushik, K. (2022). Predictive analytics-based cybersecurity framework for cloud infrastructure. International Journal of Cloud Applications and Computing (IJCAC), 12(1), 1-20.
  13. Zhang, Y., Liu, M., Guo, J., Wang, Z., Wang, Y., Liang, T., & Singh, S. K. (2022, December). Optimal Revenue Analysis of the Stubborn Mining Based on Markov Decision Process. In International Conference on Machine Learning for Cyber Security (pp. 299-308). Cham: Springer Nature Switzerland.
  14. Gupta, A., Singh, S. K., Chopra, M., & Gill, S. S. (2022). An inquisitive prospect on the shift toward online media, before, during, and after the COVID-19 pandemic: a technological analysis. In Advances in Data Computing, Communication and Security: Proceedings of I3CS2021 (pp. 229-238). Singapore: Springer Nature Singapore.
  15. Usha Rani, M., Saravana Selvam, N., & Jegatha Deborah, L. (2022). An improvement of yield production rate for crops by predicting disease rate using intelligent decision systems. Int. J. Softw. Sci. Comput. Intell.(IJSSCI), 14(1), 1-22.
  16. Baranyi, J., Pin, C., & Ross, T. (1999). Validating and comparing predictive models. International Journal of food microbiology, 48(3), 159-166.
  17. Vagelatos, A., & Sarivougioukas, J. (2021). Using Denotational Mathematics for the Formal Description of Home UbiHealth Decision-Support Systems With Knowledge Flow. International Journal of Software Science and Computational Intelligence (IJSSCI), 13(4), 1-17.
  18. Piper, J., & Rodger, J. A. (2022). Longitudinal study of a website for assessing American Presidential candidates and decision making of potential election irregularities detection. International Journal on Semantic Web and Information Systems (IJSWIS), 18(1), 1-20.
  19. Bui, Q. T., & Lo, F. Y. (2022). Technology multinational enterprises from emerging markets: Competitive interplay of international entry timing decisions. Sustainable Technology and Entrepreneurship, 1(3), 100019.
  20. Dubey, H., Kumar, S., & Chhabra, A. (2022). Cyber Security Model to Secure Data Transmission using Cloud Cryptography. Cyber Secur. Insights Mag, 2, 9-12.
  21. Jayaram, A., & Singal, S. (2017, January). An enterprise resource management model for business intelligence, data mining and predictive analytics. In 2017 7th International Conference on Cloud Computing, Data Science & Engineering-Confluence (pp. 485-490). IEEE.
  22. Shi‐Nash, A., & Hardoon, D. R. (2017). Data analytics and predictive analytics in the era of big data. Internet of things and data analytics handbook, 329-345.
  23. Nalchigar, S., & Yu, E. (2018). Business-driven data analytics: A conceptual modeling framework. Data & Knowledge Engineering, 117, 359-372.
  24. Singh, S. K. (2021). Linux Yourself: Concept and Programming. CRC Press.
  25. Tamang, M. D., Shukla, V. K., Anwar, S., & Punhani, R. (2021, April). Improving business intelligence through machine learning algorithms. In 2021 2nd International Conference on Intelligent Engineering and Management (ICIEM) (pp. 63-68). IEEE.
  26. Singh, S. K., Singh, R. K., Bhatia, M. P. S., & Singh, S. P. (2013). CAD for delay optimization of symmetrical FPGA architecture through hybrid LUTs/PLAs. In Advances in Computing and Information Technology: Proceedings of the Second International Conference on Advances in Computing and Information Technology (ACITY) July 13-15, 2012, Chennai, India-Volume 3 (pp. 581-591). Springer Berlin Heidelberg.
  27. Chopra, M., Singh, S. K., Aggarwal, K., & Gupta, A. (2022). Predicting catastrophic events using machine learning models for natural language processing. In Data mining approaches for big data and sentiment analysis in social media (pp. 223-243). IGI Global.
  28. Niu, Y., Ying, L., Yang, J., Bao, M., & Sivaparthipan, C. B. (2021). Organizational business intelligence and decision making using big data analytics. Information Processing & Management, 58(6), 102725.

Cite As:

Bhardwaj V., Widiasti A., Febiani A. (2024) Predictive Analytics: Building and Evaluating Predictive Models for Business Intelligence, Insights2Techinfo, pp.1
