By: Shrey Mehra (email: lco20376@ccet.ac.in), Department of CSE, Chandigarh College of Engineering and Technology, India
Abstract
Natural Language Processing (NLP) is a dynamic and ever-evolving field that has undergone significant transformations from its inception in the 1950s to its present state in 2024. This paper provides a comprehensive overview of NLP, tracing its historical development, highlighting current trends, addressing persistent challenges, and unveiling exciting prospects for the future. In this paper, we delve into the fundamental components of NLP, including Natural Language Understanding (NLU), Natural Language Inference (NLI), and Natural Language Generation (NLG).
1. Introduction
Language is the fundamental medium through which humans communicate, express thoughts, and convey ideas. Understanding and processing this intricate form of communication has long been a challenge for machines. Natural Language Processing (NLP), an offshoot of artificial intelligence (AI), serves as the conduit between human language and computational analysis. Over the decades, NLP has evolved from a nascent concept in the 1950s to a thriving field of research and application in 2024 [1]. The journey of NLP has been marked by remarkable progress, driven by a confluence of factors including advances in linguistics, machine learning, and the exponential growth of computing power. In this research article, we embark on a comprehensive exploration of NLP, tracing its historical development, examining its current state, identifying key challenges, and envisioning its promising future.
The structure of this article offers a comprehensive perspective on NLP, exploring its core components: Natural Language Understanding (NLU), Natural Language Inference (NLI), and Natural Language Generation (NLG). A detailed explanation of the seven steps involved in NLU will shed light on how machines make sense of human language. The turning point in the history of NLP occurred around the year 2000, with the advent of machine learning techniques, particularly neural networks [2]. This transformative era unleashed the potential of deep learning, propelling NLP into new dimensions of capability and application. We will explore how NLP has evolved since this pivotal moment, with a focus on advancements in recent years. NLP’s applications are far-reaching and continue to grow. From machine translation and text categorization to information extraction and summarization, NLP has found utility across various domains, including healthcare, finance, and entertainment. This article will highlight the practical implications of NLP in these diverse fields. Moreover, NLP is not confined to the realm of academia and research laboratories. It has become an integral part of real-world applications, as evidenced by its integration into recent high-profile projects [3-10]. These projects exemplify the transformative impact of NLP on industries and society.
The article is organized as follows: Section 2 discusses the components of NLP. Section 3 reviews the history of NLP along with its applications. Finally, we conclude the article in Section 4.
2. Components of NLP
NLP is a multifaceted field that involves several core components, each serving a specific purpose in understanding and generating human language. These components work together to bridge the gap between natural language and computational analysis. Here are the fundamental components of NLP:
Fig 1: Components of NLP
2.1 NLU
Natural Language Understanding (NLU) is a critical component of Natural Language Processing (NLP) that focuses on enabling machines to comprehend and interpret human language [11]. NLU involves a series of steps to transform unstructured text or speech data into structured representations that a computer can work with. Here are the seven key steps involved in NLU:
a) Phonology: Phonology deals with the sound structure of language, including phonemes (distinctive speech sounds) and their rules of combination [12]. In NLU, phonological analysis may be relevant in applications like speech recognition and text-to-speech synthesis to handle pronunciation and sound patterns.
b) Lexical: Lexical analysis involves the study of words and their meanings (lexemes) within a language. In NLU, lexical analysis helps in identifying words, their definitions, and semantic associations, which is crucial for tasks like word sense disambiguation and building a lexicon.
c) Syntax: Syntax focuses on the rules governing the structure and arrangement of words in sentences, including aspects like sentence structure, word order, and grammatical relationships. In NLU, syntactic analysis aids in parsing sentences to understand their grammatical structure, which is vital for tasks like parsing, syntactic role labeling, and sentence generation.
d) Morphology: Morphology deals with the structure of words, including their inflections, prefixes, suffixes, and root forms. In NLU, morphological analysis helps in understanding the forms of words and how they change to convey different meanings, aiding tasks like stemming and lemmatization.
e) Semantics: Semantics explores the meaning of words, phrases, and sentences and how they relate to the world. It deals with the interpretation of meaning in context [13]. In NLU, semantic analysis enables the understanding of the meaning of text or speech, including word sense disambiguation, entity recognition, and sentiment analysis.
f) Discourse: Discourse analysis concerns units of text larger than a single sentence, examining how preceding sentences shape the interpretation of the current one and how a text holds together as a coherent whole. In NLU, discourse-level processing supports tasks such as coreference resolution, anaphora resolution, and document-level summarization.
g) Pragmatics: Pragmatics deals with the use of language in context, including the interpretation of implied meaning, presupposition, and speech acts [14]. In NLU, pragmatic analysis helps in understanding the intended meaning, context, and implied information in communication, contributing to tasks like natural language generation and dialogue systems.
These linguistic components collectively contribute to the holistic understanding of natural language by NLU systems. NLU involves integrating information from these levels to comprehend language at both surface and deep semantic levels, enabling machines to extract meaningful information and generate linguistically coherent responses in various NLP applications.
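To make these levels concrete, the following minimal sketch uses the open-source spaCy library and its small English model (en_core_web_sm), both assumed dependencies rather than tools prescribed by this article, to surface the lexical, morphological, and syntactic levels in a single analysis pass.

```python
# A minimal sketch of multi-level linguistic analysis, assuming spaCy is
# installed and the "en_core_web_sm" model has been downloaded
# (pip install spacy; python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The children were reading books quietly.")

for token in doc:
    # token.text   -> lexical level (the word as it appears)
    # token.lemma_ -> morphological level (root form, e.g. "children" -> "child")
    # token.pos_   -> part of speech, an input to syntactic analysis
    # token.dep_   -> syntactic dependency relation to the token's head
    print(f"{token.text:10} lemma={token.lemma_:10} pos={token.pos_:6} dep={token.dep_}")
```

Semantic and pragmatic analysis then build on these lower-level annotations, for example through entity recognition and dialogue-state tracking.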
2.2 NLI
Natural Language Inference (NLI), also known as textual entailment, is a fundamental task in Natural Language Processing (NLP) that focuses on determining the logical relationship between two pieces of text. Specifically, NLI aims to decide whether one piece of text (the “hypothesis”) can be inferred or logically implied by another piece of text (the “premise”) [15]. This task plays a vital role in various NLP applications, including question answering, information retrieval, and automated reasoning. Components of NLI:
a) Premise: The premise is the first piece of text that serves as the basis for the inference. It contains information or statements that are used to assess the logical relationship with the hypothesis.
b) Hypothesis: The hypothesis is the second piece of text, and it represents a claim or statement that needs to be evaluated in the context of the premise. The goal is to determine if the hypothesis can be logically derived or inferred from the premise.
NLI tasks can be categorized into three main classes based on the relationship between the premise and hypothesis:
1) Entailment (E): In entailment, the hypothesis can be logically inferred or implied by the premise. If the premise is true, it logically follows that the hypothesis must also be true. Example:
Premise: “It is raining outside.”
Hypothesis: “The ground is wet.”
Relation: Entailment (The wet ground logically follows from the premise of rain.)
2) Contradiction (C): In contradiction, the hypothesis directly contradicts or is incompatible with the premise. If the premise is true, the hypothesis cannot be true at the same time.
Example:
Premise: “The sky is clear.”
Hypothesis: “It is raining heavily.”
Relation: Contradiction (Clear sky and heavy rain cannot coexist.)
3) Neutral (N): In neutrality, there is no clear logical relationship between the premise and hypothesis. The hypothesis neither logically follows from the premise nor contradicts it. Example:
Premise: “She is reading a book.”
Hypothesis: “She is wearing glasses.”
Relation: Neutral (Reading a book does not entail or contradict wearing glasses.)
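The three-way decision illustrated above can be reproduced with an off-the-shelf model. The sketch below is a hedged illustration, not part of the surveyed work: it assumes the transformers and torch packages are installed and uses the publicly available roberta-large-mnli checkpoint, which is fine-tuned on the MultiNLI corpus.

```python
# A minimal NLI sketch; "roberta-large-mnli" labels premise/hypothesis
# pairs as CONTRADICTION, NEUTRAL, or ENTAILMENT.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

pairs = [
    ("It is raining outside.", "The ground is wet."),       # expect entailment
    ("The sky is clear.", "It is raining heavily."),        # expect contradiction
    ("She is reading a book.", "She is wearing glasses."),  # expect neutral
]

for premise, hypothesis in pairs:
    # The tokenizer encodes the two sentences as a single paired sequence.
    inputs = tok(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    label = model.config.id2label[logits.argmax(dim=-1).item()]
    print(f"{premise!r} / {hypothesis!r} -> {label}")
```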
2.3 NLG
Natural Language Generation (NLG) is a critical component of Natural Language Processing (NLP) that focuses on the generation of human-like text or speech based on structured data or instructions. NLG systems take structured information and convert it into coherent and linguistically correct language output, making it easier for humans to understand and interact with data-driven systems. Key Elements of NLG:
a) Data Input: NLG begins with structured data or information. This data can take various forms, such as databases, spreadsheets, charts, or other structured formats. NLG systems need this structured data as their input.
b) Text Generation Rules: NLG systems employ predefined rules, templates, or algorithms that determine how to convert the structured data into natural language text. These rules are created based on linguistic and contextual knowledge to ensure grammatical and coherent output.
NLG typically involves several steps to generate human-like text:
a) Data Analysis: NLG systems analyze the structured data input to identify relevant information, relationships, and patterns. This step helps the system understand the context and content of the data.
b) Content Planning: Content planning involves deciding what information to include in the generated text and organizing it logically. NLG systems prioritize and structure the content based on user requirements and context.
c) Text Generation: This is the core step where NLG systems generate the actual text. Depending on the approach, NLG can use various techniques, including template-based generation, rule-based generation, or more advanced methods like machine learning-based approaches.
d) Linguistic Realization: In this step, the generated content is converted into grammatically correct and coherent sentences. Linguistic realization involves choosing appropriate words, constructing sentences, and ensuring that the text flows naturally.
e) Surface Generation: Surface generation deals with formatting, punctuation, and stylistic aspects of the text. NLG systems ensure that the generated text adheres to specific writing conventions, such as capitalization, punctuation, and numbering.
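As a toy illustration of steps (a) through (e), the following self-contained Python sketch generates a short weather report from a structured record using template-based generation; the record, templates, and wording are hypothetical examples introduced here, not material from the surveyed literature.

```python
# A minimal template-based NLG sketch; all data and templates are
# illustrative assumptions.
record = {"city": "Chandigarh", "high_c": 31, "low_c": 19, "condition": "sunny"}

def plan_content(rec):
    # Content planning: choose and order the facts worth reporting.
    return [("condition", rec["condition"]),
            ("high", rec["high_c"]),
            ("low", rec["low_c"])]

def realize(rec, plan):
    # Linguistic realization: map each planned fact onto a sentence template.
    templates = {
        "condition": "Expect {v} skies in {city}.",
        "high": "The high will be {v} degrees Celsius.",
        "low": "The low will be {v} degrees Celsius.",
    }
    return [templates[key].format(v=value, city=rec["city"]) for key, value in plan]

def surface(sentences):
    # Surface generation: join the sentences into the final formatted text.
    return " ".join(sentences)

print(surface(realize(record, plan_content(record))))
# -> "Expect sunny skies in Chandigarh. The high will be 31 degrees Celsius. ..."
```

Production NLG systems replace these hand-written templates with statistical or neural generators, but the overall pipeline of analysis, planning, realization, and surface generation remains the same.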
3. NLP: Then & Now
In the years leading up to the 2000s, the field of Natural Language Processing (NLP) underwent significant developments, marked by pioneering research and the exploration of novel approaches. Here, we provide an overview of selected articles that shed light on the state of NLP before the turn of the century.
In the nascent stages of Natural Language Processing (NLP) research, a collection of pioneering articles laid the cornerstones of this dynamic field. Beattie’s 1969 work provided an inaugural glimpse into computer-based NLP, delving into applications within information storage and retrieval. Beyond mere computation, Beattie pondered the enigmatic concept of computer ‘understanding’ of natural language [16]. A decade later, in 1979, G. DeJong’s innovative article introduced a transformative approach to NLP. DeJong’s proposition of integrating the parser with the larger system proved to be groundbreaking. This novel approach allowed the parser to draw upon predictive insights generated by the system during the processing of language. Such insights proved invaluable in resolving reference ambiguities and deciphering word meanings [17]. Meanwhile, Robert F. Simmons’ 1970 review of natural language question-answering systems offered a panoramic view of the early landscape. The article summarized the methods developed for syntactic, semantic, and logical analysis within these systems. Simmons highlighted the development of minimally effective techniques for answering questions posed in subsets of natural language and laid the conceptual foundation for future research in this domain [18]. In the realm of computational complexity, Church’s 1980 article challenged conventional wisdom. Church proposed a hypothesis that questioned the necessity of powerful computational machinery in processing natural language syntax. This article sparked a paradigm shift in the NLP community, encouraging contemplation of more streamlined computational solutions [19].
Venturing into linguistic structures, Bruce’s 1975 article explored the intriguing realm of ‘cases’ within various languages. Bruce scrutinized the relationship of noun phrases to the wider context of sentences, exploring the intricate interplay between surface, deep, and conceptual levels of cases. The article’s discussions on the efficiency of case representations opened doors to a deeper understanding of language structures [20]. J. F. Kelley’s 1983 article introduced a methodology aimed at bridging the gap between NLP applications and user-friendliness. This work championed user-centric design by developing CAL, the Calendar Access Language. This system revolutionized the management of personal calendars for computer-naive users, responding effectively to unconstrained English inputs [21]. Finally, in 1994, Friedman et al. undertook the development of a “General Natural-language Text Processor for Clinical Radiology.” This pioneering work aimed to identify clinical information within narrative reports and map it into a structured representation containing clinical terms. By employing a multi-phased approach, including parsing, regularization, and encoding, the processor achieved impressive recall and precision rates. This endeavor addressed a critical need in the medical domain, underscoring the practical applications of NLP in healthcare and clinical informatics [22].
Collectively, these pioneering articles forged the bedrock of NLP research. They addressed fundamental challenges, from parsing complexities to question answering, and from semantic structures to the intricacies of cases in language. These early contributions continue to resonate in contemporary NLP, shaping its trajectory and inspiring future generations of researchers.
3.1 Current Trends
In the realm of Natural Language Processing (NLP), the 21st century has witnessed an explosion of innovative research and groundbreaking developments. NLP, the domain where machines decode, comprehend, and generate human language, has transformed the way we interact with technology and the vast repositories of textual information available today.
J.B. Michael et al. in 2001 [23] proposed natural-language processing support for policy-driven software systems, introducing a policy workbench and a tool called the natural language input-processing tool (NLIPT) and emphasizing the need to map natural language policy statements into a computational form for policy analysis and maintenance.
Tetsuya Nasukawa and Jeonghee Yi in 2003 introduced a sentiment analysis approach aimed at extracting sentiments and polarities for specific subjects from documents [24]. Their method focused on identifying sentiment expressions within the text and determining whether they indicated positive or negative opinions toward the subject matter. J. Yi et al. in 2003 presented the Sentiment Analyzer (SA), a system designed to extract sentiment or opinion about a subject from online text documents. Instead of classifying entire documents, SA detected references to a given subject and determined sentiment in each reference using natural language processing techniques [25].
I. Androutsopoulos et al. in 1995 provided an introduction to natural language interfaces to databases (NLIDBs) and discussed their advantages and disadvantages compared to other query interfaces. They highlighted linguistic challenges in processing natural language queries and explored various aspects of NLIDB architectures [26]. Ronan Collobert and Jason Weston in 2008 described a unified convolutional neural network architecture that addressed multiple language processing tasks, including part-of-speech tagging, named entity recognition, and language modelling [27]. They emphasized the effectiveness of multitask learning and semi-supervised learning in achieving state-of-the-art performance.
Richard Socher et al. in 2011 introduced a max-margin structure prediction architecture based on recursive neural networks, capable of recovering complex structures in natural scenes and sentences [28]. They demonstrated the versatility of the algorithm in syntactic parsing and semantic scene segmentation. Lanbo She et al. in 2014 explored the task of teaching robots new high-level actions through natural language instructions [29]. They introduced a representation of actions based on desired goal states and demonstrated the robot’s ability to apply newly learned action knowledge to novel situations.
Ewoud Pons et al. in 2016 conducted a systematic review on the application of natural language processing in radiology [30]. They highlighted the potential of NLP in extracting structured information from radiology reports, but also noted challenges in clinical adoption. Shiliang Sun et al. in 2017 provided a comprehensive review of NLP techniques for opinion mining, covering text preprocessing, sentiment analysis, comparative opinion mining, deep learning approaches, and opinion summarization, and highlighting challenges and open problems in the field [31]. Matt Gardner et al. in 2018 introduced AllenNLP, a platform for deep learning in natural language understanding [32]. The article emphasized its flexible data API, high-level abstractions, and reference implementations for various semantic tasks, making it accessible to researchers.
Mark Neumann et al. in 2019 presented scispaCy, a tool for biomedical text processing, focusing on its robustness in handling biomedical and clinical text [33]. They discussed the performance of pre-trained models on various tasks and datasets. Yifan Peng et al. in 2019 introduced the Biomedical Language Understanding Evaluation (BLUE) benchmark, designed to evaluate pre-trained models in the biomedical domain [34]. The benchmark covered multiple tasks and datasets, facilitating research in biomedical NLP.
Andrea Galassi et al. in 2021 presented a unified model for attention architectures in NLP, categorizing attention models based on input representation, compatibility function, distribution function, and multiplicity. They discussed the exploitation of prior information in attention models and ongoing research efforts [35]. Yuexiong Ding et al. in 2022 explored the applications of NLP in the construction industry, emphasizing the need for interdisciplinary research and cross-modal approaches to address data isolation challenges [36].
Chengwei Qin et al. in 2023 empirically analysed ChatGPT’s zero-shot learning ability across 20 NLP datasets, providing insights into its strengths and limitations for various NLP tasks [37]. Salvatore Claudio Fanni et al. in 2023 emphasized the significance of NLP in converting human language into structured data, particularly in the context of radiology and medical informatics [38].
3.2 Applications of NLP
NLP’s dynamism extends its profound impact across various industries and daily life, fostering the development of numerous applications that automate, enhance, and revolutionize our interaction with text and language. Some of the key applications of NLP are:
a) Machine Translation: Machine translation is the process of automatically translating text or speech from one language to another, preserving the meaning while ensuring fluency in the target language. Machine translation has enabled seamless cross-lingual communication. It is widely used for translating web pages, documents, and even in real-time conversations. While it may seem like a straightforward task, the real challenge lies not in merely substituting words from one language with their counterparts in another. Instead, it requires preserving the meaning of sentences, maintaining grammatical accuracy, and respecting verb tenses and linguistic nuances. Google Translate, DeepL, and Microsoft Translator are popular machine translation tools that support a multitude of languages. They empower users to overcome language barriers, whether for travel, business, or research.
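For readers who want to experiment, the sketch below shows one way to run machine translation locally. It assumes the Hugging Face transformers library (with a PyTorch backend) and the public Helsinki-NLP/opus-mt-en-fr English-to-French checkpoint; neither is prescribed by this article, and any comparable translation model would serve.

```python
# A minimal machine-translation sketch using an assumed public checkpoint.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("Natural language processing bridges human language and computation.")
print(result[0]["translation_text"])  # prints the French translation
```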
b) Text Categorization: Text Categorization, a fundamental application of Natural Language Processing (NLP), is tasked with classifying large volumes of text into predefined categories or topics. It is a critical component of information retrieval systems and plays a pivotal role in organizing and making sense of vast textual data. In essence, text categorization involves training machine learning models to recognize patterns and features within textual content that are indicative of specific categories or topics. This process typically starts with a labeled dataset where documents are assigned to their respective categories. The machine learning algorithms then learn from these labeled examples to predict the category of unseen documents accurately.
One of the most prominent applications of text categorization is in news articles. News agencies employ text categorization systems to automatically classify articles into sections such as politics, sports, entertainment, and more. Similarly, email providers utilize this technology to classify incoming emails as spam or legitimate based on their content.
Text categorization has also found applications in sentiment analysis, where it determines the sentiment or emotion conveyed in a piece of text, whether it’s positive, negative, or neutral. This is invaluable for businesses seeking to understand customer feedback and sentiment on products or services. Furthermore, in legal and regulatory domains, text categorization aids in the efficient sorting and retrieval of legal documents, making it easier for legal professionals to access relevant information quickly.
In healthcare, text categorization assists in classifying electronic health records and medical literature into relevant medical categories, facilitating medical research and decision support. As NLP technologies continue to advance, text categorization remains a cornerstone for structuring and organizing vast textual datasets, enabling efficient information retrieval, analysis, and decision-making across various domains.
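The train-then-predict workflow described above can be sketched in a few lines with scikit-learn, an assumed dependency; the tiny labeled dataset and category names below are invented for illustration, whereas a production categorizer would learn from thousands of documents.

```python
# A minimal supervised text-categorization sketch: TF-IDF features
# feeding a linear classifier, a common baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "The government passed a new budget bill today.",
    "The striker scored twice in the final match.",
    "The film premiere drew a huge celebrity crowd.",
    "Parliament debated the election reform proposal.",
    "The team clinched the championship in overtime.",
    "The band announced a world tour next spring.",
]
labels = ["politics", "sports", "entertainment",
          "politics", "sports", "entertainment"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

# Predict the category of an unseen document.
print(clf.predict(["The senator criticized the new tax policy."]))
```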
c) Spam Filtering: Spam Filtering is a vital application of Natural Language Processing (NLP) and machine learning techniques designed to automatically identify and filter out unwanted or unsolicited messages from a user’s communication channels, such as emails, text messages, and comments.
Spam, also known as “unsolicited bulk email” or “junk messages,” includes irrelevant or potentially harmful content sent in large quantities to a wide audience. Spam filtering aims to distinguish legitimate messages from spam, ensuring that users receive only relevant and safe communication.
Applications include filtering out spam emails to keep users’ inboxes free from unwanted messages, phishing attempts, and malware; preventing unsolicited text messages and SMS spam on mobile devices; automatically identifying and removing spam comments, posts, or forum threads on websites and social media platforms; and detecting and reporting spam accounts, posts, or messages on social networking sites.
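Since spam filtering is at heart binary text classification, a classic baseline is a Naive Bayes classifier over word counts. The sketch below uses scikit-learn with a deliberately tiny, invented message set to illustrate the idea.

```python
# A minimal Naive Bayes spam-filter sketch; messages and labels are toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "WIN a FREE prize now, click this link!",
    "Congratulations, you have been selected for a cash reward",
    "Meeting moved to 3 pm, see the agenda attached",
    "Can you review my draft before tomorrow?",
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()  # bag-of-words counts, lowercased by default
X = vectorizer.fit_transform(messages)
model = MultinomialNB().fit(X, labels)

# Classify an unseen message; on this toy data it lands in "spam".
print(model.predict(vectorizer.transform(["Claim your free reward now"])))
```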
d) Information Extraction: Information Extraction is a critical application of Natural Language Processing (NLP) that involves automatically extracting structured information from unstructured text data. Information extraction involves identifying specific pieces of information, such as entities (e.g., names of people, organizations, locations), relationships (e.g., who works where), and events (e.g., conferences, product launches) from unstructured text documents like news articles, research articles, and social media posts. NLP techniques are used to recognize, extract, and structure this information in a machine-readable format.
NLP models can extract key information, such as headlines, dates, locations, and important events, from news articles to generate concise and informative summaries. Extracting structured information from biomedical literature aids researchers in discovering valuable insights, such as relationships between genes, proteins, diseases, and drug interactions.
Companies use NLP-powered information extraction to parse and extract relevant details from job applicants’ resumes, facilitating the hiring process. Virtual assistants extract relevant information from user queries to provide accurate responses, whether it’s finding restaurant recommendations or answering questions about flight details.
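Named entity recognition, one of the extraction steps described above, can be demonstrated with spaCy’s pretrained English pipeline (an assumed dependency); the example sentence is invented for illustration.

```python
# A minimal named-entity-recognition sketch, assuming spaCy and the
# "en_core_web_sm" model are installed.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple opened a new research office in Bangalore in March 2023.")

for ent in doc.ents:
    # Expected labels here include ORG, GPE, and DATE.
    print(ent.text, "->", ent.label_)
```

Relation and event extraction typically build on such entity spans, for example by inspecting the dependency paths that connect them.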
e) Text Summarization: Text Summarization is an essential application of Natural Language Processing (NLP) that aims to condense lengthy documents or texts into shorter versions while retaining their core information and meaning. Text summarization involves automatically generating a concise and coherent summary of a longer text, such as articles, research articles, or documents. It assists users in quickly understanding the main points and key details without reading the entire document.
Text summarization finds utility in various domains and scenarios, such as information retrieval (summarized versions of documents help users quickly identify relevant content during online searches), content generation (summaries can be automatically generated for news articles, research articles, and reports to give readers a brief overview), document management (summarization simplifies the organization of large document collections, making it easier to locate specific information), e-learning (summarized versions of educational materials enhance learning efficiency by presenting essential concepts), and legal work (legal professionals use summarization to review lengthy legal texts and case documents more efficiently).
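A simple extractive approach scores sentences by the frequency of their content words and keeps the top-scoring ones. The following self-contained sketch is a frequency-based baseline written for this overview, not a method claimed by any of the surveyed systems.

```python
# A minimal extractive-summarization sketch using word-frequency scoring.
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    # Naive sentence splitting and word tokenization; adequate for a demo.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    stop = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "that"}
    freq = Counter(w for w in words if w not in stop)

    # Score each sentence by the summed frequency of its content words.
    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Keep the chosen sentences in their original order for readability.
    return " ".join(s for s in sentences if s in top)

article = ("NLP systems summarize documents. Summaries retain core information. "
           "Birds can fly. Document summaries help users read faster.")
print(extractive_summary(article))
```

Abstractive summarization, by contrast, generates new sentences rather than selecting existing ones, typically with neural sequence-to-sequence models.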
3.3 Recent projects
Some of the recent NLP projects:
a) Self-assessment chatbot for COVID-19 prognosis [39]
In 2023, Thwala et al. introduced an innovative approach to chatbot development for COVID-19 symptom assessment using deep learning-based Natural Language Processing (NLP). Their research resulted in a highly efficient chatbot with a rapid response time of 0.3 seconds and an impressive accuracy rate of 95.35%. This recent work by Thwala et al. represents a significant advancement in the field of NLP-driven chatbots, offering improved capabilities in understanding nuanced language elements and enhancing the user experience in self-assessing COVID-19-related symptoms.
b) Lang2LTL [40]
In 2023, Liu et al. introduced Lang2LTL, a pioneering system for translating natural language commands into Linear Temporal Logic (LTL) task specifications for robots. Their innovative approach leverages large pre-trained language models and achieves remarkable accuracy, with an average rate of 88.4% in translating complex LTL formulas in previously unseen environments, surpassing the prior state of the art by a significant margin. The Lang2LTL system’s versatility allows it to interpret natural language navigation commands without the need for additional training, making it a valuable contribution to the field of human-robot interaction and task specification.
c) Framework for Implementation of Personality Inventory Model [41]
In 2023, William et al. introduced a novel framework for Personality Inventory Model implementation in Natural Language Processing (NLP). This research focuses on analysing personality traits based on textual content, particularly interview responses. By leveraging advanced NLP and machine learning techniques, the study automates personality prediction, bridging psychology and computer science. It offers a promising approach to deciphering individual personality profiles, making it a noteworthy contribution to the field.
d) Smart Question Answering System [42]
In 2023, Manjunath et al. introduced a Smart Question Answering System that addresses the limitations of conventional Information Retrieval Systems (IRS) like Google and Yahoo. These traditional systems rely on keyword-based queries, often resulting in long lists of documents, requiring users to manually search for answers. To enhance this process, the Smart Question Answering System accepts natural language questions in specific domains and provides concise answers using Natural Language Processing (NLP). The system incorporates techniques such as pre-processing, normalized term weighting using TF-IDF, cosine similarity for document retrieval, and an improved BM25 ranking function for answer extraction. With an emphasis on reducing response time and providing relevant responses, the system achieved an impressive accuracy rate of 80%, along with a Precision of 93.2%, Recall of 84.3%, and F-measure of 88.5%. This work contributes to the advancement of efficient and accurate question-answering systems in the digital era.
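The retrieval core of such a system, TF-IDF weighting combined with cosine similarity as named above, can be sketched with scikit-learn. The toy passages and query below are invented, and the published system’s preprocessing and improved BM25 re-ranking are not reproduced here.

```python
# A minimal TF-IDF + cosine-similarity retrieval sketch; BM25 re-ranking
# and answer extraction are omitted.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "Python was created by Guido van Rossum and first released in 1991.",
    "The Eiffel Tower is located in Paris, France.",
    "Photosynthesis converts sunlight into chemical energy in plants.",
]

vectorizer = TfidfVectorizer()
passage_matrix = vectorizer.fit_transform(passages)

query = "Who created Python?"
query_vec = vectorizer.transform([query])

# Rank passages by cosine similarity to the query; the top passage is the
# candidate source for answer extraction.
scores = cosine_similarity(query_vec, passage_matrix)[0]
print(passages[scores.argmax()])
```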
e) A role distinguishing Bert model for medical dialogue system [43]
In 2023, Wang et al. introduced a role-distinguishing BERT model for enhancing medical dialogue systems in smart cities. Smart medicine relies on intelligent dialogue systems to provide efficient healthcare services, but the standard BERT model faces limitations in handling role-specific dialogues. To address this, the authors segmented and labelled utterances by dialogue roles, using these segments as inputs. They also replaced the Next Sentence Prediction (NSP) task with the Sentence Order Prediction (SOP) task for improved sentence coherence learning. Their model outperformed ERNIE on online e-commerce datasets, achieving a 1% average accuracy boost across various dialogue system tasks. This research demonstrates the model’s potential in advancing intelligent medical dialogue systems for sustainable smart cities.
f) TextFlint [44]
In 2021, Wang et al. introduced TextFlint, a groundbreaking multilingual robustness evaluation toolkit designed for Natural Language Processing (NLP) tasks. TextFlint offers a comprehensive approach to robustness analysis by incorporating several key features. It provides universal text transformations and task-specific transformations to assess models’ robustness under various linguistic modifications. Additionally, TextFlint includes adversarial attacks to test how well NLP models withstand deliberate manipulation and offers subpopulation analysis to evaluate model performance on different subgroups within datasets. The toolkit allows users to customize evaluations according to their specific needs with minimal code changes. What sets TextFlint apart is its practicality. It not only evaluates models but also provides solutions by generating analytical reports and augmented data to address model shortcomings in terms of robustness. Importantly, all text transformations are grounded in linguistic principles and have been validated through human evaluation.
TextFlint has undergone rigorous large-scale empirical evaluations, with over 67,000 assessments performed on a wide range of deep learning models, supervised methods, and real-world systems. Researchers and practitioners can leverage TextFlint to enhance the robustness of their NLP models, ensuring their effectiveness across diverse linguistic contexts and challenges.
g) A smart System for Fake News Detection [45]
In 2019, Jain et al. presented a Smart System for Fake News Detection using Machine Learning. Given the preference of many smartphone users to consume news via social media, distinguishing authentic news from rumours and misinformation on platforms like WhatsApp, Facebook, Twitter, and microblogs is crucial, particularly in countries like India. To address this issue, the authors developed a model and methodology for detecting fake news. They harnessed Machine Learning and Natural Language Processing techniques, specifically using Support Vector Machine, to aggregate and assess news articles for authenticity. The proposed model achieved an impressive accuracy rate of 93.6%, outperforming existing models. This research contributes significantly to combating the spread of fake news and promoting accurate information in society.
4. Conclusion
This short article has pursued three primary objectives that collectively shed light on the world of Natural Language Processing (NLP) and its applications. The first was to provide insights into the essential terminology of NLP, including Natural Language Understanding (NLU), Natural Language Inference (NLI), and Natural Language Generation (NLG); this foundational knowledge offers those beginning their journey in NLP a strong starting point for understanding the intricacies of the field. The remaining objectives were to trace the historical evolution of NLP and to survey its diverse applications and recent developments. By exploring the rich history and myriad applications of NLP, the article has provided a comprehensive understanding of the field’s growth and potential impact across industries.
References
- Fanni, S. C., Febi, M., Aghakhanyan, G., & Neri, E. (2023). Natural Language Processing. In M. E. Klontzas, S. C. Fanni, & E. Neri (Eds.), Introduction to Artificial Intelligence (pp. 87–99). Springer International Publishing. https://doi.org/10.1007/978-3-031-25928-9_5
- Bengio, Y., Ducharme, R., & Vincent, P. (2000). A neural probabilistic language model. Advances in neural information processing systems, 13.
- RAVN Systems. (2018, June 26). RAVN Systems launch the ACE powered GDPR robot – artificial intelligence to expedite GDPR compliance. PR Newswire. https://www.prnewswire.com/news-releases/ravn-systems-launch-the-ace-powered-gdpr-robot—artificial-intelligence-to-expedite-gdpr-compliance-616006113.html
- Here’s why natural language processing is the future of BI. (2022, March 18). Sisense. https://www.sisense.com/blog/heres-natural-language-processing-future-bi/
- Hirschman, L., Grishman, R., & Sager, N. (1976). From text to structured information: Automatic processing of medical reports. In Proceedings of the June 7–10, 1976, National Computer Conference and Exposition (pp. 267–275). ACM.
- Ogallo, W., & Kanter, A. S. (2017, February 10). Using natural language processing and network analysis to develop a conceptual framework for Medication Therapy Management Research. AMIA … Annual Symposium proceedings. AMIA Symposium. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5333323/
- Parupalli, S., Rao, V. A., & Mamidi, R. (2018). BCSAT: A Benchmark Corpus for Sentiment Analysis in Telugu Using Word-level Annotations (arXiv:1807.01679). arXiv. http://arxiv.org/abs/1807.01679
- Choudhary, N. (2021). LDC-IL: The Indian repository of resources for language technology. Language Resources and Evaluation, 55(3), 855-867.
- Marcus, M. P., Marcinkiewicz, M. A., & Santorini, B. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
- Liu, Y., & Zhang, M. (2018). Neural network methods for natural language processing.
- Suen, C. Y. (1979). N-Gram Statistics for Natural Language Understanding and Text Processing. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2), 164–172. https://doi.org/10.1109/TPAMI.1979.4766902
- Hyman, L. (1985). Roger Lass (1984). Phonology: An introduction to basic concepts. (Cambridge Textbooks in Linguistics). Phonology, 2. https://doi.org/10.1017/S0952675700000506
- Liddy, E. (2001). Natural Language Processing. School of Information Studies – Faculty Scholarship. https://surface.syr.edu/istpub/63
- Walton, D. (1996). A Pragmatic Synthesis. In: Fallacies Arising from Ambiguity. Applied Logic Series, vol 1. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-8632-0_8
- Wang, S., & Jiang, J. (2016). Learning Natural Language Inference with LSTM (arXiv:1512.08849). arXiv. https://doi.org/10.48550/arXiv.1512.08849
- Beattie, J. D. (1969). Natural language processing by computer. International Journal of Man-Machine Studies, 1(3), 311–329. https://doi.org/10.1016/S0020-7373(69)80026-X
- DeJong, G. (1979). Prediction and substantiation: A new approach to natural language processing. Cognitive Science, 3(3), 251–273. https://doi.org/10.1016/S0364-0213(79)80009-9
- Simmons, R. F. (1970). Natural language question-answering systems: 1969. Communications of the ACM, 13(1), 15-30.
- Church, K. W. (1980). On memory limitations in natural language processing.
- Bruce, B. (1975). Case systems for natural language. Artificial Intelligence, 6(4), 327–360. https://doi.org/10.1016/0004-3702(75)90020-X
- Kelley, J. F. (1983). An empirical methodology for writing user-friendly natural language computer applications. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 193–196. https://doi.org/10.1145/800045.801609
- Friedman, C., Alderson, P. O., Austin, J. H., Cimino, J. J., & Johnson, S. B. (1994). A general natural-language text processor for clinical radiology. Journal of the American Medical Informatics Association, 1(2), 161-174.
- Michael, J. B., Ong, V. L., & Rowe, N. C. (2001, July). Natural-language processing support for developing policy-governed software systems. In Proceedings 39th International Conference and Exhibition on Technology of Object-Oriented Languages and Systems. TOOLS 39 (pp. 263-274). IEEE.
- Nasukawa, T., & Yi, J. (2003). Sentiment analysis: Capturing favorability using natural language processing. Proceedings of the 2nd International Conference on Knowledge Capture, 70–77. https://doi.org/10.1145/945645.945658
- Yi, J., Nasukawa, T., Bunescu, R., & Niblack, W. (2003, November). Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. In Third IEEE international conference on data mining (pp. 427-434). IEEE.
- Androutsopoulos, I., Ritchie, G. D., & Thanisch, P. (1995). Natural language interfaces to databases – an introduction. Natural Language Engineering, 1(1), 29–81. https://doi.org/10.1017/S135132490000005X
- Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. Proceedings of the 25th International Conference on Machine Learning, 160–167. https://doi.org/10.1145/1390156.1390177
- Socher, R., Lin, C. C., Manning, C., & Ng, A. Y. (2011). Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th international conference on machine learning (ICML-11) (pp. 129-136).
- She, L., Cheng, Y., Chai, J. Y., Jia, Y., Yang, S., & Xi, N. (2014). Teaching Robots New Actions through Natural Language Instructions. The 23rd IEEE International Symposium on Robot and Human Interactive Communication, 868–873. https://doi.org/10.1109/ROMAN.2014.6926362
- Pons, E., Braun, L. M., Hunink, M. M., & Kors, J. A. (2016). Natural language processing in radiology: a systematic review. Radiology, 279(2), 329-343.
- Sun, S., Luo, C., & Chen, J. (2017). A review of natural language processing techniques for opinion mining systems. Information Fusion, 36, 10–25. https://doi.org/10.1016/j.inffus.2016.10.004
- Gardner, M., Grus, J., Neumann, M., Tafjord, O., Dasigi, P., Liu, N., Peters, M., Schmitz, M., & Zettlemoyer, L. (2018). AllenNLP: A Deep Semantic Natural Language Processing Platform (arXiv:1803.07640). arXiv. https://doi.org/10.48550/arXiv.1803.07640
- Neumann, M., King, D., Beltagy, I., & Ammar, W. (2019). ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing. Proceedings of the 18th BioNLP Workshop and Shared Task, 319–327. https://doi.org/10.18653/v1/W19-5034
- Peng, Y., Yan, S., & Lu, Z. (2019). Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets (arXiv:1906.05474). arXiv. https://doi.org/10.48550/arXiv.1906.05474
- Galassi, A., Lippi, M., & Torroni, P. (2021). Attention in natural language processing. IEEE Transactions on Neural Networks and Learning Systems, 32(10), 4291–4308. https://doi.org/10.1109/TNNLS.2020.3019893
- Ding, Y., Ma, J., & Luo, X. (2022). Applications of natural language processing in construction. Automation in Construction, 136, 104169. https://doi.org/10.1016/j.autcon.2022.104169
- Qin, C., Zhang, A., Zhang, Z., Chen, J., Yasunaga, M., & Yang, D. (2023). Is ChatGPT a General-Purpose Natural Language Processing Task Solver? (arXiv:2302.06476). arXiv. https://doi.org/10.48550/arXiv.2302.06476
- Fanni, S. C., Febi, M., Aghakhanyan, G., & Neri, E. (2023). Natural Language Processing. In M. E. Klontzas, S. C. Fanni, & E. Neri (Eds.), Introduction to Artificial Intelligence (pp. 87–99). Springer International Publishing. https://doi.org/10.1007/978-3-031-25928-9_5
- Thwala, E., Adegun, A., & Adigun, M. (2023). Self-Assessment Chatbot for COVID-19 prognosis using Deep Learning-based Natural Language Processing (NLP). 2023 International Conference on Science, Engineering and Business for Sustainable Development Goals (SEB-SDG), 1, 1–8. https://doi.org/10.1109/SEB-SDG57117.2023.1012461
- Liu, J. X., Yang, Z., Idrees, I., Liang, S., Schornstein, B., Tellex, S., & Shah, A. (2023). Lang2LTL: Translating Natural Language Commands to Temporal Robot Task Specification (arXiv:2302.11649). arXiv. https://doi.org/10.48550/arXiv.2302.11649
- William, P., N, Y., Tidake, V. M., Sumit Gondkar, S., R, Chetana., & Vengatesan, K. (2023). Framework for Implementation of Personality Inventory Model on Natural Language Processing with Personality Traits Analysis. 2023 International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), 625–628. https://doi.org/10.1109/IDCIoT56793.2023.10053501
- Manjunath, T. N., Yogish, D., Mahalakshmi, S., & Yogish, H. K. (2023). Smart question answering system using vectorization approach and statistical scoring method. Materials Today: Proceedings, 80, 3719–3725. https://doi.org/10.1016/j.matpr.2021.07.369
- Wang, S., Wang, S., Liu, Z., & Zhang, Q. (2023). A role distinguishing Bert model for medical dialogue system in sustainable smart city. Sustainable Energy Technologies and Assessments, 55, 102896. https://doi.org/10.1016/j.seta.2022.102896
- Wang, X., Liu, Q., Gui, T., Zhang, Q., Zou, Y., Zhou, X., Ye, J., Zhang, Y., Zheng, R., Pang, Z., Wu, Q., Li, Z., Zhang, C., Ma, R., Fei, Z., Cai, R., Zhao, J., Hu, X., Yan, Z., … Huang, X. (2021). TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, 347–355. https://doi.org/10.18653/v1/2021.acl-demo.41
- Jain, A., Shakya, A., Khatter, H., & Gupta, A. K. (2019). A smart System for Fake News Detection Using Machine Learning. 2019 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT), 1, 1–4. https://doi.org/10.1109/ICICT46931.2019.8977659
Cite As
Mehra S (2024) Unravelling the Tapestry of Natural Language Processing: A Journey from Past to Present and Beyond, Insights2Techinfo, pp.1