Sentiment Analysis of COVID-19 Twitter Data Using LSTM and BERT Techniques

A Project Report submitted to
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY ANANTAPUR
in Partial Fulfillment of the Requirements for the Award of the degree of
BACHELOR OF TECHNOLOGY IN COMPUTER SCIENCE AND SYSTEMS ENGINEERING

Submitted by
G ASHISH KUMAR (19121A1539)
A THARUN SAI (19121A1504)
G JASHWANTH (19121A1531)
B SAI SATWIK (19121A1514)
K SREERANGA (19121A1561)

Under the Guidance of
Dr. P. Dhanalakshmi
Professor

Department of Computer Science and Systems Engineering
Sree Sainath Nagar, Tirupati – 517 102
(2022-2023)

DEPARTMENT OF COMPUTER SCIENCE AND SYSTEMS ENGINEERING

CERTIFICATE

This is to certify that the project report entitled “Sentiment Analysis of COVID-19 Twitter Data Using LSTM and BERT Techniques” is the bonafide work done by

G ASHISH KUMAR (19121A1539)
A THARUN SAI (19121A1504)
G JASHWANTH (19121A1530)
B SAI SATWIK (19121A1514)
K SREERANGA (19121A1561)

in the Department of Computer Science and Systems Engineering, and submitted to Jawaharlal Nehru Technological University Anantapur, Ananthapuramu in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Systems Engineering during the academic year 2022-2023. This work has been carried out under my supervision. The results of this project work have not been submitted to any university for the award of any degree.

Guide: Dr. P. Dhanalakshmi, Professor, Dept. of CSSE
Head: Dr. K. Ramani, Professor & Head, Dept. of CSSE

INTERNAL EXAMINER
EXTERNAL EXAMINER

DECLARATION

We hereby declare that this project report titled “Sentiment Analysis of COVID-19 Twitter Data Using LSTM and BERT Techniques” is a genuine work carried out by us, in the B.Tech (Computer Science and Systems Engineering) degree course of Jawaharlal Nehru Technological University Anantapur, and has not been submitted to any other course or University for the award of any degree by us.
We declare that this written submission represents our ideas in our own words and where others' ideas or words have been included, we have adequately cited and referenced the original sources. We also declare that we have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea / data / fact / source in our submission. We understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.

Signature of the students
1.
2.
3.
4.
5.

ACKNOWLEDGEMENT

We are extremely thankful to our beloved Chairman and Founder Dr. M. Mohan Babu, Padma Shri awardee, who took keen interest and encouraged us in every effort throughout this course.

We owe our gratitude to Dr. B. M. Satish, Principal, Sree Vidyanikethan Engineering College, for permitting us to use the facilities available to accomplish the project successfully.

We express our heartfelt thanks to Dr. K. Ramani, Professor and Head, Department of Computer Science and Systems Engineering, for her kind attention and valuable guidance to us throughout this course.

We are thankful to our Project Coordinator Mr. M. Ramu, Assistant Professor of CSSE, for his valuable support and guidance throughout the project work.

We are extremely thankful to our project supervisor Dr. P. Dhanalakshmi, Professor of CSSE, who took keen interest and encouraged us in every effort throughout this project.

We also thank all the teaching and non-teaching staff of the Computer Science and Systems Engineering department for their cooperation.

We would like to thank our parents and friends who have extended their help and encouragement either directly or indirectly in completion of our project work.
Abstract

Sentiment analysis is a crucial task in understanding public opinion and perception towards a particular event or topic. The COVID-19 pandemic has greatly affected the world, and understanding public sentiment towards it is crucial for policymakers and organizations. In this project, sentiment analysis models based on the Long Short-Term Memory (LSTM) and BERT (Bidirectional Encoder Representations from Transformers) architectures were used to classify COVID-19 related tweets in three categories: work from home (WFH), online learning and economy. Twitter data was collected using relevant keywords and hashtags related to COVID-19, such as WFH, Economy and Online Learning. The tweets were then tokenized and embedded using BERT, which provides a rich representation of the text by capturing contextual information. These embeddings were then passed to a fully connected layer that classifies the sentiment of the text. An LSTM model was also used to classify the same tweets. The major reason for choosing LSTM and BERT for sentiment analysis over traditional machine learning algorithms is their ability to handle large datasets and long-term contextual dependencies.

Experimental results show that the BERT model achieved an accuracy of 0.78, 0.85 and 0.92 on the Economy, WFH and Online Learning datasets respectively, whereas LSTM achieved an accuracy of 0.71, 0.76 and 0.81 on the same datasets. The BERT model therefore outperformed the LSTM model in terms of accuracy on all three datasets. The high accuracy score demonstrates the effectiveness of the BERT model in understanding public sentiment towards the ongoing pandemic. The BERT model can be applied to other real-time public opinion analysis tasks and can provide valuable insights for decision-making. The results also indicate that BERT is a better choice than LSTM for this specific task of sentiment analysis on Twitter data.

Table of Contents
Acknowledgement
Abstract
Table of contents
List of figures
List of tables
List of abbreviations
Chapter 1: Introduction
    1.1. Introduction of the topic
    1.2. Problem statement
    1.3. Motivation
    1.4. Objectives
    1.5. Organization of thesis
Chapter 2: Review of literature
Chapter 3: Methodology
    3.1. Existing methods
    3.2. Proposed methods
Chapter 4: Results and discussion
Chapter 5: Conclusion
Chapter 6: Future Work
References

List of Figures

S.No  Figure Name
1.1  Different types of Emotions (Positive, Neutral, Negative)
1.2  Working procedure of Sentiment Analysis
3.1  Decision tree Algorithm for sentiment analysis
3.2  Random Forest Algorithm for sentiment analysis
3.3  Logistic Regression Algorithm for sentiment analysis
3.4  Naive Bayes Algorithm for sentiment analysis
3.5  Support vector machine (SVM) Algorithm for sentiment analysis
3.6  A sequence of LSTM Memory cells
3.7  Internal representation of a single LSTM memory cell
3.8  LSTM Algorithm for Sentiment analysis
3.9  BERT Algorithm for Sentiment analysis
3.10  Flow of sentiment analysis using BERT
3.11  Stack of Encoders in a BERT model
3.12  Internal components of a single encoder
3.13  BERT Tokenizer working example
4.1  Comparison of Performance metrics for Economy dataset
4.2  Comparison of Performance metrics for WFH dataset
4.3  Comparison of Performance metrics for Online Learning dataset

List of Tables

S.No  Table Name
4.1  Performance metrics for WFH Dataset (Logistic Regression)
4.2  Performance metrics for WFH Dataset (LSTM)
4.3  Performance metrics for WFH Dataset (BERT)
4.4  Performance metrics for Economy Dataset (Logistic Regression)
4.5  Performance metrics for Economy Dataset (LSTM)
4.6  Performance metrics for Economy Dataset (BERT)
4.7  Performance metrics for Online Learning Dataset (Logistic Regression)
4.8  Performance metrics for Online Learning Dataset (LSTM)
4.9  Performance metrics for Online Learning Dataset (BERT)
4.10  Comparison of Accuracy of LSTM and BERT on three different datasets
4.11  Performance metrics for Economy dataset
4.12  Performance metrics for WFH dataset
4.13  Performance metrics for Online Learning dataset

List of Abbreviations

LSTM  Long Short-Term Memory
BERT  Bidirectional Encoder Representations from Transformers
WFH  Work From Home

1. INTRODUCTION

1.1. SENTIMENT ANALYSIS OF ONLINE DATA

Online data refers to any type of information that is available on the internet. This includes data from social media platforms, websites, mobile apps, and other online sources. The importance of online data and social media in today's world cannot be overstated. In the past decade the amount of data generated and stored online has grown exponentially, and this trend is expected to continue. The amount of data generated by individuals and businesses alike is staggering, with over 2.5 quintillion bytes of data created every day. With the rise of big data, the potential for organizations to gain insights and make data-driven decisions has never been greater.

The use of online data has become increasingly important in a wide range of industries, from marketing and advertising to healthcare and finance. For example, online data can be used to gain insights into customer behavior and preferences, allowing businesses to target their advertising more effectively and increase conversions.
In healthcare, online data can be used to track and analyze patient data, leading to more personalized and effective treatments. In finance, online data can be used to identify and prevent fraudulent activities, and to make more informed investment decisions.

There are two main types of online data: structured and unstructured. In the context of sentiment analysis, structured data refers to a piece of text that follows a predefined sentence ordering. This means that the text is organized in a specific format that can be easily understood by the machine learning model and easily processed for sentiment analysis.

Among online data sources, Twitter is one of the most widely used social media platforms. It has become a valuable source of information for monitoring public opinion on various topics. The platform allows users to express their thoughts and feelings in real time, providing a wealth of data that can be used to understand public sentiment. During the COVID-19 pandemic, this data can be used to monitor public opinion on various aspects of the crisis, such as the effectiveness of government response, the impact on the economy and individual experiences with the virus. It can also be used to understand the impact of COVID-19 on mental health and well-being. Given the scale of the pandemic and the high volume of tweets generated about it, Twitter data provides a large, diverse and up-to-date dataset for monitoring public opinion.

Facebook is the largest social media platform, with over 2.8 billion monthly active users. It provides a wealth of information on individuals, including their demographics, interests, and purchasing habits. Additionally, Facebook provides businesses with a platform to interact with their customers and gain valuable insights. For example, a business can use Facebook to conduct surveys, gather feedback, and track customer sentiment. Data from Twitter and Facebook may be available in both structured and unstructured form.
Sentiment analysis is the process of identifying and extracting subjective information from text, often in order to determine the overall sentiment or emotion conveyed. Structured data in sentiment analysis typically refers to the use of pre-existing, organized data sets, such as customer reviews or social media posts, to train and test sentiment analysis algorithms. This structured data can include numerical ratings, labels, or categories that indicate the overall sentiment of a piece of text, which can be used as the ground truth for training machine learning models.

For example, structured data in sentiment analysis could be a set of customer reviews for a product, where each review follows a predefined format such as: "I liked the product because [reason], but I didn't like [reason]." This format makes it easy for the machine learning model to understand the sentiment of the review and to identify the specific aspects of the product that the customer liked or didn't like.

An example of structured data in sentiment analysis:
E.g.: "Camera is a good for photographers", with its words labelled by part of speech (Noun, Adjective, Adverb).

Unstructured data in sentiment analysis refers to text data that is not pre-organized or labelled, such as tweets, blog posts, or news articles. Unstructured data can be more difficult to work with than structured data because it lacks clear categories or labels that indicate the overall sentiment. However, unstructured data can also be more representative of real-world text data, as it is not constrained by the biases that may be present in a pre-labelled data set. To analyse unstructured data, natural language processing techniques are used to extract features such as word count, part-of-speech tags, and named entities, which can then be used as input for machine learning models. These models can be trained on a labelled data set to identify sentiment, and can then be applied to new, unlabelled data to make predictions.
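The train-then-predict workflow described above can be illustrated with a minimal sketch. It assumes word counts as the only features (no part-of-speech tags or named entities) and uses a small multinomial Naive Bayes classifier with Laplace smoothing as a stand-in model; the class name TinyNaiveBayes and the four training sentences are hypothetical and are not taken from the project's datasets.

```python
import math
from collections import Counter, defaultdict


def extract_features(text):
    """Bag-of-words features: how often each lowercased word occurs."""
    return Counter(text.lower().split())


class TinyNaiveBayes:
    """Illustrative multinomial Naive Bayes with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            counts = extract_features(text)
            self.word_counts[label].update(counts)
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word, count in extract_features(text).items():
                # Add-one smoothing so unseen words do not zero the score.
                p = (self.word_counts[label][word] + 1) / (
                    total_words + len(self.vocab)
                )
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled data, for illustration only.
train_texts = [
    "i love working from home",
    "online classes are great",
    "i hate online classes",
    "the economy is terrible",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = TinyNaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("i love online classes")  # new, unlabelled text
```

With so little training data the predictions only show the mechanics: the model is fitted on labelled text once and then applied to text it has not seen.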
Overall, while working with unstructured data in sentiment analysis can be more challenging, it is also more flexible and allows for a more generalizable approach.

An example of unstructured data in sentiment analysis when working with tweets would be a collection of tweets that are not pre-organized or labelled in any way. For example, you could collect tweets by searching for a specific keyword or hashtag, such as "#ClimateChange", without any additional information about the sentiment of the tweets or the context in which they were written. To analyze this unstructured data, you would need to use natural language processing techniques to extract features such as word count, part-of-speech tags, and named entities, which can then be used as input for machine learning models.

Example of unstructured data:
“I bought this for my 4-year-old kid since he absolutely enjoys Rescue Heroes, and it seemed like it would have some fun for him. What the description will not inform you is that you can't decide for a while in sense which activities you want to play. Ok, better of it.”

Unlike structured data, unstructured data in sentiment analysis does not follow a predefined format and can be harder for a machine learning model to process and understand. For free-form text written by a person with no specific format, the sentiment analysis model has a harder time identifying the sentiment, and more advanced natural language processing techniques are required to extract meaning from the text.

On Twitter or any other social media platform, tone can be expressed through text or emoji, and natural language processing (NLP) is needed to process it.

[Figure: three emoji labelled Angry, No emotion, Happy]

Figure 1.1.
Different types of Emotions (Positive, Neutral, Negative)

In Fig. 1.1, the red emoji refers to an angry tone in the tweet, the blue emoji refers to a tone that is neither angry nor happy, and the yellow emoji refers to a positive tone. Such emotion categories are widely used to understand the attitudes, opinions, and emotions of people on social media platforms. In politics, sentiment analysis can be used to track public opinion and predict election outcomes. In finance, it can be used to predict stock prices and analyze customer sentiment about a company or product. Sentiment analysis can also provide valuable insights into public opinion on the pandemic, and help policymakers and healthcare professionals to make informed decisions.

The COVID-19 pandemic has had a profound impact on the world, affecting nearly every aspect of daily life. One of the most affected areas has been healthcare, as the virus has caused an unprecedented strain on healthcare systems around the globe. The pandemic has also had a significant impact on the economy, education, and social interactions. Given this impact, it is crucial to understand public opinion and sentiment towards the pandemic in order to inform policy decisions and mitigate its effects. This can be achieved by analyzing the vast amount of data generated on social media platforms, where individuals express their thoughts, feelings and experiences with the pandemic.

Sentiment analysis can be traced back to the 1960s, when early methods mainly relied on dictionary-based techniques. These methods used a predefined list of positive and negative words to determine the sentiment of a piece of text. However, they were limited by the size of the dictionaries and the subjectivity of language, which made them less accurate. With the advent of the internet and the explosion of social media, sentiment analysis began to gain popularity in the 2000s.
The abundance of text data and the need for automated methods to process it led to the development of more advanced techniques. In the late 2000s and early 2010s, machine learning-based sentiment analysis emerged as a popular approach. These methods used supervised learning algorithms to train models on labelled data, allowing them to identify sentiment without predefined rules. As a result, they were able to achieve higher levels of accuracy than dictionary-based techniques.

Sentiment analysis can also be used to monitor mental health and well-being, by tracking changes in public sentiment towards the pandemic over time. It is also increasingly used to monitor and analyze social media data, such as tweets, comments, and reviews, to gain insights into public opinion on various topics. The explosion of social media data and the increasing importance of understanding public opinion have led to a growing interest in sentiment analysis in recent years.

Performing sentiment analysis on the economy, online learning, and work from home (WFH) datasets taken from Twitter during the COVID-19 pandemic has several uses. It can help businesses and governments identify key concerns, monitor public opinion, improve customer service, plan marketing strategies, and track trends over time. By analyzing sentiment, businesses and governments can make informed decisions, adjust policies, and address customer concerns.

There are several types of sentiment analysis, each with its own characteristics and applications:
• Aspect-based Sentiment Analysis: Sometimes referred to as fine-grained sentiment analysis, this is concerned with analysing the sentiment towards particular elements or features of a text rather than the overall sentiment, for instance detecting the attitude towards a certain product feature in a customer review.
• Binary Sentiment Analysis: This method classifies a text's sentiment as either positive or negative.
It is one of the most basic forms of sentiment analysis.
• Multi-class Sentiment Analysis: This technique divides a text's sentiment into a number of specified classes, such as positive, negative, and neutral.
• Emotion Analysis: This kind of sentiment analysis focuses on locating and categorising emotions in a text, including joy, sorrow, rage, and fear.
• Sarcasm Detection: Sarcasm is a type of irony, and it can be difficult to tell when someone is being sarcastic. Sarcasm detection is a particular branch of sentiment analysis that focuses on spotting irony in written text.
• Stance Detection: This is a specific type of sentiment analysis whose goal is to identify the text's stance towards a particular subject or entity.
• Cross-lingual Sentiment Analysis: This method of sentiment analysis is concerned with examining and comprehending sentiment in several languages. As social media has become more prevalent, there is a greater need for sentiment analysis in a variety of languages.

There are several ways that sentiment analysis is used in social media:

Brand monitoring: Companies use sentiment analysis to track public opinion about their brand, products, and services on social media. This can be used to identify areas for improvement and to measure the effectiveness of marketing campaigns.

Product reviews: Many online retailers use sentiment analysis to automatically classify customer reviews as positive, negative, or neutral. This can be used to provide a quick overview of the general sentiment of customers towards a product and to identify areas for improvement.

Crisis management: During a crisis, sentiment analysis can be used to track public opinion and sentiment on social media in real time, which can help organizations to understand the impact of the crisis and to respond quickly.

Campaign analysis: Political and marketing campaigns use sentiment analysis to track public opinion on their messages, candidates, and campaigns.
This allows them to adjust their strategies and messaging in real time based on public sentiment.

Social listening: Sentiment analysis allows businesses and organizations to "listen" to the conversations happening on social media, which can give them valuable insights into the opinions and interests of their target audiences.

Numerous types of data, including social media data, news stories, and scientific literature, have been analysed using sentiment analysis in relation to the COVID-19 pandemic. The value of sentiment analysis on COVID-19 data lies in its capacity to offer insights into how people are feeling and responding to the pandemic in real time, which can help with decision-making and influence policy. There are several ways that sentiment analysis has been used to analyze COVID-19 data:

Tracking public opinion: Sentiment analysis has been used to track public opinion on various aspects of the pandemic, such as government response, vaccine development, and mask-wearing. This allows researchers and policymakers to understand how people are feeling and reacting to the pandemic and to make informed decisions.

Identifying misinformation: Sentiment analysis has been used to identify misinformation about COVID-19 on social media platforms. This is important as misinformation can lead to confusion and mistrust, which can hamper efforts to control the pandemic.

Monitoring mental health: Sentiment analysis has been used to monitor mental health and well-being during the pandemic by analyzing social media data. Research has shown that social media data can be used to identify individuals who may be at risk for mental health issues, such as depression and anxiety.

Evaluating the effectiveness of campaigns: Sentiment analysis has been used to evaluate the effectiveness of public health campaigns and messaging, such as campaigns to promote mask-wearing and social distancing.
Analyzing scientific literature: Sentiment analysis has been used to analyze the sentiment of scientific literature related to COVID-19. This can be useful in understanding how various COVID-19 research is received and can provide insights into which research is considered more important, relevant and well-received.

Monitoring the impact of the pandemic on businesses and the economy: Sentiment analysis has been used to monitor the impact of the pandemic on businesses and the economy by tracking public opinion on topics such as job loss and economic recovery.

1.1.1. Functioning of sentiment analysis

Sentiment analysis relies on natural language processing (NLP), machine learning (ML), and dedicated algorithms. Beyond this, building a customized model can improve precision and outcomes. With the aid of AI algorithms, a diverse range of sentiments, both broad and specific, can be identified. To gain a comprehensive understanding of people's sentiment towards a topic, it is necessary to employ a customized sentiment analysis model.

1.1.2. The process of how Sentiment Analysis works:

Data collection refers to the process of gathering and compiling a dataset of text samples (e.g. tweets, reviews, forum posts) that will be used to train and test a sentiment analysis model. The dataset should be representative of the type of text that the model will be used to analyze in production, and should be labelled with the appropriate sentiment (e.g. positive, neutral, negative). Data can be collected manually or using web scraping tools. It is important to ensure that the collected data is unbiased and diverse, to avoid any unintended bias in the model.

Pre-processing in sentiment analysis refers to the process of cleaning, transforming, and preparing the text data for use in training a sentiment analysis model. The pre-processing steps may include:
• Removing unwanted characters, such as punctuation, special characters, and HTML tags.
• Lowercasing all text, to ensure that the model does not treat the same word in different cases as different words.
• Removing stop words, which are common words such as "the", "and", "is", that do not carry much meaning in the context of sentiment analysis.
• Tokenization, which is the process of breaking down the text into individual words or phrases.
• Stemming or lemmatization, which is the process of reducing words to their base form to reduce the dimensionality of the dataset.
• Removing any remaining unwanted text, such as numbers, URLs, and email addresses.
• Removing any duplicate data.

Figure 1.2. Working Procedure of Sentiment Analysis

To keep the data consistent, the same pre-processing steps should be applied to both the training and the testing dataset.

Converting words to tokens in sentiment analysis refers to breaking text into units, called tokens, that can then be mapped to numerical values and used as input to a machine learning model. One common way to do this is through the use of a tokenizer. A tokenizer is a tool that takes a string of text and breaks it down into individual words or phrases, called tokens. There are different types of tokenizers, such as word-level tokenizers and character-level tokenizers. Word-level tokenizers break down text into individual words, while character-level tokenizers break down text into individual characters.

Once the text is tokenized, the resulting tokens can be converted into numerical values using a technique called word embedding. Word embedding is the process of representing each token as a high-dimensional vector of numbers. This is done by mapping each token to a point in a high-dimensional space, where tokens that have similar meanings are located closer to each other. There are several popular methods for word embedding.
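The cleaning and tokenization steps listed above can be sketched as follows. This is a minimal, assumption-laden example rather than the pipeline used in the project: the function names (clean_text, tokenize, deduplicate, build_vocab, encode) are hypothetical, the regular expressions are deliberately simple, and the cleaning step assumes English text because it keeps only the letters a to z.

```python
import re


def clean_text(text):
    """Lowercase and strip HTML tags, URLs, email addresses and non-letters."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"\S+@\S+\.\S+", " ", text)           # email addresses
    text = re.sub(r"[^a-z\s]", " ", text)               # punctuation, digits, symbols
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace


def tokenize(text):
    """Word-level tokenizer: clean the text, then split on whitespace."""
    return clean_text(text).split()


def deduplicate(corpus):
    """Drop texts that are identical after cleaning, keeping first occurrences."""
    return list(dict.fromkeys(clean_text(doc) for doc in corpus))


def build_vocab(corpus):
    """Map every token to an integer id; 0 and 1 are reserved."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for doc in corpus:
        for token in tokenize(doc):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab):
    """Turn text into token ids, using <unk> for words not in the vocabulary."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokenize(text)]


# Hypothetical tweets, for illustration only.
raw = ["I love WFH!", "i love wfh", "I hate lockdown"]
corpus = deduplicate(raw)
vocab = build_vocab(corpus)
ids = encode("I love lockdown :)", vocab)
```

The same clean_text and vocab would be reused for the training and the testing data so that both are processed identically. The integer ids produced by encode are what an embedding layer would then map to vectors using one of the word-embedding methods.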
These methods are trained on large datasets and are able to capture complex relationships between words and their meanings. After the conversion of words to tokens, it is possible to use this numerical representation of the text as input to a machine learning model for sentiment analysis.

Words that are commonly used in a language and do not add significant meaning in the context of sentiment analysis are called "stop words", such as "the", "and", "is", "a", "an", "of", "to", etc. Identifying stop words in the data can help to reduce the dimensionality of the dataset and improve the efficiency of the model. There are several ways to identify stop words in data:
• Using a predefined list of stop words: Some libraries and frameworks, such as NLTK and spaCy, come with a predefined list of stop words for several languages. These lists can be used to filter out stop words from the dataset.
• Identifying common words: By analyzing the dataset and identifying the most common words, it is possible to identify stop words and remove them from the dataset.
• Using a machine learning algorithm: Some machine learning algorithms, such as Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA), can be used to identify stop words and remove them from the dataset.

It is important to note that removing stop words from the dataset may not always be beneficial for sentiment analysis, as some stop words could carry important sentiment. Therefore, it is advisable to experiment with removing stop words and evaluate the impact on the model's performance.

Stemming words in sentiment analysis is the process of reducing words to their base form (stem) in order to reduce the dimensionality of the dataset and improve the efficiency of the model. This can be done by removing inflectional endings from words, such as "-ing", "-ed", "-s", etc. For example, the stem of the word "running" is "run", and a more aggressive stemmer may also reduce the word "runner" to "run".
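Stop-word filtering and stemming as described above can be sketched in a few lines. The stop-word set below is only the handful of words quoted in the text, not a full list from NLTK or spaCy, and toy_stem is a hypothetical, deliberately crude suffix stripper written for this illustration. It is not the Porter algorithm, and it will mangle many real words.

```python
# The stop words quoted in the text; real lists are much longer.
STOP_WORDS = {"the", "and", "is", "a", "an", "of", "to"}

# Suffixes tried in order; "er" is included only to reproduce the
# "runner" -> "run" example and is an assumption of this sketch.
SUFFIXES = ("ing", "ed", "er", "s")


def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop-word set."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]


def toy_stem(word):
    """Strip one known suffix, then undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # "runn" -> "run": collapse a doubled consonant left by stripping.
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


tokens = "the runner is running to the shop".split()
content_tokens = remove_stop_words(tokens)
stems = [toy_stem(token) for token in content_tokens]
```

Here "runner" and "running" collapse to the same stem, which is the dimensionality reduction the text refers to. Because a crude stemmer also produces wrong stems, its effect on model performance should be checked experimentally.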
By reducing words to their base form, the model can generalize better and understand the text more effectively. This is because many words that have the same stem may convey the same sentiment. It is important to note that stemming may not always produce accurate results, particularly when dealing with words that are already in base form, rare words, and words that are not in the language used in the dataset. Therefore, it is advisable to experiment with stemming and evaluate the impact on the model's performance.

Parts of speech (POS) tagging is the process of identifying and labelling the part of speech of each word in a sentence. This can be useful in sentiment analysis as the sentiment of a sentence may depend on the parts of speech of its words. For example, a sentence such as "I am loving this product" might have a positive sentiment, while "I am hating this product" might have a negative sentiment. In this case, the key words are "loving" and "hating", which are verbs; POS tagging allows the model to understand that the sentiment is conveyed by the verb and not by any other word in the sentence.

Chunking and chinking in sentiment analysis are techniques used to extract specific parts of a sentence that are relevant to the sentiment analysis task. They are used to identify the phrases and words that are most informative for the task. Chunking is the process of extracting chunks of words from a sentence, where a chunk is a sequence of words that together form a coherent and meaningful unit. For example, in the sentence "I am very happy with this product", the chunk "I am very happy" might be relevant to the sentiment analysis task. Chinking is the opposite of chunking; it is the process of removing chunks from a sentence. Chinking is used to remove parts of the sentence that are not informative for the sentiment analysis task, such as conjunctions and prepositions.
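A minimal sketch of chunking and chinking over a POS-tagged sentence is given here. The tags are written by hand using Penn Treebank style labels, which is an assumption; in practice they would come from a POS tagger. The helper names chunk and chink are hypothetical, and the chunk rule (optional adverbs followed by an adjective) is just one possible rule.

```python
import re

# Hand-tagged example sentence (assumed tags, not tagger output).
TAGGED = [
    ("I", "PRP"), ("am", "VBP"), ("very", "RB"), ("happy", "JJ"),
    ("with", "IN"), ("this", "DT"), ("product", "NN"),
]


def chunk(tagged, tag_pattern):
    """Return word groups whose tag sequence matches a regex over <TAG> strings.

    The pattern is assumed to start and end on whole <TAG> units.
    """
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    # Character offset at which each token's tag starts in tag_string.
    starts, position = [], 0
    for _, tag in tagged:
        starts.append(position)
        position += len(tag) + 2  # "<" + tag + ">"
    chunks = []
    for match in re.finditer(tag_pattern, tag_string):
        first = starts.index(match.start())
        last = sum(1 for start in starts if start < match.end())
        chunks.append([word for word, _ in tagged[first:last]])
    return chunks


def chink(tagged, unwanted_tags):
    """Remove every word whose tag is in unwanted_tags."""
    return [word for word, tag in tagged if tag not in unwanted_tags]


# Chunk rule: any number of adverbs (RB) followed by an adjective (JJ).
adjective_phrases = chunk(TAGGED, r"(?:<RB>)*<JJ>")

# Chink rule: drop prepositions (IN) and coordinating conjunctions (CC).
remaining_words = chink(TAGGED, {"IN", "CC"})
```

For this sentence the chunk rule isolates "very happy", the phrase that carries the sentiment, while the chink rule removes the preposition "with".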
Both techniques are based on the use of regular expressions and chunking and chinking rules, which are used to identify the specific phrases and words that are relevant to the sentiment analysis task. It is important to note that chunking and chinking are advanced techniques that can be useful in specific cases, where the model needs to focus on specific phrases or words in the text that convey the sentiment. However, the process can be complex and may require a lot of time and resources to be implemented successfully.

Named Entity Recognition (NER) is a technique used in natural language processing to identify and classify named entities in text, such as people, organizations, locations, and so on. In sentiment analysis, NER can be used to identify entities that are mentioned in a text, and then use this information to infer the sentiment of the text. For example, if a text contains the named entity "Apple Inc." and the sentiment of the text is positive, the model might infer that the sentiment is positive towards Apple Inc. It is important to note that NER is not always necessary for sentiment analysis, and in many cases the performance of the model may not change significantly with or without NER. However, in some cases NER can improve the model's performance, particularly when working with complex sentences or idiomatic expressions. Additionally, NER can be used to extract more information from the text, such as the location, person or organization being discussed, which in turn can give more context to the sentiment.

Lemmatization is a technique used in natural language processing to reduce words to their base or root form. It is similar to stemming, but while stemming reduces words to their base form by cutting off their endings, lemmatization reduces words to their base form by considering their context and the meaning of the word.
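The difference from stemming can be illustrated with a toy lemmatizer. This is a sketch under stated assumptions: the lookup table of irregular forms, the coarse POS labels ("VERB", "NOUN", "ADJ") and the function name toy_lemmatize are all hypothetical, and real lemmatizers rely on much larger dictionaries and morphological rules.

```python
# Irregular forms cannot be recovered by cutting off endings, so a
# lemmatizer looks them up. This tiny table is illustrative only.
IRREGULAR = {
    ("ran", "VERB"): "run",
    ("was", "VERB"): "be",
    ("better", "ADJ"): "good",
}


def toy_lemmatize(word, pos):
    """Return a base form for word, given its part of speech as context."""
    word = word.lower()
    if (word, pos) in IRREGULAR:
        return IRREGULAR[(word, pos)]
    if pos == "VERB":
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                base = word[: -len(suffix)]
                # "runn" -> "run": undo consonant doubling.
                if len(base) >= 2 and base[-1] == base[-2]:
                    base = base[:-1]
                return base
    if pos == "NOUN" and len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]  # simple plural
    return word


verb_lemmas = [toy_lemmatize(w, "VERB") for w in ("running", "ran", "runs")]
as_noun = toy_lemmatize("meeting", "NOUN")
as_verb = toy_lemmatize("meeting", "VERB")
```

The word "ran" is resolved through the lookup table rather than by stripping an ending, and "meeting" yields a different base form depending on whether it is tagged as a noun or a verb, which is the sense in which lemmatization uses context.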
In sentiment analysis, lemmatization can be used to reduce words to their base form so that the model can better understand the sentiment of the text. For example, the words "running," "ran," and "runs" would all be reduced to their base form "run", and the model would be able to understand that these words have the same sentiment, rather than treating them as three separate words.

Text classification is a technique used in sentiment analysis to automatically categorize text