COVID-19 Vaccination Analysis

Sukarn Pahuja : 0770205

2023-04-11

INTRODUCTION

Wuhan, Hubei Province, China reported an outbreak of pneumonia in December 2019 that was linked to the Huanan Seafood Wholesale Market. The virus responsible for the outbreak was isolated from respiratory samples, and genome analysis revealed a new type of coronavirus belonging to the subgenus Sarbecovirus; it was named SARS-CoV-2 because of its relation to SARS-CoV (Ciotti et al. 2020). The global spread of SARS-CoV-2 and the resulting COVID-19 disease prompted the World Health Organization to declare a pandemic on 12 March 2020. By May 20, 2020, the virus had infected 4,806,299 individuals and caused 318,599 deaths worldwide. SARS-CoV-2, SARS-CoV, and MERS-CoV can lead to severe pneumonia, with fatality rates of 2.9%, 9.6%, and approximately 36%, respectively (Ciotti et al. 2020). In contrast, the other four human coronaviruses (OC43, NL63, HKU1, and 229E) typically cause mild symptoms and self-limited disease. The virus has caused a significant loss of human life, damaged the economy, and increased poverty worldwide (Ciotti et al. 2020).

Vaccines are a crucial tool in mitigating the spread of the virus and ending the pandemic. As of October 5, 2022, according to data compiled by Bloomberg, over 12.7 billion vaccine doses had been administered in 184 countries, at an average daily rate of around 7.07 million doses. In the United States, a total of 613 million doses had been administered, with a recent average daily rate of 30,866 doses (Bloomberg 2021). However, there are significant disparities in vaccination rates between countries, with some countries lagging far behind others.
This research aims to explore the trends and patterns in global COVID-19 vaccination efforts and to investigate the factors that may influence vaccination rates. It examines the countries with the highest and lowest vaccination totals, and it analyses the average daily vaccinations for a country with high HDI, the United States, and a country with low HDI, India.

Literature Review

The COVID-19 pandemic is an unparalleled crisis in recent human history. Within 18 months of its emergence, nearly two hundred million confirmed cases and four million deaths had been reported globally. There have been significant efforts towards the development of safe and effective vaccines (Ndwandwe and Wiysonge 2021). On January 11, 2020, the genetic sequence of SARS-CoV-2, which causes COVID-19, was published, resulting in a surge of research and development efforts to create a vaccine against the virus (Thanh Le et al. 2020). The humanitarian and economic impact of the pandemic drove the exploration of new paradigms and technology platforms to hasten the development of next-generation vaccines, and by March 16, 2020, the first COVID-19 vaccine candidate was already in clinical testing (Thanh Le et al. 2020). As of July 2021, there were 184 vaccine candidates in pre-clinical development, 105 in clinical development, and 18 vaccines approved for emergency use by at least one regulatory authority. These vaccines encompass live attenuated or inactivated whole virus, protein-based, viral vector, and nucleic acid vaccines (Ndwandwe and Wiysonge 2021).

The Coalition for Epidemic Preparedness Innovations (CEPI) has collaborated with global health authorities and vaccine developers to support the development of COVID-19 vaccines (Thanh Le et al. 2020). To aid this initiative, CEPI created and continually updates a database of COVID-19 vaccine development programs (Thanh Le et al. 2020).
This database includes programs from the World Health Organization’s (WHO) authoritative list, as well as other programs identified from public and proprietary sources (Thanh Le et al. 2020). It provides important insights into COVID-19 vaccine research and development and serves as a resource for CEPI’s ongoing portfolio management. The information from the database has also been shared with the global health community to facilitate coordination and to direct resources and capabilities towards the most promising vaccine candidates (Thanh Le et al. 2020).

However, the distribution of vaccines around the world has been uneven, with some countries having access to an abundance of vaccines while others struggle to obtain any at all. Several factors may be influencing this distribution. One significant factor is wealth, with richer countries generally having better access to vaccines than poorer ones. The ethical considerations surrounding the distribution of COVID-19 vaccines need to be taken into account, especially for vulnerable groups. As of March 2021, 468 million vaccine doses had been administered in 135 countries, with high-income countries purchasing 54% of secured doses despite accounting for only 19% of the global population (Binagwaho, Mathewos, and Davis 2021). The availability of vaccines may also be influenced by manufacturing and the supply chain: disruptions such as those caused by the pandemic can have a significant impact on the availability of vaccines around the world. A study by The New York Times found that as of September 2021, high-income countries had administered an average of 100 vaccine doses per 100 people, while low-income countries had administered only 2 doses per 100 people (Kaplan and Milstein 2021).
The COVAX initiative was established to provide a safety net for all countries, with the aim of vaccinating 20% of member countries’ populations and prioritizing at-risk groups such as front-line health-care workers. However, vaccine nationalism has reduced the supply of available vaccines, with low-income countries not expected to receive vaccinations until late 2023 (Binagwaho, Mathewos, and Davis 2021). By mid-2021, around three billion doses of COVID-19 vaccines had been administered worldwide, primarily in high-income countries (Kaplan and Milstein 2021). This uneven distribution of vaccines is a manifestation of unethical decisions and actions with historical roots that threaten to impede the return to normality (Binagwaho, Mathewos, and Davis 2021). The administration of COVID-19 vaccines offers a promising solution to end the pandemic, provided there is equal access and optimal uptake in all countries globally (Ndwandwe and Wiysonge 2021). These factors affected the administration of COVID-19 vaccines both to the general population and to the most vulnerable.

According to the United Nations Development Programme (2022), the Human Development Index (HDI) was created to highlight that a country’s development should not be assessed solely on its economic growth, but rather on its people and their capabilities. It is a summary measure that considers three key dimensions of human development: a long and healthy life, access to knowledge, and a decent standard of living. The HDI is calculated as the geometric mean of normalized indices for each of these dimensions (United Nations Development Programme 2022). The health dimension is evaluated using life expectancy at birth, while the education dimension considers the mean years of schooling for adults aged 25 years and above and the expected years of schooling for children who are entering school. The standard of living dimension is measured by the gross national income (GNI) per capita.
However, the HDI adjusts this value using the logarithm of income to reflect that income becomes less significant as GNI increases. The scores for the three dimensions are then combined using the geometric mean to create a composite index (United Nations Development Programme 2022). Using the HDI, it is possible to compare two countries with the same level of GNI per capita and ask why their human development outcomes differ. The same kind of comparison can be extended to how countries administered and managed COVID-19 vaccinations. However, the HDI captures only a portion of human development and does not take into account factors such as inequalities, poverty, human security, and empowerment. To provide a more comprehensive view of a country’s level of human development, other indicators and information presented in the Human Development Report (HDR) statistical annex should be analyzed. Additionally, the HDRO provides other composite indices that serve as broader proxies for key issues of human development, such as inequality, gender disparity, and poverty (United Nations Development Programme 2022).

In this research, we analyse the average daily vaccinations for a country with high HDI, the United States, and a country with low HDI, India, using statistical testing methods. We perform empirical research on COVID-19 vaccination data and use inferential statistics to carry out hypothesis testing, which enables us to draw a conclusion from the analysis. The main focus of inferential statistics is to draw conclusions or make decisions about a population of measurements using information gathered from a sample of those measurements. Estimation and hypothesis testing are the two primary types of statistical inference. Estimation aims to determine an approximate value for a population characteristic (parameter) by analyzing a sample statistic.
This can be achieved using either point estimators or interval estimators. Hypothesis testing, on the other hand, is a more complex process consisting of six distinct steps. Its purpose is to draw inferences or make conclusions about one or more population parameters based on sample statistics that estimate those parameters (“Inferential Statistics - an Overview | ScienceDirect Topics” n.d.).

To infer the results of the empirical analysis, we will use the t-test. A t-test is a statistical test that compares the means of two groups in a study. There are two types of statistical inference methods: parametric and nonparametric. Parametric methods define the probability distribution of the random variables and make inferences about the parameters of that distribution, while nonparametric methods are used when the probability distribution cannot be defined (Kim 2015). The t-test is a parametric method, and it can be used when the samples meet three assumptions: normality, independence, and equal (constant) variance (Bevans 2020).

There are three types of t-test: the one-sample t-test, the unpaired t-test, and the paired t-test. The one-sample t-test is used to compare the mean of a sample to a known standard, while the unpaired t-test is used to compare the means of two independent groups (“Types of T-Test : Excellent Reference You Will Love,” n.d.). The unpaired t-test can be either the standard Student’s t-test or Welch’s t-test, which is less restrictive because it does not assume equal variances (“Types of T-Test : Excellent Reference You Will Love,” n.d.).
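For reference, the two unpaired statistics mentioned above can be written out as follows. This is a sketch in standard textbook notation, not notation taken from the cited sources; the symbols $\bar{x}_i$, $s_i^2$ and $n_i$ (sample mean, sample variance and sample size of group $i$) are introduced here for illustration only.

```latex
% Student's (pooled-variance) two-sample t statistic
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
\mathrm{df} = n_1 + n_2 - 2

% Welch's t statistic with Welch-Satterthwaite degrees of freedom
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},
\qquad
\mathrm{df} \approx
\frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
     {\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
```

Student's form pools the two sample variances, which is why it relies on the equal-variance assumption; Welch's form keeps them separate and adjusts the degrees of freedom instead.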
The paired t-test is used to compare the means of two related groups of samples and is applied when two values for the same samples are available, such as before and after treatment (“Types of T-Test : Excellent Reference You Will Love,” n.d.).

METHODS

To test whether there is a significant difference in the mean daily vaccinations between a country with a high Human Development Index (HDI), the United States, and a country with a low Human Development Index, India, we will use a two-sample independent t-test. The dataset used in this empirical research is the COVID-19 World Vaccination Progress dataset on Kaggle (“COVID-19 World Vaccination Progress” n.d.). The analysis follows a general approach with the following steps:

Data Understanding

Data understanding means understanding the structure and content of the dataset. The COVID-19 World Vaccination Progress dataset contains information on COVID-19 vaccination progress from countries around the world. The data includes the total number of vaccinations administered, the number of people fully vaccinated, the number of vaccinations per hundred people, and the vaccine types used. Each record also carries the country’s ISO code, the observation date, and the name and website of the data source.

Data Preparation

Data preparation means cleaning, transforming, and formatting the data to make it ready for analysis. Before starting the analysis, we import the data and carry out tasks such as handling missing values, correcting data types, and merging datasets.
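The preparation tasks named above can be sketched on a small stand-in data frame. The data frame `toy` and its values are hypothetical and exist only for illustration; the column names `country`, `date`, `total_vaccinations` and `daily_vaccinations` are the ones used in the dataset. The sketch converts the date column to a proper date type and shows the difference between dropping rows with a missing value in one analysed column and dropping every row with a missing value anywhere.

```r
# Hypothetical stand-in for a few rows of the vaccination data
toy <- data.frame(
  country = c("India", "India", "United States", "United States"),
  date = c("2021-03-01", "2021-03-02", "2021-03-01", "2021-03-02"),
  total_vaccinations = c(100, NA, 200, 260),
  daily_vaccinations = c(NA, 50, 55, 60),
  stringsAsFactors = FALSE
)

# Correct the data type: dates read from a CSV file arrive as character strings
toy$date <- as.Date(toy$date, format = "%Y-%m-%d")

# Count missing values per column to see where the gaps are
na_per_column <- colSums(is.na(toy))

# Keep the rows where the analysed column is present
toy_daily <- toy[!is.na(toy$daily_vaccinations), ]

# For comparison: drop every row that has a missing value in any column
toy_complete <- na.omit(toy)
```

Filtering on the analysed column keeps more rows than `na.omit()` whenever other columns have gaps of their own, so the choice between the two affects the sample that reaches the test.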
    # Import the dataset
    df <- read.csv("country_vaccinations.csv")

    # Summary statistics for each column
    stats <- summary(df)

    # Count the missing (NA) values across the whole data frame
    sum_na <- sum(is.na(df))

    # Drop every row that has an NA in any column
    country_vaccine <- na.omit(df)

Exploratory Data Analysis

Exploratory Data Analysis (EDA) refers to analyzing the data by creating visualizations, performing statistical tests, and identifying patterns and relationships. Once the data is cleaned and prepared, we can explore the distribution of variables, identify outliers, and visualize relationships between variables using scatter plots, bar charts, and heatmaps. We will make country-wise comparisons of different characteristics. For that, we group the data by country and aggregate by summing the values.

    library(dplyr)

    # Group by country and sum total_vaccinations over all dates
    # (the column is a running total, so each sum adds up the cumulative figures)
    total_vaccinations <- country_vaccine %>%
      group_by(country) %>%
      summarize(total_vaccinations = sum(total_vaccinations, na.rm = TRUE)) %>%
      arrange(desc(total_vaccinations))

Now we visualize a comparison between countries based on the vaccines administered, using a bar graph.

    library(ggplot2)

    # Bar chart of the 20 countries with the largest totals
    plot1 <- ggplot(total_vaccinations[1:20, ], aes(x = country, y = total_vaccinations)) +
      geom_bar(stat = "identity", fill = "#003f5c") +
      labs(title = "Total Vaccinations by Country", x = "Country", y = "Total Vaccinations") +
      theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

Hypothesis Testing

Hypothesis testing means formulating and testing hypotheses using statistical methods to determine whether there are significant differences or relationships in the data. After exploring the data, we can formulate hypotheses to test using statistical methods.
In this empirical research, we analyse the average daily vaccinations for a country with high HDI, the United States, and a country with low HDI, India, using a two-sample independent t-test.

Step-1 : Declare Hypothesis

We declare the hypotheses to be tested: the null hypothesis (H0) and the alternative hypothesis (HA). The null hypothesis is the one the test evaluates directly, and the alternative hypothesis is its complement.

H0 : There is no significant difference in the mean daily vaccinations between the United States and India.

HA : There is a significant difference in the mean daily vaccinations between the United States and India.

Step-2 : Assumptions

To perform the t-test, certain assumptions need to be met: independence, normality, and equal variance. These can be checked as follows:

1) Independence : The data should be randomly sampled and the sample size should be less than 10% of the population size. The data set is randomly sampled and is less than 10% of the total world population. Thus, independence can be assumed.

2) Normality : Normality of the variables can be checked using Q-Q plots of the India and United States vaccinations.

    # Daily vaccination series for the United States and India
    us_vaccinations <- country_vaccine[country_vaccine$country == "United States", "daily_vaccinations"]
    india_vaccinations <- country_vaccine[country_vaccine$country == "India", "daily_vaccinations"]

    # Q-Q plot for the United States
    qqnorm(us_vaccinations)
    qqline(us_vaccinations)

[Figure: Normal Q-Q Plot, United States daily vaccinations. x-axis: Theoretical Quantiles; y-axis: Sample Quantiles.]

    # Q-Q plot for India
    qqnorm(india_vaccinations)
    qqline(india_vaccinations)

[Figure: Normal Q-Q Plot, India daily vaccinations. x-axis: Theoretical Quantiles; y-axis: Sample Quantiles.]

Since the data points show little deviation and lie along the main diagonal, normality of the variables can be assumed.
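The visual normality check can be complemented by a formal test. The following is a minimal sketch using the Shapiro-Wilk test (`shapiro.test()` from base R's stats package) on simulated values; the vector `sim_daily` and its mean and standard deviation are assumptions chosen for illustration and are not the study's data.

```r
# Simulated daily counts (hypothetical values, not taken from the dataset)
set.seed(1)
sim_daily <- rnorm(200, mean = 1e6, sd = 2e5)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
sw <- shapiro.test(sim_daily)

w_statistic <- unname(sw$statistic)  # W lies in (0, 1]; values near 1 support normality
p_value <- sw$p.value                # a small p-value indicates departure from normality
```

Applied to the real series, the same call would take `us_vaccinations` or `india_vaccinations` in place of `sim_daily`; `shapiro.test()` accepts between 3 and 5000 non-missing values.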
3) Equal Variance : The variance of the two variables should be equal, and this can be checked using a plot of the data.

    # Line plot of United States (red) and India (blue) daily vaccinations
    plot(us_vaccinations, type = "l", col = "red",
         ylim = c(0, max(us_vaccinations, india_vaccinations)),
         ylab = "Vaccinations")  # y-axis label as it appears on the plot
    lines(india_vaccinations, col = "blue")

[Figure: line plot of daily vaccinations. x-axis: Index; y-axis: Vaccinations.]

The red line shows the vaccination drive in the United States and the blue line the drive in India. There is not much deviation in the data points, and thus constant variance across the variables can be assumed.

Step-3 : Statistical Test

Now, as all the assumptions are met, the two-sample t-test is performed, and the p-value and test statistics it produces are used to infer the results and provide a conclusion. Setting var.equal = TRUE selects the pooled-variance (Student’s) form of the test, in line with the equal-variance assumption checked above.

    out <- t.test(us_vaccinations, india_vaccinations, var.equal = TRUE)

Conclusion and Recommendations

Conclusion and recommendations include drawing conclusions from the analysis and making recommendations based on the findings. For example, we may find that countries with higher HDI values have a higher number of vaccinations administered per hundred people. Based on this finding, we can recommend that countries with lower HDI values should be prioritized for vaccine distribution.

RESULTS

The “COVID-19 World Vaccination Progress” data set contains country-level vaccination data.
The data includes information such as the total number of vaccinations administered, the number of people fully vaccinated, the number of vaccinations per hundred people, and the vaccine types used. A sample of the data set is shown below:

    ##        country iso_code       date total_vaccinations people_vaccinated
    ## 86507 Zimbabwe      ZWE 2022-03-24            8552429           4704720
    ## 86508 Zimbabwe      ZWE 2022-03-25            8691642           4814582
    ## 86509 Zimbabwe      ZWE 2022-03-26            8791728           4886242
    ## 86510 Zimbabwe      ZWE 2022-03-27            8845039           4918147
    ## 86511 Zimbabwe      ZWE 2022-03-28            8934360           4975433
    ## 86512 Zimbabwe      ZWE 2022-03-29            9039729           5053114
    ##       people_fully_vaccinated daily_vaccinations_raw daily_vaccinations
    ## 86507                 3461926                 137952              51151
    ## 86508                 3473523                 139213              69579
    ## 86509                 3487962                 100086              83429
    ## 86510                 3493763                  53311              90629
    ## 86511                 3501493                  89321             100614
    ## 86512                 3510256                 105369             103751
    ##       total_vaccinations_per_hundred people_vaccinated_per_hundred
    ## 86507                          56.67                         31.17
    ## 86508                          57.59                         31.90
    ## 86509                          58.25                         32.38
    ## 86510                          58.61                         32.59
    ## 86511                          59.20                         32.97
    ## 86512                          59.90                         33.48
    ##       people_fully_vaccinated_per_hundred daily_vaccinations_per_million
    ## 86507                               22.94                           3389
    ## 86508                               23.02                           4610
    ## 86509                               23.11                           5528
    ## 86510                               23.15                           6005
    ## 86511                               23.20                           6667
    ## 86512                               23.26                           6874
    ##                                                        vaccines
    ## 86507 Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac, Sputnik V
    ## 86508 Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac, Sputnik V
    ## 86509 Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac, Sputnik V
    ## 86510 Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac, Sputnik V
    ## 86511 Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac, Sputnik V
    ## 86512 Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac, Sputnik V
    ##              source_name
    ## 86507 Ministry of Health
    ## 86508 Ministry of Health
    ## 86509 Ministry of Health
    ## 86510 Ministry of Health
    ## 86511 Ministry of Health
    ## 86512 Ministry of Health
    ##
    ## 86507 https://www.arcgis.com/home/webmap/viewer.html?url=https://services9.arcgis.com/DnERH4rcjw7NU6l
    ## 86508 https://www.arcgis.com/home/webmap/viewer.html?url=https://services9.arcgis.com/DnERH4rcjw7NU6l
    ## 86509 https://www.arcgis.com/home/webmap/viewer.html?url=https://services9.arcgis.com/DnERH4rcjw7NU6l
    ## 86510 https://www.arcgis.com/home/webmap/viewer.html?url=https://services9.arcgis.com/DnERH4rcjw7NU6l
    ## 86511 https://www.arcgis.com/home/webmap/viewer.html?url=https://services9.arcgis.com/DnERH4rcjw7NU6l
    ## 86512 https://www.arcgis.com/home/webmap/viewer.html?url=https://services9.arcgis.com/DnERH4rcjw7NU6l

The structure of the data is shown below. It lists the variables and their data types, the total number of records, and the number of fields.

    ## 'data.frame': 86512 obs. of 15 variables:
    ##  $ country                            : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan"
    ##  $ iso_code                           : chr "AFG" "AFG" "AFG" "AFG" ...
    ##  $ date                               : chr "2021-02-22" "2021-02-23" "2021-02-24" "2021-02-25" ...
    ##  $ total_vaccinations                 : num 0 NA NA NA NA NA 8200 NA NA NA ...
    ##  $ people_vaccinated                  : num 0 NA NA NA NA NA 8200 NA NA NA ...
    ##  $ people_fully_vaccinated            : num NA NA NA NA NA NA NA NA NA NA ...
    ##  $ daily_vaccinations_raw             : num NA NA NA NA NA NA NA NA NA NA ...
    ##  $ daily_vaccinations                 : num NA 1367 1367 1367 1367 ...
    ##  $ total_vaccinations_per_hundred     : num 0 NA NA NA NA NA 0.02 NA NA NA ...
    ##  $ people_vaccinated_per_hundred      : num 0 NA NA NA NA NA 0.02 NA NA NA ...
    ##  $ people_fully_vaccinated_per_hundred: num NA NA NA NA NA NA NA NA NA NA ...
    ##  $ daily_vaccinations_per_million     : num NA 34 34 34 34 34 34 40 45 50 ...
    ##  $ vaccines                           : chr "Johnson&Johnson, Oxford/AstraZeneca, Pfizer/BioNTech, S
    ##  $ source_name                        : chr "World Health Organization" "World Health Organization"
    ##  $ source_website                     : chr "https://covid19.who.int/" "https://covid19.who.int/" "h

The bar graph below represents the total number of vaccinations administered to people in different countries.
[Figure: Total Vaccinations by Country. Bar chart; x-axis: Country (Argentina, Brazil, Canada, Chile, China, England, France, Germany, India, Indonesia, Italy, Japan, Malaysia, Mexico, Russia, South Korea, Thailand, Turkey, United Kingdom, United States); y-axis: Total Vaccinations, 0e+00 to 3e+11.]

The bar graph shows the top 20 countries by number of COVID-19 vaccine doses administered. India, with a huge population and a low Human Development Index, administered the largest number of doses, a remarkable vaccination drive. The United States, a country with a high Human Development Index, also made remarkable progress in its COVID-19 vaccination program. India and the United States administered more doses to their people than any other countries in the world.

A two-sample independent t-test was performed to compare the average daily vaccinations for the United States and India. The result of the test is shown below:

    ##
    ##  Two Sample t-test
    ##
    ## data:  us_vaccinations and india_vaccinations
    ## t = -28.402, df = 857, p-value < 2.2e-16
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -3528252 -3072126
    ## sample estimates:
    ## mean of x mean of y
    ##   1191727   4491916

The p-value for the test is below 2.2e-16, which is less than the level of significance (0.05). As the p-value < 0.05, we reject the null hypothesis in favour of the alternative hypothesis. There is enough evidence to conclude that there is a significant difference in the mean daily vaccinations between the United States and India. The mean number of COVID-19 vaccine doses administered daily is 1,191,727 in the United States and 4,491,916 in India.
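The output above is headed "Two Sample t-test", which is what `t.test()` prints when `var.equal = TRUE`. For comparison, the sketch below runs both the pooled-variance form and Welch's form on simulated groups. The vectors `group_a` and `group_b`, their sizes, means and standard deviations are hypothetical values chosen for illustration; they are not the study's data and the results do not restate the study's findings.

```r
# Two simulated groups with unequal spread (hypothetical values)
set.seed(42)
group_a <- rnorm(400, mean = 1.2e6, sd = 6e5)
group_b <- rnorm(450, mean = 4.5e6, sd = 2.2e6)

# Student's t-test: assumes equal variances and pools them
pooled <- t.test(group_a, group_b, var.equal = TRUE)

# Welch's t-test: no equal-variance assumption (the default of t.test)
welch <- t.test(group_a, group_b, var.equal = FALSE)

pooled$method                         # label printed for the pooled form
welch$method                          # label printed for the Welch form
df_pooled <- unname(pooled$parameter) # n1 + n2 - 2
df_welch <- unname(welch$parameter)   # Welch-Satterthwaite approximation
```

The pooled form always reports n1 + n2 - 2 degrees of freedom, while Welch's form reports a smaller, usually fractional value when the group variances differ; comparing the two is a quick way to see how much the equal-variance assumption matters for a given sample.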
India, a country with a low Human Development Index, performed remarkably well in the COVID-19 vaccination drive by administering such high daily doses to its people, making a significant difference in getting people protected from COVID-19. The United States, with a high Human Development Index, also performed well relative to its population in fighting COVID-19 and getting its people protected. Overall, empirical research using the COVID-19 World Vaccination Progress dataset on Kaggle can provide valuable insights into global vaccination progress and help inform public health policies and interventions.

DISCUSSION

REFERENCES

• Ciotti, Marco, Massimo Ciccozzi, Alessandro Terrinoni, Wen-Can Jiang, Cheng-Bin Wang, and Sergio Bernardini. 2020. “The COVID-19 Pandemic.” Critical Reviews in Clinical Laboratory Sciences 57 (6): 365–88. https://doi.org/10.1080/10408363.2020.1783198.
• Bloomberg. 2021. “More than 1.2 Million People Have Been Vaccinated: Covid-19 Tracker.” Bloomberg, 2021. https://www.bloomberg.com/graphics/covid-vaccine-tracker-global-distribution/.
• Ndwandwe, Duduzile, and Charles S Wiysonge. 2021. “COVID-19 Vaccines.” Current Opinion in Immunology 71 (July): 111–16. https://doi.org/10.1016/j.coi.2021.07.003.
• Kaplan, Robert M., and Arnold Milstein. 2021. “Influence of a COVID-19 Vaccine’s Effectiveness and Safety Profile on Vaccination Acceptance.” Proceedings of the National Academy of Sciences 118 (10). https://doi.org/10.1073/pnas.2021726118.
• Thanh Le, Tung, Zacharias Andreadakis, Arun Kumar, Raúl Gómez Román, Stig Tollefsen, Melanie Saville, and Stephen Mayhew. 2020. “The COVID-19 Vaccine Development Landscape.” Nature Reviews Drug Discovery 19 (19). https://doi.org/10.1038/d41573-020-00073-5.
• Binagwaho, Agnes, Kedest Mathewos, and Sheila Davis. 2021. “Time for the Ethical Management of COVID-19 Vaccines.” The Lancet Global Health 9 (8). https://doi.org/10.1016/s2214-109x(21)00180-7.
• United Nations Development Programme. 2022. “Human Development Index.” United Nations Development Programme. United Nations. 2022. https://hdr.undp.org/data-center/human-development-index#/indicies/HDI.
• “Inferential Statistics - an Overview | ScienceDirect Topics.” n.d. Www.sciencedirect.com. Accessed April 10, 2023. https://www.sciencedirect.com/topics/social-sciences/inferential-statistics#:~:text=Inferential%20statistics%20is%20concerned%20with.
• Kim, Tae Kyun. 2015. “T Test as a Parametric Statistic.” Korean Journal of Anesthesiology 68 (6): 540–46. https://doi.org/10.4097/kjae.2015.68.6.540.
• Bevans, Rebecca. 2020. “An Introduction to T Tests | Definitions, Formula and Examples.” Scribbr. January 31, 2020. https://www.scribbr.com/statistics/t-test/#:~:text=A%20t%20test%20is%20a.
• “Types of T-Test : Excellent Reference You Will Love.” n.d. Datanovia. https://www.datanovia.com/en/lessons/types-of-t-test/.
• “COVID-19 World Vaccination Progress.” n.d. Www.kaggle.com. Accessed April 15, 2023. https://www.kaggle.com/datasets/gpreda/covid-world-vaccination-progress?resource=download.