COVID-19 World Vaccination Progress Analysis

autrph
5 min read · Apr 28, 2021

The dataset is taken from Kaggle. The data is collected daily from the Our World in Data GitHub repository for COVID-19, then merged and uploaded by Gabriel Preda. I believe this dataset gives interesting insights into the vaccination programs of different countries.

I will start with data preparation, then perform exploratory data analysis and visualization, and finally try to answer some interesting questions about vaccine manufacturers and vaccination programs.

Note: If you would like to perform data analysis on your chosen data and learn the required skills, do check out the free course Data Analysis with Python: Zero to Pandas.

The code snippets are part of an executable Jupyter notebook hosted on Jovian.ml, a platform for sharing data science projects. The easiest way to start executing this notebook is to click the “Run” button at the top of the page, and select “Run on Binder”. This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks.

Downloading the Dataset

We start by installing the libraries needed to download our dataset from Kaggle.

!pip install jovian opendatasets --upgrade --quiet

Let’s begin by downloading the data, and listing the files within the dataset.

import opendatasets as od
dataset_url = 'https://www.kaggle.com/gpreda/covid-world-vaccination-progress'
od.download(dataset_url)

The dataset has been downloaded and extracted.

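import os
# Assumed path: opendatasets extracts into a folder named after the dataset
data_dir = './covid-world-vaccination-progress'
os.listdir(data_dir)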
['country_vaccinations.csv', 'country_vaccinations_by_manufacturer.csv']

Let us save and upload our work to Jovian before continuing.

project_name = "covid-world-vaccination-progress-analysis"
!pip install jovian --upgrade -q
import jovian
jovian.commit(project=project_name)

Data Preparation and Cleaning

import pandas as pd
vac_raw_df = pd.read_csv(data_dir + '/country_vaccinations.csv')

While the dataset contains a lot of information, we will select a subset of columns:

Country: the country for which the vaccination information is provided

Date: the date of the data entry; for some dates we have only the daily vaccinations, for others only the (cumulative) total

Total number of vaccinations: the absolute number of doses administered in the country

Total number of people vaccinated: the number of people who have received at least one dose; depending on the immunization scheme, a person receives one or more (typically 2) doses, so at any given moment the number of vaccinations can be larger than the number of people vaccinated

Total number of people fully vaccinated: the number of people who have received the entire set of doses in the immunization scheme (typically 2); at any given moment, some people will have received one dose and a smaller number will have received all doses in the scheme

Daily vaccinations: for a given data entry, the number of vaccinations for that date and country

Total vaccinations per hundred: ratio (in percent) between the number of vaccinations and the total population of the country, up to that date

Total number of people vaccinated per hundred: ratio (in percent) between the population immunized and the total population of the country, up to that date

Total number of people fully vaccinated per hundred: ratio (in percent) between the population fully immunized and the total population of the country, up to that date

Vaccines used in the country: the vaccine types in use

selected_columns = [
'country',
'date',
'total_vaccinations',
'people_vaccinated',
'people_fully_vaccinated',
'daily_vaccinations',
'total_vaccinations_per_hundred',
'people_vaccinated_per_hundred',
'people_fully_vaccinated_per_hundred',
'vaccines']
vac_df = vac_raw_df[selected_columns].copy()

We have now prepared the dataset for analysis. Let’s take a look at sample rows from the data frame:
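
A quick look at sample rows and missing values could be sketched as follows. The tiny data frame below is a made-up stand-in for vac_df (only the column names come from the dataset), so the snippet runs on its own; in the notebook you would call the same methods on vac_df directly.

```python
import pandas as pd

# Made-up stand-in for vac_df; the values are illustrative, not real data
vac_df = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'date': ['2021-01-01', '2021-01-02', '2021-01-01', '2021-01-02'],
    'total_vaccinations': [100.0, None, 50.0, 80.0],
    'daily_vaccinations': [None, 30.0, None, 30.0],
})

# Random rows; random_state makes the sample reproducible
sample_rows = vac_df.sample(3, random_state=0)

# Number of missing values per column, useful before any aggregation
missing_counts = vac_df.isna().sum()

print(sample_rows)
print(missing_counts)
```

Because the cumulative columns have gaps on many dates, checking the missing counts first helps explain why the later steps aggregate with max() per country rather than reading a single row.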

Exploratory Analysis and Visualization

Let’s begin by importing matplotlib.pyplot and seaborn.

import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (12, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

Let’s plot the Top 10 countries with the highest number of total vaccinations

highest_total_vac_df = vac_df.groupby('country')[['total_vaccinations']].max()
highest_total_vac = highest_total_vac_df.sort_values('total_vaccinations', ascending=False).head(10)
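
The plotting step itself is not shown above, so here is one possible sketch. It uses plain matplotlib rather than seaborn, builds a made-up stand-in for vac_df, and selects a headless backend so it runs outside a notebook (drop that line when using %matplotlib inline). The chart labels are my own choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for vac_df; values are illustrative only
vac_df = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B', 'C'],
    'total_vaccinations': [10.0, 40.0, 25.0, None, 5.0],
})

# Same aggregation as in the article: latest cumulative total per country
highest_total_vac = (
    vac_df.groupby('country')[['total_vaccinations']].max()
    .sort_values('total_vaccinations', ascending=False)
    .head(10)
)

# Horizontal bars keep long country names readable
fig, ax = plt.subplots()
ax.barh(highest_total_vac.index, highest_total_vac['total_vaccinations'])
ax.invert_yaxis()  # largest value at the top
ax.set_xlabel('Total vaccinations')
ax.set_title('Top countries by total vaccinations')
```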

These are the top 10 countries by total number of vaccinations administered.

The US has administered the most vaccinations, followed by China and India.

Turkey, France, and Indonesia have administered roughly the same number of vaccinations.

Top 20 Countries with People Fully Vaccinated

top_full_vac_df = vac_df.groupby('country')[['people_fully_vaccinated']].max()
top_full_vac = top_full_vac_df.sort_values('people_fully_vaccinated', ascending=False).head(20)

The US has the most fully vaccinated people, i.e. people who have received every dose in the immunization scheme (typically 2).

Top 20 Countries Based on People Vaccinated per Hundred

top_people_vac_per_100_df = vac_df.groupby('country')[['people_vaccinated_per_hundred']].max()
top_people_vac_per_100 = top_people_vac_per_100_df.sort_values('people_vaccinated_per_hundred', ascending=False).head(20)

Gibraltar has vaccinated all of its population with at least one dose.

More than half of the population has received at least one dose in the Falkland Islands, Seychelles, Isle of Man, Israel, Bhutan, Saint Helena, Wales, Maldives, Cayman Islands, UAE, and San Marino.

Top 20 Countries Based on People Vaccinated

top_people_vac_df = vac_df.groupby('country')[['people_vaccinated']].max()
top_people_vac = top_people_vac_df.sort_values('people_vaccinated', ascending=False).head(20)

The US and India top this list of people who have received at least one dose.

Total Vaccinations grouped by Vaccine Manufacturer

vac_manf_raw_df = pd.read_csv(data_dir + '/country_vaccinations_by_manufacturer.csv')
vac_manf = vac_manf_raw_df.groupby("vaccine")["total_vaccinations"].max()
vac_counts = vac_manf.sort_values(ascending=False)
# Pass x and y as keywords; recent seaborn releases reject them as positional arguments
sns.barplot(x=vac_counts.values, y=vac_counts.index)

Pfizer/BioNTech and Moderna are the most administered vaccines among all manufacturers.
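
Note that max() per vaccine returns the largest cumulative total reported by any single country. If the goal is a worldwide total per manufacturer, one alternative is to take the latest cumulative value per country and vaccine, then sum across countries. The sketch below assumes the manufacturer file has a location column alongside vaccine and total_vaccinations, and uses made-up numbers.

```python
import pandas as pd

# Made-up stand-in for vac_manf_raw_df; 'location' is an assumed column name
manf = pd.DataFrame({
    'location': ['X', 'X', 'Y', 'Y', 'Y'],
    'vaccine': ['Pfizer/BioNTech', 'Pfizer/BioNTech', 'Pfizer/BioNTech',
                'Moderna', 'Moderna'],
    'total_vaccinations': [100, 150, 60, 20, 90],
})

# Latest cumulative total for each (location, vaccine) pair
latest = manf.groupby(['location', 'vaccine'])['total_vaccinations'].max()

# Sum those per-location totals for each vaccine
vaccine_totals = (
    latest.groupby(level='vaccine').sum().sort_values(ascending=False)
)
print(vaccine_totals)
```

The ranking may or may not change on the real data; the point is only that the two aggregations answer different questions.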

Asking and Answering Questions

Q1: Are more people getting vaccinated?

First, we convert the ‘date’ column of both data frames to datetime:

vac_raw_df['date'] = pd.to_datetime(vac_raw_df['date'])
vac_manf_raw_df['date'] = pd.to_datetime(vac_manf_raw_df['date'])
vac_raw_df['month'] = pd.DatetimeIndex(vac_raw_df.date).month
month_vac = vac_raw_df.groupby('month')['people_vaccinated'].max()

The data starts in December 2020, so month 12 is the earliest month here; from then on, we can see that the number of people vaccinated increases every month.
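
Grouping by the month number alone sorts December (12) after January to April, even though it comes first chronologically. A minimal sketch of one way around this is to group by a year-month period instead; the data frame and the year_month column name below are made up for illustration.

```python
import pandas as pd

# Made-up stand-in for vac_raw_df with an already converted date column
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-12-20', '2020-12-31',
                            '2021-01-15', '2021-02-10']),
    'people_vaccinated': [5.0, 20.0, 80.0, 200.0],
})

# A monthly period keeps the year, so months sort chronologically
df['year_month'] = df['date'].dt.to_period('M')
monthly_max = df.groupby('year_month')['people_vaccinated'].max()
print(monthly_max)
```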

Q2: In the month of March, which vaccine was applied the most?

vac_manf_raw_df['month'] = pd.DatetimeIndex(vac_manf_raw_df.date).month
month_3 = vac_manf_raw_df.month == 3
vac_manf_3 = vac_manf_raw_df[month_3]

# count() tallies the number of March records per vaccine, not the number of doses
dd = vac_manf_3.groupby('vaccine')['total_vaccinations'].count()
sns.barplot(x=dd.index, y=dd.values)

Pfizer/BioNTech was the most applied vaccine in the month of March 2021.
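
Since total_vaccinations is cumulative, counting records measures how often a vaccine was reported rather than how many doses were given. A rough alternative, sketched below with made-up numbers and an assumed location column, is to estimate the doses added during March as the difference between the largest and smallest cumulative value per country and vaccine. This is only an approximation: it ignores doses given on the first reported day of the month.

```python
import pandas as pd

# Made-up stand-in for vac_manf_raw_df; 'location' is an assumed column name
manf = pd.DataFrame({
    'location': ['X', 'X', 'X', 'X'],
    'date': pd.to_datetime(['2021-03-01', '2021-03-31',
                            '2021-03-01', '2021-03-31']),
    'vaccine': ['Pfizer/BioNTech', 'Pfizer/BioNTech', 'Moderna', 'Moderna'],
    'total_vaccinations': [1000, 1800, 200, 500],
})

# Keep only March 2021 rows
march = manf[(manf['date'].dt.year == 2021) & (manf['date'].dt.month == 3)]

# Cumulative range within the month for each (location, vaccine) pair
per_loc = (
    march.groupby(['location', 'vaccine'])['total_vaccinations']
    .agg(['min', 'max'])
)
per_loc['added'] = per_loc['max'] - per_loc['min']

# Estimated doses added in March, summed over locations
march_doses = (
    per_loc.groupby(level='vaccine')['added'].sum()
    .sort_values(ascending=False)
)
print(march_doses)
```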

Q3: Which country is least vaccinated?

Nauru, with a population of around 11,000, has had only 168 people vaccinated.

The other 4 least vaccinated countries are Cameroon, Tonga, Armenia, and Libya.
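
The code for this question is not shown, so here is a minimal sketch of one way to get such a list, using made-up data in place of vac_df: take the latest cumulative people_vaccinated per country, drop countries that never reported the figure, and pick the smallest values.

```python
import pandas as pd

# Made-up stand-in for vac_df; values are illustrative only
vac_df = pd.DataFrame({
    'country': ['A', 'A', 'B', 'C', 'C', 'D'],
    'people_vaccinated': [10.0, 300.0, 168.0, None, 5000.0, None],
})

# Latest cumulative count per country; countries with no data become NaN
latest_people = (
    vac_df.groupby('country')['people_vaccinated'].max().dropna()
)

# The n countries with the fewest people vaccinated
least_vaccinated = latest_people.nsmallest(2)
print(least_vaccinated)
```

Ranking by absolute counts favours small populations such as Nauru; repeating the same steps on people_vaccinated_per_hundred would give a population-adjusted view.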

Q4: Which countries are using the ‘Sinopharm/Beijing’ vaccine?

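The countries using the Sinopharm/Beijing vaccine are Cameroon, Equatorial Guinea, Gabon, Kyrgyzstan, Mauritania, Mozambique, Niger, Senegal, and Zimbabwe.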
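
The filter used for this question is not shown. The sketch below, on made-up data, contrasts two possible readings and assumes the vaccines column holds a comma-separated string of vaccine names: an exact match finds countries that list only Sinopharm/Beijing, while a substring match also finds countries that use it alongside other vaccines.

```python
import pandas as pd

# Made-up stand-in for vac_df; the 'vaccines' format is an assumption
vac_df = pd.DataFrame({
    'country': ['A', 'A', 'B', 'C'],
    'vaccines': ['Sinopharm/Beijing', 'Sinopharm/Beijing',
                 'Pfizer/BioNTech, Sinopharm/Beijing', 'Moderna'],
})
target = 'Sinopharm/Beijing'

# Countries whose vaccine list is exactly the target
only_target = vac_df.loc[vac_df['vaccines'] == target, 'country'].unique()

# Countries whose vaccine list mentions the target anywhere
any_target = vac_df.loc[
    vac_df['vaccines'].str.contains(target, regex=False), 'country'
].unique()

print(only_target, any_target)
```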

Q5: In the month of December 2020, which country vaccinated the most people?

month_12 = vac_raw_df.month == 12
vac_12 = vac_raw_df[month_12]

The US was the most vaccinated country in December 2020, with over 9 million people vaccinated with at least 1 dose.
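
Filtering on the month number works here only because the data covers a single December. A slightly more defensive sketch, on made-up data, filters on year and month together and then picks the country with the highest cumulative people_vaccinated in that window.

```python
import pandas as pd

# Made-up stand-in for vac_raw_df with an already converted date column
vac_df = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'date': pd.to_datetime(['2020-12-20', '2021-12-20',
                            '2020-12-25', '2020-12-30']),
    'people_vaccinated': [100.0, 9999.0, 40.0, 70.0],
})

# Restrict to December 2020 specifically, not every December
in_dec_2020 = (vac_df['date'].dt.year == 2020) & (vac_df['date'].dt.month == 12)

# Highest cumulative count reached by each country within the month
dec_max = vac_df[in_dec_2020].groupby('country')['people_vaccinated'].max()

# Country with the largest value
top_country = dec_max.idxmax()
print(top_country, dec_max[top_country])
```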

Inferences and Conclusion

The US has administered the most vaccinations, followed by China and India.

Turkey, France, and Indonesia have administered roughly the same number of vaccinations.

The US has the most fully vaccinated people, i.e. people who have received every dose in the immunization scheme (typically 2).

Gibraltar has vaccinated all of its population with at least one dose.

More than half of the population has received at least one dose in the Falkland Islands, Seychelles, Isle of Man, Israel, Bhutan, Saint Helena, Wales, Maldives, Cayman Islands, UAE, and San Marino.

The US and India top the list of people who have received at least one dose.

Pfizer/BioNTech and Moderna are the most administered vaccines among all manufacturers.

Nauru, with a population of around 11,000, has had only 168 people vaccinated. The other 4 least vaccinated countries are Cameroon, Tonga, Armenia, and Libya.

Pfizer/BioNTech was the most applied vaccine in the month of March 2021.

The following countries use the Sinopharm/Beijing vaccine: Cameroon, Equatorial Guinea, Gabon, Kyrgyzstan, Mauritania, Mozambique, Niger, Senegal, and Zimbabwe.

The US was the most vaccinated country in December 2020, with over 9 million people vaccinated with at least 1 dose.

References

Kaggle Dataset — https://www.kaggle.com/gpreda/covid-world-vaccination-progress

Jovian — Data Analysis with Python: Zero to Pandas
