The dataset comes from Kaggle. The data is collected daily from the Our World in Data GitHub repository for Covid-19, then merged and uploaded by Gabriel Preda. I believe this dataset gives interesting insights into the vaccination programs of different countries.
I will start with data preparation, then perform exploratory data analysis and visualization, and finally try to answer some interesting questions about vaccine manufacturers and vaccination programs.
Note: If you would like to perform data analysis on your chosen data and learn the required skills, do check out the free course Data Analysis with Python: Zero to Pandas.
The code snippets are part of an executable Jupyter notebook hosted on Jovian.ml, a platform for sharing data science projects. The easiest way to start executing this notebook is to click the “Run” button at the top of the page, and select “Run on Binder”. This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks.
Downloading the Dataset
First, we install the libraries needed to download the dataset from Kaggle.
!pip install jovian opendatasets --upgrade --quiet
Let’s begin by downloading the data, and listing the files within the dataset.
import opendatasets as od
dataset_url = 'https://www.kaggle.com/gpreda/covid-world-vaccination-progress'
od.download(dataset_url)
The dataset has been downloaded and extracted.
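import os
# Assumption: opendatasets extracts the files into a folder named after the dataset
data_dir = './covid-world-vaccination-progress'
os.listdir(data_dir)
# Output: ['country_vaccinations.csv', 'country_vaccinations_by_manufacturer.csv']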
Let us save and upload our work to Jovian before continuing.
project_name = "covid-world-vaccination-progress-analysis"
!pip install jovian --upgrade -q
import jovian
jovian.commit(project=project_name)
Data Preparation and Cleaning
import pandas as pd
vac_raw_df = pd.read_csv(data_dir + '/country_vaccinations.csv')
While the dataset contains a lot of information, we will select a subset of columns:
Country: the country for which the vaccination information is provided
Date: the date of the data entry; for some dates only the daily vaccinations are available, for others only the (cumulative) total
Total number of vaccinations: the absolute number of doses administered in the country
Total number of people vaccinated: depending on the immunization scheme, a person receives one or more (typically 2) doses, so at any given moment the number of vaccinations may be larger than the number of people vaccinated
Total number of people fully vaccinated: the number of people who have received the entire set of doses required by the immunization scheme (typically 2); at any given moment, some people have received one dose and a smaller number have received all doses in the scheme
Daily vaccinations: the number of vaccinations for that date and country
Total vaccinations per hundred: ratio (in percent) between the number of vaccinations and the total population of the country, up to that date
Total number of people vaccinated per hundred: ratio (in percent) between the population immunized and the total population of the country, up to that date
Total number of people fully vaccinated per hundred: ratio (in percent) between the population fully immunized and the total population of the country, up to that date
Vaccines used in the country: the vaccine types in use
selected_columns = [
    'country',
    'date',
    'total_vaccinations',
    'people_vaccinated',
    'people_fully_vaccinated',
    'daily_vaccinations',
    'total_vaccinations_per_hundred',
    'people_vaccinated_per_hundred',
    'people_fully_vaccinated_per_hundred',
    'vaccines',
]
vac_df = vac_raw_df[selected_columns].copy()
We have now prepared the dataset for analysis. Let’s take a look at sample rows from the data frame:
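The notebook output for this step is not reproduced in the text. As a minimal sketch of such a preview, the snippet below uses a small made-up frame (`toy_df` is a hypothetical stand-in for `vac_df`, and its values are invented) to draw sample rows and count missing values per column, which is worth checking because some dates carry only the daily figure and others only the cumulative total.

```python
import pandas as pd

# Hypothetical stand-in for vac_df; values are invented for illustration.
toy_df = pd.DataFrame({
    "country": ["A", "A", "B"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-01"],
    "total_vaccinations": [10.0, None, 5.0],
    "daily_vaccinations": [None, 4.0, None],
})

# Random sample of rows; random_state makes the sample reproducible.
preview = toy_df.sample(2, random_state=0)

# Number of missing values in each column.
missing_per_column = toy_df.isna().sum()

print(preview)
print(missing_per_column)
```

On the real data, the same two calls would be `vac_df.sample(5)` and `vac_df.isna().sum()`.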
Exploratory Analysis and Visualization
Let’s begin by importing matplotlib.pyplot and seaborn.
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (12, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
Let’s plot the Top 10 countries with the highest number of total vaccinations
highest_total_vac_df = vac_df.groupby('country')[['total_vaccinations']].max()
highest_total_vac = highest_total_vac_df.sort_values('total_vaccinations', ascending=False).head(10)
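The plotting call itself did not survive in the text. One way to draw such a chart, sketched here with plain matplotlib on a made-up frame (`toy_df` and its numbers are assumptions, not the real data), is a horizontal bar chart of the per-country maximum of the cumulative column.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for vac_df; the numbers are invented.
toy_df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "total_vaccinations": [100, 250, 80, 120, 300],
})

# The column is cumulative, so the per-country maximum is the latest total.
top = (
    toy_df.groupby("country")[["total_vaccinations"]]
    .max()
    .sort_values("total_vaccinations", ascending=False)
    .head(10)
)

fig, ax = plt.subplots()
ax.barh(top.index, top["total_vaccinations"])
ax.invert_yaxis()  # put the largest bar at the top
ax.set_xlabel("Total vaccinations")
ax.set_title("Top countries by total vaccinations")
```

With the real data, `top` corresponds to `highest_total_vac` from the code above.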
These are the 10 countries that have administered the most vaccine doses.
The US has administered the most doses, followed by China and India.
Turkey, France, and Indonesia have administered roughly the same number of doses.
Top 20 Countries with People Fully Vaccinated
top_full_vac_df = vac_df.groupby('country')[['people_fully_vaccinated']].max()
top_full_vac = top_full_vac_df.sort_values('people_fully_vaccinated', ascending=False).head(20)
The US has the most fully vaccinated people, that is, people who have received all doses in the scheme (typically 2).
Top 20 Countries Based on People Vaccinated per Hundred
top_people_vac_per_100_df = vac_df.groupby('country')[['people_vaccinated_per_hundred']].max()
top_people_vac_per_100 = top_people_vac_per_100_df.sort_values('people_vaccinated_per_hundred', ascending=False).head(20)
Gibraltar has given at least one dose to effectively all of its population.
More than half of the population has received at least one dose in the Falkland Islands, Seychelles, Isle of Man, Israel, Bhutan, Saint Helena, Wales, Maldives, Cayman Islands, UAE, and San Marino.
Top 20 Countries Based on People Vaccinated
top_people_vac_df = vac_df.groupby('country')[['people_vaccinated']].max()
top_people_vac = top_people_vac_df.sort_values('people_vaccinated', ascending=False).head(20)
The US and India top this list of people who have received at least one dose.
Total Vaccinations grouped by Vaccine Manufacturer
vac_manf_raw_df = pd.read_csv(data_dir + '/country_vaccinations_by_manufacturer.csv')
vac_manf = vac_manf_raw_df.groupby("vaccine")["total_vaccinations"].max()
vac_counts = vac_manf.sort_values(ascending=False)
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=vac_counts.values, y=vac_counts.index)
Pfizer/BioNTech and Moderna are the most administered vaccines among all manufacturers.
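Note that `groupby("vaccine")[...].max()` returns, for each vaccine, the largest cumulative figure reported by any single country. A possible alternative, sketched below on made-up numbers, takes the latest cumulative value per country and vaccine and then sums across countries. The column name `location` is an assumption about the manufacturer file, and `toy_manf_df` is a hypothetical stand-in for `vac_manf_raw_df`.

```python
import pandas as pd

# Hypothetical stand-in for vac_manf_raw_df; values are invented.
toy_manf_df = pd.DataFrame({
    "location": ["X", "X", "Y", "Y", "Y"],
    "vaccine": ["V1", "V1", "V1", "V2", "V2"],
    "total_vaccinations": [10, 30, 25, 5, 40],
})

# What the code above computes: the largest single-country total per vaccine.
largest_single_country = toy_manf_df.groupby("vaccine")["total_vaccinations"].max()

# Alternative: latest cumulative value per (country, vaccine), summed per vaccine.
latest_per_country = toy_manf_df.groupby(["location", "vaccine"])["total_vaccinations"].max()
summed_across_countries = latest_per_country.groupby(level="vaccine").sum()

print(largest_single_country)
print(summed_across_countries)
```

The two aggregations can rank vaccines differently, so it helps to state which one a chart shows.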
Asking and Answering Questions
Q1: Are more people getting vaccinated?
First, we convert the ‘date’ column of both data frames to datetime:
vac_raw_df['date'] = pd.to_datetime(vac_raw_df['date'])
vac_manf_raw_df['date'] = pd.to_datetime(vac_manf_raw_df['date'])
vac_raw_df['month'] = pd.DatetimeIndex(vac_raw_df.date).month
month_vac = vac_raw_df.groupby('month')['people_vaccinated'].max()
The data starts in December 2020, and the number of people vaccinated increases every month from then on.
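One caveat with grouping by month number is that December 2020 (month 12) sorts after January 2021 (month 1). A minimal sketch of an alternative, using invented numbers in a hypothetical `toy_df`, groups by a year-month period so the months stay in chronological order.

```python
import pandas as pd

# Hypothetical stand-in for vac_raw_df; values are invented.
toy_df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-20", "2020-12-31", "2021-01-15", "2021-02-10"]),
    "people_vaccinated": [5, 20, 60, 90],
})

# Grouping by month number: December 2020 ends up last.
by_month_number = toy_df.groupby(toy_df["date"].dt.month)["people_vaccinated"].max()

# Grouping by year-month period: chronological order is preserved.
by_year_month = toy_df.groupby(toy_df["date"].dt.to_period("M"))["people_vaccinated"].max()

print(by_month_number)
print(by_year_month)
```

The period index also keeps the analysis correct once the data spans more than one year.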
Q2: Which vaccine was administered most in March?
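vac_manf_raw_df['month'] = pd.DatetimeIndex(vac_manf_raw_df.date).month
month_3 = vac_manf_raw_df.month == 3
vac_manf_3 = vac_manf_raw_df[month_3]
# count() gives the number of March records per vaccine, not the number of doses
dd = vac_manf_3.groupby('vaccine')['total_vaccinations'].count()
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=dd.index, y=dd.values)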
Pfizer/BioNTech was the most administered vaccine in March 2021.
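Because `count()` counts data entries rather than doses, a vaccine reported by many countries can lead the chart even if fewer doses were given. As a rough sketch of a dose-based alternative, the snippet below estimates the March increase as the difference between the largest and smallest cumulative value per country and vaccine. The column name `location` and the frame `toy_manf_df` are assumptions, the numbers are invented, and the estimate ignores doses given before the first March entry.

```python
import pandas as pd

# Hypothetical stand-in for vac_manf_raw_df; values are invented.
toy_manf_df = pd.DataFrame({
    "location": ["X", "X", "X", "Y", "Y"],
    "date": pd.to_datetime(
        ["2021-03-01", "2021-03-15", "2021-03-31", "2021-03-01", "2021-03-31"]
    ),
    "vaccine": ["V1", "V1", "V1", "V2", "V2"],
    "total_vaccinations": [100, 150, 200, 1000, 4000],
})

# Restrict to March 2021 explicitly (year and month).
in_march = (toy_manf_df["date"].dt.year == 2021) & (toy_manf_df["date"].dt.month == 3)
march = toy_manf_df[in_march]

# Number of records per vaccine (what count() measures).
record_counts = march.groupby("vaccine")["total_vaccinations"].count()

# Approximate doses given in March: max minus min of the cumulative total
# per (country, vaccine), summed over countries.
per_country = march.groupby(["location", "vaccine"])["total_vaccinations"].agg(["min", "max"])
march_increase = (per_country["max"] - per_country["min"]).groupby(level="vaccine").sum()

print(record_counts)
print(march_increase)
```

In this toy data, V1 has more records but V2 accounts for more doses, which shows how the two measures can disagree.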
Q3: Which country is least vaccinated?
Nauru, with a population of around 11,000, has only 168 people vaccinated.
The other 4 least vaccinated countries are Cameroon, Tonga, Armenia, and Libya.
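The code for this question is not included in the text. A minimal sketch of one way to get such a ranking, on a made-up frame (`toy_df` is a hypothetical stand-in for `vac_df`), takes the latest cumulative count per country, drops countries that never report the column, and keeps the smallest values.

```python
import pandas as pd

# Hypothetical stand-in for vac_df; values are invented.
toy_df = pd.DataFrame({
    "country": ["A", "A", "B", "C", "C", "D"],
    "people_vaccinated": [10, 50, 7, None, 3, None],
})

least = (
    toy_df.groupby("country")["people_vaccinated"]
    .max()         # latest cumulative value per country
    .dropna()      # country D never reports a value
    .nsmallest(2)  # use 5 on the real data
)

print(least)
```

Ranking by absolute counts favors small countries at the bottom of the list; `people_vaccinated_per_hundred` would give a population-adjusted view.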
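Q4: Which countries use the ‘Sinopharm/Beijing’ vaccine?
The following countries use the Sinopharm/Beijing vaccine: Cameroon, Equatorial Guinea, Gabon, Kyrgyzstan, Mauritania, Mozambique, Niger, Senegal, and Zimbabwe.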
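The filter behind this list is not shown in the text. The sketch below, on invented rows in a hypothetical `toy_df`, contrasts two plausible filters: an exact match, which finds countries whose `vaccines` entry is only Sinopharm/Beijing, and a substring match, which also finds countries that use it alongside other vaccines. It assumes the `vaccines` column holds a comma-separated string per row.

```python
import pandas as pd

# Hypothetical stand-in for vac_df; values are invented.
toy_df = pd.DataFrame({
    "country": ["A", "A", "B", "C"],
    "vaccines": [
        "Sinopharm/Beijing",
        "Sinopharm/Beijing",
        "Pfizer/BioNTech, Sinopharm/Beijing",
        "Moderna",
    ],
})

# Exact match: countries that list Sinopharm/Beijing and nothing else.
exact = toy_df["vaccines"] == "Sinopharm/Beijing"
only_sinopharm = sorted(toy_df.loc[exact, "country"].unique())

# Substring match: countries that list it at all (na=False skips missing values).
contains = toy_df["vaccines"].str.contains("Sinopharm/Beijing", regex=False, na=False)
any_sinopharm = sorted(toy_df.loc[contains, "country"].unique())

print(only_sinopharm)
print(any_sinopharm)
```

Which filter is appropriate depends on whether the question is about exclusive use or any use of the vaccine.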
Q5: Which country vaccinated the most people in December 2020?
month_12 = vac_raw_df.month == 12
vac_12 = vac_raw_df[month_12]
The US was the most vaccinated country in December 2020, with over 9 million people having received at least 1 dose.
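The aggregation step after the December filter is missing from the text. A minimal sketch of one way to finish it, using invented numbers in a hypothetical `toy_df`, filters on both year and month and then takes the largest cumulative `people_vaccinated` value per country.

```python
import pandas as pd

# Hypothetical stand-in for vac_raw_df; values are invented.
toy_df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "A"],
    "date": pd.to_datetime(
        ["2020-12-20", "2020-12-31", "2020-12-25", "2021-01-05", "2021-01-10"]
    ),
    "people_vaccinated": [100, 900, 400, 5000, 2000],
})

# Filter on year and month so a later December cannot leak into the result.
in_dec_2020 = (toy_df["date"].dt.year == 2020) & (toy_df["date"].dt.month == 12)
dec_2020 = toy_df[in_dec_2020]

# Latest cumulative count per country within December 2020.
dec_by_country = (
    dec_2020.groupby("country")["people_vaccinated"]
    .max()
    .sort_values(ascending=False)
)
top_country = dec_by_country.idxmax()

print(dec_by_country)
print(top_country)
```

On the real data, `vac_12` from the code above plays the role of `dec_2020`.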
Inferences and Conclusion
The US has administered the most doses, followed by China and India.
Turkey, France, and Indonesia have administered roughly the same number of doses.
The US has the most fully vaccinated people, that is, people who have received all doses in the scheme (typically 2).
Gibraltar has given at least one dose to effectively all of its population.
More than half of the population has received at least one dose in the Falkland Islands, Seychelles, Isle of Man, Israel, Bhutan, Saint Helena, Wales, Maldives, Cayman Islands, UAE, and San Marino.
The US and India top the list of people who have received at least one dose.
Pfizer/BioNTech and Moderna are the most administered vaccines among all manufacturers.
Nauru, with a population of around 11,000, has only 168 people vaccinated. The other 4 least vaccinated countries are Cameroon, Tonga, Armenia, and Libya.
Pfizer/BioNTech was the most administered vaccine in March 2021.
The following countries use the Sinopharm/Beijing vaccine: Cameroon, Equatorial Guinea, Gabon, Kyrgyzstan, Mauritania, Mozambique, Niger, Senegal, and Zimbabwe.
The US was the most vaccinated country in December 2020, with over 9 million people having received at least 1 dose.
References
Kaggle Dataset — https://www.kaggle.com/gpreda/covid-world-vaccination-progress