Usage: exploratory data analysis
Here, we will review the datasets downloaded and cleaned with the DataLoader class. Methods of this class produce the following class instances.
JHUData: the number of confirmed/infected/fatal/recovered cases
OxCGRTData: indicators of government responses (OxCGRT)
PCRData: the number of tests
VaccineData: the number of vaccinations, people vaccinated
MobilityData: percentage to baseline in visits
PyramidData: population pyramid
JapanData: Japan-specific dataset
If you want to use a new dataset for your analysis, please kindly inform us with GitHub Issues: Request new method of DataLoader class.
LinelistData (linelist of case reports) was deprecated with issue #866 at development version 2.22.0.
PopulationData (population values) was deprecated with issue #904 at development version 2.22.0.
In this notebook, we will review the cleaned datasets one by one and visualize them.
Preparation
Import the packages. Please confirm that the latest version of covsirphy is installed.
!pip install --upgrade covsirphy
[1]:
# !pip install covsirphy --upgrade
from pprint import pprint
import covsirphy as cs
cs.__version__
[1]:
'2.24.0'
Data cleaning classes will be produced with methods of the DataLoader class. Please specify the directory to save CSV files when creating the DataLoader instance. The default value of the directory is “input”; we will set “../input” here. For more details of DataLoader, please refer to Usage: data loading.
[2]:
# Create DataLoader instance
loader = cs.DataLoader("../input")
Usage of methods will be explained in the following sections. If you want to download all datasets with copy & paste, please refer to Dataset preparation.
The number of cases (JHU style)
The main data for analysis is the number of cases. The JHUData class, created with the DataLoader.jhu() method, holds the number of confirmed/fatal/recovered cases. The number of infected cases is calculated as “Confirmed - Recovered - Fatal” during data cleaning.
[3]:
# Create instance
jhu_data = loader.jhu()
Retrieving COVID-19 dataset in Japan from https://github.com/lisphilar/covid19-sir/data/japan
Retrieving datasets from COVID-19 Data Hub https://covid19datahub.io/
Please set verbose=2 to see the detailed citation list.
Retrieving datasets from Our World In Data https://github.com/owid/covid-19-data/
Retrieving datasets from COVID-19 Open Data by Google Cloud Platform https://github.com/GoogleCloudPlatform/covid-19-open-data
[4]:
# Check type
type(jhu_data)
[4]:
covsirphy.cleaning.jhu_data.JHUData
JHUData.citation
property shows the description of this dataset.
[5]:
print(jhu_data.citation)
Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan
(Secondary source) Guidotti, E., Ardia, D., (2020), "COVID-19 Data Hub", Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.
The detailed citation list is saved in the DataLoader.covid19dh_citation property (this is not a property of JHUData). Because many links are included, it will not be shown in this tutorial.
[6]:
# Detailed citations (string)
# data_loader.covid19dh_citation
We can check the raw data with JHUData.raw
property.
[7]:
jhu_data.raw.tail()
[7]:
| | Date | ISO3 | Country | Province | Confirmed | Infected | Fatal | Recovered | Population |
---|---|---|---|---|---|---|---|---|---|
1215023 | 2022-03-29 | ZWE | Zimbabwe | - | 246042.0 | NaN | 5439.0 | 82994.0 | 14439018.0 |
1215024 | 2022-03-30 | ZWE | Zimbabwe | - | 246182.0 | NaN | 5440.0 | 82994.0 | 14439018.0 |
1215025 | 2022-03-31 | ZWE | Zimbabwe | - | 246286.0 | NaN | 5444.0 | 82994.0 | 14439018.0 |
1215026 | 2022-04-01 | ZWE | Zimbabwe | - | 246414.0 | NaN | 5444.0 | 82994.0 | 14439018.0 |
1215027 | 2022-04-02 | ZWE | Zimbabwe | - | 246481.0 | NaN | 5446.0 | 82994.0 | 14439018.0 |
The cleaned dataset is here.
[8]:
jhu_data.cleaned().tail()
[8]:
| | Date | ISO3 | Country | Province | Confirmed | Infected | Fatal | Recovered | Population |
---|---|---|---|---|---|---|---|---|---|
1215023 | 2022-03-29 | ZWE | Zimbabwe | - | 246042 | 157609 | 5439 | 82994 | 14439018 |
1215024 | 2022-03-30 | ZWE | Zimbabwe | - | 246182 | 157748 | 5440 | 82994 | 14439018 |
1215025 | 2022-03-31 | ZWE | Zimbabwe | - | 246286 | 157848 | 5444 | 82994 | 14439018 |
1215026 | 2022-04-01 | ZWE | Zimbabwe | - | 246414 | 157976 | 5444 | 82994 | 14439018 |
1215027 | 2022-04-02 | ZWE | Zimbabwe | - | 246481 | 158041 | 5446 | 82994 | 14439018 |
As you may have noticed, the datasets are returned as pandas DataFrames. Because the tail rows are the latest values, pandas.DataFrame.tail() was used to review them.
Check the data types and memory usage as follows.
[9]:
jhu_data.cleaned().info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1215028 entries, 0 to 1215027
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 1215028 non-null datetime64[ns]
1 ISO3 1215028 non-null category
2 Country 1215028 non-null category
3 Province 1215028 non-null category
4 Confirmed 1215028 non-null int64
5 Infected 1215028 non-null int64
6 Fatal 1215028 non-null int64
7 Recovered 1215028 non-null int64
8 Population 1215028 non-null int64
dtypes: category(3), datetime64[ns](1), int64(5)
memory usage: 62.6 MB
Note that dates are pandas.datetime64, area names are pandas.Category, and the numbers of cases are numpy.int64.
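Since “Infected” is derived during data cleaning, the relationship can be verified directly on the cleaned dataframe. The following is a minimal sketch using the columns shown above:
# Sketch: confirm that Infected == Confirmed - Fatal - Recovered after cleaning
cleaned_df = jhu_data.cleaned()
derived = cleaned_df["Confirmed"] - cleaned_df["Fatal"] - cleaned_df["Recovered"]
print((derived == cleaned_df["Infected"]).all())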
Total number of cases in all countries
JHUData.total() returns the total number of cases in all countries. Fatality and recovery rates are added.
[10]:
total_df = jhu_data.total()
# Show the oldest data
display(total_df.loc[total_df["Confirmed"] > 0].head())
# Show the latest data
display(total_df.tail())
| Date | Confirmed | Infected | Fatal | Recovered | Fatal per Confirmed | Recovered per Confirmed | Fatal per (Fatal or Recovered) |
---|---|---|---|---|---|---|---|
2020-01-02 | 1 | 0 | 0 | 1 | 0.0 | 1.000000 | 0.0 |
2020-01-04 | 1 | 1 | 0 | 0 | 0.0 | 0.000000 | NaN |
2020-01-05 | 3 | 1 | 0 | 2 | 0.0 | 0.666667 | 0.0 |
2020-01-06 | 4 | 4 | 0 | 0 | 0.0 | 0.000000 | NaN |
2020-01-07 | 4 | 4 | 0 | 0 | 0.0 | 0.000000 | NaN |
| Date | Confirmed | Infected | Fatal | Recovered | Fatal per Confirmed | Recovered per Confirmed | Fatal per (Fatal or Recovered) |
---|---|---|---|---|---|---|---|
2022-03-30 | 484811803 | 296505631 | 6154772 | 182151400 | 0.012695 | 0.375716 | 0.032685 |
2022-03-31 | 485862779 | 297426789 | 6157997 | 182277993 | 0.012674 | 0.375164 | 0.032680 |
2022-04-01 | 443643146 | 269739781 | 5515864 | 168387501 | 0.012433 | 0.379556 | 0.031718 |
2022-04-02 | 431051825 | 262097660 | 5363447 | 163590718 | 0.012443 | 0.379515 | 0.031745 |
2022-04-03 | 44493901 | 19602403 | 327077 | 24564421 | 0.007351 | 0.552085 | 0.013140 |
The first case (registered in the dataset) was on 02Jan2020. The COVID-19 outbreak is still ongoing.
We can create line plots with covsirphy.line_plot()
function.
[11]:
cs.line_plot(total_df[["Infected", "Fatal", "Recovered"]], "Total number of cases over time")

Statistics of fatality and recovery rate are here.
[12]:
total_df.loc[:, total_df.columns.str.contains("per")].describe().T
[12]:
| | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|
Fatal per Confirmed | 822.0 | 0.027162 | 0.013084 | 0.0 | 0.020560 | 0.021731 | 0.031178 | 0.065194 |
Recovered per Confirmed | 822.0 | 0.468943 | 0.184317 | 0.0 | 0.395970 | 0.553999 | 0.583285 | 1.000000 |
Fatal per (Fatal or Recovered) | 815.0 | 0.120675 | 0.222446 | 0.0 | 0.034402 | 0.036802 | 0.066674 | 1.000000 |
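To see how these rates evolved over time, we can reuse covsirphy.line_plot() introduced above (a minimal sketch):
# Sketch: time series of the fatality and recovery rates
cs.line_plot(
    total_df[["Fatal per Confirmed", "Recovered per Confirmed"]],
    "Fatality and recovery rate over time",
)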
Subset for area
JHUData.subset() creates a subset for a specific area. We can select a country name and a province name. In this tutorial, “Japan” and “Tokyo in Japan” will be used. Please replace them with your country/province names.
[13]:
# Specify country name
df, complement = jhu_data.records("Japan")
# Or, specify ISO3 code
# df, complement = jhu_data.records("JPN")
# Show records
display(df.tail())
# Show details of complement
print(complement)
| | Date | Confirmed | Infected | Fatal | Recovered | Susceptible |
---|---|---|---|---|---|---|
783 | 2022-03-30 | 6452108 | 415942 | 27913 | 6008253 | 120076992 |
784 | 2022-03-31 | 6504873 | 428780 | 28010 | 6048083 | 120024227 |
785 | 2022-04-01 | 6552920 | 441767 | 28097 | 6083056 | 119976180 |
786 | 2022-04-02 | 6606464 | 455864 | 28200 | 6122400 | 119922636 |
787 | 2022-04-03 | 6653841 | 460801 | 28248 | 6164792 | 119875259 |
monotonic increasing complemented confirmed data and
monotonic increasing complemented fatal data and
partially complemented recovered data
Complement of the records was performed. The second returned value is the description of the complement. Details will be explained later; we can skip complement with the auto_complement=False argument. Or, just use the JHUData.subset() method when the second returned value (False because no complement) is unnecessary.
[14]:
# Skip complement
df, complement = jhu_data.records("Japan", auto_complement=False)
# Or,
# df = jhu_data.subset("Japan")
display(df.tail())
# Show complement (False because not complemented)
print(complement)
| | Date | Confirmed | Infected | Fatal | Recovered | Susceptible |
---|---|---|---|---|---|---|
783 | 2022-03-30 | 6452108 | 415942 | 27913 | 6008253 | 120076992 |
784 | 2022-03-31 | 6504873 | 428780 | 28010 | 6048083 | 120024227 |
785 | 2022-04-01 | 6552920 | 441767 | 28097 | 6083056 | 119976180 |
786 | 2022-04-02 | 6606464 | 455864 | 28200 | 6122400 | 119922636 |
787 | 2022-04-03 | 6653841 | 460801 | 28248 | 6164792 | 119875259 |
False
Subset for a province (called “prefecture” in Japan):
[15]:
df, _ = jhu_data.records("Japan", province="Tokyo")
df.tail()
[15]:
| | Date | Confirmed | Infected | Fatal | Recovered | Susceptible |
---|---|---|---|---|---|---|
742 | 2022-03-30 | 1242659 | 98205 | 4157 | 1140297 | 12700197 |
743 | 2022-03-31 | 1250651 | 101863 | 4169 | 1144619 | 12692205 |
744 | 2022-04-01 | 1258633 | 106345 | 4178 | 1148110 | 12684223 |
745 | 2022-04-02 | 1266028 | 106438 | 4182 | 1155408 | 12676828 |
746 | 2022-04-03 | 1273991 | 114392 | 4191 | 1155408 | 12668865 |
The list of countries can be checked with JHUData.countries() as follows.
[16]:
pprint(jhu_data.countries(), compact=True)
['Afghanistan', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola',
'Anguilla', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba',
'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh',
'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan',
'Bolivia', 'Bonaire, Sint Eustatius and Saba', 'Bosnia and Herzegovina',
'Botswana', 'Brazil', 'Brunei', 'Bulgaria', 'Burkina Faso', 'Burundi',
'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands',
'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Comoros',
'Cook Islands', 'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Curacao',
'Cyprus', 'Czech Republic', 'Democratic Republic of the Congo', 'Denmark',
'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt',
'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia',
'Falkland Islands (Malvinas)', 'Faroe Islands', 'Fiji', 'Finland', 'France',
'French Guiana', 'French Polynesia', 'Gabon', 'Gambia', 'Georgia', 'Germany',
'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guadeloupe', 'Guam',
'Guatemala', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Holy See',
'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran',
'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan',
'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Kosovo', 'Kuwait', 'Kyrgyzstan',
'Laos', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein',
'Lithuania', 'Luxembourg', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives',
'Mali', 'Malta', 'Marshall Islands', 'Martinique', 'Mauritania', 'Mauritius',
'Mayotte', 'Mexico', 'Micronesia', 'Moldova', 'Monaco', 'Mongolia',
'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia',
'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger',
'Nigeria', 'North Macedonia', 'Northern Mariana Islands', 'Norway', 'Oman',
'Pakistan', 'Palau', 'Palestine', 'Panama', 'Papua New Guinea', 'Paraguay',
'Peru', 'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar',
'Republic of the Congo', 'Romania', 'Russia', 'Rwanda', 'Réunion',
'Saint Helena, Ascension and Tristan da Cunha', 'Saint Kitts and Nevis',
'Saint Lucia', 'Saint Vincent and the Grenadines', 'Samoa', 'San Marino',
'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles',
'Sierra Leone', 'Singapore', 'Sint Maarten', 'Slovakia', 'Slovenia',
'Solomon Islands', 'Somalia', 'South Africa', 'South Korea', 'South Sudan',
'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Swaziland', 'Sweden',
'Switzerland', 'Syria', 'Taiwan', 'Tajikistan', 'Tanzania', 'Thailand',
'Timor-Leste', 'Togo', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey',
'Turks and Caicos Islands', 'Uganda', 'Ukraine', 'United Arab Emirates',
'United Kingdom', 'United States', 'Uruguay', 'Uzbekistan', 'Vanuatu',
'Venezuela', 'Vietnam', 'Virgin Islands, British', 'Virgin Islands, U.S.',
'Wallis and Futuna', 'Yemen', 'Zambia', 'Zimbabwe']
Complement
JHUData.records() automatically complements the records when necessary and auto_complement=True (default). Each area can have none, one, or multiple complements, depending on the records and their preprocessing.
We can show the specific kind of complements that were applied to the records of each country with JHUData.show_complement()
method. The possible kinds of complement for each country are the following:
“Monotonic_confirmed/fatal/recovered” (monotonic increasing complement): force the variable to be monotonically increasing.
“Full_recovered” (full complement of recovered data): estimate the number of recovered cases using the estimated average recovery period.
“Partial_recovered” (partial complement of recovered data): when recovered values have not been updated for some days, extrapolate the values.
For JHUData.show_complement()
, we can specify country names and province names.
[17]:
# Specify country name
jhu_data.show_complement(country="Japan")
# Or, specify country and province name
# jhu_data.show_complement(country="Japan", province="Tokyo")
[17]:
| | Country | Province | Monotonic_confirmed | Monotonic_fatal | Monotonic_recovered | Full_recovered | Partial_recovered |
---|---|---|---|---|---|---|---|
0 | Japan | - | True | True | True | False | True |
When a list is passed as the country argument, all the specified countries will be shown. If None, all registered countries will be used.
[18]:
# Specify country names
jhu_data.show_complement(country=["Greece", "Japan"])
# Or, apply None
# jhu_data.show_complement(country=None)
[18]:
| | Country | Province | Monotonic_confirmed | Monotonic_fatal | Monotonic_recovered | Full_recovered | Partial_recovered |
---|---|---|---|---|---|---|---|
0 | Greece | - | False | False | False | False | True |
1 | Japan | - | True | True | True | False | True |
If complement was performed incorrectly or you need new algorithms, kindly let us know via the issue page.
Recovery period
We defined “recovery period” as the time period between case confirmation and recovery (as it is subjectively defined per country). With the global case records, we estimate the average recovery period using JHUData.calculate_recovery_period().
[19]:
recovery_period = jhu_data.calculate_recovery_period()
print(f"Average recovery period: {recovery_period} [days]")
Average recovery period: 15 [days]
What we currently do is to calculate the difference between confirmed cases and fatal cases and try to match it to some recovered cases value in the future. We apply this method for every country that has valid recovery data and average the partial recovery periods in order to obtain a single (average) recovery period. During the calculations, we ignore time intervals that lead to very short (<7 days) or very long (>90 days) partial recovery periods, if these exist with high frequency (>50%) in the records. We have to assume temporarily invariable compartments for this analysis to extract an approximation of the average recovery period.
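The following is an illustrative sketch of this matching idea for a single country. It is not the library's exact implementation; the variable names and the simple mean-absolute-error criterion are only for illustration.
# Illustrative sketch (not the library's exact algorithm): find the lag [days]
# that best matches "Confirmed - Fatal" with the future values of "Recovered"
import numpy as np

df, _ = jhu_data.records("Japan")
conf_minus_fatal = (df["Confirmed"] - df["Fatal"]).to_numpy()
recovered = df["Recovered"].to_numpy()
lags = range(7, 91)
errors = [np.abs(conf_minus_fatal[:-lag] - recovered[lag:]).mean() for lag in lags]
print(f"Best-fit lag for Japan: {list(lags)[int(np.argmin(errors))]} [days]")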
Alternatively, we tried to use a linelist of case reports to get a precise value of the recovery period (the average of recovery date minus confirmation date), but the number of records was too small.
Visualize the number of cases at a time point
We can visualize the number of cases with JHUData.map()
method. When country is None, global map will be shown.
Global map with country level data:
[20]:
# Global map with country level data
jhu_data.map(country=None, variable="Infected")
# To set included/exclude some countries
# jhu_data.map(country=None, variable="Infected", included=["Japan"])
# jhu_data.map(country=None, variable="Infected", excluded=["Japan"])
# To change the date
# jhu_data.map(country=None, variable="Infected", date="01Oct2021")

Values can be retrieved with .layer()
method.
[21]:
jhu_data.layer(country=None).tail()
[21]:
| | Date | ISO3 | Country | Confirmed | Infected | Fatal | Recovered | Population |
---|---|---|---|---|---|---|---|---|
186122 | 2022-03-29 | ZWE | Zimbabwe | 246042 | 157609 | 5439 | 82994 | 14439018 |
186123 | 2022-03-30 | ZWE | Zimbabwe | 246182 | 157748 | 5440 | 82994 | 14439018 |
186124 | 2022-03-31 | ZWE | Zimbabwe | 246286 | 157848 | 5444 | 82994 | 14439018 |
186125 | 2022-04-01 | ZWE | Zimbabwe | 246414 | 157976 | 5444 | 82994 | 14439018 |
186126 | 2022-04-02 | ZWE | Zimbabwe | 246481 | 158041 | 5446 | 82994 | 14439018 |
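Because the layer is a plain dataframe, we can work with it directly, for example to rank countries on the latest date (a quick sketch using the columns shown above):
# Sketch: countries with the largest number of infected cases on the latest date
layer_df = jhu_data.layer(country=None)
latest_df = layer_df.loc[layer_df["Date"] == layer_df["Date"].max()]
print(latest_df.nlargest(10, "Infected")[["Country", "Infected"]])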
Country map with province level data:
[22]:
# Country map with province level data
jhu_data.map(country="Japan", variable="Infected")
# To set included/exclude some countries
# jhu_data.map(country="Japan", variable="Infected", included=["Tokyo"])
# jhu_data.map(country="Japan", variable="Infected", excluded=["Tokyo"])
# To change the date
# jhu_data.map(country="Japan", variable="Infected", date="01Oct2021")

Values are here.
[23]:
jhu_data.layer(country="Japan").tail()
[23]:
| | Date | ISO3 | Country | Province | Confirmed | Infected | Fatal | Recovered | Population |
---|---|---|---|---|---|---|---|---|---|
38848 | 2022-03-30 | JPN | Japan | Yamanashi | 22826 | 1653 | 63 | 21110 | 812056 |
38849 | 2022-03-31 | JPN | Japan | Yamanashi | 23127 | 1751 | 64 | 21312 | 812056 |
38850 | 2022-04-01 | JPN | Japan | Yamanashi | 23382 | 1793 | 64 | 21525 | 812056 |
38851 | 2022-04-02 | JPN | Japan | Yamanashi | 23382 | 1793 | 64 | 21525 | 812056 |
38852 | 2022-04-03 | JPN | Japan | Yamanashi | 24119 | 2536 | 64 | 21519 | 812056 |
OxCGRT indicators
Government responses are tracked with the Oxford Covid-19 Government Response Tracker (OxCGRT). Because government responses and people's activities change the parameter values of SIR-derived models, this dataset is important when we try to forecast the number of cases. The OxCGRTData class will be created with the DataLoader.oxcgrt() method.
OxCGRT indicators are
school_closing,
workplace_closing,
cancel_events,
gatherings_restrictions,
transport_closing,
stay_home_restrictions,
internal_movement_restrictions,
international_movement_restrictions,
information_campaigns,
testing_policy, and
contact_tracing.
[24]:
oxcgrt_data = loader.oxcgrt()
[25]:
type(oxcgrt_data)
[25]:
covsirphy.cleaning.oxcgrt.OxCGRTData
Because the records are retrieved via “COVID-19 Data Hub”, as with JHUData, the data description and raw data are the same.
[26]:
# Description
print(oxcgrt_data.citation)
# Raw
# oxcgrt_data.raw.tail()
The cleaned dataset is here.
[27]:
oxcgrt_data.cleaned().tail()
[27]:
| | Date | ISO3 | Country | Province | School_closing | Workplace_closing | Cancel_events | Gatherings_restrictions | Transport_closing | Stay_home_restrictions | Internal_movement_restrictions | International_movement_restrictions | Information_campaigns | Testing_policy | Contact_tracing | Stringency_index |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1219264 | 2022-01-15 | GRL | Greenland | Syddanmark | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | -1.0 | 2.0 | 2.0 | 3.0 | 2.0 | -38.89 |
1219265 | 2022-01-16 | GRL | Greenland | Syddanmark | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | -1.0 | 2.0 | 2.0 | 3.0 | 2.0 | -38.89 |
1219266 | 2022-01-17 | GRL | Greenland | Syddanmark | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | -1.0 | 2.0 | 2.0 | 3.0 | 2.0 | -38.89 |
1219267 | 2022-01-18 | GRL | Greenland | Syddanmark | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | -1.0 | 2.0 | 2.0 | 3.0 | 2.0 | -38.89 |
1219268 | 2022-01-19 | GRL | Greenland | Syddanmark | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | -1.0 | 2.0 | 2.0 | 3.0 | 2.0 | -38.89 |
Subset for area
OxCGRTData.subset() creates a subset for a specific area. We can select only a country name. Note that province level data is not registered in OxCGRTData.
[28]:
oxcgrt_data.subset("Japan").tail()
# Or, with ISO3 code
# oxcgrt_data.subset("JPN").tail()
[28]:
| | Date | School_closing | Workplace_closing | Cancel_events | Gatherings_restrictions | Transport_closing | Stay_home_restrictions | Internal_movement_restrictions | International_movement_restrictions | Information_campaigns | Testing_policy | Contact_tracing | Stringency_index |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
798 | 2022-03-30 | 1.0 | 1.0 | 1.0 | 1.0 | -1.0 | 1.0 | 1.0 | 4.0 | 2.0 | 1.0 | 1.0 | 47.22 |
799 | 2022-03-31 | 1.0 | 1.0 | 1.0 | 1.0 | -1.0 | 1.0 | 1.0 | 4.0 | 2.0 | 1.0 | 1.0 | 47.22 |
800 | 2022-04-01 | 1.0 | 1.0 | 1.0 | 1.0 | -1.0 | 1.0 | 1.0 | 4.0 | 2.0 | 1.0 | 1.0 | 47.22 |
801 | 2022-04-02 | 1.0 | 1.0 | 1.0 | 1.0 | -1.0 | 1.0 | 1.0 | 4.0 | 2.0 | 1.0 | 1.0 | 47.22 |
802 | 2022-04-03 | 1.0 | 1.0 | 1.0 | 1.0 | -1.0 | 1.0 | 1.0 | 4.0 | 2.0 | 1.0 | 1.0 | 47.22 |
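The subset can also be plotted over time with cs.line_plot() shown earlier (a minimal sketch):
# Sketch: time series of the OxCGRT stringency index in Japan
oxcgrt_japan_df = oxcgrt_data.subset("Japan").set_index("Date")
cs.line_plot(oxcgrt_japan_df[["Stringency_index"]], "Stringency index in Japan over time")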
Visualize indicator values
We can visualize indicator values with .map()
method. Arguments are the same as JHUData.map()
, but country name cannot be specified.
[29]:
oxcgrt_data.map(variable="Stringency_index")

Values are here.
[30]:
oxcgrt_data.layer().tail()
[30]:
| | Date | ISO3 | Country | School_closing | Workplace_closing | Cancel_events | Gatherings_restrictions | Transport_closing | Stay_home_restrictions | Internal_movement_restrictions | International_movement_restrictions | Information_campaigns | Testing_policy | Contact_tracing | Stringency_index |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
186924 | 2022-03-29 | GRL | Greenland | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | -1.0 | 1.0 | 2.0 | 3.0 | 2.0 | 13.89 |
186925 | 2022-03-30 | GRL | Greenland | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | -1.0 | 1.0 | 2.0 | 3.0 | 2.0 | 13.89 |
186926 | 2022-03-31 | GRL | Greenland | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | -1.0 | 1.0 | 2.0 | 3.0 | 2.0 | 13.89 |
186927 | 2022-04-01 | GRL | Greenland | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | -1.0 | 1.0 | 2.0 | 3.0 | 2.0 | 13.89 |
186928 | 2022-04-02 | GRL | Greenland | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | -1.0 | 1.0 | 2.0 | 3.0 | 2.0 | 13.89 |
The number of tests
The number of tests is also key information to understand the situation. PCRData
class will be created with DataLoader.pcr()
method.
[31]:
pcr_data = loader.pcr()
[32]:
type(pcr_data)
[32]:
covsirphy.cleaning.pcr_data.PCRData
Because the records are retrieved via “COVID-19 Data Hub”, as with JHUData, the data description and raw data are the same.
[33]:
# Description
print(pcr_data.citation)
# Raw
# pcr_data.raw.tail()
Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan
(Secondary source) Guidotti, E., Ardia, D., (2020), "COVID-19 Data Hub", Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.
Hasell, J., Mathieu, E., Beltekian, D. et al. A cross-country database of COVID-19 testing. Sci Data 7, 345 (2020). https://doi.org/10.1038/s41597-020-00688-8
The cleaned dataset is here.
[34]:
pcr_data.cleaned().tail()
[34]:
| | Date | Country | Province | Tests | Confirmed |
---|---|---|---|---|---|
1215023 | 2022-03-29 | Zimbabwe | - | 2176708 | 246042 |
1215024 | 2022-03-30 | Zimbabwe | - | 2176708 | 246182 |
1215025 | 2022-03-31 | Zimbabwe | - | 2176708 | 246286 |
1215026 | 2022-04-01 | Zimbabwe | - | 2185767 | 246414 |
1215027 | 2022-04-02 | Zimbabwe | - | 2185767 | 246481 |
Subset for area
PCRData.subset()
creates a subset for a specific area. We can select country name and province name.
[35]:
pcr_data.subset("Japan").tail()
# Or, with ISO3 code
# pcr_data.subset("JPN").tail()
[35]:
| | Date | Tests | Tests_diff | Confirmed |
---|---|---|---|---|
783 | 2022-03-30 | 43110998 | 126369 | 6452108 |
784 | 2022-03-31 | 43265925 | 154927 | 6504873 |
785 | 2022-04-01 | 43413502 | 147577 | 6552920 |
786 | 2022-04-02 | 43606340 | 192838 | 6606464 |
787 | 2022-04-03 | 43717117 | 110777 | 6653841 |
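The daily number of tests (“Tests_diff”) can be plotted with cs.line_plot() as well (a minimal sketch):
# Sketch: daily number of tests in Japan
pcr_japan_df = pcr_data.subset("Japan").set_index("Date")
cs.line_plot(pcr_japan_df[["Tests_diff"]], "Daily number of tests in Japan")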
Positive rate
Under the assumption that all tests were PCR tests, we can calculate the positive rate of PCR tests as “the number of confirmed cases per the number of tests” with the PCRData.positive_rate() method.
[36]:
pcr_data.positive_rate("Japan").tail()

[36]:
| | Date | ISO3 | Country | Province | Tests | Confirmed | Tests_diff | Confirmed_diff | Test_positive_rate |
---|---|---|---|---|---|---|---|---|---|
782 | 2022-03-30 | JPN | Japan | - | 43110998 | 6452108 | 128746.000000 | 42699.571429 | 33.165746 |
783 | 2022-03-31 | JPN | Japan | - | 43265925 | 6504873 | 127702.285714 | 45008.142857 | 35.244587 |
784 | 2022-04-01 | JPN | Japan | - | 43413502 | 6552920 | 127498.142857 | 44863.000000 | 35.187179 |
785 | 2022-04-02 | JPN | Japan | - | 43606340 | 6606464 | 131789.428571 | 45622.428571 | 34.617669 |
786 | 2022-04-03 | JPN | Japan | - | 43717117 | 6653841 | 132673.142857 | 45669.571429 | 34.422620 |
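Judging from the columns above, “Test_positive_rate” appears to be the (smoothed) daily confirmed cases divided by the daily tests, expressed as a percentage. A quick cross-check, sketched with the column names shown above:
# Sketch: reproduce Test_positive_rate from Confirmed_diff and Tests_diff
rate_df = pcr_data.positive_rate("Japan")
manual_rate = rate_df["Confirmed_diff"] / rate_df["Tests_diff"] * 100
print((manual_rate - rate_df["Test_positive_rate"]).abs().max())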
Visualize the number of tests
We can visualize the number of tests with .map()
method. When country is None, global map will be shown. Arguments are the same as JHUData
, but variable name cannot be specified.
Country level data:
[37]:
pcr_data.map(country=None)

Values are here.
[38]:
pcr_data.layer(country=None).tail()
[38]:
| | Date | ISO3 | Country | Tests | Confirmed |
---|---|---|---|---|---|
98081 | 2022-03-29 | ZWE | Zimbabwe | 2176708 | 246042 |
98082 | 2022-03-30 | ZWE | Zimbabwe | 2176708 | 246182 |
98083 | 2022-03-31 | ZWE | Zimbabwe | 2176708 | 246286 |
98084 | 2022-04-01 | ZWE | Zimbabwe | 2185767 | 246414 |
98085 | 2022-04-02 | ZWE | Zimbabwe | 2185767 | 246481 |
Province level data:
[39]:
pcr_data.map(country="Japan")

Values are here.
[40]:
pcr_data.layer(country="Japan").tail()
[40]:
| | Date | ISO3 | Country | Province | Tests | Confirmed |
---|---|---|---|---|---|---|
35648 | 2022-03-30 | JPN | Japan | Yamanashi | 239637 | 22826 |
35649 | 2022-03-31 | JPN | Japan | Yamanashi | 239637 | 23127 |
35650 | 2022-04-01 | JPN | Japan | Yamanashi | 248105 | 23382 |
35651 | 2022-04-02 | JPN | Japan | Yamanashi | 248105 | 23382 |
35652 | 2022-04-03 | JPN | Japan | Yamanashi | 248105 | 24119 |
Vaccinations
Vaccination is a key factor in ending the outbreak as soon as possible. The VaccineData class will be created with the DataLoader.vaccine() method.
[41]:
vaccine_data = loader.vaccine()
[42]:
type(vaccine_data)
[42]:
covsirphy.cleaning.vaccine_data.VaccineData
Description is here.
[43]:
print(vaccine_data.citation)
Hasell, J., Mathieu, E., Beltekian, D. et al. A cross-country database of COVID-19 testing. Sci Data 7, 345 (2020). https://doi.org/10.1038/s41597-020-00688-8
Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan
Raw data is here.
[44]:
vaccine_data.raw.tail()
[44]:
| | Date | ISO3 | Country | Province | Product | Vaccinations | Vaccinations_boosters | Vaccinated_once | Vaccinated_full |
---|---|---|---|---|---|---|---|---|---|
1215021 | 2022-03-27 | ZWE | Zimbabwe | - | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 8845039.0 | 433129.0 | 4918147.0 | 3493763.0 |
1215022 | 2022-03-28 | ZWE | Zimbabwe | - | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 8934360.0 | 457434.0 | 4975433.0 | 3501493.0 |
1215023 | 2022-03-29 | ZWE | Zimbabwe | - | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9039729.0 | 476359.0 | 5053114.0 | 3510256.0 |
1215024 | 2022-03-30 | ZWE | Zimbabwe | - | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9202369.0 | 505132.0 | 5175175.0 | 3522062.0 |
1215025 | 2022-03-31 | ZWE | Zimbabwe | - | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9368822.0 | 539033.0 | 5297081.0 | 3532708.0 |
The next is the cleaned dataset.
[45]:
vaccine_data.cleaned().tail()
[45]:
| | Date | ISO3 | Country | Province | Product | Vaccinations | Vaccinations_boosters | Vaccinated_once | Vaccinated_full |
---|---|---|---|---|---|---|---|---|---|
173355 | 2022-03-30 | ZWE | Zimbabwe | - | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9202369.0 | 505132.0 | 5175175.0 | 3522062.0 |
173356 | 2022-03-31 | ZWE | Zimbabwe | - | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9368822.0 | 539033.0 | 5297081.0 | 3532708.0 |
173357 | 2022-04-01 | ZWE | Zimbabwe | - | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9368822.0 | 539033.0 | 5297081.0 | 3532708.0 |
173358 | 2022-04-02 | ZWE | Zimbabwe | - | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9368822.0 | 539033.0 | 5297081.0 | 3532708.0 |
173359 | 2022-04-03 | ZWE | Zimbabwe | - | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9368822.0 | 539033.0 | 5297081.0 | 3532708.0 |
Note for variables
Definitions of the variables are as follows.
Vaccinations: cumulative number of vaccinations
Vaccinations_boosters: cumulative number of booster vaccinations
Vaccinated_once: cumulative number of people who received at least one vaccine dose
Vaccinated_full: cumulative number of people who received all doses prescribed by the protocol
Registered countries can be checked with VaccineData.countries()
method.
[46]:
pprint(vaccine_data.countries(), compact=True)
['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola', 'Anguilla',
'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria',
'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus',
'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia',
'Bonaire Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana',
'Brazil', 'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso',
'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands',
'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Comoros',
'Cook Islands', 'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Curacao',
'Cyprus', 'Czechia', 'Democratic Republic of Congo', 'Denmark', 'Djibouti',
'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador',
'Equatorial Guinea', 'Estonia', 'Eswatini', 'Ethiopia', 'Faeroe Islands',
'Falkland Islands', 'Fiji', 'Finland', 'France', 'French Polynesia', 'Gabon',
'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland',
'Grenada', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-Bissau', 'Guyana',
'Haiti', 'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India', 'Indonesia',
'Iran', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica',
'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Kuwait',
'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya',
'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao', 'Madagascar', 'Malawi',
'Malaysia', 'Maldives', 'Mali', 'Malta', 'Mauritania', 'Mauritius', 'Mexico',
'Moldova', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco',
'Mozambique', 'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands',
'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue',
'North Macedonia', 'Norway', 'Oman', 'Pakistan', 'Palestine', 'Panama',
'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Pitcairn', 'Poland',
'Portugal', 'Qatar', 'Republic of the Congo', 'Romania', 'Russia', 'Rwanda',
'Saint Helena', 'Saint Kitts and Nevis', 'Saint Lucia',
'Saint Vincent and the Grenadines', 'Samoa', 'San Marino',
'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles',
'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia',
'Slovenia', 'Solomon Islands', 'Somalia', 'South Africa', 'South Korea',
'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Sweden',
'Switzerland', 'Syria', 'Taiwan', 'Tajikistan', 'Tanzania', 'Thailand',
'Timor', 'Togo', 'Tokelau', 'Tonga', 'Trinidad and Tobago', 'Tunisia',
'Turkey', 'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda',
'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States',
'Uruguay', 'Uzbekistan', 'Vanuatu', 'Venezuela', 'Vietnam',
'Wallis and Futuna', 'Yemen', 'Zambia', 'Zimbabwe']
Subset for area
VaccineData.subset()
creates a subset for a specific area. We can select only country name. Note that province level data is not registered.
[47]:
vaccine_data.subset("Japan").tail()
# Or, with ISO3 code
# vaccine_data.subset("JPN").tail()
[47]:
| | Date | Vaccinations | Vaccinations_boosters | Vaccinated_once | Vaccinated_full |
---|---|---|---|---|---|
1571 | 2022-04-01 | 298700671.0 | 52521088.0 | 137629998.0 | 108549585.0 |
1572 | 2022-04-02 | 298700671.0 | 52521088.0 | 137629998.0 | 108549585.0 |
1573 | 2022-04-02 | 298700671.0 | 52521088.0 | 137629998.0 | 108549585.0 |
1574 | 2022-04-03 | 298700671.0 | 52521088.0 | 137629998.0 | 108549585.0 |
1575 | 2022-04-03 | 298700671.0 | 52521088.0 | 137629998.0 | 108549585.0 |
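The cumulative numbers of vaccinated people can also be plotted over time with cs.line_plot() (a minimal sketch using the columns shown above):
# Sketch: cumulative number of vaccinated people in Japan
vaccine_japan_df = vaccine_data.subset("Japan").set_index("Date")
cs.line_plot(
    vaccine_japan_df[["Vaccinated_once", "Vaccinated_full", "Vaccinations_boosters"]],
    "Number of vaccinated people in Japan over time",
)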
Visualize the number of vaccinations
We can visualize the number of vaccinations and the other variables with .map()
method. Arguments are the same as JHUData
, but country name cannot be specified.
[48]:
vaccine_data.map()

Values are here.
[49]:
vaccine_data.layer().tail()
[49]:
| | Date | ISO3 | Country | Product | Vaccinations | Vaccinations_boosters | Vaccinated_once | Vaccinated_full |
---|---|---|---|---|---|---|---|---|
171779 | 2022-03-30 | ZWE | Zimbabwe | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9202369.0 | 505132.0 | 5175175.0 | 3522062.0 |
171780 | 2022-03-31 | ZWE | Zimbabwe | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9368822.0 | 539033.0 | 5297081.0 | 3532708.0 |
171781 | 2022-04-01 | ZWE | Zimbabwe | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9368822.0 | 539033.0 | 5297081.0 | 3532708.0 |
171782 | 2022-04-02 | ZWE | Zimbabwe | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9368822.0 | 539033.0 | 5297081.0 | 3532708.0 |
171783 | 2022-04-03 | ZWE | Zimbabwe | Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... | 9368822.0 | 539033.0 | 5297081.0 | 3532708.0 |
Mobility
The level of mobility is a key factor of \(\rho\) (effective contact rate) in SIR-derived ODE models. The MobilityData class will be created with the DataLoader.mobility() method.
[50]:
mobility_data = loader.mobility()
[51]:
type(mobility_data)
[51]:
covsirphy.cleaning.mobility_data.MobilityData
Description is here.
[52]:
print(mobility_data.citation)
O. Wahltinez and others (2020), COVID-19 Open-Data: curating a fine-grained, global-scale data repository for SARS-CoV-2, Work in progress, https://goo.gle/covid-19-open-data
Raw data is here.
[53]:
mobility_data.raw.tail()
[53]:
| | Date | ISO3 | Country | Province | Mobility_grocery_and_pharmacy | Mobility_parks | Mobility_transit_stations | Mobility_retail_and_recreation | Mobility_residential | Mobility_workplaces |
---|---|---|---|---|---|---|---|---|---|---|
1215020 | 2022-03-26 | ZWE | Zimbabwe | - | 209.0 | 277.0 | 194.0 | 199.0 | 109.0 | 186.0 |
1215021 | 2022-03-27 | ZWE | Zimbabwe | - | 219.0 | 314.0 | 202.0 | 213.0 | 112.0 | 208.0 |
1215022 | 2022-03-28 | ZWE | Zimbabwe | - | 197.0 | 283.0 | 201.0 | 192.0 | 112.0 | 177.0 |
1215023 | 2022-03-29 | ZWE | Zimbabwe | - | 206.0 | 282.0 | 195.0 | 195.0 | 112.0 | 177.0 |
1215024 | 2022-03-30 | ZWE | Zimbabwe | - | 210.0 | 284.0 | 206.0 | 198.0 | 112.0 | 177.0 |
The next is the cleaned dataset.
[54]:
mobility_data.cleaned().tail()
[54]:
| | Date | ISO3 | Country | Province | Mobility_grocery_and_pharmacy | Mobility_parks | Mobility_transit_stations | Mobility_retail_and_recreation | Mobility_residential | Mobility_workplaces |
---|---|---|---|---|---|---|---|---|---|---|
1215020 | 2022-03-26 | ZWE | Zimbabwe | - | 209 | 277 | 194 | 199 | 109 | 186 |
1215021 | 2022-03-27 | ZWE | Zimbabwe | - | 219 | 314 | 202 | 213 | 112 | 208 |
1215022 | 2022-03-28 | ZWE | Zimbabwe | - | 197 | 283 | 201 | 192 | 112 | 177 |
1215023 | 2022-03-29 | ZWE | Zimbabwe | - | 206 | 282 | 195 | 195 | 112 | 177 |
1215024 | 2022-03-30 | ZWE | Zimbabwe | - | 210 | 284 | 206 | 198 | 112 | 177 |
Note for variables
Definitions of the variables are as follows.
Mobility_grocery_and_pharmacy (int): % to baseline in visits (grocery markets, pharmacies etc.)
Mobility_parks (int): % to baseline in visits (parks etc.)
Mobility_transit_stations (int): % to baseline in visits (public transport hubs etc.)
Mobility_retail_and_recreation (int): % to baseline in visits (restaurant, museums etc.)
Mobility_residential (int): % to baseline in visits (places of residence)
Mobility_workplaces (int): % to baseline in visits (places of work)
Registered countries can be checked with MobilityData.countries()
method.
[55]:
pprint(mobility_data.countries(), compact=True)
['Afghanistan', 'Angola', 'Antigua and Barbuda', 'Argentina', 'Aruba',
'Australia', 'Austria', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados',
'Belarus', 'Belgium', 'Belize', 'Benin', 'Bolivia', 'Bosnia and Herzegovina',
'Botswana', 'Brazil', 'Bulgaria', 'Burkina Faso', 'Cambodia', 'Cameroon',
'Canada', 'Cape Verde', 'Chile', 'Colombia', 'Costa Rica', "Cote d'Ivoire",
'Croatia', 'Czech Republic', 'Denmark', 'Dominican Republic', 'Ecuador',
'Egypt', 'El Salvador', 'Estonia', 'Fiji', 'Finland', 'France', 'Gabon',
'Germany', 'Ghana', 'Greece', 'Guatemala', 'Haiti', 'Honduras', 'Hungary',
'India', 'Indonesia', 'Iraq', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jordan',
'Kazakhstan', 'Kenya', 'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon',
'Libya', 'Lithuania', 'Luxembourg', 'Malaysia', 'Mali', 'Malta', 'Mauritius',
'Mexico', 'Moldova', 'Mongolia', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia',
'Nepal', 'Netherlands', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria',
'Norway', 'Oman', 'Pakistan', 'Panama', 'Paraguay', 'Peru', 'Philippines',
'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Romania', 'Russia', 'Rwanda',
'Saudi Arabia', 'Senegal', 'Serbia', 'Singapore', 'Slovakia', 'Slovenia',
'South Africa', 'South Korea', 'Spain', 'Sri Lanka', 'Sweden', 'Switzerland',
'Taiwan', 'Tajikistan', 'Tanzania', 'Thailand', 'Togo', 'Trinidad and Tobago',
'Turkey', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom',
'United States of America', 'Uruguay', 'Venezuela', 'Vietnam', 'Yemen',
'Zambia', 'Zimbabwe']
Subset for area
MobilityData.subset()
creates a subset for a specific area (country/province).
Subset for a country: we can use both country names and ISO3 codes.
[56]:
mobility_data.subset("Japan").tail()
# Or, with ISO3 code
# mobility_data.subset("JPN").tail()
[56]:
| | Date | Mobility_grocery_and_pharmacy | Mobility_parks | Mobility_transit_stations | Mobility_retail_and_recreation | Mobility_residential | Mobility_workplaces |
---|---|---|---|---|---|---|---|
770 | 2022-03-26 | 96 | 62 | 78 | 86 | 106 | 91 |
771 | 2022-03-27 | 105 | 108 | 86 | 95 | 102 | 96 |
772 | 2022-03-28 | 101 | 99 | 83 | 95 | 105 | 90 |
773 | 2022-03-29 | 102 | 100 | 81 | 97 | 105 | 89 |
774 | 2022-03-30 | 104 | 115 | 82 | 98 | 104 | 89 |
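The mobility levels can also be plotted over time with cs.line_plot() (a minimal sketch; the selected variables are only an example):
# Sketch: mobility levels in Japan over time (percentage to baseline)
mobility_japan_df = mobility_data.subset("Japan").set_index("Date")
cs.line_plot(
    mobility_japan_df[["Mobility_workplaces", "Mobility_residential", "Mobility_parks"]],
    "Mobility levels in Japan over time",
)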
Visualize mobility data
We can visualize the levels of mobility with MobilityData.map()
method. Arguments are the same as JHUData
.
[57]:
mobility_data.map(country=None)

Values are here.
[58]:
mobility_data.layer().tail()
[58]:
| | Date | ISO3 | Country | Mobility_grocery_and_pharmacy | Mobility_parks | Mobility_transit_stations | Mobility_retail_and_recreation | Mobility_residential | Mobility_workplaces |
---|---|---|---|---|---|---|---|---|---|
96441 | 2022-03-26 | ZWE | Zimbabwe | 209 | 277 | 194 | 199 | 109 | 186 |
96442 | 2022-03-27 | ZWE | Zimbabwe | 219 | 314 | 202 | 213 | 112 | 208 |
96443 | 2022-03-28 | ZWE | Zimbabwe | 197 | 283 | 201 | 192 | 112 | 177 |
96444 | 2022-03-29 | ZWE | Zimbabwe | 206 | 282 | 195 | 195 | 112 | 177 |
96445 | 2022-03-30 | ZWE | Zimbabwe | 210 | 284 | 206 | 198 | 112 | 177 |
Population pyramid
With a population pyramid, we can divide the population into sub-groups. This will be useful when we analyse the meaning of parameters. For example, how many days people go out differs between the sub-groups. The PyramidData class will be created with the DataLoader.pyramid() method.
[59]:
pyramid_data = loader.pyramid()
[60]:
type(pyramid_data)
[60]:
covsirphy.cleaning.pyramid.PopulationPyramidData
Description is here.
[61]:
print(pyramid_data.citation)
World Bank Group (2020), World Bank Open Data, https://data.worldbank.org/
The raw dataset is not registered. A subset will be retrieved when PyramidData.subset() is called.
[62]:
pyramid_data.subset("Japan").tail()
Retrieving population pyramid dataset (Japan) from https://data.worldbank.org/
[62]:
| | Age | Population | Per_total |
---|---|---|---|
113 | 118 | 262648 | 0.00225 |
114 | 119 | 262648 | 0.00225 |
115 | 120 | 262648 | 0.00225 |
116 | 121 | 262648 | 0.00225 |
117 | 122 | 262648 | 0.00225 |
“Per_total” is the proportion of the age group in the total population.
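For parameter interpretation it is often convenient to aggregate the single-year ages into broader groups. The following is a minimal sketch using pandas; the 10-year bin edges are arbitrary and only for illustration.
# Sketch: aggregate the population pyramid into 10-year age bins
import pandas as pd

pyramid_df = pyramid_data.subset("Japan")
age_bins = pd.cut(pyramid_df["Age"], bins=range(0, 131, 10), right=False)
print(pyramid_df.groupby(age_bins)["Per_total"].sum())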
Japan-specific dataset
This includes the number of confirmed/infected/fatal/recovered/tests/moderate/severe cases at country/prefecture level and metadata of each prefecture (province). JapanData
class will be created with DataLoader.japan()
method.
[63]:
japan_data = loader.japan()
[64]:
type(japan_data)
[64]:
covsirphy.cleaning.japan_data.JapanData
Description is here.
[65]:
print(japan_data.citation)
Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan
The next is the cleaned dataset.
[66]:
japan_data.cleaned().tail()
[66]:
| | Date | Country | Province | Confirmed | Infected | Fatal | Recovered | Tests | Moderate | Severe | Vaccinations | Vaccinations_boosters | Vaccinated_once | Vaccinated_full |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
36633 | 2022-04-02 | Japan | Nagasaki | 33504 | 1499 | 116 | 31889 | 308815 | 3000 | 1 | 0 | 0 | 0 | 0 |
36634 | 2022-04-02 | Japan | Okinawa | 124639 | 7667 | 441 | 116531 | 664154 | 7664 | 8 | 0 | 0 | 0 | 0 |
36635 | 2022-04-02 | Japan | Yamanashi | 23382 | 1793 | 64 | 21525 | 248105 | 1798 | 1 | 0 | 0 | 0 | 0 |
36636 | 2022-04-03 | Japan | - | 6653841 | 460801 | 28248 | 6164792 | 43717117 | 444917 | 510 | 58919234270 | 1800280086 | 32796114518 | 24322839666 |
36637 | 2022-04-03 | Japan | Entering | 14266 | 1062 | 8 | 13196 | 1702306 | 1062 | 0 | 233882625393 | 41124011904 | 107764288434 | 84994325055 |
Visualize values
We can visualize the values with .map()
method. Arguments are the same as JHUData
.
[67]:
japan_data.map(variable="Severe")

Values are here.
[68]:
japan_data.layer(country="Japan").tail()
[68]:
| | Date | Country | Province | Confirmed | Infected | Fatal | Recovered | Tests | Moderate | Severe | Vaccinations | Vaccinations_boosters | Vaccinated_once | Vaccinated_full |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
35845 | 2022-04-02 | Japan | Osaka | 802145 | 37009 | 4714 | 760422 | 6224907 | 36696 | 291 | 0 | 0 | 0 | 0 |
35846 | 2022-04-02 | Japan | Nagasaki | 33504 | 1499 | 116 | 31889 | 308815 | 3000 | 1 | 0 | 0 | 0 | 0 |
35847 | 2022-04-02 | Japan | Okinawa | 124639 | 7667 | 441 | 116531 | 664154 | 7664 | 8 | 0 | 0 | 0 | 0 |
35848 | 2022-04-02 | Japan | Yamanashi | 23382 | 1793 | 64 | 21525 | 248105 | 1798 | 1 | 0 | 0 | 0 | 0 |
35849 | 2022-04-03 | Japan | Entering | 14266 | 1062 | 8 | 13196 | 1702306 | 1062 | 0 | 233882625393 | 41124011904 | 107764288434 | 84994325055 |
Map with country level data is not prepared, but country level data can be retrieved.
[69]:
japan_data.layer(country=None).tail()
[69]:
| | Date | Country | Confirmed | Infected | Fatal | Recovered | Tests | Moderate | Severe | Vaccinations | Vaccinations_boosters | Vaccinated_once | Vaccinated_full |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
783 | 2022-03-30 | Japan | 6452108 | 415942 | 27913 | 6008253 | 43110998 | 401231 | 655 | 57725021813 | 1590785961 | 32245594526 | 23888641326 |
784 | 2022-03-31 | Japan | 6504873 | 428780 | 28010 | 6048083 | 43265925 | 411013 | 627 | 58023132257 | 1642716822 | 32383224524 | 23997190911 |
785 | 2022-04-01 | Japan | 6552920 | 441767 | 28097 | 6083056 | 43413502 | 423151 | 533 | 58321832928 | 1695237910 | 32520854522 | 24105740496 |
786 | 2022-04-02 | Japan | 6606464 | 455864 | 28200 | 6122400 | 43606340 | 440031 | 518 | 58620533599 | 1747758998 | 32658484520 | 24214290081 |
787 | 2022-04-03 | Japan | 6653841 | 460801 | 28248 | 6164792 | 43717117 | 444917 | 510 | 58919234270 | 1800280086 | 32796114518 | 24322839666 |
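The country level values can then be plotted with cs.line_plot(), for example the number of severe cases (a minimal sketch):
# Sketch: number of severe cases in Japan (country level) over time
japan_country_df = japan_data.layer(country=None).set_index("Date")
cs.line_plot(japan_country_df[["Severe"]], "Number of severe cases in Japan over time")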
Metadata
Additionally, JapanData.meta() retrieves metadata for the Japanese prefectures.
[70]:
japan_data.meta().tail()
Retrieving Metadata of Japan dataset from https://github.com/lisphilar/covid19-sir/data/japan
[70]:
| | Prefecture | Admin_Capital | Admin_Region | Admin_Num | Area_Habitable | Area_Total | Clinic_bed_Care | Clinic_bed_Total | Hospital_bed_Care | Hospital_bed_Specific | Hospital_bed_Total | Hospital_bed_Tuberculosis | Hospital_bed_Type-I | Hospital_bed_Type-II | Population_Female | Population_Male | Population_Total | Location_Latitude | Location_Longitude |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
42 | Kumamoto | Kumamoto | Kyushu | 43 | 2796 | 7409 | 497 | 4628 | 8340 | 0 | 33710 | 95 | 2 | 46 | 933 | 833 | 1765 | 32.790513 | 130.742388 |
43 | Oita | Oita | Kyushu | 44 | 1799 | 6341 | 269 | 3561 | 2618 | 0 | 19834 | 50 | 2 | 38 | 607 | 546 | 1152 | 33.238391 | 131.612658 |
44 | Miyazaki | Miyazaki | Kyushu | 45 | 1850 | 7735 | 206 | 2357 | 3682 | 0 | 18769 | 33 | 1 | 30 | 577 | 512 | 1089 | 31.911188 | 131.423873 |
45 | Kagoshima | Kagoshima | Kyushu | 46 | 3313 | 9187 | 652 | 4827 | 7750 | 0 | 32651 | 98 | 1 | 44 | 863 | 763 | 1626 | 31.560052 | 130.557745 |
46 | Okinawa | Naha | Okinawa | 47 | 1169 | 2281 | 83 | 914 | 3804 | 0 | 18710 | 47 | 4 | 20 | 734 | 709 | 1443 | 26.211761 | 127.681119 |