Usage: exploratory data analysis

Here, we will review the datasets downloaded and cleaned with the DataLoader class. Methods of this class produce the following class instances.

  1. JHUData: the number of confirmed/infected/fatal/recovered cases

  2. OxCGRTData: indicators of government responses (OxCGRT)

  3. PCRData: the number of tests

  4. VaccineData: the number of vaccinations, people vaccinated

  5. MobilityData: percentage to baseline in visits

  6. PyramidData: population pyramid

  7. JapanData: Japan-specific dataset

If you want to use a new dataset for your analysis, please let us know via GitHub Issues: Request new method of DataLoader class.

Note:
LinelistData (linelist of case reports) was deprecated with issue #866 at development version 2.22.0.
Note:
PopulationData (population values) was deprecated with issue #904 at development version 2.22.0.

In this notebook, we will review the cleaned datasets one by one and visualize them.

Preparation

Import the packages.

[1]:
# !pip install covsirphy --upgrade
from pprint import pprint
import covsirphy as cs
cs.__version__
[1]:
'2.22.0-beta'

Data cleaning classes are produced by methods of the DataLoader class. Please specify the directory in which to save CSV files when creating a DataLoader instance. The default directory is “input”; we will set “../input” here.

Note:
Please find the details of DataLoader at Usage: data loading.
[2]:
# Create DataLoader instance
loader = cs.DataLoader("../input")

Usage of methods will be explained in the following sections. If you want to download all datasets with copy & paste, please refer to Dataset preparation.

The number of cases (JHU style)

The main data for analysis is the number of cases. The JHUData class, created with the DataLoader.jhu() method, holds the number of confirmed/fatal/recovered cases. The number of infected cases is calculated as “Confirmed - Recovered - Fatal” during data cleaning.

[3]:
# Create instance
jhu_data = loader.jhu()
Retrieving COVID-19 dataset in Japan from https://github.com/lisphilar/covid19-sir/data/japan
Retrieving datasets from COVID-19 Data Hub https://covid19datahub.io/
        Please set verbose=2 to see the detailed citation list.
Retrieving datasets from Our World In Data https://github.com/owid/covid-19-data/
Retrieving datasets from COVID-19 Open Data by Google Cloud Platform https://github.com/GoogleCloudPlatform/covid-19-open-data
[4]:
# Check type
type(jhu_data)
[4]:
covsirphy.cleaning.jhu_data.JHUData

JHUData.citation property shows the description of this dataset.

[5]:
print(jhu_data.citation)
Hirokazu Takaya (2020-2021), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan
(Secondary source) Guidotti, E., Ardia, D., (2020), "COVID-19 Data Hub", Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.

The detailed citation list is saved in the DataLoader.covid19dh_citation property. This is not a property of JHUData. Because many links are included, they will not be shown in this tutorial.

[6]:
# Detailed citations (string)
# loader.covid19dh_citation

We can check the raw data with JHUData.raw property.

[7]:
jhu_data.raw.tail()
[7]:
Date ISO3 Country Province Confirmed Infected Fatal Recovered Population
924255 2021-10-06 ZWE Zimbabwe - 131434.0 NaN 4630.0 82994.0 14439018.0
924256 2021-10-07 ZWE Zimbabwe - 131523.0 NaN 4631.0 82994.0 14439018.0
924257 2021-10-08 ZWE Zimbabwe - 131705.0 NaN 4634.0 82994.0 14439018.0
924258 2021-10-09 ZWE Zimbabwe - 131762.0 NaN 4636.0 82994.0 14439018.0
924259 2021-10-10 ZWE Zimbabwe - 131796.0 NaN 4637.0 82994.0 14439018.0

The cleaned dataset is here.

[8]:
jhu_data.cleaned().tail()
[8]:
Date ISO3 Country Province Confirmed Infected Fatal Recovered Population
924235 2021-10-06 CHN China - 108654 4577 4849 99228 1361142636
924236 2021-10-07 CHN China - 108683 4606 4849 99228 1361142636
924237 2021-10-08 CHN China - 108704 4627 4849 99228 1361142636
924238 2021-10-09 CHN China - 108736 4659 4849 99228 1361142636
924239 2021-10-10 CHN China - 108761 4684 4849 99228 1361142636

As you may have noticed, the datasets are returned as pandas DataFrames. Because the last rows hold the latest values, pandas.DataFrame.tail() was used to review them.
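
As a quick check of the “Confirmed - Recovered - Fatal” definition, the Infected value in the last cleaned row shown above can be reproduced by hand. This is a minimal sketch in plain Python; the helper name infected_cases is hypothetical and not part of covsirphy.

```python
def infected_cases(confirmed, fatal, recovered):
    """Return currently infected cases as Confirmed - Recovered - Fatal."""
    return confirmed - recovered - fatal


# Values copied from the last cleaned row shown above (China, 2021-10-10)
latest = {"Confirmed": 108761, "Fatal": 4849, "Recovered": 99228}
infected = infected_cases(latest["Confirmed"], latest["Fatal"], latest["Recovered"])
# Compare with the Infected column of the same row
print(infected)
```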

Check the data types and memory usage as follows.

[9]:
jhu_data.cleaned().info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 924240 entries, 0 to 924239
Data columns (total 9 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   Date        924240 non-null  datetime64[ns]
 1   ISO3        924240 non-null  category
 2   Country     924240 non-null  category
 3   Province    924240 non-null  category
 4   Confirmed   924240 non-null  int64
 5   Infected    924240 non-null  int64
 6   Fatal       924240 non-null  int64
 7   Recovered   924240 non-null  int64
 8   Population  924240 non-null  int64
dtypes: category(3), datetime64[ns](1), int64(5)
memory usage: 47.7 MB

Note that Date is datetime64[ns], area names (ISO3, Country, Province) use the pandas category dtype, and the numbers of cases are int64.

Total number of cases in all countries

JHUData.total() returns the total number of cases in all countries. Fatality and recovery rates are added.

[10]:
total_df = jhu_data.total()
# Show the oldest data
display(total_df.loc[total_df["Confirmed"] > 0].head())
# Show the latest data
display(total_df.tail())
Confirmed Infected Fatal Recovered Fatal per Confirmed Recovered per Confirmed Fatal per (Fatal or Recovered)
Date
2020-01-02 1 -9246 9246 1 9246.000000 1.000000 0.999892
2020-01-22 556 -8738 9263 31 16.660072 0.055755 0.996665
2020-01-23 656 -8642 9264 34 14.121951 0.051829 0.996343
2020-01-24 942 -8371 9272 41 9.842887 0.043524 0.995598
2020-01-25 1438 -233881 14837 220482 10.317803 153.325452 0.063051
Confirmed Infected Fatal Recovered Fatal per Confirmed Recovered per Confirmed Fatal per (Fatal or Recovered)
Date
2021-10-07 234867133 85376794 4819716 144670623 0.020521 0.615968 0.032241
2021-10-08 235310505 85756534 4828078 144725893 0.020518 0.615042 0.032283
2021-10-09 235604835 85992139 4832465 144780231 0.020511 0.614504 0.032300
2021-10-10 235869170 86217785 4836587 144814798 0.020505 0.613962 0.032319
2021-10-11 5308736 3035233 57302 2216201 0.010794 0.417463 0.025204

In the output above, the first date with a confirmed case registered in the dataset is 02Jan2020. The COVID-19 outbreak is still ongoing.

We can create line plots with covsirphy.line_plot() function.

[11]:
cs.line_plot(total_df[["Infected", "Fatal", "Recovered"]], "Total number of cases over time")
_images/usage_dataset_26_0.png

Statistics of fatality and recovery rate are here.

[12]:
total_df.loc[:, total_df.columns.str.contains("per")].describe().T
[12]:
count mean std min 25% 50% 75% max
Fatal per Confirmed 630.0 14.833974 368.364811 0.010794 0.021392 0.022559 0.042258 9246.000000
Recovered per Confirmed 630.0 1.578626 8.370735 0.043524 0.615417 0.637428 0.678912 153.325452
Fatal per (Fatal or Recovered) 630.0 0.060832 0.085894 0.011225 0.031851 0.034580 0.062362 0.999892
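
The column names suggest how the three rate columns are defined, and the displayed values are consistent with simple ratios of the cumulative counts. The sketch below recomputes them for one row in plain Python. The function name case_rates is hypothetical, and the formulas are inferred from the output above rather than taken from the library source.

```python
def case_rates(confirmed, fatal, recovered):
    """Compute the three ratio columns from cumulative counts."""
    return {
        "Fatal per Confirmed": fatal / confirmed,
        "Recovered per Confirmed": recovered / confirmed,
        "Fatal per (Fatal or Recovered)": fatal / (fatal + recovered),
    }


# Values copied from the 2021-10-10 row of total_df shown above
rates = case_rates(confirmed=235869170, fatal=4836587, recovered=144814798)
for name, value in rates.items():
    print(f"{name}: {value:.6f}")
```

The printed values can be compared with the 2021-10-10 row of the total_df output.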

Subset for area

JHUData.subset() creates a subset for a specific area. We can select a country name and a province name. In this tutorial, “Japan” and “Tokyo in Japan” will be used. Please replace them with your country/province name.

Subset for a country:
We can use either country names or ISO3 codes.
[13]:
# Specify country name
df, complement = jhu_data.records("Japan")
# Or, specify ISO3 code
# df, complement = jhu_data.records("JPN")
# Show records
display(df.tail())
# Show details of complement
print(complement)
Date Confirmed Infected Fatal Recovered Susceptible
609 2021-10-07 1707752 15789 17819 1674144 124821348
610 2021-10-08 1708619 14924 17906 1675789 124820481
611 2021-10-09 1709395 13846 17930 1677619 124819705
612 2021-10-10 1709946 12579 17940 1679427 124819154
613 2021-10-11 1710314 11845 17960 1680509 124818786
monotonic increasing complemented fatal data and
partially complemented recovered data

The records were complemented. The second returned value describes the complement that was applied. Details will be explained later; we can skip the complement with the auto_complement=False argument. Alternatively, just use the JHUData.subset() method when the second returned value (False because no complement is performed) is unnecessary.

[14]:
# Skip complement
df, complement = jhu_data.records("Japan", auto_complement=False)
# Or,
# df = jhu_data.subset("Japan")
display(df.tail())
# Show complement (False because not complemented)
print(complement)
Date Confirmed Infected Fatal Recovered Susceptible
609 2021-10-07 1707752 15789 17819 1674144 124821348
610 2021-10-08 1708619 14924 17906 1675789 124820481
611 2021-10-09 1709395 13846 17930 1677619 124819705
612 2021-10-10 1709946 12579 17940 1679427 124819154
613 2021-10-11 1710314 11845 17960 1680509 124818786
False
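
The Susceptible column is not described in the text. The displayed rows are consistent with Susceptible = Population - Confirmed for a fixed population value, which is an inference from the output above and not a documented definition. A minimal check in plain Python:

```python
# (Confirmed, Susceptible) pairs copied from the Japan output above
rows = [
    (1707752, 124821348),
    (1708619, 124820481),
    (1709395, 124819705),
    (1709946, 124819154),
    (1710314, 124818786),
]

# If Susceptible = Population - Confirmed, every row implies the same population
implied_population = {confirmed + susceptible for confirmed, susceptible in rows}
print(implied_population)
```

A single value in the resulting set means that all five rows imply the same population.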

Subset for a province (called “prefecture” in Japan):

[15]:
df, _ = jhu_data.records("Japan", province="Tokyo")
df.tail()
[15]:
Date Confirmed Infected Fatal Recovered Susceptible
600 2021-10-07 376496 2117 2994 371385 13566360
601 2021-10-08 376634 1989 3012 371633 13566222
602 2021-10-09 376716 1821 3021 371874 13566140
603 2021-10-10 376776 1690 3028 372058 13566080
604 2021-10-11 376825 1608 3034 372183 13566031

The list of countries can be checked with JHUData.countries() as follows.

[16]:
pprint(jhu_data.countries(), compact=True)
['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola',
 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia', 'Austria',
 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus',
 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia',
 'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'Brunei', 'Bulgaria',
 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde',
 'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Comoros',
 'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Cyprus', 'Czech Republic',
 'Democratic Republic of the Congo', 'Denmark', 'Djibouti', 'Dominica',
 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea',
 'Eritrea', 'Estonia', 'Ethiopia', 'Fiji', 'Finland', 'France', 'Gabon',
 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Greece', 'Grenada', 'Guam',
 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Holy See',
 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran', 'Iraq',
 'Ireland', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jordan', 'Kazakhstan',
 'Kenya', 'Kiribati', 'Kosovo', 'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia',
 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania',
 'Luxembourg', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta',
 'Marshall Islands', 'Mauritania', 'Mauritius', 'Mexico', 'Moldova', 'Monaco',
 'Mongolia', 'Montenegro', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia',
 'Nepal', 'Netherlands', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria',
 'North Macedonia', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan',
 'Palau', 'Palestine', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru',
 'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar',
 'Republic of the Congo', 'Romania', 'Russia', 'Rwanda',
 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Vincent and the Grenadines',
 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal',
 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Slovakia', 'Slovenia',
 'Solomon Islands', 'Somalia', 'South Africa', 'South Korea', 'South Sudan',
 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Swaziland', 'Sweden',
 'Switzerland', 'Syria', 'Taiwan', 'Tajikistan', 'Tanzania', 'Thailand',
 'Timor-Leste', 'Togo', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Uganda',
 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States',
 'Uruguay', 'Uzbekistan', 'Vanuatu', 'Venezuela', 'Vietnam',
 'Virgin Islands, U.S.', 'Yemen', 'Zambia', 'Zimbabwe']

Complement

JHUData.records() automatically complements the records if necessary and auto_complement=True (default). Each area can have no, one, or multiple complements, depending on the records and their preprocessing analysis.

We can show the specific kind of complements that were applied to the records of each country with JHUData.show_complement() method. The possible kinds of complement for each country are the following:

  1. “Monotonic_confirmed/fatal/recovered” (monotonic increasing complement) Force the variable to be monotonically increasing.

  2. “Full_recovered” (full complement of recovered data) Estimate the number of recovered cases using the value of estimated average recovery period.

  3. “Partial_recovered” (partial complement of recovered data) When recovered values are not updated for some days, extrapolate the values.

Note:
“Recovery period” will be discussed in the next subsection.
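
To illustrate what “monotonic increasing” means for a cumulative series, here is a minimal sketch that replaces each value with the running maximum. This is only one simple way to enforce monotonicity and is an assumption for illustration; covsirphy may use a different algorithm. The function name and the sample values are hypothetical.

```python
from itertools import accumulate


def force_monotonic_increasing(values):
    """Replace each value with the running maximum so the series never decreases."""
    return list(accumulate(values, max))


# Hypothetical cumulative counts with a downward correction on the fourth day
reported = [10, 12, 15, 13, 16, 16, 18]
complemented = force_monotonic_increasing(reported)
print(complemented)  # [10, 12, 15, 15, 16, 16, 18]
```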

For JHUData.show_complement(), we can specify country names and province names.

[17]:
# Specify country name
jhu_data.show_complement(country="Japan")
# Or, specify country and province name
# jhu_data.show_complement(country="Japan", province="Tokyo")
[17]:
Country Province Monotonic_confirmed Monotonic_fatal Monotonic_recovered Full_recovered Partial_recovered
0 Japan - False True True False True

When a list is applied to the country argument, all specified countries will be shown. If None, all registered countries will be used.

[18]:
# Specify country names
jhu_data.show_complement(country=["Greece", "Japan"])
# Or, apply None
# jhu_data.show_complement(country=None)
[18]:
Country Province Monotonic_confirmed Monotonic_fatal Monotonic_recovered Full_recovered Partial_recovered
0 Greece - False False False False True
1 Japan - False True True False True

If the complement was performed incorrectly or you need new algorithms, kindly let us know via the issue page.

Recovery period

We defined “recovery period” as the time period between case confirmation and recovery (as it is subjectively defined per country). With the global case records, we estimate the average recovery period using JHUData.calculate_recovery_period().

[19]:
recovery_period = jhu_data.calculate_recovery_period()
print(f"Average recovery period: {recovery_period} [days]")
Average recovery period: 15 [days]

Currently, we calculate the difference between confirmed cases and fatal cases and try to match it to a recovered cases value at some future date. We apply this method to every country that has valid recovery data and average the partial recovery periods to obtain a single (average) recovery period. During the calculations, we ignore time intervals that lead to very short (<7 days) or very long (>90 days) partial recovery periods, if these exist with high frequency (>50%) in the records. We have to assume temporarily invariable compartments for this analysis to extract an approximation of the average recovery period.

Alternatively, we had tried to use linelist of case reports to get precise value of recovery period (average of recovery date minus confirmation date for cases), but the number of records was too small.
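
The matching idea described above can be sketched in plain Python for a single area. This is a simplified illustration under stated assumptions, not covsirphy's implementation: the function name estimate_recovery_period and the synthetic series are hypothetical, and periods outside 7 to 90 days are simply dropped here, whereas the text says such intervals are ignored only when they are frequent.

```python
from statistics import mean


def estimate_recovery_period(confirmed, fatal, recovered, min_days=7, max_days=90):
    """Average number of days until Recovered reaches Confirmed - Fatal of each day."""
    periods = []
    for day, (c, f) in enumerate(zip(confirmed, fatal)):
        target = c - f
        if target <= 0:
            continue
        # Find the first later day on which recovered cases reach the target
        for later in range(day, len(recovered)):
            if recovered[later] >= target:
                periods.append(later - day)
                break
    # Simplification: always drop very short or very long periods
    valid = [p for p in periods if min_days <= p <= max_days]
    return round(mean(valid)) if valid else None


# Synthetic cumulative series: recoveries follow confirmations by 10 days
confirmed = [10 * (d + 1) for d in range(40)]
fatal = [1] * 40
recovered = [0] * 10 + confirmed[:30]
period = estimate_recovery_period(confirmed, fatal, recovered)
print(f"Estimated recovery period: {period} [days]")
```

In the library, the per-country results are then averaged across countries, as described above.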

Visualize the number of cases at a timepoint

We can visualize the number of cases with JHUData.map() method. When country is None, global map will be shown.

Global map with country level data:

[20]:
# Global map with country level data
jhu_data.map(country=None, variable="Infected")
# To include/exclude some countries
# jhu_data.map(country=None, variable="Infected", included=["Japan"])
# jhu_data.map(country=None, variable="Infected", excluded=["Japan"])
# To change the date
# jhu_data.map(country=None, variable="Infected", date="01Oct2021")
_images/usage_dataset_50_0.png

Values can be retrieved with .layer() method.

[21]:
jhu_data.layer(country=None).tail()
[21]:
Date ISO3 Country Confirmed Infected Fatal Recovered Population
132526 2021-10-06 CHN China 108654 4577 4849 99228 1361142636
132527 2021-10-07 CHN China 108683 4606 4849 99228 1361142636
132528 2021-10-08 CHN China 108704 4627 4849 99228 1361142636
132529 2021-10-09 CHN China 108736 4659 4849 99228 1361142636
132530 2021-10-10 CHN China 108761 4684 4849 99228 1361142636

Country map with province level data:

[22]:
# Country map with province level data
jhu_data.map(country="Japan", variable="Infected")
# To include/exclude some provinces
# jhu_data.map(country="Japan", variable="Infected", included=["Tokyo"])
# jhu_data.map(country="Japan", variable="Infected", excluded=["Tokyo"])
# To change the date
# jhu_data.map(country="Japan", variable="Infected", date="01Oct2021")
_images/usage_dataset_54_0.png

Values are here.

[23]:
jhu_data.layer(country="Japan").tail()
[23]:
Date ISO3 Country Province Confirmed Infected Fatal Recovered Population
30492 2021-10-07 JPN Japan Yamanashi 5128 31 29 5068 812056
30493 2021-10-08 JPN Japan Yamanashi 5129 30 29 5070 812056
30494 2021-10-09 JPN Japan Yamanashi 5134 30 29 5075 812056
30495 2021-10-10 JPN Japan Yamanashi 5134 30 29 5075 812056
30496 2021-10-11 JPN Japan Yamanashi 5134 30 29 5075 812056
Note for Japan:
Province “Entering” means the number of cases who were confirmed when entering Japan.

OxCGRT indicators

Government responses are tracked with the Oxford Covid-19 Government Response Tracker (OxCGRT). Because government responses and people's activities change the parameter values of SIR-derived models, this dataset is significant when we try to forecast the number of cases. The OxCGRTData class is created with the DataLoader.oxcgrt() method.

[24]:
oxcgrt_data = loader.oxcgrt()
[25]:
type(oxcgrt_data)
[25]:
covsirphy.cleaning.oxcgrt.OxCGRTData

Because records are retrieved via “COVID-19 Data Hub”, as with JHUData, the data description and raw data are the same.

[26]:
# Description
print(oxcgrt_data.citation)
# Raw
# oxcgrt_data.raw.tail()

The cleaned dataset is here.

[27]:
oxcgrt_data.cleaned().tail()
[27]:
Date ISO3 Country Province School_closing Workplace_closing Cancel_events Gatherings_restrictions Transport_closing Stay_home_restrictions Internal_movement_restrictions International_movement_restrictions Information_campaigns Testing_policy Contact_tracing Stringency_index
924883 2021-10-06 GRL Greenland - 1.0 1.0 1.0 2.0 1.0 1.0 1.0 2.0 2.0 3.0 2.0 24.07
924884 2021-10-07 GRL Greenland - 1.0 1.0 1.0 2.0 1.0 1.0 1.0 2.0 2.0 3.0 2.0 24.07
924885 2021-10-08 GRL Greenland - 1.0 1.0 1.0 2.0 1.0 1.0 1.0 2.0 2.0 3.0 2.0 24.07
924886 2021-10-09 GRL Greenland - 1.0 1.0 1.0 2.0 1.0 1.0 1.0 2.0 2.0 3.0 2.0 24.07
924887 2021-10-10 GRL Greenland - 1.0 1.0 1.0 2.0 1.0 1.0 1.0 2.0 2.0 3.0 2.0 24.07

Subset for area

OxCGRTData.subset() creates a subset for a specific area. We can select only a country name. Note that province level data is not registered in OxCGRTData.

Subset for a country:
We can use either country names or ISO3 codes.
[28]:
oxcgrt_data.subset("Japan").tail()
# Or, with ISO3 code
# oxcgrt_data.subset("JPN").tail()
[28]:
Date School_closing Workplace_closing Cancel_events Gatherings_restrictions Transport_closing Stay_home_restrictions Internal_movement_restrictions International_movement_restrictions Information_campaigns Testing_policy Contact_tracing Stringency_index
624 2021-10-07 1.0 1.0 1.0 1.0 1.0 1.0 1.0 4.0 2.0 1.0 1.0 47.22
625 2021-10-08 1.0 1.0 1.0 1.0 1.0 1.0 1.0 4.0 2.0 1.0 1.0 47.22
626 2021-10-09 1.0 1.0 1.0 1.0 1.0 1.0 1.0 4.0 2.0 1.0 1.0 47.22
627 2021-10-10 1.0 1.0 1.0 1.0 1.0 1.0 1.0 4.0 2.0 1.0 1.0 47.22
628 2021-10-11 1.0 1.0 1.0 1.0 1.0 1.0 1.0 4.0 2.0 1.0 1.0 47.22

Visualize indicator values

We can visualize indicator values with .map() method. Arguments are the same as JHUData.map(), but country name cannot be specified.

[29]:
oxcgrt_data.map(variable="Stringency_index")
_images/usage_dataset_69_0.png

Values are here.

[30]:
oxcgrt_data.layer().tail()
[30]:
Date ISO3 Country School_closing Workplace_closing Cancel_events Gatherings_restrictions Transport_closing Stay_home_restrictions Internal_movement_restrictions International_movement_restrictions Information_campaigns Testing_policy Contact_tracing Stringency_index
133174 2021-10-06 GRL Greenland 1.0 1.0 1.0 2.0 1.0 1.0 1.0 2.0 2.0 3.0 2.0 24.07
133175 2021-10-07 GRL Greenland 1.0 1.0 1.0 2.0 1.0 1.0 1.0 2.0 2.0 3.0 2.0 24.07
133176 2021-10-08 GRL Greenland 1.0 1.0 1.0 2.0 1.0 1.0 1.0 2.0 2.0 3.0 2.0 24.07
133177 2021-10-09 GRL Greenland 1.0 1.0 1.0 2.0 1.0 1.0 1.0 2.0 2.0 3.0 2.0 24.07
133178 2021-10-10 GRL Greenland 1.0 1.0 1.0 2.0 1.0 1.0 1.0 2.0 2.0 3.0 2.0 24.07

The number of tests

The number of tests is also key information to understand the situation. PCRData class will be created with DataLoader.pcr() method.

[31]:
pcr_data = loader.pcr()
[32]:
type(pcr_data)
[32]:
covsirphy.cleaning.pcr_data.PCRData

Because records are retrieved via “COVID-19 Data Hub”, as with JHUData, the data description and raw data are the same.

[33]:
# Description
print(pcr_data.citation)
# Raw
# pcr_data.raw.tail()
Hirokazu Takaya (2020-2021), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan
(Secondary source) Guidotti, E., Ardia, D., (2020), "COVID-19 Data Hub", Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.
Hasell, J., Mathieu, E., Beltekian, D. et al. A cross-country database of COVID-19 testing. Sci Data 7, 345 (2020). https://doi.org/10.1038/s41597-020-00688-8

The cleaned dataset is here.

[34]:
pcr_data.cleaned().tail()
[34]:
Date Country Province Tests Confirmed
924255 2021-10-06 Zimbabwe - 1288436 131434
924256 2021-10-07 Zimbabwe - 1288436 131523
924257 2021-10-08 Zimbabwe - 1288436 131705
924258 2021-10-09 Zimbabwe - 1288436 131762
924259 2021-10-10 Zimbabwe - 1288436 131796

Subset for area

PCRData.subset() creates a subset for a specific area. We can select country name and province name.

Subset for a country:
We can use either country names or ISO3 codes.
[35]:
pcr_data.subset("Japan").tail()
# Or, with ISO3 code
# pcr_data.subset("JPN").tail()
[35]:
Date Tests Tests_diff Confirmed
609 2021-10-07 25190326 70121 1707752
610 2021-10-08 25252386 62060 1708619
611 2021-10-09 25322667 70281 1709395
612 2021-10-10 25374183 51516 1709946
613 2021-10-11 25400317 26134 1710314

Positive rate

Under the assumption that all tests were PCR tests, we can calculate the positive rate of PCR tests as “the number of confirmed cases per the number of tests” with the PCRData.positive_rate() method.

[36]:
pcr_data.positive_rate("Japan").tail()
_images/usage_dataset_83_0.png
[36]:
Date ISO3 Country Province Tests Confirmed Tests_diff Confirmed_diff Test_positive_rate
608 2021-10-07 JPN Japan - 25190326 1707752 62744.428571 1159.428571 1.847859
609 2021-10-08 JPN Japan - 25252386 1708619 61526.285714 1044.285714 1.697300
610 2021-10-09 JPN Japan - 25322667 1709395 59489.857143 939.428571 1.579141
611 2021-10-10 JPN Japan - 25374183 1709946 58715.142857 837.571429 1.426500
612 2021-10-11 JPN Japan - 25400317 1710314 58902.857143 752.571429 1.277648
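
The Test_positive_rate column in the output above is consistent with Confirmed_diff divided by Tests_diff, expressed as a percentage. The fractional daily differences also suggest that some smoothing is applied before the division. Both points are inferences from the displayed values, not statements about the library's code. The sketch below recomputes one row in plain Python; the function name positive_rate_percent is hypothetical.

```python
def positive_rate_percent(confirmed_diff, tests_diff):
    """Return new confirmed cases per new test, as a percentage."""
    if tests_diff <= 0:
        # No tests registered: the rate is undefined
        return float("nan")
    return confirmed_diff / tests_diff * 100


# Values copied from the 2021-10-07 row shown above
rate = positive_rate_percent(confirmed_diff=1159.428571, tests_diff=62744.428571)
print(f"{rate:.6f}")
```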

Visualize the number of tests

We can visualize the number of tests with .map() method. When country is None, global map will be shown. Arguments are the same as JHUData, but variable name cannot be specified.

Country level data:

[37]:
pcr_data.map(country=None)
_images/usage_dataset_86_0.png

Values are here.

[38]:
pcr_data.layer(country=None).tail()
[38]:
Date ISO3 Country Tests Confirmed
67485 2021-10-06 ZWE Zimbabwe 1288436 131434
67486 2021-10-07 ZWE Zimbabwe 1288436 131523
67487 2021-10-08 ZWE Zimbabwe 1288436 131705
67488 2021-10-09 ZWE Zimbabwe 1288436 131762
67489 2021-10-10 ZWE Zimbabwe 1288436 131796

Province level data:

[39]:
pcr_data.map(country="Japan")
_images/usage_dataset_90_0.png

Values are here.

[40]:
pcr_data.layer(country="Japan").tail()
[40]:
Date ISO3 Country Province Tests Confirmed
27511 2021-10-07 JPN Japan Yamanashi 111974 5128
27512 2021-10-08 JPN Japan Yamanashi 111974 5129
27513 2021-10-09 JPN Japan Yamanashi 111974 5134
27514 2021-10-10 JPN Japan Yamanashi 111974 5134
27515 2021-10-11 JPN Japan Yamanashi 111974 5134

Vaccinations

Vaccination is a key factor in ending the outbreak as soon as possible. The VaccineData class is created with the DataLoader.vaccine() method.

[41]:
vaccine_data = loader.vaccine()
[42]:
type(vaccine_data)
[42]:
covsirphy.cleaning.vaccine_data.VaccineData

Description is here.

[43]:
print(vaccine_data.citation)
Hasell, J., Mathieu, E., Beltekian, D. et al. A cross-country database of COVID-19 testing. Sci Data 7, 345 (2020). https://doi.org/10.1038/s41597-020-00688-8
Hirokazu Takaya (2020-2021), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan

Raw data is here.

[44]:
vaccine_data.raw.tail()
[44]:
Date ISO3 Country Province Product Vaccinations Vaccinated_once Vaccinated_full
924255 2021-10-06 ZWE Zimbabwe - Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5499530.0 3140386.0 2359144.0
924256 2021-10-07 ZWE Zimbabwe - Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5527583.0 3152617.0 2374966.0
924257 2021-10-08 ZWE Zimbabwe - Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5556506.0 3161809.0 2394697.0
924258 2021-10-09 ZWE Zimbabwe - Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5572318.0 3168552.0 2403766.0
924259 2021-10-10 ZWE Zimbabwe - Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5581524.0 3171399.0 2410125.0

The next is the cleaned dataset.

[45]:
vaccine_data.cleaned().tail()
[45]:
Date ISO3 Country Province Product Vaccinations Vaccinated_once Vaccinated_full
62130 2021-10-07 ZWE Zimbabwe - Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5527583 3152617 2374966
62131 2021-10-08 ZWE Zimbabwe - Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5556506 3161809 2394697
62132 2021-10-09 ZWE Zimbabwe - Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5572318 3168552 2403766
62133 2021-10-10 ZWE Zimbabwe - Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5581524 3171399 2410125
62134 2021-10-11 ZWE Zimbabwe - Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5581524 3171399 2410125

Note for variables

Definitions of the variables are as follows.

  • Vaccinations: cumulative number of vaccinations

  • Vaccinated_once: cumulative number of people who received at least one vaccine dose

  • Vaccinated_full: cumulative number of people who received all doses prescribed by the protocol
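
To make the relationship between the three variables concrete, the sketch below checks the cleaned rows shown earlier. In those rows, Vaccinations equals Vaccinated_once plus Vaccinated_full. This is an observation about the displayed data only and should not be assumed in general (for example, where single-dose products or additional doses are counted).

```python
# (Vaccinations, Vaccinated_once, Vaccinated_full) copied from the cleaned rows above
rows = [
    (5527583, 3152617, 2374966),
    (5556506, 3161809, 2394697),
    (5572318, 3168552, 2403766),
    (5581524, 3171399, 2410125),
]

# Difference between total doses and the sum of the two person counts, per row
gaps = [total - (once + full) for total, once, full in rows]
print(gaps)
```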

Registered countries can be checked with VaccineData.countries() method.

[46]:
pprint(vaccine_data.countries(), compact=True)
['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola', 'Anguilla',
 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria',
 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus',
 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia',
 'Bonaire Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana',
 'Brazil', 'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso',
 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands',
 'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Comoros',
 'Cook Islands', 'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Curacao',
 'Cyprus', 'Czechia', 'Democratic Republic of Congo', 'Denmark', 'Djibouti',
 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador',
 'Equatorial Guinea', 'Estonia', 'Eswatini', 'Ethiopia', 'Faeroe Islands',
 'Falkland Islands', 'Fiji', 'Finland', 'France', 'French Polynesia', 'Gabon',
 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland',
 'Grenada', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-Bissau', 'Guyana',
 'Haiti', 'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India', 'Indonesia',
 'Iran', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica',
 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Kuwait',
 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya',
 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao', 'Madagascar', 'Malawi',
 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Mauritania', 'Mauritius', 'Mexico',
 'Moldova', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco',
 'Mozambique', 'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands',
 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue',
 'North Macedonia', 'Norway', 'Oman', 'Pakistan', 'Palestine', 'Panama',
 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Pitcairn', 'Poland',
 'Portugal', 'Qatar', 'Republic of the Congo', 'Romania', 'Russia', 'Rwanda',
 'Saint Helena', 'Saint Kitts and Nevis', 'Saint Lucia',
 'Saint Vincent and the Grenadines', 'Samoa', 'San Marino',
 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles',
 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia',
 'Slovenia', 'Solomon Islands', 'Somalia', 'South Africa', 'South Korea',
 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Sweden',
 'Switzerland', 'Syria', 'Taiwan', 'Tajikistan', 'Tanzania', 'Thailand',
 'Timor', 'Togo', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey',
 'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda', 'Ukraine',
 'United Arab Emirates', 'United Kingdom', 'United States', 'Uruguay',
 'Uzbekistan', 'Vanuatu', 'Venezuela', 'Vietnam', 'Wallis and Futuna', 'Yemen',
 'Zambia', 'Zimbabwe']

Subset for area

VaccineData.subset() creates a subset for a specific area. We can select only country name. Note that province level data is not registered.

Subset for a country:
We can use either country names or ISO3 codes.
[47]:
vaccine_data.subset("Japan").tail()
# Or, with ISO3 code
# vaccine_data.subset("JPN").tail()
[47]:
Date Vaccinations Vaccinated_once Vaccinated_full
284 2021-10-07 -47873548 -35516004 -12357544
285 2021-10-08 -47873548 -35516004 -12357544
286 2021-10-09 -47873548 -35516004 -12357544
287 2021-10-10 174631850 93169823 81462027
288 2021-10-11 174631850 93169823 81462027

Visualize the number of vaccinations

We can visualize the number of vaccinations and the other variables with .map() method. Arguments are the same as JHUData, but country name cannot be specified.

[48]:
vaccine_data.map()
_images/usage_dataset_109_0.png

Values are here.

[49]:
vaccine_data.layer().tail()
[49]:
Date ISO3 Country Product Vaccinations Vaccinated_once Vaccinated_full
62130 2021-10-07 ZWE Zimbabwe Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5527583 3152617 2374966
62131 2021-10-08 ZWE Zimbabwe Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5556506 3161809 2394697
62132 2021-10-09 ZWE Zimbabwe Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5572318 3168552 2403766
62133 2021-10-10 ZWE Zimbabwe Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5581524 3171399 2410125
62134 2021-10-11 ZWE Zimbabwe Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac... 5581524 3171399 2410125

Mobility

Mobility levels are a key factor in \(\rho\) (effective contact rate) of SIR-derived ODE models. The MobilityData class is created with the DataLoader.mobility() method.

[50]:
mobility_data = loader.mobility()
[51]:
type(mobility_data)
[51]:
covsirphy.cleaning.mobility_data.MobilityData

Description is here.

[52]:
print(mobility_data.citation)
O. Wahltinez and others (2020), COVID-19 Open-Data: curating a fine-grained, global-scale data repository for SARS-CoV-2,  Work in progress, https://goo.gle/covid-19-open-data

Raw data is here.

[53]:
mobility_data.raw.tail()
[53]:
Date ISO3 Country Province Mobility_grocery_and_pharmacy Mobility_parks Mobility_transit_stations Mobility_retail_and_recreation Mobility_residential Mobility_workplaces
924251 2021-10-02 ZWE Zimbabwe - 197.0 183.0 168.0 170.0 91.0 142.0
924252 2021-10-03 ZWE Zimbabwe - 195.0 174.0 169.0 167.0 94.0 142.0
924253 2021-10-04 ZWE Zimbabwe - 176.0 158.0 158.0 153.0 95.0 133.0
924254 2021-10-05 ZWE Zimbabwe - 180.0 162.0 153.0 153.0 95.0 131.0
924255 2021-10-06 ZWE Zimbabwe - 184.0 173.0 159.0 157.0 95.0 133.0

The next is the cleaned dataset.

[54]:
mobility_data.cleaned().tail()
[54]:
Date ISO3 Country Province Mobility_grocery_and_pharmacy Mobility_parks Mobility_transit_stations Mobility_retail_and_recreation Mobility_residential Mobility_workplaces
924251 2021-10-02 ZWE Zimbabwe - 197 183 168 170 91 142
924252 2021-10-03 ZWE Zimbabwe - 195 174 169 167 94 142
924253 2021-10-04 ZWE Zimbabwe - 176 158 158 153 95 133
924254 2021-10-05 ZWE Zimbabwe - 180 162 153 153 95 131
924255 2021-10-06 ZWE Zimbabwe - 184 173 159 157 95 133

Note for variables

The variables are defined as follows.

  • Mobility_grocery_and_pharmacy (int): % to baseline in visits (grocery markets, pharmacies etc.)

  • Mobility_parks (int): % to baseline in visits (parks etc.)

  • Mobility_transit_stations (int): % to baseline in visits (public transport hubs etc.)

  • Mobility_retail_and_recreation (int): % to baseline in visits (restaurants, museums etc.)

  • Mobility_residential (int): % to baseline in visits (places of residence)

  • Mobility_workplaces (int): % to baseline in visits (places of work)
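
As a rough illustration of how these values can be read, the sketch below converts them to a difference from the baseline. It assumes that a value of 100 corresponds to the baseline level of visits, which is an assumption and is not stated in this notebook. The helper name change_from_baseline is hypothetical and is not part of covsirphy. The sample values are the Zimbabwe row for 2021-10-06 in the cleaned dataset above.

```python
# Hypothetical helper for illustration only (not part of covsirphy).
# Assumption: a value of 100 represents the baseline level of visits.
BASELINE = 100


def change_from_baseline(value, baseline=BASELINE):
    """Return the difference from the baseline in percentage points."""
    return value - baseline


# Zimbabwe on 2021-10-06, copied from the cleaned dataset above
row = {
    "Mobility_grocery_and_pharmacy": 184,
    "Mobility_parks": 173,
    "Mobility_transit_stations": 159,
    "Mobility_retail_and_recreation": 157,
    "Mobility_residential": 95,
    "Mobility_workplaces": 133,
}

changes = {name: change_from_baseline(value) for name, value in row.items()}
for name, change in changes.items():
    # A positive number means more visits than the assumed baseline
    print(f"{name}: {change:+d}")
```

Under this assumption, a positive result means more visits than at baseline and a negative result means fewer.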

Registered countries can be checked with the MobilityData.countries() method.

[55]:
pprint(mobility_data.countries(), compact=True)
['Afghanistan', 'Angola', 'Antigua and Barbuda', 'Argentina', 'Aruba',
 'Australia', 'Austria', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados',
 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bolivia', 'Bosnia and Herzegovina',
 'Botswana', 'Brazil', 'Bulgaria', 'Burkina Faso', 'Cambodia', 'Cameroon',
 'Canada', 'Cape Verde', 'Chile', 'Colombia', 'Costa Rica', "Cote d'Ivoire",
 'Croatia', 'Czech Republic', 'Denmark', 'Dominican Republic', 'Ecuador',
 'Egypt', 'El Salvador', 'Estonia', 'Fiji', 'Finland', 'France', 'Gabon',
 'Germany', 'Ghana', 'Greece', 'Guatemala', 'Haiti', 'Honduras', 'Hungary',
 'India', 'Indonesia', 'Iraq', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jordan',
 'Kazakhstan', 'Kenya', 'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon',
 'Libya', 'Lithuania', 'Luxembourg', 'Malaysia', 'Mali', 'Malta', 'Mauritius',
 'Mexico', 'Moldova', 'Mongolia', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia',
 'Nepal', 'Netherlands', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria',
 'Norway', 'Oman', 'Pakistan', 'Panama', 'Paraguay', 'Peru', 'Philippines',
 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Romania', 'Russia', 'Rwanda',
 'Saudi Arabia', 'Senegal', 'Serbia', 'Singapore', 'Slovakia', 'Slovenia',
 'South Africa', 'South Korea', 'Spain', 'Sri Lanka', 'Sweden', 'Switzerland',
 'Taiwan', 'Tajikistan', 'Tanzania', 'Thailand', 'Togo', 'Trinidad and Tobago',
 'Turkey', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom',
 'United States of America', 'Uruguay', 'Venezuela', 'Vietnam', 'Yemen',
 'Zambia', 'Zimbabwe']

Subset for area

MobilityData.subset() creates a subset for a specific area (country/province).

Subset for a country: We can use either country names or ISO3 codes.

[56]:
mobility_data.subset("Japan").tail()
# Or, with ISO3 code
# mobility_data.subset("JPN").tail()
[56]:
Date Mobility_grocery_and_pharmacy Mobility_parks Mobility_transit_stations Mobility_retail_and_recreation Mobility_residential Mobility_workplaces
595 2021-10-02 109 101 79 90 105 91
596 2021-10-03 107 113 80 90 104 93
597 2021-10-04 104 97 81 88 105 91
598 2021-10-05 108 102 81 92 105 90
599 2021-10-06 106 102 80 90 105 90
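
The idea of accepting either a country name or an ISO3 code can be sketched in plain Python as follows. This is only a conceptual sketch: the lookup table ISO3_TO_NAME and the functions to_country_name and subset_records are hypothetical names introduced here, and the actual implementation in covsirphy may differ. The two records use the Mobility_workplaces values for 2021-10-06 shown in this notebook.

```python
# Hypothetical lookup table and helpers for illustration only;
# they are not part of covsirphy and its implementation may differ.
ISO3_TO_NAME = {"JPN": "Japan", "ZWE": "Zimbabwe"}


def to_country_name(country):
    """Return the country name for either a country name or an ISO3 code."""
    return ISO3_TO_NAME.get(country, country)


def subset_records(records, country):
    """Select the records of one country, given its name or ISO3 code."""
    name = to_country_name(country)
    return [record for record in records if record["Country"] == name]


# Values for 2021-10-06 copied from the outputs in this notebook
records = [
    {"Date": "2021-10-06", "Country": "Japan", "Mobility_workplaces": 90},
    {"Date": "2021-10-06", "Country": "Zimbabwe", "Mobility_workplaces": 133},
]

by_name = subset_records(records, "Japan")
by_code = subset_records(records, "JPN")
# Both calls select the same records
print(by_name == by_code)
```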

Visualize mobility data

We can visualize the levels of mobility with the MobilityData.map() method. Arguments are the same as those of JHUData.map().

[57]:
mobility_data.map(country=None)
_images/usage_dataset_127_0.png

Values are here.

[58]:
mobility_data.layer().tail()
[58]:
Date ISO3 Country Mobility_grocery_and_pharmacy Mobility_parks Mobility_transit_stations Mobility_retail_and_recreation Mobility_residential Mobility_workplaces
74064 2021-10-02 ZWE Zimbabwe 197 183 168 170 91 142
74065 2021-10-03 ZWE Zimbabwe 195 174 169 167 94 142
74066 2021-10-04 ZWE Zimbabwe 176 158 158 153 95 133
74067 2021-10-05 ZWE Zimbabwe 180 162 153 153 95 131
74068 2021-10-06 ZWE Zimbabwe 184 173 159 157 95 133

Population pyramid

A population pyramid lets us divide the population into sub-groups. This is useful when we analyse the meaning of parameters. For example, the number of days people go out differs between the sub-groups. A PyramidData instance is created with the DataLoader.pyramid() method.

[59]:
pyramid_data = loader.pyramid()
[60]:
type(pyramid_data)
[60]:
covsirphy.cleaning.pyramid.PopulationPyramidData

Description is here.

[61]:
print(pyramid_data.citation)
World Bank Group (2020), World Bank Open Data, https://data.worldbank.org/

The raw dataset is not registered. A subset is retrieved when PyramidData.subset() is called.

[62]:
pyramid_data.subset("Japan").tail()
Retrieving population pyramid dataset (Japan) from https://data.worldbank.org/
[62]:
Age Population Per_total
113 118 262648 0.00225
114 119 262648 0.00225
115 120 262648 0.00225
116 121 262648 0.00225
117 122 262648 0.00225

“Per_total” is the proportion of the age group in the total population.
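
The following sketch shows how a proportion like “Per_total” could be calculated and then used to weight a value that differs between sub-groups, such as the number of days people go out. All numbers and the age groups are hypothetical and chosen only for illustration; they are not taken from the World Bank dataset.

```python
# Hypothetical population counts by age group (not real data)
population_by_age = {"0-19": 200, "20-59": 500, "60+": 300}

# Proportion of each age group in the total population
total = sum(population_by_age.values())
per_total = {age: count / total for age, count in population_by_age.items()}

# Hypothetical number of days per week that each group goes out
days_out = {"0-19": 6, "20-59": 5, "60+": 3}

# Population-weighted average of days going out
weighted_mean = sum(per_total[age] * days_out[age] for age in per_total)

for age, proportion in per_total.items():
    print(f"{age}: {proportion:.3f}")
print(f"Weighted mean of days out: {weighted_mean:.2f}")
```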

Japan-specific dataset

This dataset includes the number of confirmed/infected/fatal/recovered/tests/moderate/severe cases at the country/prefecture level and metadata for each prefecture (province). A JapanData instance is created with the DataLoader.japan() method.

[63]:
japan_data = loader.japan()
[64]:
type(japan_data)
[64]:
covsirphy.cleaning.japan_data.JapanData

Description is here.

[65]:
print(japan_data.citation)
Hirokazu Takaya (2020-2021), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan

The next is the cleaned dataset.

[66]:
japan_data.cleaned().tail()
[66]:
Date Country Province Confirmed Infected Fatal Recovered Tests Moderate Severe Vaccinations Vaccinated_once Vaccinated_full
27911 2021-10-06 Japan Gifu 18853 52 216 18585 405976 238 3 0 0 0
27912 2021-10-06 Japan Kanagawa 168087 875 1276 165936 1595529 826 49 0 0 0
27913 2021-10-06 Japan - 1706675 17161 17789 1671725 25120205 15078 612 6710747962 3901739346 2809008616
27914 2021-10-07 Japan Entering 4297 141 7 4149 1076570 141 0 -28963496540 -21487182420 -7476314120
27915 2021-10-07 Japan - 1707752 15789 17819 1674144 25190326 13668 595 6662874414 3866223342 2796651072
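
As a quick consistency check, the sketch below recalculates “Infected” as “Confirmed - Recovered - Fatal”, the rule stated earlier for JHUData. That JapanData follows the same rule is an assumption here; the function name calculate_infected is hypothetical. The rows are the country-level values (Province “-”) copied from the table above.

```python
def calculate_infected(confirmed, recovered, fatal):
    """Infected = Confirmed - Recovered - Fatal (rule stated for JHUData)."""
    return confirmed - recovered - fatal


# Country-level rows copied from the cleaned dataset above
rows = [
    {"Date": "2021-10-06", "Confirmed": 1706675, "Infected": 17161,
     "Fatal": 17789, "Recovered": 1671725},
    {"Date": "2021-10-07", "Confirmed": 1707752, "Infected": 15789,
     "Fatal": 17819, "Recovered": 1674144},
]

results = []
for row in rows:
    calculated = calculate_infected(
        row["Confirmed"], row["Recovered"], row["Fatal"])
    # Compare the recalculated value with the value in the dataset
    results.append(calculated == row["Infected"])
    print(row["Date"], calculated, calculated == row["Infected"])
```

For both rows, the recalculated value matches the “Infected” column.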

Visualize values

We can visualize the values with the JapanData.map() method. Arguments are the same as those of JHUData.map().

[67]:
japan_data.map(variable="Severe")
_images/usage_dataset_146_0.png

Values are here.

[68]:
japan_data.layer(country="Japan").tail()
[68]:
Date Country Province Confirmed Infected Fatal Recovered Tests Moderate Severe Vaccinations Vaccinated_once Vaccinated_full
27301 2021-10-06 Japan Tottori 1644 39 5 1600 145306 16 0 0 0 0
27302 2021-10-06 Japan Akita 1890 36 26 1828 26542 35 1 0 0 0
27303 2021-10-06 Japan Gifu 18853 52 216 18585 405976 238 3 0 0 0
27304 2021-10-06 Japan Kanagawa 168087 875 1276 165936 1595529 826 49 0 0 0
27305 2021-10-07 Japan Entering 4297 141 7 4149 1076570 141 0 -28963496540 -21487182420 -7476314120

A map of country-level data is not available, but country-level data can be retrieved.

[69]:
japan_data.layer(country=None).tail()
[69]:
Date Country Confirmed Infected Fatal Recovered Tests Moderate Severe Vaccinations Vaccinated_once Vaccinated_full
605 2021-10-03 Japan 1704083 22724 17716 1663643 24963177 20471 696 6854368606 4008287358 2846081248
606 2021-10-04 Japan 1705046 20547 17730 1666769 24987997 18303 693 6806495058 3972771354 2833723704
607 2021-10-05 Japan 1705778 19082 17756 1668940 25056205 16939 655 6758621510 3937255350 2821366160
608 2021-10-06 Japan 1706675 17161 17789 1671725 25120205 15078 612 6710747962 3901739346 2809008616
609 2021-10-07 Japan 1707752 15789 17819 1674144 25190326 13668 595 6662874414 3866223342 2796651072

Metadata

Additionally, JapanData.meta() retrieves metadata for Japanese prefectures.

[70]:
japan_data.meta().tail()
Retrieving Metadata of Japan dataset from https://github.com/lisphilar/covid19-sir/data/japan
[70]:
Prefecture Admin_Capital Admin_Region Admin_Num Area_Habitable Area_Total Clinic_bed_Care Clinic_bed_Total Hospital_bed_Care Hospital_bed_Specific Hospital_bed_Total Hospital_bed_Tuberculosis Hospital_bed_Type-I Hospital_bed_Type-II Population_Female Population_Male Population_Total Location_Latitude Location_Longitude
42 Kumamoto Kumamoto Kyushu 43 2796 7409 497 4628 8340 0 33710 95 2 46 933 833 1765 32.790513 130.742388
43 Oita Oita Kyushu 44 1799 6341 269 3561 2618 0 19834 50 2 38 607 546 1152 33.238391 131.612658
44 Miyazaki Miyazaki Kyushu 45 1850 7735 206 2357 3682 0 18769 33 1 30 577 512 1089 31.911188 131.423873
45 Kagoshima Kagoshima Kyushu 46 3313 9187 652 4827 7750 0 32651 98 1 44 863 763 1626 31.560052 130.557745
46 Okinawa Naha Okinawa 47 1169 2281 83 914 3804 0 18710 47 4 20 734 709 1443 26.211761 127.681119