Usage: datasets

Here, we will review the datasets downloaded and cleaned with the DataLoader class. Methods of this class produce the following class instances.

  1. JHUData: the number of confirmed/infected/fatal/recovered cases

  2. OxCGRTData: indicators of government responses (OxCGRT)

  3. PCRData: the number of tests

  4. VaccineData: the number of vaccinations, people vaccinated

  5. MobilityData: percentage to baseline in visits (will be usable from 2.22.0)

  6. PyramidData: population pyramid

  7. JapanData: Japan-specific dataset

If you want to use a new dataset for your analysis, please kindly inform us with GitHub Issues: Request new method of DataLoader class.

Note:
LinelistData (linelist of case reports) was deprecated with issue #866 at development version 2.22.0.
Note:
PopulationData (population values) was deprecated with issue #904 at development version 2.21.0-xi-fu2.

In this notebook, we review the cleaned datasets one by one and visualize them.

Preparation

Import the packages.

[1]:
# !pip install covsirphy --upgrade
from pprint import pprint
import covsirphy as cs
cs.__version__
[1]:
'2.21.0-pi'

Data cleaning classes will be produced with methods of DataLoader class. Please specify the directory to save CSV files when creating DataLoader instance. The default value of directory is “input” and we will set “../input” here.

Note:
When the directory has a CSV file with the same name, DataLoader will load it without downloading the dataset from the server. When the CSV file was created/updated more than 12 hours ago, it will be updated automatically. “12 hours” is the default value and we can change it with the update_interval argument when creating the DataLoader instance.
[2]:
# Create DataLoader instance
loader = cs.DataLoader("../input", update_interval=12)
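
The caching rule described in the note above can be sketched with the standard library. This is only an illustration of the idea: needs_update is a hypothetical helper and the file name is made up, so it should not be read as the actual implementation inside DataLoader.

```python
import time
from pathlib import Path


def needs_update(csv_path, update_interval=12):
    """Return True when the CSV is missing or older than update_interval hours.

    Hypothetical helper for illustration; not part of covsirphy.
    """
    path = Path(csv_path)
    if not path.exists():
        # Nothing is cached yet, so the dataset must be downloaded
        return True
    # Age of the file in hours, based on its last modification time
    age_hours = (time.time() - path.stat().st_mtime) / 3600
    return age_hours > update_interval


# Hypothetical file name, used only to show the call
print(needs_update("../input/hypothetical_dataset.csv", update_interval=12))
```

With this rule, a larger update_interval means fewer downloads at the cost of possibly older records.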

Usage of methods will be explained in the following sections. If you want to download all datasets with copy & paste, please refer to Dataset preparation.

The number of cases (JHU style)

The main data for analysis is that of the number of cases. JHUData class created with DataLoader.jhu() method is for the number of confirmed/fatal/recovered cases. The number of infected cases will be calculated as “Confirmed - Recovered - Fatal” when data cleaning.
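
As a minimal sketch of the formula “Confirmed - Recovered - Fatal”, the calculation can be reproduced with pandas alone. The three rows are copied from the Japan records shown later in this notebook; the dataframe is built by hand here and is not produced by covsirphy.

```python
import pandas as pd

# Values copied from the Japan records shown later in this notebook
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-07-23", "2021-07-24", "2021-07-25"]),
    "Confirmed": [857799, 862148, 865666],
    "Fatal": [15106, 15116, 15124],
    "Recovered": [808413, 810884, 814024],
})
# Currently infected = confirmed cases that are neither recovered nor fatal
df["Infected"] = df["Confirmed"] - df["Recovered"] - df["Fatal"]
print(df)
```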

If you want to create this instance with your local CSV file, please refer to Dataset preparation: 3. Use a local CSV file which has the number of cases.

[3]:
# Create instance
jhu_data = loader.jhu()
Retrieving COVID-19 dataset in Japan from https://github.com/lisphilar/covid19-sir/data/japan
Retrieving datasets from COVID-19 Data Hub https://covid19datahub.io/
        Please set verbose=2 to see the detailed citation list.
Retrieving datasets from Our World In Data https://github.com/owid/covid-19-data/
Retrieving datasets from COVID-19 Open Data by Google Cloud Platform https://github.com/GoogleCloudPlatform/covid-19-open-data
[4]:
# Check type
type(jhu_data)
[4]:
covsirphy.cleaning.jhu_data.JHUData

JHUData.citation property shows the description of this dataset.

[5]:
print(jhu_data.citation)
Hirokazu Takaya (2020-2021), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan
(Secondary source) Guidotti, E., Ardia, D., (2020), "COVID-19 Data Hub", Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.

The detailed citation list is saved in the DataLoader.covid19dh_citation property. This is not a property of JHUData. Because many links are included, they will not be shown in this tutorial.

[6]:
# Detailed citations (string)
# loader.covid19dh_citation

We can check the raw data with JHUData.raw property.

[7]:
jhu_data.raw.tail()
[7]:
Date ISO3 Country Province Confirmed Infected Fatal Recovered Population
777628 2021-07-22 ZWE Zimbabwe - 93421.0 NaN 2870.0 61723.0 14439018.0
777629 2021-07-23 ZWE Zimbabwe - 95686.0 NaN 2961.0 62986.0 14439018.0
777630 2021-07-24 ZWE Zimbabwe - 97277.0 NaN 3050.0 64628.0 14439018.0
777631 2021-07-25 ZWE Zimbabwe - 97894.0 NaN 3094.0 65913.0 14439018.0
777632 2021-07-26 ZWE Zimbabwe - 99944.0 NaN 3173.0 67827.0 14439018.0

The cleaned dataset is here.

[8]:
jhu_data.cleaned().tail()
[8]:
Date ISO3 Country Province Confirmed Infected Fatal Recovered Population
777609 2021-07-22 CHN China - 104489 741 4848 98900 1361142636
777610 2021-07-23 CHN China - 104526 775 4848 98903 1361142636
777611 2021-07-24 CHN China - 104562 769 4848 98945 1361142636
777612 2021-07-25 CHN China - 104642 819 4848 98975 1361142636
777613 2021-07-26 CHN China - 104713 870 4848 98995 1361142636

As you may have noticed, they are returned as pandas DataFrames. Because the tail holds the latest values, pandas.DataFrame.tail() was used to review them.

Check the data types and memory usage as follows.

[9]:
jhu_data.cleaned().info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 777614 entries, 0 to 777613
Data columns (total 9 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   Date        777614 non-null  datetime64[ns]
 1   ISO3        777614 non-null  category
 2   Country     777614 non-null  category
 3   Province    777614 non-null  category
 4   Confirmed   777614 non-null  int64
 5   Infected    777614 non-null  int64
 6   Fatal       777614 non-null  int64
 7   Recovered   777614 non-null  int64
 8   Population  777614 non-null  int64
dtypes: category(3), datetime64[ns](1), int64(5)
memory usage: 40.1 MB

Note that dates are datetime64[ns], area names are of the pandas category dtype, and the numbers of cases are int64.

Total number of cases in all countries

JHUData.total() returns the total number of cases in all countries. Fatality and recovery rates are added.

[10]:
total_df = jhu_data.total()
# Show the oldest data
display(total_df.loc[total_df["Confirmed"] > 0].head())
# Show the latest data
display(total_df.tail())
Confirmed Infected Fatal Recovered Fatal per Confirmed Recovered per Confirmed Fatal per (Fatal or Recovered)
Date
2020-01-02 1 -5714 5714 1 5714.000000 1.000000 0.999825
2020-01-22 556 -5206 5731 31 10.307554 0.055755 0.994620
2020-01-23 657 -5110 5732 35 8.724505 0.053272 0.993931
2020-01-24 943 -4839 5740 42 6.086957 0.044539 0.992736
2020-01-25 1438 -228121 10346 219213 7.194715 152.442976 0.045069
Confirmed Infected Fatal Recovered Fatal per Confirmed Recovered per Confirmed Fatal per (Fatal or Recovered)
Date
2021-07-23 193214415 55950065 4145219 133119131 0.021454 0.688971 0.030199
2021-07-24 193632993 55886497 4153646 133592850 0.021451 0.689928 0.030154
2021-07-25 194050596 55984604 4160554 133905438 0.021441 0.690054 0.030135
2021-07-26 194572295 56206779 4167921 134197595 0.021421 0.689706 0.030123
2021-07-27 41523369 3076070 621608 37825691 0.014970 0.910949 0.016168

The first case registered in the dataset is dated 02Jan2020 in the output above. The COVID-19 outbreak is still ongoing.

We can create line plots with covsirphy.line_plot() function.

[11]:
cs.line_plot(total_df[["Infected", "Fatal", "Recovered"]], "Total number of cases over time")
_images/usage_dataset_26_0.png

Statistics of fatality and recovery rate are here.

[12]:
total_df.loc[:, total_df.columns.str.contains("per")].describe().T
[12]:
count mean std min 25% 50% 75% max
Fatal per Confirmed 554.0 10.442516 242.760290 0.014970 0.021828 0.027025 0.049266 5714.000000
Recovered per Confirmed 554.0 1.701156 8.866499 0.044539 0.593213 0.634971 0.681159 152.442976
Fatal per (Fatal or Recovered) 554.0 0.066489 0.092761 0.011276 0.033243 0.036216 0.061071 0.999825
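
The three rate columns can be reproduced from the cumulative totals. The following is a minimal sketch that assumes the column names mean what they say; the two rows are copied from the latest-data output of JHUData.total() above, and the dataframe is built by hand.

```python
import pandas as pd

# Two rows copied from the output of JHUData.total() above
total = pd.DataFrame(
    {
        "Confirmed": [193214415, 193632993],
        "Fatal": [4145219, 4153646],
        "Recovered": [133119131, 133592850],
    },
    index=pd.to_datetime(["2021-07-23", "2021-07-24"]),
)
# Share of confirmed cases that were fatal / recovered
total["Fatal per Confirmed"] = total["Fatal"] / total["Confirmed"]
total["Recovered per Confirmed"] = total["Recovered"] / total["Confirmed"]
# Share of closed cases (fatal or recovered) that were fatal
total["Fatal per (Fatal or Recovered)"] = total["Fatal"] / (total["Fatal"] + total["Recovered"])
print(total.round(6))
```

The rounded values agree with the corresponding rows in the output above.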

Subset for area

JHUData.subset() creates a subset for a specific area. We can select a country name and a province name. In this tutorial, “Japan” and “Tokyo in Japan” will be used. Please replace them with your country/province name.

Subset for a country:
We can use either country names or ISO3 codes.
[13]:
# Specify country name
df, complement = jhu_data.records("Japan")
# Or, specify ISO3 code
# df, complement = jhu_data.records("JPN")
# Show records
display(df.tail())
# Show details of complement
print(complement)
Date Confirmed Infected Fatal Recovered Susceptible
533 2021-07-23 857799 34280 15106 808413 125671301
534 2021-07-24 862148 36148 15116 810884 125666952
535 2021-07-25 865666 36518 15124 814024 125663434
536 2021-07-26 870445 38441 15129 816875 125658655
537 2021-07-27 875506 40231 15137 820138 125653594
monotonic increasing complemented fatal data and
partially complemented recovered data

The records were complemented. The second returned value is the description of the complement. Details will be explained later, and we can skip the complement with the auto_complement=False argument. Alternatively, just use the JHUData.subset() method when the second returned value (False because no complement is performed) is unnecessary.

[14]:
# Skip complement
df, complement = jhu_data.records("Japan", auto_complement=False)
# Or,
# df = jhu_data.subset("Japan")
display(df.tail())
# Show complement (False because not complemented)
print(complement)
Date Confirmed Infected Fatal Recovered Susceptible
533 2021-07-23 857799 34280 15106 808413 125671301
534 2021-07-24 862148 36148 15116 810884 125666952
535 2021-07-25 865666 36518 15124 814024 125663434
536 2021-07-26 870445 38441 15129 816875 125658655
537 2021-07-27 875506 40231 15137 820138 125653594
False

Subset for a province (called “prefecture” in Japan):

[15]:
df, _ = jhu_data.records("Japan", province="Tokyo")
df.tail()
[15]:
Date Confirmed Infected Fatal Recovered Susceptible
524 2021-07-23 196400 11957 2277 182166 13746456
525 2021-07-24 197528 11881 2277 183370 13745328
526 2021-07-25 199291 12635 2277 184379 13743565
527 2021-07-26 200720 12831 2277 185612 13742136
528 2021-07-27 203568 15677 2279 185612 13739288

The list of countries can be checked with JHUData.countries() as follows.

[16]:
pprint(jhu_data.countries(), compact=True)
['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola',
 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia', 'Austria',
 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus',
 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia',
 'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'Brunei', 'Bulgaria',
 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde',
 'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Comoros',
 'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Cyprus', 'Czech Republic',
 'Democratic Republic of the Congo', 'Denmark', 'Djibouti', 'Dominica',
 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea',
 'Eritrea', 'Estonia', 'Ethiopia', 'Fiji', 'Finland', 'France', 'Gabon',
 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Greece', 'Grenada', 'Guam',
 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Holy See',
 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran', 'Iraq',
 'Ireland', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jordan', 'Kazakhstan',
 'Kenya', 'Kiribati', 'Kosovo', 'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia',
 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania',
 'Luxembourg', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta',
 'Marshall Islands', 'Mauritania', 'Mauritius', 'Mexico', 'Moldova', 'Monaco',
 'Mongolia', 'Montenegro', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia',
 'Nepal', 'Netherlands', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria',
 'North Macedonia', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan',
 'Palestine', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines',
 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Republic of the Congo',
 'Romania', 'Russia', 'Rwanda', 'Saint Kitts and Nevis', 'Saint Lucia',
 'Saint Vincent and the Grenadines', 'Samoa', 'San Marino',
 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles',
 'Sierra Leone', 'Singapore', 'Slovakia', 'Slovenia', 'Solomon Islands',
 'Somalia', 'South Africa', 'South Korea', 'South Sudan', 'Spain', 'Sri Lanka',
 'Sudan', 'Suriname', 'Swaziland', 'Sweden', 'Switzerland', 'Syria', 'Taiwan',
 'Tajikistan', 'Tanzania', 'Thailand', 'Timor-Leste', 'Togo',
 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Uganda', 'Ukraine',
 'United Arab Emirates', 'United Kingdom', 'United States', 'Uruguay',
 'Uzbekistan', 'Vanuatu', 'Venezuela', 'Vietnam', 'Virgin Islands, U.S.',
 'Yemen', 'Zambia', 'Zimbabwe']

Complement

JHUData.records() automatically complements the records if necessary and auto_complement=True (default). Each area can have none, one, or multiple complements, depending on the records and their preprocessing analysis.

We can show the specific kind of complements that were applied to the records of each country with JHUData.show_complement() method. The possible kinds of complement for each country are the following:

  1. “Monotonic_confirmed/fatal/recovered” (monotonic increasing complement) Force the variable to be monotonically increasing.

  2. “Full_recovered” (full complement of recovered data) Estimate the number of recovered cases using the value of estimated average recovery period.

  3. “Partial_recovered” (partial complement of recovered data) When recovered values are not updated for some days, extrapolate the values.

Note:
“Recovery period” will be discussed in the next subsection.
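
To illustrate what a monotonic increasing complement means, here is a minimal sketch using a cumulative maximum on a hand-made series. This is one simple way to force monotonicity and is an assumption for illustration; the algorithm covsirphy actually applies may adjust the values differently.

```python
import pandas as pd

# Hand-made cumulative series with two decreases (12 -> 11 and 15 -> 14)
recovered = pd.Series([10, 12, 11, 15, 14, 20], name="Recovered")
# Replace each value with the largest value seen so far
monotonic = recovered.cummax()
print(monotonic.tolist())  # [10, 12, 12, 15, 15, 20]
```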

For JHUData.show_complement(), we can specify country names and province names.

[17]:
# Specify country name
jhu_data.show_complement(country="Japan")
# Or, specify country and province name
# jhu_data.show_complement(country="Japan", province="Tokyo")
[17]:
Country Province Monotonic_confirmed Monotonic_fatal Monotonic_recovered Full_recovered Partial_recovered
0 Japan - False True True False True

When a list is applied as the country argument, all specified countries will be shown. If None, all registered countries will be used.

[18]:
# Specify country names
jhu_data.show_complement(country=["Greece", "Japan"])
# Or, apply None
# jhu_data.show_complement(country=None)
[18]:
Country Province Monotonic_confirmed Monotonic_fatal Monotonic_recovered Full_recovered Partial_recovered
0 Greece - False False False False True
1 Japan - False True True False True

If complement was performed incorrectly or you need new algorithms, kindly let us know via issue page.

Recovery period

We define “recovery period” as the time period between case confirmation and recovery (as it is subjectively defined per country). With the global case records, we estimate the average recovery period using JHUData.calculate_recovery_period().

[19]:
recovery_period = jhu_data.calculate_recovery_period()
print(f"Average recovery period: {recovery_period} [days]")
Average recovery period: 16 [days]

What we currently do is to calculate the difference between confirmed cases and fatal cases and try to match it to some recovered cases value in the future. We apply this method for every country that has valid recovery data and average the partial recovery periods in order to obtain a single (average) recovery period. During the calculations, we ignore time intervals that lead to very short (<7 days) or very long (>90 days) partial recovery periods, if these exist with high frequency (>50%) in the records. We have to assume temporarily invariable compartments for this analysis to extract an approximation of the average recovery period.

Alternatively, we had tried to use linelist of case reports to get precise value of recovery period (average of recovery date minus confirmation date for cases), but the number of records was too small.
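
The matching idea described in this subsection can be sketched for a single country as follows. This is a simplified illustration under stated assumptions, not the code behind JHUData.calculate_recovery_period(): estimate_recovery_period is a hypothetical function, records are assumed to be daily, periods outside 7 to 90 days are simply dropped (the frequency rule is not reproduced), and the sample data is synthetic.

```python
import numpy as np
import pandas as pd


def estimate_recovery_period(df, lower=7, upper=90):
    """Average number of days until Recovered reaches (Confirmed - Fatal).

    Hypothetical helper; assumes one record per day with cumulative values.
    """
    df = df.sort_values("Date").reset_index(drop=True)
    target = (df["Confirmed"] - df["Fatal"]).to_numpy()
    # searchsorted needs a non-decreasing array
    recovered = df["Recovered"].cummax().to_numpy()
    # Index of the first day on which Recovered reaches each target value
    first_reached = np.searchsorted(recovered, target, side="left")
    reached = first_reached < len(df)
    periods = first_reached[reached] - np.flatnonzero(reached)
    # Keep only plausible periods
    periods = periods[(periods >= lower) & (periods <= upper)]
    return int(round(float(periods.mean()))) if periods.size else None


# Synthetic data: every confirmed case recovers exactly 14 days later
dates = pd.date_range("2020-03-01", periods=60, freq="D")
confirmed = np.arange(60) * 100
recovered = np.concatenate([np.zeros(14, dtype=int), confirmed[:-14]])
sample = pd.DataFrame(
    {"Date": dates, "Confirmed": confirmed, "Fatal": 0, "Recovered": recovered}
)
print(estimate_recovery_period(sample))  # prints 14 for this synthetic series
```

Averaging such per-country values over all countries with valid recovery data would give a single global estimate, as described above.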

Visualize the number of cases at a timepoint

We can visualize the number of cases with JHUData.map() method. When country is None, global map will be shown.

Global map with country level data:

[20]:
# Global map with country level data
jhu_data.map(country=None, variable="Infected")
# To include/exclude some countries
# jhu_data.map(country=None, variable="Infected", included=["Japan"])
# jhu_data.map(country=None, variable="Infected", excluded=["Japan"])
# To change the date
# jhu_data.map(country=None, variable="Infected", date="01Oct2021")
_images/usage_dataset_50_0.png

Values can be retrieved with .layer() method.

[21]:
jhu_data.layer(country=None).tail()
[21]:
Date ISO3 Country Confirmed Infected Fatal Recovered Population
115110 2021-07-22 CHN China 104489 741 4848 98900 1361142636
115111 2021-07-23 CHN China 104526 775 4848 98903 1361142636
115112 2021-07-24 CHN China 104562 769 4848 98945 1361142636
115113 2021-07-25 CHN China 104642 819 4848 98975 1361142636
115114 2021-07-26 CHN China 104713 870 4848 98995 1361142636

Country map with province level data:

[22]:
# Country map with province level data
jhu_data.map(country="Japan", variable="Infected")
# To include/exclude some provinces
# jhu_data.map(country="Japan", variable="Infected", included=["Tokyo"])
# jhu_data.map(country="Japan", variable="Infected", excluded=["Tokyo"])
# To change the date
# jhu_data.map(country="Japan", variable="Infected", date="01Oct2021")
_images/usage_dataset_54_0.png

Values are here.

[23]:
jhu_data.layer(country="Japan").tail()
[23]:
Date ISO3 Country Province Confirmed Infected Fatal Recovered Population
26848 2021-07-23 JPN Japan Yamanashi 2218 47 21 2150 812056
26849 2021-07-24 JPN Japan Yamanashi 2218 47 21 2150 812056
26850 2021-07-25 JPN Japan Yamanashi 2218 47 21 2150 812056
26851 2021-07-26 JPN Japan Yamanashi 2268 86 21 2161 812056
26852 2021-07-27 JPN Japan Yamanashi 2295 113 21 2161 812056
Note for Japan:
Province “Entering” means the number of cases who were confirmed when entering Japan.

OxCGRT indicators

Government responses are tracked with the Oxford Covid-19 Government Response Tracker (OxCGRT). Because government responses and people's activities change the parameter values of SIR-derived models, this dataset is significant when we try to forecast the number of cases. OxCGRTData class will be created with DataLoader.oxcgrt() method.

[24]:
oxcgrt_data = loader.oxcgrt()
[25]:
type(oxcgrt_data)
[25]:
covsirphy.cleaning.oxcgrt.OxCGRTData

Because the records are retrieved via “COVID-19 Data Hub”, as with JHUData, the data description and raw data are the same.

[26]:
# Description
print(oxcgrt_data.citation)
# Raw
# oxcgrt_data.raw.tail()

The cleaned dataset is here.

[27]:
oxcgrt_data.cleaned().tail()
[27]:
Date ISO3 Country Province School_closing Workplace_closing Cancel_events Gatherings_restrictions Transport_closing Stay_home_restrictions Internal_movement_restrictions International_movement_restrictions Information_campaigns Testing_policy Contact_tracing Stringency_index
779284 2021-07-22 GRL Greenland Greenland 1.0 2.0 2.0 2.0 1.0 1.0 1.0 3.0 2.0 3.0 1.0 47.22
779285 2021-07-23 GRL Greenland Greenland 1.0 2.0 2.0 2.0 1.0 1.0 1.0 3.0 2.0 3.0 1.0 47.22
779286 2021-07-24 GRL Greenland Greenland 1.0 2.0 2.0 2.0 1.0 1.0 1.0 3.0 2.0 3.0 1.0 47.22
779287 2021-07-25 GRL Greenland Greenland 1.0 2.0 2.0 2.0 1.0 1.0 1.0 3.0 2.0 3.0 1.0 47.22
779288 2021-07-26 GRL Greenland Greenland 1.0 2.0 2.0 2.0 1.0 1.0 1.0 3.0 2.0 3.0 1.0 47.22

Subset for area

OxCGRTData.subset() creates a subset for a specific area. We can select only a country name. Note that province level data is not registered in OxCGRTData.

Subset for a country:
We can use either country names or ISO3 codes.
[28]:
oxcgrt_data.subset("Japan").tail()
# Or, with ISO3 code
# oxcgrt_data.subset("JPN").tail()
[28]:
Date School_closing Workplace_closing Cancel_events Gatherings_restrictions Transport_closing Stay_home_restrictions Internal_movement_restrictions International_movement_restrictions Information_campaigns Testing_policy Contact_tracing Stringency_index
548 2021-07-23 1.0 2.0 1.0 2.0 1.0 1.0 1.0 4.0 2.0 1.0 2.0 50.46
549 2021-07-24 1.0 2.0 1.0 2.0 1.0 1.0 1.0 4.0 2.0 1.0 2.0 50.46
550 2021-07-25 1.0 2.0 1.0 2.0 1.0 1.0 1.0 4.0 2.0 1.0 2.0 50.46
551 2021-07-26 1.0 2.0 1.0 2.0 1.0 1.0 1.0 4.0 2.0 1.0 2.0 50.46
552 2021-07-27 1.0 2.0 1.0 2.0 1.0 1.0 1.0 4.0 2.0 1.0 2.0 50.46

Visualize indicator values

We can visualize indicator values with .map() method. Arguments are the same as JHUData.map(), but country name cannot be specified.

[29]:
oxcgrt_data.map(variable="Stringency_index")
_images/usage_dataset_69_0.png

Values are here.

[30]:
oxcgrt_data.layer().tail()
[30]:
Date ISO3 Country School_closing Workplace_closing Cancel_events Gatherings_restrictions Transport_closing Stay_home_restrictions Internal_movement_restrictions International_movement_restrictions Information_campaigns Testing_policy Contact_tracing Stringency_index
115681 2021-07-22 GRL Greenland 1.0 2.0 2.0 2.0 1.0 1.0 1.0 3.0 2.0 3.0 1.0 47.22
115682 2021-07-23 GRL Greenland 1.0 2.0 2.0 2.0 1.0 1.0 1.0 3.0 2.0 3.0 1.0 47.22
115683 2021-07-24 GRL Greenland 1.0 2.0 2.0 2.0 1.0 1.0 1.0 3.0 2.0 3.0 1.0 47.22
115684 2021-07-25 GRL Greenland 1.0 2.0 2.0 2.0 1.0 1.0 1.0 3.0 2.0 3.0 1.0 47.22
115685 2021-07-26 GRL Greenland 1.0 2.0 2.0 2.0 1.0 1.0 1.0 3.0 2.0 3.0 1.0 47.22

The number of tests

The number of tests is also key information to understand the situation. PCRData class will be created with DataLoader.pcr() method.

[31]:
pcr_data = loader.pcr()
[32]:
type(pcr_data)
[32]:
covsirphy.cleaning.pcr_data.PCRData

Because the records are retrieved via “COVID-19 Data Hub”, as with JHUData, the data description and raw data are the same.

[33]:
# Description
print(pcr_data.citation)
# Raw
# pcr_data.raw.tail()
Hirokazu Takaya (2020-2021), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan
(Secondary source) Guidotti, E., Ardia, D., (2020), "COVID-19 Data Hub", Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.
Hasell, J., Mathieu, E., Beltekian, D. et al. A cross-country database of COVID-19 testing. Sci Data 7, 345 (2020). https://doi.org/10.1038/s41597-020-00688-8

The cleaned dataset is here.

[34]:
pcr_data.cleaned().tail()
[34]:
Date Country Province Tests Confirmed
777628 2021-07-22 Zimbabwe - 872248 93421
777629 2021-07-23 Zimbabwe - 872248 95686
777630 2021-07-24 Zimbabwe - 872248 97277
777631 2021-07-25 Zimbabwe - 872248 97894
777632 2021-07-26 Zimbabwe - 872248 99944

Subset for area

PCRData.subset() creates a subset for a specific area. We can select country name and province name.

Subset for a country:
We can use either country names or ISO3 codes.
[35]:
pcr_data.subset("Japan").tail()
# Or, with ISO3 code
# pcr_data.subset("JPN").tail()
[35]:
Date Tests Tests_diff Confirmed
533 2021-07-23 17803291 42751 857799
534 2021-07-24 17857644 54353 862148
535 2021-07-25 17893458 35814 865666
536 2021-07-26 17914211 20753 870445
537 2021-07-27 18013366 99155 875506

Positive rate

Under the assumption that all tests were PCR tests, we can calculate the positive rate of PCR tests as “the number of confirmed cases per the number of tests” with PCRData.positive_rate() method.

[36]:
pcr_data.positive_rate("Japan").tail()
_images/usage_dataset_83_0.png
[36]:
Date ISO3 Country Province Tests Confirmed Tests_diff Confirmed_diff Test_positive_rate
532 2021-07-23 JPN Japan - 17803291 857799 60840.285714 3800.857143 6.247270
533 2021-07-24 JPN Japan - 17857644 862148 57058.285714 3932.857143 6.892701
534 2021-07-25 JPN Japan - 17893458 865666 55231.857143 3890.571429 7.044071
535 2021-07-26 JPN Japan - 17914211 870445 53664.428571 4129.714286 7.695441
536 2021-07-27 JPN Japan - 18013366 875506 56926.714286 4498.857143 7.902893
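
The columns in the output above suggest how the rate is built: Test_positive_rate equals Confirmed_diff / Tests_diff × 100 (for example, 3800.857143 / 60840.285714 × 100 ≈ 6.247270). The following is a minimal sketch with pandas. The 7-day rolling mean is an assumption inferred from the fractional values in the output, positive_rate here is a hypothetical stand-alone function, and the sample data is synthetic.

```python
import pandas as pd


def positive_rate(df, window=7):
    """Daily new tests/cases smoothed by a rolling mean, then the ratio in percent.

    Hypothetical helper; the window length is an assumption.
    """
    out = df.copy()
    out["Tests_diff"] = out["Tests"].diff().rolling(window).mean()
    out["Confirmed_diff"] = out["Confirmed"].diff().rolling(window).mean()
    out["Test_positive_rate"] = out["Confirmed_diff"] / out["Tests_diff"] * 100
    # Rows without a full window have no rate
    return out.dropna().reset_index(drop=True)


# Synthetic cumulative data: 1000 tests and 50 confirmed cases per day
sample = pd.DataFrame({
    "Date": pd.date_range("2021-07-01", periods=10, freq="D"),
    "Tests": [1000 * i for i in range(10)],
    "Confirmed": [50 * i for i in range(10)],
})
rates = positive_rate(sample)
print(rates[["Date", "Test_positive_rate"]])
```

Smoothing before taking the ratio reduces the effect of weekday reporting patterns on the rate.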

Visualize the number of tests

We can visualize the number of tests with .map() method. When country is None, global map will be shown. Arguments are the same as JHUData, but variable name cannot be specified.

Country level data:

[37]:
pcr_data.map(country=None)
_images/usage_dataset_86_0.png

Values are here.

[38]:
pcr_data.layer(country=None).tail()
[38]:
Date ISO3 Country Tests Confirmed
57680 2021-07-22 ZWE Zimbabwe 872248 93421
57681 2021-07-23 ZWE Zimbabwe 872248 95686
57682 2021-07-24 ZWE Zimbabwe 872248 97277
57683 2021-07-25 ZWE Zimbabwe 872248 97894
57684 2021-07-26 ZWE Zimbabwe 872248 99944

Province level data:

[39]:
pcr_data.map(country="Japan")
_images/usage_dataset_90_0.png

Values are here.

[40]:
pcr_data.layer(country="Japan").tail()
[40]:
Date ISO3 Country Province Tests Confirmed
23867 2021-07-23 JPN Japan Yamanashi 65656 2218
23868 2021-07-24 JPN Japan Yamanashi 65656 2218
23869 2021-07-25 JPN Japan Yamanashi 65656 2218
23870 2021-07-26 JPN Japan Yamanashi 69775 2268
23871 2021-07-27 JPN Japan Yamanashi 69775 2295

Vaccinations

Vaccination is a key factor in ending the outbreak as soon as possible. VaccineData class will be created with DataLoader.vaccine() method.

[41]:
vaccine_data = loader.vaccine()
[42]:
type(vaccine_data)
[42]:
covsirphy.cleaning.vaccine_data.VaccineData

Description is here.

[43]:
print(vaccine_data.citation)
Hasell, J., Mathieu, E., Beltekian, D. et al. A cross-country database of COVID-19 testing. Sci Data 7, 345 (2020). https://doi.org/10.1038/s41597-020-00688-8
Hirokazu Takaya (2020-2021), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan

Raw data is here.

[44]:
vaccine_data.raw.tail()
[44]:
Date ISO3 Country Province Product Vaccinations Vaccinated_once Vaccinated_full
777627 2021-07-21 ZWE Zimbabwe - Sinopharm/Beijing, Sinovac, Sputnik V 1949472.0 1292642.0 656830.0
777628 2021-07-22 ZWE Zimbabwe - Sinopharm/Beijing, Sinovac, Sputnik V 2017101.0 1352514.0 664587.0
777629 2021-07-23 ZWE Zimbabwe - Sinopharm/Beijing, Sinovac, Sputnik V 2072060.0 1400905.0 671155.0
777630 2021-07-24 ZWE Zimbabwe - Sinopharm/Beijing, Sinovac, Sputnik V 2116664.0 1438890.0 677774.0
777631 2021-07-25 ZWE Zimbabwe - Sinopharm/Beijing, Sinovac, Sputnik V 2127402.0 1447342.0 680060.0

The next is the cleaned dataset.

[45]:
vaccine_data.cleaned().tail()
[45]:
Date ISO3 Country Province Product Vaccinations Vaccinated_once Vaccinated_full
43447 2021-07-23 ZWE Zimbabwe - Sinopharm/Beijing, Sinovac, Sputnik V 2072060 1400905 671155
43448 2021-07-24 ZWE Zimbabwe - Sinopharm/Beijing, Sinovac, Sputnik V 2116664 1438890 677774
43449 2021-07-25 ZWE Zimbabwe - Sinopharm/Beijing, Sinovac, Sputnik V 2127402 1447342 680060
43450 2021-07-26 ZWE Zimbabwe - Sinopharm/Beijing, Sinovac, Sputnik V 2127402 1447342 680060
43451 2021-07-27 ZWE Zimbabwe - Sinopharm/Beijing, Sinovac, Sputnik V 2127402 1447342 680060

Note for variables

Definitions of the variables are as follows.

  • Vaccinations: cumulative number of vaccinations

  • Vaccinated_once: cumulative number of people who received at least one vaccine dose

  • Vaccinated_full: cumulative number of people who received all doses prescribed by the protocol

Registered countries can be checked with VaccineData.countries() method.

[46]:
pprint(vaccine_data.countries(), compact=True)
['Afghanistan', 'Albania', 'Andorra', 'Angola', 'Anguilla',
 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria',
 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus',
 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia',
 'Bonaire Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana',
 'Brazil', 'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso',
 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands',
 'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Comoros',
 'Cook Islands', 'Costa Rica', 'Croatia', 'Cuba', 'Curacao', 'Cyprus',
 'Czechia', 'Democratic Republic of Congo', 'Denmark', 'Djibouti', 'Dominica',
 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea',
 'Estonia', 'Eswatini', 'Faeroe Islands', 'Falkland Islands', 'Fiji', 'Finland',
 'France', 'French Polynesia', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana',
 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guatemala', 'Guernsey',
 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'Hong Kong',
 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran', 'Iraq', 'Ireland',
 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan',
 'Kazakhstan', 'Kenya', 'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon',
 'Lesotho', 'Liberia', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao',
 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Mauritania', 'Mauritius',
 'Mexico', 'Moldova', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat',
 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands',
 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue',
 'North Macedonia', 'Norway', 'Oman', 'Pakistan', 'Palestine', 'Panama',
 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Poland', 'Portugal',
 'Qatar', 'Romania', 'Russia', 'Rwanda', 'Saint Helena',
 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Vincent and the Grenadines',
 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal',
 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore',
 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Solomon Islands',
 'Somalia', 'South Africa', 'South Korea', 'South Sudan', 'Spain', 'Sri Lanka',
 'Sudan', 'Suriname', 'Sweden', 'Switzerland', 'Syria', 'Taiwan', 'Tajikistan',
 'Thailand', 'Timor', 'Togo', 'Tonga', 'Trinidad and Tobago', 'Tunisia',
 'Turkey', 'Turkmenistan', 'Turks and Caicos Islands', 'Uganda', 'Ukraine',
 'United Arab Emirates', 'United Kingdom', 'United States', 'Uruguay',
 'Uzbekistan', 'Venezuela', 'Vietnam', 'Wallis and Futuna', 'Yemen', 'Zambia',
 'Zimbabwe']

Subset for area

VaccineData.subset() creates a subset for a specific area. We can select only country name. Note that province level data is not registered.

Subset for a country:
We can use either country names or ISO3 codes.
[47]:
vaccine_data.subset("Japan").tail()
# Or, with ISO3 code
# vaccine_data.subset("JPN").tail()
[47]:
Date Vaccinations Vaccinated_once Vaccinated_full
208 2021-07-23 77120493 46157623 30962870
209 2021-07-24 77944424 46426270 31518154
210 2021-07-25 78656888 46654777 32002111
211 2021-07-26 79383659 46911901 32471758
212 2021-07-27 79383659 46911901 32471758

Visualize the number of vaccinations

We can visualize the number of vaccinations and the other variables with .map() method. Arguments are the same as JHUData, but country name cannot be specified.

[48]:
vaccine_data.map()
_images/usage_dataset_109_0.png

Values are here.

[49]:
vaccine_data.layer().tail()
[49]:
Date ISO3 Country Product Vaccinations Vaccinated_once Vaccinated_full
43447 2021-07-23 ZWE Zimbabwe Sinopharm/Beijing, Sinovac, Sputnik V 2072060 1400905 671155
43448 2021-07-24 ZWE Zimbabwe Sinopharm/Beijing, Sinovac, Sputnik V 2116664 1438890 677774
43449 2021-07-25 ZWE Zimbabwe Sinopharm/Beijing, Sinovac, Sputnik V 2127402 1447342 680060
43450 2021-07-26 ZWE Zimbabwe Sinopharm/Beijing, Sinovac, Sputnik V 2127402 1447342 680060
43451 2021-07-27 ZWE Zimbabwe Sinopharm/Beijing, Sinovac, Sputnik V 2127402 1447342 680060

Mobility

Levels of mobility are a key factor of \(\rho\) (effective contact rate) of SIR-derived ODE models. MobilityData class will be created with DataLoader.mobility() method.

[50]:
# mobility_data = loader.mobility()
[51]:
# type(mobility_data)

Description is here.

[52]:
# print(mobility_data.citation)

Raw data is here.

[53]:
# mobility_data.raw.tail()

The next is the cleaned dataset.

[54]:
# mobility_data.cleaned().tail()

Note for variables

Definitions of the variables are as follows.

  • Mobility_grocery_and_pharmacy (int): % to baseline in visits (grocery markets, pharmacies etc.)

  • Mobility_parks (int): % to baseline in visits (parks etc.)

  • Mobility_transit_stations (int): % to baseline in visits (public transport hubs etc.)

  • Mobility_retail_and_recreation (int): % to baseline in visits (restaurant, museums etc.)

  • Mobility_residential (int): % to baseline in visits (places of residence)

  • Mobility_workplaces (int): % to baseline in visits (places of work)

Registered countries can be checked with MobilityData.countries() method.

[55]:
# pprint(mobility_data.countries(), compact=True)

Subset for area

MobilityData.subset() creates a subset for a specific area (country/province).

Subset for a country: we can use either a country name or an ISO3 code.

[56]:
# mobility_data.subset("Japan").tail()
# Or, with ISO3 code
# mobility_data.subset("JPN").tail()
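
As an illustration of accepting either form of key, the following sketch resolves a country name or an ISO3 code to a single country name. The lookup table ISO3_TO_NAME and the function resolve_country are hypothetical; covsirphy manages its own mapping internally and may work differently.

```python
# Hypothetical lookup table with two entries for illustration only
ISO3_TO_NAME = {"JPN": "Japan", "ZWE": "Zimbabwe"}


def resolve_country(key):
    """Return the country name for either a country name or an ISO3 code."""
    if key in ISO3_TO_NAME:
        # The key is an ISO3 code
        return ISO3_TO_NAME[key]
    if key in ISO3_TO_NAME.values():
        # The key is already a country name
        return key
    raise KeyError(f"Unknown country: {key}")


print(resolve_country("JPN"))
print(resolve_country("Japan"))
```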

Visualize mobility data

We can visualize the levels of mobility with the MobilityData.map() method. The arguments are the same as for JHUData.

[57]:
# mobility_data.map()

Values are here.

[58]:
# mobility_data.layer().tail()

Population pyramid

With a population pyramid, we can divide the population into sub-groups. This is useful when we analyse the meaning of parameters; for example, the number of days people go out may differ between sub-groups. A PyramidData instance is created with the DataLoader.pyramid() method.

[59]:
pyramid_data = loader.pyramid()
[60]:
type(pyramid_data)
[60]:
covsirphy.cleaning.pyramid.PopulationPyramidData

Description is here.

[61]:
print(pyramid_data.citation)
World Bank Group (2020), World Bank Open Data, https://data.worldbank.org/

The raw dataset is not registered. A subset is retrieved when PyramidData.subset() is called.

[62]:
pyramid_data.subset("Japan").tail()
Retrieving population pyramid dataset (Japan) from https://data.worldbank.org/
[62]:
Age Population Per_total
113 118 262648 0.00225
114 119 262648 0.00225
115 120 262648 0.00225
116 121 262648 0.00225
117 122 262648 0.00225

“Per_total” is the proportion of the age group in the total population.
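
As a minimal sketch of this definition, the following plain-Python example computes the proportion of each age in the total population and then sums the proportions over an age range to obtain the share of a sub-group. The function names per_total and group_share and the population counts are hypothetical and do not come from covsirphy or the World Bank dataset.

```python
def per_total(population_by_age):
    """Return the proportion of each age in the total population."""
    total = sum(population_by_age.values())
    return {age: count / total for age, count in population_by_age.items()}


def group_share(proportions, start, end):
    """Return the summed proportion for ages from start to end (inclusive)."""
    return sum(p for age, p in proportions.items() if start <= age <= end)


# Hypothetical population counts by age (not real data)
population_by_age = {0: 900, 1: 1100, 2: 1000, 3: 1000}
proportions = per_total(population_by_age)
print(proportions)
print(group_share(proportions, 0, 1))
```

The proportions sum to 1 over all ages, so the share of any sub-group can be read off by summing over its age range.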

Japan-specific dataset

This dataset includes the number of confirmed/infected/fatal/recovered/tests/moderate/severe cases at the country/prefecture level and metadata of each prefecture (province). A JapanData instance is created with the DataLoader.japan() method.

[63]:
japan_data = loader.japan()
[64]:
type(japan_data)
[64]:
covsirphy.cleaning.japan_data.JapanData

Description is here.

[65]:
print(japan_data.citation)
Hirokazu Takaya (2020-2021), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan

The next is the cleaned dataset.

[66]:
japan_data.cleaned().tail()
[66]:
Date Country Province Confirmed Infected Fatal Recovered Tests Moderate Severe Vaccinations Vaccinated_once Vaccinated_full
24383 2021-07-26 Japan Shiga 5805 141 93 5571 166453 140 1 0 0 0
24384 2021-07-26 Japan Osaka 109742 4403 2719 102620 2402057 3359 158 0 0 0
24385 2021-07-26 Japan Yamanashi 2268 86 21 2161 69775 86 0 0 0 0
24386 2021-07-27 Japan Entering 3453 174 6 3273 874667 174 0 42311490247 25004043233 17307447014
24387 2021-07-27 Japan - 875506 40231 15137 820138 18013366 38453 514 3143133984 2066897478 1076236506

Visualize values

We can visualize the values with the .map() method. The arguments are the same as for JHUData.

[67]:
japan_data.map(variable="Severe")
_images/usage_dataset_146_0.png

Values are here.

[68]:
japan_data.layer(country="Japan").tail()
[68]:
Date Country Province Confirmed Infected Fatal Recovered Tests Moderate Severe Vaccinations Vaccinated_once Vaccinated_full
23845 2021-07-26 Japan Fukui 1490 84 36 1370 76581 84 0 0 0 0
23846 2021-07-26 Japan Shiga 5805 141 93 5571 166453 140 1 0 0 0
23847 2021-07-26 Japan Osaka 109742 4403 2719 102620 2402057 3359 158 0 0 0
23848 2021-07-26 Japan Yamanashi 2268 86 21 2161 69775 86 0 0 0 0
23849 2021-07-27 Japan Entering 3453 174 6 3273 874667 174 0 42311490247 25004043233 17307447014

A map with country-level data is not prepared, but country-level data can be retrieved.

[69]:
japan_data.layer(country=None).tail()
[69]:
Date Country Confirmed Infected Fatal Recovered Tests Moderate Severe Vaccinations Vaccinated_once Vaccinated_full
533 2021-07-23 Japan 857799 34280 15106 808413 17803291 32734 431 2827765354 1879992629 947772725
534 2021-07-24 Japan 862148 36148 15116 810884 17857644 34726 436 2905709778 1926418899 979290879
535 2021-07-25 Japan 865666 36518 15124 814024 17893458 35134 448 2984366666 1973073676 1011292990
536 2021-07-26 Japan 870445 38441 15129 816875 17914211 36912 466 3063750325 2019985577 1043764748
537 2021-07-27 Japan 875506 40231 15137 820138 18013366 38453 514 3143133984 2066897478 1076236506

Metadata

Additionally, JapanData.meta() retrieves metadata for Japanese prefectures.

[70]:
japan_data.meta().tail()
Retrieving Metadata of Japan dataset from https://github.com/lisphilar/covid19-sir/data/japan
[70]:
Prefecture Admin_Capital Admin_Region Admin_Num Area_Habitable Area_Total Clinic_bed_Care Clinic_bed_Total Hospital_bed_Care Hospital_bed_Specific Hospital_bed_Total Hospital_bed_Tuberculosis Hospital_bed_Type-I Hospital_bed_Type-II Population_Female Population_Male Population_Total Location_Latitude Location_Longitude
42 Kumamoto Kumamoto Kyushu 43 2796 7409 497 4628 8340 0 33710 95 2 46 933 833 1765 32.790513 130.742388
43 Oita Oita Kyushu 44 1799 6341 269 3561 2618 0 19834 50 2 38 607 546 1152 33.238391 131.612658
44 Miyazaki Miyazaki Kyushu 45 1850 7735 206 2357 3682 0 18769 33 1 30 577 512 1089 31.911188 131.423873
45 Kagoshima Kagoshima Kyushu 46 3313 9187 652 4827 7750 0 32651 98 1 44 863 763 1626 31.560052 130.557745
46 Okinawa Naha Okinawa 47 1169 2281 83 914 3804 0 18710 47 4 20 734 709 1443 26.211761 127.681119