covsirphy package

class BarPlot(filename=None, bbox_inches='tight', **kwargs)[source]

Bases: covsirphy.visualization.vbase.VisualizeBase

Create a bar plot.

Parameters
  • filename (str or None) – filename to save the figure or None (display)

  • bbox_inches (str) – bounding box in inches when creating the figure

  • kwargs – the other arguments of matplotlib.pyplot.savefig()

line(v=None, h=None, color='black', linestyle=':')[source]

Show vertical/horizontal lines.

Parameters
  • v (list[int/float] or None) – list of x values of vertical lines or None

  • h (list[int/float] or None) – list of y values of horizontal lines or None

  • color (str) – color of the line

  • linestyle (str) – linestyle

plot(data, vertical=True, colormap=None, color_dict=None, **kwargs)[source]

Create bar plot.

Parameters
  • data (pandas.DataFrame or pandas.Series) –

    data to show

    Index

    labels of the bars

    Columns

    variables to show

  • vertical (bool) – whether to draw a vertical bar plot (True) or a horizontal bar plot (False)

  • colormap (str, matplotlib colormap object or None) – colormap, please refer to https://matplotlib.org/examples/color/colormaps_reference.html

  • color_dict (dict[str, str] or None) – dictionary of column names (keys) and colors (values)

  • kwargs – keyword arguments of pandas.DataFrame.plot()

x_axis(xlabel=None)[source]

Set x axis.

Parameters

xlabel (str or None) – x-label

y_axis(ylabel='Cases', y_logscale=False, ylim=(0, None), math_scale=True, y_integer=False)[source]

Set y axis.

Parameters
  • ylabel (str or None) – y-label

  • y_logscale (bool) – whether to use a log scale on the y-axis

  • ylim (tuple(int or float, int or float)) – limits of the y domain

  • math_scale (bool) – whether to use LaTeX in the y-label

  • y_integer (bool) – whether to force the values to be shown as integers

Note

If None is included in ylim, the values will be automatically determined by Matplotlib

class COVID19DataHub(filename)[source]

Bases: covsirphy.util.term.Term

Load datasets retrieved from COVID-19 Data Hub. https://covid19datahub.io/

Parameters

filename (str) – CSV filename to save records

CITATION = '(Secondary source) Guidotti, E., Ardia, D., (2020), "COVID-19 Data Hub", Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.'
OBJ_DICT = {'jhu': <class 'covsirphy.cleaning.jhu_data.JHUData'>, 'oxcgrt': <class 'covsirphy.cleaning.oxcgrt.OxCGRTData'>, 'pcr': <class 'covsirphy.cleaning.pcr_data.PCRData'>, 'population': <class 'covsirphy.cleaning.population.PopulationData'>}
load(name='jhu', force=True, verbose=1)[source]

Load the datasets of COVID-19 Data Hub and create dataset object.

Parameters
  • name (str) – name of dataset, “jhu”, “population”, “oxcgrt” or “pcr”

  • force (bool) – if True, always download the dataset from the server

  • verbose (int) – level of verbosity

Returns

the dataset

Return type

covsirphy.CleaningBase

Note

If @verbose is 2, detailed citation list will be shown when downloading. If @verbose is 1, how to show the list will be explained. Citation of COVID-19 Data Hub will be set as JHUData.citation etc.

property primary

the list of primary sources.

Type

str

class ChangeFinder(**kwargs)[source]

Bases: covsirphy.trend.trend_detector.TrendDetector

Deprecated. Please use TrendDetector class.

class CleaningBase(filename, citation=None)[source]

Bases: covsirphy.util.term.Term

Basic class for data cleaning.

Parameters
  • filename (str or None) – CSV filename of the dataset

  • citation (str or None) – citation

Note

  • If @filename is None, empty dataframe will be set as raw data and geometry information will be saved in “input” directory.

  • If @filename is not None, geometry information will be saved in the directory which has the file.

  • The directory of geometry information could be changed with .directory property.

  • If @citation is None, citation will be empty string.

classmethod area_name(country, province=None)[source]

Return area name of the country/province.

Parameters
  • country (str) – country name or ISO3 code

  • province (str) – province name

Returns

area name

Return type

str

Note

If province is None or ‘-‘, return country name. If not, return the area name, like ‘Japan/Tokyo’
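
The naming rule in the note can be sketched as follows. This is a stand-alone illustration written for this page, not the library's implementation; the function name `area_name_sketch` is hypothetical.

```python
def area_name_sketch(country, province=None):
    """Hypothetical stand-in that mirrors the rule described in the note."""
    # Country-level records: province is missing or the "-" placeholder
    if province is None or province == "-":
        return country
    # Province-level records: join the two names with a slash
    return f"{country}/{province}"


print(area_name_sketch("Japan"))           # Japan
print(area_name_sketch("Japan", "-"))      # Japan
print(area_name_sketch("Japan", "Tokyo"))  # Japan/Tokyo
```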

property citation

citation/description of the dataset

Type

str

cleaned()[source]

Return the cleaned dataset.

Note

Cleaning method is defined by CleaningBase._cleaning() method.

Returns

cleaned data

Return type

pandas.DataFrame

countries()[source]

Return names of countries where records are registered.

Raises

KeyError – Country names are not registered in this dataset

Returns

list of country names

Return type

list[str]

country_to_iso3(country, check_data=True)[source]

Convert country name to ISO3 code if records are available.

Parameters
  • country (str) – country name

  • check_data (bool) – whether to validate the country name with the dataset

Raises

KeyError – ISO3 code of the country is not registered

Returns

ISO3 code or “—” (when unknown)

Return type

str

property directory

directory name to save geometry information

Type

str

ensure_country_name(country, errors='raise')[source]

Ensure that the country name is correct. If not, the correct country name will be found.

Parameters
  • country (str) – country name

  • errors (str) – ‘raise’ or ‘coerce’

Returns

country name

Return type

str

Raises

SubsetNotFoundError – no records were found for the country and @errors is ‘raise’

iso3_to_country(**kwargs)
layer(country=None)[source]

Return the cleaned data at the selected layer.

Parameters

country (str or None) – country name or None (country level data or country-specific dataset)

Returns

Index

reset index

Columns
  • Country (str): country names

  • Province (str): province names (or removed when country level data)

  • any other columns of the cleaned data

Return type

pandas.DataFrame

Raises
  • SubsetNotFoundError – no records were found for the country (when @country is not None)

  • KeyError – @country was None, but country names were not registered in the dataset

Note

When @country is None, country level data will be returned. When @country is a country name, province level data in the selected country will be returned.

static load(urlpath, header=0, columns=None, dtype='object')[source]

Load a local/remote file.

Parameters
  • urlpath (str or pathlib.Path) – filename or URL

  • header (int) – row number of the header

  • columns (list[str]) – columns to use

  • dtype (str or dict[str]) – data type for the dataframe or specified columns

Returns

raw dataset

Return type

pd.DataFrame

property raw

raw data

Type

pandas.DataFrame

records(country, province=None, start_date=None, end_date=None, auto_complement=True, **kwargs)[source]

Return the subset. If necessary, complement will be performed.

Parameters
  • country (str) – country name or ISO3 code

  • province (str or None) – province name

  • start_date (str or None) – start date, like 22Jan2020

  • end_date (str or None) – end date, like 01Feb2020

  • auto_complement (bool) – if True and necessary, the number of cases will be complemented

  • kwargs – the other arguments of complement

Returns

pandas.DataFrame
Index

reset index

Columns

without ISO3, Country, Province column

subset(country, province=None, start_date=None, end_date=None)[source]

Return subset with country/province name and start/end date.

Parameters
  • country (str) – country name or ISO3 code

  • province (str or None) – province name

  • start_date (str or None) – start date, like 22Jan2020

  • end_date (str or None) – end date, like 01Feb2020

Returns

pandas.DataFrame
Index

reset index

Columns

without ISO3, Country, Province column

Raises

SubsetNotFoundError – no records were found for the condition
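
Date arguments such as 22Jan2020 combine a two-digit day, an abbreviated English month name and a four-digit year. The sketch below shows one way to build and read such strings with the standard library. The format string `"%d%b%Y"` is an assumption inferred from the examples above, and `parse_date` is a hypothetical helper, not part of the package. Note that `%b` depends on the locale, so an English or C locale is assumed.

```python
from datetime import datetime

# Assumed format for strings such as "22Jan2020":
# day, abbreviated English month name, four-digit year.
DATE_FORMAT = "%d%b%Y"


def parse_date(text):
    """Parse a date string such as '22Jan2020' (hypothetical helper)."""
    return datetime.strptime(text, DATE_FORMAT)


start = parse_date("22Jan2020")
end = parse_date("01Feb2020")
print((end - start).days)          # 10
print(end.strftime(DATE_FORMAT))   # 01Feb2020
```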

subset_complement(country, **kwargs)[source]

Return the subset. If necessary, complement will be performed.

Raises

NotImplementedError

total()[source]

Calculate total values of the cleaned dataset.

class ColoredMap(filename=None, **kwargs)[source]

Bases: covsirphy.visualization.vbase.VisualizeBase

Create global map with pandas.DataFrame.

Parameters
  • filename (str or None) – filename to save the figure or None (display)

  • kwargs – the other arguments of matplotlib.pyplot.savefig()

property directory

directory to save the downloaded files of geometry information

Type

str

plot(data, level='Country', included=None, excluded=None, logscale=True, **kwargs)[source]

Set dataframe and the variable to show in a colored map.

Parameters
  • data (pandas.DataFrame) –

    data to show

    Index

    reset index

    Columns
    • Country (str or pandas.Category): country name(s)

    • Province (str or pandas.Category): province names, necessary when @level is ‘Province’

    • Value (int or float or None): values used to color the map

    • ISO3 (str): ISO3 codes, optional

  • level (str) – ‘Country’ (global map) or ‘Province’ (country-specific map)

  • logscale (bool) – whether convert the value to log10 scale values or not

  • included (list[str] or None) – included countries/provinces or None (all)

  • excluded (list[str] or None) – excluded countries/provinces or None (no exclusion)

  • kwargs – arguments of geopandas.GeoDataFrame.plot() except for ‘column’

Raises
  • ValueError – labels for data are not unique

  • UnExpectedValueError – some countries’ records are included when @level is ‘Province’

  • SubsetNotFoundError – no geometry information available for the labels

class ComparePlot(filename=None, bbox_inches='tight', **kwargs)[source]

Bases: covsirphy.visualization.vbase.VisualizeBase

Compare two groups with specified variables.

Parameters
  • filename (str or None) – filename to save the figure or None (display)

  • bbox_inches (str) – bounding box in inches when creating the figure

  • kwargs – the other arguments of matplotlib.pyplot.savefig()

plot(data, variables, groups)[source]

Compare two groups with specified variables.

Parameters
  • data (pandas.DataFrame) –

    data to show

    Index

    x values

    Columns

    y variables to show, “{variable}_{group}” for all combinations of variables and groups

  • variables (list[str]) – variables to compare

  • groups (list[str]) – the first group name and the second group name
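
The column naming rule "{variable}_{group}" can be checked before plotting. The following sketch lists the column names that the data would need for given variables and groups. The helper `expected_columns` and the example names are illustrative assumptions, not part of the package.

```python
from itertools import product


def expected_columns(variables, groups):
    """List the "{variable}_{group}" column names (hypothetical helper)."""
    return [f"{variable}_{group}" for variable, group in product(variables, groups)]


# Example values chosen only for illustration
columns = expected_columns(["Infected", "Fatal"], ["actual", "predicted"])
print(columns)
# ['Infected_actual', 'Infected_predicted', 'Fatal_actual', 'Fatal_predicted']
```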

class CountryData(filename, country, province=None)[source]

Bases: covsirphy.cleaning.cbase.CleaningBase

Data cleaning of country level data.

Parameters
  • filename (str or None) – filename to read the data

  • country (str) – country name

  • province (str or None) – province name

Note

If a province column is set with CountryData.set_variables(), @province will be ignored.

cleaned()[source]

Return the cleaned dataset. Cleaning method is defined by CountryData._cleaning() method.

Returns

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Country (pandas.Category): country/region name

  • Province (pandas.Category): province/prefecture/state name

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

Return type

pandas.DataFrame

countries()[source]

Return names of countries where records are registered.

Returns

list of country names

Return type

list[str]

property country

country name

Type

str

map(country=None, variable='Confirmed', date=None, **kwargs)[source]

Create colored map to show the values.

Parameters
  • country (None) – None

  • variable (str) – variable name to show

  • date (str or None) – date of the records or None (the last value)

  • kwargs – arguments of ColoredMap() and ColoredMap.plot()

Raises

NotImplementedError – @country was specified

raw_columns()[source]

Return the column names of the raw data.

Returns

the list of column names of the raw data

Return type

list[str]

register_total()[source]

Register total value of all provinces as country level data.

Returns

self

Return type

covsirphy.CountryData

Note

If country level data was registered, this will be overwritten.

set_variables(date, confirmed, fatal, recovered, province=None)[source]

Set the correspondence of the variables and columns of the raw data.

Parameters
  • date (str) – column name for Date

  • confirmed (str) – column name for Confirmed

  • fatal (str) – column name for Fatal

  • recovered (str) – column name for Recovered

  • province (str) – (optional) column name for Province

total()[source]

Return a dataframe to show chronological change of number and rates.

Returns

group-by Date, sum of the values

Index

Date (pd.Timestamp): Observation date

Columns
  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

  • Fatal per Confirmed (int)

  • Recovered per Confirmed (int)

  • Fatal per (Fatal or Recovered) (int)

Return type

pandas.DataFrame
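
The three rate columns can be read as simple ratios of the cumulative counts. The sketch below is an assumption inferred from the column names only, not the package's code: the helper `rates` is hypothetical, and returning NaN for a zero denominator is a choice made for this illustration.

```python
def rates(confirmed, fatal, recovered):
    """Compute the three ratio columns from cumulative counts (sketch)."""
    nan = float("nan")
    closed = fatal + recovered  # cases with a known outcome
    return {
        "Fatal per Confirmed": fatal / confirmed if confirmed else nan,
        "Recovered per Confirmed": recovered / confirmed if confirmed else nan,
        "Fatal per (Fatal or Recovered)": fatal / closed if closed else nan,
    }


# Made-up counts for illustration
result = rates(confirmed=1000, fatal=20, recovered=180)
for name, value in result.items():
    print(name, value)
```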

class DataHandler(country, province=None, **kwargs)[source]

Bases: covsirphy.util.term.Term

Data handler for analysis.

Parameters
  • country (str) – country name

  • province (str or None) – province name

  • kwargs – arguments of DataHandler.register()

EXTRA_DICT = {'CountryData': <class 'covsirphy.cleaning.country_data.CountryData'>, 'OxCGRTData': <class 'covsirphy.cleaning.oxcgrt.OxCGRTData'>, 'PCRData': <class 'covsirphy.cleaning.pcr_data.PCRData'>, 'VaccineData': <class 'covsirphy.cleaning.vaccine_data.VaccineData'>}
MAIN_DICT = {'JHUData': <class 'covsirphy.cleaning.jhu_data.JHUData'>}
property complemented

whether complemented or not and the details, None when not confirmed

Raises

NotRegisteredMainError – no information because JHUData was not registered

Type

bool or str

estimate_delay(indicator, target, min_size=7, use_difference=False, delay_name='Period Length')[source]

Estimate the average delay [days] between the indicator and the target. We assume that the indicator affects the target value with a delay. All results will be returned as a dataframe.

Parameters
  • indicator (str) – indicator name, a column of any registered datasets

  • target (str) – target name, a column of any registered datasets

  • min_size (int) – minimum size of the delay period

  • use_difference (bool) – if True, use first discrete difference of target

  • delay_name (str) – column name of delay in the output dataframe

Raises
  • NotRegisteredMainError – JHUData was not registered

  • SubsetNotFoundError – failed in subsetting because of lack of data

  • UserWarning – failed in calculating and returned the default value (recovery period)

Returns

Index

reset index

Columns
  • (int or float): column defined by @indicator

  • (int or float): column defined by @target

  • (int): column defined by @delay_name [days]

Return type

pandas.DataFrame

property first_date

the first date of the records

Type

str

property last_date

the last date of the records

Type

str

property main_satisfied

all main datasets were registered or not

Type

bool

property population
records(main=True, extras=True, past=True, future=True)[source]

Return records of the datasets as a dataframe.

Parameters
  • main (bool) – whether to include main datasets or not

  • extras (bool) – whether to include extra datasets or not

  • past (bool) – whether to include past records or not

  • future (bool) – whether to include future records or not

Raises
Returns

Index

reset index

Columns:
  • Date(pd.Timestamp): Observation date

  • if @main is True,
    • Confirmed(int): the number of confirmed cases

    • Infected(int): the number of currently infected cases

    • Fatal(int): the number of fatal cases

    • Recovered (int): the number of recovered cases ( > 0)

    • Susceptible(int): the number of susceptible cases

  • if @extras is True,
    • columns defined in the extra datasets

Return type

pandas.DataFrame

records_all()[source]

Return all registered records of the datasets as a dataframe.

Returns

Index

reset index

Columns:
  • Date(pd.Timestamp): Observation date

  • Confirmed(int): the number of confirmed cases

  • Infected(int): the number of currently infected cases

  • Fatal(int): the number of fatal cases

  • Recovered (int): the number of recovered cases ( > 0)

  • Susceptible(int): the number of susceptible cases

  • columns defined in the extra datasets

Return type

pandas.DataFrame

records_extras()[source]

Return records of the extra datasets as a dataframe.

Returns

Index

reset index

Columns:
  • Date(pd.Timestamp): Observation date

  • columns defined in the extra datasets

Return type

pandas.DataFrame

records_main()[source]

Return records of the main datasets as a dataframe.

Returns

Index

reset index

Columns:
  • Date (pd.Timestamp): Observation date

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases ( > 0)

  • Susceptible (int): the number of susceptible cases

Return type

pandas.DataFrame

recovery_period()[source]

Return representative value of recovery period of all countries.

Raises

NotRegisteredMainError – JHUData was not registered

Returns

recovery period [days]

Return type

int

register(jhu_data=None, population_data=None, extras=None)[source]

Register datasets.

Parameters
Raises
  • TypeError – non-data cleaning instance was included

  • UnExpectedValueError – instance of un-expected data cleaning class was included as an extra dataset

show_complement()[source]

Show the details of complement that was (or will be) performed for the records.

Raises

NotRegisteredMainError – JHUData was not registered

Returns

the same as JHUData.show_complement()

Return type

pandas.DataFrame

Note

Keyword arguments of JHUData.subset_complement() can be specified with DataHandler.switch_complement().

switch_complement(whether=None, **kwargs)[source]

Switch whether to perform auto complement or not. (Default: True)

Parameters
  • whether (bool or None) – if True and necessary, the number of cases will be complemented

  • kwargs – the other arguments of JHUData.subset_complement()

Note

When @whether is None, @whether will not be changed.

timepoints(first_date=None, last_date=None, today=None)[source]

Set the range of data and reference date to determine past/future of phases.

Parameters
  • first_date (str or None) – the first date of the records or None (min date of main dataset)

  • last_date (str or None) – the last date of the records or None (max date of main dataset)

  • today (str or None) – reference date to determine whether a phase is a past phase or a future phase

Note

When @today is None, the reference date will be the same as @last_date (or max date).

property today

reference date to determine whether a phase is a past phase or a future phase

Type

str

class DataLoader(directory='input', update_interval=12)[source]

Bases: covsirphy.util.term.Term

Download the dataset and perform data cleaning.

Parameters
  • directory (str or pathlib.Path) – directory to save the downloaded datasets

  • update_interval (int) – update interval of the local datasets

Note

GitHub datasets will always be updated because the headers of the GET response do not have a ‘Last-Modified’ key. If @update_interval hours have passed since the last update of the local datasets, updating will be forced unless it is prevented by the methods.
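
The time-based part of this rule can be sketched as a comparison of two timestamps. This is only an illustration of the idea under stated assumptions: `is_stale` is a hypothetical helper, and how the package actually records the time of the last update is not shown here.

```python
from datetime import datetime, timedelta


def is_stale(last_update, now, update_interval=12):
    """Return True if at least `update_interval` hours have passed since `last_update`."""
    return now - last_update >= timedelta(hours=update_interval)


last_update = datetime(2021, 1, 1, 0, 0)
print(is_stale(last_update, datetime(2021, 1, 1, 6, 0)))   # False
print(is_stale(last_update, datetime(2021, 1, 1, 13, 0)))  # True
```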

Examples

>>> # Setup
>>> import covsirphy as cs
>>> data_loader = cs.DataLoader("input")
>>> # JHU data: the number of cases
>>> jhu_data = data_loader.jhu()
>>> print(jhu_data.citation)
...
>>> print(type(jhu_data.cleaned()))
<class 'pandas.core.frame.DataFrame'>
>>> # The number of cases in Japan
>>> jpn_data = data_loader.japan()
>>> print(jpn_data.citation)
...
>>> print(type(jpn_data.cleaned()))
<class 'pandas.core.frame.DataFrame'>
>>> # Population values
>>> population_data = data_loader.population()
>>> print(population_data.citation)
...
>>> print(type(population_data.cleaned()))
<class 'pandas.core.frame.DataFrame'>
>>> # OxCGRT: Government responses
>>> oxcgrt_data = data_loader.oxcgrt()
>>> print(oxcgrt_data.citation)
...
>>> print(type(oxcgrt_data.cleaned()))
<class 'pandas.core.frame.DataFrame'>
>>> # Citation list of COVID-19 Data Hub
>>> print(data_loader.covid19dh_citation)
...
GITHUB_URL = 'https://raw.githubusercontent.com'
property covid19dh_citation

Return the list of primary sources of COVID-19 Data Hub.

japan(basename='covid_japan.csv', local_file=None, verbose=1)[source]

Load the dataset of the number of cases in Japan. https://github.com/lisphilar/covid19-sir/tree/master/data

Parameters
  • basename (str) – basename of the file to save the data

  • local_file (str or None) – if not None, load the data from this file

  • verbose (int) – level of verbosity

Returns

dataset at country level in Japan

Return type

covsirphy.CountryData

jhu(basename='covid19dh.csv', local_file=None, verbose=1)[source]

Load the dataset regarding the number of cases using local CSV file or COVID-19 Data Hub.

Parameters
  • basename (str or None) – basename of the file to save the data

  • local_file (str or None) – if not None, load the data from this file

  • verbose (int) – level of verbosity

Note

If @verbose is 2, detailed citation list will be shown when downloading. If @verbose is 1, how to show the list will be explained. Citation of COVID-19 Data Hub will be set as JHUData.citation.

Returns

dataset regarding the number of cases

Return type

covsirphy.JHUData

linelist(basename='linelist.csv', verbose=1)[source]

Load linelist of case reports. https://github.com/beoutbreakprepared/nCoV2019

Parameters
  • basename (str) – basename of the file to save the data

  • verbose (int) – level of verbosity

Returns

linelist of case reports

Return type

covsirphy.CountryData

oxcgrt(basename='covid19dh.csv', local_file=None, verbose=1)[source]

Load the dataset regarding OxCGRT data using local CSV file or COVID-19 Data Hub.

Parameters
  • basename (str or None) – basename of the file to save the data

  • local_file (str or None) – if not None, load the data from this file

  • verbose (int) – level of verbosity

Note

If @verbose is 2, detailed citation list will be shown when downloading. If @verbose is 1, how to show the list will be explained. Citation of COVID-19 Data Hub will be set as OxCGRTData.citation.

Returns

dataset regarding OxCGRT data

Return type

covsirphy.OxCGRTData

pcr(basename='covid19dh.csv', local_file=None, basename_owid='ourworldindata_pcr.csv', verbose=1)[source]

Load the dataset regarding the number of tests and confirmed cases, using local CSV file or COVID-19 Data Hub.

Parameters
  • basename (str or None) – basename of the file to save “COVID-19 Data Hub” data

  • local_file (str or None) – if not None, load the data from this file

  • basename_owid (str) – basename of the file to save “Our World In Data” data

  • verbose (int) – level of verbosity

Note

If @verbose is 2, detailed citation list will be shown when downloading. If @verbose is 1, how to show the list will be explained. Citation of COVID-19 Data Hub will be set as PCRData.citation.

Returns

dataset regarding the number of tests and confirmed cases

Return type

covsirphy.PCRData

population(basename='covid19dh.csv', local_file=None, verbose=1)[source]

Load the dataset regarding population values using local CSV file or COVID-19 Data Hub.

Parameters
  • basename (str or None) – basename of the file to save the data

  • local_file (str or None) – if not None, load the data from this file

  • verbose (int) – level of verbosity

Note

If @verbose is 2, detailed citation list will be shown when downloading. If @verbose is 1, how to show the list will be explained. Citation of COVID-19 Data Hub will be set as PopulationData.citation.

Returns

dataset regarding population values

Return type

covsirphy.PopulationData

pyramid(basename='wbdata_population_pyramid.csv', verbose=1)[source]

Load the dataset regarding population pyramid. World Bank Group (2020), World Bank Open Data, https://data.worldbank.org/

Parameters
  • basename (str) – basename of the file to save the data

  • verbose (int) – level of verbosity

Returns

dataset regarding population pyramid

Return type

covsirphy.PopulationPyramidData

vaccine(basename='ourworldindata_vaccine.csv', verbose=1)[source]

Load the dataset regarding vaccination. https://github.com/owid/covid-19-data/tree/master/public/data https://ourworldindata.org/coronavirus

Parameters
  • basename (str) – basename of the file to save the data

  • verbose (int) – level of verbosity

Returns

dataset regarding vaccines

Return type

covsirphy.VaccineData

class Estimator(**kwargs)[source]

Bases: covsirphy.util.term.Term

Hyperparameter optimization of an ODE model.

Parameters
  • record_df (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Confirmed (int): the number of confirmed cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

    • any other columns will be ignored

  • model (covsirphy.ModelBase) – ODE model

  • population (int) – total population in the place

  • tau (int) – tau value [min], a divisor of 1440

  • kwargs – parameter values of the model and data subsetting
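
Because tau is given in minutes and must be a divisor of 1440 (the number of minutes in a day), one day always splits into a whole number of time steps. The sketch below checks that constraint. The helper names are hypothetical and the check is written for this page, not taken from the package.

```python
MINUTES_PER_DAY = 1440


def is_valid_tau(tau):
    """True when tau [min] is a positive integer that divides 1440 evenly."""
    return isinstance(tau, int) and tau > 0 and MINUTES_PER_DAY % tau == 0


def steps_per_day(tau):
    """Number of time steps of length tau [min] in one day."""
    if not is_valid_tau(tau):
        raise ValueError(f"tau must be a divisor of {MINUTES_PER_DAY}, got {tau!r}")
    return MINUTES_PER_DAY // tau


print([tau for tau in (60, 360, 700, 1440) if is_valid_tau(tau)])  # [60, 360, 1440]
print(steps_per_day(360))  # 4
```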

accuracy(show_figure=True, filename=None)[source]

Show accuracy as a figure.

Parameters
  • show_figure (bool) – if True, show the result as a figure

  • filename (str) – filename of the figure, or None (show figure)

Returns

Index

t (int): Elapsed time divided by tau value [-]

Columns
  • columns with “_actual”

  • columns with “_predicted”

  • columns are defined by self.variables

Return type

pandas.DataFrame

history(show_figure=True, filename=None)[source]

Return dataframe to show the history of optimization.

Parameters
  • show_figure (bool) – if True, show the history as a pair-plot of parameters.

  • filename (str) – filename of the figure, or None (show figure)

Returns

the history

Return type

pandas.DataFrame

run(timeout=180, reset_n_max=3, timeout_iteration=5, tail_n=4, allowance=(0.99, 1.01), seed=0, pruner='threshold', upper=0.5, percentile=50, metric=None, metrics='RMSLE', **kwargs)[source]

Run optimization. Optimization ends if the result satisfies the following conditions.

  • Score did not change in the last @tail_n iterations.

  • Monotonic increasing variables increase monotonically.

  • Predicted values are in the allowance when each actual value shows max value.
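
The three stopping conditions can be sketched as small checks on plain lists. This is a hedged reading of the description, not the package's code: all function names are hypothetical, "did not change" is taken as exact equality, and the allowance is interpreted as a lower and upper ratio applied to the actual value at its maximum.

```python
def score_is_stable(scores, tail_n=4):
    """True when the last `tail_n` scores are identical."""
    if len(scores) < tail_n:
        return False
    tail = scores[-tail_n:]
    return all(value == tail[0] for value in tail)


def is_monotonic_increasing(values):
    """True when each value is at least as large as the previous one."""
    return all(a <= b for a, b in zip(values, values[1:]))


def within_allowance(actual, predicted, allowance=(0.99, 1.01)):
    """True when the prediction at the peak of `actual` is inside the allowance."""
    peak_index = max(range(len(actual)), key=actual.__getitem__)
    low, high = allowance
    peak = actual[peak_index]
    return low * peak <= predicted[peak_index] <= high * peak


# Made-up values for illustration
print(score_is_stable([0.5, 0.3, 0.2, 0.2, 0.2, 0.2]))       # True
print(is_monotonic_increasing([0, 3, 3, 8]))                 # True
print(within_allowance([10, 50, 100], [12, 49, 100.5]))      # True
```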

Parameters
  • timeout (int) – timeout of optimization

  • reset_n_max (int) – if the study was reset @reset_n_max times, it will not be reset anymore

  • timeout_iteration (int) – time-out of one iteration

  • tail_n (int) – the number of iterations to decide whether score did not change for the last iterations

  • allowance (tuple(float, float)) – the allowance of the predicted value

  • seed (int or None) – random seed of hyperparameter optimization

  • pruner (str) – hyperband, median, threshold or percentile

  • upper (float) – works for the “threshold” pruner; a trial is pruned if its intermediate score is larger than this value

  • percentile (float) – works for the “percentile” pruner; a trial is pruned if its best intermediate value is in the bottom percentile among trials

  • metric (str or None) – metric name or None (use @metrics)

  • metrics (str) – alias of @metric

  • kwargs – keyword arguments of ModelBase.param_range()

Note

@n_jobs was obsoleted because this does not work effectively in Optuna.

Note

Please refer to covsirphy.Evaluator.score() for metric names

to_dict()[source]

Summarize the results of optimization.

Returns

  • (parameters of the model)

  • tau

  • Rt: basic or phase-dependent reproduction number

  • (dimensional parameters [day])

  • {metric name}: score with the metric

  • Trials: the number of trials

  • Runtime: run time of estimation

Return type

dict[str, float or int]

class Evaluator(y_true, y_pred, how='inner', on=None)[source]

Bases: object

Evaluate residual errors.

Parameters
  • y_true (pandas.DataFrame or pandas.Series) – correct target values

  • y_pred (pandas.DataFrame or pandas.Series) – estimated target values

  • how (str) – “all” (use all records) or “inner” (intersection will be used)

  • on (str or list[str] or None) – column names to join on or None (join on index)

Raises

TypeError – un-expected types were used for the arguments

Note

Evaluation with metrics will be done with sklearn.metrics package. https://scikit-learn.org/stable/modules/model_evaluation.html

classmethod best_one(candidate_dict, **kwargs)[source]

Select the best one with scores.

Parameters
  • candidate_dict (dict[object, float]) – scores of candidates

  • kwargs – keyword arguments of Evaluator.smaller_is_better()

Returns

the best one and its score

Return type

tuple(object, float)

classmethod metrics()[source]

Return the list of metric names.

Returns

list of metric names

Return type

list[str]

score(metric=None, metrics='RMSLE')[source]

Calculate score with specified metric.

Parameters
  • metric (str or None) – ME, MAE, MSE, MSLE, MAPE, RMSE, RMSLE, R2 or None (use @metrics)

  • metrics (str) – alias of @metric

Raises
  • UnExpectedValueError – un-expected metric was applied

  • ValueError – ME was selected as metric when the targets have multiple columns

Returns

score with the metric

Return type

float

Note

  • ME: maximum residual error

  • MAE: mean absolute error

  • MSE: mean square error

  • MSLE: mean squared logarithmic error

  • MAPE: mean absolute percentage error

  • RMSE: root mean squared error

  • RMSLE: root mean squared logarithmic error

  • R2: the coefficient of determination
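
To make some of the metric definitions concrete, the sketch below computes ME, MAE, RMSE and RMSLE for plain lists with the standard library. It is a minimal illustration and not the package's implementation, which relies on sklearn.metrics according to the note above. The logarithmic error uses log(1 + x), the convention of scikit-learn's mean squared logarithmic error.

```python
import math


def max_error(y_true, y_pred):
    """ME: the largest absolute residual."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def mae(y_true, y_pred):
    """MAE: the mean of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """RMSE: the square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmsle(y_true, y_pred):
    """RMSLE: RMSE computed on log(1 + x) transformed values."""
    squared = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values for illustration
y_true = [10, 20, 30]
y_pred = [12, 18, 33]
print(max_error(y_true, y_pred))  # 3
print(mae(y_true, y_pred))
print(rmse(y_true, y_pred))
print(rmsle(y_true, y_pred))
```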

Note

When @metric is None, @metrics will be used as @metric. Default value is “RMSLE”.

classmethod smaller_is_better(metric=None, metrics='RMSLE')[source]

Whether smaller value of the metric is better or not.

Parameters
  • metric (str or None) – ME, MAE, MSE, MSLE, MAPE, RMSE, RMSLE, R2 or None (use @metrics)

  • metrics (str) – alias of @metric

Returns

whether smaller value is better or not

Return type

bool
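
The direction of a metric decides how the best candidate is picked, which is what Evaluator.best_one() needs from this method. The sketch below assumes that only R2 improves as it grows while the error metrics improve as they shrink. Both functions are hypothetical stand-ins, and the candidate names and scores are made up.

```python
def smaller_is_better_sketch(metric="RMSLE"):
    """Assumed rule: R2 is better when larger, error metrics when smaller."""
    return metric.upper() != "R2"


def best_one_sketch(candidate_dict, metric="RMSLE"):
    """Return the best candidate and its score as a tuple."""
    chooser = min if smaller_is_better_sketch(metric) else max
    best = chooser(candidate_dict, key=candidate_dict.get)
    return best, candidate_dict[best]


scores = {"a": 0.31, "b": 0.12, "c": 0.08}
print(best_one_sketch(scores, metric="RMSLE"))  # ('c', 0.08)
print(best_one_sketch(scores, metric="R2"))     # ('a', 0.31)
```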

class ExampleData(clean_df=None, tau=1440, start_date='22Jan2020')[source]

Bases: covsirphy.cleaning.jhu_data.JHUData

Example dataset as a child class of JHUData.

Parameters
  • clean_df (pandas.DataFrame or None) –

    cleaned data

    Index
    • reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Country (pandas.Category): country/region name

    • Province (pandas.Category): province/prefecture/state name

    • Confirmed (int): the number of confirmed cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • tau (int) – tau value [min]

  • start_date (str) – start date, like 22Jan2020

add(model, country=None, province=None, **kwargs)[source]

Add example data. If the country has been registered, the start date will be the next date of the registered records.

Parameters
  • model (cs.ModelBase) – the first ODE model

  • country (str or None) – country name

  • province (str or None) – province name

  • kwargs – the other keyword arguments of ODESimulator.add()

Note

If country is None, the name of the model will be used. If province is None, ‘-‘ will be used.

non_dim(model=None, country=None, province=None)[source]

Return non-dimensional data.

Parameters
  • model (cs.ModelBase or None) – the first ODE model

  • country (str or None) – country name

  • province (str or None) – province name

Returns

Index

t: Dates divided by tau value (time steps)

Columns
  • (int) variables of the model

Return type

pandas.DataFrame

Note

If country is None, the name of the model will be used. If province is None, ‘-’ will be used.

records(**kwargs)[source]

This is the same as ExampleData.subset(). Complement will not be done.

specialized(model=None, country=None, province=None)[source]

Return dimensional records with model variables.

Parameters
  • model (cs.ModelBase or None) – the first ODE model

  • country (str or None) – country name

  • province (str or None) – province name

Returns

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • (int) variables of the model

Return type

pandas.DataFrame

Note

If country is None, the name of the model will be used. If province is None, ‘-’ will be used.

subset(model=None, country=None, province=None, **kwargs)[source]

Return the subset of dataset.

Parameters
  • model (cs.ModelBase or None) – the first ODE model

  • country (str or None) – country name

  • province (str or None) – province name

  • kwargs – other keyword arguments of JHUData.subset()

Returns

(pandas.DataFrame)
Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases (> 0)

  • Susceptible (int): the number of susceptible cases, if calculated

Note

If country is None, the name of the model will be used. If province is None, ‘-’ will be used. If @population is not None, the number of susceptible cases will be calculated. Records with Recovered > 0 will be selected.

subset_complement(**kwargs)[source]

This is the same as ExampleData.subset(). Complement will not be done.

class Filer(directory, prefix=None, suffix=None, numbering=None)[source]

Bases: object

Produce filenames and manage files.

Parameters
  • directory (str) – top directory name

  • prefix (str or None) – prefix of the filenames or None (no prefix)

  • suffix (str or None) – suffix of the filenames or None (no suffix)

  • numbering (str or None) – “001”, “01”, “1” or None (no numbering)

Examples

>>> import covsirphy as cs
>>> filer = cs.Filer(directory="output", prefix="jpn", suffix=None, numbering="01")
>>> filer.png("records")
{"filename": "<absolute path>/output/jpn_01_records.png"}
>>> filer.jpg("records")
{"filename": "<absolute path>/output/jpn_01_records.jpg"}
>>> filer.json("backup")
{"filename": "<absolute path>/output/jpn_01_backup.json"}
>>> filer.csv("records", index=True)
{"path_or_buf": "<absolute path>/output/jpn_01_records.csv", "index": True}

csv(title, **kwargs)[source]

Create CSV filename and register it.

Parameters
  • title (str) – title of the filename, like ‘records’

  • kwargs – keyword arguments to be included in the output

Returns

absolute filename (key: ‘path_or_buf’) and kwargs

Return type

dict[str, str]
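The shape of the returned dictionary can be sketched as below. This is a hypothetical helper, not the Filer implementation: the order of the filename parts (prefix, numbering, title, suffix) is inferred from the examples above, the position of the suffix is an assumption, and file registration is omitted.

```python
from pathlib import Path


def csv_kwargs(directory, title, prefix=None, suffix=None, numbering=None, **kwargs):
    """Build an absolute CSV path and merge it with the extra keyword arguments."""
    # Skip the parts that were not specified (None)
    parts = [part for part in (prefix, numbering, title, suffix) if part]
    path = Path(directory).resolve() / ("_".join(parts) + ".csv")
    # The key 'path_or_buf' matches the first argument of pandas.DataFrame.to_csv()
    return {"path_or_buf": str(path), **kwargs}
```

A dictionary of this shape can be unpacked directly into a writer call, for example `df.to_csv(**csv_kwargs("output", "records", prefix="jpn", numbering="01", index=True))`.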

files(ext=None)[source]

List-up filenames.

Parameters

ext (str or None) – file extension or None (all)

Returns

list of files

Return type

list[str]

jpg(title, **kwargs)[source]

Create JPG filename and register it.

Parameters
  • title (str) – title of the filename, like ‘records’

  • kwargs – keyword arguments to be included in the output

Returns

absolute filename (key: ‘filename’) and kwargs

Return type

dict[str, str]

json(title, **kwargs)[source]

Create JSON filename and register it.

Parameters
  • title (str) – title of the filename, like ‘records’

  • kwargs – keyword arguments to be included in the output

Returns

absolute filename (key: ‘filename’) and kwargs

Return type

dict[str, str]

png(title, **kwargs)[source]

Create PNG filename and register it.

Parameters
  • title (str) – title of the filename, like ‘records’

  • kwargs – keyword arguments to be included in the output

Returns

absolute filename (key: ‘filename’) and kwargs

Return type

dict[str, str]

class JHUData(filename=None, data=None, citation=None)[source]

Bases: covsirphy.cleaning.cbase.CleaningBase

Data cleaning of JHU-style dataset.

Parameters
  • filename (str or None) – CSV filename of the dataset

  • data (pandas.DataFrame or None) –

    Index

    reset index

    Columns
    • Date: Observation date

    • ISO3: ISO3 code (optional)

    • Country: country/region name

    • Province: province/prefecture/state name

    • Confirmed: the number of confirmed cases

    • Fatal: the number of fatal cases

    • Recovered: the number of recovered cases

    • Population: population values (optional)

  • citation (str or None) – citation or None (empty)

Note

Either @filename (high priority) or @data must be specified.

CLEANED_COLS = ['Date', 'Country', 'Province', 'Confirmed', 'Infected', 'Fatal', 'Recovered']
OPTINAL_COLS = ['ISO3', 'Population']
RAW_COLS = ['Date', 'ISO3', 'Country', 'Province', 'Confirmed', 'Infected', 'Fatal', 'Recovered', 'Population']
REQUIRED_COLS = ['Date', 'Country', 'Province', 'Confirmed', 'Fatal', 'Recovered']
SUBSET_COLS = ['Date', 'Confirmed', 'Infected', 'Fatal', 'Recovered', 'Susceptible']
calculate_closing_period(**kwargs)
calculate_recovery_period()[source]

Calculate the median value of recovery period of all countries where recovered values are reported.

Returns

recovery period [days]

Return type

int

Note

If no records usable for the calculation are registered, 17 [days] will be applied.

cleaned(**kwargs)[source]

Return the cleaned dataset.

Parameters

kwargs – keyword arguments will be ignored.

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pandas.Timestamp): Observation date

  • Country (pandas.Category): country/region name

  • Province (pandas.Category): province/prefecture/state name

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases
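Infected appears in CLEANED_COLS but not in REQUIRED_COLS, so it has to be derived during cleaning. The sketch below assumes the usual relation Infected = Confirmed - Fatal - Recovered; this relation and the helper name `add_infected` are assumptions, not taken from this reference.

```python
def add_infected(record):
    """Return a copy of the record with the derived 'Infected' value."""
    # Assumption: currently infected = confirmed - fatal - recovered
    infected = record["Confirmed"] - record["Fatal"] - record["Recovered"]
    return {**record, "Infected": infected}
```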

countries(complement=True, **kwargs)[source]

Return the names of countries that have records.

Parameters
  • complement (bool) – whether to allow complement or not

  • interval (int) – expected update interval of the number of recovered cases [days]

  • kwargs – the other keyword arguments of JHUData.subset_complement()

Returns

list of country names

Return type

list[str]

classmethod from_dataframe(dataframe, directory='input')[source]

Create JHUData instance using a pandas dataframe.

Parameters
  • dataframe (pd.DataFrame) –

    cleaned dataset

    Index

    reset index

    Columns
    • Date: Observation date

    • ISO3: ISO3 code (optional)

    • Country: country/region name

    • Province: province/prefecture/state name

    • Confirmed: the number of confirmed cases

    • Infected: the number of currently infected cases

    • Fatal: the number of fatal cases

    • Recovered: the number of recovered cases

    • Population: population values (optional)

  • directory (str) – directory to save geometry information (for .map() method)

Returns

JHU-style dataset

Return type

covsirphy.JHUData

map(country=None, variable='Confirmed', date=None, **kwargs)[source]

Create global colored map to show the values.

Parameters
  • country (str or None) – country name or None (global map)

  • variable (str) – variable name to show

  • date (str or None) – date of the records or None (the last value)

  • kwargs – arguments of ColoredMap() and ColoredMap.plot()

Note

When @country is None, country level data will be shown on global map. When @country is a country name, province level data will be shown on country map.

records(country, province=None, start_date=None, end_date=None, population=None, auto_complement=True, **kwargs)[source]

JHU-style dataset for the area from the start date to the end date. Records with Recovered > 0 will be selected.

Parameters
  • country (str) – country name or ISO3 code

  • province (str or None) – province name

  • start_date (str or None) – start date, like 22Jan2020

  • end_date (str or None) – end date, like 01Feb2020

  • population (int or None) – population value

  • auto_complement (bool) – if True and necessary, the number of cases will be complemented

  • kwargs – the other arguments of JHUData.subset_complement()

Raises

SubsetNotFoundError – failed in subsetting because of lack of data

Returns

pandas.DataFrame:

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases (> 0)

  • Susceptible (int): the number of susceptible cases, if calculated

str or bool: kind of complement or False

Return type

tuple(pandas.DataFrame, bool)

Note

If @population (high priority) is not None or population values are registered in subset, the number of susceptible cases will be calculated.

Note

If necessary and @auto_complement is True, complement recovered data.

property recovery_period

expected value of recovery period [days]

Type

int

replace(country_data)[source]

Replace a part of cleaned dataset with a dataframe.

Parameters

country_data (covsirphy.CountryData) – dataset object of the country

Returns

self

Return type

covsirphy.JHUData

Note

Citation of the country data will be added to ‘JHUData.citation’ description.

show_complement(country=None, province=None, start_date=None, end_date=None, **kwargs)[source]

To monitor the effectiveness and safety of complement on a JHU subset, show what kind of complement was done for each country (if a country or countries are specified) or for all countries.

Parameters
  • country (str or list[str] or None) – country/countries name or None (all countries)

  • province (str or None) – province name

  • start_date (str or None) – start date, like 22Jan2020

  • end_date (str or None) – end date, like 01Feb2020

  • kwargs – keyword arguments of JHUDataComplementHandler(), control factors of complement

Raises
  • ValueError – @province was specified when @country is not a string

  • covsirphy.SubsetNotFoundError – No records were registered for the area/dates

Returns

pandas.DataFrame

Index

reset index

Columns
  • country (str): country name

  • province (str): province name

  • Monotonic_confirmed (bool): True if applied for confirmed cases or False otherwise

  • Monotonic_fatal (bool): True if applied for fatal cases or False otherwise

  • Monotonic_recovered (bool): True if applied for recovered or False otherwise

  • Full_recovered (bool): True if applied for recovered or False otherwise

  • Partial_recovered (bool): True if applied for recovered or False otherwise

subset(country, province=None, start_date=None, end_date=None, population=None)[source]

Return the subset of dataset with Recovered > 0.

Parameters
  • country (str) – country name or ISO3 code

  • province (str or None) – province name

  • start_date (str or None) – start date, like 22Jan2020

  • end_date (str or None) – end date, like 01Feb2020

  • population (int or None) – population value

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases (> 0)

  • Susceptible (int): the number of susceptible cases, if calculated

Note

If @population (high priority) is not None or population values are registered in subset, the number of susceptible cases will be calculated.
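The selection described above can be sketched on plain Python records as follows. The function is hypothetical: records are dictionaries rather than DataFrame rows, and the relation Susceptible = population - Confirmed is an assumption that this reference does not state.

```python
from datetime import date


def subset(records, country, province="-", start_date=None, end_date=None, population=None):
    """Select records of one area and date range with Recovered > 0."""
    selected = []
    for row in records:
        if row["Country"] != country or row["Province"] != province:
            continue
        if start_date is not None and row["Date"] < start_date:
            continue
        if end_date is not None and row["Date"] > end_date:
            continue
        if row["Recovered"] <= 0:
            continue
        out = {key: row[key] for key in ("Date", "Confirmed", "Infected", "Fatal", "Recovered")}
        if population is not None:
            # Assumption: susceptible = population - confirmed
            out["Susceptible"] = population - row["Confirmed"]
        selected.append(out)
    return selected
```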

subset_complement(country, province=None, start_date=None, end_date=None, population=None, **kwargs)[source]

Return the subset of dataset and complement recovered data, if necessary. Records with Recovered > 0 will be selected.

Parameters
  • country (str) – country name or ISO3 code

  • province (str or None) – province name

  • start_date (str or None) – start date, like 22Jan2020

  • end_date (str or None) – end date, like 01Feb2020

  • population (int or None) – population value

  • kwargs – keyword arguments of JHUDataComplementHandler(), control factors of complement

Returns

pandas.DataFrame:
Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases (> 0)

  • Susceptible (int): the number of susceptible cases, if calculated

str or bool: kind of complement or False

Return type

tuple(pandas.DataFrame, str or bool)

Note

If @population (high priority) is not None or population values are registered in subset, the number of susceptible cases will be calculated.

to_sr(**kwargs)
total()[source]

Calculate total number of cases and rates.

Returns

group-by Date, sum of the values

Index

Date (pandas.Timestamp): Observation date

Columns
  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

  • Fatal per Confirmed (float)

  • Recovered per Confirmed (float)

  • Fatal per (Fatal or Recovered) (float)

Return type

pandas.DataFrame
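The aggregation can be sketched without pandas as below. The helper names are hypothetical, and reading "Fatal per (Fatal or Recovered)" as Fatal / (Fatal + Recovered) is an interpretation of the column name.

```python
from collections import defaultdict

VALUE_COLS = ("Confirmed", "Infected", "Fatal", "Recovered")


def _ratio(numerator, denominator):
    """Return the ratio, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def total(records):
    """Sum the values by date and append the three rates."""
    sums = defaultdict(lambda: dict.fromkeys(VALUE_COLS, 0))
    for row in records:
        for col in VALUE_COLS:
            sums[row["Date"]][col] += row[col]
    result = {}
    for day in sorted(sums):
        values = dict(sums[day])
        closed = values["Fatal"] + values["Recovered"]
        values["Fatal per Confirmed"] = _ratio(values["Fatal"], values["Confirmed"])
        values["Recovered per Confirmed"] = _ratio(values["Recovered"], values["Confirmed"])
        values["Fatal per (Fatal or Recovered)"] = _ratio(values["Fatal"], closed)
        result[day] = values
    return result
```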

class JHUDataComplementHandler(recovery_period, interval=2, max_ignored=100, max_ending_unupdated=14, upper_limit_days=90, lower_limit_days=7, upper_percentage=0.5, lower_percentage=0.5)[source]

Bases: covsirphy.util.term.Term

Complement JHU dataset, if necessary.

Parameters
  • recovery_period (int) – expected value of recovery period [days]

  • interval (int) – expected update interval of the number of recovered cases [days]

  • max_ignored (int) – Max number of recovered cases to be ignored [cases]

  • max_ending_unupdated (int) – Max number of days to apply full complement, where max recovered cases are not updated [days]

  • upper_limit_days (int) – maximum number of valid partial recovery periods [days]

  • lower_limit_days (int) – minimum number of valid partial recovery periods [days]

  • upper_percentage (float) – fraction of partial recovery periods with value greater than upper_limit_days

  • lower_percentage (float) – fraction of partial recovery periods with value less than lower_limit_days

Note

To add new complement solutions, we need to update cls.STATUS_NAME_DICT and self._protocol(). Status names with high scores will be prioritized when the status code is determined. Status code: ‘fully complemented recovered data’ and so on, as noted in the self.run() docstring.

FULL_RECOVERED = 'Full_recovered'
MONOTONIC_CONFIRMED = 'Monotonic_confirmed'
MONOTONIC_FATAL = 'Monotonic_fatal'
MONOTONIC_RECOVERED = 'Monotonic_recovered'
PARTIAL_RECOVERED = 'Partial_recovered'
RAW_COLS = ['Confirmed', 'Fatal', 'Recovered']
RECOVERED_COLS = ['Monotonic_recovered', 'Full_recovered', 'Partial_recovered']
SHOW_COMPLEMENT_FULL_COLS = ['Monotonic_confirmed', 'Monotonic_fatal', 'Monotonic_recovered', 'Full_recovered', 'Partial_recovered']
STATUS_NAME_DICT = {1: 'sorting', 2: 'monotonic increasing', 3: 'partially', 4: 'fully'}
run(subset_df)[source]

Perform complement.

Parameters

subset_df (pandas.DataFrame) –

Subset of records

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Confirmed (int): the number of confirmed cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

  • The other columns will be ignored

Returns

pandas.DataFrame

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

str: status code

dict: status for each complement type

Return type

tuple(pandas.DataFrame, str, dict)

Note

Status code will be selected from:
  • ‘’ (not complemented)

  • ‘monotonic increasing complemented confirmed data’

  • ‘monotonic increasing complemented fatal data’

  • ‘monotonic increasing complemented recovered data’

  • ‘fully complemented recovered data’

  • ‘partially complemented recovered data’
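To give a feel for the "monotonic increasing" status, the sketch below forces a cumulative series to be non-decreasing with a running maximum and reports whether anything changed. This is one simple technique chosen for illustration; the handler's actual procedure is not described in this reference and may differ.

```python
from itertools import accumulate


def complement_monotonic(values):
    """Return a non-decreasing copy of the series and whether it was changed."""
    original = list(values)
    # Running maximum: each value is replaced by the largest value seen so far
    complemented = list(accumulate(original, max))
    return complemented, complemented != original
```

The boolean plays the same role as the per-type flags (for example Monotonic_confirmed) shown by show_complement().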

class JapanData(filename, force=False, verbose=1)[source]

Bases: covsirphy.cleaning.country_data.CountryData

Japan-specific dataset.

Parameters
  • filename (str or pathlib.Path) – CSV filename to save the raw dataset

  • force (bool) – if True, always download the dataset from the server

  • verbose (int) – level of verbosity

Note

Columns of JapanData.cleaned():
  • Date (pandas.Timestamp): date

  • Country (pandas.Category): ‘Japan’

  • Province (pandas.Category): ‘-’ (country level), ‘Entering’ or province names

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

  • Tests (int): the number of tested persons

  • Moderate (int): the number of cases that require hospitalization but are not severe

  • Severe (int): the number of severe cases

  • Vaccinations (int): cumulative number of vaccinations

  • Vaccinated_once (int): cumulative number of people who received at least one vaccine dose

  • Vaccinated_full (int): cumulative number of people who received all doses prescribed by the protocol

GITHUB_URL = 'https://raw.githubusercontent.com'
JAPAN_COLS = ['Date', 'Country', 'Province', 'Confirmed', 'Infected', 'Fatal', 'Recovered', 'Tests', 'Moderate', 'Severe', 'Vaccinations', 'Vaccinated_once', 'Vaccinated_full']
JAPAN_META_CAT = ['Prefecture', 'Admin_Capital', 'Admin_Region']
JAPAN_META_COLS = ['Prefecture', 'Admin_Capital', 'Admin_Region', 'Admin_Num', 'Area_Habitable', 'Area_Total', 'Clinic_bed_Care', 'Clinic_bed_Total', 'Hospital_bed_Care', 'Hospital_bed_Specific', 'Hospital_bed_Total', 'Hospital_bed_Tuberculosis', 'Hospital_bed_Type-I', 'Hospital_bed_Type-II', 'Population_Female', 'Population_Male', 'Population_Total', 'Location_Latitude', 'Location_Longitude']
JAPAN_META_FLT = ['Location_Latitude', 'Location_Longitude']
JAPAN_META_INT = ['Admin_Num', 'Area_Habitable', 'Area_Total', 'Clinic_bed_Care', 'Clinic_bed_Total', 'Hospital_bed_Care', 'Hospital_bed_Specific', 'Hospital_bed_Total', 'Hospital_bed_Tuberculosis', 'Hospital_bed_Type-I', 'Hospital_bed_Type-II', 'Population_Female', 'Population_Male', 'Population_Total']
JAPAN_VALUE_COLS = ['Confirmed', 'Infected', 'Fatal', 'Recovered', 'Tests', 'Moderate', 'Severe', 'Vaccinations', 'Vaccinated_once', 'Vaccinated_full']
MODERATE = 'Moderate'
SEVERE = 'Severe'
URL_C = 'https://raw.githubusercontent.com/lisphilar/covid19-sir/master/data/japan/covid_jpn_total.csv'
URL_M = 'https://raw.githubusercontent.com/lisphilar/covid19-sir/master/data/japan/covid_jpn_metadata.csv'
URL_P = 'https://raw.githubusercontent.com/lisphilar/covid19-sir/master/data/japan/covid_jpn_prefecture.csv'
meta(basename='covid_japan_metadata.csv', cleaned=True, force=False)[source]

Return metadata of Japan-specific dataset.

Parameters
  • basename (str) – basename of the CSV file to save the raw dataset

  • cleaned (bool) – return cleaned (True) or raw (False) dataset

  • force (bool) – if True, always download the dataset from the server

Returns

(cleaned or raw) dataset
Index

reset index

Columns for cleaned dataset,
  • Prefecture (pandas.Category)

  • Admin_Capital (pandas.Category)

  • Admin_Region (pandas.Category)

  • Admin_Num (int)

  • Area_Habitable (int)

  • Area_Total (int)

  • Clinic_bed_Care (int)

  • Clinic_bed_Total (int)

  • Hospital_bed_Care (int)

  • Hospital_bed_Specific (int)

  • Hospital_bed_Total (int)

  • Hospital_bed_Tuberculosis (int)

  • Hospital_bed_Type-I (int)

  • Hospital_bed_Type-II (int)

  • Population_Female (int)

  • Population_Male (int)

  • Population_Total (int)

  • Location_Latitude (float)

  • Location_Longitude (float)

Return type

pandas.DataFrame

set_variables()[source]

Set the correspondence of the variables and columns of the raw data.

Parameters
  • date (str) – column name for Date

  • confirmed (str) – column name for Confirmed

  • fatal (str) – column name for Fatal

  • recovered (str) – column name for Recovered

  • province (str) – (optional) column name for Province

class LinePlot(filename=None, bbox_inches='tight', **kwargs)[source]

Bases: covsirphy.visualization.vbase.VisualizeBase

Create a line plot.

Parameters
  • filename (str or None) – filename to save the figure or None (display)

  • bbox_inches (str) – bounding box in inches when creating the figure

  • kwargs – the other arguments of matplotlib.pyplot.savefig()

line(v=None, h=None, color='black', linestyle=':')[source]

Show vertical/horizontal lines.

Parameters
  • v (list/tuple[int/float] or None) – list of x values of vertical lines or None

  • h (list/tuple[int/float] or None) – list of y values of horizontal lines or None

  • color (str) – color of the line

  • linestyle (str) – linestyle

plot(data, colormap=None, color_dict=None, **kwargs)[source]

Plot chronological change of the data.

Parameters
  • data (pandas.DataFrame or pandas.Series) –

    data to show

    Index

    x values

    Columns

    y variables to show

  • colormap (str, matplotlib colormap object or None) – colormap, please refer to https://matplotlib.org/examples/color/colormaps_reference.html

  • color_dict (dict[str, str] or None) – dictionary of column names (keys) and colors (values)

  • kwargs – keyword arguments of pandas.DataFrame.plot()

x_axis(xlabel=None, x_logscale=False, xlim=(None, None))[source]

Set x axis.

Parameters
  • xlabel (str or None) – x-label

  • x_logscale (bool) – whether to use log-scale in x-axis or not

  • xlim (tuple(int or float, int or float)) – limit of x domain

Note

If None is included in xlim, the values will be automatically determined by Matplotlib

y_axis(ylabel='Cases', y_logscale=False, ylim=(0, None), math_scale=True, y_integer=False)[source]

Set y axis.

Parameters
  • ylabel (str or None) – y-label

  • y_logscale (bool) – whether to use log-scale in y-axis or not

  • ylim (tuple(int or float, int or float)) – limit of y domain

  • math_scale (bool) – whether to use LaTeX or not in y-label

  • y_integer (bool) – whether to force the values to be shown as integers or not

Note

If None is included in ylim, the values will be automatically determined by Matplotlib

class LinelistData(filename, force=False, verbose=1)[source]

Bases: covsirphy.cleaning.cbase.CleaningBase

Linelist of case reports.

Parameters
  • filename (str or pathlib.Path) – CSV filename to save the raw dataset

  • force (bool) – if True, always download the dataset from the server

  • verbose (int) – level of verbosity

AGE = 'Age'
CHRONIC = 'Chronic_disease'
CONFIRM_DATE = 'Confirmation_date'
F_DATE = 'Fatal_date'
GITHUB_URL = 'https://raw.githubusercontent.com'
HOSPITAL_DATE = 'Hospitalized_date'
LINELIST_COLS = ['Country', 'Province', 'Hospitalized_date', 'Confirmation_date', 'Outcome_date', 'Confirmed', 'Infected', 'Recovered', 'Fatal', 'Symptoms', 'Chronic_disease', 'Age', 'Sex']
OUTCOME = 'Outcome'
OUTCOME_DATE = 'Outcome_date'
RAW_COL_DICT = {'age': 'Age', 'chronic_disease': 'Chronic_disease', 'country': 'Country', 'date_admission_hospital': 'Hospitalized_date', 'date_confirmation': 'Confirmation_date', 'date_death_or_discharge': 'Outcome_date', 'outcome': 'Outcome', 'province': 'Province', 'sex': 'Sex', 'symptoms': 'Symptoms'}
R_DATE = 'Recovered_date'
SEX = 'Sex'
SYMPTOM = 'Symptoms'
URL = 'https://raw.githubusercontent.com/beoutbreakprepared/nCoV2019/master/latest_data/latestdata.tar.gz'
closed(outcome='Recovered')[source]

Return subset of global outcome data (recovered/fatal).

Parameters

outcome (str) – ‘Recovered’ or ‘Fatal’

Returns

pandas.DataFrame
Index

reset index

Columns
  • Country (pandas.Category): country name

  • Province (pandas.Category): province name or “-”

  • Hospitalized_date (pandas.Timestamp or NaT)

  • Confirmation_date (pandas.Timestamp)

  • Recovered_date (pandas.Timestamp): if outcome is Recovered

  • Fatal_date (pandas.Timestamp): if outcome is Fatal

  • Symptoms (str)

  • Chronic_disease (str)

  • Age (int or None)

  • Sex (str)

layer(**kwargs)[source]

Return the cleaned data at the selected layer.

Parameters

country (str or None) – country name or None (country level data or country-specific dataset)

Returns

Index

reset index

Columns
  • Country (str): country names

  • Province (str): province names (or removed when country level data)

  • any other columns of the cleaned data

Return type

pandas.DataFrame

Raises
  • SubsetNotFoundError – no records were found for the country (when @country is not None)

  • KeyError – @country was None, but country names were not registered in the dataset

Note

When @country is None, country level data will be returned. When @country is a country name, province level data in the selected country will be returned.

property raw

raw dataset

Type

pandas.DataFrame

recovery_period()[source]

Calculate median value of recovery period (from confirmation to recovery).

Returns

recovery period [days]

Return type

int
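The calculation can be sketched as the median number of days between confirmation and recovery. The function below is a hypothetical standalone version that takes (confirmation date, recovery date) pairs and skips cases with a missing date; how the library treats missing values is not stated here.

```python
from datetime import date
from statistics import median


def recovery_period(cases):
    """Median days from confirmation to recovery, as an integer."""
    periods = [
        (recovered - confirmed).days
        for confirmed, recovered in cases
        if confirmed is not None and recovered is not None
    ]
    # statistics.median raises StatisticsError when no usable case exists
    return int(median(periods))
```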

subset(country, province=None)[source]

Return subset of the country/province.

Parameters
  • country (str) – country name or ISO3 code

  • province (str or None) – province name

Returns

pandas.DataFrame
Index

reset index

Columns
  • Hospitalized_date (pandas.Timestamp or NaT)

  • Confirmation_date (pandas.Timestamp or NaT)

  • Outcome_date (pandas.Timestamp or NaT)

  • Confirmed (bool)

  • Infected (bool)

  • Recovered (bool)

  • Fatal (bool)

  • Symptoms (str)

  • Chronic_disease (str)

  • Age (int or None)

  • Sex (str)

total()[source]

This is not defined for this child class.

class MPEstimator(**kwargs)[source]

Bases: covsirphy.util.term.Term

Perform multiprocessing of PhaseUnit.estimate()

Parameters
  • model (covsirphy.ModelBase or None) – ODE model

  • jhu_data (covsirphy.JHUData) – object of records

  • population_data (covsirphy.PopulationData) – PopulationData object

  • record_df (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Confirmed (int): the number of confirmed cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

    • any other columns will be ignored

  • tau (int or None) – tau value [min], a divisor of 1440

  • kwargs – keyword arguments of model parameters

Note

When @record_df is None, @jhu_data and @population_data must be specified.
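The constraint that tau is a divisor of 1440 (the number of minutes in a day) can be checked as in the sketch below. Both helper names are hypothetical and are not part of the package.

```python
MINUTES_PER_DAY = 1440


def is_valid_tau(tau):
    """Return True when tau [min] is a positive integer that divides 1440."""
    return isinstance(tau, int) and 0 < tau <= MINUTES_PER_DAY and MINUTES_PER_DAY % tau == 0


def candidate_taus():
    """List every divisor of 1440 in ascending order."""
    return [n for n in range(1, MINUTES_PER_DAY + 1) if MINUTES_PER_DAY % n == 0]
```

Restricting tau to divisors of 1440 means that a whole number of time steps always fits into one day.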

add(units)[source]

Register PhaseUnits.

Parameters

units (list[covsirphy.PhaseUnit]) – list of phases

Returns

self

Return type

covsirphy.MPEstimator

run(n_jobs=-1, **kwargs)[source]

Run estimation.

Parameters
  • n_jobs (int) – the number of parallel jobs or -1 (CPU count)

  • kwargs – keyword arguments of model parameters and covsirphy.Estimator.run()

Returns

list[covsirphy.PhaseUnit]

property tau

tau value [min]

Type

int or None

class ModelBase(population)[source]

Bases: covsirphy.util.term.Term

Base class of ODE models.

DAY_PARAMETERS = []
EXAMPLE = {'param_dict': {}, 'population': 1000000, 'step_n': 180, 'y0_dict': {}}
NAME = 'ModelBase'
PARAMETERS = []
VARIABLES = []
VARS_INCLEASE = []
VAR_DICT = {}
WEIGHTS = array([], dtype=float64)
calc_days_dict(tau)[source]

Calculate 1/beta [day] etc. This method should be overridden in subclasses.

Parameters

tau (int) – tau value [min]

Returns

dict[str, int]

calc_r0()[source]

Calculate (basic) reproduction number. This method should be overridden in subclasses.

Returns

float

classmethod convert(data, tau)[source]

Divide dates by tau value [min] and convert variables to model-specialized variables. This will be overridden by child classes.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Susceptible (int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • tau (int) – tau value [min] or None (skip division by tau values)

Returns

Index

time steps: Dates divided by tau value

Columns
  • model-specialized variables

Return type

pandas.DataFrame
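The "dates divided by tau" index can be sketched as below, assuming that time step 0 corresponds to the earliest date in the data. That choice of origin and the function name `to_time_steps` are assumptions made for illustration.

```python
from datetime import datetime


def to_time_steps(dates, tau):
    """Convert datetimes to integer time steps of tau minutes each."""
    start = min(dates)  # assumption: the earliest date is step 0
    return [int((d - start).total_seconds() // 60 // tau) for d in dates]
```

With tau=1440, consecutive days map to consecutive integers; with tau=720, each day spans two steps.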

classmethod convert_reverse(converted_df, start, tau)[source]

Calculate dates with tau and the start date, and restore Susceptible/Infected/“Fatal or Recovered”. This will be overridden by child classes.

Parameters
  • converted_df (pandas.DataFrame) –

    Index

    time steps: Dates divided by tau value

    Columns
    • model-specialized variables

  • start (pd.Timestamp) – start date of simulation, like 14Apr2021

  • tau (int) – tau value [min]

Returns

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Susceptible (int): the number of susceptible cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

Return type

pandas.DataFrame
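Restoring dates from time steps is the inverse operation: each step advances the start date by tau minutes. The sketch below is a hypothetical helper that covers only the date part and leaves the restoration of variables to the model.

```python
from datetime import datetime, timedelta


def to_dates(steps, start, tau):
    """Convert integer time steps back to datetimes, starting from the start date."""
    return [start + timedelta(minutes=step * tau) for step in steps]
```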

classmethod guess(data, tau, q=0.5)[source]

With (X, dX/dt) for X=S, I, R and so on, guess parameter values. This will be overridden by child classes.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Susceptible (int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • tau (int) – tau value [min]

  • q (float or tuple(float,)) – the quantile(s) to compute, value(s) between (0, 1)

Returns

guessed parameter values with the quantile(s)

Return type

dict(str, float or pandas.Series)

classmethod param_range(*args, **kwargs)
classmethod restore(*args, **kwargs)
classmethod specialize(*args, **kwargs)
classmethod tau_free(*args, **kwargs)
class ModelValidator(tau=1440, n_trials=8, step_n=None, seed=0)[source]

Bases: covsirphy.util.term.Term

Evaluate the performance of ODE models as follows.
  1. Select model parameter sets randomly

  2. Set user-defined/random phase duration

  3. Perform simulation with a specified ODE model

  4. Perform parameter estimation

  5. Compare the estimated parameters and the parameters produced with the 1st step

  6. Repeat trials (1 trial = from the 1st step to the 5th step)

Small differences are expected in the 6th step.

Parameters
  • tau (int) – tau value [min]

  • n_trials (int) – the number of trials

  • step_n (int or None) – the number of steps in simulation (over 2) or None (randomly selected)

  • seed (int) – random seed

Note

Population value and initial values are defined by model.EXAMPLE. Estimators know tau values before parameter estimation.

run(model, timeout=180, allowance=(0.98, 1.02), n_jobs=-1)[source]

Execute model validation.

Parameters
  • model (covsirphy.ModelBase) – ODE model

  • timeout (int) – time-out of run

  • allowance (tuple(float, float)) – the allowance of the predicted value

  • n_jobs (int) – the number of parallel jobs or -1 (CPU count)

Returns

self

Return type

covsirphy.ModelValidator
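One way to read the @allowance argument is as lower and upper bounds on the ratio of a predicted value to the expected value. The sketch below follows that reading; it is an assumption about the meaning of the tuple, and the helper name `within_allowance` is hypothetical.

```python
def within_allowance(predicted, expected, allowance=(0.98, 1.02)):
    """Return True when predicted / expected lies inside the allowance."""
    lower, upper = allowance
    if expected == 0:
        # The ratio is undefined, so only an exact match is accepted
        return predicted == 0
    return lower <= predicted / expected <= upper
```

With the default tuple, a prediction within 2% of the expected value is accepted.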

summary()[source]

Show the summary of validation.

Returns

Index

reset index

Columns
  • ID (str): ID, like SIR_0

  • ODE (str): model name

  • Rt (float): reproduction number set by ._setup() method

  • Rt_est (float): estimated reproduction number

  • rho etc. (float): parameter values set by ._setup() method

  • rho_est etc. (float): estimated parameter values

  • step_n (int): step number of simulation

  • RMSLE (float): RMSLE score of parameter estimation

  • Trials (int): the number of trials in parameter estimation

  • Runtime (str): runtime of parameter estimation

Return type

pandas.DataFrame
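
The RMSLE column can be read with the usual definition of root mean squared logarithmic error in mind. The following is a minimal sketch, assuming the common log(1 + x) form; the function name rmsle is hypothetical and the scoring inside covsirphy may differ in detail.

```python
import math


def rmsle(observed, predicted):
    """Root mean squared logarithmic error of two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # Squared difference of log(1 + value) for each pair of values
    squared = [
        (math.log1p(p) - math.log1p(o)) ** 2
        for o, p in zip(observed, predicted)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Identical sequences give a score of zero
perfect_score = rmsle([1, 2, 3], [1, 2, 3])
```

Smaller values mean the simulated curves follow the records more closely on a logarithmic scale.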

exception NotInteractiveError(message=None)[source]

Bases: ValueError

Error when interactive shell is not used but forced to use it.

Parameters

message (str or None) – the other messages

exception NotRegisteredExtraError(method_name, message=None)[source]

Bases: covsirphy.util.error.UnExecutedError

Error when extra datasets were not registered.

exception NotRegisteredMainError(method_name, message=None)[source]

Bases: covsirphy.util.error.UnExecutedError

Error when main datasets were not registered.

class ODEHandler(model, first_date, tau=None, metric='RMSLE', n_jobs=-1)[source]

Bases: covsirphy.util.term.Term

Perform simulation and parameter estimation with a multi-phased ODE model.

Parameters
  • model (covsirphy.ModelBase) – ODE model

  • first_date (str or pandas.Timestamp) – the first date of simulation, like 14Apr2021

  • tau (int or None) – tau value [min] or None (to be estimated)

  • metric (str) – metric name for estimation

  • n_jobs (int) – the number of parallel jobs or -1 (CPU count)

add(end_date, param_dict=None, y0_dict=None)[source]

Add a new phase.

Parameters
  • end_date (str or pandas.Timestamp) – end date of the phase

  • param_dict (dict[str, float] or None) – parameter values or None (not set)

  • y0_dict (dict[str, int] or None) – initial values or None (not set)

Returns

setting of the phase (key: phase name)
  • Start (pandas.Timestamp): start date

  • End (pandas.Timestamp): end date

  • y0 (dict[str, int]): initial values of model-specialized variables or empty dict

  • param (dict[str, float]): parameter values or empty dict

Return type

dict(str, object)

estimate(data, **kwargs)[source]

Estimate tau value [min] and ODE parameter values.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pandas.Timestamp): Observation date

    • Susceptible (int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • kwargs – keyword arguments of ODEHandler.estimate_tau() and ODEHandler.estimate_params()

Raises

covsirphy.UnExecutedError – phase information was not set

Returns

tuple(int, dict(str, dict[str, object]))
  • int: tau value [min]

  • dict(str, object): setting of the phase (key: phase name)
    • Start (pandas.Timestamp): start date

    • End (pandas.Timestamp): end date

    • Rt (float): phase-dependent reproduction number

    • (str, float): estimated parameter values, including rho

    • (int or float): day parameters, including 1/beta [days]

    • {metric}: score with the estimated parameter values

    • Trials (int): the number of trials

    • Runtime (str): runtime of optimization

estimate_params(data, quantiles=(0.1, 0.9), check_dict=None, study_dict=None, **kwargs)[source]

Estimate ODE parameter values of all phases to minimize the score of the metric.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pandas.Timestamp): Observation date

    • Susceptible (int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • quantiles (tuple(float, float)) – quantiles to cut parameter range, like confidence interval

  • check_dict (dict[str, object] or None) – setting of validation; None means {“timeout”: 180, “timeout_iteration”: 5, “tail_n”: 4, “allowance”: (0.99, 1.01)}

    • timeout (int): timeout of optimization

    • timeout_iteration (int): timeout of one iteration

    • tail_n (int): the number of iterations to decide whether score did not change for the last iterations

    • allowance (tuple(float, float)): the allowance of the max predicted values

  • study_dict (dict[str, object] or None) – setting of optimization study; None means {“pruner”: “threshold”, “upper”: 0.5, “percentile”: 50, “seed”: 0}

    • pruner (str): kind of pruner (hyperband, median, threshold or percentile)

    • upper (float): works for the “threshold” pruner; if the intermediate score is larger than this value, the trial is pruned

    • percentile (float): works for the “percentile” pruner; if the best intermediate value is in the bottom percentile among trials, the trial is pruned

  • kwargs – we can set arguments directly. E.g. timeout=180 for check_dict={“timeout”: 180,…}

Raises

covsirphy.UnExecutedError – either tau value or phase information was not set

Returns

setting of the phase (key: phase name)
  • Start (pandas.Timestamp): start date

  • End (pandas.Timestamp): end date

  • Rt (float): phase-dependent reproduction number

  • (str, float): estimated parameter values, including rho

  • (int or float): day parameters, including 1/beta [days]

  • {metric}: score with the estimated parameter values

  • Trials (int): the number of trials

  • Runtime (str): runtime of optimization

Return type

dict(str, object)
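
The relation between check_dict and direct keyword arguments can be pictured as a dictionary merge. This is only a sketch under stated assumptions: the helper merge_check_dict is hypothetical, and the priority of direct keyword arguments over check_dict is assumed rather than taken from the library.

```python
# Default values quoted from the description of check_dict
DEFAULT_CHECK = {
    "timeout": 180,
    "timeout_iteration": 5,
    "tail_n": 4,
    "allowance": (0.99, 1.01),
}


def merge_check_dict(check_dict=None, **kwargs):
    """Combine defaults, an explicit check_dict and direct keyword arguments.

    Assumption: direct keyword arguments take priority over check_dict.
    Keys that are not validation settings are ignored here.
    """
    merged = dict(DEFAULT_CHECK)
    merged.update(check_dict or {})
    merged.update({k: v for k, v in kwargs.items() if k in DEFAULT_CHECK})
    return merged


# timeout=60 is given directly, tail_n comes from check_dict
settings = merge_check_dict(check_dict={"tail_n": 8}, timeout=60)
```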

estimate_tau(data, guess_quantile=0.5)[source]

Select the tau value [min] which minimizes the score of the metric.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Susceptible (int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • guess_quantile (float) – quantile to guess ODE parameter values for the candidates of tau

Returns

estimated tau value [min]

Return type

int

Raises

covsirphy.UnExecutedError – phase information was not set

Note

ODE parameter for each tau value will be guessed by .guess() classmethod of the model. Tau value will be selected from the divisors of 1440 [min] and set to self.
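
The candidate set mentioned in the note can be enumerated directly. A minimal sketch, not the library's implementation; the helper name divisors is hypothetical.

```python
def divisors(value):
    """Return all positive divisors of value in ascending order."""
    return [i for i in range(1, value + 1) if value % i == 0]


# Candidate tau values [min]: every divisor of one day (1440 min)
candidates = divisors(1440)
```

Because every candidate divides 1440, one day always corresponds to a whole number of simulation steps.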

simulate()[source]

Perform simulation with the multi-phased ODE model.

Raises

covsirphy.UnExecutedError – either tau value or phase information was not set

Returns

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Susceptible (int): the number of susceptible cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

Return type

pandas.DataFrame

class ODESimulator(**kwargs)[source]

Bases: covsirphy.util.term.Term

Simulation of an ODE model for one phase.

Parameters
  • country (str or None) – country name

  • province (str or None) – province name

add(model, step_n, population, param_dict=None, y0_dict=None)[source]

Add models to the simulator.

Parameters
  • model (subclass of cs.ModelBase) – the first ODE model

  • step_n (int) – the number of steps

  • population (int) – population in the place

  • param_dict (dict or None) – dictionary of parameter values or None

    • key (str): parameter name

    • value (float): parameter value

    • if some parameters are not included, the last values will be used
      • NameError when the model is the first model

      • NameError if new parameters are included

  • y0_dict (dict or None) – dictionary of dimensional initial values or None

    • key (str): variable name

    • value (float): initial value

    • if None or some variables are not included, the last values will be used
      • NameError when the model is the first model

      • NameError if new variables are included

dim(tau, start_date)[source]

Return the dimensionalized results.

Parameters
  • tau (int) – tau value [min]

  • start_date (str) – start date of the records, like 22Jan2020

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Country (str): country/region name

  • Province (str): province/prefecture/state name

  • variables of the models (int)
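
Dimensionalization relates the step number t to calendar time through tau [min]. The sketch below assumes that step t corresponds to start_date plus t * tau minutes; both helper functions are hypothetical and use the standard library instead of pandas.

```python
from datetime import datetime, timedelta


def step_to_datetime(t, tau, start_date):
    """Convert a non-dimensional step t to a timestamp, given tau [min]."""
    start = datetime.strptime(start_date, "%d%b%Y")
    return start + timedelta(minutes=t * tau)


def datetime_to_step(moment, tau, start_date):
    """Convert a timestamp back to the elapsed time divided by tau."""
    start = datetime.strptime(start_date, "%d%b%Y")
    elapsed_minutes = (moment - start).total_seconds() // 60
    return int(elapsed_minutes // tau)


# With tau = 1440 min, one step is one day
third_step = step_to_datetime(3, 1440, "22Jan2020")
```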

non_dim()[source]

Return the non-dimensionalized results.

Returns

Index

reset index

Columns
  • t (int): Elapsed time divided by tau value [-]

  • non-dimensionalized variables of Susceptible etc.

Return type

(pandas.DataFrame)

run(**kwargs)
taufree()[source]

Return tau-free results.

Returns

Index

reset index

Columns
  • t (int): Elapsed time divided by tau value [-]

  • columns with dimensionalized variables

Return type

(pandas.DataFrame)

class Optimizer(**kwargs)[source]

Bases: covsirphy.simulation.estimator.Estimator

This is deprecated. Please use Estimator class.

class OxCGRTData(filename=None, data=None, citation=None)[source]

Bases: covsirphy.cleaning.cbase.CleaningBase

Data cleaning of OxCGRT dataset.

Parameters
  • filename (str or None) – CSV filename of the dataset

  • data (pandas.DataFrame or None) –

    Index

    reset index

    Columns
    • Date: Observation date

    • ISO3: ISO 3166-1 alpha-3, like JPN

    • Country: country/region name

    • School_closing

    • Workplace_closing

    • Cancel_events

    • Gatherings_restrictions

    • Transport_closing

    • Stay_home_restrictions

    • Internal_movement_restrictions

    • International_movement_restrictions

    • Information_campaigns

    • Testing_policy

    • Contact_tracing

    • Stringency_index

  • citation (str or None) – citation or None (empty)

Note

Either @filename (high priority) or @data must be specified.

Note

Policy indices (Overall etc.) are from README.md and documentation/index_methodology.md in https://github.com/OxCGRT/covid-policy-tracker/

CLEANED_COLS = ['Date', 'ISO3', 'Country', 'School_closing', 'Workplace_closing', 'Cancel_events', 'Gatherings_restrictions', 'Transport_closing', 'Stay_home_restrictions', 'Internal_movement_restrictions', 'International_movement_restrictions', 'Information_campaigns', 'Testing_policy', 'Contact_tracing', 'Stringency_index']
OXCGRT_COLS = ['Date', 'ISO3', 'Country', 'School_closing', 'Workplace_closing', 'Cancel_events', 'Gatherings_restrictions', 'Transport_closing', 'Stay_home_restrictions', 'Internal_movement_restrictions', 'International_movement_restrictions', 'Information_campaigns', 'Testing_policy', 'Contact_tracing', 'Stringency_index']
OXCGRT_COLS_WITHOUT_COUNTRY = ['Date', 'School_closing', 'Workplace_closing', 'Cancel_events', 'Gatherings_restrictions', 'Transport_closing', 'Stay_home_restrictions', 'Internal_movement_restrictions', 'International_movement_restrictions', 'Information_campaigns', 'Testing_policy', 'Contact_tracing', 'Stringency_index']
OXCGRT_VARIABLES_RAW = ['school_closing', 'workplace_closing', 'cancel_events', 'gatherings_restrictions', 'transport_closing', 'stay_home_restrictions', 'internal_movement_restrictions', 'international_movement_restrictions', 'information_campaigns', 'testing_policy', 'contact_tracing', 'stringency_index']
OXCGRT_VARS = ['School_closing', 'Workplace_closing', 'Cancel_events', 'Gatherings_restrictions', 'Transport_closing', 'Stay_home_restrictions', 'Internal_movement_restrictions', 'International_movement_restrictions', 'Information_campaigns', 'Testing_policy', 'Contact_tracing', 'Stringency_index']
OXCGRT_VARS_INDICATORS = ['School_closing', 'Workplace_closing', 'Cancel_events', 'Gatherings_restrictions', 'Transport_closing', 'Stay_home_restrictions', 'Internal_movement_restrictions', 'International_movement_restrictions', 'Information_campaigns', 'Testing_policy', 'Contact_tracing']
RAW_COLS = ['Date', 'ISO3', 'Country', 'School_closing', 'Workplace_closing', 'Cancel_events', 'Gatherings_restrictions', 'Transport_closing', 'Stay_home_restrictions', 'Internal_movement_restrictions', 'International_movement_restrictions', 'Information_campaigns', 'Testing_policy', 'Contact_tracing', 'Stringency_index']
SUBSET_COLS = ['Date', 'School_closing', 'Workplace_closing', 'Cancel_events', 'Gatherings_restrictions', 'Transport_closing', 'Stay_home_restrictions', 'Internal_movement_restrictions', 'International_movement_restrictions', 'Information_campaigns', 'Testing_policy', 'Contact_tracing', 'Stringency_index']
map(country=None, variable='Stringency_index', date=None, **kwargs)[source]

Create global colored map to show the values.

Parameters
  • country (None) – always None

  • variable (str) – variable name to show

  • date (str or None) – date of the records or None (the last value)

  • kwargs – arguments of ColoredMap() and ColoredMap.plot()

Raises

NotImplementedError – @country was specified

subset(country, **kwargs)[source]

Create a subset for a country.

Parameters
  • country (str) – country name or ISO 3166-1 alpha-3, like JPN

  • kwargs – the other arguments will be ignored in the latest version.

Raises

covsirphy.SubsetNotFoundError – no records were found

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pandas.Timestamp): Observation date

  • School_closing

  • Workplace_closing

  • Cancel_events

  • Gatherings_restrictions

  • Transport_closing

  • Stay_home_restrictions

  • Internal_movement_restrictions

  • International_movement_restrictions

  • Information_campaigns

  • Testing_policy

  • Contact_tracing

  • Stringency_index

total()[source]

This method is not defined for OxCGRTData class.

class PCRData(filename=None, data=None, interval=2, min_pcr_tests=100, citation=None)[source]

Bases: covsirphy.cleaning.cbase.CleaningBase

Data cleaning of PCR dataset.

Parameters
  • filename (str or None) – CSV filename of the dataset

  • data (pandas.DataFrame or None) –

    Index

    reset index

    Columns
    • Date: Observation date

    • ISO3: ISO3 code

    • Country: country/region name

    • Province: province/prefecture/state name

    • Tests: the number of tests

  • interval (int) – expected update interval of the number of confirmed cases and tests [days]

  • min_pcr_tests (int) – minimum number of valid daily tests performed in order to calculate positive rate

  • citation (str) – citation

CLEANED_COLS = ['Date', 'ISO3', 'Country', 'Province', 'Tests', 'Confirmed']
C_DIFF = 'Confirmed_diff'
PCR_COLUMNS = ['Date', 'Country', 'Province', 'Tests', 'Confirmed']
PCR_NLOC_COLUMNS = ['Date', 'Tests', 'Confirmed']
PCR_RATE = 'Test_positive_rate'
PCR_VALUE_COLUMNS = ['Tests', 'Confirmed']
RAW_COLS = ['Date', 'ISO3', 'Country', 'Province', 'Tests', 'Confirmed']
SUBSET_COLS = ['Date', 'Tests', 'Confirmed']
T_DIFF = 'Tests_diff'
cleaned()[source]

Return the cleaned dataset of PCRData with tests and confirmed data.

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Country (pandas.Category): country/region name

  • Province (pandas.Category): province/prefecture/state name

  • Tests (int): the number of total tests performed

  • Confirmed (int): the number of confirmed cases

classmethod from_dataframe(dataframe, directory='input')[source]

Create PCRData instance using a pandas dataframe.

Parameters
  • dataframe (pd.DataFrame) –

    cleaned dataset

    Index

    reset index

    Columns
    • Date: Observation date

    • ISO3: ISO3 code (optional)

    • Country: country/region name

    • Province: province/prefecture/state name

    • Tests: the number of tests

  • directory (str) – directory to save geometry information (for .map() method)

Returns

PCR dataset

Return type

covsirphy.PCRData

map(country=None, variable='Tests', date=None, **kwargs)[source]

Create colored map with the number of tests.

Parameters
  • country (str or None) – country name or None (global map)

  • variable (str) – always ‘Tests’

  • date (str or None) – date of the records or None (the last value)

  • kwargs – arguments of ColoredMap() and ColoredMap.plot()

Raises

NotImplementedError – @variable was specified

Note

When @country is None, country level data will be shown on global map. When @country is a country name, province level data will be shown on country map.

positive_rate(country, province=None, window=7, last_date=None, show_figure=True, filename=None)[source]

Return the PCR rate of a country as a dataframe.

Parameters
  • country (str) – country name or ISO3 code

  • province (str or None) – province name

  • window (int) – window of moving average, >= 1

  • last_date (str or None) – the last date of the total tests records or None (max date of main dataset)

  • show_figure (bool) – if True, show the records as a line-plot.

  • filename (str) – filename of the figure, or None (display figure)

Raises

covsirphy.PCRIncorrectPreconditionError – the dataset has too many missing values

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pandas.Timestamp): Observation date

  • Tests (int): the number of total tests performed

  • Confirmed (int): the number of confirmed cases

  • Tests_diff (int): daily tests performed

  • Confirmed_diff (int): daily confirmed cases

  • Test_positive_rate (float): positive rate (%) of the daily cases over the total daily tests performed

Note

If non-monotonic records were found for either confirmed cases or tests, “with partially complemented tests data” will be added to the title of the figure.
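
The columns returned by positive_rate() can be pictured with a small calculation on cumulative counts. This is a sketch under assumptions, not the library's code: the helper names are hypothetical, the first daily value is taken as the first cumulative value, and a trailing moving average is applied to the daily values before the ratio is taken.

```python
def diff(values):
    """Daily increments of a cumulative series (first value kept as-is)."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def moving_average(values, window):
    """Trailing moving average; early entries use the values available so far."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


def positive_rate(tests, confirmed, window=7):
    """Positive rate (%) of smoothed daily cases over smoothed daily tests."""
    tests_diff = moving_average(diff(tests), window)
    confirmed_diff = moving_average(diff(confirmed), window)
    return [
        c * 100 / t if t > 0 else float("nan")
        for c, t in zip(confirmed_diff, tests_diff)
    ]


tests = [100, 300, 600, 1000]   # cumulative tests (made-up values)
confirmed = [10, 20, 50, 90]    # cumulative confirmed cases (made-up values)
rates = positive_rate(tests, confirmed, window=1)
```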

replace(country_data)[source]

Replace a part of cleaned dataset with a dataframe.

Parameters

country_data (covsirphy.CountryData) –

dataset object of the country

Index

reset index

Columns
  • Date (pandas.Timestamp): Observation date

  • Province (pandas.Category): province name

  • Tests (int): the number of total tests performed

  • Confirmed (int): the number of confirmed cases

  • The other columns will be ignored

Returns

self

Return type

covsirphy.PCRData

subset(country, province=None, start_date=None, end_date=None)[source]

Return subset of the country/province and start/end date.

Parameters
  • country (str) – country name or ISO3 code

  • province (str or None) – province name

  • start_date (str or None) – start date, like 22Jan2020

  • end_date (str or None) – end date, like 01Feb2020

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Tests (int): the number of total tests performed

  • Tests_diff (int): daily number of tests on date

  • Confirmed (int): the number of confirmed cases

use_ourworldindata(filename, force=False)[source]

Set the cleaned dataset retrieved from “Our World In Data” site. https://github.com/owid/covid-19-data/tree/master/public/data https://ourworldindata.org/coronavirus

Parameters
  • filename (str) – CSV filename to save the dataset retrieved from “Our World In Data”

  • force (bool) – if True, always download the dataset from “Our World In Data”

exception PCRIncorrectPreconditionError(country, province=None, message=None)[source]

Bases: KeyError

Error when checking preconditions in the PCR data.

Parameters
  • country (str) – country name

  • province (str or None) – province name

  • message (str or None) – the other messages

class ParamTracker(**kwargs)[source]

Bases: covsirphy.util.term.Term

Split records with S-R trend analysis and estimate parameter values of the phases.

Parameters
  • record_df (pandas.DataFrame) –

    records

    Index

    reset index

    Columns
    • Date (pandas.Timestamp): Observation date

    • Confirmed (int): the number of confirmed cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

    • Susceptible (int): the number of susceptible cases

  • phase_series (covsirphy.PhaseSeries) – phase series object with first/last dates and population

  • area (str or None) – area name, like Japan/Tokyo, or empty string

  • tau (int or None) – tau value [min]

add(end_date=None, days=None, population=None, model=None, **kwargs)[source]

Add a new phase. The start date will be the next date of the last registered phase.

Parameters
  • end_date (str) – end date of the new phase

  • days (int) – the number of days to add

  • population (int or None) – population value of the start date

  • model (covsirphy.ModelBase or None) – ODE model

  • kwargs – keyword arguments of ODE model parameters, not including tau value

Returns

covsirphy.PhaseSeries

Note

  • If the phases series has not been registered, new phase series will be created.

  • Either @end_date or @days must be specified.

  • If @end_date and @days are None, the end date will be the last date of the records.

  • If both of @end_date and @days were specified, @end_date will be used.

  • If @population is None, initial value will be used.

  • If @model is None, the model of the last phase will be used.

  • Tau will be fixed as the last phase’s value.

  • kwargs: Default values are the parameter values of the last phase.

all_phases()[source]

Return the names of all enabled phases.

Returns

the names of all enabled phases

Return type

list[str]

change_dates()[source]

Return the list of changed dates (start dates of phases since 1st phase).

Returns

list of change dates

Return type

list[str]

combine(phases, population=None, **kwargs)[source]

Combine the sequential phases as one phase. New phase name will be automatically determined.

Parameters
  • phases (list[str]) – list of phases

  • population (int) – population value of the start date

  • kwargs – keyword arguments to save as phase information

Raises

TypeError – @phases is not a list

Returns

self

Return type

covsirphy.Scenario

static create_series(first_date, last_date, population)[source]

Create PhaseSeries instance.

Parameters
  • first_date (str) – the first date of the records

  • last_date (str) – the last date of the records

  • population (int) – population value

Returns

covsirphy.PhaseSeries

delete(phases)[source]

Delete selected phases.

Parameters

phases (list[str]) – phases names to delete

Returns

covsirphy.PhaseSeries

delete_all()[source]

Delete all phases.

Returns

covsirphy.PhaseSeries

disable(phases)[source]

The phases will be disabled.

Parameters

phases (list[str] or None) – phase names or None (all enabled phases)

Returns

covsirphy.PhaseSeries

enable(phases)[source]

The phases will be enabled.

Parameters

phases (list[str] or None) – phase names or None (all disabled phases)

Returns

covsirphy.PhaseSeries

estimate(model, phases=None, n_jobs=-1, **kwargs)[source]

Perform parameter estimation for each phase.

Parameters
  • model (covsirphy.ModelBase) – ODE model

  • phases (list[str]) – list of phase names, like 1st, 2nd…

  • n_jobs (int) – the number of parallel jobs or -1 (CPU count)

  • kwargs – keyword arguments of model parameters and covsirphy.Estimator.run()

Returns

tau value [min] and phase series

Return type

tuple(int, covsirphy.PhaseSeries)

Note

  • If @phases is None, all past phases will be used.

  • Phases with estimated parameter values will be ignored.

  • In kwargs, tau value cannot be included.

find_phase(date)[source]

Find the name of the phase which has the date.

Parameters

date (str) – date, like 01Jan2020

Returns

  • str: phase name, like 1st, 2nd,…

  • covsirphy.PhaseUnit: phase unit

Return type

tuple(str, covsirphy.PhaseUnit)

future_phases()[source]

Return names and phase units of the future phases.

Returns

  • list[str]: list of phase names

  • list[covsirphy.PhaseUnit]: list of phase units

Return type

tuple(list[str], list[covsirphy.PhaseUnit])

last_end_date()[source]

Return the last end date of the series.

Returns

the last end date

Return type

str

property last_model

ODE model of the last phase

Type

covsirphy.ModelBase

near_change_dates()[source]

Show the list of dates which are yesterday/tomorrow of the start/end dates.

Returns

list of dates

Return type

list[str]

past_phases(phases=None)[source]

Return names and phase units of the past phases.

Parameters

phases (tuple/list[str]) – list of phase names, like 1st, 2nd…

Returns

  • list[str]: list of phase names

  • list[covsirphy.PhaseUnit]: list of phase units

Return type

tuple(list[str], list[covsirphy.PhaseUnit])

Note

If @phases is None, all past phases will be returned. If @phases is not None, the intersection with the past phases will be selected.

score(variables=None, phases=None, y0_dict=None, **kwargs)[source]

Evaluate accuracy of phase setting and parameter estimation of selected enabled phases.

Parameters
  • variables (list[str] or None) – variables to use in calculation

  • phases (list[str] or None) – phases to use in calculation

  • y0_dict (dict[str, float] or None) – dictionary of initial values of variables

  • kwargs – keyword arguments of covsirphy.Evaluator.score()

Returns

score with the specified metrics

Return type

float

Note

If @variables is None, [“Infected”, “Fatal”, “Recovered”] will be used. “Confirmed”, “Infected”, “Fatal” and “Recovered” can be used in @variables. If @phases is None, all phases will be used.

separate(date, population=None, **kwargs)[source]

Create a new phase with the change point. New phase name will be automatically determined.

Parameters
  • date (str) – change point, i.e. start date of the new phase

  • population (int) – population value of the change point

  • kwargs – keyword arguments of PhaseUnit.set_ode() if update is necessary

Returns

covsirphy.PhaseSeries

property series

phase series object (series of covsirphy.PhaseUnit)

Type

covsirphy.PhaseSeries

simulate(y0_dict=None)[source]

Simulate ODE models with set/estimated parameter values.

Parameters

y0_dict (dict[str, float] or None) – dictionary of initial values of variables

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Country (str): country/region name

  • Province (str): province/prefecture/state name

  • Variables of the model and dataset (int): Confirmed etc.

trend(force=True, show_figure=False, **kwargs)[source]

Split the records with trend analysis.

Parameters
  • force (bool) – if True, change points will be over-written

  • show_figure (bool) – if True, show the result as a figure

  • kwargs – keyword arguments of covsirphy.TrendDetector(), .TrendDetector.sr() and .trend_plot()

Returns

covsirphy.PhaseSeries

class PhaseSeries(**kwargs)[source]

Bases: covsirphy.util.term.Term

A series of phases.

Parameters
  • first_date (str) – the first date of the series, like 22Jan2020

  • last_date (str) – the last date of the records, like 25May2020

  • population (int) – initial value of total population in the place

add(end_date=None, days=None, population=None, model=None, tau=None, **kwargs)[source]

Add a past phase.

Parameters
  • end_date (str) – end date of the past phase, like 22Jan2020

  • days (int or None) – the number of days to add

  • population (int or None) – population value

  • model (covsirphy.ModelBase) – ODE model

  • tau (int or None) – tau value [min], a divisor of 1440 (prioritize the previous value)

  • kwargs – keyword arguments of model parameters

Returns

self

Return type

covsirphy.PhaseSeries

Note

If @population is None, the previous initial value will be used. When the addition of past phases was not completed and the new phase is a future phase, the blank will be filled in.

clear(include_past=False)[source]

Clear phase information. Future phases will always be deleted.

Parameters

include_past (bool) – if True, include past phases.

Returns

self

Return type

covsirphy.PhaseSeries

delete(phase='last')[source]

Delete a phase. The phase will be combined to the previous phase.

Parameters

phase (str) – phase name, like 0th, 1st, 2nd… or ‘last’

Returns

self

Return type

covsirphy.PhaseSeries

Note

When @phase is ‘0th’, disable 0th phase. 0th phase will not be deleted. When @phase is ‘last’, the last phase will be deleted.

disable(phase)[source]

The phase will be disabled and removed from summary.

Parameters

phase (str) – phase name, like 0th, 1st, 2nd…

Returns

self

Return type

covsirphy.PhaseSeries

enable(phase)[source]

The phase will be enabled and appear in summary.

Parameters

phase (str) – phase name, like 0th, 1st, 2nd…

Returns

self

Return type

covsirphy.PhaseSeries

property first_date

the first date of the series, like 22Jan2020

Type

str

property last_date

the last date of the series, like 25May2020

Type

str

replace(phase, new)[source]

Replace phase object.

Parameters
  • phase (str) – phase name, like 0th, 1st, 2nd…

  • new (covsirphy.PhaseUnit) – new phase object

Returns

self

Return type

covsirphy.PhaseSeries

replaces(phase=None, new_list=None, keep_old=False)[source]

Replace phase object.

Parameters
  • phase (str or None) – phase name, like 0th, 1st, 2nd…

  • new_list (list[covsirphy.PhaseUnit]) – new phase objects

Returns

self

Return type

covsirphy.PhaseSeries

Note

If @phase is None and @keep_old is False, all old phases will be deleted. If @phase is not None, the phase will be deleted. @new_list must be specified.

simulate(record_df, y0_dict=None)[source]

Simulate ODE models with set parameter values.

Parameters
  • record_df (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Confirmed (int): the number of confirmed cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases (> 0)

    • Susceptible (int): the number of susceptible cases

  • y0_dict (dict or None) – dictionary of initial values or None

    • key (str): variable name

    • value (float): initial value

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Country (str): country/region name

  • Province (str): province/prefecture/state name

  • Variables of the model and dataset (int): Confirmed etc.

summary()[source]

Summarize the series of phases in a dataframe.

Returns

Index
  • phase name, like 1st, 2nd, 3rd…

Columns
  • Type: ‘Past’ or ‘Future’

  • Start: start date of the phase

  • End: end date of the phase

  • Population: population value of the start date

  • other information registered to the phases

Return type

(pandas.DataFrame)

to_dict()[source]

Summarize the series of phase in a dictionary.

Returns

nested dictionary of phase information
  • key (str): phase number, like 1st, 2nd,…

  • value (dict): phase information
    • ’Type’: (str) ‘Past’ or ‘Future’

    • values of PhaseUnit.to_dict()

Return type

(dict)
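
Phase names follow English ordinals (0th, 1st, 2nd…). A minimal sketch of generating such names, assuming standard ordinal suffixes; the function phase_name is hypothetical and is not taken from the library.

```python
def phase_name(num):
    """Return an ordinal phase name such as 0th, 1st, 2nd, 3rd or 11th."""
    # Numbers ending in 11, 12 and 13 take "th" like the other teens
    if 10 <= num % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(num % 10, "th")
    return f"{num}{suffix}"


# Names of the first five phases
names = [phase_name(i) for i in range(5)]
```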

trend(**kwargs)
trend_show(**kwargs)
unit(phase='last')[source]

Return the unit of the phase.

Parameters

phase (str) – phase name (1st etc.) or “last”

Returns

the unit of the phase

Return type

covsirphy.PhaseUnit

Note

When @phase is ‘last’ and no phases were registered, returns a phase whose start and end dates are both the day before the first date, with the initial population value.

class PhaseTracker(data, today, area)[source]

Bases: covsirphy.util.term.Term

Track phase information of one scenario.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pandas.Timestamp): Observation date

    • Confirmed (int): the number of confirmed cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

    • Susceptible (int): the number of susceptible cases

  • today (str or pandas.Timestamp) – reference date to determine whether a phase is a past phase or not

  • area (str) – area name, like Japan/Tokyo

Note

(Internally) ID=0 means not registered, ID < 0 means disabled, IDs (>0) are active phase ID.
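
The ID convention in the note can be illustrated with a per-date list of IDs. This is only a sketch: the list layout and both helper functions are hypothetical and do not describe the internal data structure of PhaseTracker.

```python
from itertools import groupby


def phase_status(phase_id):
    """Classify a phase ID following the convention in the note."""
    if phase_id == 0:
        return "not registered"
    return "active" if phase_id > 0 else "disabled"


def summarize(ids):
    """Group consecutive equal IDs into (status, first index, last index)."""
    result, position = [], 0
    for phase_id, group in groupby(ids):
        length = len(list(group))
        result.append((phase_status(phase_id), position, position + length - 1))
        position += length
    return result


# One ID per date (made-up values)
ids = [1, 1, 1, -2, -2, 3, 3, 0]
summary = summarize(ids)
```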

deactivate(start, end)[source]

Deactivate the phase information from the date range.

Parameters
  • start (str or pandas.Timestamp) – start date of the phase to remove

  • end (str or pandas.Timestamp) – end date of the phase to remove

Returns

self

Return type

covsirphy.PhaseTracker

define_phase(start, end)[source]

Define an active phase with the series of dates.

Parameters
  • start (str or pandas.Timestamp) – start date of the new phase

  • end (str or pandas.Timestamp) – end date of the new phase

Returns

self

Return type

covsirphy.PhaseTracker

Note

When today is in the range of (start, end), a past phase and a future phase will be created.
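
A minimal sketch of the split described in the note, under the assumption (not confirmed by this reference) that the past part ends on today and the future part starts on the following day. The function split_at_today() is a hypothetical helper, not a covsirphy API.

```python
from datetime import date, timedelta


def split_at_today(start, end, today):
    """Split (start, end) into past and future parts around today.

    Assumption for this sketch: the past part ends on `today` and the
    future part starts on the day after `today`.
    """
    if start <= today < end:
        return [
            ("Past", start, today),
            ("Future", today + timedelta(days=1), end),
        ]
    # The whole range lies on one side of today
    kind = "Past" if end <= today else "Future"
    return [(kind, start, end)]


parts = split_at_today(date(2021, 1, 1), date(2021, 3, 31), today=date(2021, 2, 15))
for kind, first, last in parts:
    print(kind, first, last)
```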

estimate(model, tau=None, **kwargs)[source]

Perform parameter estimation for each phase and update parameter values.

Parameters
  • model (covsirphy.ModelBase) – ODE model

  • tau (int or None) – tau value [min] or None (to be estimated)

  • kwargs – keyword arguments of ODEHandler(), ODEHandler.estimate_tau() and .estimate_param()

Returns

applied or estimated tau value [min]

Return type

int

Note

ODE parameter estimation will be done for all active phases.

parse_range(dates=None, past_days=None, phases=None)[source]

Parse date range and return the minimum date and maximum date.

Parameters
  • dates (tuple(str or pandas.Timestamp or None, ) or None) – start date and end date

  • past_days (int or None) – how many past days to use in calculation from today (property)

  • phases (list[str] or None) – phase names to use in calculation

Raises
Returns

the minimum date and maximum date

Return type

tuple(pandas.Timestamp, pandas.Timestamp)

Notes

When not specified (i.e. None was applied), the start date of the 0th phase will be used as the minimum date.

Notes

When not specified (i.e. None was applied), the end date of the last phase will be used as the maximum date.

Note

When @past_days was specified, (today - @past_days, today) will be returned.

Note

In @phases, ‘last’ means the last registered phase.

Note

Priority is given in the order of @dates, @past_days, @phases.
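
The priority rules in the notes above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: parse_range_sketch() and the phase table are hypothetical, and taking the earliest start and latest end of the selected phases is an assumption.

```python
from datetime import date, timedelta


def parse_range_sketch(phase_dates, today, dates=None, past_days=None, phases=None):
    """Resolve (minimum date, maximum date) in the order dates > past_days > phases.

    phase_dates: dict of phase name -> (start, end), ordered from the first phase.
    """
    names_in_order = list(phase_dates)
    first_start = phase_dates[names_in_order[0]][0]
    last_name = names_in_order[-1]
    last_end = phase_dates[last_name][1]
    if dates is not None:
        start, end = dates
        # None falls back to the first start date / the last end date
        return (start or first_start, end or last_end)
    if past_days is not None:
        return (today - timedelta(days=past_days), today)
    if phases is not None:
        # 'last' means the last registered phase
        names = [last_name if name == "last" else name for name in phases]
        starts = [phase_dates[name][0] for name in names]
        ends = [phase_dates[name][1] for name in names]
        return (min(starts), max(ends))
    return (first_start, last_end)


# Hypothetical phases
phase_dates = {
    "0th": (date(2021, 1, 1), date(2021, 1, 31)),
    "1st": (date(2021, 2, 1), date(2021, 2, 28)),
    "2nd": (date(2021, 3, 1), date(2021, 3, 31)),
}
today = date(2021, 3, 10)
by_dates = parse_range_sketch(phase_dates, today, dates=(date(2021, 1, 15), None), past_days=7)
by_past_days = parse_range_sketch(phase_dates, today, past_days=7, phases=["1st"])
by_phases = parse_range_sketch(phase_dates, today, phases=["1st", "last"])
```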

remove_phase(start, end)[source]

Remove phase information from the date range.

Parameters
  • start (str or pandas.Timestamp) – start date of the phase to remove

  • end (str or pandas.Timestamp) – end date of the phase to remove

Returns

self

Return type

covsirphy.PhaseTracker

set_ode(model, param_df, tau)[source]

Set the ODE model and parameter values manually, without parameter estimation.

Parameters
  • model (covsirphy.ModelBase) – ODE model

  • param_df (pandas.DataFrame) –

    Index

    Date (pandas.Timestamp): dates to update parameter values

    Columns

    (float): parameter values

  • tau (int) – tau value [min] (must not be None)

Raises

ValueError – some model parameters are not included in @param_df

Note

Parameters are defined by model.PARAMETERS.

Returns

applied tau value [min]

Return type

int

Note

ODE model for simulation will be overwritten.
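
The ValueError condition above amounts to checking that every name in model.PARAMETERS has a column in @param_df. A minimal sketch, using the SIR-D parameter names listed in this reference; check_parameters() is a hypothetical helper and the error message is made up.

```python
SIRD_PARAMETERS = ["kappa", "rho", "sigma"]  # as listed for SIRD.PARAMETERS


def check_parameters(columns, model_parameters):
    """Raise ValueError when a model parameter has no matching column."""
    missing = [name for name in model_parameters if name not in columns]
    if missing:
        raise ValueError(f"parameters not included: {', '.join(missing)}")


# All parameters are present: passes silently
check_parameters(["kappa", "rho", "sigma"], SIRD_PARAMETERS)

# 'kappa' is missing: ValueError is raised
try:
    check_parameters(["rho", "sigma"], SIRD_PARAMETERS)
except ValueError as error:
    message = str(error)
    print(message)
```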

simulate()[source]

Perform simulation with the multi-phased ODE model.

Raises

covsirphy.UnExecutedError – either tau value or phase information was not set

Returns

Index

reset index

Columns
  • Date (pandas.Timestamp): Observation date

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

  • Susceptible (int): the number of susceptible cases

Return type

pandas.DataFrame

Note

Deactivated phases will be included.

Note

Un-registered phases will not be included.

Note

If no parameter set is registered for the current phase but the previous phase has one, the previous phase's parameter set will be used for the current phase.
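
The fallback in the last note behaves like a forward fill over phases. A small sketch with hypothetical parameter sets; fill_parameters() is not a covsirphy function, and carrying a set across several consecutive unregistered phases is an assumption.

```python
def fill_parameters(phase_params):
    """Reuse the previous phase's parameter set when one is missing (None)."""
    filled = []
    previous = None
    for params in phase_params:
        current = params if params is not None else previous
        filled.append(current)
        previous = current
    return filled


# Hypothetical parameter sets of three consecutive phases
phase_params = [
    {"rho": 0.2, "sigma": 0.075},
    None,  # nothing registered: falls back to the previous phase
    {"rho": 0.1, "sigma": 0.075},
]
filled = fill_parameters(phase_params)
print(filled[1])
```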

summary()[source]

Summarize phase information.

Returns

pandas.DataFrame
Index

str: phase names

Columns
  • Type: ‘Past’ or ‘Future’

  • Start: start date of the phase

  • End: end date of the phase

  • Population: population value of the start date

  • If available,
    • ODE (str): ODE model names

    • Rt (float): phase-dependent reproduction number

    • (str, float): estimated parameter values, including rho

    • tau (int): tau value [min]

    • (int or float): day parameters, including 1/beta [days]

    • {metric}: score with the estimated parameter values

    • Trials (int): the number of trials

    • Runtime (str): runtime of optimization

track()[source]

Track data with all dates.

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pandas.Timestamp): Observation date

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

  • Susceptible (int): the number of susceptible cases

  • If available,
    • Rt (float): phase-dependent reproduction number

    • (str, float): estimated parameter values, including rho

    • (int or float): day parameters, including 1/beta [days]

    • {metric}: score with the estimated parameter values

    • Trials (int): the number of trials

    • Runtime (str): runtime of optimization

Note

C/I/F/R/S are simulated values if parameter values are available.

trend(force, show_figure, **kwargs)[source]

Define past phases with S-R trend analysis.

Parameters
  • force (bool) – if True, change points will be over-written

  • show_figure (bool) – if True, show the result as a figure

  • kwargs – keyword arguments of covsirphy.TrendDetector(), .TrendDetector.sr() and .trend_plot()

Returns

self

Return type

covsirphy.PhaseTracker

class PhaseUnit(**kwargs)[source]

Bases: covsirphy.util.term.Term

Save information of a phase.

Parameters
  • start_date (str) – start date of the phase

  • end_date (str) – end date of the phase

  • population (int) – population value

Examples

>>> unit1 = PhaseUnit("01Jan2020", "01Feb2020", 1000)
>>> unit2 = PhaseUnit("02Feb2020", "01Mar2020", 1000)
>>> unit3 = PhaseUnit("02Mar2020", "01Apr2020", 1000)
>>> unit4 = PhaseUnit("02Mar2020", "01Apr2020", 1000)
>>> unit5 = PhaseUnit("01Jan2020", "01Apr2020", 1000)
>>> str(unit1)
'Phase (01Jan2020 - 01Feb2020)'
>>> unit4 == unit4
True
>>> unit1 != unit2
True
>>> unit1 < unit2
True
>>> unit3 > unit1
True
>>> unit3 < unit4
False
>>> unit3 <= unit4
True
>>> unit1 < "02Feb2020"
True
>>> unit1 <= "01Feb2020"
True
>>> unit1 > "31Dec2019"
True
>>> unit1 >= "01Jan2020"
True
>>> sorted([unit3, unit1, unit2]) == [unit1, unit2, unit3]
True
>>> str(unit1 + unit2)
'Phase (01Jan2020 - 01Mar2020)'
>>> str(unit5 - unit1)
'Phase (02Feb2020 - 01Apr2020)'
>>> str(unit5 - unit4)
'Phase (01Jan2020 - 01Mar2020)'
>>> set([unit1, unit3, unit4]) == set([unit1, unit3])
True
del_id()[source]

Delete identifiers.

Returns

self

Return type

covsirphy.PhaseUnit

disable()[source]

Disable the phase.

Examples

>>> unit.disable()
>>> bool(unit)
False
enable()[source]

Enable the phase.

Examples

>>> unit.enable()
>>> bool(unit)
True
property end_date

end date

Type

str

estimate(record_df=None, **kwargs)[source]

Perform parameter estimation.

Parameters
  • record_df (pandas.DataFrame or None) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Confirmed (int): the number of confirmed cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

    • any other columns will be ignored

  • **kwargs – keyword arguments of Estimator.run()

Note

If @record_df is None, registered records will be used.

property estimator

estimator object

Type

covsirphy.Estimator or None

property id_dict

id_dict of the phase

Type

tuple(str)

property model

model description

Type

covsirphy.ModelBase or None

property population

population value

Type

int

property record_df

records of the phase

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

  • Susceptible (int): the number of susceptible cases

Type

pandas.DataFrame

set_id(**kwargs)[source]

Set identifiers.

Parameters

id_dict (dict[str, str]) – dictionary of identifiers

Returns

self

Return type

covsirphy.PhaseUnit

set_ode(model=None, tau=None, **kwargs)[source]

Set ODE model, tau value and parameter values, if necessary.

Parameters
  • model (covsirphy.ModelBase or None) – ODE model

  • tau (int or None) – tau value [min], a divisor of 1440

  • kwargs – keyword arguments of model parameters

Returns

self

Return type

covsirphy.PhaseUnit
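
The constraint that tau is a divisor of 1440 (the number of minutes in a day) can be checked as follows. The helper is_valid_tau() is hypothetical and only illustrates the arithmetic.

```python
MINUTES_PER_DAY = 1440


def is_valid_tau(tau):
    """Return True if tau [min] is a positive integer divisor of 1440."""
    return isinstance(tau, int) and tau > 0 and MINUTES_PER_DAY % tau == 0


# All candidate tau values
divisors = [tau for tau in range(1, MINUTES_PER_DAY + 1) if is_valid_tau(tau)]
print(len(divisors), divisors[-5:])
```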

set_y0(record_df)[source]

Set initial values.

Parameters

record_df (pandas.DataFrame) –

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

  • any other columns will be ignored

simulate(y0_dict=None)[source]

Perform simulation with the set/estimated parameter values.

Parameters

y0_dict (dict or None) – dictionary of initial values or None
  • key (str): variable name

  • value (float): initial value

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

  • Variables of the model (int): Confirmed etc.

Note

Simulation starts at the start date of the phase. Simulation ends at the date following the end date of the phase.
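
To make the simulated span concrete, the sketch below lists the time points from the start date to the day after the end date, spaced by tau minutes. The function simulation_steps() is hypothetical, and including both end points is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def simulation_steps(start, end, tau):
    """Time points from start to the day after end, spaced by tau [min]."""
    stop = end + timedelta(days=1)
    total_minutes = int((stop - start).total_seconds() // 60)
    return [start + timedelta(minutes=tau * i) for i in range(total_minutes // tau + 1)]


# A phase from 01Jan2020 to 01Feb2020 with tau = 1440 min (one day)
steps = simulation_steps(datetime(2020, 1, 1), datetime(2020, 2, 1), tau=1440)
print(steps[0], steps[-1], len(steps))
```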

property start_date

start date

Type

str

summary()[source]

Summarize information.

Returns

Index

reset index

Columns
  • Start: start date of the phase

  • End: end date of the phase

  • Population: population value of the start date

  • if available:
    • ODE (str): model name

    • Rt (float): (basic) reproduction number

    • rho etc. (float): parameter values if available

    • tau (int): tau value [min]

    • (int): day parameter values if available

    • {metric name} (float): score of parameter estimation

    • Trials (int): the number of trials in parameter estimation

    • Runtime (str): runtime of parameter estimation

Return type

pandas.DataFrame

property tau

tau value [min]

Type

int or None

to_dict()[source]

Summarize phase information and return as a dictionary.

Returns

  • Start: start date of the phase

  • End: end date of the phase

  • Population: population value of the start date

  • if available:
    • ODE: model name

    • Rt: (basic) reproduction number

    • parameter values if available

    • day parameter values if available

    • tau: tau value [min]

    • {metric name}: score of parameter estimation

    • Trials: the number of trials in estimation

    • Runtime: runtime of estimation

Return type

dict

class PolicyMeasures(**kwargs)[source]

Bases: covsirphy.util.term.Term

Analyse the relationship of policy measures and parameters of ODE models. This analysis will be done at country level because OxCGRT tracks policies at country level.

Parameters
property countries

countries to analyse

Type

list[str]

estimate(model, n_jobs=-1, **kwargs)[source]

Estimate the parameter values of phases in the registered countries.

Parameters
  • model (covsirphy.ModelBase) – ODE model

  • n_jobs (int) – the number of parallel jobs or -1 (CPU count)

  • kwargs – keyword arguments of model parameters and covsirphy.Estimator.run()

history(param, roll_window=None, show_figure=True, filename=None, **kwargs)[source]

Return subset of summary and show a figure to show the history of all countries.

Parameters
  • param (str) – parameter/day parameter/Rt/OxCGRT score to show

  • roll_window (int or None) – rolling average window if necessary

  • show_figure (bool) – If True, show the result as a figure

  • filename (str) – filename of the figure, or None (show figure)

  • kwargs – keyword arguments of line_plot()

Returns

Index

Date (pd.Timestamp) date

Columns

(str) country names

Values:

parameter values

Return type

pandas.DataFrame
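
The effect of @roll_window can be pictured as a trailing rolling average over the parameter history of one country. This is a plain-Python sketch with made-up values; rolling_mean() is hypothetical, and how covsirphy treats incomplete windows is not stated in this reference (None is used here).

```python
def rolling_mean(values, window):
    """Trailing rolling average; positions without a full window give None."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            result.append(sum(chunk) / window)
    return result


# Hypothetical daily Rt values of one country
rt_values = [2.0, 1.0, 3.0, 2.0, 4.0]
smoothed = rolling_mean(rt_values, window=3)
print(smoothed)
```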

param_history(**kwargs)
phase_len()[source]

Make groups of countries with the length of phases.

Returns

list of countries with the length of phases

Return type

dict(int, list[str])
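
The returned structure maps a number of phases to the countries that have that many phases. A minimal sketch with made-up counts; group_by_phase_count() is a hypothetical helper, not the library's code.

```python
from collections import defaultdict


def group_by_phase_count(phase_counts):
    """Group country names by the number of phases each country has."""
    groups = defaultdict(list)
    for country, count in phase_counts.items():
        groups[count].append(country)
    return dict(groups)


# Hypothetical number of phases per country
phase_counts = {"Japan": 5, "Italy": 4, "India": 5}
groups = group_by_phase_count(phase_counts)
print(groups)
```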

scenario(country)[source]

Return Scenario instance of the country.

Parameters

country (str) – country name

Raises

KeyError – the country is not registered

Returns

Scenario instance

Return type

covsirphy.Scenario

summary(columns=None, countries=None)[source]

Summarize the scenarios.

Parameters
  • columns (list[str] or None) – columns to show

  • countries (list[str] or None) – countries to show

Returns

pandas.DataFrame

Note

If @columns is None, all columns will be shown.

track()[source]

Return subset of summary and show a figure to show the history in each country.

Parameters
  • param (str) – parameter to show

  • roll_window (int or None) – rolling average window if necessary

  • show_figure (bool) – If True, show the result as a figure

  • filename (str) – filename of the figure, or None (show figure)

  • kwargs – keyword arguments of pd.DataFrame.plot or line_plot()

Returns

parameter values
Index

reset index

Columns
  • Country (str): country name

  • Date (pd.Timestamp): date

  • (float): model parameters

  • (float): model day parameters

  • Rt (float): reproduction number

  • (float): OxCGRT values

Return type

pandas.DataFrame

trend(min_len=2)[source]

Perform S-R trend analysis for all registered countries.

Parameters

min_len (int) – minimum length of phases to have

Returns

self

Return type

covsirphy.PolicyMeasures

Note

Countries which do not have @min_len phases will be un-registered.

class Population(**kwargs)[source]

Bases: covsirphy.cleaning.population.PopulationData

class PopulationData(filename=None, data=None, citation=None)[source]

Bases: covsirphy.cleaning.cbase.CleaningBase

Data cleaning of total population dataset.

Parameters
  • filename (str or None) – CSV filename of the dataset

  • data (pandas.DataFrame or None) –

    Index

    reset index

    Columns
    • Date: Observation date

    • ISO3: ISO3 code

    • Country: country/region name

    • Province: province/prefecture/state name

    • Population: total population

  • citation (str or None) – citation or None (empty)

Note

Either @filename (high priority) or @data must be specified.

CLEANED_COLS = ['Date', 'ISO3', 'Country', 'Province', 'Population']
POPULATION_COLS = ['Date', 'ISO3', 'Country', 'Province', 'Population']
RAW_COLS = ['Date', 'ISO3', 'Country', 'Province', 'Population']
SUBSET_COLS = ['Date', 'Population']
countries()[source]

Return names of countries where records are registered.

Raises

KeyError – Country names are not registered in this dataset

Returns

list of country names

Return type

list[str]

Note

Country ‘Others’ will be removed.

map(country=None, variable='Population', date=None, **kwargs)[source]

Create colored map with the population values.

Parameters
  • country (str or None) – country name or None (global map)

  • variable (str) – always ‘Population’

  • date (str or None) – date of the records or None (the last value)

  • kwargs – arguments of ColoredMap() and ColoredMap.plot()

Raises

NotImplementedError – @variable was specified

Note

When @country is None, country level data will be shown on global map. When @country is a country name, province level data will be shown on country map.

to_dict(country_level=True)[source]

Return dictionary of population values.

Parameters

country_level (bool) – whether the key is a country name or not

Returns

dict
  • if @country_level is True, {“country”: population}

  • if False, {“country/province”: population}

total()[source]

Return the total value of population in the dataset.

Returns

int

update(value, country, province=None, date=None)[source]

Update the value of a new place.

Parameters
  • value (int) – population in the place

  • country (str) – country name

  • province (str) – province name

  • date (str or None) – observation date, like 01Jun2020

Returns

self

Return type

covsirphy.PopulationData

Note

If @date is None, the created date of the instance will be used. If @province is None, “-” will be used.

value(country, province=None, date=None)[source]

Return the value of population in the place.

Parameters
  • country (str) – country name or ISO3 code

  • province (str) – province name

  • date (str or None) – observation date, like 01Jun2020

Returns

population in the place

Return type

int

Note

If @date is None, the created date of the instance will be used.

class PopulationPyramidData(filename, force=False, verbose=1)[source]

Bases: covsirphy.cleaning.cbase.CleaningBase

Population pyramid dataset. World Bank Group (2020), World Bank Open Data, https://data.worldbank.org/

Parameters
  • filename (str or None) – CSV filename to save the dataset

  • force (bool) – if True, always download the dataset from the server

  • verbose (int) – level of verbosity

Returns

If @filename is None, empty dataframe will be set as raw data. If @citation is None, citation will be empty string.

AGE = 'Age'
AGE_KEYS = ['0004', '0509', '1014', '1519', '2024', '2529', '3034', '3539', '4044', '4549', '5054', '5559', '6064', '6569', '7579', '80UP']
ELDEST = 122
INDICATOR_DICT = {'SP.POP.0004.FE': '00-04-FE', 'SP.POP.0004.MA': '00-04-MA', 'SP.POP.0509.FE': '05-09-FE', 'SP.POP.0509.MA': '05-09-MA', 'SP.POP.1014.FE': '10-14-FE', 'SP.POP.1014.MA': '10-14-MA', 'SP.POP.1519.FE': '15-19-FE', 'SP.POP.1519.MA': '15-19-MA', 'SP.POP.2024.FE': '20-24-FE', 'SP.POP.2024.MA': '20-24-MA', 'SP.POP.2529.FE': '25-29-FE', 'SP.POP.2529.MA': '25-29-MA', 'SP.POP.3034.FE': '30-34-FE', 'SP.POP.3034.MA': '30-34-MA', 'SP.POP.3539.FE': '35-39-FE', 'SP.POP.3539.MA': '35-39-MA', 'SP.POP.4044.FE': '40-44-FE', 'SP.POP.4044.MA': '40-44-MA', 'SP.POP.4549.FE': '45-49-FE', 'SP.POP.4549.MA': '45-49-MA', 'SP.POP.5054.FE': '50-54-FE', 'SP.POP.5054.MA': '50-54-MA', 'SP.POP.5559.FE': '55-59-FE', 'SP.POP.5559.MA': '55-59-MA', 'SP.POP.6064.FE': '60-64-FE', 'SP.POP.6064.MA': '60-64-MA', 'SP.POP.6569.FE': '65-69-FE', 'SP.POP.6569.MA': '65-69-MA', 'SP.POP.7579.FE': '75-79-FE', 'SP.POP.7579.MA': '75-79-MA', 'SP.POP.80UP.FE': '80-UP-FE', 'SP.POP.80UP.MA': '80-UP-MA'}
PORTION = 'Per_total'
PYRAMID_COLS = ['Country', 'Year', 'Sex', 'Age', 'Population']
SEX = 'Sex'
SUBSET_COLS = ['Age', 'Population', 'Per_total']
YEAR = 'Year'
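
The keys and values of INDICATOR_DICT follow a regular pattern that can be reproduced from an age key and a sex code. The sketch below rebuilds a few entries; build_indicator_dict() is a hypothetical helper and only three age keys are used for brevity.

```python
def build_indicator_dict(age_keys, sex_codes):
    """Map World Bank indicator codes to labels like '00-04-FE'."""
    return {
        f"SP.POP.{age}.{sex}": f"{age[:2]}-{age[2:]}-{sex}"
        for age in age_keys
        for sex in sex_codes
    }


indicators = build_indicator_dict(["0004", "0509", "80UP"], ["FE", "MA"])
print(indicators["SP.POP.0004.FE"], indicators["SP.POP.80UP.MA"])
```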
cleaned()[source]

Return the cleaned dataset.

Returns

Index

reset index

Columns
  • Country (pandas.Category): country name

  • Year (int): year

  • Sex (str): Female/Male

  • Age (int): age

  • Population (int): population value

Return type

pandas.DataFrame

layer(country=None)[source]

Return the cleaned data at the selected layer.

Parameters

country (str or None) – country name or None (country level data or country-specific dataset)

Returns

Index

reset index

Columns
  • Country (str): country names

  • Province (str): province names (or removed when country level data)

  • any other columns of the cleaned data

Return type

pandas.DataFrame

Raises
  • SubsetNotFoundError – no records were found for the country (when @country is not None)

  • KeyError – @country was None, but country names were not registered in the dataset

Note

When @country is None, country level data will be returned. When @country is a country name, province level data in the selected country will be returned.

records(country, year=None, sex=None)[source]

Return the subset.

Parameters
  • country (str) – country name

  • year (int or None) – year or None (the last records)

  • sex (str) – Female/Male or None (total)

Returns

pandas.DataFrame
Index

reset index

Columns
  • Age (int): age

  • Population (int): population value

  • Per_total (float): portion of the total

retrieve(country)[source]

Retrieve the dataset of the country from the local file or the server.

Parameters

country (str) – country name

Returns

retrieved data
Index

reset index

Columns
  • Country (pandas.Category): country name

  • Year (int): year

  • Sex (str): Female/Male

  • Age (int): age

  • Population (int): population value

Return type

pandas.DataFrame

subset(country, year=None, sex=None)[source]

Return the subset.

Parameters
  • country (str) – country name

  • year (int or None) – year or None (the last records)

  • sex (str) – Female/Male or None (total)

Returns

pandas.DataFrame
Index

reset index

Columns
  • Age (int): age

  • Population (int): population value

  • Per_total (float): portion of the total
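
The Per_total column is the share of each age in the total population of the subset. A minimal sketch with made-up numbers; add_portion() is hypothetical.

```python
def add_portion(population_by_age):
    """Return {age: (population, portion of the total)}."""
    total = sum(population_by_age.values())
    return {age: (pop, pop / total) for age, pop in population_by_age.items()}


# Hypothetical population values for three ages
table = add_portion({0: 100, 1: 300, 2: 600})
for age, (population, per_total) in table.items():
    print(age, population, per_total)
```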

class RegressionHandler(data, model, delay, **kwargs)[source]

Bases: covsirphy.util.term.Term

Handle regressors to predict parameter values of ODE models. With .fit() method, the best regressor will be selected based on the scores with test dataset.

Parameters
  • data (pandas.DataFrame) –

    Index

    Date (pandas.Timestamp): observation date

    Columns
    • parameter values

    • the number of cases

    • indicators

  • model (covsirphy.ModelBase) – ODE model

  • delay (int or tuple(int, int)) – exact (or value range of) delay period [days]

  • kwargs – keyword arguments of sklearn.model_selection.train_test_split()

Note

If @seed is included in kwargs, this will be converted to @random_state.

Note

default values regarding sklearn.model_selection.train_test_split() are test_size=0.2, random_state=0, shuffle=False.
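
The two notes above can be combined into one normalization step for the keyword arguments. This is a sketch of the documented behaviour, not the library's code; split_kwargs() is hypothetical, and letting @seed win over an explicit random_state is an assumption.

```python
# Defaults documented for sklearn.model_selection.train_test_split()
DEFAULTS = {"test_size": 0.2, "random_state": 0, "shuffle": False}


def split_kwargs(**kwargs):
    """Apply the documented defaults and convert seed to random_state."""
    options = dict(kwargs)
    if "seed" in options:
        options["random_state"] = options.pop("seed")
    return {**DEFAULTS, **options}


print(split_kwargs())
print(split_kwargs(seed=1, test_size=0.3))
```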

fit(metric)[source]

Fit regressors and select the best regressor based on the scores with test dataset.

Parameters

metric (str) – metric name to select the best regressor

Raises

ValueError – unexpected parameter values, out of the range (0, 1), were predicted by all regressors

Returns

the best score

Return type

float

Note

All regressors are listed here.
  • Indicators -> Parameters with Elastic Net

  • Indicators -> Parameters with Decision Tree Regressor

  • Indicators -> Parameters with Epsilon-Support Vector Regressor

  • Indicators(n)/Indicators(n-1) -> Parameters(n)/Parameters(n-1) with Elastic Net

  • Indicators(n)/Indicators(n-1) -> Parameters(n)/Parameters(n-1) with Decision Tree Regressor

  • Indicators(n)/Indicators(n-1) -> Parameters(n)/Parameters(n-1) with Epsilon-Support Vector Regressor
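
The Indicators(n)/Indicators(n-1) form used by three of the regressors is the ratio of each value to the previous one. A small sketch with made-up values; ratio_to_previous() is hypothetical, and zero values (which would need special handling) are not covered.

```python
def ratio_to_previous(values):
    """Return values[n] / values[n-1] for n = 1, 2, ..."""
    return [current / previous for previous, current in zip(values, values[1:])]


# Hypothetical indicator values on three consecutive dates
ratios = ratio_to_previous([2.0, 4.0, 6.0])
print(ratios)
```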

predict()[source]

Predict parameter values of the ODE model using the best regressor.

Returns

Index

Date (pandas.Timestamp): future dates

Columns

(float): parameter values (4 digits)

Return type

pandas.DataFrame

to_dict(metric)[source]

Return information regarding the best regressor.

Parameters

metric (str) – metric name to select the best regressor

Returns

regressor information of the best model, including
  • best (str): description of the selected approach

  • scaler (object): scaler class

  • regressor (object): regressor class

  • alpha (float): alpha value used in Elastic Net regression

  • l1_ratio (float): l1_ratio value used in Elastic Net regression

  • score_name (str): scoring method (specified with @metric or @metrics)

  • score_train (float): score with train dataset

  • score_test (float): score with test dataset

  • dataset (dict[numpy.ndarray]): X_train, X_test, y_train, y_test, X_target

  • intercept (pandas.DataFrame): intercept and coefficients (Index ODE parameters, Columns indicators)

  • coef (pandas.DataFrame): intercept and coefficients (Index ODE parameters, Columns indicators)

  • delay (list[int]): list of delay period [days]

Return type

dict(str, object)

class SEWIRF(population, theta, kappa, rho1, rho2, rho3, sigma)[source]

Bases: covsirphy.ode.mbase.ModelBase

SEWIR-F model.

DAY_PARAMETERS = ['alpha1 [-]', '1/alpha2 [day]', '1/beta1 [day]', '1/beta2 [day]', '1/beta3 [day]', '1/gamma [day]']
EXAMPLE = {'param_dict': {'kappa': 0.005, 'rho1': 0.2, 'rho2': 0.167, 'rho3': 0.167, 'sigma': 0.075, 'theta': 0.002}, 'population': 1000000, 'step_n': 180, 'y0_dict': {'Exposed': 3000, 'Fatal': 0, 'Infected': 1000, 'Recovered': 0, 'Susceptible': 994000, 'Waiting': 2000}}
NAME = 'SEWIR-F'
PARAMETERS = ['theta', 'kappa', 'rho1', 'rho2', 'rho3', 'sigma']
VARIABLES = ['Susceptible', 'Exposed', 'Waiting', 'Infected', 'Recovered', 'Fatal']
VARS_INCLEASE = ['Recovered', 'Fatal']
VAR_DICT = {'w': 'Fatal', 'x1': 'Susceptible', 'x2': 'Exposed', 'x3': 'Waiting', 'y': 'Infected', 'z': 'Recovered'}
WEIGHTS = array([ 0, 10, 10,  2,  0,  0])
calc_days_dict(tau)[source]

Calculate 1/beta [day] etc.

Parameters

tau (int) – tau value [min]

Returns

dict[str, int]

calc_r0()[source]

Calculate (basic) reproduction number.

Returns

float

classmethod convert(data, tau)[source]

Divide dates by tau value [min] and convert variables to model-specialized variables.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Susceptible (int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • tau (int) – tau value [min] or None (skip division by tau values)

Returns

Index
  • Date (pd.Timestamp): Observation date (available when @tau is None)

  • t (int): time steps (available when @tau is not None)

Columns
  • Susceptible (int): the number of susceptible cases

  • Exposed (int): 0

  • Waiting (int): 0

  • Infected (int): the number of currently infected cases

  • Recovered (int): the number of recovered cases

  • Fatal (int): the number of fatal cases

Return type

pandas.DataFrame

classmethod convert_reverse(converted_df, start, tau)[source]

Calculate date with tau and start date, and restore Susceptible/Infected/Fatal/Recovered.

Parameters
  • converted_df (pandas.DataFrame) –

    Index

    time steps: Dates divided by tau value

    Columns
    • Susceptible (int): the number of susceptible cases

    • Exposed (int): exposed and in latent period (without infectivity)

    • Waiting (int): waiting for confirmation diagnosis (with infectivity)

    • Infected (int): the number of currently infected cases

    • Recovered (int): the number of recovered cases

    • Fatal (int): the number of fatal cases

  • start (pd.Timestamp) – start date of simulation, like 14Apr2021

  • tau (int) – tau value [min]

Returns

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Susceptible (int): the number of susceptible cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

Return type

pandas.DataFrame

classmethod guess(data, tau, q=0.5)[source]

With (X, dX/dt) for X=S, I, R and so on, guess parameter values. This is not implemented for SEWIR-F model.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Susceptible (int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • tau (int) – tau value [min]

  • q (float or tuple(float,)) – the quantile(s) to compute, value(s) between (0, 1)

classmethod param_range(*args, **kwargs)
classmethod restore(*args, **kwargs)
classmethod specialize(*args, **kwargs)
class SIR(population, rho, sigma)[source]

Bases: covsirphy.ode.mbase.ModelBase

SIR model.

Parameters
  • population (int) – total population

  • rho (float) –

  • sigma (float) –

DAY_PARAMETERS = ['1/beta [day]', '1/gamma [day]']
EXAMPLE = {'param_dict': {'rho': 0.2, 'sigma': 0.075}, 'population': 1000000, 'step_n': 180, 'y0_dict': {'Fatal or Recovered': 0, 'Infected': 1000, 'Susceptible': 999000}}
NAME = 'SIR'
PARAMETERS = ['rho', 'sigma']
VARIABLES = ['Susceptible', 'Infected', 'Fatal or Recovered']
VARS_INCLEASE = ['Fatal or Recovered']
VAR_DICT = {'x': 'Susceptible', 'y': 'Infected', 'z': 'Fatal or Recovered'}
WEIGHTS = array([1, 1, 1])
calc_days_dict(tau)[source]

Calculate 1/beta [day] etc.

Parameters

tau (int) – tau value [min]

Returns

dict[str, int]

calc_r0()[source]

Calculate (basic) reproduction number.

Returns

float
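
For the SIR model, the two calculations above can be sketched with the values in SIR.EXAMPLE. The formulas are assumptions consistent with the documented names and units (non-dimensional rho and sigma scaled by tau, R0 = rho / sigma); the truncation to int and the rounding to two digits are also assumptions.

```python
def calc_days_dict(tau, rho, sigma):
    """Day parameters, assuming rho = tau * beta and sigma = tau * gamma."""
    return {
        "1/beta [day]": int(tau / 24 / 60 / rho),
        "1/gamma [day]": int(tau / 24 / 60 / sigma),
    }


def calc_r0(rho, sigma):
    """Basic reproduction number of the SIR model."""
    return round(rho / sigma, 2)


# Values from SIR.EXAMPLE with tau = 1440 min
days = calc_days_dict(tau=1440, rho=0.2, sigma=0.075)
r0 = calc_r0(rho=0.2, sigma=0.075)
print(days, r0)
```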

classmethod convert(data, tau)[source]

Divide dates by tau value [min] and convert variables to model-specialized variables.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Susceptible (int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • tau (int) – tau value [min] or None (skip division by tau values)

Returns

Index
  • Date (pd.Timestamp): Observation date (available when @tau is None)

  • t (int): time steps (available when @tau is not None)

Columns
  • Susceptible (int): the number of susceptible cases

  • Infected (int): the number of currently infected cases

  • Fatal or Recovered (int): the number of fatal/recovered cases

Return type

pandas.DataFrame
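
The conversion can be sketched without pandas: dates become time steps counted in units of tau minutes from the first date, and Fatal and Recovered are merged into one variable. The records are made up and convert_sketch() is a hypothetical stand-in for the classmethod.

```python
from datetime import datetime


def convert_sketch(records, tau):
    """Convert dated records to SIR variables indexed by time step t."""
    start = records[0]["Date"]
    converted = []
    for row in records:
        elapsed_minutes = (row["Date"] - start).total_seconds() / 60
        converted.append({
            "t": int(elapsed_minutes // tau),
            "Susceptible": row["Susceptible"],
            "Infected": row["Infected"],
            "Fatal or Recovered": row["Fatal"] + row["Recovered"],
        })
    return converted


# Hypothetical records for three days
records = [
    {"Date": datetime(2021, 4, 14), "Susceptible": 990, "Infected": 8, "Fatal": 1, "Recovered": 1},
    {"Date": datetime(2021, 4, 15), "Susceptible": 985, "Infected": 11, "Fatal": 1, "Recovered": 3},
    {"Date": datetime(2021, 4, 16), "Susceptible": 978, "Infected": 15, "Fatal": 2, "Recovered": 5},
]
converted = convert_sketch(records, tau=720)
print([row["t"] for row in converted])
```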

classmethod convert_reverse(converted_df, start, tau)[source]

Calculate date with tau and start date, and restore Susceptible/Infected/“Fatal or Recovered”.

Parameters
  • converted_df (pandas.DataFrame) –

    Index

    t: Dates divided by tau value (time steps)

    Columns
    • Susceptible (int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal or Recovered (int): the number of fatal/recovered cases

  • start (pd.Timestamp) – start date of simulation, like 14Apr2021

  • tau (int) – tau value [min]

Returns

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Susceptible (int): the number of susceptible cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

Return type

pandas.DataFrame
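
The date part of the reverse conversion is the inverse of dividing by tau: each time step t maps back to start + t * tau minutes. A minimal sketch; restore_dates() is hypothetical and the restoration of the variables is left out.

```python
from datetime import datetime, timedelta


def restore_dates(time_steps, start, tau):
    """Map time steps back to dates with the start date and tau [min]."""
    return [start + timedelta(minutes=t * tau) for t in time_steps]


dates = restore_dates([0, 2, 4], start=datetime(2021, 4, 14), tau=720)
print(dates)
```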

classmethod guess(data, tau, q=0.5)[source]

With (X, dX/dt) for X=S, I, R, guess parameter values.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Susceptible (int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal (int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • tau (int) – tau value [min]

  • q (float or tuple(float,)) – the quantile(s) to compute, value(s) between (0, 1)

Returns

guessed parameter values with the quantile(s)

Return type

dict(str, float or pandas.Series)

Note

We can guess parameter values with difference equations as follows.
  • rho = - n * (dS/dt) / S / I

  • sigma = (dR/dt) / I
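
The difference equations in the note can be tried on synthetic data. The sketch below generates an SIR trajectory with a simple forward-difference update (one step per tau) from the values in SIR.EXAMPLE, then recovers rho and sigma as medians, which corresponds to q=0.5. Here n is taken to be the total population, R is "Fatal or Recovered", and guess_sir() is a hypothetical stand-in, not the library's implementation.

```python
from statistics import median


def guess_sir(susceptible, infected, removed, population):
    """Guess rho and sigma from forward differences of S and R."""
    rho_values, sigma_values = [], []
    for t in range(len(susceptible) - 1):
        s, i = susceptible[t], infected[t]
        if s <= 0 or i <= 0:
            continue  # avoid division by zero
        d_s = susceptible[t + 1] - susceptible[t]
        d_r = removed[t + 1] - removed[t]
        rho_values.append(-population * d_s / s / i)
        sigma_values.append(d_r / i)
    return {"rho": median(rho_values), "sigma": median(sigma_values)}


# Synthetic trajectory with known parameter values
n, rho, sigma = 1_000_000, 0.2, 0.075
s, i, r = [999_000.0], [1_000.0], [0.0]
for _ in range(30):
    new_infected = rho * s[-1] * i[-1] / n
    new_removed = sigma * i[-1]
    s.append(s[-1] - new_infected)
    i.append(i[-1] + new_infected - new_removed)
    r.append(r[-1] + new_removed)

guessed = guess_sir(s, i, r, n)
print(guessed)
```

With real records the differences are noisy, which is presumably why quantiles are computed rather than single values.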

classmethod param_range(*args, **kwargs)
classmethod restore(*args, **kwargs)
classmethod specialize(*args, **kwargs)
class SIRD(population, kappa, rho, sigma)[source]

Bases: covsirphy.ode.mbase.ModelBase

SIR-D model.

Parameters
  • population (int) – total population

  • kappa (float) –

  • rho (float) –

  • sigma (float) –

DAY_PARAMETERS = ['1/alpha2 [day]', '1/beta [day]', '1/gamma [day]']
EXAMPLE = {'param_dict': {'kappa': 0.005, 'rho': 0.2, 'sigma': 0.075}, 'population': 1000000, 'step_n': 180, 'y0_dict': {'Fatal': 0, 'Infected': 1000, 'Recovered': 0, 'Susceptible': 999000}}
NAME = 'SIR-D'
PARAMETERS = ['kappa', 'rho', 'sigma']
VARIABLES = ['Susceptible', 'Infected', 'Recovered', 'Fatal']
VARS_INCLEASE = ['Recovered', 'Fatal']
VAR_DICT = {'w': 'Fatal', 'x': 'Susceptible', 'y': 'Infected', 'z': 'Recovered'}
WEIGHTS = array([ 1, 10, 10,  2])
calc_days_dict(tau)[source]

Calculate 1/beta [day] etc.

Parameters

tau (int) – tau value [min]

Returns

dict[str, int]

calc_r0()[source]

Calculate (basic) reproduction number.

Returns

float
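
For the SIR-D model, a commonly used expression for the basic reproduction number is rho / (sigma + kappa), because infected cases leave the compartment by recovery or death. The sketch below applies it to the values in SIRD.EXAMPLE; the formula and the rounding to two digits are assumptions, not confirmed by this reference.

```python
def calc_r0_sird(kappa, rho, sigma):
    """Basic reproduction number, assuming R0 = rho / (sigma + kappa)."""
    return round(rho / (sigma + kappa), 2)


# Values from SIRD.EXAMPLE
r0 = calc_r0_sird(kappa=0.005, rho=0.2, sigma=0.075)
print(r0)
```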

classmethod convert(data, tau)[source]

Divide dates by tau value [min] and convert variables to model-specialized variables.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Susceptible(int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal(int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • tau (int) – tau value [min] or None (skip division by tau values)

Returns

Index
  • Date (pd.Timestamp): Observation date (available when @tau is None)

  • t (int): time steps (available when @tau is not None)

Columns
  • Susceptible (int): the number of susceptible cases

  • Infected (int): the number of currently infected cases

  • Recovered (int): the number of recovered cases

  • Fatal (int): the number of fatal cases

Return type

pandas.DataFrame

classmethod convert_reverse(converted_df, start, tau)[source]

Calculate date with tau and start date, and restore Susceptible/Infected/Fatal/Recovered.

Parameters
  • converted_df (pandas.DataFrame) –

    Index

    t: Dates divided by tau value (time steps)

    Columns
    • Susceptible (int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Recovered (int): the number of recovered cases

    • Fatal (int): the number of fatal cases

  • start (pd.Timestamp) – start date of simulation, like 14Apr2021

  • tau (int) – tau value [min]

Returns

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Susceptible(int): the number of susceptible cases

  • Infected (int): the number of currently infected cases

  • Fatal(int): the number of fatal cases

  • Recovered (int): the number of recovered cases

Return type

pandas.DataFrame
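The relationship between convert() and convert_reverse() can be sketched with the standard library alone. The example assumes that a time step is the number of elapsed minutes since the first date divided by tau; the function names dates_to_steps and steps_to_dates are hypothetical and only the date handling is shown, not the column conversion.

```python
from datetime import datetime, timedelta


def dates_to_steps(dates, tau):
    """Map datetimes to integer time steps of tau minutes, counted from the first date."""
    start = dates[0]
    return [int((date - start).total_seconds() // 60 // tau) for date in dates]


def steps_to_dates(steps, start, tau):
    """Inverse mapping: restore datetimes from time steps, a start date and tau."""
    return [start + timedelta(minutes=step * tau) for step in steps]


dates = [datetime(2021, 4, 14) + timedelta(days=n) for n in range(3)]
steps = dates_to_steps(dates, tau=720)  # two steps per day when tau is 720 min
restored = steps_to_dates(steps, start=dates[0], tau=720)
print(steps, restored == dates)
```

The start date has to be passed explicitly on the way back because the time steps themselves no longer carry it.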

classmethod guess(data, tau, q=0.5)[source]

With (X, dX/dt) for X=S, I, R, D, guess parameter values.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Susceptible(int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal(int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • tau (int) – tau value [min]

  • q (float or tuple(float,)) – the quantile(s) to compute, value(s) between (0, 1)

Returns

guessed parameter values with the quantile(s)

Return type

dict(str, float or pandas.Series)

Note

We can guess parameter values with difference equations as follows.
  • kappa = (dF/dt) / I

  • rho = - n * (dS/dt) / S / I

  • sigma = (dR/dt) / I
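The difference equations in the note can be tried out on synthetic data. In this sketch n is taken to be the total population, forward differences over one time step stand in for dX/dt, and q=0.5 is read as the median of the per-step estimates; guess_sird_sketch and the generated records are hypothetical and only illustrate the arithmetic.

```python
from statistics import median


def guess_sird_sketch(records, population):
    """Guess kappa, rho and sigma from consecutive time steps (illustrative only).

    records: list of (S, I, R, F) tuples, one per time step.
    """
    kappa, rho, sigma = [], [], []
    for (s0, i0, r0, f0), (s1, i1, r1, f1) in zip(records, records[1:]):
        if s0 <= 0 or i0 <= 0:
            continue  # the ratios are undefined without susceptible or infected cases
        # Forward differences over one time step stand in for dX/dt.
        kappa.append((f1 - f0) / i0)
        rho.append(-population * (s1 - s0) / s0 / i0)
        sigma.append((r1 - r0) / i0)
    # q=0.5 corresponds to the median of the per-step estimates.
    return {"kappa": median(kappa), "rho": median(rho), "sigma": median(sigma)}


# Synthetic records generated with known parameters, so the guess can be checked.
population = 1_000_000
true = {"kappa": 0.005, "rho": 0.2, "sigma": 0.075}
records = [(999_000.0, 1_000.0, 0.0, 0.0)]
for _ in range(30):
    s, i, r, f = records[-1]
    new_cases = true["rho"] * s * i / population
    records.append((
        s - new_cases,
        i + new_cases - (true["sigma"] + true["kappa"]) * i,
        r + true["sigma"] * i,
        f + true["kappa"] * i,
    ))
guessed = guess_sird_sketch(records, population)
print(guessed)
```

Because the records were produced with the same discrete update, the guessed values match the true ones up to floating-point error; real records are noisy, which is why a quantile of the per-step estimates is used.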

classmethod param_range(*args, **kwargs)
classmethod restore(*args, **kwargs)
classmethod specialize(*args, **kwargs)
class SIRF(population, theta, kappa, rho, sigma)[source]

Bases: covsirphy.ode.mbase.ModelBase

SIR-F model.

Parameters
  • population (int) – total population

  • theta (float) –

  • kappa (float) –

  • rho (float) –

  • sigma (float) –

DAY_PARAMETERS = ['alpha1 [-]', '1/alpha2 [day]', '1/beta [day]', '1/gamma [day]']
EXAMPLE = {'param_dict': {'kappa': 0.005, 'rho': 0.2, 'sigma': 0.075, 'theta': 0.002}, 'population': 1000000, 'step_n': 180, 'y0_dict': {'Fatal': 0, 'Infected': 1000, 'Recovered': 0, 'Susceptible': 999000}}
NAME = 'SIR-F'
PARAMETERS = ['theta', 'kappa', 'rho', 'sigma']
VARIABLES = ['Susceptible', 'Infected', 'Recovered', 'Fatal']
VARS_INCLEASE = ['Recovered', 'Fatal']
VAR_DICT = {'w': 'Fatal', 'x': 'Susceptible', 'y': 'Infected', 'z': 'Recovered'}
WEIGHTS = array([0, 1, 1, 1])
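To give a feel for how the EXAMPLE values are meant to be used, the following sketch integrates a commonly cited SIR-F formulation with forward Euler steps. The equations are not stated in this reference, so they are an assumption here: a fraction theta of new cases is fatal immediately, while the remaining cases become Infected and then recover at rate sigma or die at rate kappa. The function name simulate_sirf_sketch is hypothetical.

```python
def simulate_sirf_sketch(population, param_dict, y0_dict, step_n):
    """Integrate an assumed SIR-F system with forward Euler steps (illustrative only)."""
    theta, kappa = param_dict["theta"], param_dict["kappa"]
    rho, sigma = param_dict["rho"], param_dict["sigma"]
    s = float(y0_dict["Susceptible"])
    i = float(y0_dict["Infected"])
    r = float(y0_dict["Recovered"])
    f = float(y0_dict["Fatal"])
    history = [(s, i, r, f)]
    for _ in range(step_n):
        new_cases = rho * s * i / population  # people leaving Susceptible in one step
        ds = -new_cases
        di = (1 - theta) * new_cases - (sigma + kappa) * i
        dr = sigma * i
        df = theta * new_cases + kappa * i
        s, i, r, f = s + ds, i + di, r + dr, f + df
        history.append((s, i, r, f))
    return history


# Values copied from the EXAMPLE attribute of this class.
EXAMPLE = {
    "param_dict": {"kappa": 0.005, "rho": 0.2, "sigma": 0.075, "theta": 0.002},
    "population": 1000000,
    "step_n": 180,
    "y0_dict": {"Fatal": 0, "Infected": 1000, "Recovered": 0, "Susceptible": 999000},
}
history = simulate_sirf_sketch(**EXAMPLE)
print(len(history), [round(value) for value in history[-1]])
```

The four derivatives sum to zero, so the total of the compartments stays equal to the population at every step, which is a quick sanity check for any implementation of the model.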
calc_days_dict(tau)[source]

Calculate 1/beta [day] etc.

Parameters

tau (int) – tau value [min]

Returns

dict[str, int]

calc_r0()[source]

Calculate (basic) reproduction number.

Returns

float
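For orientation, a sketch of the reproduction number under the commonly cited SIR-F formulation is shown below. The formula rho * (1 - theta) / (sigma + kappa) is an assumption of this example (it is not stated in this reference), and r0_sirf_sketch is a hypothetical name. Setting theta to 0 gives the corresponding SIR-D expression.

```python
def r0_sirf_sketch(theta, kappa, rho, sigma):
    """Reproduction number under the assumed SIR-F formulation (illustrative only)."""
    # Effective contact rate divided by the total removal rate of Infected.
    return rho * (1 - theta) / (sigma + kappa)


# Parameter values taken from the EXAMPLE attributes of SIR-F and SIR-D.
r0_sirf = r0_sirf_sketch(theta=0.002, kappa=0.005, rho=0.2, sigma=0.075)
r0_sird = r0_sirf_sketch(theta=0, kappa=0.005, rho=0.2, sigma=0.075)
print(round(r0_sirf, 3), round(r0_sird, 3))
```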

classmethod convert(data, tau)[source]

Divide dates by tau value [min] and convert variables to model-specialized variables.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Susceptible(int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal(int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • tau (int) – tau value [min] or None (skip division by tau values)

Returns

Index
  • Date (pd.Timestamp): Observation date (available when @tau is None)

  • t (int): time steps (available when @tau is not None)

Columns
  • Susceptible (int): the number of susceptible cases

  • Infected (int): the number of currently infected cases

  • Recovered (int): the number of recovered cases

  • Fatal (int): the number of fatal cases

Return type

pandas.DataFrame

classmethod convert_reverse(converted_df, start, tau)[source]

Calculate date with tau and start date, and restore Susceptible/Infected/Fatal/Recovered.

Parameters
  • converted_df (pandas.DataFrame) –

    Index

    t: Dates divided by tau value (time steps)

    Columns
    • Susceptible (int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Recovered (int): the number of recovered cases

    • Fatal (int): the number of fatal cases

  • start (pd.Timestamp) – start date of simulation, like 14Apr2021

  • tau (int) – tau value [min]

Returns

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Susceptible(int): the number of susceptible cases

  • Infected (int): the number of currently infected cases

  • Fatal(int): the number of fatal cases

  • Recovered (int): the number of recovered cases

Return type

pandas.DataFrame

classmethod guess(data, tau, q=0.5)[source]

With (X, dX/dt) for X=S, I, R, F, guess parameter values.

Parameters
  • data (pandas.DataFrame) –

    Index

    reset index

    Columns
    • Date (pd.Timestamp): Observation date

    • Susceptible(int): the number of susceptible cases

    • Infected (int): the number of currently infected cases

    • Fatal(int): the number of fatal cases

    • Recovered (int): the number of recovered cases

  • tau (int) – tau value [min]

  • q (float or tuple(float,)) – the quantile(s) to compute, value(s) between (0, 1)

Returns

guessed parameter values with the quantile(s)

Return type

dict(str, float or pandas.Series)

Note

We can guess parameter values with difference equations as follows.
  • theta -> +0 (i.e. around 0 and not negative)

  • kappa -> (dF/dt) / I when theta -> +0

  • rho = - n * (dS/dt) / S / I

  • sigma = (dR/dt) / I

classmethod param_range(*args, **kwargs)
classmethod restore(*args, **kwargs)
classmethod specialize(*args, **kwargs)
class SIRFV(population, theta, kappa, rho, sigma, omega=None, v_per_day=None)[source]

Bases: covsirphy.ode.mbase.ModelBase

SIR-FV model.

Parameters

  • population (int) – total population

  • theta (float) –

  • kappa (float) –

  • rho (float) –

  • sigma (float) –

  • omega (float) or v_per_day (int) –

DAY_PARAMETERS = ['alpha1 [-]', '1/alpha2 [day]', '1/beta [day]', '1/gamma [day]', 'Vaccinated [persons]']
EXAMPLE = {'param_dict': {'kappa': 0.005, 'omega': 0.001, 'rho': 0.2, 'sigma': 0.075, 'theta': 0.002}, 'population': 1000000, 'step_n': 180, 'y0_dict': {'Fatal': 0, 'Infected': 1000, 'Recovered': 0, 'Susceptible': 999000, 'Vaccinated': 0}}
NAME = 'SIR-FV'
PARAMETERS = ['theta', 'kappa', 'rho', 'sigma', 'omega']
VARIABLES = ['Susceptible', 'Infected', 'Recovered', 'Fatal', 'Vaccinated']
VARS_INCLEASE = ['Recovered', 'Fatal']
VAR_DICT = {'v': 'Vaccinated', 'w': 'Fatal', 'x': 'Susceptible', 'y': 'Infected', 'z': 'Recovered'}
WEIGHTS = array([ 0, 10, 10,  2,  0])
class ScatterPlot(filename=None, bbox_inches='tight', **kwargs)[source]

Bases: covsirphy.visualization.line_plot.LinePlot

Create a scatter plot.

Parameters
  • filename (str or None) – filename to save the figure or None (display)

  • bbox_inches (str) – bounding box in inches when creating the figure

  • kwargs – the other arguments of matplotlib.pyplot.savefig()

legend(**kwargs)[source]

ScatterPlot.legend() is not implemented.

legend_hide()[source]

ScatterPlot.legend_hide() is not implemented.

line_straight(p1=None, p2=None, color='black', linestyle=':')[source]

Connect the points with a straight line.

Parameters
  • p1 (tuple(int or float, int or float) or None) – (x, y) of the first point or None (min values)

  • p2 (tuple(int or float, int or float) or None) – (x, y) of the second point or None (max values)

  • color (str) – color of the line

  • linestyle (str) – linestyle

Note

The same line will be shown when p1 and p2 are swapped.

plot(data, colormap=None, color_dict=None, **kwargs)[source]

Plot chronological change of the data.

Parameters
  • data (pandas.DataFrame) –

    data to show

    Index

    reset index

    Columns
    • x (int or float): x values

    • y (int or float): y values

  • colormap (str, matplotlib colormap object or None) – colormap, please refer to https://matplotlib.org/examples/color/colormaps_reference.html

  • color_dict (dict[str, str] or None) – dictionary of column names (keys) and colors (values)

  • kwargs – keyword arguments of pandas.DataFrame.plot()

class Scenario(jhu_data=None, population_data=None, country=None, province=None, tau=None, auto_complement=True)[source]

Bases: covsirphy.util.term.Term

Scenario analysis.

Parameters
  • jhu_data (covsirphy.JHUData or None) – object of records

  • population_data (covsirphy.PopulationData or None) – PopulationData object

  • country (str) – country name (must not be None)

  • province (str or None) – province name

  • tau (int or None) – tau value

  • auto_complement (bool) – if True and necessary, the number of cases will be complemented

Note

@jhu_data and @population_data must be registered with Scenario.register() if not specified here.

add(name='Main', end_date=None, days=None, model=None, tau=None, **kwargs)[source]

Add a new phase. The start date will be the next date of the last registered phase.

Parameters
  • name (str) – phase series name, ‘Main’ or user-defined name

  • end_date (str) – end date of the new phase

  • days (int) – the number of days to add

  • model (covsirphy.ModelBase or None) – ODE model or None (not specified here)

  • tau (int or None) – tau value [min] or None (not specified here)

  • kwargs – keyword arguments of ODE model parameters, not including tau value.

Raises
  • ValueError – @end_date is earlier than the last end date of the registered phases

  • KeyError – model was registered, but some parameter values were not specified

Returns

self

Return type

covsirphy.Scenario

Note

If @end_date and @days are None, the end date will be the last date of the records.

Note

When registered, ODE model and tau value will not be updated by @model and @tau.

Note

If both of @end_date and @days were specified, @end_date will be used.

Note

When the ODE model and tau value have been registered, parameter values will also be added. Default values are those of the last phase. We can change them with kwargs.
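The precedence among @end_date, @days and the last record date described in the notes can be written down as a small helper. This is only a sketch: resolve_end_date is a hypothetical function, and the assumption that the phase ends exactly @days days after its start date is not confirmed by this reference.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%d%b%Y"  # same pattern as Term.DATE_FORMAT, e.g. 01Feb2021


def resolve_end_date(start, last_record, end_date=None, days=None):
    """Pick the end date of a new phase following the precedence in the notes."""
    if end_date is not None:
        return end_date  # @end_date wins even when @days is also given
    if days is not None:
        # Assumption: the phase ends @days days after its start date.
        end = datetime.strptime(start, DATE_FORMAT) + timedelta(days=days)
        return end.strftime(DATE_FORMAT)
    return last_record  # neither given: fall back to the last date of the records


print(resolve_end_date("01Feb2021", "31Mar2021"))
print(resolve_end_date("01Feb2021", "31Mar2021", days=10))
print(resolve_end_date("01Feb2021", "31Mar2021", end_date="20Feb2021", days=10))
```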

add_phase(**kwargs)
adjust_end()[source]

Adjust the last end dates of the registered scenarios, if necessary.

Returns

self

Return type

covsirphy.Scenario

backup(filename)[source]

Backup scenario information to a JSON file (so that we can use it with Scenario.restore()).

Parameters

filename (str or pathlib.Path) – JSON filename to backup the information

clear(name='Main', include_past=False, template='Main')[source]

Clear phase information.

Parameters
  • name (str) – scenario name

  • include_past (bool) – if True, past phases will be removed as well as future phases

  • template (str) – name of template scenario

Returns

self

Return type

covsirphy.Scenario

Note

If an unregistered scenario name is specified, a new scenario will be created. Future phases will always be deleted.

combine(phases, name='Main', **kwargs)[source]

Combine the sequential phases as one phase. New phase name will be automatically determined.

Parameters
  • phases (list[str]) – list of phases

  • name (str, optional) – name of phase series

  • kwargs – keyword arguments of parameters

Note

kwargs will be ignored when the model and tau are not registered.

Raises

TypeError – @phases is not a list

Returns

self

Return type

covsirphy.Scenario

complement(**kwargs)[source]

Complement the number of recovered cases, if necessary.

Parameters

kwargs – the other arguments of JHUData.subset_complement()

Returns

self

Return type

covsirphy.Scenario

complement_reverse()[source]

Restore the raw records. Reverse method of covsirphy.Scenario.complement().

Returns

self

Return type

covsirphy.Scenario

delete(phases=None, name='Main')[source]

Delete phases.

Parameters
  • phases (list[str] or None) – phase names, or [‘last’]

  • name (str) – name of phase series

Returns

self

Return type

covsirphy.Scenario

Note

If @phases is None, the phase series will be deleted. When ‘0th’ is included in @phases, the 0th phase will be disabled rather than deleted. If the last phase is included in @phases, the dates will be released from phases. If the last phase is not included, the dates will be assigned to the previous phase.

describe(with_rt=True, **kwargs)[source]

Describe representative values.

Parameters
  • with_rt (bool) – whether to show the history of Rt values

  • kwargs – the other arguments will be ignored

Returns

Index

str: scenario name

Columns
  • max(Infected): max value of Infected

  • argmax(Infected): the date when Infected shows max value

  • Confirmed({date}): Confirmed on the next date of the last phase

  • Infected({date}): Infected on the next date of the last phase

  • Fatal({date}): Fatal on the next date of the last phase

  • nth_Rt etc.: Rt value if the values are not the same values

Return type

pandas.DataFrame

disable(phases, name='Main')[source]

The phases will be disabled and removed from summary.

Parameters
  • phases (list[str] or None) – phase names or None (all enabled phases)

  • name (str) – scenario name

Returns

self

Return type

covsirphy.Scenario

enable(phases, name='Main')[source]

The phases will be enabled and appear in summary.

Parameters
  • phases (list[str] or None) – phase names or None (all disabled phases)

  • name (str) – scenario name

Returns

self

Return type

covsirphy.Scenario

estimate(model, phases=None, name='Main', **kwargs)[source]

Perform parameter estimation for each phase.

Parameters
  • model (covsirphy.ModelBase) – ODE model

  • phases (list[str]) – list of phase names, like 1st, 2nd…

  • name (str) – phase series name

  • kwargs – keyword arguments of ODEHandler(), ODEHandler.estimate_tau() and .estimate_param()

Note

If @name phase was not registered, new tracker will be created.

Note

If @phases is None, all past phases will be used.

estimate_accuracy(phase, name='Main', **kwargs)[source]

Show the accuracy as a figure.

Parameters
  • phase (str) – phase name, like 1st, 2nd…

  • name (str) – phase series name

  • kwargs – keyword arguments of covsirphy.compare_plot()

Note

If ‘Main’ was used as @name, main PhaseSeries will be used.

estimate_delay(oxcgrt_data=None, indicator='Stringency_index', target='Confirmed', percentile=25, limits=(7, 30), **kwargs)[source]

Estimate delay period [days], assuming the indicator impacts the target value with delay. The average of representative value (percentile) and @min_size will be returned.

Parameters
  • oxcgrt_data (covsirphy.OxCGRTData) – OxCGRT dataset

  • indicator (str) – indicator name, a column of any registered datasets

  • target (str) – target name, a column of any registered datasets

  • percentile (int) – percentile to calculate the representative value, in (0, 100)

  • limits (tuple(int, int)) – minimum/maximum size of the delay period [days]

  • kwargs – keyword arguments of DataHandler.estimate_delay()

Raises
  • NotRegisteredMainError – JHUData was not registered

  • SubsetNotFoundError – failed in subsetting because of lack of data

  • UserWarning – failed in calculating and returned the default value (recovery period)

Returns

  • int: the estimated number of days of delay [day] (mode value)

  • pandas.DataFrame:
    Index

    reset index

    Columns
    • (int or float): column defined by @indicator

    • (int or float): column defined by @target

    • (int): column defined by @delay_name [days]

Return type

tuple(int, pandas.DataFrame)

Note

  • Average recovered period of JHU dataset will be used as returned value when the estimated value was not in value_range.

  • @oxcgrt_data argument was deprecated. Please use Scenario.register(extras=[oxcgrt_data]).

estimate_history(**kwargs)
property first_date

the first date of the records

Type

str

fit(oxcgrt_data=None, name='Main', delay=(7, 31), removed_cols=None, metric='R2', **kwargs)[source]

Fit regressors to predict the parameter values in the future phases, assuming that indicators will impact ODE parameter values/the number of cases with delay. Please refer to covsirphy.RegressionHandler class.

Parameters
  • oxcgrt_data (covsirphy.OxCGRTData) – OxCGRT dataset, deprecated

  • name (str) – scenario name

  • test_size (float) – proportion of the test dataset of Elastic Net regression

  • seed (int) – random seed when splitting the dataset into train/test data

  • delay (int or tuple(int, int) or None) –

    • (int): delay period [days],

    • tuple(int, int): select the best value with grid search in this range, or

    • None: Scenario.estimate_delay() calculate automatically

  • removed_cols (list[str] or None) – list of variables to remove from X dataset or None (indicators used to estimate delay period)

  • metric (str) – metric name

  • kwargs – keyword arguments of sklearn.model_selection.train_test_split()

Raises

covsirphy.UnExecutedError – Scenario.estimate() or Scenario.add() were not performed

Returns

this is the same as covsirphy.RegressionHandler.to_dict()

Return type

dict

Note

@oxcgrt_data argument was deprecated. Please use Scenario.register(extras=[oxcgrt_data]).

Note

Please refer to covsirphy.Evaluator.score() for metric names.

Note

If @seed is included in kwargs, this will be converted to @random_state.

Note

Default values regarding sklearn.model_selection.train_test_split() are test_size=0.2, random_state=0, shuffle=False.
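The handling of @seed and the default split settings in the notes above can be sketched as a small keyword-argument normalizer. The helper split_kwargs_sketch is hypothetical and only mirrors the behavior described in the notes; it does not call scikit-learn.

```python
def split_kwargs_sketch(**kwargs):
    """Build keyword arguments for train_test_split as described in the notes."""
    # Defaults stated in the notes.
    options = {"test_size": 0.2, "random_state": 0, "shuffle": False}
    if "seed" in kwargs:
        # @seed is accepted as an alias and renamed to @random_state.
        kwargs["random_state"] = kwargs.pop("seed")
    options.update(kwargs)
    return options


print(split_kwargs_sketch())
print(split_kwargs_sketch(seed=1, test_size=0.3))
```

With shuffle=False, train_test_split keeps the row order, so the test data is the final part of the dataset rather than a random sample.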

fit_predict(oxcgrt_data=None, name='Main', **kwargs)[source]

Predict parameter values of the future phases using Elastic Net regression with OxCGRT scores, assuming that OxCGRT scores will impact on ODE parameter values with delay. New future phases will be added (over-written).

Parameters
  • oxcgrt_data (covsirphy.OxCGRTData or None) – OxCGRT dataset

  • name (str) – scenario name

  • kwargs – the other arguments of Scenario.fit() and Scenario.predict()

Raises
Returns

self

Return type

covsirphy.Scenario

Note

@oxcgrt_data argument was deprecated. Please use Scenario.register(extras=[oxcgrt_data]).

get(param, phase='last', name='Main')[source]

Get the parameter value of the phase.

Parameters
  • param (str) – parameter name (columns in self.summary())

  • phase (str) – phase name or ‘last’ - if ‘last’, the value of the last phase will be returned

  • name (str) – phase series name

Returns

str or int or float

Note

If ‘Main’ was used as @name, main PhaseSeries will be used.

history(target, with_actual=True, ref_name='Main', **kwargs)[source]

Show the history of variables and parameter values to compare scenarios.

Parameters
  • target (str) – parameter or variable name to show (Rt, Infected etc.)

  • with_actual (bool) – if True and @target is a variable name, show actual number of cases

  • ref_name (str) – name of reference scenario to specify phases and dates

  • kwargs – the other keyword arguments of Scenario.line_plot() and PhaseTracker.parse_range()

Returns

Index

Date (pandas.Timestamp)

Columns

{scenario name} (int or float): values of the registered scenario

Return type

pandas.DataFrame

history_rate(params=None, name='Main', **kwargs)[source]

Show change rates of parameter values in one figure. We can find the parameters which increased/decreased significantly.

Parameters
  • params (list[str] or None) – parameters to show

  • name (str) – phase series name

  • kwargs – the other keyword arguments of Scenario.line_plot()

Returns

pandas.DataFrame

property interactive

interactive mode (display figures) or not

Note

When running scripts, interactive mode cannot be selected.

Type

bool

property last_date

the last date of the records

Type

str

line_plot(df, show_figure=True, filename=None, **kwargs)[source]

Display or save a line plot of the dataframe.

Parameters
  • show_figure (bool) – whether to show the figure in interactive mode

  • filename (str or None) – filename of the figure or None (not save) when script mode

Note

In interactive mode, the figure is displayed when @show_figure is True. In script mode, the figure is saved when @filename is not None. When using an interactive shell, we can switch modes with Scenario.interactive = True/False.

param_history(**kwargs)
phase_estimator(**kwargs)
predict(days=None, name='Main')[source]

Predict parameter values of the future phases using Elastic Net regression with OxCGRT scores, assuming that OxCGRT scores will impact on ODE parameter values with delay. New future phases will be added (over-written).

Parameters
  • days (list[int]) – list of days to predict or None (only the max value)

  • name (str) – scenario name

Raises
Returns

self

Return type

covsirphy.Scenario

records(variables=None, **kwargs)[source]

Return the records as a dataframe.

Parameters
  • variables (list[str] or str or None) – variable names or abbreviated names

  • kwargs – the other keyword arguments of Scenario.line_plot()

Raises
Returns

pandas.DataFrame

Index

reset index

Columns
  • Date (pd.Timestamp): Observation date

  • Columns set by @variables (int)

Note

  • Records with Recovered > 0 will be selected.

  • If complement was performed by Scenario.complement() or Scenario(auto_complement=True), the kind of complement will be added to the title of the figure.

Note

Selectable values of @variables are as follows.
  • None: return default list, [“Infected”, “Recovered”, “Fatal”] (may be changed in the future)

  • list[str]: return the selected variables

  • “all”: all available variables

  • str: abbr, like “CIFR” (Confirmed/Infected/Fatal/Recovered), “CFR”, “RC”
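The selection rules for @variables can be sketched as follows. The helper expand_variables is hypothetical, and the set of variables returned for “all” is assumed to be Confirmed/Infected/Fatal/Recovered for the purpose of the example.

```python
ABBREVIATIONS = {"C": "Confirmed", "I": "Infected", "F": "Fatal", "R": "Recovered"}
DEFAULT_VARIABLES = ["Infected", "Recovered", "Fatal"]


def expand_variables(variables=None, available=None):
    """Turn the @variables argument into a list of column names (illustrative only)."""
    # Assumption: "all" means the four variables below unless told otherwise.
    available = list(available or ABBREVIATIONS.values())
    if variables is None:
        return list(DEFAULT_VARIABLES)
    if variables == "all":
        return available
    if isinstance(variables, str):
        if variables in available:
            return [variables]  # a single full variable name
        # Treat each character as an abbreviation, keeping the given order.
        return [ABBREVIATIONS[char] for char in variables]
    return list(variables)


print(expand_variables("CIFR"))
print(expand_variables("RC"))
```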

records_diff(variables=None, window=7, **kwargs)[source]

Return the number of daily new cases (the first discrete difference of records).

Parameters
  • variables (list[str] or str or None) – variable names or abbreviated names (the same as Scenario.records())

  • window (int) – window of moving average, >= 1

  • kwargs – the other keyword arguments of Scenario.line_plot()

Returns

pandas.DataFrame
Index
  • Date (pd.Timestamp): Observation date

Columns
  • Confirmed (int): daily new cases of Confirmed, if calculated

  • Infected (int): daily new cases of Infected, if calculated

  • Fatal (int): daily new cases of Fatal, if calculated

  • Recovered (int): daily new cases of Recovered, if calculated
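The calculation behind daily new cases, a first discrete difference followed by a moving average over @window days, can be sketched in plain Python. How incomplete windows at the start are treated and whether results are cast to int is not stated here; this sketch simply drops incomplete windows and keeps floats, and daily_new_cases is a hypothetical name.

```python
def daily_new_cases(cumulative, window=7):
    """First discrete difference followed by a trailing moving average."""
    if window < 1:
        raise ValueError("window must be >= 1")
    # Difference between consecutive cumulative counts.
    diffs = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
    smoothed = []
    for end in range(window, len(diffs) + 1):
        chunk = diffs[end - window:end]  # the last `window` daily values
        smoothed.append(sum(chunk) / window)
    return smoothed


confirmed = [0, 2, 6, 12, 20, 30]
print(daily_new_cases(confirmed, window=2))
```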

register(jhu_data=None, population_data=None, extras=None)[source]

Register datasets.

Parameters
Raises
  • TypeError – non-data cleaning instance was included

  • UnExpectedValueError – instance of un-expected data cleaning class was included as an extra dataset

restore(filename)[source]

Restore scenario information with a JSON file (written by Scenario.backup()).

Parameters

filename (str or pathlib.Path) – JSON file to read

Returns

self

Return type

covsirphy.Scenario

Note

If keyword arguments of Scenario.timepoints() are available as the values of the “timepoint” key, timepoints will be restored.

retrospective(beginning_date, model, control='Main', target='Target', **kwargs)[source]

Perform retrospective analysis. Compare the actual series of phases (control) and series of phases with specified parameters (target).

Parameters
  • beginning_date (str) – when the parameter values start to be changed from actual values

  • model (covsirphy.ModelBase) – ODE model

  • control (str) – scenario name of control

  • target (str) – scenario name of target

  • kwargs – keyword arguments of ODEHandler(), ODEHandler.estimate_tau() and .estimate_param()

Note

When parameter values are not specified, actual values of the last date before the beginning date will be used.

score(variables=None, name='Main', **kwargs)[source]

Evaluate accuracy of phase setting and parameter estimation.

Parameters
  • variables (list[str] or None) – variables to use in calculation

  • name (str) – phase series name. If ‘Main’, main PhaseSeries will be used

  • kwargs – keyword arguments of covsirphy.Evaluator.score() and PhaseTracker.parse_range()

Returns

score with the specified metrics (covsirphy.Evaluator.score())

Return type

float

Note

If @variables is None, [“Infected”, “Fatal”, “Recovered”] will be used.

Note

“Susceptible”, “Confirmed”, “Infected”, “Fatal” and “Recovered” can be used in @variables.

separate(date, name='Main', **kwargs)[source]

Create a new phase with the change point. New phase name will be automatically determined.

Parameters
  • date (pandas.Timestamp or str) – change point, i.e. start date of the new phase

  • name (str) – scenario name

  • kwargs – will be ignored

Raises

ValueError – the date is close to one of the registered change dates

Returns

self

Return type

covsirphy.Scenario

show_complement(**kwargs)[source]

Show the details of complement that was (or will be) performed for the records.

Parameters

kwargs – keyword arguments of JHUDataComplementHandler() i.e. control factors of complement

Returns

the same as JHUData.show_complement()

Return type

pandas.DataFrame

simulate(variables=None, name='Main', **kwargs)[source]

Simulate ODE models with set/estimated parameter values and show it as a figure.

Parameters
  • variables (list[str] or str or None) – variable names or abbreviated names (the same as Scenario.records())

  • name (str) – phase series name. If ‘Main’, main PhaseSeries will be used

  • kwargs – the other keyword arguments of Scenario.line_plot() and PhaseTracker.parse_range()

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pandas.Timestamp): Observation date

  • Country (str): country/region name

  • Province (str): province/prefecture/state name

  • Variables of the main dataset (int): Confirmed etc.

summary(columns=None, name=None)[source]

Summarize the series of phases and return a dataframe.

Parameters
  • name (str or None) – phase series name

    • name of alternative phase series registered by Scenario.add()

    • if None, all phase series will be shown

  • columns (list[str] or None) – columns to show

Returns

  • if @name is not None, the same as PhaseTracker().summary()

  • if @name is None, index will be phase series name and phase name

Return type

pandas.DataFrame

Note

If ‘Main’ was used as @name, main PhaseSeries will be used.

Note

If @columns is None, all columns will be shown.

Note

“Start” and “End” are strings at this time.

timepoints(first_date=None, last_date=None, today=None)[source]

Set the range of data and reference date to determine past/future of phases.

Parameters
  • first_date (str or None) – the first date of the records or None (min date of main dataset)

  • last_date (str or None) – the last date of the records or None (max date of main dataset)

  • today (str or None) – reference date to determine whether a phase is a past phase or a future phase

Raises

Note

When @today is None, the reference date will be the same as @last_date (or max date).

property today

reference date to determine whether a phase is a past phase or a future phase

Type

str

track(with_actual=True, ref_name='Main', **kwargs)[source]

Show values of parameters and variables in one dataframe.

Parameters
  • with_actual (bool) – if True, the actual number of cases will be included as the “Actual” scenario

  • ref_name (str) – name of reference scenario to specify phases and dates

  • kwargs – keyword arguments of PhaseTracker.parse_range()

Returns

tracking records
Index

reset index

Columns
  • Scenario (str)

  • Date (pandas.Timestamp)

  • Confirmed (int): the number of confirmed cases

  • Infected (int): the number of currently infected cases

  • Fatal (int): the number of fatal cases

  • Recovered (int): the number of recovered cases

  • Susceptible (int): the number of susceptible cases

  • Population (int)

  • If available,
    • Rt (float)

    • parameter values (float)

    • day parameter values (int)

Return type

pandas.DataFrame

trend(min_size=None, force=True, name='Main', show_figure=True, filename=None, **kwargs)[source]

Perform S-R trend analysis and set phases.

Parameters
  • min_size (int or None) – minimum value of phase length [days] (over 2) or None (equal to max of 7 and delay period)

  • force (bool) – if True, change points will be over-written

  • name (str) – phase series name

  • show_figure (bool) – if True, show the result as a figure

  • filename (str) – filename of the figure, or None (display)

  • kwargs – keyword arguments of covsirphy.TrendDetector(), .TrendDetector.sr() and .trend_plot()

Returns

self

Return type

covsirphy.Scenario

Note

If @min_size is None, this will be the max value of 7 days and the delay period calculated with the .estimate_delay() method.

exception ScenarioNotFoundError(name)[source]

Bases: KeyError

Error when unregistered scenario name was specified.

Parameters

name (str) – scenario name

class StopWatch[source]

Bases: object

Calculate elapsed time.

static show(time_sec)[source]

Show the elapsed time as string.

Parameters

time_sec (int) – time [sec]

Returns

eg. ‘1 min 30 sec’

Return type

str

stop()[source]

Stop.

Returns

elapsed time [sec]

Return type

int

stop_show()[source]

Stop and show time.

Returns

eg. ‘1 min 30 sec’

Return type

str
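A minimal stand-in for this class can be written with the standard library. The sketch assumes that timing starts when the object is created and that the elapsed time is formatted as minutes and seconds, as in the example string above; StopWatchSketch is a hypothetical name and the exact spacing of the output is an assumption.

```python
import time


class StopWatchSketch:
    """Measure elapsed time from creation (illustrative only)."""

    def __init__(self):
        # monotonic() is unaffected by system clock adjustments.
        self._start = time.monotonic()

    @staticmethod
    def show(time_sec):
        """Format seconds as a string such as '1 min 30 sec'."""
        minutes, seconds = divmod(int(time_sec), 60)
        return f"{minutes} min {seconds} sec"

    def stop(self):
        """Return the elapsed time in whole seconds."""
        return int(time.monotonic() - self._start)

    def stop_show(self):
        """Return the elapsed time as a formatted string."""
        return self.show(self.stop())


watch = StopWatchSketch()
print(StopWatchSketch.show(90))
print(watch.stop_show())
```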

exception SubsetNotFoundError(country, country_alias=None, province=None, start_date=None, end_date=None, date=None, message=None)[source]

Bases: KeyError, ValueError

Error when subsetting failed with the specified arguments.

Parameters
  • country (str) – country name

  • country_alias (str or None) – country name used in the dataset

  • province (str or None) – province name

  • start_date (str or None) – start date, like 22Jan2020

  • end_date (str or None) – end date, like 01Feb2020

  • date (str or None) – specified date, like 22Jan2020

  • message (str or None) – the other messages

class Term[source]

Bases: object

Term definition.

A = '_actual'
ACTUAL = 'Actual'
AREA_ABBR_COLS = ['ISO3', 'Country', 'Province']
AREA_COLUMNS = ['Country', 'Province']
C = 'Confirmed'
CI = 'Infected'
COLUMNS = ['Date', 'Country', 'Province', 'Confirmed', 'Infected', 'Fatal', 'Recovered']
COUNTRY = 'Country'
DATE = 'Date'
DATE_FORMAT = '%d%b%Y'
DATE_FORMAT_DESC = 'DDMmmYYYY'
DSIFR_COLUMNS = ['Date', 'Susceptible', 'Infected', 'Fatal', 'Recovered']
E = 'Exposed'
END = 'End'
F = 'Fatal'
FIG_COLUMNS = ['Infected', 'Fatal', 'Recovered', 'Fatal or Recovered', 'Vaccinated', 'Exposed', 'Waiting']
FITTED = 'Fitted'
FR = 'Fatal or Recovered'
FUTURE = 'Future'
ID = 'ID'
INITIAL = 'Initial'
ISO3 = 'ISO3'
MAIN = 'Main'
MONO_COLUMNS = ['Confirmed', 'Fatal', 'Recovered']
N = 'Population'
NLOC_COLUMNS = ['Date', 'Confirmed', 'Infected', 'Fatal', 'Recovered']
ODE = 'ODE'
OTHERS = 'Others'
P = '_predicted'
PARAM_DICT = 'param_dict'
PAST = 'Past'
PHASE = 'Phase'
PRODUCT = 'Product'
PROVINCE = 'Province'
R = 'Recovered'
RATE_COLUMNS = ['Fatal per Confirmed', 'Recovered per Confirmed', 'Fatal per (Fatal or Recovered)']
RT = 'Rt'
RT_FULL = 'Reproduction number'
RUNTIME = 'Runtime'
S = 'Susceptible'
SEP = '/'
SERIES = 'Scenario'
START = 'Start'
STEP_N = 'step_n'
STR_COLUMNS = ['Date', 'Country', 'Province']
SUB_COLUMNS = ['Date', 'Confirmed', 'Infected', 'Fatal', 'Recovered', 'Susceptible']
SUFFIX_DICT = {1: 'st', 2: 'nd', 3: 'rd'}
T = 'Elapsed'
TAU = 'tau'
TENSE = 'Type'
TESTS = 'Tests'
TRIALS = 'Trials'
TS = 't'
UNKNOWN = '-'
V = 'Vaccinated'
VAC = 'Vaccinations'
VALUE_COLUMNS = ['Confirmed', 'Infected', 'Fatal', 'Recovered']
V_FULL = 'Vaccinated_full'
V_ONCE = 'Vaccinated_once'
W = 'Waiting'
Y0_DICT = 'y0_dict'
classmethod date_change(date_str, days=0)[source]

Return the date @days days before or after the given date.

Parameters
  • date_str (str) – today

  • days (int) – (negative) days ago or (positive) days later

Returns

the date

Return type

str
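A minimal sketch of this kind of date shifting, assuming the string format is the DATE_FORMAT value listed above ('%d%b%Y'). The helper name shift_date is hypothetical, and the code uses only the standard library rather than covsirphy itself.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%d%b%Y"  # same value as Term.DATE_FORMAT listed above


def shift_date(date_str, days=0):
    """Hypothetical helper: shift a DDMmmYYYY string by a number of days."""
    # %b is locale-dependent; the default C locale gives Jan, Feb, ...
    shifted = datetime.strptime(date_str, DATE_FORMAT) + timedelta(days=days)
    return shifted.strftime(DATE_FORMAT)


print(shift_date("22Jan2020", days=10))
print(shift_date("01Jan2020", days=-1))
```

With this reading, shifts of +1 and -1 days give the next and the previous day.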

date_obj(**kwargs)
classmethod divisors(value)[source]

Return the list of divisors of the value.

Parameters

value (int) – target value

Returns

the list of divisors

Return type

list[int]
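One way to compute such a list is trial division up to the square root of the value. This is an illustrative sketch, not necessarily how covsirphy computes it. The ascending order and the ValueError for non-positive input are assumptions.

```python
def divisors(value):
    """Return all positive divisors of a positive integer in ascending order."""
    if not isinstance(value, int) or value < 1:
        raise ValueError(f"value must be a positive integer, got {value!r}")
    small, large = [], []
    i = 1
    while i * i <= value:
        if value % i == 0:
            small.append(i)
            # Record the paired divisor unless it is the square root itself.
            if i != value // i:
                large.append(value // i)
        i += 1
    return small + large[::-1]


print(divisors(12))
```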

static flatten(nested_list, unique=True)[source]

Flatten the nested list.

Parameters
  • nested_list (list[list[object]]) – nested list

  • unique (bool) – if True, only unique values will remain

Returns

list[object]
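A minimal sketch of flattening one level of nesting. Keeping the first occurrence of each value in its original order when unique=True is an assumption, since the order is not stated above.

```python
from itertools import chain


def flatten(nested_list, unique=True):
    """Flatten one level of nesting, optionally dropping repeated values."""
    flat = list(chain.from_iterable(nested_list))
    if unique:
        # dict.fromkeys keeps first occurrences in order (items must be hashable).
        return list(dict.fromkeys(flat))
    return flat


print(flatten([[1, 2], [2, 3]]))
print(flatten([[1, 2], [2, 3]], unique=False))
```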

static linear(x, a, b)[source]

Linear function f(x) = a x + b.

Parameters
  • x (float) – x values

  • a (float) – the first parameter of the function

  • b (float) – the second parameter of the function

Returns

float

static negative_exp(x, a, b)[source]

Negative exponential function f(x) = a exp(-b x).

Parameters
  • x (float) – x values

  • a (float) – the first parameter of the function

  • b (float) – the second parameter of the function

Returns

float
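The two formulas, f(x) = a x + b and f(x) = a exp(-b x), can be written as plain functions of scalar inputs. This is a sketch of the formulas only. Whether the library versions also accept arrays is not stated here.

```python
import math


def linear(x, a, b):
    """f(x) = a x + b"""
    return a * x + b


def negative_exp(x, a, b):
    """f(x) = a exp(-b x)"""
    return a * math.exp(-b * x)


print(linear(2.0, a=3.0, b=1.0))
print(negative_exp(0.0, a=2.0, b=5.0))
```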

classmethod num2str(num)[source]

Convert numbers to 1st, 2nd etc.

Parameters

num (int) – number

Returns

str
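A sketch of the conversion built on the SUFFIX_DICT value listed above. The function name ordinal and the handling of 11, 12 and 13 (which take "th" in English) are assumptions, not confirmed library behaviour.

```python
SUFFIX_DICT = {1: "st", 2: "nd", 3: "rd"}  # same value as Term.SUFFIX_DICT


def ordinal(num):
    """Hypothetical helper: convert a non-negative integer to 1st, 2nd, ..."""
    if not isinstance(num, int) or num < 0:
        raise ValueError(f"num must be a non-negative integer, got {num!r}")
    # 11, 12 and 13 (and 111, 112, ...) are exceptions to the last-digit rule.
    if num % 100 in (11, 12, 13):
        suffix = "th"
    else:
        suffix = SUFFIX_DICT.get(num % 10, "th")
    return f"{num}{suffix}"


print([ordinal(n) for n in (1, 2, 3, 4, 11, 21)])
```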

classmethod steps(start_date, end_date, tau)[source]

Return the number of days (rounded up).

Parameters
  • start_date (str) – start date, like 01Jan2020

  • end_date (str) – end date, like 01Jan2020

  • tau (int) – tau value [min]
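Since tau is given in minutes, one plausible reading is that the period is divided into tau-sized steps and the count is rounded up, so that tau = 1440 gives the number of days. The sketch below implements that reading with the standard library. The helper name count_steps and the interpretation itself are assumptions.

```python
import math
from datetime import datetime, timedelta

DATE_FORMAT = "%d%b%Y"  # same value as Term.DATE_FORMAT


def count_steps(start_date, end_date, tau):
    """Hypothetical helper: number of tau-minute steps between two dates."""
    start = datetime.strptime(start_date, DATE_FORMAT)
    end = datetime.strptime(end_date, DATE_FORMAT)
    # Dividing two timedelta objects yields a float ratio.
    return math.ceil((end - start) / timedelta(minutes=tau))


print(count_steps("01Jan2020", "02Jan2020", tau=1440))
print(count_steps("01Jan2020", "02Jan2020", tau=1000))
```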

static str2num(string, name='phase names')[source]

Convert 1st to 1 and so on.

Parameters
  • string (str) – like 1st, 2nd, 3rd,…

  • name (str) – name of the string

Returns

int
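The reverse conversion can be sketched with a regular expression. The helper name ordinal_to_int, the use of ValueError and the error message are assumptions. The sketch does not check that the suffix agrees with the number.

```python
import re


def ordinal_to_int(string, name="phase names"):
    """Hypothetical helper: convert strings like 1st, 2nd, 3rd to integers."""
    match = re.fullmatch(r"(\d+)(st|nd|rd|th)", string)
    if match is None:
        raise ValueError(
            f"Examples of {name} are 0th, 1st, 2nd, but {string!r} was applied."
        )
    return int(match.group(1))


print(ordinal_to_int("1st"), ordinal_to_int("22nd"))
```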

classmethod tomorrow(date_str)[source]

Tomorrow of the date.

Parameters

date_str (str) – today

Returns

tomorrow

Return type

str

classmethod yesterday(date_str)[source]

Yesterday of the date.

Parameters

date_str (str) – today

Returns

yesterday

Return type

str

class Trend(**kwargs)[source]

Bases: covsirphy.trend.trend_detector.TrendDetector

Deprecated. Please use TrendDetector class.

class TrendDetector(data, area='Selected area', min_size=7)[source]

Bases: covsirphy.util.term.Term

Interface for trend analysis (change point analysis).

Parameters
  • data (pandas.DataFrame) –

    data to analyse

    Index:

    reset index

    Column:
    • Date(pd.Timestamp): Observation date

    • Confirmed(int): the number of confirmed cases

    • Infected(int): the number of currently infected cases

    • Fatal(int): the number of fatal cases

    • Recovered (int): the number of recovered cases

    • Susceptible(int): the number of susceptible cases

  • area (str) – area name (used in the figure title)

  • min_size (int) – minimum value of phase length [days], over 2

Note

A “phase” is a sequence of consecutive dates during which the parameters of SIR-derived models are fixed. “Change points” are the dates when the trend changed; they are the same as the start dates of the phases, except for the 0th phase.

dates()[source]

Return the list of start dates and end dates.

Returns

list of start dates and end dates

Return type

tuple(list[str], list[str])
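The relation between change points and phase dates described in the class note can be sketched as follows. The helper phase_dates is hypothetical, and it assumes that each phase ends on the day before the next change point.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%d%b%Y"  # same value as Term.DATE_FORMAT


def _to_datetime(date_str):
    return datetime.strptime(date_str, DATE_FORMAT)


def phase_dates(first_date, last_date, change_points):
    """Return (start_dates, end_dates) of the phases split at change points."""
    points = sorted(change_points, key=_to_datetime)
    # The 0th phase starts at the first record; later phases start at change points.
    start_dates = [first_date, *points]
    # Assumption: a phase ends on the day before the next change point.
    end_dates = [
        (_to_datetime(point) - timedelta(days=1)).strftime(DATE_FORMAT)
        for point in points
    ]
    end_dates.append(last_date)
    return start_dates, end_dates


print(phase_dates("01Jan2020", "31Jan2020", ["11Jan2020", "21Jan2020"]))
```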

reset()[source]

Reset the phase setting with the end dates of the records.

Returns

self

Return type

covsirphy.TrendDetector

show(**kwargs)[source]

Show the trend on S-R plane.

Parameters

kwargs – keyword arguments of covsirphy.trend_plot()

sr(algo='Binseg-normal', **kwargs)[source]

Perform S-R trend analysis.

Parameters
  • algo (str) – detection algorithms and models

  • kwargs – the other arguments of algorithm classes (ruptures.Pelt, ruptures.Binseg, ruptures.BottomUp)

Raises

UnExpectedValueError – an unexpected value was applied as the algorithm name

Returns

self

Return type

covsirphy.TrendDetector

Note

Candidates of @algo are “Pelt-rbf”, “Binseg-rbf”, “Binseg-normal”, “BottomUp-rbf”, “BottomUp-normal”. Please refer to documentation of ruptures package. https://centre-borelli.github.io/ruptures-docs/
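The candidate names appear to pair a ruptures search class with a cost model, in the form "{algorithm}-{model}". The sketch below validates and splits such a name. The helper split_algo is hypothetical, and it raises a plain ValueError (the documented base class of UnExpectedValueError) instead of the library exception.

```python
CANDIDATES = (
    "Pelt-rbf", "Binseg-rbf", "Binseg-normal", "BottomUp-rbf", "BottomUp-normal",
)


def split_algo(algo="Binseg-normal"):
    """Hypothetical helper: validate @algo and split it into its two parts."""
    if algo not in CANDIDATES:
        raise ValueError(
            f"algo must be one of {', '.join(CANDIDATES)}, but {algo!r} was applied."
        )
    algorithm, model = algo.split("-")
    return algorithm, model


print(split_algo())
print(split_algo("Pelt-rbf"))
```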

summary(metric=None, metrics='MSE')[source]

Summarize the phases with a dataframe.

Parameters
  • metric (str or None) – metric name or None (use @metrics)

  • metrics (str) – alias of @metric

Returns

Index

(str): phase names

Columns
  • Start (str): start dates

  • End (str): end dates

  • Duration (int): phase duration

  • {metric}_S-R (float): scores on S-R plane with the selected metric

Return type

pandas.DataFrame

Note

Please refer to covsirphy.Evaluator.score() for metric names

class TrendPlot(filename=None, bbox_inches='tight', **kwargs)[source]

Bases: covsirphy.visualization.line_plot.LinePlot

Create line plot with actual values for S-R trend analysis.

Parameters
  • filename (str or None) – filename to save the figure or None (display)

  • bbox_inches (str) – bounding box in inches when creating the figure

  • kwargs – the other arguments of matplotlib.pyplot.savefig()

legend(bbox_to_anchor=(0.5, - 0.5), bbox_loc='lower center', ncol=7, **kwargs)[source]

Set legend.

Parameters
  • bbox_to_anchor (tuple(int or float, int or float)) – distance of legend and plot

  • bbox_loc (str) – location of legend

  • ncol (int or None) – the number of columns that the legend has

  • kwargs – keyword arguments of matplotlib.pyplot.legend()

plot(data, actual_col='Actual')[source]

Plot chronological change of the data with multiple lines.

Parameters
  • data (pandas.DataFrame) –

    data to show

    Index

    x values

    Columns
    • column defined by @actual_col, actual values for y-axis

    • the other columns will be treated as predicted values for y-axis

  • actual_col (str) – column name for y-axis

x_axis(xlabel=None, xlim=(None, None))[source]

Set x axis.

Parameters
  • xlabel (str or None) – x-label

  • xlim (tuple(int or float, int or float)) – limit of x domain

Note

When xlim[0] is None and the lower x-axis limit chosen automatically by matplotlib is below 0, the lower limit will be set to 0.
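The rule in the note can be expressed as a small function. resolve_lower_xlim and its auto_lower argument (the limit matplotlib would have chosen) are hypothetical names used only for this sketch.

```python
def resolve_lower_xlim(xlim, auto_lower):
    """Hypothetical helper: apply the lower-limit rule described in the note."""
    lower = xlim[0]
    if lower is None:
        # An automatically determined limit below 0 is clipped to 0.
        return max(auto_lower, 0)
    # An explicit lower limit is used as given.
    return lower


print(resolve_lower_xlim((None, None), auto_lower=-3.5))
print(resolve_lower_xlim((-10, None), auto_lower=-3.5))
```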

y_axis(ylabel=None)[source]

Set y axis.

Parameters

ylabel (str or None) – y-label

exception UnExecutedError(method_name, message=None)[source]

Bases: AttributeError, NameError, ValueError

Error when we have unexecuted methods that we need to run in advance.

Parameters
  • method_name (str) – method name to run in advance

  • message (str or None) – the other messages

exception UnExpectedReturnValueError(name, value, plural=False, message=None)[source]

Bases: ValueError

Error when unexpected value was returned.

Parameters
  • name (str) – argument name

  • value (object) – value user applied or None (will not be shown)

  • plural (bool) – whether plural or not

  • message (str or None) – the other messages

exception UnExpectedValueError(name, value, candidates, message=None)[source]

Bases: ValueError

Error when unexpected value was applied as the value of an argument.

Parameters
  • name (str) – argument name

  • value (object) – value user applied

  • candidates (list[object]) – candidates of the argument

  • message (str or None) – the other messages

class VaccineData(filename, force=False, verbose=1)[source]

Bases: covsirphy.cleaning.cbase.CleaningBase

Dataset regarding vaccination retrieved from “Our World In Data”. https://github.com/owid/covid-19-data/tree/master/public/data https://ourworldindata.org/coronavirus

Parameters
  • filename (str or pathlib.Path) – CSV filename to save the raw dataset

  • force (bool) – if True, always download the dataset from the server

  • verbose (int) – level of verbosity

Note

Columns of VaccineData.cleaned():
  • Date (pandas.Timestamp): observation dates

  • Country (pandas.Category): country (or province) names

  • ISO3 (pandas.Category): ISO3 codes

  • Product (pandas.Category): product names

  • Vaccinations (int): cumulative number of vaccinations

  • Vaccinated_once (int): cumulative number of people who received at least one vaccine dose

  • Vaccinated_full (int): cumulative number of people who received all doses prescribed by the protocol

URL = 'https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/'
URL_LOC = 'https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/locations.csv'
URL_REC = 'https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv'
VAC_COLS = ['Date', 'Country', 'ISO3', 'Product', 'Vaccinations', 'Vaccinated_once', 'Vaccinated_full']
VAC_SUBSET_COLS = ['Date', 'Vaccinations', 'Vaccinated_once', 'Vaccinated_full']
map(country=None, variable='Vaccinations', date=None, **kwargs)[source]

Create colored map with the number of vaccinations.

Parameters
  • country (None) – always None

  • variable (str) – variable to show

  • date (str or None) – date of the records or None (the last value)

  • kwargs – arguments of ColoredMap() and ColoredMap.plot()

Raises

NotImplementedError – @country was specified

records(country, product=None, start_date=None, end_date=None)[source]

Return subset of the country/province and start/end date.

Parameters
  • country (str or None) – country name or ISO3 code

  • product (str or None) – product name

  • start_date (str or None) – start date, like 22Jan2020

  • end_date (str or None) – end date, like 01Feb2020

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pandas.Timestamp): observation date

  • Vaccinations (int): the number of vaccinations

  • Vaccinated_once (int): cumulative number of people who received at least one vaccine dose

  • Vaccinated_full (int): cumulative number of people who received all doses prescribed by the protocol
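The date selection can be pictured with a standard-library sketch. The helper filter_by_date is hypothetical, the rows are made-up values for illustration, and treating both bounds as inclusive with None meaning "no bound" is an assumption.

```python
from datetime import datetime

DATE_FORMAT = "%d%b%Y"  # same value as Term.DATE_FORMAT


def filter_by_date(rows, start_date=None, end_date=None):
    """Hypothetical helper: keep rows whose "Date" lies in [start_date, end_date]."""
    start = None if start_date is None else datetime.strptime(start_date, DATE_FORMAT)
    end = None if end_date is None else datetime.strptime(end_date, DATE_FORMAT)
    return [
        row for row in rows
        if (start is None or row["Date"] >= start)
        and (end is None or row["Date"] <= end)
    ]


# Made-up records for illustration only.
rows = [
    {"Date": datetime(2021, 1, 31), "Vaccinations": 10},
    {"Date": datetime(2021, 2, 1), "Vaccinations": 25},
    {"Date": datetime(2021, 2, 2), "Vaccinations": 40},
]
selected = filter_by_date(rows, start_date="01Feb2021")
print([row["Vaccinations"] for row in selected])
```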

subset(country, product=None, start_date=None, end_date=None)[source]

Return subset of the country/province and start/end date.

Parameters
  • country (str or None) – country name or ISO3 code

  • product (str or None) – product name

  • start_date (str or None) – start date, like 22Jan2020

  • end_date (str or None) – end date, like 01Feb2020

Returns

pandas.DataFrame
Index

reset index

Columns
  • Date (pandas.Timestamp): observation date

  • Vaccinations (int): the number of vaccinations

  • Vaccinated_once (int): cumulative number of people who received at least one vaccine dose

  • Vaccinated_full (int): cumulative number of people who received all doses prescribed by the protocol

total()[source]

Calculate total values of the cleaned dataset.

Returns

Index

reset index

Columns
  • Date (pandas.Timestamp): observation date

  • Vaccinations (int): the number of vaccinations

  • Vaccinated_once (int): cumulative number of people who received at least one vaccine dose

  • Vaccinated_full (int): cumulative number of people who received all doses prescribed by the protocol

Return type

pandas.DataFrame

class VisualizeBase(filename=None, bbox_inches='tight', **kwargs)[source]

Bases: covsirphy.util.term.Term

Base class for visualization.

Parameters
  • filename (str or None) – filename to save the figure or None (display)

  • bbox_inches (str) – bounding box in inches when creating the figure

  • kwargs – the other arguments of matplotlib.pyplot.savefig

property ax

axis

Type

matplotlib.axis

legend(bbox_to_anchor=(0.5, - 0.2), bbox_loc='lower center', ncol=None, **kwargs)[source]

Set legend.

Parameters
  • bbox_to_anchor (tuple(int or float, int or float)) – distance of legend and plot

  • bbox_loc (str) – location of legend

  • ncol (int or None) – the number of columns that the legend has

  • kwargs – keyword arguments of matplotlib.pyplot.legend()

legend_hide()[source]

Hide legend.

plot()[source]

Method for plotting. This will be defined in child classes.

Raises

NotImplementedError – not implemented

tick_params(**kwargs)[source]

Call matplotlib.pyplot.tick_params directly to change the appearance of ticks, tick labels and gridlines.

Parameters

kwargs – arguments of matplotlib.pyplot.tick_params

property title

title of the figure

Type

str

class Word(**kwargs)[source]

Bases: covsirphy.util.term.Term

bar_plot(df, title=None, filename=None, show_legend=True, **kwargs)[source]

Wrapper function: show chronological change of the data.

Parameters
  • df (pandas.DataFrame or pandas.Series) –

    data to show

    Index

    Date (pandas.Timestamp)

    Columns

    variables to show

  • title (str) – title of the figure

  • filename (str or None) – filename to save the figure or None (display)

  • show_legend (bool) – whether show legend or not

  • kwargs – keyword arguments of the following classes and methods:

    • covsirphy.BarPlot() and its methods

    • matplotlib.pyplot.savefig(), matplotlib.pyplot.legend()

    • pandas.DataFrame.plot()

compare_plot(df, variables, groups, filename=None, **kwargs)[source]

Wrapper function: show chronological change of the data.

Parameters
  • df (pandas.DataFrame) –

    data to show

    Index

    x values

    Columns

    y variables to show, “{variable}_{group}” for all combinations of variables and groups

  • variables (list[str]) – variables to compare

  • groups (list[str]) – the first group name and the second group name

  • filename (str or None) – filename to save the figure or None (display)

  • kwargs – keyword arguments of the following classes and methods:

    • matplotlib.pyplot.savefig()

    • matplotlib.pyplot.legend()
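The column naming rule “{variable}_{group}” can be sketched as below. The helper expected_columns and the group names in the usage line are hypothetical and serve only to show which columns the dataframe is expected to hold.

```python
from itertools import product


def expected_columns(variables, groups):
    """Hypothetical helper: list "{variable}_{group}" for all combinations."""
    return [f"{variable}_{group}" for variable, group in product(variables, groups)]


print(expected_columns(["Infected", "Fatal"], ["actual", "predicted"]))
```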

deprecate(old, new=None, version=None)[source]

Decorator to raise deprecation warning.

Parameters
  • old (str) – description of the old method/function

  • new (str or None) – description of the new method/function

  • version (str or None) – version number, like 2.7.3-alpha
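A decorator with these parameters could look roughly like the following sketch, which issues a DeprecationWarning through the standard warnings module. The name deprecate_sketch and the message wording are assumptions, and the library's actual warning category may differ.

```python
import functools
import warnings


def deprecate_sketch(old, new=None, version=None):
    """Hypothetical decorator: warn that the decorated function is deprecated."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            since = "" if version is None else f", version >= {version}"
            if new is None:
                text = f"{old} was deprecated{since}."
            else:
                text = f"Please use {new} rather than {old}{since}."
            # stacklevel=2 attributes the warning to the caller's line.
            warnings.warn(text, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecate_sketch(old="old_sum()", new="sum()", version="2.7.3-alpha")
def old_sum(values):
    return sum(values)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_sum([1, 2, 3])
print(result, caught[0].category.__name__)
```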

find_args(func_list, **kwargs)[source]

Find values of enabled arguments of the function from the keyword arguments.

Parameters
  • func_list (list[function] or function) – target function

  • kwargs – keyword arguments

Returns

dictionary of enabled arguments

Return type

dict
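Filtering keyword arguments by what a function accepts can be sketched with inspect.signature from the standard library. find_args_sketch, draw and style are hypothetical names, and the library may implement the selection differently.

```python
import inspect


def find_args_sketch(func_list, **kwargs):
    """Hypothetical helper: keep only kwargs named in the functions' signatures."""
    if not isinstance(func_list, list):
        func_list = [func_list]
    enabled = set()
    for func in func_list:
        for param in inspect.signature(func).parameters.values():
            # Skip *args and **kwargs placeholders; they are not argument names.
            if param.kind not in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                enabled.add(param.name)
    return {key: value for key, value in kwargs.items() if key in enabled}


def draw(title=None, filename=None):
    return title, filename


def style(color="black", linestyle=":"):
    return color, linestyle


print(find_args_sketch([draw, style], title="A", color="red", dpi=300))
```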

jpn_map(*args, **kwargs)
line_plot(df, title=None, filename=None, show_legend=True, **kwargs)[source]

Wrapper function: show chronological change of the data.

Parameters
  • df (pandas.DataFrame or pandas.Series) –

    data to show

    Index

    Date (pandas.Timestamp)

    Columns

    variables to show

  • title (str) – title of the figure

  • filename (str or None) – filename to save the figure or None (display)

  • show_legend (bool) – whether show legend or not

  • kwargs – keyword arguments of the following classes and methods:

    • covsirphy.LinePlot() and its methods

    • matplotlib.pyplot.savefig(), matplotlib.pyplot.legend()

    • pandas.DataFrame.plot()

line_plot_multiple(*args, **kwargs)
save_dataframe(*args, **kwargs)
scatter_plot(df, title=None, filename=None, **kwargs)[source]

Wrapper function: show the data with a scatter plot.

Parameters
  • df (pandas.DataFrame) –

    data to show

    Index

    reset index

    Columns
    • x (int or float): x values

    • y (int or float): y values

  • title (str) – title of the figure

  • filename (str or None) – filename to save the figure or None (display)

  • kwargs – keyword arguments of the following classes and methods:

    • covsirphy.ScatterPlot() and its methods

    • matplotlib.pyplot.savefig(), matplotlib.pyplot.legend()

    • pandas.DataFrame.plot()

trend_plot(df, title=None, filename=None, show_legend=True, **kwargs)[source]

Wrapper function: show chronological change of the data.

Parameters
  • df (pandas.DataFrame) –

    data to show

    Index

    x values

    Columns
    • column defined by @actual_col, actual values for y-axis

    • columns defined by @predicted_cols, predicted values for y-axis

  • actual_col (str) – column name for y-axis

  • predicted_cols (list[str]) – list of columns which have predicted values

  • title (str) – title of the figure

  • filename (str or None) – filename to save the figure or None (display)

  • show_legend (bool) – whether show legend or not

  • kwargs – keyword arguments of the following classes and methods:

    • covsirphy.TrendPlot() and its methods

    • matplotlib.pyplot.savefig() and matplotlib.pyplot.legend()

Subpackages