covsirphy.science package

covsirphy.science.ml module

class MLEngineer(seed=0, **kwargs)

Bases: Term

Class for machine learning and preprocessing.

Parameters:

seed (int or None) – random seed

forecast(Y, days, X=None, **kwargs)

Forecast Y for the given number of days, with or without indicators (X).

Parameters:
  • Y (pandas.DataFrame) –

    Index

    pandas.Timestamp: Observation date

    Columns

    observed and the target variables (int or float)

  • X (pandas.DataFrame or None) –

    indicators for regression or None (no indicators)

    Index

    pandas.Timestamp: Observation date

    Columns

    observed and the target variables (int or float)

  • days (int) – days to predict

  • **kwargs – keyword arguments of autots.AutoTS() except for verbose and forecast_length (forecast_length is always the same as @days)

Returns:

pandas.DataFrame

Index

pandas.Timestamp: Observation date, from the day after the last date of Y.index to the last predicted date

Columns

observed and the target variables (int or float)

Note

AutoTS package is developed at https://github.com/winedarksea/AutoTS
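To illustrate the documented input and output layout, the sketch below builds a small Y frame with hypothetical values and derives the date index that the returned frame is documented to have: it starts the day after the last date of Y.index and spans @days dates. It does not call AutoTS, and it assumes daily observations.

```python
import pandas as pd

# Hypothetical daily observations of a single target variable
Y = pd.DataFrame(
    {"Rt": [1.20, 1.15, 1.10]},
    index=pd.date_range("2022-01-01", periods=3, freq="D"),
)
days = 5

# Expected index of the forecast: from the day after the last observation
# date to the last predicted date (assumes daily frequency)
forecast_index = pd.date_range(
    Y.index.max() + pd.Timedelta(days=1), periods=days, freq="D"
)
print(forecast_index[0].date(), forecast_index[-1].date())
```

With covsirphy installed, the documented call would be MLEngineer(seed=0).forecast(Y, days=days), with the class taken from covsirphy.science.ml.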

pca(X, n_components=0.95)

Perform PCA (principal component analysis) after standardization (Z-score normalization) with the pca package.

Parameters:
  • X (pandas.DataFrame or None) –

    Index

    pandas.Timestamp: Observation date

    Columns

    (int or float): observed values of the training vectors

  • n_components (float or int) – the number of principal components (int) or the minimum percentage of variance to cover (float)

Returns:

dict of {str: object}: the same as the output of pca.pca().fit_transform()

  • {“loadings”: pandas.DataFrame}: structured dataframe containing loadings for PCs

  • {“PC”: pandas.DataFrame}: reduced dimensionality space, the Principal Components (PCs)

    Index

    pandas.Timestamp

    Columns

    PC1, PC2,…

  • {“explained_var”: array-like}: explained variance for each of the PCs (same ordering as the PCs)

  • {“variance_ratio”: array-like}: variance ratio

  • {“model”: object}: fitted model for further use

  • {“scaler”: object}: scaler model

  • {“pcp”: int}: pcp

  • {“topfeat”: pandas.DataFrame}: top features

    Index

    reset index

    Columns

    PC (str): PC1, PC2,…
    feature (str): feature name of X
    loading (float): loading values
    type (str): “best” or “weak”

  • {“outliers”: pandas.DataFrame}: outliers

    Index

    pandas.Timestamp

    Columns

    y_proba (float)
    y_score (float)
    y_bool (bool)
    y_bool_spe (bool)
    y_score_spe (float)

  • {“outlier_params”: object}: parameter values of the model used to find outliers

Note

Regarding the pca package, please refer to https://github.com/erdogant/pca
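The procedure described for pca() (standardize each column, then project onto principal components, keeping either a fixed number of PCs or enough PCs to cover a variance fraction) can be sketched with NumPy alone. This is a swapped-in SVD implementation for illustration, not the pca package: it returns only a subset of the documented keys, and it assumes a float n_components in (0, 1) means the minimum fraction of variance to cover.

```python
import numpy as np


def standardized_pca(X, n_components=0.95):
    """Z-score each column, then run PCA via SVD (NumPy sketch, not the pca package)."""
    X = np.asarray(X, dtype=float)
    # Z-score normalization; assumes no column is constant (std > 0)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained_var = S ** 2 / (Z.shape[0] - 1)
    variance_ratio = explained_var / explained_var.sum()
    if isinstance(n_components, float) and 0 < n_components < 1:
        # Smallest number of PCs whose cumulative ratio reaches the target
        k = int(np.searchsorted(np.cumsum(variance_ratio), n_components)) + 1
    else:
        k = int(n_components)
    k = min(k, len(variance_ratio))
    return {
        "PC": Z @ Vt[:k].T,  # scores, one column per PC
        "loadings": Vt[:k],  # one row per PC, one column per feature
        "explained_var": explained_var[:k],
        "variance_ratio": variance_ratio[:k],
    }


# Two perfectly correlated features: a single PC explains all the variance
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
result = standardized_pca(X)
print(result["PC"].shape, result["variance_ratio"])
```

Standardizing first keeps features with large units from dominating the components, which is why the documented method performs Z-score normalization before PCA.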

covsirphy.science.ode_scenario module

class ODEScenario(data, location_name, complement=True)

Bases: Term

Perform scenario analysis, changing ODE parameters.

Parameters:
  • data (pandas.DataFrame) –

    actual data of the number of cases

    Index

    Date (pandas.Timestamp): observation dates

    Columns

    Population (int): total population
    Confirmed (int): the number of confirmed cases
    Recovered (int): the number of recovered cases, must be greater than 0
    Fatal (int): the number of fatal cases
    Susceptible (int): the number of susceptible cases, ignored because it will be overwritten
    Infected (int): the number of currently infected cases, ignored because it will be overwritten
    Other columns will be ignored.

  • location_name (str) – name to identify the location to show in figure titles

  • complement (bool) – whether to perform data complement with covsirphy.DataEngineer().subset(complement=True) or not

Note

Data cleaning is performed automatically with covsirphy.DataEngineer().clean().
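The documented layout of @data can be checked before creating an instance. The helper below is hypothetical and is not part of covsirphy: it verifies the required columns and the date index, drops Susceptible, Infected and any other columns (documented as ignored), and keeps only rows with Recovered > 0, assuming such rows are simply dropped, as the auto_build note about Recovered > 0 suggests.

```python
import pandas as pd

REQUIRED = ["Population", "Confirmed", "Recovered", "Fatal"]


def check_scenario_data(df):
    """Hypothetical pre-check mirroring the documented `data` layout of ODEScenario."""
    missing = [col for col in REQUIRED if col not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("index must hold observation dates (pandas.Timestamp)")
    # Susceptible/Infected are overwritten and other columns are ignored,
    # so only the required columns are kept
    out = df[REQUIRED].copy()
    # Assumption: records with Recovered == 0 are dropped
    return out.loc[out["Recovered"] > 0]


data = pd.DataFrame(
    {
        "Population": [1_000_000] * 3,
        "Confirmed": [10, 25, 40],
        "Recovered": [0, 5, 12],
        "Fatal": [0, 1, 1],
        "Memo": ["a", "b", "c"],  # an extra column, ignored
    },
    index=pd.date_range("2022-01-01", periods=3, freq="D", name="Date"),
)
checked = check_scenario_data(data)
print(checked)
```

The checked frame could then be passed as ODEScenario(data=checked, location_name="...", complement=True), where covsirphy performs its own cleaning and complement.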

append(name=None, end=None, **kwargs)

Append a new phase, specifying ODE parameter values.

Parameters:
  • name (str or list[str] or None) – scenario name(s) or None (all scenarios)

  • end (pandas.Timestamp or int or None) – end date, the number of days of the new phase, or None (the max date of all scenarios and actual data)

  • **kwargs – keyword arguments of ODE parameter values (default: values of the last phase)

Raises:
Returns:

covsirphy.ODEScenario – self
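The **kwargs rule of append() (unspecified ODE parameters default to the values of the last phase) amounts to a dictionary merge. The helper below is a hypothetical sketch of that rule only; it ignores @name and @end, and rejecting parameter names absent from the last phase is an assumption of this sketch, not documented behavior.

```python
def next_phase_params(last_phase, **kwargs):
    """Hypothetical helper: ODE parameter values of a newly appended phase."""
    unknown = set(kwargs) - set(last_phase)
    if unknown:
        # Assumption for this sketch: only parameters of the last phase may be set
        raise KeyError(f"unknown ODE parameters: {sorted(unknown)}")
    # Keyword arguments override; everything else is carried over from the last phase
    return {**last_phase, **kwargs}


last_phase = {"rho": 0.2, "sigma": 0.075}  # illustrative values
print(next_phase_params(last_phase, rho=0.1))
```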

classmethod auto_build(geo, model, complement=True)

Prepare cleaned and subset data from the recommended dataset, create an instance, and build the baseline scenario.

Parameters:
  • geo (tuple(list[str] or tuple(str) or str) or str or None) – country, province, city

  • model (covsirphy.ODEModel) – definition of ODE model

  • complement (bool) – whether to perform complement or not

Raises:

SubsetNotFoundError – actual data of the location was not included in the recommended dataset

Returns:

covsirphy.ODEScenario – created instance

Note

geo=None means total values of all countries.

Note

geo=“Japan” and geo=(“Japan”,) mean country level data of Japan, as an example.

Note

geo=(“Japan”, “Tokyo”) means prefecture (province) level data of Tokyo/Japan, as an example.

Note

geo=(“USA”, “Alabama”, “Baldwin”) means city level data of Baldwin/Alabama/USA, as an example.

Note

Data (complemented if @complement is True) with Recovered > 0 will be analyzed.
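Following the (country, province, city) order of @geo, the level a value refers to depends only on how many names it contains. The helper below is hypothetical and only interprets plain strings and tuples of strings; the documented option of a list of names inside the tuple is not handled.

```python
def geo_level(geo):
    """Hypothetical helper: the geographic level a @geo value refers to."""
    if geo is None:
        return "total of all countries"
    # A bare string is treated like a one-element tuple
    parts = (geo,) if isinstance(geo, str) else tuple(geo)
    levels = {1: "country", 2: "province", 3: "city"}
    if len(parts) not in levels:
        raise ValueError("geo must have one to three elements")
    return levels[len(parts)]


for value in [None, "Japan", ("Japan",), ("Japan", "Tokyo"), ("USA", "Alabama", "Baldwin")]:
    print(value, "->", geo_level(value))
```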

build_with_dynamics(name, dynamics)

Build a scenario with a covsirphy.Dynamics() instance.

Parameters:
  • name (str) – scenario name

  • dynamics (covsirphy.Dynamics) – covsirphy.Dynamics() instance which has ODE model, tau value and ODE parameter values

Returns:

covsirphy.ODEScenario – self

build_with_model(name, model, date_range=None, tau=None)

Build a scenario with a covsirphy.Dynamics() instance created automatically from the actual data.

Parameters:
  • name (str) – scenario name

  • model (covsirphy.ODEModel) – definition of ODE model

  • date_range (tuple of (str, str) or None) – start date and end date of dynamics to analyze

  • tau (int or None) – tau value [min] or None (set later with data)

Returns:

covsirphy.ODEScenario – self

build_with_template(name, template)

Build a scenario with a template scenario.

Parameters:
  • name (str) – new scenario name

  • template (str) – template name

Raises:

SubsetNotFoundError – the scenario with the name is not registered

Returns:

covsirphy.ODEScenario – self

compare_cases(variable, date_range=None, ref=None, display=True, **kwargs)

Compare the number of cases of scenarios.

Parameters:
  • variable (str) – variable name or alias

  • date_range (tuple of (str, str) or None) – start date and end date to analyze

  • ref (str or None) – name of reference scenario to specify phases and dates or None (the first scenario)

  • display (bool) – whether to display the figure of the result or not

  • **kwargs – keyword arguments of covsirphy.line_plot() except for @df

Returns:

pandas.DataFrame

Index

Date (pandas.Timestamp): dates

Columns

Actual (numpy.int64): actual records
{scenario name} (numpy.int64): values of the scenario
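The documented return layout of compare_cases() (a Date index, an Actual column and one column per scenario) can be reproduced with pandas to see what downstream code receives. The values and scenario names below are hypothetical, and this is not how covsirphy builds the frame internally.

```python
import pandas as pd

dates = pd.date_range("2022-01-01", periods=4, freq="D", name="Date")
# Hypothetical Confirmed values of the actual records and two scenarios
actual = pd.Series([100, 120, 150, 190], index=dates)
scenarios = {
    "Baseline": pd.Series([100, 121, 149, 188], index=dates),
    "Lockdown": pd.Series([100, 118, 140, 165], index=dates),
}

# Documented layout: Date index, "Actual" column, then one column per scenario
compared = pd.concat({"Actual": actual, **scenarios}, axis=1)
# Restrict to a date range (start date and end date are both included)
compared = compared.loc["2022-01-02":"2022-01-03"]
print(compared)
```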

compare_param(param, date_range=None, ref=None, display=True, **kwargs)

Compare the parameter values of scenarios.

Parameters:
  • param (str) – one of the ODE parameters, “Rt”, or a dimensional parameter

  • date_range (tuple of (str, str) or None) – start date and end date to analyze

  • ref (str or None) – name of reference scenario to specify phases and dates or None (the first scenario)

  • display (bool) – whether to display the figure of the result or not

  • **kwargs – keyword arguments of covsirphy.line_plot() except for @df

Returns:

pandas.DataFrame

Index

Date (pandas.Timestamp)

Columns

{scenario name} (str): values of the scenario

delete(pattern, exact=False)

Delete scenario(s).

Parameters:
  • pattern (str) – scenario name or pattern to search

  • exact (bool) – if False, use regular expressions

Returns:

covsirphy.ODEScenario – self
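The @exact switch of delete() decides between exact name matching and regular expressions. The selector below is hypothetical and only shows which names each mode would target; it assumes a regular expression may match anywhere in the name (re.search), which the documentation does not state.

```python
import re


def matching_names(names, pattern, exact=False):
    """Hypothetical selector: which scenario names a delete() call would target."""
    if exact:
        return [name for name in names if name == pattern]
    # Assumption: regular expressions are matched anywhere in the name (re.search)
    return [name for name in names if re.search(pattern, name)]


names = ["Baseline", "Lockdown", "Lockdown_2", "Main"]
print(matching_names(names, "Lockdown"))              # regex: both Lockdown scenarios
print(matching_names(names, "Lockdown", exact=True))  # exact: only "Lockdown"
```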

describe()

Describe representative values.

Returns:

pandas.DataFrame

Index

str: scenario name

Columns
  • max(Infected) (numpy.int64): max value of Infected

  • argmax(Infected) (pandas.Timestamp): the date when Infected shows max value

  • Confirmed({date}) (numpy.int64): Confirmed on the last date

  • Infected({date}) (numpy.int64): Infected on the last date

  • Fatal({date}) (numpy.int64): Fatal on the last date
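Each row returned by describe() condenses one scenario's simulated records into the values listed above. The sketch below computes them with pandas for a single hypothetical scenario named "Main"; the date label format inside the column names is chosen for illustration and may differ from covsirphy's.

```python
import pandas as pd

# Hypothetical simulated records of a scenario named "Main"
sim = pd.DataFrame(
    {
        "Confirmed": [100, 180, 260, 300],
        "Infected": [80, 120, 90, 40],
        "Fatal": [1, 3, 6, 8],
    },
    index=pd.date_range("2022-01-01", periods=4, freq="D", name="Date"),
)
last_date = sim.index[-1]
# Date label format is chosen for illustration only
label = last_date.strftime("%Y-%m-%d")
row = {
    "max(Infected)": sim["Infected"].max(),
    "argmax(Infected)": sim["Infected"].idxmax(),  # date of the peak
    f"Confirmed({label})": sim.loc[last_date, "Confirmed"],
    f"Infected({label})": sim.loc[last_date, "Infected"],
    f"Fatal({label})": sim.loc[last_date, "Fatal"],
}
described = pd.DataFrame([row], index=["Main"])
print(described)
```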

classmethod from_json(filename)

Create an ODEScenario instance from a JSON file.

Parameters:

filename (str or Path) – JSON filename

Returns:

covsirphy.ODEScenario – created instance

predict(days, name, seed=0, verbose=1, X=None, **kwargs)

Create scenarios and append a phase, performing ODE parameter prediction for the given number of days.

Parameters:
  • days (int) – days to predict

  • name (str) – scenario name

  • X (pandas.DataFrame or None) –

    information for regression or None (no information)

    Index

    pandas.Timestamp: Observation date

    Columns

    observed and the target variables (int or float)

  • seed (int or None) – random seed

  • verbose (int) – verbosity

  • **kwargs – keyword arguments of autots.AutoTS() except for verbose and forecast_length (forecast_length is always the same as @days)

Returns:

covsirphy.ODEScenario – self

Note

AutoTS package is developed at https://github.com/winedarksea/AutoTS

Note

Phases are determined with the reproduction number rounded to one decimal place.
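As an illustration of phase determination by rounded reproduction number, the sketch below groups consecutive dates into one phase while round(Rt, 1) stays the same and starts a new phase when it changes. That a new phase begins exactly when the rounded value differs from the previous day's is an assumption about how the rounding is applied; note also that Python's round() works on the binary float value, so exact halves may round differently from other libraries.

```python
from datetime import date, timedelta


def split_phases(dates, rt_values):
    """Group consecutive dates into phases; a new phase starts when round(Rt, 1) changes."""
    phases = []
    for day, rt in zip(dates, rt_values):
        key = round(rt, 1)
        if phases and phases[-1]["Rt"] == key:
            phases[-1]["End"] = day  # extend the current phase
        else:
            phases.append({"Start": day, "End": day, "Rt": key})
    return phases


start = date(2022, 1, 1)
dates = [start + timedelta(days=i) for i in range(6)]
rt = [1.12, 1.08, 1.14, 0.93, 0.91, 1.02]  # hypothetical predicted Rt
for phase in split_phases(dates, rt):
    print(phase)
```

Rounding keeps small day-to-day fluctuations of the predicted Rt from creating a new phase every day.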

rename(old, new)

Rename the given scenario with a new name.

Parameters:
  • old (str) – old name

  • new (str) – new name

Returns:

covsirphy.ODEScenario – self

represent(q, variable, date=None, included=None, excluded=None)

Return the names of representative scenarios using quantiles of the variable on the date.

Parameters:
  • q (list[float] or float) – quantiles

  • variable (str) – reference variable, Confirmed, Infected, Fatal or Recovered

  • date (str or None) – reference date or None (the last end date of all scenarios)

  • included (list[str] or None) – included scenarios or None (all included)

  • excluded (list[str] or None) – excluded scenarios or None (no scenarios excluded)

Raises:

ValueError – the end dates of the last phase are not aligned

Returns:

list[str] or str – the names of the scenarios whose values are nearest to the given quantiles

Note

The dimension of the returned object corresponds to the type of @q.
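The quantile-based selection can be sketched with pandas: compute each quantile of the variable across scenarios on the reference date, then pick the scenario whose value is nearest. This is a hypothetical sketch, not covsirphy's implementation; it assumes pandas' default linear interpolation for quantiles and omits @included, @excluded and the date alignment check.

```python
import pandas as pd


def representative(values, q):
    """Hypothetical sketch: names of scenarios nearest to the given quantiles."""
    values = pd.Series(values, dtype=float)
    qs = q if isinstance(q, list) else [q]
    names = []
    for quantile in qs:
        target = values.quantile(quantile)  # linear interpolation (pandas default)
        names.append((values - target).abs().idxmin())
    # Mirror the documented behaviour: a list for a list of q, a single name otherwise
    return names if isinstance(q, list) else names[0]


# Hypothetical Confirmed values on the reference date
confirmed = {"Low": 1000, "MidLow": 1400, "Main": 1500, "High": 2600, "Extreme": 4000}
print(representative(confirmed, [0.1, 0.5, 0.9]))
print(representative(confirmed, 0.5))
```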

simulate(name=None, variables=None, display=True, **kwargs)

Perform simulation with phase-dependent ODE model.

Parameters:
  • name (str or None) – scenario name registered or None (actual data)

  • variables (list[str] or None) – variables/aliases to return or None ([“Confirmed”, “Fatal”, “Recovered”])

  • display (bool) – whether to display the figure of the result or not

  • **kwargs – keyword arguments of covsirphy.line_plot() except for @df

Returns:

pandas.DataFrame or pandas.Series

Index

Date (pandas.Timestamp): dates

Columns

Population (int): total population (if selected with @variables)
Confirmed (int): the number of confirmed cases (if selected with @variables)
Recovered (int): the number of recovered cases (if selected with @variables)
Fatal (int): the number of fatal cases (if selected with @variables)
Susceptible (int): the number of susceptible cases (if selected with @variables)
Infected (int): the number of currently infected cases (if selected with @variables)

summary()

Summarize phase information of all scenarios.

Returns:

pandas.DataFrame

Index

Scenario (str): scenario names
Phase (str): phase names, 0th, 1st,…

Columns

Start (pandas.Timestamp): start date of the phase
End (pandas.Timestamp): end date of the phase
ODE (str): ODE model name
Rt (float): phase-dependent reproduction number (if parameters are available)
(float): parameter values, including rho (if available)
tau (int): tau value [min]
(int or float): dimensional parameters, including 1/beta [days] (if tau and parameters are available)
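The conversion from a non-dimensional parameter to a dimensional one such as 1/beta [days] is not spelled out here. Assuming tau is the length of one time step in minutes and the parameter is a rate per tau step, 1/beta [days] equals tau / (24 * 60) / beta. The helper below is a hypothetical sketch of that assumed formula; covsirphy may round or format the result differently.

```python
MINUTES_PER_DAY = 24 * 60


def days_per_unit_rate(tau, value):
    """Hypothetical conversion: 1/value [day] for a non-dimensional rate per tau-minute step."""
    if value <= 0:
        raise ValueError("parameter value must be positive")
    # tau [min] / 1440 gives the step length in days; dividing by the rate gives days
    return tau / MINUTES_PER_DAY / value


# tau = 1440 min (one day per step) and beta = 0.25 give 1/beta = 4 days
print(days_per_unit_rate(1440, 0.25))
```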

to_dynamics(name)

Create a covsirphy.Dynamics instance of the scenario.

Parameters:

name (str) – scenario name

Raises:

SubsetNotFoundError – the scenario with the name is not registered

Returns:

covsirphy.Dynamics – instance which has ODE model, tau value and ODE parameter values

to_json(filename)

Write a JSON file which can be used to recreate an ODEScenario instance with .from_json().

Parameters:

filename (str or Path) – JSON filename

Returns:

str – filename

track()

Track the reproduction number, parameter values and dimensional parameter values.

Returns:

pandas.DataFrame

Index

reset index

Columns

Scenario (str): scenario names
Phase (str): phase names
Date (pandas.Timestamp): dates
Rt (float): phase-dependent reproduction number (if parameters are available)
(float): parameter values, including rho (if available)
(int or float): dimensional parameters, including 1/beta [days] (if tau and parameters are available)