covsirphy.science package

Submodules

covsirphy.science.ml module

class MLEngineer(seed=0, **kwargs)[source]

Bases: Term

Class for machine learning and preprocessing.

Parameters: seed (int or None) – random seed

forecast(Y, days, X=None, **kwargs)[source]

Forecast Y for given days with/without indicators (X).

Parameters:

Y (pandas.DataFrame) –

Index
pandas.Timestamp: Observation date

Columns
observed and the target variables (int or float)
X (pandas.DataFrame or None) – indicators for regression or None (no indicators)

Index
pandas.Timestamp: Observation date

Columns
observed and the target variables (int or float)
days (int) – days to predict
**kwargs – keyword arguments of autots.AutoTS() except for verbose, forecast_length (always the same as @days)

Returns:

pandas.DataFrame –

Index: pandas.Timestamp: Observation date, from the next date of Y.index to the last predicted date
Columns: observed and the target variables (int or float)

Note

AutoTS package is developed at https://github.com/winedarksea/AutoTS
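The index of the returned dataframe can be illustrated with the standard library alone. The sketch below assumes daily observations; `forecast_dates` is a hypothetical helper for illustration, not part of covsirphy or AutoTS.

```python
from datetime import date, timedelta


def forecast_dates(last_observed, days):
    """Return the dates that a forecast of `days` days covers (hypothetical helper)."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    # The first forecast date is the day after the last observation in Y.index.
    return [last_observed + timedelta(days=n) for n in range(1, days + 1)]


dates = forecast_dates(date(2021, 1, 31), days=3)
print(dates[0], dates[-1])
```

With the last observation on 31 January and days=3, the forecast index runs from 1 February to 3 February.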

pca(X, n_components=0.95)[source]

Perform PCA (principal component analysis) after standardization (Z-score normalization) with pca package.

Parameters:

X (pandas.DataFrame or None) –

Index
pandas.Timestamp: Observation date

Columns
(int or float): observed values of the training vectors
n_components (float or int) – the number of principal components (int) or the minimum percentage of variance to cover (float)

Returns:

dict of {str: object} – the same as the output of pca.pca().fit_transform()

{“loadings”: pandas.DataFrame}: structured dataframe containing loadings for PCs

{“PC”: pandas.DataFrame}: reduced dimensionality space, the Principal Components (PCs)

Index
pandas.Timestamp

Columns
PC1, PC2,…

{“explained_var”: array-like}: explained variance for each of the PCs (same ordering as the PCs)

{“variance_ratio”: array-like}: variance ratio

{“model”: object}: fitted model for further use

{“scaler”: object}: scaler model

{“pcp”: int}: pcp

{“topfeat”: pandas.DataFrame}: top features

Index
reset index

Columns
PC (str): PC1, PC2,…
feature (str): feature name of X
loading (float): loading values
type (str): “best” or “weak”

{“outliers”: pandas.DataFrame}: outliers

Index: pandas.Timestamp
Columns: y_proba (float), y_score (float), y_bool (bool), y_bool_spe (bool), y_score_spe (float)

{“outlier_params”: object}: parameter values of the model used to find outliers

Note

Regarding pca package, please refer to https://github.com/erdogant/pca
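The two preprocessing ideas named above, Z-score normalization and the dual meaning of n_components, can be sketched without the pca package. Both functions below are hypothetical helpers written for illustration. The use of the population standard deviation is an assumption, and the code does not reproduce the internals of the pca package.

```python
from itertools import accumulate
from statistics import mean, pstdev


def zscore(values):
    """Standardize one feature column to mean 0 and standard deviation 1."""
    mu, sd = mean(values), pstdev(values)  # assumption: population std (ddof=0)
    if sd == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(v - mu) / sd for v in values]


def components_to_keep(variance_ratio, n_components=0.95):
    """Resolve n_components to a number of PCs (hypothetical helper).

    variance_ratio: variance ratio of each PC, in the same order as the PCs
    n_components: int (number of PCs) or float (minimum variance to cover)
    """
    if isinstance(n_components, int):
        return min(n_components, len(variance_ratio))
    # Keep adding PCs until the cumulative variance ratio reaches the threshold.
    for count, covered in enumerate(accumulate(variance_ratio), start=1):
        if covered >= n_components:
            return count
    return len(variance_ratio)


ratios = [0.7, 0.2, 0.06, 0.04]
print(components_to_keep(ratios, 0.95))
print(components_to_keep(ratios, 2))
```

With these made-up ratios, the first two PCs cover about 90% of the variance, so a threshold of 0.95 needs three PCs, while n_components=2 keeps exactly two.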

covsirphy.science.ode_scenario module

class ODEScenario(data, location_name, complement=True)[source]

Bases: Term

Perform scenario analysis, changing ODE parameters.

Parameters:

data (pandas.DataFrame) – actual data of the number of cases

Index
Date (pandas.Timestamp): observation dates

Columns
Population (int): total population
Confirmed (int): the number of confirmed cases
Recovered (int): the number of recovered cases, must be over 0
Fatal (int): the number of fatal cases
Susceptible (int): the number of susceptible cases, will be ignored because overwritten
Infected (int): the number of currently infected cases, will be ignored because overwritten
the other columns will be ignored
location_name (str) – name to identify the location to show in figure titles
complement (bool) – perform data complement with covsirphy.DataEngineer().subset(complement=True) or not

Note

Data cleaning will be performed with covsirphy.DataEngineer().clean() automatically.

append(name=None, end=None, **kwargs)[source]

Append a new phase, specifying ODE parameter values.

Parameters:

name (str or list[str] or None) – scenario name(s) or None (all scenarios)
end (pandas.Timestamp or int or None) – end date, the number of days of the new phase, or None (the max date of all scenarios and actual data)
**kwargs – keyword arguments of ODE parameter values (default: values of the last phase)

Raises:

SubsetNotFoundError – scenario with the name is un-registered
UnExpectedValueRangeError – end_date - (the last date of the registered phases) < 3 and parameters were changed

Returns:

covsirphy.ODEScenario – self
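The three accepted forms of @end and the three-day rule in the Raises section can be sketched as follows. Both functions are hypothetical helpers. Counting an integer @end from the end of the last registered phase is an assumption, and ValueError stands in for UnExpectedValueRangeError so that the sketch needs no third-party imports.

```python
from datetime import date, timedelta


def resolve_end(end, last_end, latest):
    """Resolve @end to a date (hypothetical helper, not the library's code).

    last_end: end date of the last registered phase
    latest: max date over all scenarios and the actual data
    """
    if end is None:
        return latest
    if isinstance(end, int):
        # Assumption: an integer counts days from the end of the last phase.
        return last_end + timedelta(days=end)
    return end


def check_new_phase(end_date, last_end, params_changed):
    """Reject phases shorter than 3 days when parameter values change."""
    if params_changed and (end_date - last_end).days < 3:
        raise ValueError("a phase with new parameter values needs 3 days or more")
    return end_date


last_end, latest = date(2021, 1, 1), date(2021, 3, 1)
print(resolve_end(30, last_end, latest))
print(resolve_end(None, last_end, latest))
```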

classmethod auto_build(geo, model, complement=True)[source]

Prepare cleaned and subset data from recommended dataset, create instance, build baseline scenario.

Parameters:

geo (tuple(list[str] or tuple(str) or str) or str or None) – country, province, city
model (covsirphy.ODEModel) – definition of ODE model
complement (bool) – whether to perform data complement or not

Raises:

SubsetNotFoundError – actual data of the location was not included in the recommended dataset

Returns:

covsirphy.ODEScenario – created instance

Note

geo=None means total values of all countries.

Note

geo=”Japan” and geo=(“Japan”,) means country level data of Japan, as an example.

Note

geo=(“Japan”, “Tokyo”) means prefecture (province) level data of Tokyo/Japan, as an example.

Note

geo=(“USA”, “Alabama”, “Baldwin”) means county level data of Baldwin/Alabama/USA, as an example.

Note

Complemented (if @complement is True) data with Recovered > 0 will be analyzed.
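The notes on @geo above amount to a mapping from the shape of the argument to an area level. The sketch below restates that mapping; `geo_level` is a hypothetical helper, and the level names follow the parameter description (country, province, city).

```python
def geo_level(geo):
    """Name the area level that a @geo value selects (hypothetical helper)."""
    if geo is None:
        # None means total values of all countries.
        return "world"
    # A bare string is shorthand for a one-element tuple.
    layers = (geo,) if isinstance(geo, str) else tuple(geo)
    names = ("country", "province", "city")
    if not 1 <= len(layers) <= len(names):
        raise ValueError(f"unexpected geo: {geo!r}")
    return names[len(layers) - 1]


for geo in (None, "Japan", ("Japan",), ("Japan", "Tokyo"), ("USA", "Alabama", "Baldwin")):
    print(geo, "->", geo_level(geo))
```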

build_with_dynamics(name, dynamics)[source]

Build a scenario with covsirphy.Dynamics() instance.

Parameters:

name (str) – scenario name
dynamics (covsirphy.Dynamics) – covsirphy.Dynamics() instance which has ODE model, tau value and ODE parameter values

Returns:

covsirphy.ODEScenario – self

build_with_model(name, model, date_range=None, tau=None)[source]

Build a scenario with covsirphy.Dynamics() instance created with the actual data automatically.

Parameters:

name (str) – scenario name
model (covsirphy.ODEModel) – definition of ODE model
date_range (tuple of (str, str) or None) – start date and end date of dynamics to analyze
tau (int or None) – tau value [min] or None (set later with data)

Returns:

covsirphy.ODEScenario – self

build_with_template(name, template)[source]

Build a scenario with a template scenario.

Parameters:

name (str) – new scenario name
template (str) – template name

Raises:

SubsetNotFoundError – scenario with the name is un-registered

Returns:

covsirphy.ODEScenario – self

compare_cases(variable, date_range=None, ref=None, display=True, **kwargs)[source]

Compare the number of cases of scenarios.

Parameters:

variable (str) – variable name or alias
date_range (tuple of (str, str)) – start date and end date to analyze
ref (str or None) – name of reference scenario to specify phases and dates or None (the first scenario)
display (bool) – whether to display the figure of the result or not
**kwargs – keyword arguments of covsirphy.line_plot() except for @df

Returns:

pandas.DataFrame –

Index
Date (pandas.Timestamp): dates

Columns
Actual (numpy.int64): actual records
{scenario name} (numpy.int64): values of the scenario

compare_param(param, date_range=None, ref=None, display=True, **kwargs)[source]

Compare the parameter values of scenarios.

Parameters:

param (str) – one of ODE parameters, “Rt”, dimensional parameters
date_range (tuple of (str, str)) – start date and end date to analyze
ref (str or None) – name of reference scenario to specify phases and dates or None (the first scenario)
display (bool) – whether to display the figure of the result or not
**kwargs – keyword arguments of covsirphy.line_plot() except for @df

Returns:

pandas.DataFrame –

Index: Date (pandas.Timestamp)
Columns: {scenario name} (str): values of the scenario

delete(pattern, exact=False)[source]

Delete scenario(s).

Parameters:

pattern (str) – scenario name or pattern to search
exact (bool) – if False, use regular expressions

Returns:

covsirphy.ODEScenario – self
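The effect of @exact can be sketched with the standard re module. `select_names` is a hypothetical helper, and matching the pattern anywhere in the name (re.search) is an assumption, since the text above only states that regular expressions are used.

```python
import re


def select_names(names, pattern, exact=False):
    """Return the scenario names that @pattern selects (hypothetical helper)."""
    if exact:
        return [name for name in names if name == pattern]
    # Assumption: the pattern may match anywhere in the name (re.search).
    return [name for name in names if re.search(pattern, name)]


names = ["Baseline", "Lockdown", "Lockdown_2", "Medicine"]
print(select_names(names, "Lockdown"))
print(select_names(names, "Lockdown", exact=True))
```

Under this assumption, a plain name used as a pattern also selects every scenario whose name contains it, so exact=True is the safer choice when only one scenario should be removed.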

describe()[source]

Describe representative values.

Returns:

pandas.DataFrame –

Index

str: scenario name

Columns

max(Infected) (numpy.int64): max value of Infected
argmax(Infected) (pandas.Timestamp): the date when Infected shows max value
Confirmed({date}) (numpy.int64): Confirmed on the last date
Infected({date}) (numpy.int64): Infected on the last date
Fatal({date}) (numpy.int64): Fatal on the last date

classmethod from_json(filename)[source]

Create ODEScenario instance with a JSON file.

Parameters: filename (str or Path) – JSON filename
Returns: covsirphy.ODEScenario – self

predict(days, name, seed=0, verbose=1, X=None, **kwargs)[source]

Create a scenario and append a phase, predicting ODE parameter values for the given days.

Parameters:

days (int) – days to predict
name (str) – scenario name
X (pandas.DataFrame or None) – information for regression or None (no information)

Index
pandas.Timestamp: Observation date

Columns
observed and the target variables (int or float)
seed (int or None) – random seed
verbose (int) – verbosity
**kwargs – keyword arguments of autots.AutoTS() except for verbose, forecast_length (always the same as @days)

Returns:

covsirphy.ODEScenario – self

Note

AutoTS package is developed at https://github.com/winedarksea/AutoTS

Note

Phases are determined with rounded reproduction number (one decimal place).
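The note on phase determination can be read as follows: consecutive dates whose reproduction number rounds to the same one-decimal value belong to the same phase. The sketch below illustrates that reading with made-up Rt values; `split_phases` is a hypothetical helper, not the library's implementation.

```python
from itertools import groupby


def split_phases(rt_by_date):
    """Group consecutive dates whose Rt rounds to the same one-decimal value.

    rt_by_date: list of (date label, Rt) pairs in chronological order
    Returns a list of (start, end, rounded Rt) tuples.
    """
    phases = []
    for rounded, group in groupby(rt_by_date, key=lambda pair: round(pair[1], 1)):
        dates = [day for day, _ in group]
        phases.append((dates[0], dates[-1], rounded))
    return phases


# Made-up predicted values, for illustration only.
predicted = [("01Feb", 1.21), ("02Feb", 1.18), ("03Feb", 1.12), ("04Feb", 1.08), ("05Feb", 1.11)]
print(split_phases(predicted))
```

Here the five predicted days collapse into two phases, one with Rt 1.2 and one with Rt 1.1, which keeps small day-to-day fluctuations from creating a new phase every day.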

rename(old, new)[source]

Rename the given scenario with a new name.

Parameters:

old (str) – old name
new (str) – new name

Returns:

covsirphy.ODEScenario – self

represent(q, variable, date=None, included=None, excluded=None)[source]

Return the names of representative scenarios using quantiles of the variable on the date.

Parameters:

q (list[float] or float) – quantiles
variable (str) – reference variable, Confirmed, Infected, Fatal or Recovered
date (str or None) – reference date or None (the last end date in the all scenarios)
included (list[str] or None) – included scenarios or None (all included)
excluded (list[str] or None) – excluded scenarios or None (no scenarios excluded)

Raises:

ValueError – the end dates of the last phases are not aligned

Returns:

list[str] or str – the names of the scenarios whose values are nearest to the given quantiles

Note

Dimension of returned object corresponds to the type of @q.
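One way to read this method: compute each quantile of the variable across scenarios on the reference date, then pick the scenario whose value is closest to it. The sketch below follows that reading with made-up values. `represent_sketch` and `quantile` are hypothetical helpers, and the linear interpolation between neighbouring values is an assumption.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an ascending list, 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def represent_sketch(values, q):
    """Return the scenario name(s) nearest to the quantile(s) @q.

    values: dict of scenario name -> value of the variable on the reference date
    q: float or list of float
    """
    ordered = sorted(values.values())

    def nearest(one_q):
        target = quantile(ordered, one_q)
        return min(values, key=lambda name: abs(values[name] - target))

    # The shape of the result follows the type of @q.
    return [nearest(one_q) for one_q in q] if isinstance(q, list) else nearest(q)


# Made-up numbers of confirmed cases on the reference date.
values = {"Baseline": 1000, "Lockdown": 400, "Medicine": 700, "Vaccine": 500, "NoAction": 1500}
print(represent_sketch(values, [0.1, 0.5, 0.9]))
print(represent_sketch(values, 0.5))
```

A list of quantiles gives a list of names, and a single quantile gives a single name, matching the note on the dimension of the returned object.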

simulate(name=None, variables=None, display=True, **kwargs)[source]

Perform simulation with phase-dependent ODE model.

Parameters:

name (str or None) – scenario name registered or None (actual data)
variables (list of [str] or None) – variables/alias to return or None ([“Confirmed”, “Fatal”, “Recovered”])
display (bool) – whether to display the figure of the result or not
**kwargs – keyword arguments of covsirphy.line_plot() except for @df

Returns:

pandas.DataFrame or pandas.Series –

Index
Date (pd.Timestamp): dates

Columns
Population (int): total population (if selected with @variables)
Confirmed (int): the number of confirmed cases (if selected with @variables)
Recovered (int): the number of recovered cases (if selected with @variables)
Fatal (int): the number of fatal cases (if selected with @variables)
Susceptible (int): the number of susceptible cases (if selected with @variables)
Infected (int): the number of currently infected cases (if selected with @variables)

summary()[source]

Summarize phase information of all scenarios.

Returns:

pandas.DataFrame –

Index
Scenario (str): scenario names
Phase (str): phase names, 0th, 1st,…

Columns
Start (pandas.Timestamp): start date of the phase
End (pandas.Timestamp): end date of the phase
ODE (str): ODE model name
Rt (float): phase-dependent reproduction number (if parameters are available)
(float): parameter values, including rho (if available)
tau (int): tau value [min]
(int or float): dimensional parameters, including 1/beta [days] (if tau and parameters are available)
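The relation between tau, the non-dimensional parameters and the dimensional parameters can be sketched as below. This is an illustration under two assumptions: each non-dimensional parameter is a rate per time step of tau minutes, and the basic SIR model is used, where Rt = rho / sigma. The function names are hypothetical, and the library may round the results differently.

```python
MINUTES_PER_DAY = 24 * 60


def to_days(param, tau):
    """Convert a non-dimensional rate into its time scale in days.

    param: non-dimensional parameter value per time step (e.g. rho or sigma)
    tau: length of one time step [min]
    """
    return tau / param / MINUTES_PER_DAY


def reproduction_number_sir(rho, sigma):
    """Reproduction number of the basic SIR model (assumed model)."""
    return rho / sigma


# Made-up parameter values with a time step of one day (1440 min).
tau, rho, sigma = 1440, 0.2, 0.1
print(to_days(rho, tau))    # 1/beta [days]
print(to_days(sigma, tau))  # 1/gamma [days]
print(reproduction_number_sir(rho, sigma))
```

This also shows why the dimensional columns appear only when both tau and the parameter values are available: neither alone fixes a time scale.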

to_dynamics(name)[source]

Create covsirphy.Dynamics instance of the scenario.

Parameters: name (str) – scenario name
Raises: SubsetNotFoundError – scenario with the name is un-registered
Returns: covsirphy.Dynamics – instance which has ODE model, tau value and ODE parameter values

to_json(filename)[source]

Write a JSON file which can be used to recreate the ODEScenario instance with .from_json().

Parameters: filename (str or Path) – JSON filename
Returns: str – filename

track()[source]

Track the reproduction number, parameter values and dimensional parameter values.

Returns:

pandas.DataFrame

Index
reset index

Columns
Scenario (str): scenario names
Phase (str): phase names
Date (pandas.Timestamp): dates
Rt (float): phase-dependent reproduction number (if parameters are available)
(float): parameter values, including rho (if available)
(int or float): dimensional parameters, including 1/beta [days] (if tau and parameters are available)