covsirphy.science package
Submodules
covsirphy.science.ml module
- class MLEngineer(seed=0, **kwargs)[source]
Bases:
Term
Class for machine learning and preprocessing.
- Parameters:
seed (int or None) – random seed
- forecast(Y, days, X=None, **kwargs)[source]
Forecast Y for given days with/without indicators (X).
- Parameters:
Y (pandas.DataFrame) –
- Index
pandas.Timestamp: Observation date
- Columns
observed and the target variables (int or float)
X (pandas.DataFrame or None) –
indicators for regression or None (no indicators)
- Index
pandas.Timestamp: Observation date
- Columns
observed and the target variables (int or float)
days (int) – days to predict
**kwargs – keyword arguments of autots.AutoTS() except for verbose, forecast_length (always the same as @days)
- Returns:
pandas.DataFrame –
- Index
pandas.Timestamp: Observation date, from the next date of Y.index to the last predicted date
- Columns
observed and the target variables (int or float)
Note
AutoTS package is developed at https://github.com/winedarksea/AutoTS
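The index convention of the returned frame can be illustrated without AutoTS. The following is a minimal sketch using only the standard library; `forecast_index` is a hypothetical helper, not part of covsirphy, and it assumes daily observations.

```python
from datetime import date, timedelta


def forecast_index(last_observed, days):
    """Return the dates a forecast of `days` days would cover.

    Hypothetical helper (not part of covsirphy), assuming daily records:
    the range starts on the day after the last date of Y.index and ends
    on the last predicted date.
    """
    if days < 1:
        raise ValueError("days must be a positive integer")
    return [last_observed + timedelta(days=i) for i in range(1, days + 1)]


# Y ends on 2022-01-29 and three days are requested.
index = forecast_index(date(2022, 1, 29), days=3)
print(index[0], index[-1])
```

The length of the returned index equals @days, which matches the note that forecast_length is always the same as @days.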
- pca(X, n_components=0.95)[source]
Perform PCA (principal component analysis) after standardization (Z-score normalization) with pca package.
- Parameters:
X (pandas.DataFrame or None) –
- Index
pandas.Timestamp: Observation date
- Columns
(int or float): observed values of the training vectors
n_components (float or int) – the number of principal components (int) or the minimum fraction of variance to cover (float)
- Returns:
dict of {str: object} – the same as the return value of pca.pca().fit_transform()
{“loadings”: pandas.DataFrame}: structured dataframe containing loadings for PCs
{“PC”: pandas.DataFrame}: reduced dimensionality space, the Principal Components (PCs)
- Index
pandas.Timestamp
- Columns
PC1, PC2,…
{“explained_var”: array-like}: explained variance for each of the PCs (same ordering as the PCs)
{“variance_ratio”: array-like}: variance ratio
{“model”: object}: fitted model, available for further use
{“scaler”: object}: scaler model
{“pcp”: int}: pcp
{“topfeat”: pandas.DataFrame}: top features
- Index
reset index
- Columns
PC (str): PC1, PC2,…
feature (str): feature name of X
loading (float): loading values
type (str): “best” or “weak”
{“outliers”: pandas.DataFrame}: outliers
- Index
pandas.Timestamp
- Columns
y_proba (float)
y_score (float)
y_bool (bool)
y_bool_spe (bool)
y_score_spe (float)
{“outlier_params”: object}: parameter values of the model of finding outliers
Note
Regarding pca package, please refer to https://github.com/erdogant/pca
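The two preprocessing ideas in pca() can be sketched in plain Python: Z-score standardization, and the two meanings of @n_components. Both helpers below are hypothetical illustrations, not the pca package's implementation. The sketch assumes the population standard deviation is used for scaling and that a float @n_components selects the smallest number of leading components whose cumulative variance ratio reaches that fraction.

```python
from statistics import mean, pstdev


def zscore(column):
    """Standardize one column to zero mean and unit variance.

    Assumption: population standard deviation (ddof=0) is used.
    """
    mu, sd = mean(column), pstdev(column)
    return [(value - mu) / sd for value in column]


def resolve_n_components(variance_ratio, n_components):
    """Return how many leading PCs to keep (hypothetical helper).

    int   -> exactly that many components.
    float -> the smallest number of components whose cumulative
             variance ratio reaches at least that fraction.
    """
    if isinstance(n_components, int):
        return n_components
    covered = 0.0
    for count, ratio in enumerate(variance_ratio, start=1):
        covered += ratio
        if covered >= n_components:
            return count
    return len(variance_ratio)


scaled = zscore([2.0, 4.0, 6.0])
# Illustrative variance ratios, ordered from PC1 downwards.
kept = resolve_n_components([0.6, 0.3, 0.08, 0.02], n_components=0.95)
print(kept)
```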
covsirphy.science.ode_scenario module
- class ODEScenario(data, location_name, complement=True)[source]
Bases:
Term
Perform scenario analysis, changing ODE parameters.
- Parameters:
data (pandas.DataFrame) –
actual data of the number of cases
- Index
Date (pandas.Timestamp): observation dates
- Columns
Population (int): total population
Confirmed (int): the number of confirmed cases
Recovered (int): the number of recovered cases, must be over 0
Fatal (int): the number of fatal cases
Susceptible (int): the number of susceptible cases, will be ignored because overwritten
Infected (int): the number of currently infected cases, will be ignored because overwritten
the other columns will be ignored
location_name (str) – name to identify the location to show in figure titles
complement (bool) – whether to perform data complement with covsirphy.DataEngineer().subset(complement=True)
Note
Data cleaning will be performed with covsirphy.DataEngineer().clean() automatically.
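The input rules above (Recovered must be over 0, Susceptible and Infected are overwritten) can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the class's code: it assumes Susceptible = Population - Confirmed and Infected = Confirmed - Fatal - Recovered, and it assumes rows with Recovered <= 0 are dropped. `prepare_records` is a hypothetical name.

```python
def prepare_records(records):
    """Keep analyzable rows and recompute the overwritten columns.

    Assumed definitions (not stated in this reference):
    Susceptible = Population - Confirmed
    Infected = Confirmed - Fatal - Recovered
    """
    prepared = []
    for record in records:
        if record["Recovered"] <= 0:
            continue  # assumption: rows without recovered cases are dropped
        row = dict(record)
        # Any Susceptible/Infected values supplied by the caller are replaced.
        row["Susceptible"] = record["Population"] - record["Confirmed"]
        row["Infected"] = record["Confirmed"] - record["Fatal"] - record["Recovered"]
        prepared.append(row)
    return prepared


raw = [
    {"Population": 1_000_000, "Confirmed": 100, "Recovered": 0, "Fatal": 1},
    {"Population": 1_000_000, "Confirmed": 5_000, "Recovered": 3_000, "Fatal": 100, "Infected": -1},
]
prepared = prepare_records(raw)
print(prepared[0]["Susceptible"], prepared[0]["Infected"])
```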
- append(name=None, end=None, **kwargs)[source]
Append a new phase, specifying ODE parameter values.
- Parameters:
name (str or list[str] or None) – scenario name(s) or None (all scenarios)
end (pandas.Timestamp or int or None) – end date, or the number of days of the new phase, or None (the max date of all scenarios and actual data)
**kwargs – keyword arguments of ODE parameter values (default: values of the last phase)
- Raises:
SubsetNotFoundError – scenario with the name is un-registered
UnExpectedValueRangeError – end_date - (the last date of the registered phases) < 3 and parameters were changed
- Returns:
covsirphy.ODEScenario – self
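How @end and the documented error condition interact can be sketched as follows. The helpers are hypothetical and use standard-library dates instead of pandas.Timestamp. The sketch assumes an int @end counts days from the end of the last registered phase, and ValueError stands in for UnExpectedValueRangeError.

```python
from datetime import date, timedelta

MIN_PHASE_DAYS = 3  # from the documented UnExpectedValueRangeError condition


def resolve_end(end, last_phase_end, max_date):
    """Turn the `end` argument into a concrete end date (hypothetical helper).

    Assumed interpretation: an int counts days after the end of the last
    registered phase; None falls back to `max_date`.
    """
    if end is None:
        return max_date
    if isinstance(end, int):
        return last_phase_end + timedelta(days=end)
    return end


def check_new_phase(end_date, last_phase_end, params_changed):
    """Reject a new phase shorter than 3 days when parameters were changed."""
    if params_changed and (end_date - last_phase_end).days < MIN_PHASE_DAYS:
        # ValueError is a stand-in for covsirphy's UnExpectedValueRangeError.
        raise ValueError("new phase is too short for changed parameters")


last_end = date(2022, 3, 31)
new_end = resolve_end(30, last_end, max_date=date(2022, 6, 30))
check_new_phase(new_end, last_end, params_changed=True)  # passes: 30 days
print(new_end)
```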
- classmethod auto_build(geo, model, complement=True)[source]
Prepare cleaned and subset data from recommended dataset, create instance, build baseline scenario.
- Parameters:
- Raises:
SubsetNotFoundError – actual data of the location was not included in the recommended dataset
- Returns:
covsirphy.ODEScenario – created instance
Note
geo=None means total values of all countries.
Note
geo="Japan" and geo=("Japan",) mean country level data of Japan, as an example.
Note
geo=("Japan", "Tokyo") means prefecture (province) level data of Tokyo/Japan, as an example.
Note
geo=("USA", "Alabama", "Baldwin") means county level data of Baldwin/Alabama/USA, as an example.
Note
Complemented (if @complement is True) data with Recovered > 0 will be analyzed.
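The @geo conventions in the notes above amount to a small normalization step, sketched here. `normalize_geo` is a hypothetical helper written for illustration; it is not a covsirphy function.

```python
def normalize_geo(geo):
    """Normalize `geo` to a tuple of layer names (hypothetical helper).

    None            -> ()  (total values of all countries)
    "Japan"         -> ("Japan",)
    ("Japan",)      -> ("Japan",)
    longer tuples   -> one entry per administrative layer
    """
    if geo is None:
        return ()
    if isinstance(geo, str):
        return (geo,)
    return tuple(geo)


# A bare string and a one-element tuple select the same country.
print(normalize_geo("Japan") == normalize_geo(("Japan",)))
# The tuple length gives the depth of the selected layer.
print(len(normalize_geo(("USA", "Alabama", "Baldwin"))))
```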
- build_with_dynamics(name, dynamics)[source]
Build a scenario with covsirphy.Dynamics() instance.
- Parameters:
name (str) – scenario name
dynamics (covsirphy.Dynamics) – covsirphy.Dynamics() instance which has ODE model, tau value and ODE parameter values
- Returns:
covsirphy.ODEScenario – self
- build_with_model(name, model, date_range=None, tau=None)[source]
Build a scenario with covsirphy.Dynamics() instance created with the actual data automatically.
- build_with_template(name, template)[source]
Build a scenario with a template scenario.
- Parameters:
- Raises:
SubsetNotFoundError – scenario with the name is un-registered
- Returns:
covsirphy.ODEScenario – self
- compare_cases(variable, date_range=None, ref=None, display=True, **kwargs)[source]
Compare the number of cases of scenarios.
- Parameters:
variable (str) – variable name or alias
date_range (tuple of (str, str)) – start date and end date to analyze
ref (str or None) – name of reference scenario to specify phases and dates or None (the first scenario)
display (bool) – whether to display the figure of the result
**kwargs – keyword arguments of covsirphy.line_plot() except for @df
- Returns:
pandas.DataFrame –
- Index
Date (pandas.Timestamp): dates
- Columns
Actual (numpy.int64): actual records
{scenario name} (numpy.int64): values of the scenario
- compare_param(param, date_range=None, ref=None, display=True, **kwargs)[source]
Compare the parameter values of scenarios.
- Parameters:
param (str) – an ODE parameter, “Rt”, or a dimensional parameter
date_range (tuple of (str, str)) – start date and end date to analyze
ref (str or None) – name of reference scenario to specify phases and dates or None (the first scenario)
display (bool) – whether to display the figure of the result
**kwargs – keyword arguments of covsirphy.line_plot() except for @df
- Returns:
pandas.DataFrame –
- Index
Date (pandas.Timestamp)
- Columns
{scenario name} (str): values of the scenario
- describe()[source]
Describe representative values.
- Returns:
pandas.DataFrame –
- Index
str: scenario name
- Columns
max(Infected) (numpy.int64): max value of Infected
argmax(Infected) (pandas.Timestamp): the date when Infected shows max value
Confirmed({date}) (numpy.int64): Confirmed on the last date
Infected({date}) (numpy.int64): Infected on the last date
Fatal({date}) (numpy.int64): Fatal on the last date
- classmethod from_json(filename)[source]
Create ODEScenario instance with a JSON file.
- Parameters:
filename (str or Path) – JSON filename
- Returns:
covsirphy.ODEScenario – created instance
- predict(days, name, seed=0, verbose=1, X=None, **kwargs)[source]
Create scenarios and append a phase, performing ODE parameter prediction for given days.
- Parameters:
days (int) – days to predict
name (str) – scenario name
X (pandas.DataFrame or None) –
information for regression or None (no information)
- Index
pandas.Timestamp: Observation date
- Columns
observed and the target variables (int or float)
seed (int or None) – random seed
verbose (int) – verbosity
**kwargs – keyword arguments of autots.AutoTS() except for verbose, forecast_length (always the same as @days)
- Returns:
covsirphy.ODEScenario – self
Note
AutoTS package is developed at https://github.com/winedarksea/AutoTS
Note
Phases are determined with rounded reproduction number (one decimal place).
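The phase rule in the last note can be sketched as grouping consecutive days that share the same reproduction number after rounding to one decimal place. `split_into_phases` and the sample values are hypothetical, for illustration only.

```python
from itertools import groupby


def split_into_phases(rt_by_day):
    """Group consecutive days that share the same rounded Rt.

    `rt_by_day` is a list of (day_label, Rt) pairs in date order.
    Returns a list of (first_day, last_day, rounded_rt) tuples.
    """
    rounded = [(day, round(rt, 1)) for day, rt in rt_by_day]
    phases = []
    for value, group in groupby(rounded, key=lambda item: item[1]):
        days = [day for day, _ in group]
        phases.append((days[0], days[-1], value))
    return phases


# Made-up predicted Rt values for five consecutive days.
predicted = [("day1", 1.23), ("day2", 1.18), ("day3", 1.21), ("day4", 0.94), ("day5", 0.88)]
phases = split_into_phases(predicted)
print(phases)
```

Rounding keeps small day-to-day fluctuations of the predicted values from creating a new phase for every date.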
- represent(q, variable, date=None, included=None, excluded=None)[source]
Return the names of representative scenarios using quantiles of the variable on the date.
- Parameters:
variable (str) – reference variable, Confirmed, Infected, Fatal or Recovered
date (str or None) – reference date or None (the last end date in the all scenarios)
included (list[str] or None) – included scenarios or None (all included)
excluded (list[str] or None) – excluded scenarios or None (no scenarios excluded)
- Raises:
ValueError – the end dates of the last phases are not aligned
- Returns:
list[float] or float – the nearest scenarios which have the values at the given quantiles
Note
Dimension of returned object corresponds to the type of @q.
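A minimal sketch of quantile-based selection follows. It assumes linear interpolation between sorted values and returns scenario names, following the method's one-line description; both helpers are hypothetical and the library's interpolation rule may differ.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an ascending list (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def nearest_scenarios(values, q):
    """Return the scenario name(s) whose value is nearest to each quantile.

    `values` maps scenario name -> value of the variable on the reference date.
    A float `q` gives one name; a list of floats gives a list of names.
    """
    ordered = sorted(values.values())

    def pick(single_q):
        target = quantile(ordered, single_q)
        return min(values, key=lambda name: abs(values[name] - target))

    if isinstance(q, (list, tuple)):
        return [pick(item) for item in q]
    return pick(q)


# Made-up values of Confirmed on the reference date.
confirmed = {"Baseline": 300, "Lockdown": 100, "Medicine": 200}
median_name = nearest_scenarios(confirmed, 0.5)
extremes = nearest_scenarios(confirmed, [0.0, 1.0])
print(median_name, extremes)
```

As in the note above, the shape of the result follows the type of @q: a scalar gives a single item and a list gives a list.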
- simulate(name=None, variables=None, display=True, **kwargs)[source]
Perform simulation with phase-dependent ODE model.
- Parameters:
name (str or None) – scenario name registered or None (actual data)
variables (list[str] or None) – variables/alias to return or None ([“Confirmed”, “Fatal”, “Recovered”])
display (bool) – whether to display the figure of the result
**kwargs – keyword arguments of covsirphy.line_plot() except for @df
- Returns:
pandas.DataFrame or pandas.Series –
- Index
Date (pd.Timestamp): dates
- Columns
Population (int): total population (if selected with @variables)
Confirmed (int): the number of confirmed cases (if selected with @variables)
Recovered (int): the number of recovered cases (if selected with @variables)
Fatal (int): the number of fatal cases (if selected with @variables)
Susceptible (int): the number of susceptible cases (if selected with @variables)
Infected (int): the number of currently infected cases (if selected with @variables)
- summary()[source]
Summarize phase information of all scenarios.
- Returns:
pandas.DataFrame –
- Index
Scenario (str): scenario names
Phase (str): phase names, 0th, 1st,…
- Columns
Start (pandas.Timestamp): start date of the phase
End (pandas.Timestamp): end date of the phase
ODE (str): ODE model name
Rt (float): phase-dependent reproduction number (if parameters are available)
(float): parameter values, including rho (if available)
tau (int): tau value [min]
(int or float): dimensional parameters, including 1/beta [days] (if tau and parameters are available)
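The relation between the non-dimensional parameters, tau, Rt and a dimensional parameter such as 1/beta [days] can be worked through with a small example. The sketch assumes the SIR-F model, with Rt = rho * (1 - theta) / (sigma + kappa) and beta = rho / tau. This reference does not name the model, and the parameter values are made up for illustration.

```python
def reproduction_number(rho, sigma, kappa, theta):
    """Rt under the assumed SIR-F model: rho * (1 - theta) / (sigma + kappa)."""
    return rho * (1 - theta) / (sigma + kappa)


def to_days(tau, rate):
    """Convert a non-dimensional rate to a mean duration in days.

    tau is the step width in minutes: one step of the non-dimensional
    model lasts tau minutes, so 1/rate steps last tau / rate minutes.
    """
    minutes_per_day = 24 * 60
    return tau / minutes_per_day / rate


# Illustrative values only; tau=1440 means one step per day.
rt = reproduction_number(rho=0.2, sigma=0.075, kappa=0.005, theta=0.02)
days_per_infection = to_days(tau=1440, rate=0.2)  # corresponds to 1/beta [days]
print(round(rt, 2), days_per_infection)
```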
- to_dynamics(name)[source]
Create covsirphy.Dynamics instance of the scenario.
- Parameters:
name (str) – scenario name
- Raises:
SubsetNotFoundError – scenario with the name is un-registered
- Returns:
covsirphy.Dynamics – instance which has ODE model, tau value and ODE parameter values
- to_json(filename)[source]
Write a JSON file which can be used to recreate an ODEScenario instance with .from_json().
- Parameters:
filename (str or Path) – JSON filename
- Returns:
str – filename
- track()[source]
Track reproduction number, parameter values and dimensional parameter values.
- Returns:
- pandas.DataFrame
- Index
reset index
- Columns
Scenario (str): scenario names
Phase (str): phase names
Date (pandas.Timestamp): dates
Rt (float): phase-dependent reproduction number (if parameters are available)
(float): parameter values, including rho (if available)
(int or float): dimensional parameters, including 1/beta [days] (if tau and parameters are available)
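The ODEScenario methods documented above are typically chained. The following sketch shows one possible order, using only the signatures listed in this reference. Several points are assumptions: that `cs.SIRFModel` is an acceptable value for @model, that auto_build() registers its baseline scenario under the name "Baseline", and the scenario name "Lockdown" and the value of rho, which are hypothetical. The import is deferred, so the function can be defined without covsirphy installed; calling it requires the package and access to the recommended dataset.

```python
def run_scenario_sketch(geo="Japan"):
    """Chain the documented methods; call only where covsirphy is installed."""
    import covsirphy as cs  # deferred: defining this sketch needs no third-party package

    # Assumption: cs.SIRFModel is an ODE model class accepted as `model`.
    snr = cs.ODEScenario.auto_build(geo=geo, model=cs.SIRFModel)

    # Assumption: the baseline scenario is registered as "Baseline".
    snr.build_with_template(name="Lockdown", template="Baseline")

    # Add a 30-day phase with a changed parameter (hypothetical value of rho).
    snr.append(name="Lockdown", end=30, rho=0.1)

    # name=None, end=None: extend all scenarios to the max date so that
    # the end dates of the last phases are aligned.
    snr.append()

    # Compare the scenarios without opening figures, then summarize.
    cases = snr.compare_cases(variable="Confirmed", display=False)
    return cases, snr.describe()
```

Aligning the end dates with a final append() matters because represent() raises ValueError when the end dates of the last phases differ.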