Phase-dependent SIR models
SIR-derived ODE models (SIR, SIR-D, SIR-F) assume that ODE parameter values are always constant. Can we apply this assumption to real data?
Kaggle: COVID-19 data with SIR model concluded this was No - ODE parameter values change with phases - because measures taken by people changed the situation of COVID-19 outbreak actually. Term “phase” means a sequential dates in which the parameters of SIR-derived models are constant. To remove fluctuations from analysis, “time-dependent” should be avoided.
The first section will show the dynamics of phase-dependent SIR models with sample data.
The Kaggle notebook also suggested detailed steps of applying phase-dependent SIR models to actual data. The second section demonstrates how to divide actual time-series data to phases (time-series segmentation with “S-R change point analysis”). The third section is to estimate ODE parameter values of phases (“ODE parameter estimation”).
With fourth section, we will perform them with actual data of Tokyo/Japan as an example.
[1]:
from copy import deepcopy
from pprint import pprint
import covsirphy as cs
import sympy
cs.__version__
[1]:
'3.1.2'
1. Dynamics of phase-dependent SIR models
Using Dynamics
class, we will simulate phase-dependent SIR model with sample data (two phase) as an example.
tau value: 1440 [min] (default)
1-1. Define the 0th phase
As the first step, define the 0th (initial) phase with preset ODE parameter values.
[2]:
dyn = cs.Dynamics.from_sample(cs.SIRModel, date_range=("01Jan2022", "30Jun2022"), tau=1440)
# Show summary
dyn.summary()
[2]:
Start | End | Rt | rho | sigma | 1/beta [day] | 1/gamma [day] | |
---|---|---|---|---|---|---|---|
Phase | |||||||
0th | 2022-01-01 | 2022-06-30 | 2.67 | 0.2 | 0.075 | 5 | 13 |
1-2. Define the 1st phase
Get the current settings of initial values of variables and ODE parameter values as a pandas.DataFrame
. Then, edit it and register it to the instance.
[3]:
before_df = dyn.register()
before_df
[3]:
Susceptible | Infected | Recovered | Fatal | rho | sigma | |
---|---|---|---|---|---|---|
Date | ||||||
2022-01-01 | 999000.0 | 1000.0 | 0.0 | 0.0 | 0.2 | 0.075 |
2022-01-02 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-01-03 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-01-04 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-01-05 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
... | ... | ... | ... | ... | ... | ... |
2022-06-26 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-06-27 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-06-28 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-06-29 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-06-30 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
181 rows × 6 columns
Without NAs:
[4]:
before_df.dropna(how="all")
[4]:
Susceptible | Infected | Recovered | Fatal | rho | sigma | |
---|---|---|---|---|---|---|
Date | ||||||
2022-01-01 | 999000.0 | 1000.0 | 0.0 | 0.0 | 0.2 | 0.075 |
Edit the dataframe to set the ODE parameter values of the 1st phase.
[5]:
setting_df = dyn.register()
setting_df.loc["01Mar2022", ["rho", "sigma"]] = [0.4, 0.075]
setting_df
[5]:
Susceptible | Infected | Recovered | Fatal | rho | sigma | |
---|---|---|---|---|---|---|
Date | ||||||
2022-01-01 | 999000.0 | 1000.0 | 0.0 | 0.0 | 0.2 | 0.075 |
2022-01-02 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-01-03 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-01-04 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-01-05 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
... | ... | ... | ... | ... | ... | ... |
2022-06-26 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-06-27 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-06-28 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-06-29 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
2022-06-30 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
181 rows × 6 columns
Without NAs:
[6]:
setting_df.dropna(how="all")
[6]:
Susceptible | Infected | Recovered | Fatal | rho | sigma | |
---|---|---|---|---|---|---|
Date | ||||||
2022-01-01 | 999000.0 | 1000.0 | 0.0 | 0.0 | 0.2 | 0.075 |
2022-03-01 | <NA> | <NA> | <NA> | <NA> | 0.4 | 0.075 |
Register the new settings. Because initial values on 01Mar2022 will be calculated with analytical solution of the 0th phase, they were not registered manually.
[7]:
dyn.register(setting_df)
# Show summary
dyn.summary()
[7]:
Start | End | Rt | rho | sigma | 1/beta [day] | 1/gamma [day] | |
---|---|---|---|---|---|---|---|
Phase | |||||||
0th | 2022-01-01 | 2022-02-28 | 2.67 | 0.2 | 0.075 | 5 | 13 |
1st | 2022-03-01 | 2022-06-30 | 5.33 | 0.4 | 0.075 | 2 | 13 |
High-lightened:
[8]:
dyn.summary().style.highlight_max(subset=["Rt", "rho", "1/beta [day]"])
[8]:
Start | End | Rt | rho | sigma | 1/beta [day] | 1/gamma [day] | |
---|---|---|---|---|---|---|---|
Phase | |||||||
0th | 2022-01-01 00:00:00 | 2022-02-28 00:00:00 | 2.670000 | 0.200000 | 0.075000 | 5 | 13 |
1st | 2022-03-01 00:00:00 | 2022-06-30 00:00:00 | 5.330000 | 0.400000 | 0.075000 | 2 | 13 |
With the same way, we can set additional phases, if necessary for your analysis.
1-3. Track ODE parameter values
Confirm the phase-dependent ODE parameter values and Rt (reproduction number) by plotting.
[9]:
df = dyn.track()
cs.line_plot(
df["Rt"],
title=f"Sample data of {dyn.model_name}: phase-dependent Rt values",
ylabel="Reproduction number",
show_legend=False,
v=dyn.start_dates()[1:]
)
df.loc["27Feb2022": "02Mar2022"].style.highlight_max(subset=["Rt", "rho", "1/beta [day]"])
[9]:
Rt | rho | sigma | 1/beta [day] | 1/gamma [day] | |
---|---|---|---|---|---|
Date | |||||
2022-02-27 00:00:00 | 2.670000 | 0.200000 | 0.075000 | 5 | 13 |
2022-02-28 00:00:00 | 2.670000 | 0.200000 | 0.075000 | 5 | 13 |
2022-03-01 00:00:00 | 5.330000 | 0.400000 | 0.075000 | 2 | 13 |
2022-03-02 00:00:00 | 5.330000 | 0.400000 | 0.075000 | 2 | 13 |
1-4. Simulation
With Dynamics().simulate()
, simulate the phase-dependent ODE model with numerical analysis of each phase. Because we did not specify the initial values on the start date of the 1st phase, simulated values on the 1st phase’s start date will be used as the initial values of the 1st phase.
[10]:
df = dyn.simulate(model_specific=True)
cs.line_plot(df, title=f"Two-phase {dyn.model_name}", v=dyn.start_dates()[1:])
When we need convert model-specific variables to model-free variables (Susceptible/Infected/Fatal/Recovered), we will set model_specific=False
(default). SIR model cannot separate “Fatal” cases and “Recovered” cases into different groups. Please use SIR-D model or SIR-F model if the following role is not acceptable for your analysis.
“Fatal” is always equal to 0
“Recovered” is equal to “Fatal or Recovered” of SIR model
[11]:
df = dyn.simulate(model_specific=False)
cs.line_plot(df, title=f"Two-phase {dyn.model_name}", v=dyn.start_dates()[1:])
1-5. Observable variables
Actually, observable variables are Population/Confirmed/Infected/Recovered. We can calculate Population and Confirmed as follows.
Confirmed = Infected + Fatal + Recovered
Population = Susceptible + Confirmed
[12]:
real_df = dyn.simulate(model_specific=False)
real_df["Confirmed"] = real_df[["Infected", "Fatal", "Recovered"]].sum(axis=1)
real_df["Population"] = real_df[["Susceptible", "Confirmed"]].sum(axis=1)
real_df = real_df.loc[:, ["Population", "Confirmed", "Recovered", "Fatal"]]
cs.line_plot(real_df, title=f"Two-phase {dyn.model_name} with observable variables", v=dyn.start_dates()[1:])
2. S-R change point analysis
S-R change point analysis (S-R trend analysis) is a technique to find the change points of phases with actual time-series data.
2-1. S-R plane
Author of the Kaggle Notebook considered S-R plane. This idea (S-R plane) is from Balkew, Teshome Mogessie, “The SIR Model When S(t) is a Multi-Exponential Function.” (2010).Electronic Theses and Dissertations.Paper 1747. This is for simplest SIR model, but we can apply it to SIR-F model.
We will focus on the following equations of SIR-derived ODE models.
\begin{align*} & \frac{\mathrm{d}S}{\mathrm{d}T}= - \frac{\beta}{N} S I \\ & \frac{\mathrm{d}R}{\mathrm{d}T}= \gamma I \\ \end{align*}
When \(I > 0\), this leads to \begin{align*} & \frac{\mathrm{d}S}{\mathrm{d}R}= - \frac{\beta}{N \gamma} S \\ & \beta, \gamma, N: const. \end{align*}
Because \(S(0) = N\), \(S\) is a function of \(R\) as follows.
[13]:
S = sympy.symbols("S", cls=sympy.Function)
N, R = sympy.symbols("N R", positive=True)
beta, gamma = sympy.symbols(r"\beta \gamma", positive=True)
dS = - beta / (N * gamma) * S(R)
sr = sympy.dsolve(S(R).diff(R) - dS, hint="separable", ics={S(0): N})
sr
[13]:
2-2. log(S) vs. R plane
Then,
[14]:
sympy.Eq(sympy.simplify(sympy.log(sr.lhs)), sympy.simplify(sympy.log(sr.rhs)))
[14]:
Where \(a=\cfrac{\beta}{N \gamma} \ (> 0)\), \begin{align*} & \log S_{(R)} = - a R + \log N \\ & N: const. \end{align*}
\(\log S\) decreases constantly with increase of \(R\), when the records follow a SIR-derived model and the parameter values of the model are constant. Plot of \((x, y) = (R, \log S)\) shows a line.
[15]:
model = deepcopy(cs.SIRFModel)
sirf = model.from_sample()
solved_df = sirf.solve()
sr_df = model.sr(model.inverse_transform(solved_df))
corr = sr_df["log10(S)"].corr(sr_df["R"]).round(5)
cs.line_plot(
sr_df.set_index("R"),
title=f"{model.name()}: logS - R plane (corr: {corr})",
xlabel="R", xlim=(0, None),
ylabel="log10(S)", ylim=(None, None),
show_legend=False
)
Initial values on the first date was set, but the summary shows just a phase with no set ODE parameter values.
2-3. Plane of phase-dependent models
Correlation coefficient is about -1.0 when ODE parameter values are constant The slope of the line may change when the ODE parameter values are changed.
Sample data of phase-dependent SIR-F model is here.
[16]:
dyn = cs.Dynamics.from_sample(cs.SIRFModel, date_range=("01Jan2022", "30Jun2022"))
# Set some phases by changing ODE parameter values
df = dyn.register()
parameters = ["theta", "kappa", "rho", "sigma"]
df.loc["18Feb2022", parameters] = [0.0002, 0.005, 0.40, 0.075]
df.loc["24Feb2022", parameters] = [0.0002, 0.005, 0.15, 0.075]
df.loc["01Mar2022", parameters] = [0.0002, 0.003, 0.40, 0.075]
df.loc["08Mar2022", parameters] = [0.0002, 0.005, 0.40, 0.150]
df.loc["15Mar2022", parameters] = [0.0000, 0.005, 0.40, 0.150]
dyn.register(df)
# Show summary
dyn.summary()[["Start", "End", "Rt", *parameters]].style.format(precision=4).background_gradient()
[16]:
Start | End | Rt | theta | kappa | rho | sigma | |
---|---|---|---|---|---|---|---|
Phase | |||||||
0th | 2022-01-01 00:00:00 | 2022-02-17 00:00:00 | 2.5000 | 0.0020 | 0.0050 | 0.2000 | 0.0750 |
1st | 2022-02-18 00:00:00 | 2022-02-23 00:00:00 | 5.0000 | 0.0002 | 0.0050 | 0.4000 | 0.0750 |
2nd | 2022-02-24 00:00:00 | 2022-02-28 00:00:00 | 1.8700 | 0.0002 | 0.0050 | 0.1500 | 0.0750 |
3rd | 2022-03-01 00:00:00 | 2022-03-07 00:00:00 | 5.1300 | 0.0002 | 0.0030 | 0.4000 | 0.0750 |
4th | 2022-03-08 00:00:00 | 2022-03-14 00:00:00 | 2.5800 | 0.0002 | 0.0050 | 0.4000 | 0.1500 |
5th | 2022-03-15 00:00:00 | 2022-06-30 00:00:00 | 2.5800 | 0.0000 | 0.0050 | 0.4000 | 0.1500 |
Dynamics:
[17]:
model = deepcopy(cs.SIRFModel)
df = dyn.simulate(model_specific=True)
cs.line_plot(
df[["Susceptible", "Recovered"]],
color_dict={"Susceptible": "blue", "Recovered": "green"},
title=f"Phase-dependent {model.name()}: dynamics",
v=dyn.start_dates()[1:],
bbox_to_anchor=(0.5, -0.3), bbox_loc="lower center",
)
Plot of \((R, \log _{10} S)\): with vertical lines on change points (= start dates of phases 1st - last):
[18]:
model = deepcopy(cs.SIRFModel)
# Get (R, log10S)
sr_df = model.sr(dyn.simulate())
# Plotting
cs.line_plot(
sr_df.set_index("R"),
title=f"Phase-dependent {model.name()}: logS - R plane\nwith vertical lines on change points",
xlabel="R", xlim=(0, None),
ylabel="log10(S)", ylim=(None, None),
show_legend=False,
v=[sr_df.loc[d, "R"] for d in dyn.start_dates()[1:]],
)
With the plotting, we can confirm most of the change points (except for 5th - 6th: difficult because the change was for \(\alpha_1\)).
2.4 Time-series segmentation
The previous sub-sections demonstrated that change points of S-R trend suggest change points of phases. Here, we will detect the change points of S-R trend with Dynamics().detect()
via Dynamics.segment()
. They uses ruptures library: change point detection in Python for off-line change point detection.
We will prepare a dataset which has the 0th - 5th phase.
[19]:
dyn = cs.Dynamics.from_sample(cs.SIRFModel, date_range=("01Jan2022", "30Jun2022"), tau=1440)
# Set some phases by changing ODE parameter values
df = dyn.register()
parameters = ["theta", "kappa", "rho", "sigma"]
df.loc["18Feb2022", parameters] = [0.0002, 0.005, 0.40, 0.075]
df.loc["24Feb2022", parameters] = [0.0002, 0.005, 0.15, 0.075]
df.loc["01Mar2022", parameters] = [0.0002, 0.003, 0.40, 0.075]
df.loc["08Mar2022", parameters] = [0.0002, 0.005, 0.40, 0.150]
df.loc["15Mar2022", parameters] = [0.0000, 0.005, 0.40, 0.150]
dyn.register(df)
# Simulation
sim_df = dyn.simulate(model_specific=False)
cs.line_plot(
sim_df,
title=f"Prepared dataset with phases and {dyn.model_name}",
bbox_to_anchor=(0.5, -0.3),
)
Create an instance of Dynamics
with the prepared data. Note that the instance does not know the change points of phases.
[20]:
dyn_sim = cs.Dynamics.from_data(model=cs.SIRFModel, data=sim_df, tau=None, name="Sample data")
# Show summary
display(dyn_sim.summary())
Start | End | |
---|---|---|
Phase | ||
0th | 2022-01-01 | 2022-06-30 |
Dynamics().detect()
performs S-R change point analysis (trend analysis) and show the trend on S-R plane. Dot lines are the change points of phases.
With algo
argument, we can select the algorithm and models to detect the change points from “Binseg-normal” (default), “Pelt-rbf”, “Binseg-rbf”, “BottomUp-rbf”, “BottomUp-normal”. Please refer to the documentation of ruptures library.
Additionally, we can specify the minimum size of phases with argument min_size=7
(as default, [days]).
[21]:
points, fit_df = dyn_sim.detect(algo="Binseg-rbf", min_size=7)
# List of change points
pprint(points)
# Plotted values
display(fit_df)
[Timestamp('2022-02-11 00:00:00'),
Timestamp('2022-02-20 00:00:00'),
Timestamp('2022-03-04 00:00:00'),
Timestamp('2022-03-14 00:00:00')]
Actual | 0th | 1st | 2nd | 3rd | 4th | Fitted | |
---|---|---|---|---|---|---|---|
R | |||||||
0 | 5.999565 | 5.999569 | NaN | NaN | NaN | NaN | 5.999569 |
80 | 5.999473 | 5.999476 | NaN | NaN | NaN | NaN | 5.999476 |
169 | 5.999369 | 5.999373 | NaN | NaN | NaN | NaN | 5.999373 |
271 | 5.999252 | 5.999255 | NaN | NaN | NaN | NaN | 5.999255 |
385 | 5.999120 | 5.999123 | NaN | NaN | NaN | NaN | 5.999123 |
... | ... | ... | ... | ... | ... | ... | ... |
910075 | 4.652981 | NaN | NaN | NaN | NaN | 4.652989 | 4.652989 |
910076 | 4.652981 | NaN | NaN | NaN | NaN | 4.652988 | 4.652988 |
910077 | 4.652981 | NaN | NaN | NaN | NaN | 4.652986 | 4.652986 |
910078 | 4.652981 | NaN | NaN | NaN | NaN | 4.652985 | 4.652985 |
910079 | 4.652981 | NaN | NaN | NaN | NaN | 4.652984 | 4.652984 |
153 rows × 7 columns
Note that detected phases are un-registered to the instance.
[22]:
dyn_sim.summary()
[22]:
Start | End | |
---|---|---|
Phase | ||
0th | 2022-01-01 | 2022-06-30 |
To perform time-series segmentation with S-R trend analysis, we will use Dynamics().segment()
, which accepts all keyword arguments of Dynamics().detect()
.
[23]:
dyn_sim.segment(algo="Binseg-rbf", min_size=7)
# Show summary
dyn_sim.summary()
[23]:
Start | End | |
---|---|---|
Phase | ||
0th | 2022-01-01 | 2022-02-10 |
1st | 2022-02-11 | 2022-02-19 |
2nd | 2022-02-20 | 2022-03-03 |
3rd | 2022-03-04 | 2022-03-13 |
4th | 2022-03-14 | 2022-06-30 |
If we need to add new change points or over-write change points, we will use Dynamics().segment(points=<list>, overwrite=<bool>)
.
[24]:
# Over-write
dyn_sim.segment(points=["01Mar2022", "01Jun2022"], overwrite=True)
# Add
dyn_sim.segment(points=["01May2022"], overwrite=False)
# Show summary
dyn_sim.summary()
[24]:
Start | End | |
---|---|---|
Phase | ||
0th | 2022-01-01 | 2022-02-28 |
1st | 2022-03-01 | 2022-04-30 |
2nd | 2022-05-01 | 2022-05-31 |
3rd | 2022-06-01 | 2022-06-30 |
3. ODE parameter estimation
This section demonstrates how to estimate tau value (constant for all phases) and the ODE parameter values of phases.
3.1 Preparation
Create an Dynamics()
instance which has a dataset with 6 phases (the 0th - 5th). The instance knows change points, but does not know tau value and ODE parameter values.
Create dataset.
[25]:
dyn = cs.Dynamics.from_sample(cs.SIRFModel, date_range=("01Jan2022", "30Jun2022"), tau=1440)
# Set some phases by changing ODE parameter values
df = dyn.register()
parameters = ["theta", "kappa", "rho", "sigma"]
df.loc["18Feb2022", parameters] = [0.0002, 0.005, 0.40, 0.075]
df.loc["24Feb2022", parameters] = [0.0002, 0.005, 0.15, 0.075]
df.loc["01Mar2022", parameters] = [0.0002, 0.003, 0.40, 0.075]
df.loc["08Mar2022", parameters] = [0.0002, 0.005, 0.40, 0.150]
df.loc["15Mar2022", parameters] = [0.0000, 0.005, 0.40, 0.150]
dyn.register(df)
# Simulation
sim_df = dyn.simulate(model_specific=False)
cs.line_plot(
sim_df,
title=f"Prepared dataset with phases and {dyn.model_name}",
bbox_to_anchor=(0.5, -0.3),
)
Create Dynamics()
instance.
[26]:
dyn_est = cs.Dynamics.from_data(cs.SIRFModel, data=sim_df, tau=None, name="Sample data")
dyn_est.segment(points=["18Feb2022", "24Feb2022", "01Mar2022", "08Mar2022", "15Mar2022"])
# Show tau value
print(f"Tau value [min]: {dyn_est.tau or 'un-set'}")
# Show summary
display(dyn_est.summary())
Tau value [min]: un-set
Start | End | |
---|---|---|
Phase | ||
0th | 2022-01-01 | 2022-02-17 |
1st | 2022-02-18 | 2022-02-23 |
2nd | 2022-02-24 | 2022-02-28 |
3rd | 2022-03-01 | 2022-03-07 |
4th | 2022-03-08 | 2022-03-14 |
5th | 2022-03-15 | 2022-06-30 |
3.2 Tau estimation
Before ODE parameter estimation, we need to select tau value [min]. Tau value is expected to be constant for all phases because tau value is a parameter used to convert actual time (with unit [min]) to time steps (without units) from divisors of 1440).
Dynamics().estimate_tau(metric="RMSLE", q=0.5, digits=None, n_jobs=None)
(as default) sets the best tau value for the registered data, estimating ODE parameters with quantiles.
Arguments:
metric (str): metric name for scoring when selecting best tau value
q (float): the quantiles to compute, values between (0, 1)
digits (int or None): effective digits of ODE parameter values or None (skip rounding)
n_jobs (int or None): the number of parallel jobs or None (CPU count)
Metrics (selectable candidates of metric
):
ME: maximum residual error
MAE: mean absolute error
MSE: mean square error
MSLE: mean squared logarithmic error
MAPE: mean absolute percentage error
RMSE: root mean squared error
RMSLE: root mean squared logarithmic error
R2: the coefficient of determination
For example, RMSLE score will be calculated with the following equation. \begin{align*} \mathrm{RMSLE} = \sqrt{\cfrac{1}{n}\sum_{i=1}^{n}(log_{10}(A_{i} + 1) - log_{10}(P_{i} + 1))^2} \end{align*} Where \(A\) is the observed (actual) values, \(P\) is estimated (predicted) values. Variables are \(S (i=1), I (i=2), R (i=3)\ \mathrm{and}\ F (i=n=4)\) for SIR-F model. When RMSLE score is low, hyperparameter estimation is highly accurate. Please refer to external sites, including Medium: What’s the Difference Between RMSE and RMSLE?.
[27]:
metric = "RMSLE"
tau, score_df = dyn_est.estimate_tau(metric=metric)
print(f"Tau value [min]: {dyn_est.tau or 'un-set'}")
score_df.plot.line(
title=f"Tau estimation: {metric} scores (selected tau: {tau} [min])",
xlabel="Tau value [min]", ylabel=metric, legend=False
);
Tau value [min]: 48
3.3 ODE parameter estimation
Dynamics().estimate_params(metric="RMSLE", digits=None, n_jobs=None, **kwargs)
(as default) sets ODE parameter values optimized for the registered data with hyperparameter optimization using Optuna library.
Arguments of Dynamics().estimate_params
:
metric (str): metric name for scoring when optimizing ODE parameter values of phases
digits (int or None): effective digits of ODE parameter values or None (skip rounding)
n_jobs (int or None): the number of parallel jobs or None (CPU count)
**kwargs: keyword arguments of optimization
quantiles (tuple(int, int)): quantiles to cut parameter range, like confidence interval, (0.1, 0.9) as default
timeout (int): timeout of optimization, 180 as default
timeout_iteration (int): timeout of one iteration, 1 as default
tail_n (int): the number of iterations to decide whether score did not change for the last iterations, 4 as default
allowance (tuple(float, float)): the allowance of the max predicted values, (0.99, 1.01) as default
pruner (str): kind of pruner (hyperband, median, threshold or percentile), “threshold” as default
upper (float): works for “threshold” pruner, intermediate score is larger than this value, it prunes, 0.5 as default
percentile (float): works for “Percentile” pruner, the best intermediate value is in the bottom percentile among trials, it prunes, 50 as default
constant_liar (bool): whether use constant liar to reduce search effort or not, False as default
[28]:
est_df = dyn_est.estimate_params()
# Show RMSLE scores pf phases
est_df = est_df.drop_duplicates()
est_df.index.name = "Start"
display(est_df)
2024-10-03 at 13:15:33 | INFO |
<SIR-F Model: parameter estimation>
2024-10-03 at 13:15:33 | INFO | Running optimization with 4 CPUs...
2024-10-03 at 13:16:00 | INFO | Completed optimization. Total: 0 min 27 sec
theta | kappa | rho | sigma | RMSLE | Trials | Runtime | |
---|---|---|---|---|---|---|---|
Start | |||||||
2022-01-01 | 0.000537 | 0.000176 | 0.006618 | 0.002462 | 0.010762 | 127 | 0 min 8 sec |
2022-02-24 | 0.000056 | 0.000212 | 0.006508 | 0.00317 | 0.037947 | 203 | 0 min 9 sec |
2022-03-01 | 0.000165 | 0.000115 | 0.016231 | 0.002884 | 0.054854 | 234 | 0 min 11 sec |
2022-03-15 | 0.263358 | 0.000016 | 0.013188 | 0.004979 | 0.015927 | 107 | 0 min 9 sec |
2022-03-08 | 0.000003 | 0.000205 | 0.016925 | 0.00611 | 0.061414 | 389 | 0 min 14 sec |
2022-02-18 | 0.000117 | 0.000186 | 0.015258 | 0.002826 | 0.047217 | 623 | 0 min 27 sec |
[29]:
dyn_est.summary()
[29]:
Start | End | Rt | theta | kappa | rho | sigma | alpha1 [-] | 1/alpha2 [day] | 1/beta [day] | 1/gamma [day] | |
---|---|---|---|---|---|---|---|---|---|---|---|
Phase | |||||||||||
0th | 2022-01-01 | 2022-02-17 | 2.51 | 0.000537 | 0.000176 | 0.006618 | 0.002462 | 0.001 | 189 | 5 | 14 |
1st | 2022-02-18 | 2022-02-23 | 5.06 | 0.000117 | 0.000186 | 0.015258 | 0.002826 | 0.0 | 179 | 2 | 12 |
2nd | 2022-02-24 | 2022-02-28 | 1.92 | 0.000056 | 0.000212 | 0.006508 | 0.00317 | 0.0 | 157 | 5 | 11 |
3rd | 2022-03-01 | 2022-03-07 | 5.41 | 0.000165 | 0.000115 | 0.016231 | 0.002884 | 0.0 | 289 | 2 | 12 |
4th | 2022-03-08 | 2022-03-14 | 2.68 | 0.000003 | 0.000205 | 0.016925 | 0.00611 | 0.0 | 163 | 2 | 5 |
5th | 2022-03-15 | 2022-06-30 | 1.94 | 0.263358 | 0.000016 | 0.013188 | 0.004979 | 0.263 | 2107 | 3 | 7 |
4. With actual data
This section demonstrates how to divide actual time-series data with phases (“S-R change point analysis”) and to estimate ODE parameter values of phases (“ODE parameter estimation”).
4-1. Data preparation
Prepare actual prefecture-level time-series data in Japan for demonstration.
Download the dataset.
[30]:
eng = cs.DataEngineer()
eng.download(country=None, databases=["japan", "covid19dh"]);
Data cleaning, transformation, subsetting, complement.
[31]:
eng.clean()
eng.transform()
actual_df, status, _ = eng.subset(geo=("Japan",), variables="SIRF", complement=True)
print(status)
actual_df.tail()
monotonic increasing complemented confirmed data and
monotonic increasing complemented fatal data and
fully complemented recovered data
[31]:
Susceptible | Infected | Recovered | Fatal | |
---|---|---|---|---|
Date | ||||
2023-05-04 | 92738009.0 | 183209 | 33533260 | 74622 |
2023-05-05 | 92732198.0 | 177397 | 33544864 | 74641 |
2023-05-06 | 92725964.0 | 171546 | 33556937 | 74653 |
2023-05-07 | 92711524.0 | 175456 | 33567458 | 74662 |
2023-05-08 | 92702197.0 | 174762 | 33577464 | 74677 |
Plotting: removed “Susceptible” from the figure because “Susceptible” >> max(“Infected”, “Infected”, “Recovered”).
[32]:
cs.line_plot(actual_df.drop("Susceptible", axis=1), "Actual data in Japan")
With the data, create an instance of Dynamics
class.
[33]:
dyn_act = cs.Dynamics.from_data(model=cs.SIRFModel, data=actual_df, name="Japan")
# Show registered values
dyn_act.register().tail()
[33]:
Susceptible | Infected | Recovered | Fatal | theta | kappa | rho | sigma | |
---|---|---|---|---|---|---|---|---|
Date | ||||||||
2023-05-04 | 92738009.0 | 183209.0 | 33533260.0 | 74622.0 | <NA> | <NA> | <NA> | <NA> |
2023-05-05 | 92732198.0 | 177397.0 | 33544864.0 | 74641.0 | <NA> | <NA> | <NA> | <NA> |
2023-05-06 | 92725964.0 | 171546.0 | 33556937.0 | 74653.0 | <NA> | <NA> | <NA> | <NA> |
2023-05-07 | 92711524.0 | 175456.0 | 33567458.0 | 74662.0 | <NA> | <NA> | <NA> | <NA> |
2023-05-08 | 92702197.0 | 174762.0 | 33577464.0 | 74677.0 | <NA> | <NA> | <NA> | <NA> |
[34]:
# Show summary
dyn_act.summary()
[34]:
Start | End | |
---|---|---|
Phase | ||
0th | 2020-02-06 | 2023-05-08 |
4-2. Time-series segmentation
With Dynamics.segment()
, we will apply S-R change point analysis (trend analysis) to the prepared actual data.
[35]:
dyn_act.segment();
# Show summary
dyn_act.summary().T
[35]:
Phase | 0th | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | ... | 58th | 59th | 60th | 61st | 62nd | 63rd | 64th | 65th | 66th | 67th |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Start | 2020-02-06 | 2020-08-11 | 2020-11-17 | 2020-12-25 | 2021-01-17 | 2021-02-11 | 2021-04-04 | 2021-04-22 | 2021-05-07 | 2021-05-25 | ... | 2023-01-24 | 2023-01-31 | 2023-02-08 | 2023-02-20 | 2023-03-01 | 2023-03-13 | 2023-03-28 | 2023-04-08 | 2023-04-19 | 2023-04-28 |
End | 2020-08-10 | 2020-11-16 | 2020-12-24 | 2021-01-16 | 2021-02-10 | 2021-04-03 | 2021-04-21 | 2021-05-06 | 2021-05-24 | 2021-07-01 | ... | 2023-01-30 | 2023-02-07 | 2023-02-19 | 2023-02-28 | 2023-03-12 | 2023-03-27 | 2023-04-07 | 2023-04-18 | 2023-04-27 | 2023-05-08 |
2 rows × 68 columns
4-3. Parameter estimation
Dynamics().estimate(**kwargs)
performs Dynamics().estimate_tau()
and Dynamics().estimate_params()
in order.
[36]:
dyn_act.estimate()
print(f"Tau value [min]: {dyn_act.tau or 'un-set'}")
# Show summary
dyn_act.summary()
2024-10-03 at 13:16:51 | INFO |
<SIR-F Model: parameter estimation>
2024-10-03 at 13:16:51 | INFO | Running optimization with 4 CPUs...
2024-10-03 at 13:16:51 | INFO | Running optimization with 4 CPUs...
2024-10-03 at 13:19:20 | INFO | Completed optimization. Total: 2 min 29 sec
Tau value [min]: 15
[36]:
Start | End | Rt | theta | kappa | rho | sigma | alpha1 [-] | 1/alpha2 [day] | 1/beta [day] | 1/gamma [day] | |
---|---|---|---|---|---|---|---|---|---|---|---|
Phase | |||||||||||
0th | 2020-02-06 | 2020-08-10 | 1.5 | 0.021396 | 0.000051 | 0.00117 | 0.000712 | 0.021 | 204 | 9 | 15 |
1st | 2020-08-11 | 2020-11-16 | 0.89 | 0.003129 | 0.000008 | 0.000747 | 0.000831 | 0.003 | 1343 | 14 | 13 |
2nd | 2020-11-17 | 2020-12-24 | 1.74 | 0.000366 | 0.000008 | 0.000755 | 0.000426 | 0.0 | 1359 | 14 | 24 |
3rd | 2020-12-25 | 2021-01-16 | 1.54 | 0.001955 | 0.000007 | 0.000782 | 0.0005 | 0.002 | 1498 | 13 | 21 |
4th | 2021-01-17 | 2021-02-10 | 0.69 | 0.002807 | 0.00001 | 0.000445 | 0.000638 | 0.003 | 993 | 23 | 16 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
63rd | 2023-03-13 | 2023-03-27 | 1.83 | 0.000503 | 0.000002 | 0.000704 | 0.000383 | 0.001 | 6296 | 15 | 27 |
64th | 2023-03-28 | 2023-04-07 | 0.66 | 0.000217 | 0.000002 | 0.000674 | 0.001014 | 0.0 | 4528 | 15 | 10 |
65th | 2023-04-08 | 2023-04-18 | 1.47 | 0.002585 | 0.000001 | 0.001081 | 0.000732 | 0.003 | 8188 | 10 | 14 |
66th | 2023-04-19 | 2023-04-27 | 1.81 | 0.000954 | 0.000002 | 0.000878 | 0.000483 | 0.001 | 6782 | 12 | 22 |
67th | 2023-04-28 | 2023-05-08 | 1.57 | 0.000112 | 0.000001 | 0.000951 | 0.000604 | 0.0 | 7027 | 11 | 17 |
68 rows × 11 columns
4-4. Accuracy evaluation
Compare the simulated results and actual records, and evaluate the differences.
Arguments: - date_range (tuple of (str or pandas.Timestamp or None) or None): range of dates to evaluate or None (the first and the last date) - metric (str): metric to evaluate the difference, “RMSLE” as default - display (bool): whether display figure of comparison or not - kwargs: keyword arguments of covsirphy.compare_plot()
[37]:
metric = "RMSLE"
score = dyn_act.evaluate(metric=metric)
print(f"{metric}: {score:.4} (overall score for time-series segmentation, tau estimation and ODE parameter estimation)")
RMSLE: 0.5651 (overall score for time-series segmentation, tau estimation and ODE parameter estimation)
4-5. Simulation
Simulate the dynamics with the estimated parameter sets of phases.
[38]:
simulated_df = dyn_act.simulate()
[39]:
cs.line_plot(
simulated_df.drop("Susceptible", axis=1),
title="Simulated data in Tokyo/Japan",
v=dyn_act.start_dates()[1:],
bbox_to_anchor=(0.5, -0.3),
)
When we need to simulate for the last 30 days, we can use Dynamics().parse_days(days=-30, ref="last")
.
[40]:
start, end = dyn_act.parse_days(days=-30, ref="last")
cs.line_plot(
simulated_df.drop("Susceptible", axis=1).loc[start: end],
title=f"Simulated: Japan, {start: %Y-%m-%d} - {end: %Y-%m-%d}",
bbox_to_anchor=(0.5, -0.3),
y_integer=True,
)
When we need to perform simulation for th 5th and 6th phase, We can use Dynamics().parse_phases(phases=["5th", "6th"])
).
[41]:
start, end = dyn_act.parse_phases(phases=["5th", "6th"])
cs.line_plot(
simulated_df.drop("Susceptible", axis=1).loc[start: end],
title=f"Simulated: Japan, {start: %Y-%m-%d} - {end: %Y-%m-%d}",
bbox_to_anchor=(0.5, -0.3),
y_integer=True,
)
4-6. Tracking of ODE parameters
Track the estimated ODE parameter values.
[42]:
track_df = dyn_act.track()
track_df.tail()
[42]:
Rt | theta | kappa | rho | sigma | alpha1 [-] | 1/alpha2 [day] | 1/beta [day] | 1/gamma [day] | |
---|---|---|---|---|---|---|---|---|---|
Date | |||||||||
2023-05-04 | 1.57 | 0.000112 | 0.000001 | 0.000951 | 0.000604 | 0.0 | 7027 | 11 | 17 |
2023-05-05 | 1.57 | 0.000112 | 0.000001 | 0.000951 | 0.000604 | 0.0 | 7027 | 11 | 17 |
2023-05-06 | 1.57 | 0.000112 | 0.000001 | 0.000951 | 0.000604 | 0.0 | 7027 | 11 | 17 |
2023-05-07 | 1.57 | 0.000112 | 0.000001 | 0.000951 | 0.000604 | 0.0 | 7027 | 11 | 17 |
2023-05-08 | 1.57 | 0.000112 | 0.000001 | 0.000951 | 0.000604 | 0.0 | 7027 | 11 | 17 |
[43]:
track_df.columns
[43]:
Index(['Rt', 'theta', 'kappa', 'rho', 'sigma', 'alpha1 [-]', '1/alpha2 [day]',
'1/beta [day]', '1/gamma [day]'],
dtype='object')
[44]:
def history(param_name, title=None, h=None):
"""Display the history of ODE parameter values tracked with global `track_df`.
Args:
param_name (str): name of the parameter
title (str or None): figure title or Nome (@param_name will be used)
h (list/tuple[int/float] or None): list of y values of horizontal lines or None
"""
global track_df
cs.line_plot(
track_df.loc[:, param_name],
title=title or f"Tracked {param_name}",
ylabel=param_name,
math_scale=False,
h=h,
show_legend=False,
)
[45]:
history("theta")
[46]:
history("alpha1 [-]")
[47]:
history("kappa")
[48]:
history("1/alpha2 [day]")
[49]:
history("rho")
[50]:
history("1/beta [day]")
[51]:
history("sigma")
[52]:
history("1/gamma [day]")
[53]:
history("Rt", title="Tracked reproduction number", h=1.0)
Thank you!