Installation

The covsirphy library supports Python 3.7 or newer.
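As a quick sanity check before installing, the interpreter version can be compared with the minimum stated above. This is only a minimal sketch; the helper name python_is_supported is hypothetical and not part of covsirphy.

```python
import sys

# Minimum Python version stated in this guide
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when (major, minor) meets the stated minimum.

    Hypothetical helper for illustration; not a covsirphy function.
    """
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print("Python 3.7 or newer is required for covsirphy.")
```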

Please use covsirphy in a virtual environment (venv, poetry, conda, etc.) because it has many dependencies, as listed under “tool.poetry.dependencies” in pyproject.toml.

If you have any concerns, please create an issue on the CovsirPhy: GitHub Issues page. All discussions are recorded there.

Stable version

The latest stable version of CovsirPhy is available on PyPI (the Python Package Index): covsirphy.

pip install --upgrade covsirphy

Development version

You can find the latest development version in the GitHub repository (CovsirPhy) and install it with the pip command.

pip install --upgrade "git+https://github.com/lisphilar/covid19-sir.git#egg=covsirphy"

If you have time to contribute to the CovsirPhy project, please refer to the Guideline of contribution. Contributions are always welcome!

Installation with Anaconda

Anaconda users can install covsirphy in a conda environment (named “covid”, for example). To avoid version conflicts among dependencies, fiona, ruptures and pip should be installed with the conda command in advance.

conda create -n covid python=3 pip
conda activate covid
conda install -c conda-forge fiona ruptures
pip install --upgrade covsirphy

To exit this conda environment, run conda deactivate.

Dataset preparation

With the DataLoader class, we can download recommended datasets for analysis and save/update them in the local environment. Optionally, you can use your own dataset saved in a CSV file.

All raw datasets are retrieved from public databases. No confidential information is included. If you find any issues, please let us know via the GitHub Issues page.

2. How to request a new data loader

If you want to use a new dataset for your analysis, please let us know via GitHub Issues: Request new method of DataLoader class. Please read the Guideline of contribution in advance.

3. Use a local CSV file which has the number of cases

We can replace the jhu_data instance created by the DataLoader class with your own dataset saved in a CSV file. At this time, covsirphy supports country-level and province-level data.

3.1. Create CountryData instance

First, create a CountryData instance. Let’s say we have a CSV file (“oslo.csv”) with the following columns.

  • “date”: reported dates

  • “confirmed”: the number of confirmed cases

  • “recovered”: the number of recovered cases

  • “fatal”: the number of fatal cases

  • “province”: (optional) province names
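Before handing a file to covsirphy, it may help to confirm that its header contains the columns listed above. The following is a minimal sketch using only the standard library; the helper name check_columns is hypothetical and not part of covsirphy, and it assumes the CSV file has a header row on its first line.

```python
import csv

# Column names taken from the list above
REQUIRED = ("date", "confirmed", "recovered", "fatal")
OPTIONAL = ("province",)


def check_columns(path):
    """Return (missing required columns, whether "province" is present).

    Hypothetical helper for illustration; it only reads the header row.
    """
    with open(path, newline="", encoding="utf-8") as fh:
        header = next(csv.reader(fh), [])
    missing = [name for name in REQUIRED if name not in header]
    has_province = OPTIONAL[0] in header
    return missing, has_province


# Example usage (assumes "oslo.csv" exists in the working directory):
# missing, has_province = check_columns("oslo.csv")
```

If the list of missing columns is empty, the file has every required column, and the second value tells us whether province names are available in the file.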

Optionally, country-level data can be set to the total values of the provinces with the CountryData.register_total() method.

import covsirphy as cs

# Create CountryData instance specifying filename and country name
country_data = cs.CountryData("oslo.csv", country="Norway")
# Specify column names
country_data.set_variables(
    date="date", confirmed="confirmed", recovered="recovered", fatal="fatal", province="province",
)
# (Optional) register total values of provinces as country level data
country_data.register_total()
# Check records: returns a pandas.DataFrame with reset index and
# Date/Country/Province/Confirmed/Infected/Fatal/Recovered columns
country_data.cleaned()

When the CSV file has no province column and all records are for one province, we can specify the province name as follows.

# Create CountryData instance specifying filename and country/province name
country_data = cs.CountryData("oslo.csv", country="Norway", province="Oslo")
# Specify column names except for province
country_data.set_variables(
    date="date", confirmed="confirmed", recovered="recovered", fatal="fatal",
)
# Check records
country_data.cleaned()

When the CSV file has no province column and all records are country-level data, we can skip setting the province name.

# Create CountryData instance specifying filename and country name
country_data = cs.CountryData("oslo.csv", country="Norway")
# Specify column names except for province
country_data.set_variables(
    date="date", confirmed="confirmed", recovered="recovered", fatal="fatal",
)
# Check records
country_data.cleaned()

3.2. Convert to JHUData instance

Then, convert the CountryData instance to a JHUData instance.

# Create JHUData instance using cleaned dataset (pandas.DataFrame)
jhu_data = cs.JHUData.from_dataframe(country_data.cleaned())
# Or, we can use and update the output of DataLoader.jhu()
# jhu_data = data_loader.jhu()
# jhu_data.replace(country_data)

3.3. Set population values

Additionally, you may need to register population values in a PopulationData instance manually.

# Create PopulationData instance with empty dataset
population_data = cs.PopulationData()
# Or, we can use the output of DataLoader.population()
# population_data = data_loader.population()
# Update the population value: province is optional
population_data.update(693494, country="Norway", province="Oslo")

4. Data loading in Kaggle Notebook

We can use the recommended datasets in Kaggle Notebook. The datasets are saved in the “/kaggle/input/” directory. Additionally, we can use Kaggle Datasets (CSV files) with covsirphy in Kaggle Notebook.
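To see which CSV files are available in the notebook, the input directory can be scanned with the standard library. This is a minimal sketch; the helper name list_csv_files is hypothetical, and it assumes the CSV files sit somewhere under “/kaggle/input/”, possibly in subdirectories.

```python
from pathlib import Path


def list_csv_files(root="/kaggle/input"):
    """Return sorted paths of CSV files under root (empty list if root is absent).

    Hypothetical helper for illustration; not part of covsirphy.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    # rglob searches the directory tree recursively
    return sorted(root_path.rglob("*.csv"))


for csv_path in list_csv_files():
    print(csv_path)
```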

Note:
If you have the Kaggle API, you can download Kaggle datasets to your local environment by updating and executing the input.py script. CSV files will be saved in the “/kaggle/input/” directory.

Kaggle API:
Go to the account page of Kaggle and download “kaggle.json” by selecting the “API > Create New API Token” button. Copy the JSON file to the top directory of the local repository or to “~/.kaggle”. Please refer to How to Use Kaggle: Public API and stackoverflow: documentation for Kaggle API within python?
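To confirm that the token file was copied to one of the two locations mentioned above, a small check like the following may help. It is only a sketch: the helper name find_kaggle_token is hypothetical, and it assumes the current working directory is the top directory of the local repository.

```python
from pathlib import Path


def find_kaggle_token(repo_dir=".", home_dir=None):
    """Return the first existing kaggle.json path, or None if neither exists.

    Hypothetical helper for illustration. It checks the repository top
    directory first, then the ".kaggle" directory under the home directory.
    """
    home = Path(home_dir) if home_dir is not None else Path.home()
    candidates = [
        Path(repo_dir) / "kaggle.json",
        home / ".kaggle" / "kaggle.json",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


token_path = find_kaggle_token()
print(token_path if token_path else "kaggle.json was not found")
```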

5. Acknowledgement

In February 2020, the CovsirPhy project started on the Kaggle platform with the notebook COVID-19 data with SIR model, using the following datasets.

Best Regards.