
Automatically scrape data and statistics on the coronavirus to make them easily accessible in CSV format



alext234/coronavirus-stats


Top 10

CSV Data on Coronavirus (COVID-19)

This repository contains data in CSV format, scraped from reliable sources (e.g. the World Health Organisation).

  • Data are scraped a few times daily and pushed back to this repository together with generated charts (.PNG files).

  • Use the "CSV direct link" entries below to get the scraped historical data.
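The CSV files can be read straight from GitHub. A minimal sketch using only the Python standard library; the URL is a placeholder that follows GitHub's raw-file pattern, so `<branch>` and `<path-to-file>` must be taken from the actual CSV direct link. The `date`/`confirmed` columns in the sample are hypothetical.

```python
import csv
import io
import urllib.request

# Placeholder: fill in <branch> and <path-to-file> from a "CSV direct link" above.
CSV_URL = "https://raw.githubusercontent.com/alext234/coronavirus-stats/<branch>/<path-to-file>.csv"


def parse_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_csv(url: str) -> list[dict[str, str]]:
    """Download a CSV file over HTTP(S) and parse it."""
    with urllib.request.urlopen(url) as resp:
        return parse_csv(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline sample with hypothetical columns; use load_csv(CSV_URL) for real data.
    sample = "date,confirmed\n2020-03-01,100\n2020-03-02,120\n"
    rows = parse_csv(sample)
    print(rows[-1])  # {'date': '2020-03-02', 'confirmed': '120'}
```

For analysis, the same URL can also be passed directly to `pandas.read_csv`.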

Aggregate sites

NOTE: paused on 1 Jan 2021

Below are international stats, excluding China.

CSV direct link

Bar chart of the latest snapshot.
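The chart above ranks international figures, excluding China, at the most recent snapshot. A minimal sketch of how such a ranking could be derived from historical rows, assuming hypothetical `date`, `country` and `cases` columns with ISO-formatted dates; the text bars only stand in for the PNG chart the pipeline actually generates.

```python
def latest_snapshot(rows, exclude=("China",), top_n=10):
    """Return the top_n (country, cases) pairs for the most recent date,
    skipping excluded countries, sorted by case count (highest first)."""
    latest = max(r["date"] for r in rows)  # ISO dates sort lexicographically
    snapshot = [
        (r["country"], int(r["cases"]))
        for r in rows
        if r["date"] == latest and r["country"] not in exclude
    ]
    snapshot.sort(key=lambda pair: pair[1], reverse=True)
    return snapshot[:top_n]


def text_bars(snapshot, width=40):
    """Render (country, cases) pairs as a crude horizontal bar chart."""
    peak = max((cases for _, cases in snapshot), default=0) or 1  # avoid /0
    lines = []
    for country, cases in snapshot:
        bar = "#" * round(width * cases / peak)
        lines.append(f"{country:<12} {bar} {cases}")
    return "\n".join(lines)
```

Filtering before sorting keeps excluded countries from occupying any of the `top_n` slots.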

WHO & Government sites

From WHO (World Health Organisation) Situation reports

NOTE: paused on 16 Jun 2020 due to format changes from WHO.

Data are scraped from these reports, which are published as PDF files. New reports are released daily.

Globally confirmed cases

CSV direct link
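Because the WHO reports are PDFs, the text has to be extracted first (with a PDF library, not shown) before figures can be picked out. A minimal sketch of that second step using a regular expression; the headline shape "Globally 80 239 confirmed (1872 new)" and its numbers are made up for illustration, not quoted from a specific report, and real report wording varied between issues.

```python
import re

# Hypothetical headline shape; adjust per report format.
# A number is digits, optionally separated by spaces/commas (e.g. "80 239").
_NUM = r"\d(?:[\d\s,]*\d)?"
GLOBAL_RE = re.compile(
    rf"Globally\s*:?\s*(?P<confirmed>{_NUM})\s+confirmed\s*\(\s*(?P<new>{_NUM})\s+new\s*\)",
    re.IGNORECASE,
)


def parse_global_counts(text):
    """Return {'confirmed': int, 'new': int} from extracted PDF text, or None."""
    m = GLOBAL_RE.search(text)
    if m is None:
        return None
    # Strip thousands separators (spaces, incl. non-breaking, and commas).
    return {name: int(re.sub(r"[\s,]", "", value)) for name, value in m.groupdict().items()}
```

Returning `None` instead of guessing makes a format change visible, which matters given that this scraper was paused after such a change.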

Stats from Australia

NOTE: paused on 10 Jul 2020

Data are pulled from the Department of Health website.

Cases in Australia

CSV direct link
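Government pages like this one are plain HTML, and the project lists BeautifulSoup for parsing them. To stay dependency-free, this sketch swaps in the standard library's `html.parser` and assumes the figures sit in an ordinary `<table>`; the real page structure may differ, and the sample table below is invented.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being built
        self._cell = None  # text fragments of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)  # includes text inside <b>, <span>, ...

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None


def parse_table(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The first returned row is typically the header, so `dict(zip(rows[0], row))` turns each remaining row into a record ready for `csv.DictWriter`.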

Stats from Singapore

Data are scraped from the MOH (Ministry of Health) local situation web page.

Cases in Singapore

NOTE: paused on 27 Jun 2020.

CSV direct link

From US CDC (Centers for Disease Control and Prevention)

NOTE: paused on 27 Jun 2020 due to format changes from CDC.

Cases in the US (data are scraped from here)

  • Until 18 Apr 2020:

CSV direct link

  • From 18 Apr 2020, the data format on the CDC website changed to include race and age groups.

CSV direct link

  • From 7 May 2020:

CSV direct link
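Each CDC layout above has its own CSV. One way a scraper could avoid mixing incompatible rows is to choose the output file from the set of scraped column names and fail loudly on a layout it has not seen. This is a sketch of that idea only; the column names and file names are hypothetical, since the README does not list them.

```python
# Hypothetical layouts -> output files; the real CDC columns are not given here.
SCHEMAS = {
    frozenset({"date", "cases", "deaths"}): "us_cases_v1.csv",
    frozenset({"date", "race", "age_group", "cases"}): "us_cases_v2.csv",
}


def output_file_for(header):
    """Pick the output CSV for a scraped table based on its column set.

    Column order is ignored; an unknown layout raises ValueError so a
    format change stops the scrape instead of corrupting an existing file."""
    try:
        return SCHEMAS[frozenset(header)]
    except KeyError:
        raise ValueError(f"unrecognised schema: {sorted(header)}") from None
```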

Stats from China

This page has real-time stats from China. Data are pulled several times a day by the pipeline.

All cases in China

CSV direct link
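Since the source is real-time and polled several times a day, each run produces a snapshot that has to be appended to the historical CSV. A minimal sketch, assuming hypothetical `timestamp`, `confirmed` and `deaths` columns, that skips the append when nothing changed since the previous run so repeated polls do not create duplicate rows.

```python
import csv
import os
from datetime import datetime, timezone

FIELDS = ["timestamp", "confirmed", "deaths"]  # hypothetical columns


def append_snapshot(path, confirmed, deaths, now=None):
    """Append a timestamped row unless the counts match the last row.

    Returns True if a row was written, False if the data were unchanged."""
    last = None
    if os.path.exists(path):
        with open(path, newline="") as f:
            for last in csv.DictReader(f):
                pass  # keep only the final row
    if (last is not None and int(last["confirmed"]) == confirmed
            and int(last["deaths"]) == deaths):
        return False  # nothing new since the previous run
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    if now is None:
        now = datetime.now(timezone.utc)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow({"timestamp": now.isoformat(timespec="seconds"),
                         "confirmed": confirmed, "deaths": deaths})
    return True
```

Using UTC timestamps keeps rows from different runners comparable regardless of the machine's local time zone.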

How it works

  • Jupyter notebooks scrape the data and write it to CSV files
  • These notebooks are executed on a schedule by a GitHub Actions pipeline to scrape new data
  • The pipeline also commits the new data back to this repository
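The steps above can be sketched as a small driver that a scheduled job might call. This is an illustration, not the repository's actual workflow: it assumes the schedule itself comes from a GitHub Actions `schedule` (cron) trigger, that the notebook file names are passed in, and that the job has permission to push. It uses the real `jupyter nbconvert --execute --inplace` and `git` commands.

```python
import subprocess


def nbconvert_cmd(notebook):
    """Command that executes a notebook and saves its outputs back in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_pipeline(notebooks, message="Update scraped data and charts"):
    """Run each scraping notebook, then commit and push any changed files.

    Returns True if a commit was pushed, False if nothing changed."""
    for nb in notebooks:
        subprocess.run(nbconvert_cmd(nb), check=True)  # stop on the first failure
    subprocess.run(["git", "add", "--all"], check=True)
    # `git diff --cached --quiet` exits with 0 when nothing is staged.
    if subprocess.run(["git", "diff", "--cached", "--quiet"]).returncode == 0:
        return False
    subprocess.run(["git", "commit", "-m", message], check=True)
    subprocess.run(["git", "push"], check=True)
    return True
```

Checking for staged changes first avoids a failing `git commit` on runs where the sources had not been updated.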

Development

  • Tools: Python 3, Jupyter, Pandas, BeautifulSoup and related libraries (e.g. Selenium for web scraping). It is recommended to start the development environment with this Docker image, which is also used by the GitHub Actions build pipeline:
docker run -p 8888:8888 -it -v "$PWD:/stats" -w /stats alext234/datascience:latest bash
pip install -r requirements.txt
  • Start Jupyter Notebook from inside the container, then open http://localhost:8888 in your browser:
jupyter notebook --allow-root --ip=0.0.0.0

Contributions

  • Feel free to open an issue for any data source worth scraping.
  • Pull requests are welcome!

Repo status and stats

  • Stargazers

GitHub stars

  • Last update from pipeline

Last update

  • Pipeline status

Run notebooks and commit back data/charts