This repository contains data in CSV format, scraped from reliable sources (e.g. the World Health Organisation).
Data are scraped a few times daily and pushed back to this repository together with generated charts (.PNG files).
Look for the direct CSV links below to get the scraped historical data.
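Because the data are scraped several times a day, a historical CSV may hold more than one snapshot per date. The sketch below shows one way to collapse such a file to the last value seen each day. The column names `datetime` and `confirmed`, the timestamp layout, and the sample rows are all assumptions made for illustration; check the header of the CSV you download and adjust accordingly.

```python
import csv
import io

# Made-up sample rows; `datetime` and `confirmed` are assumed column names,
# not necessarily the ones used by the CSV files in this repository.
SAMPLE_CSV = """datetime,confirmed
2020-03-01 08:00,100
2020-03-01 20:00,120
2020-03-02 08:00,150
"""


def latest_per_day(csv_file):
    """Return {date: last confirmed count seen that day}, in file order."""
    latest = {}
    for row in csv.DictReader(csv_file):
        day = row["datetime"].split(" ")[0]   # keep only the date part
        latest[day] = int(row["confirmed"])   # later rows overwrite earlier ones
    return latest


daily = latest_per_day(io.StringIO(SAMPLE_CSV))
print(daily)
```

For a downloaded file, pass `open(path, newline="")` instead of the `io.StringIO` object.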
NOTE: paused on 1 Jan 2021
Below are international stats, excluding China.
Bar chart of the latest snapshot.
NOTE: paused on 16 Jun 2020 due to format changes from WHO.
Data are scraped from these reports, which are in PDF format. New reports are released daily.
NOTE: paused on 10 July 2020
Data are pulled from the Department of Health website.
Data are scraped from the MOH (Ministry of Health) local situation web page.
NOTE: paused on 27 Jun 2020.
NOTE: paused on 27 Jun 2020 due to format changes from CDC.
Cases in the US (data are scraped from here)
- Till 18 Apr 2020:
- From 18 Apr 2020, the data format on the CDC website changed to include race and age groups.
- From 7 May 2020,
This page has the realtime stats from China. Data are pulled several times a day by the pipeline.
- Jupyter notebooks scrape the data and write the output to CSV files
- These notebooks are executed on a schedule by a GitHub Actions pipeline to scrape new data
- The pipeline also commits the new data back to this repository
- Tools: Python 3, Jupyter, Pandas, BeautifulSoup and related libraries (e.g. Selenium for web scraping). It is recommended to start the development environment with this Docker image, which is also used by the GitHub Actions build pipeline.
docker run -p 8888:8888 -it -v $PWD:/stats -w /stats alext234/datascience:latest bash
- requirements.txt lists the Python dependencies
pip install -r requirements.txt
- Start Jupyter notebook from inside the container with the command below, then open this address in a browser:
http://localhost:8888
jupyter notebook --allow-root --ip=0.0.0.0
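Once the notebook server is running, a scraping cell boils down to "parse a page, write rows to CSV". The sketch below illustrates that flow under stated assumptions: it uses only the Python standard library (`html.parser` and `csv`) as a stand-in so that it runs anywhere, whereas the notebooks in this repository use Pandas and BeautifulSoup. The `TableParser` class, the `table_to_csv` helper and the sample HTML are hypothetical and are not taken from the repository.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Parse the table rows in `html` and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample page; a real notebook would download the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>region</th><th>cases</th></tr>
  <tr><td>A</td><td>10</td></tr>
  <tr><td>B</td><td>25</td></tr>
</table>
"""

csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text)
```

In a real notebook the CSV text would be written to a file in the repository, which the scheduled pipeline then commits.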
- Feel free to create new issues for any potential data source worth scraping.
- Pull requests are welcome!
- Stargazers
- Last update from pipeline
- Pipeline status