This project aims to extract semantic information from a collection of vulnerability reports written in plain English. The information is identified and retrieved by running Named Entity Recognition (NER) on each report's description. The currently available labels are:
- FUNCTION: Vulnerable function name.
- VERSION: Vulnerable version of the target program.
- SOURCECE: Path to the source code that contains the vulnerable function(s).
- DRIVER: Driver that the attacker needs to interact with to trigger the exploit.
- STRUCT: Malformed struct that contains the bug.
- VULNERABILITY: Type of the vulnerability (e.g. buffer overflow).
- CAPABILITY: Capability that the attacker gains after successfully exploiting the vulnerability (e.g. remote code execution).
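To make the label set concrete, here is a minimal sketch of how annotations for one description could be represented as `(start, end, label)` character spans, a layout commonly used for NER training data. The description text, the function name `foo_ioctl`, the path `drivers/foo/bar.c`, and the helper `annotate` are all hypothetical and are not part of this project's API.

```python
# Labels listed above, copied as-is from this README.
LABELS = {
    "FUNCTION", "VERSION", "SOURCECE", "DRIVER",
    "STRUCT", "VULNERABILITY", "CAPABILITY",
}


def annotate(text, phrases):
    """Return sorted (start, end, label) spans for each (phrase, label) pair.

    Hypothetical helper: it locates the first occurrence of each phrase in
    the text and rejects unknown labels or phrases that do not occur.
    """
    spans = []
    for phrase, label in phrases:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found: {phrase!r}")
        spans.append((start, start + len(phrase), label))
    return sorted(spans)


# Invented, CVE-like description used only for illustration.
description = (
    "The foo_ioctl function in drivers/foo/bar.c in the Linux kernel "
    "before 4.14 allows local users to cause a denial of service "
    "via a buffer overflow."
)

entities = annotate(description, [
    ("foo_ioctl", "FUNCTION"),
    ("drivers/foo/bar.c", "SOURCECE"),
    ("4.14", "VERSION"),
    ("denial of service", "CAPABILITY"),
    ("buffer overflow", "VULNERABILITY"),
])

for start, end, label in entities:
    print(f"{label:<14} {description[start:end]}")
```

Character offsets keep the annotation independent of any particular tokenizer, which is why this layout is sketched here; the project itself may store its labels differently.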
The initial version of the project was developed and tested on the list of Common Vulnerabilities and Exposures (CVE) entries for the Linux kernel from 2017 and 2018. The dataset can be found on the CVE Details website.
The dataset is formatted as Comma-Separated Values (CSV), but it has been simplified from its original version: only the description field is taken into account.
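As an illustration of working with such a simplified CSV, the sketch below reads only the description field using Python's standard `csv` module. The column name `description`, the helper `load_descriptions`, and the sample row are assumptions made for this example; the actual file may use a different header.

```python
import csv
import io


def load_descriptions(stream, column="description"):
    """Return the non-empty values of one column from a CSV text stream.

    The column name is an assumption; adjust it to match the real header.
    """
    reader = csv.DictReader(stream)
    return [row[column].strip() for row in reader if row.get(column)]


# In-memory stand-in for the dataset file, with one invented row.
sample = io.StringIO(
    "description\n"
    '"Hypothetical entry: buffer overflow in foo_ioctl, before 4.14."\n'
)

descriptions = load_descriptions(sample)
print(descriptions)
```

For a file on disk, the same function can be called with `open(path, newline="", encoding="utf-8")` as the stream.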
Install the project and all its dependencies with:
pip install cve_analyzer