HPG-Methyl is an ultrafast and highly sensitive Next-Generation Sequencing (NGS) read mapper and methylation context extractor. Compared with other current read mapping and methylation extraction tools, HPG-Methyl offers higher sensitivity and shorter execution times, even for long reads.
Because its output files are fully compatible with those generated by other popular tools, HPG-Methyl can be used as a drop-in replacement to accelerate existing methylation analysis pipelines.
This README contains a short guide to getting started quickly with HPG-Methyl. Check the manual pages for more information about building, debugging, extending and using the software.
HPG-Methyl requires a working installation of GCC 4.4+, SCons and the following packages:
Library | Ubuntu / Debian | Red Hat / Fedora / CentOS |
---|---|---|
zlib | zlib1g-dev | zlib-devel |
libcurl | libcurl4-gnutls-dev | libcurl-devel |
libxml2 | libxml2-dev | libxml2-devel |
ncurses | libncurses5-dev | ncurses-devel |
GNU GSL | libgsl0-dev | gsl-devel |
Check | check | check-devel |
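The table can be turned into a single install command. The following POSIX shell sketch prints (rather than runs) that command for the current system, guessing the distribution family from whether apt-get, dnf or yum is available. The detection logic and the helper names hpg_deps, detect_family and install_cmd are illustrative assumptions, not part of HPG-Methyl. GCC and SCons are not included and must be installed separately. Package names are copied from the table; on newer releases some of them may have been renamed (for example, recent Ubuntu versions ship GSL as libgsl-dev).

```shell
# Hypothetical helpers: print the command that installs HPG-Methyl's build
# dependencies. Package names are copied verbatim from the README table;
# GCC and SCons are NOT included.

# Print the dependency packages for a distribution family.
hpg_deps() {
    case "$1" in
        debian) echo "zlib1g-dev libcurl4-gnutls-dev libxml2-dev libncurses5-dev libgsl0-dev check" ;;
        redhat) echo "zlib-devel libcurl-devel libxml2-devel ncurses-devel gsl-devel check-devel" ;;
        *) echo "unsupported distribution family: $1" >&2; return 1 ;;
    esac
}

# Assumption: the available package manager identifies the family.
detect_family() {
    if command -v apt-get >/dev/null 2>&1; then
        echo debian
    elif command -v dnf >/dev/null 2>&1 || command -v yum >/dev/null 2>&1; then
        echo redhat
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}

# Print (do not execute) the install command. The family may be given
# explicitly as "debian" or "redhat"; otherwise it is detected.
install_cmd() {
    family=${1:-}
    if [ -z "$family" ]; then
        family=$(detect_family) || return 1
    fi
    pkgs=$(hpg_deps "$family") || return 1
    case "$family" in
        debian) echo "sudo apt-get install -y $pkgs" ;;
        redhat)
            # Prefer dnf on newer Fedora/RHEL, fall back to yum.
            if command -v dnf >/dev/null 2>&1; then pm=dnf; else pm=yum; fi
            echo "sudo $pm install -y $pkgs" ;;
    esac
}
```

Printing the command instead of executing it keeps root access in the user's hands: review the output of install_cmd, then run it, for example with `sh -c "$(install_cmd)"`.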
Once all the packages are installed, build a release executable with:
$ scons
Or a debug executable with:
$ scons debug=1
The binary will be created in the bin directory of the source tree.
Before mapping reads, a BWT index must be built from a reference genome in FASTA format. This only needs to be done once per reference genome:
$ hpg-methyl build-index -g <your-fasta-file> -i <index-output-directory> -r 10 --bs-index
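Because the index only needs to be built once per reference genome, a pipeline script can skip this step when the index already exists. The sketch below wraps the build-index command above in a hypothetical ensure_bs_index shell function. Treating a non-empty index directory as a finished index is an assumption of this sketch (an interrupted build would also be skipped), and the HPG_METHYL variable, used to point at a binary outside the PATH such as bin/hpg-methyl, is a convention invented here.

```shell
# Hypothetical helper: build the bisulfite BWT index for a FASTA reference
# unless the index directory already has content (assumption: a non-empty
# directory is treated as a complete index).
# HPG_METHYL may name the binary, e.g. HPG_METHYL=./bin/hpg-methyl.
ensure_bs_index() {
    fasta=$1
    index_dir=$2
    if [ ! -f "$fasta" ]; then
        echo "reference FASTA not found: $fasta" >&2
        return 1
    fi
    if [ -d "$index_dir" ] && [ -n "$(ls -A "$index_dir")" ]; then
        echo "index directory $index_dir is not empty, skipping build" >&2
        return 0
    fi
    mkdir -p "$index_dir" || return 1
    # Same options as the README command, including -r 10.
    "${HPG_METHYL:-hpg-methyl}" build-index -g "$fasta" -i "$index_dir" -r 10 --bs-index
}
```

For example, `ensure_bs_index Homo_sapiens.GRCh37.68.dna.fa grch37-bs-index` builds the index on the first run and returns immediately on later runs.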
Once the index has been built, HPG-Methyl can map reads from a FASTQ file:
$ hpg-methyl bs -i <index-directory> -f <fastq-file-path> -o <output-directory> --cpu-threads <thread-count>
To map the reads and extract their methylation context at the same time, add the --write-mcontext option:
$ hpg-methyl bs -i <index-directory> -f <fastq-file-path> -o <output-directory> --cpu-threads <thread-count> --write-mcontext
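When several FASTQ files have to be mapped against the same index, the bs command can be run in a loop. The following sketch assumes one output directory per input file, named after the file without its .fastq or .fq extension, and defaults the thread count to the number of online CPUs. The map_samples function and the THREADS, MCONTEXT and HPG_METHYL variables are hypothetical conveniences; only the bs options shown above are passed to HPG-Methyl.

```shell
# Hypothetical helper: map each FASTQ file with "hpg-methyl bs".
# Usage: map_samples <index-dir> <output-root> <fastq>...
# THREADS overrides the thread count; any non-empty MCONTEXT adds
# --write-mcontext; HPG_METHYL may name the binary.
map_samples() {
    if [ $# -lt 3 ]; then
        echo "usage: map_samples <index-dir> <output-root> <fastq>..." >&2
        return 1
    fi
    index_dir=$1
    out_root=$2
    shift 2
    threads=${THREADS:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}
    for fq in "$@"; do
        # One output directory per sample, e.g. reads.fastq -> <output-root>/reads
        sample=$(basename "$fq")
        sample=${sample%.fastq}
        sample=${sample%.fq}
        mkdir -p "$out_root/$sample" || return 1
        # ${MCONTEXT:+...} expands to nothing when MCONTEXT is unset or empty.
        "${HPG_METHYL:-hpg-methyl}" bs -i "$index_dir" -f "$fq" \
            -o "$out_root/$sample" --cpu-threads "$threads" \
            ${MCONTEXT:+--write-mcontext} || return 1
    done
}
```

For example, `MCONTEXT=1 map_samples grch37-bs-index results sample1.fastq sample2.fastq` maps both files and extracts their methylation context into results/sample1 and results/sample2. The loop stops at the first failing file so that a broken run is not silently followed by the rest.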
To test the application, the following public datasets are available on GREV's external SFTP server.
The login details are:
- Username: anonymous
- Password: anonymous
- Hostname: clariano.uv.es
- Homo sapiens GRCh37 reference genome:
  sftp://anonymous@clariano.uv.es/datasets/Homo_sapiens.GRCh37.68.dna.fa
- SRR309230 (real, 75 nt, 15 million reads):
  sftp://anonymous@clariano.uv.es/datasets/real/SRR309230_1_075nt_15M.fastq
- SRR837425 (real, 100 nt, 15 million reads):
  sftp://anonymous@clariano.uv.es/datasets/real/SRR837425_1_100nt_15M.fastq
- Synthetic, 100 nt, 4 million reads:
  sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_100nt_n3_r010.bwa.read1.fastq_convert.fastq
- Synthetic, 150 nt, 4 million reads:
  sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_150nt_n3_r010.bwa.read1.fastq_convert.fastq
- Synthetic, 400 nt, 4 million reads:
  sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_400nt_n3_r010.bwa.read1.fastq_convert.fastq
- Synthetic, 800 nt, 4 million reads:
  sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_800nt_n3_r010.bwa.read1.fastq_convert.fastq
- Synthetic, 1600 nt, 4 million reads:
  sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_1600nt_n3_r010.bwa.read1.fastq_convert.fastq
- Synthetic, 3200 nt, 4 million reads:
  sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_3200nt_n3_r010.bwa.read1.fastq_convert.fastq
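To script the downloads, the OpenSSH sftp client can fetch a remote file into the current directory when given a user@host:path argument; it still prompts for the password (anonymous) and, on the first connection, asks to confirm the server's host key. In the sketch below, the fetch_dataset function name and the SFTP override variable are illustrative assumptions. Paths are given relative to /datasets, exactly as they appear in the URLs above.

```shell
# Hypothetical helper: download one or more test datasets from GREV's SFTP
# server into the current directory using OpenSSH sftp.
# Arguments are paths relative to /datasets,
# e.g. real/SRR309230_1_075nt_15M.fastq
# SFTP may name an alternative client (used here only for testing).
fetch_dataset() {
    for ds in "$@"; do
        "${SFTP:-sftp}" "anonymous@clariano.uv.es:/datasets/$ds" || return 1
    done
}
```

For example, `fetch_dataset Homo_sapiens.GRCh37.68.dna.fa real/SRR309230_1_075nt_15M.fastq` retrieves the GRCh37 reference genome and the 75 nt real dataset, which can then be passed to the build-index and bs commands described earlier.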
HPG-Methyl is free software, licensed under the GNU General Public License version 2. See the COPYING file for more information.
For any enquiries, contact any of the following developers:
- Juanma Orduña (juan.orduna@uv.es).
- Mariano Pérez (mariano.perez@uv.es).
- Ricardo Olanda (ricardo.olanda@uv.es).
- César González (cesar.gonzalez-segura@uv.es).