An ultrafast and highly sensitive Next-Generation Sequencing (NGS) read mapper and methylation extractor.

grev-uv/hpg-methyl

HPG-Methyl

HPG-Methyl is an ultrafast and highly sensitive Next-Generation Sequencing (NGS) read mapper and methylation context extractor. Compared with other current mapping and methylation extraction tools, HPG-Methyl offers better sensitivity and shorter execution times even for long reads.

Since the files generated by HPG-Methyl are fully compatible with the files generated by other popular tools, it can be used as a drop-in replacement to accelerate existing methylation analysis pipelines.

This read-me file contains a short guide to get started quickly with HPG-Methyl. Check the manual pages for more information about building, debugging, extending and using the software.

Building

HPG-Methyl requires a working installation of GCC 4.9.2+, SCons and the following packages:

Library    Ubuntu / Debian        Red Hat / Fedora / CentOS
ZLib       zlib1g-dev             zlib-devel
Curl       libcurl4-gnutls-dev    libcurl-devel
libxml     libxml2-dev            libxml2-devel
ncurses    libncurses5-dev        ncurses-devel
GNU GSL    libgsl0-dev            gsl-devel
check      check                  check-devel
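
As a convenience, the package names in the table above can be collected into a single install command per distribution family. The sketch below only prints the commands rather than running them. The function name `install_command` and its `apt` / `yum` arguments are illustrative, and the package names are copied from the table, so they may differ on newer distribution releases.

```shell
#!/usr/bin/env bash
# Development packages from the table above (Ubuntu / Debian column).
DEB_PACKAGES="zlib1g-dev libcurl4-gnutls-dev libxml2-dev libncurses5-dev libgsl0-dev check"
# The same libraries from the Red Hat / Fedora / CentOS column.
RPM_PACKAGES="zlib-devel libcurl-devel libxml2-devel ncurses-devel gsl-devel check-devel"

# Print (do not run) the install command for the given package manager.
# "apt" and "yum" are illustrative keys, not something HPG-Methyl defines.
install_command() {
    case "$1" in
        apt) echo "sudo apt-get install $DEB_PACKAGES" ;;
        yum) echo "sudo yum install $RPM_PACKAGES" ;;
        *)   echo "unknown package manager: $1" >&2; return 1 ;;
    esac
}

install_command apt
install_command yum
```

Review the printed command before running it; GCC and SCons still have to be installed separately.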

Once all the packages are installed, build a release executable with:

$ scons

Or a debug executable with:

$ scons debug=1

The binary is created in the /bin directory.
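
A quick pre-build check can catch a missing or outdated toolchain before SCons fails part-way through. The following is a minimal sketch, not part of HPG-Methyl: the helper names `version_ge` and `check_toolchain` are made up for illustration, and the version comparison assumes GNU `sort` with the `-V` option.

```shell
#!/usr/bin/env bash
# Succeed when version $1 is greater than or equal to version $2.
# Assumes GNU sort, whose -V option orders version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Check for the tools named in the Building section: SCons and GCC 4.9.2+.
check_toolchain() {
    command -v scons >/dev/null 2>&1 || { echo "scons not found" >&2; return 1; }
    command -v gcc >/dev/null 2>&1 || { echo "gcc not found" >&2; return 1; }
    local gcc_version
    # Recent GCC builds may print only the major version here, which still compares correctly.
    gcc_version="$(gcc -dumpversion)"
    version_ge "$gcc_version" "4.9.2" || { echo "gcc $gcc_version is older than 4.9.2" >&2; return 1; }
}

if check_toolchain; then
    echo "toolchain looks usable"
fi
```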

Running

Before running HPG-Methyl, the BWT index must be built from a FASTA reference genome file. This only needs to be done once per reference genome:

$ hpg-methyl build-index -g <your-fasta-file> -i <index-output-directory> -r 10 --bs-index
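
Because the index only has to be built once per reference genome, a wrapper script can skip the step when an index is already there. The sketch below is one way to do that under a loud assumption: it treats any non-empty index directory as a finished index, which is a heuristic and not something HPG-Methyl guarantees. `HPG_METHYL`, `index_exists` and `build_index_once` are hypothetical names introduced for this example; the `build-index` flags are the ones shown above.

```shell
#!/usr/bin/env bash
# Hypothetical override for the binary location; defaults to hpg-methyl on PATH.
HPG_METHYL="${HPG_METHYL:-hpg-methyl}"

# Heuristic (assumption): a non-empty directory counts as an existing index.
index_exists() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Build the bisulphite BWT index unless the index directory already has content.
build_index_once() {
    local fasta="$1" index_dir="$2"
    if index_exists "$index_dir"; then
        echo "index already present in $index_dir, skipping"
        return 0
    fi
    "$HPG_METHYL" build-index -g "$fasta" -i "$index_dir" -r 10 --bs-index
}
```

A call such as `build_index_once Homo_sapiens.GRCh37.68.dna.fa ./index` would then be safe to repeat in a pipeline. If an earlier build was interrupted, delete the partial directory first, since the check cannot tell a complete index from an incomplete one.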

Once the BWT index has been built, HPG-Methyl can be used to map reads from a FASTQ file:

$ hpg-methyl bs -i <index-directory> -f <fastq-file-path> -o <output-directory> --cpu-threads <thread-count>

Or to map the reads and extract the methylation context status simultaneously:

$ hpg-methyl bs -i <index-directory> -f <fastq-file-path> -o <output-directory> --cpu-threads <thread-count> --write-mcontext
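
When several FASTQ files share one index, the command above can be wrapped in a loop. This is a minimal sketch under stated assumptions: the one-output-directory-per-sample layout, the `.fastq` extension filter, and the names `HPG_METHYL`, `map_with_mcontext` and `map_all` are all choices made for this example. Only the `bs` flags come from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical override for the binary location; defaults to hpg-methyl on PATH.
HPG_METHYL="${HPG_METHYL:-hpg-methyl}"

# Map one FASTQ file and extract the methylation context in the same run.
# The output goes to <out_root>/<file name without .fastq> (layout is an assumption).
map_with_mcontext() {
    local index_dir="$1" fastq="$2" out_root="$3" threads="$4"
    local sample
    sample="$(basename "$fastq" .fastq)"
    "$HPG_METHYL" bs -i "$index_dir" -f "$fastq" -o "$out_root/$sample" \
        --cpu-threads "$threads" --write-mcontext
}

# Map every *.fastq file in a directory, stopping at the first failure.
map_all() {
    local index_dir="$1" fastq_dir="$2" out_root="$3" threads="$4"
    local fastq
    for fastq in "$fastq_dir"/*.fastq; do
        [ -e "$fastq" ] || continue   # skip the unexpanded pattern when nothing matches
        map_with_mcontext "$index_dir" "$fastq" "$out_root" "$threads" || return 1
    done
}
```

For example, `map_all ./index ./reads ./results 8` would process each file in `./reads` with eight threads.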

Example datasets

To test the application, the following public datasets are available on GREV's external SFTP server.

The login details are:

  • Username: anonymous
  • Password: anonymous
  • Hostname: clariano.uv.es

Reference genome

  • Homo sapiens GRCh37 reference genome: sftp://anonymous@clariano.uv.es/datasets/Homo_sapiens.GRCh37.68.dna.fa

Real bisulphite-treated sequences

  • SRR309230 (75 nt, 15 million samples): sftp://anonymous@clariano.uv.es/datasets/real/SRR309230_1_075nt_15M.fastq
  • SRR837425 (100 nt, 15 million samples): sftp://anonymous@clariano.uv.es/datasets/real/SRR837425_1_100nt_15M.fastq

Synthetic bisulphite-treated sequences

  • 100 nt, 4 million samples: sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_100nt_n3_r010.bwa.read1.fastq_convert.fastq
  • 150 nt, 4 million samples: sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_150nt_n3_r010.bwa.read1.fastq_convert.fastq
  • 400 nt, 4 million samples: sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_400nt_n3_r010.bwa.read1.fastq_convert.fastq
  • 800 nt, 4 million samples: sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_800nt_n3_r010.bwa.read1.fastq_convert.fastq
  • 1600 nt, 4 million samples: sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_1600nt_n3_r010.bwa.read1.fastq_convert.fastq
  • 3200 nt, 4 million samples: sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_3200nt_n3_r010.bwa.read1.fastq_convert.fastq
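
The synthetic dataset URLs above differ only in the read length, so they can be generated rather than copied by hand. The sketch below builds the URLs and shows one possible download helper. `SFTP_BASE`, `synthetic_url` and `fetch_dataset` are illustrative names, and the download step assumes a `curl` build with SFTP support, which not every installation has; a plain `sftp` client with the login details above is an alternative.

```shell
#!/usr/bin/env bash
SFTP_BASE="sftp://anonymous@clariano.uv.es/datasets"

# Build the URL of a synthetic dataset from its read length in nucleotides.
# The directory name "synthethic" is spelled as it appears in the URLs above.
synthetic_url() {
    echo "$SFTP_BASE/synthethic/test_4M_${1}nt_n3_r010.bwa.read1.fastq_convert.fastq"
}

# Download one dataset into the current directory.
# Assumption: curl was built with SFTP support; the password is the documented "anonymous".
fetch_dataset() {
    curl --user anonymous:anonymous --remote-name "$1"
}

# Print the URLs of the six synthetic datasets listed above.
for length in 100 150 400 800 1600 3200; do
    synthetic_url "$length"
done
```

A single file could then be fetched with `fetch_dataset "$(synthetic_url 100)"`.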

License

HPG-Methyl is free software licensed under the GNU General Public License version 2. Check the COPYING file for more information.

Contact

Contact any of the following developers for any enquiry: