- C/C++
- MPI 3.0
- User-level checkpoint library
- ULFM (ver 1.0 support)
- GNU/Linux
- Unit tests (CxxTest framework)
- head-2d - Laplace equation solver using the Jacobi iteration method
- n-body - simulation of the motion of particles that interact with one another through physical forces
- midpoint-rule
- monte-carlo
- nprimes
- Rollback recovery - checkpoint/restart based
- Failure detection - ULFM based
- Snapshot creation - hard drive based (in place/via NFS)
- Incremental checkpointing - delta encoding based (XOR operation)
- Additional compression step - zlib based
- Survivability
- Fault tolerance
- Compute redundancy
- Implementing alternative fault-tolerance and recovery methods
- Expanding the set of test samples
- Reducing overhead
- Improving implementation
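
The incremental checkpointing item above (delta encoding via XOR) can be illustrated with a minimal sketch in C. It is not taken from this library's source: the function names `xor_delta`, `xor_restore`, and `count_nonzero` are hypothetical, and the sketch assumes that the previous and current snapshots are plain byte buffers of equal size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the library's API):
 * delta[i] = base[i] XOR current[i].
 * Bytes that did not change between snapshots become zero. */
static void xor_delta(const uint8_t *base, const uint8_t *current,
                      uint8_t *delta, size_t n)
{
    for (size_t i = 0; i < n; i++)
        delta[i] = base[i] ^ current[i];
}

/* Hypothetical helper: XOR is its own inverse, so applying the
 * delta to the base snapshot reproduces the current snapshot. */
static void xor_restore(const uint8_t *base, const uint8_t *delta,
                        uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = base[i] ^ delta[i];
}

/* Hypothetical helper: counts the bytes that actually changed. */
static size_t count_nonzero(const uint8_t *buf, size_t n)
{
    size_t changed = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            changed++;
    return changed;
}
```

When only a small part of the state changes between checkpoints, the delta consists mostly of zero bytes, which is the kind of input a general-purpose compressor such as zlib tends to shrink well.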
This project was implemented as part of my graduate thesis at the Computing Systems department of the Siberian State University of Telecommunications and Information Sciences.
- Graduate student: Vladislav Markov
- Supervisor: Mikhail Kurnosov