Org Mode parser re-write in Rust
Org is probably the best and most complete plain text organizational system known to mankind. It has countless applications:
- authoring,
- publishing,
- technical documentation,
- literate programming,
- task and time tracking,
- journalling,
- blogging,
- agendas,
- wikis, and many more.
Org has a limited presence outside of GNU Emacs, and the standard implementation is the one in GNU Emacs, written in Emacs Lisp. Although GNU Emacs is widely available and widely used as a Lisp interpreter, there are applications where using Emacs to properly handle and parse an Org file would be problematic. This package is intended to replicate the exact behaviour of Org as implemented in Emacs Lisp while decoupling the use of Org from Emacs, so that it can be used in a content management system, as a blogging backend, etc.
Many attempts have been made. The most faithful one not written in a Lisp is the implementation in pandoc.
The reason why there have been relatively few attempts is that Org's syntax is not trivial. Most of Org's syntax is context-sensitive, with only a few context-free elements. This makes implementations more complex and harder to test, because unit testing small chunks of Org text does not guarantee that the Org file as a whole is parsed correctly.
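To make the context-sensitivity point concrete, here is a minimal sketch. It is not taken from this project: `LineKind` and `classify` are hypothetical names, and the rules are heavily simplified compared to real Org. It only shows that the same line, `- milk`, is a list item in one place and verbatim text inside an example block, so classifying the line in isolation gives the wrong answer.

```rust
/// Hypothetical, heavily simplified line categories (not the real Org element set).
#[derive(Debug, PartialEq)]
enum LineKind {
    BlockBegin,
    BlockEnd,
    Verbatim,
    ListItem,
    Text,
}

/// Classify every line of `input`, tracking whether we are inside an example block.
/// The `in_block` flag is the "context": without it `- milk` is always a list item.
fn classify(input: &str) -> Vec<LineKind> {
    let mut in_block = false;
    input
        .lines()
        .map(|line| {
            let trimmed = line.trim_start();
            // Block keywords are matched case-insensitively.
            let lower = trimmed.to_ascii_lowercase();
            if in_block {
                if lower.starts_with("#+end_example") {
                    in_block = false;
                    LineKind::BlockEnd
                } else {
                    // Everything inside the block is kept as-is.
                    LineKind::Verbatim
                }
            } else if lower.starts_with("#+begin_example") {
                in_block = true;
                LineKind::BlockBegin
            } else if trimmed.starts_with("- ") {
                LineKind::ListItem
            } else {
                LineKind::Text
            }
        })
        .collect()
}
```

A unit test that feeds only `- milk` to `classify` passes as a list item, yet the same line inside `#+BEGIN_EXAMPLE` must be treated as verbatim text. That is the kind of gap between chunk-level tests and whole-file behaviour described above.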
For this reason, Org parsers such as org-ruby and pandoc have chosen to focus on a restricted subset of Org's syntax. More ambitious projects try to cover all features, but since Org does not have a formal specification [1], they rely on the observed behaviour of Org in Emacs or on the author's intuition. As a result, they rarely get finished.
This project aims to be a faithful one-to-one recreation of the Emacs Lisp code, translated into Rust. As such, it avoids many of the problems of writing an ad-hoc parser and allows for easy adaptation in case the official Org specification (that of the reference parser) changes.
Check out our FAQ for more information. Also feel free to open an issue and/or discussion.
org-rs is guided by the following principles:
- Be a faithful recreation of the original implementation, not a competing standard or implementation.
- Be standalone, embeddable and reusable.
- Adapt to the users' needs, rather than impose adherence to specific ecosystems.
- Be fast.
These are the choices that were made to achieve the goals:
- Use Rust. It's fast, memory safe, and has a healthy package ecosystem. It can also be linked both statically and dynamically against C code.
- Adhere to the original Emacs Lisp implementation in terms of structure and organisation.
- Adhere to idiomatic Rust wherever else possible.
These decisions result in a clear scope and clear completion criteria, and they make it easy to verify that the behaviour of the original Lisp implementation has been replicated.
element, the parser crate, is currently the main and only focus. It should perform just two tasks: generate a concrete syntax tree and serialize it back to a canonical Org representation.
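As a rough illustration of those two tasks, the sketch below parses a tiny subset of Org (headlines and plain lines only) into a concrete syntax tree and serializes it back. This is not the element crate's API: `Node`, `parse` and `serialize` are hypothetical names, and the sketch assumes the simplest possible notion of "canonical", namely reproducing the input byte for byte, whereas a real serializer may normalize its output.

```rust
/// A node of a deliberately tiny, hypothetical concrete syntax tree.
/// Every byte of the input is stored in some node, so nothing is lost.
#[derive(Debug, PartialEq)]
enum Node {
    /// A headline: the number of leading stars and the raw text after them.
    Headline { level: usize, rest: String },
    /// Any other line, kept verbatim.
    Text(String),
}

/// Parse `input` line by line. `split_inclusive` keeps the line terminators,
/// which is what makes an exact round trip possible.
fn parse(input: &str) -> Vec<Node> {
    input
        .split_inclusive('\n')
        .map(|line| {
            let level = line.chars().take_while(|&c| c == '*').count();
            // In this sketch a headline is one or more stars followed by a space.
            // Stars are single-byte ASCII, so `level` is also a valid byte offset.
            if level > 0 && line[level..].starts_with(' ') {
                Node::Headline {
                    level,
                    rest: line[level..].to_string(),
                }
            } else {
                Node::Text(line.to_string())
            }
        })
        .collect()
}

/// Turn the tree back into Org text by concatenating what each node stored.
fn serialize(nodes: &[Node]) -> String {
    let mut out = String::new();
    for node in nodes {
        match node {
            Node::Headline { level, rest } => {
                out.push_str(&"*".repeat(*level));
                out.push_str(rest);
            }
            Node::Text(line) => out.push_str(line),
        }
    }
    out
}
```

The useful property to check is the round trip: for any input, `serialize(&parse(input))` should give back the input. A line such as `*bold* not a headline` stays a `Text` node here because the star is not followed by a space.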
The rest of the roadmap is not fully fleshed out. A feature-complete parser opens up a lot of possibilities. Here are just a few of my ideas:
- Parse tree manipulation tools (like exporting to other formats).
- Language server - a way to solve "the matrix" problem. Enabling other editors to have their own org-mode would be a logical next step.
- CLI tools. I'd love to get integration with TaskWarrior and maybe even use Org as TaskWarrior's DOM.
Any contributions are welcome:
- Code
- Documentation
- Verification
- Spreading the word
- Using it in your project in a cool way
If you want to contribute code, please check out the contribution guide.
Got a question? Open a discussion.
- vim-orgmode
- orgajs (Node.js)
- orgnode (Python)
- org-ruby (Ruby)
- and many others
- Org-Mode Is One of the Most Reasonable Markup Languages to Use for Text
- Awesome guide about org-mode
- teaser
Footnotes
1. Some attempts were made to formalize the syntax. This project uses them as supplementary material. See the contribution guide for details.