rb-tokenizer

rb-tokenizer is a flexible, rule-based tokenizer written in Rust, designed to make text tokenization customizable and extensible. It supports a wide range of applications, from simple text parsing to lexers for complex programming languages.

Features

Customizable Tokenization: Easily define your own tokenization rules with regular expressions and symbols.
Extensible Architecture: Add new rule types to suit your specific tokenization needs.
Performance: Optimized for speed and efficiency, handling large texts swiftly.
Easy Integration: Designed to be integrated into larger parsing or text analysis projects.

Getting Started

Prerequisites

Ensure you have Rust installed on your system. You can install both Rust and Cargo via rustup.

Installation

Add rb-tokenizer to your Cargo.toml:

[dependencies]
rb-tokenizer = { git = "https://github.com/maniartech/rb-tokenizer.git" }

Basic Usage

To use rb-tokenizer in your project, start by creating a Tokenizer instance and adding rules:

use rb_tokenizer::Tokenizer;

fn main() {
    let mut tokenizer = Tokenizer::new();

    // Regex rules: pattern, token type, optional sub-type.
    tokenizer.add_regex_rule(r"^\d+", "Number", None);
    tokenizer.add_regex_rule(r"^[a-zA-Z_][a-zA-Z0-9_]*", "Identifier", None);

    // Symbol rules: literal symbol, token type, optional sub-type.
    tokenizer.add_symbol_rule("(", "Operator", Some("OpenParen"));
    tokenizer.add_symbol_rule(")", "Operator", Some("CloseParen"));
    tokenizer.add_symbol_rule("+", "Operator", Some("Plus"));

    // tokenize returns a Result; unwrap() yields the token list or panics on error.
    let tokens = tokenizer.tokenize("ADD(2 + 2)").unwrap();
    println!("{:?}", tokens);
    // Output:
    // [
    //  Token { token_type: "Identifier", token_sub_type: None, value: "ADD", line: 1, column: 1 },
    //  Token { token_type: "Operator", token_sub_type: Some("OpenParen"), value: "(", line: 1, column: 4 },
    //  Token { token_type: "Number", token_sub_type: None, value: "2", line: 1, column: 5 },
    //  Token { token_type: "Operator", token_sub_type: Some("Plus"), value: "+", line: 1, column: 7 },
    //  Token { token_type: "Number", token_sub_type: None, value: "2", line: 1, column: 9 },
    //  Token { token_type: "Operator", token_sub_type: Some("CloseParen"), value: ")", line: 1, column: 10 }
    // ]
}
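
To make the idea of rule-based tokenization concrete, here is a minimal, standard-library-only sketch of how an ordered list of rules can drive a tokenizer and how 1-based line and column positions can be tracked. It is not rb-tokenizer's implementation or API: the names Rule, default_rules, match_number, match_identifier, match_symbol and the free function tokenize are hypothetical, plain matcher functions stand in for regular expressions, and skipping whitespace without a rule is an assumption.

```rust
// Illustrative sketch only: NOT rb-tokenizer's implementation or API.
// All names below are hypothetical and exist only for this example.

#[derive(Debug, Clone, PartialEq)]
struct Token {
    token_type: &'static str,
    value: String,
    line: usize,   // 1-based
    column: usize, // 1-based
}

/// A rule pairs a token type with a matcher. The matcher inspects the
/// remaining input and returns the length of the match at its start.
struct Rule {
    token_type: &'static str,
    matcher: fn(&str) -> Option<usize>,
}

// Stand-in for the regex `^\d+` (ASCII digits only).
fn match_number(input: &str) -> Option<usize> {
    let len = input.chars().take_while(|c| c.is_ascii_digit()).count();
    if len > 0 { Some(len) } else { None }
}

// Stand-in for the regex `^[a-zA-Z_][a-zA-Z0-9_]*`.
fn match_identifier(input: &str) -> Option<usize> {
    let mut chars = input.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return None,
    }
    let rest = chars
        .take_while(|c| c.is_ascii_alphanumeric() || *c == '_')
        .count();
    Some(1 + rest)
}

// Stand-in for the symbol rules "(", ")" and "+".
fn match_symbol(input: &str) -> Option<usize> {
    match input.chars().next() {
        Some('(') | Some(')') | Some('+') => Some(1),
        _ => None,
    }
}

fn default_rules() -> Vec<Rule> {
    vec![
        Rule { token_type: "Number", matcher: match_number },
        Rule { token_type: "Identifier", matcher: match_identifier },
        Rule { token_type: "Operator", matcher: match_symbol },
    ]
}

fn tokenize(input: &str, rules: &[Rule]) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let (mut pos, mut line, mut column) = (0usize, 1usize, 1usize);

    while pos < input.len() {
        let rest = &input[pos..];
        let ch = rest.chars().next().unwrap();

        // Assumption: whitespace is skipped, but it still advances the position.
        if ch == '\n' {
            line += 1;
            column = 1;
            pos += 1;
            continue;
        }
        if ch.is_whitespace() {
            column += 1;
            pos += ch.len_utf8();
            continue;
        }

        // Rules are tried in order; the first one that matches wins.
        let hit = rules
            .iter()
            .find_map(|r| (r.matcher)(rest).map(|len| (r.token_type, len)));

        match hit {
            Some((token_type, len)) => {
                tokens.push(Token { token_type, value: rest[..len].to_string(), line, column });
                // The matchers above accept ASCII only, so byte length equals character count.
                pos += len;
                column += len;
            }
            None => return Err(format!("unexpected character {:?} at {}:{}", ch, line, column)),
        }
    }
    Ok(tokens)
}

fn main() {
    let tokens = tokenize("ADD(2 + 2)", &default_rules()).unwrap();
    for t in &tokens {
        println!("{} {:?} at {}:{}", t.token_type, t.value, t.line, t.column);
    }
}
```

With the input "ADD(2 + 2)", this sketch yields six tokens at columns 1, 4, 5, 7, 9 and 10. Trying rules in declaration order keeps the behaviour predictable, at the cost of making rule order significant when two rules can match the same prefix.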

Examples

You can find more examples in the tests/ directory of the repository, demonstrating various use cases and configurations.

Contributing

Contributions to rb-tokenizer are welcome! Here are a few ways you can help:

Reporting Issues: Found a bug or have a feature request? Please open an issue.
Pull Requests: Want to contribute code? Pull requests are warmly welcomed. Please ensure your code adheres to the project's coding standards and includes tests, if applicable.
Documentation: Improvements to documentation or new examples are always appreciated.

Before contributing, please read our CONTRIBUTING.md guide.

License

rb-tokenizer is distributed under the MIT License. See LICENSE for more information.

Acknowledgments

Inspired by the flexibility of rule-based tokenization in various programming languages and frameworks.
