codenet-tokenizers

A wrapper library for the tokenizers from the CodeNet Project.

Setup

Install the library directly from this repository with pip.

pip install git+https://github.com/ashirafj/codenet-tokenizers

Usage

Initialize

For Python source code,

from codenet_tokenizers.tokenizers import PyTokenizer
tokenizer = PyTokenizer()

For C source code,

from codenet_tokenizers.tokenizers import CTokenizer
tokenizer = CTokenizer()

For C++ source code,

from codenet_tokenizers.tokenizers import CppTokenizer
tokenizer = CppTokenizer()

For Java source code,

from codenet_tokenizers.tokenizers import JavaTokenizer
tokenizer = JavaTokenizer()
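
When the language is only known at run time, the four classes above can be selected from a file name. The following is a minimal sketch, not part of the library: the extension mapping and the helper names `tokenizer_class_name` and `make_tokenizer` are assumptions made for illustration, and only the four class names and the `codenet_tokenizers.tokenizers` module path come from this README.

```python
import importlib
from pathlib import Path

# Hypothetical mapping from file extension to the tokenizer class names
# listed in this README. The choice of extensions is an assumption.
TOKENIZER_BY_EXTENSION = {
    ".py": "PyTokenizer",
    ".c": "CTokenizer",
    ".cpp": "CppTokenizer",
    ".java": "JavaTokenizer",
}


def tokenizer_class_name(path):
    """Return the tokenizer class name for a source file path."""
    suffix = Path(path).suffix.lower()
    try:
        return TOKENIZER_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"no tokenizer known for extension {suffix!r}") from None


def make_tokenizer(path):
    """Import the library lazily and instantiate the matching tokenizer."""
    module = importlib.import_module("codenet_tokenizers.tokenizers")
    return getattr(module, tokenizer_class_name(path))()
```

With the library installed, `make_tokenizer("solution.py")` would be equivalent to calling `PyTokenizer()` directly. The import is deferred so that the lookup itself can be used without the library present.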

Tokenize

To tokenize the source code, split the result by line, and remove unnecessary tokens:

normalized_tokens = tokenizer.normalize_separated(code)
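
In the call above, `code` is the source text. A small sketch of reading that text from a file first is shown below; the helper name `tokenize_file` and the UTF-8 default are assumptions for illustration, and the shape of the returned value is whatever `normalize_separated` produces (this README does not specify it).

```python
from pathlib import Path


def tokenize_file(tokenizer, path, encoding="utf-8"):
    """Read a source file and pass its text to normalize_separated.

    `tokenizer` is any object exposing normalize_separated(code), such as
    one of the tokenizers created in the Initialize section.
    """
    code = Path(path).read_text(encoding=encoding)
    return tokenizer.normalize_separated(code)


# Example call, assuming the library is installed and solution.py exists:
# from codenet_tokenizers.tokenizers import PyTokenizer
# normalized_tokens = tokenize_file(PyTokenizer(), "solution.py")
```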

Normalize

To normalize the source code based on the tokenized results:

normalized_code = tokenizer.normalize(code)
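
One possible use of the normalized form is to group sources that become identical after normalization. The sketch below is an illustration only: the helper name `group_by_normalized` is hypothetical, and it assumes `normalize` returns a hashable value such as a string, which the variable name `normalized_code` suggests but this README does not state.

```python
from collections import defaultdict


def group_by_normalized(tokenizer, sources):
    """Group source names by the normalized form of their code.

    `sources` maps a name (for example a file name) to its source text.
    Assumes tokenizer.normalize(code) returns a hashable value.
    """
    groups = defaultdict(list)
    for name, code in sources.items():
        groups[tokenizer.normalize(code)].append(name)
    return dict(groups)
```

Any group containing more than one name then holds sources that the tokenizer treats as the same after normalization.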
