Bitextor generates translation memories from multilingual websites
Updated Nov 11, 2024 - Python
AutoCorpus is a tool backed by a large language model (LLM) for automatically generating corpus files for fuzzing.
A parser for annotated MuseScore 3 files.
A full-text article retrieval pipeline for biomedical literature.
A corpus of Ukrainian Twitter texts + instructions for downloading and filtering texts.
Augmentation scripts for the bAbI Dialog Tasks dataset
A set of corpus-based sampling & analysis M4L devices
A clean Fusha Arabic tagged corpus.
A golden Arabic corpus built for testing Assem's arabicstemmer and other Arabic stemmers
Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!
A corpus builder for evaluation of plagiarism detection tools
Scrimshaw parses IRC logs stored in the driftwood format for quotes attributable to a given user. Written in Rust.
The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing
Generate pseudo-English sentences for research in semantic composition
Natively log WeeChat channel and private messages, CTCP, and notices, in the driftwood standard. Written in Python.
Information Retrieval Lab
A prototype for generating language in a grounded simulation of a simple hunter-gatherer world
Create a corpus for fine-tuning an OpenAI model