SelfExtend Attention for Mistral

An implementation of the Self-Extend paper, which uses grouped attention to extend the context window of LLMs without fine-tuning or further pre-training.

Overview

SelfExtend modifies the standard attention mechanism in the Mistral model so that it can handle inputs longer than its pretraining context window. Nearby tokens keep ordinary attention, while distant tokens are attended to through grouped positions, so the model can take a broader context into account when making predictions. This is particularly useful for tasks involving long sequences.

Features

Compatibility: Designed to work with the Hugging Face Transformers library.
Extended Context: Currently extends Mistral 7B's 8k context window to 16k.
Grouped Attention: Uses an attention mechanism that groups tokens to mitigate the positional out-of-distribution (O.O.D.) issue.
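
The grouped attention idea can be illustrated with a small position-mapping sketch. It follows the scheme described in the Self-Extend paper: tokens within a neighbor window keep their exact relative distance, while more distant tokens use floor-divided (grouped) positions shifted so the two regions join up. The function names, parameter names, and toy values below are assumptions chosen for illustration, not identifiers or settings taken from this repository's modeling files.

```python
def self_extend_relative_position(q_pos, k_pos, group_size, neighbor_window):
    """Relative position seen by attention for a query/key pair.

    Hypothetical helper for illustration only (not this repository's API).
    """
    distance = q_pos - k_pos
    if distance < 0:
        raise ValueError("causal attention: key must not come after query")
    if distance < neighbor_window:
        # Nearby tokens: ordinary attention with the exact distance.
        return distance
    # Distant tokens: floor-divide positions into groups, then shift so the
    # grouped range starts where the neighbor window ends.
    shift = neighbor_window - neighbor_window // group_size
    return q_pos // group_size - k_pos // group_size + shift


def max_extended_length(pretrain_window, group_size, neighbor_window):
    """Longest sequence whose mapped positions stay inside the pretraining window."""
    return (pretrain_window - neighbor_window) * group_size + neighbor_window


# Toy numbers (assumed): pretraining window 16, group size 2, neighbor window 4.
limit = max_extended_length(16, 2, 4)
largest = max(
    self_extend_relative_position(q, k, 2, 4)
    for q in range(limit)
    for k in range(q + 1)
)
print(limit, largest)
```

With these toy values the sequence can grow to 28 tokens while every mapped relative position stays below the pretraining window of 16, which is how the O.O.D. positions are avoided.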

Requirements

To use this implementation, the following prerequisites must be met:

Python 3.10
PyTorch
Transformers Library

Installation

Clone the repository to your local machine and copy the modeling files into transformers/src/transformers/models/mistral in your Transformers source tree.

When loading the model, select the self_extend attention implementation:

from transformers import MistralForCausalLM

model = MistralForCausalLM.from_pretrained("hf_mistral-7B-v0.1", attn_implementation="self_extend")
