Position Embedding with Seq > 512 #165

Open
Codys12 opened this issue Aug 5, 2024 · 1 comment
Codys12 commented Aug 5, 2024

I am trying to run Llama-3.1-8B with a sequence length > 512 and I get the error below. Do I have to manually set the position embeddings to get this to work?

from airllm import AutoModel
model = AutoModel.from_pretrained("meta-llama/Meta-Llama-3.1-8B", delete_original=True)

prompts = ["a " * 10000 for i in range(100)]
model.tokenizer.pad_token = model.tokenizer.eos_token

input_tokens = model.tokenizer(prompts,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=1024,
)

generation_output = model.forward(
    input_ids=input_tokens['input_ids'].cuda(),
    attention_mask=input_tokens['attention_mask'].cuda(),
    use_cache=False,
)
Fetching 13 files: 100% 13/13 [00:00<00:00, 986.88it/s]
found_layers:{'model.embed_tokens.': True, 'model.layers.0.': True, 'model.layers.1.': True, 'model.layers.2.': True, 'model.layers.3.': True, 'model.layers.4.': True, 'model.layers.5.': True, 'model.layers.6.': True, 'model.layers.7.': True, 'model.layers.8.': True, 'model.layers.9.': True, 'model.layers.10.': True, 'model.layers.11.': True, 'model.layers.12.': True, 'model.layers.13.': True, 'model.layers.14.': True, 'model.layers.15.': True, 'model.layers.16.': True, 'model.layers.17.': True, 'model.layers.18.': True, 'model.layers.19.': True, 'model.layers.20.': True, 'model.layers.21.': True, 'model.layers.22.': True, 'model.layers.23.': True, 'model.layers.24.': True, 'model.layers.25.': True, 'model.layers.26.': True, 'model.layers.27.': True, 'model.layers.28.': True, 'model.layers.29.': True, 'model.layers.30.': True, 'model.layers.31.': True, 'model.norm.': True, 'lm_head.': True}
saved layers already found in /root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3.1-8B/snapshots/48d6d0fc4e02fb1269b36940650a1b7233035cbb/splitted_model
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
running layers(cuda:0):   3%|▎         | 1/35 [00:01<00:35,  1.04s/it]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-14-c76626e36e7d> in <cell line: 15>()
     13 )
     14 
---> 15 generation_output = model.forward(
     16     input_ids=input_tokens['input_ids'].cuda(),
     17     attention_mask=input_tokens['attention_mask'].cuda(),

7 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py in apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim)
    217     cos = cos.unsqueeze(unsqueeze_dim)
    218     sin = sin.unsqueeze(unsqueeze_dim)
--> 219     q_embed = (q * cos) + (rotate_half(q) * sin)
    220     k_embed = (k * cos) + (rotate_half(k) * sin)
    221     return q_embed, k_embed

RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 2
Codys12 commented Aug 5, 2024

Fixed with:

model.max_seq_len = 1024
model.init_model()

Maybe this should be an init option.
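
For anyone hitting the same error, here is a minimal end-to-end sketch of the workaround, assuming the same model and that max_seq_len / init_model() behave as described in the comment above (max_seq_len is not a documented init option, so treat the attribute as an assumption):

from airllm import AutoModel

# Load the model, then raise the sequence limit before running inference.
model = AutoModel.from_pretrained("meta-llama/Meta-Llama-3.1-8B", delete_original=True)
model.max_seq_len = 1024   # default appears to be 512; attribute name taken from the comment above
model.init_model()         # re-initialize so the new limit takes effect

model.tokenizer.pad_token = model.tokenizer.eos_token
prompts = ["a " * 10000 for i in range(100)]

input_tokens = model.tokenizer(prompts,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=1024,
)

generation_output = model.forward(
    input_ids=input_tokens['input_ids'].cuda(),
    attention_mask=input_tokens['attention_mask'].cuda(),
    use_cache=False,
)

With the limit raised to 1024, the rotary-embedding tensors should match the 1024-token inputs and the size-mismatch error above should no longer be raised.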
