I am trying to run Llama-3.1-8B with a sequence length > 512, and I get the error below. Do I have to manually set the position embeddings to get this to work?
```python
from airllm import AutoModel

model = AutoModel.from_pretrained("meta-llama/Meta-Llama-3.1-8B", delete_original=True)

prompts = ["a " * 10000 for i in range(100)]

model.tokenizer.pad_token = model.tokenizer.eos_token
input_tokens = model.tokenizer(
    prompts,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=1024,
)

generation_output = model.forward(
    input_ids=input_tokens['input_ids'].cuda(),
    attention_mask=input_tokens['attention_mask'].cuda(),
    use_cache=False,
)
```
```
Fetching 13 files: 100% 13/13 [00:00<00:00, 986.88it/s]
found_layers:{'model.embed_tokens.': True, 'model.layers.0.': True, 'model.layers.1.': True, 'model.layers.2.': True, 'model.layers.3.': True, 'model.layers.4.': True, 'model.layers.5.': True, 'model.layers.6.': True, 'model.layers.7.': True, 'model.layers.8.': True, 'model.layers.9.': True, 'model.layers.10.': True, 'model.layers.11.': True, 'model.layers.12.': True, 'model.layers.13.': True, 'model.layers.14.': True, 'model.layers.15.': True, 'model.layers.16.': True, 'model.layers.17.': True, 'model.layers.18.': True, 'model.layers.19.': True, 'model.layers.20.': True, 'model.layers.21.': True, 'model.layers.22.': True, 'model.layers.23.': True, 'model.layers.24.': True, 'model.layers.25.': True, 'model.layers.26.': True, 'model.layers.27.': True, 'model.layers.28.': True, 'model.layers.29.': True, 'model.layers.30.': True, 'model.layers.31.': True, 'model.norm.': True, 'lm_head.': True}
saved layers already found in /root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3.1-8B/snapshots/48d6d0fc4e02fb1269b36940650a1b7233035cbb/splitted_model
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
running layers(cuda:0):   3%|▎         | 1/35 [00:01<00:35,  1.04s/it]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-14-c76626e36e7d> in <cell line: 15>()
     13 )
     14
---> 15 generation_output = model.forward(
     16     input_ids=input_tokens['input_ids'].cuda(),
     17     attention_mask=input_tokens['attention_mask'].cuda(),

7 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py in apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim)
    217     cos = cos.unsqueeze(unsqueeze_dim)
    218     sin = sin.unsqueeze(unsqueeze_dim)
--> 219     q_embed = (q * cos) + (rotate_half(q) * sin)
    220     k_embed = (k * cos) + (rotate_half(k) * sin)
    221     return q_embed, k_embed

RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 2
```
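For context, the mismatch happens because the rotary-embedding cos/sin tensors were built for 512 positions while the padded input carries 1024 tokens, so the elementwise multiply in `apply_rotary_pos_emb` cannot broadcast. The following is an illustrative PyTorch sketch only (not AirLLM or transformers code; the batch/head/head-dim sizes are assumed) that reproduces the same broadcasting failure:

```python
import torch

# q carries 1024 positions, but the cached RoPE cos table only covers 512,
# so the sequence dimension (dim 2) cannot broadcast.
q = torch.randn(1, 32, 1024, 128)    # (batch, heads, seq_len, head_dim)
cos = torch.randn(1, 1, 512, 128)    # cache sized for a 512-token maximum

try:
    q_embed = q * cos
except RuntimeError as e:
    print(e)  # The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 2
```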
Fixed by raising the maximum sequence length and re-initializing the model:

```python
model.max_seq_len = 1024
model.init_model()
```

Maybe this should be exposed as an init option. A fuller sketch of the workaround is below.
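For reference, here is a minimal end-to-end sketch of the workaround applied to the original repro. It assumes `max_seq_len` and `init_model()` behave as shown above; the exact attribute and init handling may differ across airllm versions:

```python
from airllm import AutoModel

model = AutoModel.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

# Rebuild the model so the rotary/position cache covers sequences longer
# than the default 512 tokens (workaround from this issue; not confirmed
# as the officially supported API).
model.max_seq_len = 1024
model.init_model()

model.tokenizer.pad_token = model.tokenizer.eos_token
input_tokens = model.tokenizer(
    ["a " * 10000],
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=1024,
)

generation_output = model.forward(
    input_ids=input_tokens["input_ids"].cuda(),
    attention_mask=input_tokens["attention_mask"].cuda(),
    use_cache=False,
)
```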