Mobile on-Device Generative Language Model Inference

This repo provides the main binary, built with FastLLM (https://github.com/ztxz16/fastllm.git), for current mainstream ARMv8-based Android mobile devices (i.e., smartphones and tablets). The binary has been tested and confirmed compatible with devices based on the Qualcomm Snapdragon 8+ Gen 1, Snapdragon 8 Gen 2, and MediaTek Dimensity 8100 platforms.
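
To confirm your device is ARMv8-based before proceeding, you can check the kernel architecture from a Termux shell; this is a quick sanity check, assuming a standard 64-bit Android build:

    uname -m
    # Expected output on ARMv8 devices: aarch64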

Step 1

Install the Termux application on your Android device. Make sure the device has more than 6 GB of RAM.
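
To verify the RAM requirement, you can read the total memory from /proc/meminfo in Termux (the reported figure is typically slightly below the advertised capacity):

    grep MemTotal /proc/meminfo
    # e.g. "MemTotal:       11510236 kB" on a device marketed as 12 GB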

Step 2

Download a supported model file from HuggingFace. You only need the model file with the ".flm" suffix. It is best to download it directly on your target Android device.
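
If you prefer to fetch the model from the command line instead of a browser, a sketch with wget works in Termux; the URL below is a placeholder, so substitute the actual link of the .flm file you picked on HuggingFace:

    pkg install wget
    # <repo> and <model_filename> are placeholders for your chosen model
    wget https://huggingface.co/<repo>/resolve/main/<model_filename>.flm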

Step 3

Download the main binary file. Copy or move the main binary and the model file to a storage path on your target device, e.g. the downloads directory.

Step 4

Open Termux and execute the command termux-setup-storage to grant Termux access to your device's shared storage.
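
After granting the permission, you can confirm that the shared-storage links were created (assuming the standard Termux layout, where storage/downloads points to your downloads directory):

    ls storage/downloads
    # Should list the files you placed there, including main and your .flm model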

Step 5

Execute the command: mv storage/downloads/<model_filename>.flm . && mv storage/downloads/main . && chmod 777 main. Replace <model_filename>.flm with the filename of your model, e.g. chatglm2-6b-int4.flm. (Note that this command works only if you placed the two files in the downloads directory, which is strongly recommended for typical users.)
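
For example, with the chatglm2-6b-int4.flm model mentioned above, the full command would be:

    # Move both files into Termux's home directory and make the binary executable
    mv storage/downloads/chatglm2-6b-int4.flm . && mv storage/downloads/main . && chmod 777 main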

Step 6

Run streamlined inference of the language model with the command: ./main -p <model_filename>.flm.
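
Continuing the example above:

    ./main -p chatglm2-6b-int4.flm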

You're now ready for mobile on-device inference with the latest GLMs!

Hints:

  • If you encounter an error like FORTIFY: read: count XXXXXXXX > SSIZE_MAX, try adding -l at the end of the command to enable low memory mode inference, as shown below. For instance, the Qwen-7B-Chat-int4.flm model has been observed to fail without low memory mode on devices with ≤12GB RAM.
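
For example, to run the Qwen model mentioned above in low memory mode:

    # -l enables low memory mode, per the hint above
    ./main -p Qwen-7B-Chat-int4.flm -l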
