TY - JOUR
T1 - MobiLLM
T2 - Enabling On-Device Fine-Tuning of Billion-Sized LLMs via Server-Assisted Side-Tuning
AU - Li, Liang
AU - Yang, Xingke
AU - Wu, Wen
AU - Wang, Hao
AU - Ohtsuki, Tomoaki
AU - Fu, Xin
AU - Pan, Miao
AU - Shen, Xuemin
N1 - Publisher Copyright:
© 2007-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - On-device fine-tuning of large language models (LLMs) has attracted considerable attention because it tailors personalized models while keeping user data local to the mobile device. However, it faces significant challenges due to prohibitive memory requirements and slow training speeds. In this paper, we propose MobiLLM, a novel scheme that enables memory-efficient LLM fine-tuning on a single mobile device via server-assisted side-tuning. In particular, MobiLLM strategically offloads backpropagation computations to an edge server while the resource-constrained mobile device retains only a pretrained backbone model with frozen parameters during fine-tuning. It constructs a backpropagation bypass via parallel adapters decoupled from the backbone. During forward propagation, the device applies low-bitwidth quantization to the intermediate activations transmitted to the server to reduce communication overhead. The advantages of MobiLLM lie in: 1) confining training data strictly to the mobile device, and 2) eliminating on-device backpropagation while overlapping local computations with server execution. Collectively, MobiLLM ensures that data never leaves the local mobile device while significantly reducing mobile memory and computational burdens. We implement MobiLLM on several popular mobile devices, including the NVIDIA Jetson Xavier NX and CPU-only laptops. Extensive experimental results demonstrate that MobiLLM enables a resource-constrained mobile device to fine-tune billion-sized LLMs, achieving up to 4x memory reduction and 2.3x faster convergence compared to state-of-the-art baselines.
AB - On-device fine-tuning of large language models (LLMs) has attracted considerable attention because it tailors personalized models while keeping user data local to the mobile device. However, it faces significant challenges due to prohibitive memory requirements and slow training speeds. In this paper, we propose MobiLLM, a novel scheme that enables memory-efficient LLM fine-tuning on a single mobile device via server-assisted side-tuning. In particular, MobiLLM strategically offloads backpropagation computations to an edge server while the resource-constrained mobile device retains only a pretrained backbone model with frozen parameters during fine-tuning. It constructs a backpropagation bypass via parallel adapters decoupled from the backbone. During forward propagation, the device applies low-bitwidth quantization to the intermediate activations transmitted to the server to reduce communication overhead. The advantages of MobiLLM lie in: 1) confining training data strictly to the mobile device, and 2) eliminating on-device backpropagation while overlapping local computations with server execution. Collectively, MobiLLM ensures that data never leaves the local mobile device while significantly reducing mobile memory and computational burdens. We implement MobiLLM on several popular mobile devices, including the NVIDIA Jetson Xavier NX and CPU-only laptops. Extensive experimental results demonstrate that MobiLLM enables a resource-constrained mobile device to fine-tune billion-sized LLMs, achieving up to 4x memory reduction and 2.3x faster convergence compared to state-of-the-art baselines.
KW - Large language model
KW - memory efficiency
KW - on-device fine-tuning
KW - transformer
UR - https://www.scopus.com/pages/publications/105022262178
UR - https://www.scopus.com/pages/publications/105022262178#tab=citedBy
U2 - 10.1109/JSTSP.2025.3633550
DO - 10.1109/JSTSP.2025.3633550
M3 - Article
AN - SCOPUS:105022262178
SN - 1932-4553
VL - 19
SP - 1251
EP - 1265
JO - IEEE Journal on Selected Topics in Signal Processing
JF - IEEE Journal on Selected Topics in Signal Processing
IS - 7
ER -