MobiLLM: Enabling On-Device Fine-Tuning of Billion-Sized LLMs via Server-Assisted Side-Tuning

  • Liang Li
  • Xingke Yang
  • Wen Wu
  • Hao Wang
  • Tomoaki Ohtsuki
  • Xin Fu
  • Miao Pan
  • Xuemin Shen

Research output: Contribution to journal › Article › peer-review

Abstract

On-device fine-tuning of large language models (LLMs) has attracted considerable attention because it tailors personalized models while keeping user data local to the mobile device. However, it faces significant challenges due to prohibitive memory requirements and slow training speeds. In this paper, we propose MobiLLM, a novel scheme that enables memory-efficient LLM fine-tuning on a single mobile device via server-assisted side-tuning. Specifically, MobiLLM strategically offloads backpropagation computations to an edge server, while the resource-constrained mobile device retains only a pretrained backbone model with frozen parameters during fine-tuning. It constructs a backpropagation bypass via parallel adapters decoupled from the backbone. During forward propagation, the device applies low-bitwidth quantization to the intermediate activations it transmits to the server, reducing communication overhead. The advantages of MobiLLM are twofold: 1) training data is confined strictly to the mobile device, and 2) on-device backpropagation is eliminated while local computations overlap with server execution. Collectively, MobiLLM ensures that data never leaves the local mobile device while significantly reducing the mobile memory and computational burdens. We implement MobiLLM on several popular mobile devices, including the NVIDIA Jetson Xavier NX and CPU-only laptops. Extensive experimental results demonstrate that MobiLLM enables a resource-constrained mobile device to fine-tune billion-sized LLMs, achieving up to a 4x memory reduction and 2.3x faster convergence compared to state-of-the-art baselines.
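The low-bitwidth activation transfer described in the abstract can be illustrated with a minimal sketch. The snippet below (a hypothetical illustration, not the paper's implementation) applies standard 8-bit affine quantization to an intermediate activation tensor before "uplink" transmission, then dequantizes it server-side; the tensor shape and helper names are assumptions for demonstration only.

```python
import numpy as np

def quantize_activations(x: np.ndarray, num_bits: int = 8):
    """Affine (asymmetric) quantization of a float32 activation tensor.

    Returns the low-bitwidth payload plus scale/zero-point so the
    receiving server can dequantize before its side-network adapters.
    """
    qmax = 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / qmax if x_max > x_min else 1.0
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize_activations(q: np.ndarray, scale: float, zero_point: int):
    """Recover an approximate float32 tensor on the server side."""
    return (q.astype(np.float32) - zero_point) * scale

# Hypothetical intermediate activations from one frozen backbone block
# (batch x sequence x hidden); values here are synthetic.
rng = np.random.default_rng(0)
acts = rng.normal(size=(16, 128, 768)).astype(np.float32)

q, scale, zp = quantize_activations(acts)
recovered = dequantize_activations(q, scale, zp)

# An 8-bit payload is 4x smaller than the float32 activations.
ratio = acts.nbytes / q.nbytes
print(f"uplink compression: {ratio:.0f}x, "
      f"max abs error: {np.abs(acts - recovered).max():.4f}")
```

Per-tensor affine quantization is used here for simplicity; finer granularities (per-channel scales) trade a slightly larger payload for lower reconstruction error.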

Original language: English
Pages (from-to): 1251-1265
Number of pages: 15
Journal: IEEE Journal on Selected Topics in Signal Processing
Volume: 19
Issue number: 7
DOIs
State: Published - 2025

Keywords

  • Large language model
  • memory efficiency
  • on-device fine-tuning
  • transformer

