TY - CPAPER
T1 - Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity
AU - Guo, Wentao
AU - Long, Jikai
AU - Zeng, Yimeng
AU - Liu, Zirui
AU - Yang, Xinyu
AU - Ran, Yide
AU - Gardner, Jacob
AU - Bastani, Osbert
AU - De Sa, Christopher
AU - Yu, Xiaodong
AU - Chen, Beidi
AU - Xu, Zhaozhuo
N1 - Publisher Copyright:
© 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.
PY - 2025
Y1 - 2025
AB - Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models (LLMs) using only forward passes. However, applying ZO fine-tuning in memory-constrained settings such as mobile phones and laptops remains challenging, since these settings often involve weight quantization while ZO requires full-precision perturbations and updates. In this study, we address this limitation by combining static sparse ZO fine-tuning with quantization. Our approach transfers a small, static subset (0.1%) of "sensitive" parameters from pre-training to downstream tasks, focusing fine-tuning on this sparse set of parameters. The remaining untuned parameters are quantized, reducing memory demands. Our proposed workflow enables efficient ZO fine-tuning of a Llama2-7B model on a GPU with less than 8 GB of memory, while outperforming full-model ZO fine-tuning and in-context learning. We provide an open-source implementation at https://github.com/GarlGuo/SensZOQ.
UR - https://www.scopus.com/pages/publications/105010216768
M3 - Conference contribution
AN - SCOPUS:105010216768
T3 - 13th International Conference on Learning Representations, ICLR 2025
SP - 59924
EP - 59964
BT - 13th International Conference on Learning Representations, ICLR 2025
T2 - 13th International Conference on Learning Representations, ICLR 2025
Y2 - 24 April 2025 through 28 April 2025
ER -
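
For context beyond the record above, a minimal sketch of the core technique described in the abstract follows: SPSA-style two-point zeroth-order updates (in the style of MeZO) applied only to a static sparse set of "sensitive" parameter indices, with the frozen remainder assumed to be held quantized elsewhere. This is an illustrative sketch under those assumptions, not the SensZOQ repository's actual API; the names zo_sparse_step, mask_idx, and loss_fn are hypothetical.

    import torch

    @torch.no_grad()
    def zo_sparse_step(param, mask_idx, loss_fn, lr=1e-6, eps=1e-3, seed=0):
        # One zeroth-order (SPSA) update on the entries of `param` selected by
        # `mask_idx` (flat indices of the small "sensitive" subset; hypothetical
        # names). `loss_fn` is a closure returning the scalar loss (a float)
        # on the current batch under the model's current weights.
        flat = param.view(-1)

        # Regenerating z from a fixed seed means the perturbation never needs
        # to be stored, keeping memory at inference level.
        gen = torch.Generator(device=param.device).manual_seed(seed)
        z = torch.randn(mask_idx.numel(), generator=gen,
                        device=param.device, dtype=param.dtype)

        flat[mask_idx] += eps * z        # evaluate f(theta + eps * z)
        loss_plus = loss_fn()
        flat[mask_idx] -= 2 * eps * z    # evaluate f(theta - eps * z)
        loss_minus = loss_fn()
        flat[mask_idx] += eps * z        # restore theta

        # Two-point estimate of the directional derivative along z.
        proj_grad = (loss_plus - loss_minus) / (2 * eps)
        flat[mask_idx] -= lr * proj_grad * z   # SGD step on the sparse subset only

Only the ~0.1% of entries in mask_idx are ever perturbed or updated in full precision; in the paper's workflow the remaining weights would additionally be stored in quantized form, which is what brings a Llama2-7B fine-tuning run under 8 GB.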