ZEROTH-ORDER FINE-TUNING OF LLMS WITH TRANSFERABLE STATIC SPARSITY

  • Wentao Guo
  • Jikai Long
  • Yimeng Zeng
  • Zirui Liu
  • Xinyu Yang
  • Yide Ran
  • Jacob Gardner
  • Osbert Bastani
  • Christopher De Sa
  • Xiaodong Yu
  • Beidi Chen
  • Zhaozhuo Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models (LLMs) using only forward passes. However, applying ZO fine-tuning in memory-constrained settings such as mobile phones and laptops remains challenging, since these settings often involve weight quantization while ZO requires full-precision perturbations and updates. In this study, we address this limitation by combining static sparse ZO fine-tuning with quantization. Our approach transfers a small, static subset (0.1%) of "sensitive" parameters from pre-training to downstream tasks and focuses fine-tuning on this sparse set; the remaining untuned parameters are quantized, reducing memory demands. The proposed workflow enables efficient ZO fine-tuning of a Llama2-7B model on a GPU with less than 8 GB of memory while outperforming both full-model ZO fine-tuning and in-context learning. We provide an open-source implementation at https://github.com/GarlGuo/SensZOQ.
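To make the described workflow concrete, the snippet below is a minimal, assumption-laden sketch rather than the authors' released SensZOQ implementation. It shows a single two-point (SPSA-style) ZO step, as popularized by MeZO, restricted to a static index set of sensitive parameters; the function and argument names (`zo_sparse_step`, `mask_indices`, `loss_fn`, `eps`, `lr`) are hypothetical, and in the actual workflow the untouched weights would additionally be held in quantized form rather than in the same full-precision tensor.

```python
import torch

def zo_sparse_step(params, mask_indices, loss_fn, eps=1e-3, lr=1e-6, seed=0):
    """One zeroth-order (two-point SPSA) step on a static sparse parameter subset.

    params: 1-D float tensor of model weights (plain tensor, no autograd needed)
    mask_indices: fixed LongTensor of the ~0.1% "sensitive" entries, chosen once
                  at pre-training time and reused across downstream tasks
    loss_fn: callable running a forward pass on the (perturbed) weights -> scalar
    """
    # Fixing the seed lets the same perturbation z be regenerated later instead
    # of stored, the memory trick used by MeZO-style ZO optimizers.
    torch.manual_seed(seed)
    z = torch.randn(mask_indices.numel())  # perturb only the sparse subset

    # Two forward passes: loss at w + eps*z and at w - eps*z (masked entries only).
    params[mask_indices] += eps * z
    loss_plus = loss_fn(params)
    params[mask_indices] -= 2 * eps * z
    loss_minus = loss_fn(params)
    params[mask_indices] += eps * z  # restore the original weights

    # Projected finite-difference gradient estimate; the update touches only
    # the sparse subset, so all other weights can stay frozen (and quantized).
    grad_scale = (loss_plus - loss_minus) / (2 * eps)
    params[mask_indices] -= lr * grad_scale * z
    return (loss_plus + loss_minus) / 2
```

Because gradients are never materialized and only the sparse subset is updated in full precision, the dominant memory cost in such a setup is the quantized frozen weights plus the activations of a forward pass, which is what makes sub-8 GB fine-tuning of a 7B model plausible.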

Original language: English
Title of host publication: 13th International Conference on Learning Representations, ICLR 2025
Pages: 59924-59964
Number of pages: 41
ISBN (Electronic): 9798331320850
State: Published - 2025
Event: 13th International Conference on Learning Representations, ICLR 2025 - Singapore, Singapore
Duration: 24 Apr 2025 - 28 Apr 2025

Publication series

Name: 13th International Conference on Learning Representations, ICLR 2025

Conference

Conference: 13th International Conference on Learning Representations, ICLR 2025
Country/Territory: Singapore
City: Singapore
Period: 24/04/25 - 28/04/25
