
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization

  • Tianyi Zhang
  • Jonah Yi
  • Zhaozhuo Xu
  • Anshumali Shrivastava
  • Rice University

Research output: Contribution to journal, Conference article, peer-review

16 Scopus citations

Fingerprint


Computer Science