KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
- Tianyi Zhang
- Jonah Yi
- Zhaozhuo Xu
- Anshumali Shrivastava
- Rice University
Research output: Contribution to journal › Conference article › peer-review
Scopus citations: 16