Skip to main navigation Skip to search Skip to main content

Efficient Context Propagating Perceiver Architectures for Auto-Regressive Language Modeling

  • University of Rhode Island

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

One of the key challenges in Transformer architectures is the quadratic complexity of the attention mechanism, which limits the efficient processing of long sequences. Many recent research works have attempted to provide a reduction from the O(n2) time complexity of attention to semi-linear complexity. However, it remains an unsolved problem in the sense of maintaining high performance when complexity is reduced. One of the important works in this respect is the Perceiver class of architectures that have demonstrated excellent performance, while reducing the computation complexity. In this paper, we use the PerceiverAR as a basis and explore the design space of different trade-offs between preserving context and reducing attention complexity. To this end, we develop four new architectural paradigms, the best performing of which we denote as the Efficient Context propagating Perceiver (ECP). ECP has two major advantages over the PerceiverAR. First, the ECP architecture overcomes the main drawback of PercieverAR by utilizing both the context and the latent sequences in autoregressive training. Second, the ECP architecture operates with the same attention complexity as LongLoRA, making it computationally efficient. More importantly, via pairwise segment attention, it extracts better information resulting in improved language modeling. Empirically, we demonstrate that the ECP architecture significantly outperforms other state-of-the-art Transformer models on Wikitext-103, PG-19 and sCIFAR-10. Code available at: https://github.com/MetaMain/ECPTransformer.

Original languageEnglish
Title of host publicationECAI 2025 - 28th European Conference on Artificial Intelligence, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025 - Proceedings
EditorsInes Lynce, Nello Murano, Mauro Vallati, Serena Villata, Federico Chesani, Michela Milano, Andrea Omicini, Mehdi Dastani
Pages4428-4435
Number of pages8
ISBN (Electronic)9781643686318
DOIs
StatePublished - 21 Oct 2025
Event28th European Conference on Artificial Intelligence, ECAI 2025, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025 - Bologna, Italy
Duration: 25 Oct 202530 Oct 2025

Publication series

NameFrontiers in Artificial Intelligence and Applications
Volume413
ISSN (Print)0922-6389
ISSN (Electronic)1879-8314

Conference

Conference28th European Conference on Artificial Intelligence, ECAI 2025, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025
Country/TerritoryItaly
CityBologna
Period25/10/2530/10/25

Fingerprint

Dive into the research topics of 'Efficient Context Propagating Perceiver Architectures for Auto-Regressive Language Modeling'. Together they form a unique fingerprint.

Cite this