Pre-Warming is Not Enough: Accelerating Serverless Inference With Opportunistic Pre-Loading

Yifan Sui, Hanfei Yu, Yitao Hu, Jianxun Li, Hao Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Serverless computing has rapidly prospered as a new cloud computing paradigm offering agile scalability, pay-as-you-go pricing, and ease of use for Machine Learning (ML) inference tasks. Users package their ML code into lightweight serverless functions and execute them in containers. Unfortunately, a notorious problem called cold-starts hinders serverless computing from providing low-latency function executions. To mitigate cold-starts, pre-warming, which predictively keeps containers warm, has been widely adopted by academia and industry. However, pre-warming fails to eliminate the unique latency incurred by loading ML artifacts. We observed that for ML inference functions, loading libraries and models takes significantly more time than warming the container. Consequently, pre-warming alone is not enough to mitigate ML inference functions' cold-starts. This paper introduces InstaInfer, an opportunistic pre-loading technique that achieves instant inference by eliminating the latency associated with loading ML artifacts, thereby minimizing function execution time. InstaInfer fully utilizes the memory of warmed containers to preload the function's libraries and model, striking a balance between maximum acceleration and resource wastage. We design InstaInfer to be transparent to providers and compatible with existing pre-warming solutions. Experiments on OpenWhisk with real-world workloads show that InstaInfer reduces loading latency by up to 93% and achieves up to 8× speedup compared to state-of-the-art pre-warming solutions.

Original language: English
Title of host publication: SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing
Pages: 178-195
Number of pages: 18
ISBN (Electronic): 9798400712869
DOIs
State: Published - 20 Nov 2024
Event: 15th Annual ACM Symposium on Cloud Computing, SoCC 2024 - Redmond, United States
Duration: 20 Nov 2024 - 22 Nov 2024

Publication series

Name: SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing

Conference

Conference: 15th Annual ACM Symposium on Cloud Computing, SoCC 2024
Country/Territory: United States
City: Redmond
Period: 20/11/24 - 22/11/24

Keywords

  • Cloud Computing
  • Cold-Start
  • Machine Learning
  • Serverless Computing
