CRII: OAC: High-Efficiency Serverless Computing Systems for Deep Learning: A Hybrid CPU/GPU Architecture

Project: Research project

Project Details

Description

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2). Next-generation serverless cloud computing provides developers with simplified access to server management and administration, including event-driven execution, fine-grained resource provisioning, auto-scaling, and pay-as-you-go billing. The machine learning community is taking advantage of these benefits of serverless cloud computing to ease the development and deployment of deep learning (DL) applications. However, existing serverless computing platforms lack efficient support for GPUs, impeding DL practitioners from utilizing serverless computing for large-scale applications. This project will develop an efficient serverless computing platform with a hybrid CPU/GPU architecture to accelerate DL application development and deployment. The goal is to advance cutting-edge methodologies in both deep learning and serverless computing, which will result in a significant leap forward to benefit DL practitioners, DL users, and providers of cloud computing infrastructures, contributing to science advancement for society. The research findings will also enhance undergraduate and graduate education with exciting examples and demonstrations of real-world systems at the intersection of distributed computing, cloud computing, and deep learning. The project will develop a novel serverless computing platform with a hybrid CPU/GPU architecture that will provide DL applications with native GPU performance. Two core components constitute the hybrid serverless computing architecture, a shim virtualized GPU (vGPU) layer and a refactored container subsystem. The shim vGPU layer enables high-performance GPU sharing for concurrent serverless functions with low latency and high scalability. This layer provides fine-grained GPU resource provisioning and performance isolation by intercepting GPU calls from serverless functions using API remoting techniques. The vGPU layer optimizes GPU performance in serverless computing via GPU context caching and locality-aware scheduling to mitigate cold-starts and unnecessary data movement. The container subsystem accelerates the entire DL lifecycle by exploiting DL model structures and pipelined model loading to parallelize CPU-to-GPU memory copy and model execution. The subsystem exploits model partitioning techniques to accelerate the hybrid CPU/GPU architecture by dynamically distributing the DL model partitions to CPU and GPU. The scientific knowledge and tools designed and implemented from this research project will provide and enable innovations for next-generation cloud computing and deep learning. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
StatusActive
Effective start/end date1/10/2431/12/25

Funding

  • National Science Foundation

Fingerprint

Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.