One-Pass Distribution Sketch for Measuring Data Heterogeneity in Federated Learning

Zichang Liu, Benjamin Coleman, Zhaozhuo Xu, Anshumali Shrivastava

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Federated learning (FL) is a machine learning paradigm where multiple client devices train models collaboratively without data exchange. Data heterogeneity problem is naturally inherited in FL since data in different clients follow diverse distributions. To mitigate the negative influence of data heterogeneity, we need to start by measuring it across clients. However, the efficient measurement between distributions is a challenging problem, especially in high dimensionality. In this paper, we propose a one-pass distribution sketch to represent the client data distribution. Our sketching algorithm only requires a single pass of the client data, which is efficient in terms of time and memory. Moreover, we show in both theory and practice that the distance between two distribution sketches represents the divergence between their corresponding distributions. Furthermore, we demonstrate with extensive experiments that our distribution sketch improves the client selection in the FL training. We also showcase that our distribution sketch is an efficient solution to the cold start problem in FL for new clients with unlabeled data.

Original languageEnglish
Title of host publicationAdvances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
EditorsA. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
ISBN (Electronic)9781713899921
StatePublished - 2023
Event37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: 10 Dec 202316 Dec 2023

Publication series

NameAdvances in Neural Information Processing Systems
Volume36
ISSN (Print)1049-5258

Conference

Conference37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/TerritoryUnited States
CityNew Orleans
Period10/12/2316/12/23

Fingerprint

Dive into the research topics of 'One-Pass Distribution Sketch for Measuring Data Heterogeneity in Federated Learning'. Together they form a unique fingerprint.

Cite this