High performance big data clustering

Ankit Agrawal, Md Mostofa Ali Patwary, William Hendrix, Wei Keng Liao, Alok Choudhary

Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

7 Scopus citations

Abstract

Scientific advances are collectively exploding the amount, diversity, and complexity of data becoming available. Our ability to collect huge amounts of data has greatly surpassed our analytical capacity to make sense of it. Efficient use of high performance computing techniques is critical for the success of the data-driven paradigm of scientific discovery. Data clustering is one of the fundamental analytics tasks heavily relied upon in many application domains, such as astrophysics, climate science, and bioinformatics. In this book chapter, we illustrate the challenges and opportunities in mining big data using two recently developed scalable parallel clustering algorithms. Experimental results on millions of high-dimensional data points clustered in parallel on thousands of processor cores are also presented.

Original language: English
Title of host publication: Cloud Computing and Big Data
Pages: 192-211
Number of pages: 20
State: Published - 2013

Publication series

Name: Advances in Parallel Computing
Volume: 23
ISSN (Print): 0927-5452

Keywords

  • big data
  • clustering
  • density-based clustering
  • hierarchical clustering
