TY - GEN
T1 - Integrity verification of K-means clustering outsourced to infrastructure as a service (IaaS) providers
AU - Liu, Ruilin
AU - Wang, Hui
AU - Mordohai, Philippos
AU - Xiong, Hui
N1 - Publisher Copyright: Copyright © SIAM.
PY - 2013
Y1 - 2013
N2 - The Cloud-based infrastructure-as-a-service (IaaS) paradigm (e.g., Amazon EC2) enables a client who lacks computational resources to outsource her dataset and data mining tasks to the Cloud. However, as the Cloud may not be fully trusted, it raises serious concerns about the integrity of the mining results returned by the Cloud. To this end, in this paper, we provide a focused study about how to perform integrity verification of the κ-means clustering task outsourced to an IaaS provider. We consider the untrusted sloppy IaaS service provider that intends to return wrong clustering results by terminating the iterations early to save computational cost. We develop both probabilistic and deterministic verification methods to catch the incorrect clustering result by the service provider. The deterministic method returns 100% integrity guarantee with cost that is much cheaper than executing κ-means clustering locally, while the probabilistic method returns a probabilistic integrity guarantee with computational cost even cheaper than the deterministic approach. Our experimental results show that our verification methods can effectively and efficiently capture the sloppy service provider.
KW - Cloud computing
KW - Data-mining-as-a-service
KW - Infrastructure as a Service (IaaS)
KW - Integrity
KW - κ-means clustering
UR - http://www.scopus.com/inward/record.url?scp=84936944100&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84936944100&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972832.70
DO - 10.1137/1.9781611972832.70
M3 - Conference contribution
AN - SCOPUS:84936944100
T3 - Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013
SP - 632
EP - 640
BT - Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013
A2 - Ghosh, Joydeep
A2 - Obradovic, Zoran
A2 - Dy, Jennifer
A2 - Zhou, Zhi-Hua
A2 - Kamath, Chandrika
A2 - Parthasarathy, Srinivasan
T2 - SIAM International Conference on Data Mining, SDM 2013
Y2 - 2 May 2013 through 4 May 2013
ER -