TY - GEN
T1 - Memory-efficient query-driven community detection with application to complex disease associations
AU - Harenberg, Steve
AU - Seay, Ramona G.
AU - Ranshous, Stephen
AU - Padmanabhan, Kanchana
AU - Harlalka, Jitendra K.
AU - Schendel, Eric R.
AU - O'Brien, Michael P.
AU - Chirkova, Rada Y.
AU - Hendrix, William
AU - Choudhary, Alok N.
AU - Kumar, Vipin
AU - Doraiswamy, Murali
AU - Samatova, Nagiza F.
PY - 2014
Y1 - 2014
N2 - Community detection in real-world graphs presents a number of challenges. First, even if the number of detected communities grows linearly with the graph size, it becomes impossible to manually inspect each community for value added to the application knowledge base. Mining for communities with query nodes as knowledge priors could allow for filtering out irrelevant information and for enriching end-users knowledge associated with the problem of interest, such as discovery of genes functionally associated with the Alzheimer's (AD) biomarker genes. Second, the data-intensive nature of community enumeration challenges current approaches that often assume that the input graph and the detected communities fit in memory. As computer systems scale, DRAM memory sizes are not expected to increase linearly, while technologies such as SSD memories have the potential to provide much higher capacities at a lower power-cost point, and have a much lower latency than disks. Out-of-core algorithms and/or databaseinspired indexing could provide an opportunity for different design optimizations for query-driven community detection algorithms tuned for emerging architectures. Therefore, this work addresses the need for query-driven and memory-efficient community detection. Using maximal cliques as the community definition, due to their high signalto-noise ratio, we propose and systematically compare two contrasting methods: indexed-based and out-of-core. Both methods improve peak memory efficiency as much as 1000X compared to the state-of-the-art. However, the index-based method, which also has a 10-to-100-fold run time reduction, outperforms the out-of-core algorithm in most cases. The achieved scalability enables the discovery of diseases that are known to be or likely associated with Alzheimer's when the genome-scale network is mined with AD biomarker genes as knowledge priors.
AB - Community detection in real-world graphs presents a number of challenges. First, even if the number of detected communities grows linearly with the graph size, it becomes impossible to manually inspect each community for value added to the application knowledge base. Mining for communities with query nodes as knowledge priors could allow for filtering out irrelevant information and for enriching end-users knowledge associated with the problem of interest, such as discovery of genes functionally associated with the Alzheimer's (AD) biomarker genes. Second, the data-intensive nature of community enumeration challenges current approaches that often assume that the input graph and the detected communities fit in memory. As computer systems scale, DRAM memory sizes are not expected to increase linearly, while technologies such as SSD memories have the potential to provide much higher capacities at a lower power-cost point, and have a much lower latency than disks. Out-of-core algorithms and/or databaseinspired indexing could provide an opportunity for different design optimizations for query-driven community detection algorithms tuned for emerging architectures. Therefore, this work addresses the need for query-driven and memory-efficient community detection. Using maximal cliques as the community definition, due to their high signalto-noise ratio, we propose and systematically compare two contrasting methods: indexed-based and out-of-core. Both methods improve peak memory efficiency as much as 1000X compared to the state-of-the-art. However, the index-based method, which also has a 10-to-100-fold run time reduction, outperforms the out-of-core algorithm in most cases. The achieved scalability enables the discovery of diseases that are known to be or likely associated with Alzheimer's when the genome-scale network is mined with AD biomarker genes as knowledge priors.
UR - http://www.scopus.com/inward/record.url?scp=84959364960&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959364960&partnerID=8YFLogxK
U2 - 10.1137/1.9781611973440.115
DO - 10.1137/1.9781611973440.115
M3 - Conference contribution
AN - SCOPUS:84959364960
T3 - SIAM International Conference on Data Mining 2014, SDM 2014
SP - 1010
EP - 1018
BT - SIAM International Conference on Data Mining 2014, SDM 2014
A2 - Zaki, Mohammed
A2 - Obradovic, Zoran
A2 - Ning-Tan, Pang
A2 - Banerjee, Arindam
A2 - Kamath, Chandrika
A2 - Parthasarathy, Srinivasan
T2 - 14th SIAM International Conference on Data Mining, SDM 2014
Y2 - 24 April 2014 through 26 April 2014
ER -