TY - GEN
T1 - Automated estimation of food type and amount consumed from body-worn audio and motion sensors
AU - Mirtchouk, Mark
AU - Merck, Christopher
AU - Kleinberg, Samantha
PY - 2016/9/12
Y1 - 2016/9/12
N2 - Determining when an individual is eating can be useful for tracking behavior and identifying patterns, but to create nutrition logs automatically or provide real-time feedback to people with chronic disease, we need to identify both what they are consuming and in what quantity. However, food type and amount have mainly been estimated using image data (requiring user involvement) or acoustic sensors (tested with a restricted set of foods rather than representative meals). As a result, there is not yet a highly accurate automated nutrition monitoring method that can be used with a variety of foods. We propose that multi-modal sensing (in-ear audio plus head and wrist motion) can be used to classify food type more accurately, as audio and motion features provide complementary information. Further, we propose that knowing food type is critical for estimating the amount consumed in combination with sensor data. To test this, we use data from people wearing audio and motion sensors, with ground truth annotated from video and continuous scale data. With data from 40 unique foods, we achieve a classification accuracy of 82.7% with a combination of sensors (versus 67.8% for audio alone and 76.2% for head and wrist motion). Weight estimation error was reduced from a baseline of 127.3% to 35.4% absolute relative error. Ultimately, our estimates of food type and amount can be linked to food databases to provide automated calorie estimates from continuously collected data.
KW - Acoustic and motion sensing
KW - Eating recognition
KW - Nutrition
UR - http://www.scopus.com/inward/record.url?scp=84991508711&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84991508711&partnerID=8YFLogxK
U2 - 10.1145/2971648.2971677
DO - 10.1145/2971648.2971677
M3 - Conference contribution
AN - SCOPUS:84991508711
T3 - UbiComp 2016 - Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing
SP - 451
EP - 462
BT - UbiComp 2016 - Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing
T2 - 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp 2016
Y2 - 12 September 2016 through 16 September 2016
ER -