TY - JOUR
T1 - Mixture Statistic Metric Learning for Robust Human Action and Expression Recognition
AU - Dai, Shuanglu
AU - Man, Hong
N1 - Publisher Copyright: © 1991-2012 IEEE.
PY - 2018/10
Y1 - 2018/10
AB - Background objects and textures in real-world video sequences often pose great challenges for human action and facial expression recognition. This paper proposes a mixture statistic metric learning method for recognizing human actions and facial expressions in realistic 'in the wild' scenarios. In the proposed method, multiple statistics, including temporal means and covariance matrices, as well as parameters of spatial Gaussian mixture distributions, are explicitly mapped to or generated on symmetric positive definite Riemannian manifolds. An implicit mixture of Mahalanobis metrics is learned from the Riemannian manifolds. The learned metrics place similar pairs in local neighborhoods and dissimilar pairs in relatively orthogonal regions on a regularized manifold. The proposed metric learning method also explores the prior distributions within the multiple statistics in the video sequences. The proposed method is tested on five action video data sets and three facial expression data sets and is compared with various state-of-the-art methods. Recognition accuracy and computational efficiency are evaluated in terms of average recognition rates and computational times in seconds, respectively. The competitive performance achieved on both action and facial expression recognition tasks demonstrates the effectiveness of the proposed method.
KW - Action recognition
KW - facial expression recognition
KW - mixture statistical metric learning
UR - http://www.scopus.com/inward/record.url?scp=85033685425&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85033685425&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2017.2772026
DO - 10.1109/TCSVT.2017.2772026
M3 - Article
AN - SCOPUS:85033685425
SN - 1051-8215
VL - 28
SP - 2484
EP - 2499
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 10
M1 - 8103056
ER -