TY - JOUR
T1 - Self-Expressive Dictionary Learning for Dynamic 3D Reconstruction
AU - Zheng, Enliang
AU - Ji, Dinghuang
AU - Dunn, Enrique
AU - Frahm, Jan-Michael
N1 - Publisher Copyright: © 1979-2012 IEEE.
PY - 2018/9/1
Y1 - 2018/9/1
AB - We target the problem of sparse 3D reconstruction of dynamic objects observed by multiple unsynchronized video cameras with unknown temporal overlap. To this end, we develop a framework to recover the unknown structure without sequencing information across video sequences. Our proposed compressed sensing framework poses the estimation of 3D structure as the problem of dictionary learning, where the dictionary is defined as an aggregation of the temporally varying 3D structures. Given the smooth motion of dynamic objects, we observe any element in the dictionary can be well approximated by a sparse linear combination of other elements in the same dictionary (i.e., self-expression). Our formulation optimizes a biconvex cost function that leverages a compressed sensing formulation and enforces both structural dependency coherence across video streams, as well as motion smoothness across estimates from common video sources. We further analyze the reconstructability of our approach under different capture scenarios, and its comparison and relation to existing methods. Experimental results on large amounts of synthetic data as well as real imagery demonstrate the effectiveness of our approach.
KW - Dictionary learning
KW - dynamic 3D reconstruction
KW - self-expression
KW - unsynchronized videos
UR - http://www.scopus.com/inward/record.url?scp=85028512510&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85028512510&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2017.2742950
DO - 10.1109/TPAMI.2017.2742950
M3 - Article
C2 - 28841551
AN - SCOPUS:85028512510
SN - 0162-8828
VL - 40
SP - 2223
EP - 2237
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 9
M1 - 8014489
ER -