Feature Extraction and Recognition for Human Action Recognition Jiajia Luo
How to automatically label videos containing human motions is the task of human action recognition. Traditional human action recognition algorithms use the RGB videos as input, and it is a challenging task because of the large intra-class variations of actions, cluttered background, possible camera movement, and illumination variations. Recently, the introduction of cost-effective depth cameras provides a new possibility to address difficult issues. However, it also brings new challenges such as noisy depth maps and time alignment. In this dissertation, effective and computationally efficient feature extraction and recognition algorithms are proposed for human action recognition.
At the feature extraction step, two novel spatial-temporal feature descriptors are proposed which can be combined with local feature detectors. The first proposed descriptor is the Shape and Motion Local Ternary Pattern (SMltp) descriptor which can dramatically reduced the number of features generated by dense sampling without sacrificing the accuracy. In addition, the Center-Symmetric Motion Local Ternary Pattern (CS-Mltp) descriptor is proposed, which describes the spatial and temporal gradients-like features. Both descriptors (SMltp and CS-Mltp) take advantage of the Local Binary Pattern (LBP) texture operator in terms of tolerance to illumination change, robustness in homogeneous region and computational efficiency.
For better feature representation, this dissertation presents a new Dictionary Learning (DL) method to learn an overcomplete set of representative vectors (atoms) so that any input feature can be approximated by a linear combination of these atoms with minimum reconstruction error. Instead of simultaneously learning one overcomplete dictionary for all classes, we learn class-specific sub-dictionaries to increase the discrimination. In addition, the group sparsity and the geometry constraint are added to the learning process to further increase the discriminative power, so that features are well reconstructed by atoms from the same class and features from the same class with high similarity will be forced to have similar coefficients.
To evaluate the proposed algorithms, three applications including single view action recognition, distributed multi-view action recognition, and RGB-D action recognition have been explored. Experimental results on benchmark datasets and comparative analyses with the state-of-the-art methods show the effectiveness and merits of the proposed algorithms.
Knoxville, Tennessee 37996 | 865-974-1000
The flagship campus of the University of Tennessee System