Abstract: This work proposes a key-pose-based intelligent system for recognizing human interactions in video streams. Beyond interaction recognition itself, the approach is useful for other applications such as content-based video retrieval. The main idea is to extract the bilateral silhouette of the two interacting persons and to analyze its shape with the shape context descriptor, a popular shape descriptor in object recognition and matching tasks. First, a dictionary of randomly selected sample frames from all classes is collected, and the bilateral silhouette image is extracted from each sample to train a low-level classifier, called the frame classifier. Then, each frame of a test sequence is compared with these samples and assigned a class label by the frame classifier. Finally, a high-level classifier, called the sequence classifier, categorizes the interaction as a function of the labels of its frames. Because foreground extraction is error-prone, some frames may be misclassified. Moreover, each interaction sequence contains two types of frames: those that carry information relevant to the interaction and those that do not. To address both problems, a normalized histogram of the frame labels is used as the action descriptor, which is robust to the misclassification of some frames. This histogram is fed to a sequence classifier such as a random decision forest (RDF), a probabilistic neural network (PNN), or a support vector machine (SVM) to perform interaction recognition. Experimental results on the SBU and UT-Interaction datasets indicate the favorable performance of the proposed method.
Keywords: Human interaction recognition, bilateral silhouette, key pose, high-level classifier, low-level classifier
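
The action descriptor mentioned in the abstract, a normalized histogram of per-frame labels, can be illustrated with a minimal sketch. It assumes the frame classifier emits one integer class label per frame; the function name `label_histogram`, the example labels, and the number of classes are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def label_histogram(frame_labels: Sequence[int], num_classes: int) -> List[float]:
    """Return the normalized histogram of per-frame class labels.

    Entry c is the fraction of frames labeled as class c, so the
    entries sum to 1 regardless of the sequence length.
    """
    if not frame_labels:
        raise ValueError("sequence has no frames")
    if any(not 0 <= label < num_classes for label in frame_labels):
        raise ValueError("label outside the range [0, num_classes)")
    counts = Counter(frame_labels)
    total = len(frame_labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


# Hypothetical frame-classifier output for a 10-frame sequence, 4 classes.
# Two frames (labels 0 and 3) stand in for misclassified or unrelated frames.
labels = [2, 2, 2, 0, 2, 2, 3, 2, 2, 2]
hist = label_histogram(labels, num_classes=4)
print(hist)
```

In this example the dominant class still holds most of the histogram mass even though two frames carry other labels, which is the kind of robustness the abstract attributes to the descriptor. The resulting fixed-length vector is what would be passed to the sequence classifier (RDF, PNN, or SVM).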