Oregon State University



Event Details

PhD Oral Preliminary Examination – Mohamed Amer

Friday, December 14, 2012 9:00 AM - 11:00 AM

Hierarchical Graphical Models for Activity Recognition in Videos
The goal of this research is to investigate the problem of human activity recognition in videos using Hierarchical Graphical Models. Given a video, we would like to recognize human activities, localize the video parts where these activities occur, and detect the actors involved in them. Human activities of interest have stochastic structure: they are characterized by variable space-time arrangements of primitive actions and are conducted by a variable number of actors. We study two modeling paradigms for representing, learning, and inferring activities. The first model is the Sum-Product Network (SPN). SPNs are deep models with exact inference. We use SPNs to model a mixture of bags-of-words (BoWs) with exponentially many mixture components, where sub-components are reused by larger ones. A BoW is a low-level representation of visual cues extracted from video pixels. An SPN consists of terminal nodes representing BoWs, and of product and sum nodes organized in a number of layers. The products are aimed at encoding particular configurations of primitive actions, and the sums serve to capture their alternative configurations. The second model is the And-Or Graph (AOG). AOGs are more expressive than SPNs; however, their inference is intractable.
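The sum-product structure described above can be illustrated with a minimal sketch. This is not the presenter's implementation: the class names (Leaf, Product, Sum), the leaf scores, and the mixture weights are all hypothetical, chosen only to show how product nodes combine primitive-action scores and how sum nodes mix alternative configurations while reusing a shared sub-component.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Leaf:
    """Terminal node: reads one precomputed BoW score (hypothetical input)."""
    index: int

    def value(self, scores: Sequence[float]) -> float:
        return scores[self.index]


@dataclass
class Product:
    """Product node: one particular configuration of its children."""
    children: List["Node"]

    def value(self, scores: Sequence[float]) -> float:
        result = 1.0
        for child in self.children:
            result *= child.value(scores)
        return result


@dataclass
class Sum:
    """Sum node: weighted mixture over alternative configurations."""
    children: List["Node"]
    weights: List[float]

    def value(self, scores: Sequence[float]) -> float:
        return sum(w * c.value(scores) for w, c in zip(self.weights, self.children))


Node = Union[Leaf, Product, Sum]

# Illustrative numbers only: three BoW terminals with made-up scores.
leaf_scores = [0.5, 0.2, 0.4]

shared = Leaf(0)                          # sub-component reused by both products
config_a = Product([shared, Leaf(1)])     # one configuration of primitive actions
config_b = Product([shared, Leaf(2)])     # an alternative configuration
root = Sum([config_a, config_b], weights=[0.3, 0.7])

print(root.value(leaf_scores))
```

With these assumed numbers the root evaluates 0.3 · (0.5 · 0.2) + 0.7 · (0.5 · 0.4) = 0.17 in a single bottom-up pass, which is the sense in which inference in such a network is exact.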

We use AOGs to address the problem of multiscale activity recognition. Our goal is to detect and localize individual actions and group activities that co-occur in high-resolution videos. The video resolution allows for digital zoom-in (or zoom-out) to examine fine details (or coarser scales), as needed for recognition. The key challenge is how to conduct cost-sensitive inference in an AOG, for which exact inference is intractable. The AOG allows for an efficient, cost-sensitive Explore-Exploit Strategy (EES).

The two modeling paradigms have a number of shortcomings. It is not clear which model is appropriate for modeling complex spatio-temporal structures. Also, inference and learning can be challenging due to the large amount of data and the presence of different activity classes with similar structure. We propose an extension to SPNs that replaces product nodes with union nodes, resulting in Sum Union Networks (SUN). We hypothesize that activity recognition can be significantly improved by appropriately fusing visual cues about the activity and background events. The motivation is to enable deep models to explicitly account for contextual events co-occurring with target activities. SUN is a deep model aimed at representing the posterior distribution of a union domain between the activity and other context events. Our model is a hierarchical network of sum and union operators. The union nodes compute the posterior distribution of the activity classes given visual cues from certain video parts, and the sum nodes serve as switches to select relevant video parts. In addition, we propose extending the EES for AOG inference with a Branch and Bound Strategy (BBS) for learning the cost-efficient sequence of inference steps.

Major Advisor: Sinisa Todorovic
Committee: Alan Fern
Committee: Bechir Hamdaoui
Committee: Huaping Liu
GCR: Alex Greaney 

Kelley Engineering Center
Nicole Thompson
1 541 737 3617
Nicole.Thompson at oregonstate.edu
Sch Elect Engr/Comp Sci