Oregon State University


Event Details

PhD Final Oral Examination – Behrooz Mahasseni

Wednesday, November 30, 2016 10:00 AM - 12:00 PM

Robust and Efficient Action Classification in Videos in the Wild
Recognizing human actions in videos is a long-standing problem in computer vision with a wide range of applications including video surveillance, content retrieval, and sports analysis. This thesis focuses on addressing efficiency and robustness of video classification in unconstrained real-world settings.

The thesis work can be broadly divided into four major parts:
•    The first line of work addresses view-invariant action recognition. This problem is formulated as a novel group multi-task learning framework, where the action model of each viewpoint is considered as a separate task and the overall model is trained jointly.

•    The second line of work addresses large-scale action recognition in uncontrolled settings. The standard training video dataset is augmented with additional data from another modality, namely 3D skeleton sequences of human body motion. A variant of recurrent neural networks called long short-term memory (LSTM) is used to encode sequences from 3D skeleton data. A modified hybrid backpropagation-through-time algorithm is proposed to fuse the encoded representation to learn the LSTM for video classification.

•    The third line of work addresses unsupervised and semi-supervised video summarization. We formulate the problem as frame subset selection and propose a novel deep generative network to generate a summary of selected frames that preserves the original video's content.

•    The fourth line of work introduces the new problem of budget-aware semantic segmentation. In this line, we propose two models. The first model replaces the standard inference steps for feature computation in a conditional random field (CRF) with a sequential policy that intelligently selects a subset of regions and their corresponding features. The second model is a deep recurrent policy that intelligently selects a subset of frames and uses a learned shallow convolutional neural network to propagate the available labels to unlabeled frames.

This research advances the state of the art in computer vision: the approaches developed meet the stringent runtime requirements that arise in many applications, and they work in less sanitized settings with small datasets or data from heterogeneous sources.

Major Advisor: Sinisa Todorovic
Committee: Alan Fern
Committee: Fuxin Li
Committee: Eugene Zhang
GCR: Brett Tyler

Kelley Engineering Center
Nicole Thompson
1 541 737 3617
Nicole.Thompson at oregonstate.edu
Sch Elect Engr/Comp Sci