| [MAGIC] | Media and Graphics Interdisciplinary Center | [MAGIC Wiki] |
Scenario
Imagine a smart living room in your home:
When you walk into the room, the computer senses your presence and displays on a large screen your favourite TV program list, the weather forecast you need, and the pictures taken on your recent trip. You then point or wave at the screen to choose a TV program or to zoom in on your coolest picture.
Overview
We are working on a project that aims to create such a smart room environment, where pre-authorized users are identified as they walk into the room and their locations in the room are tracked. They can then interact with the media content shown on the large display in the room through different gestures, including both bare-handed gestures and gestures made with a pointing device. These functions are based mainly on the integration of different computer vision technologies, such as face recognition, human tracking, action recognition and motion estimation. The project serves not only as a testbed for these vision-based technologies, but also as an exploration of the user experience of different interaction modalities with the large display.
System Setup
The system consists of a large display, three network cameras that capture video from different angles in the room, and three computers, each analysing one camera feed using image processing techniques. Pointing devices, such as mobile phones, will be added later. The following figure shows the system setup.
Face Recognition Based Tracking
Two core functions need to be built into this application: users must be identified so that each one is authorized to see his or her own media content, and their locations in the room must be tracked so that their respective actions can be detected. We developed an algorithm that combines user identification and tracking. Faces are used as the "identification feature", while color histograms of the people inside the bounding boxes are used as the "tracking feature". The following figure illustrates the main steps of the algorithm:

Some results:
In these results, the filled yellow circle marks a detected face, the blue box marks a detected person, and the number on top of each box gives that person's identity.
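As a rough illustration of how identification and tracking might be combined, the following Python sketch matches person detections to existing tracks by comparing coarse color histograms, and attaches an identity to a track whenever a face is recognized inside it. It is not the project's actual implementation: the names `color_histogram`, `histogram_intersection`, `Track` and `update_tracks`, the bin count, the similarity threshold and the greedy matching are all assumptions made for the example, and face recognition itself is represented only by an optional name supplied with each detection.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Normalized, coarsely quantized RGB histogram of (r, g, b) pixels."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the two histograms are identical."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


class Track:
    """One tracked person: the tracking feature plus an optional identity."""

    def __init__(self, histogram, identity=None):
        self.histogram = histogram
        self.identity = identity  # stays None until a face is recognized


def update_tracks(tracks, detections, min_similarity=0.5):
    """Greedily match each detection to the most similar unmatched track.

    detections: list of (pixels, recognized_name_or_None) for one frame.
    A detection that resembles no existing track starts a new track.
    """
    matched = set()
    for pixels, name in detections:
        hist = color_histogram(pixels)
        candidates = [t for t in tracks if id(t) not in matched]
        best = max(candidates,
                   key=lambda t: histogram_intersection(t.histogram, hist),
                   default=None)
        if best is None or histogram_intersection(best.histogram, hist) < min_similarity:
            best = Track(hist)          # unseen appearance: new track
            tracks.append(best)
        else:
            best.histogram = hist       # refresh the tracking feature
        if name is not None:
            best.identity = name        # face recognized: attach identity
        matched.add(id(best))
    return tracks
```

In this sketch an identity, once assigned, stays with the track even in frames where the face is not visible, which is the reason for pairing a face-based identification feature with an appearance-based tracking feature.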
Simplistic Event Mechanism & Presence-based Application

We built a simple application on top of the tracking functionality so that simple interaction with the large display can be realized. The large display is in a default "screen saver" state when nobody is detected in the room. When someone walks in, the camera and the computer sense it and the screen changes to a "ready" state. When the user's face is recognized, his or her personal media content is displayed on the large display. The following figures show the screen saver, ready and personal content states respectively.
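The three display states and the transitions between them can be pictured as a small state machine that notifies listeners whenever the state changes. The Python sketch below is only an illustration under assumed names (`DisplayState`, `next_state`, `DisplayController`, `on_change`); the page does not describe how the actual event mechanism is implemented.

```python
from enum import Enum


class DisplayState(Enum):
    SCREEN_SAVER = "screen saver"
    READY = "ready"
    PERSONAL_CONTENT = "personal content"


def next_state(people_detected, recognized_user):
    """Map the current tracking result to a display state."""
    if people_detected == 0:
        return DisplayState.SCREEN_SAVER       # nobody in the room
    if recognized_user is None:
        return DisplayState.READY              # someone present, not yet identified
    return DisplayState.PERSONAL_CONTENT       # face recognized


class DisplayController:
    """Holds the display state and fires callbacks on each change."""

    def __init__(self):
        self.state = DisplayState.SCREEN_SAVER
        self.listeners = []

    def on_change(self, callback):
        # callback(new_state, recognized_user) is called on every transition
        self.listeners.append(callback)

    def update(self, people_detected, recognized_user=None):
        new_state = next_state(people_detected, recognized_user)
        if new_state is not self.state:
            self.state = new_state
            for callback in self.listeners:
                callback(new_state, recognized_user)
        return self.state
```

A display front end could register a callback with `on_change` and switch what it renders each time a transition is reported, while the tracking side simply calls `update` once per processed frame.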

Gesture-based Interaction with Mobile Phones
In addition, we implemented a motion-detection-based application on the Nokia N80 mobile phone. The on-phone application computes the changes across the sequence of images captured by the on-board camera and estimates the movement of the hand holding the phone. The motion vectors are then sent to a server computer, where they drive the movement of the cursor on the computer screen or large display. This application lets the user treat the phone as a wireless mouse and interact with the large display simply by waving the phone.
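To make the idea concrete, the following Python sketch estimates a single global shift between two small grayscale frames by exhaustive block matching, then turns that shift into a cursor update. This is one possible technique and not necessarily the one used on the phone; the function names, the search range, the `gain` factor and the sign convention (the cursor moves opposite to the apparent image shift, since scene content appears to move against the camera's motion) are assumptions for the example, and the network transfer to the server is omitted.

```python
def estimate_motion(prev, curr, max_shift=2):
    """Return the (dx, dy) shift of image content from prev to curr.

    Frames are lists of rows of grayscale values. Every shift within
    max_shift is tried and the one with the lowest mean absolute
    difference over the overlapping region is returned.
    """
    h, w = len(prev), len(prev[0])
    shifts = [(dx, dy)
              for dy in range(-max_shift, max_shift + 1)
              for dx in range(-max_shift, max_shift + 1)]
    # Try small shifts first so that ties favour little or no motion.
    shifts.sort(key=lambda s: abs(s[0]) + abs(s[1]))
    best, best_cost = (0, 0), float("inf")
    for dx, dy in shifts:
        total, count = 0, 0
        for y in range(max(0, -dy), min(h, h - dy)):
            for x in range(max(0, -dx), min(w, w - dx)):
                total += abs(curr[y + dy][x + dx] - prev[y][x])
                count += 1
        cost = total / count
        if cost < best_cost:
            best, best_cost = (dx, dy), cost
    return best


def move_cursor(cursor, content_shift, screen_size, gain=10):
    """Move the cursor opposite to the image shift, clamped to the screen."""
    dx, dy = content_shift
    x = min(max(cursor[0] - gain * dx, 0), screen_size[0] - 1)
    y = min(max(cursor[1] - gain * dy, 0), screen_size[1] - 1)
    return (x, y)
```

Sending only the small `(dx, dy)` vector, as the text describes, keeps the traffic between phone and server light compared with streaming camera frames.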
The following figures show the scenario and a screenshot of the actual demo application.


People
- Amy Wei You
- Hao Jiang
- Sidney Fels
- Rodger Lea
