Kinect in Motion - Audio and Visual Tracking by Example

Overview of this book

Kinect is a motion-sensing input device by Microsoft for the Xbox 360 video game console and Windows PCs. It provides capabilities that enhance human-machine interaction and takes you on a zero-to-hero journey to engage the user in a multimodal interface dialog with your software solution. Kinect in Motion - Audio and Visual Tracking by Example guides you in developing more than five models you can use to capture gestures, movements, and spoken voice commands. The examples and the theory discussed provide you with the knowledge to let the user become a part of your application.

Kinect in Motion - Audio and Visual Tracking by Example is a compact reference on how to master the color, depth, skeleton, and audio data streams handled by Kinect for Windows. Starting with an introduction to Kinect and its characteristics, you will first be shown how to master the color data stream with no more than a page of code. You will learn how to manage the depth information and map it against the color data. You will then learn how to define and manage gestures that enable the user to instruct the application simply by moving their arms or performing any other natural action. Finally, you will complete your journey through a multimodal interface, combining gestures with audio. The book will lead you through many detailed, real-world examples, and even guide you on how to test your application.

Motion computing and Kinect


Before getting Kinect in motion, let's try to understand what motion computing (or motion control computing) is and how Kinect built its success in this area.

Motion control computing is the discipline of detecting, digitizing, and processing the position and/or velocity of people and objects in order to interact with software systems.

Motion control computing has established itself as one of the most relevant techniques for designing and implementing a Natural User Interface (NUI).

NUIs are human-machine interfaces that enable the user to interact with software systems in a natural, intuitive way. NUIs are built on the following two main principles:

  • The NUI has to be imperceptible, thanks to its intuitive characteristics: a sensor that captures our gestures, a microphone that captures our voice, a touch screen that captures our hands' movements. All these interfaces are imperceptible to us because their use is intuitive; the interface does not distract us from the core functionality of our software system.

  • The NUI is based on nature or natural elements: the slide gesture, the touch, the body movements, the voice commands. All these actions are natural, not a diversion from our normal behavior.

NUIs are becoming crucial for increasing and enhancing user accessibility to software solutions. Programming NUIs is very important today, and it will only become more so in the future.

Kinect embraces the NUI principles and provides a powerful multimodal interface to the user. We can interact with complex software applications and/or video games simply by using our voice and our natural gestures. Kinect can detect our body position, the velocity of our movements, and our voice commands. It can detect the position of objects too.

Microsoft started to develop Kinect as a secret project in 2006 within the Xbox division, as a competitive "Wii killer". In 2008, Microsoft started Project Natal, named after the Brazilian hometown of Alex Kipman, Microsoft's General Manager of Incubation. The project's goal was to develop a device including depth recognition, motion tracking, facial recognition, and speech recognition, based on the video recognition technology developed by PrimeSense.

Kinect for Xbox was launched in November 2010, and its launch was indeed a success: it was, and still is, a breakthrough in the gaming world, and it holds the Guinness World Record as the "fastest selling consumer electronics device", ahead of the iPhone and the iPad.

In December 2010, PrimeSense (primesense.com) released a set of open source drivers and APIs for Kinect that enabled software developers to build Windows applications using the Kinect sensor.

Finally, on June 17, 2011, Microsoft launched the Kinect SDK beta, a set of libraries and APIs that enable us to design and develop software applications on Microsoft platforms using the Kinect sensor as a multimodal interface.
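
To get a first feel for what working against the SDK looks like, here is a minimal C# sketch using the SDK's managed API (the Microsoft.Kinect assembly) that finds a connected sensor and starts reading its color stream. It is only a taste of the pattern the following chapters build up step by step, not a complete application.

    using System;
    using System.Linq;
    using Microsoft.Kinect;

    class ColorStreamSketch
    {
        static void Main()
        {
            // Pick the first sensor that reports itself as connected.
            KinectSensor sensor = KinectSensor.KinectSensors
                .FirstOrDefault(s => s.Status == KinectStatus.Connected);
            if (sensor == null)
            {
                Console.WriteLine("No Kinect sensor connected.");
                return;
            }

            // Enable the color stream at 640x480, 30 frames per second.
            sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);

            // Handle each color frame as it arrives.
            sensor.ColorFrameReady += (s, e) =>
            {
                using (ColorImageFrame frame = e.OpenColorImageFrame())
                {
                    if (frame == null) return; // frame was skipped
                    byte[] pixels = new byte[frame.PixelDataLength];
                    frame.CopyPixelDataTo(pixels);
                    // 'pixels' now holds the raw BGRA data for this frame.
                }
            };

            sensor.Start();
            Console.ReadLine(); // keep streaming until Enter is pressed
            sensor.Stop();
        }
    }

The event-based pattern shown here (enable a stream, subscribe to its FrameReady event, start the sensor) is the same one the SDK uses for the depth and skeleton streams.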

With the launch of the Kinect for Windows device and the Kinect SDK, motion control computing is now a discipline that we can shape in our garages, writing simple and powerful software applications ourselves.

This book is written for all of us who want to develop market-ready software applications using Kinect for Windows that track audio and video and provide NUI-based motion control. In an area where Kinect established itself in such a short span of time, there is a need to consolidate all the technical resources and develop them in an appropriate way: this is our zero-to-hero Kinect in motion journey, and this is what this book is about.

This book assumes that you have a basic knowledge of C# and a great passion for learning about programming for Kinect devices. It can be enjoyed by anybody interested in knowing more about the device and learning how to track audio and video using the Kinect for Windows Software Development Kit (SDK) 1.6. We deeply believe this book will help you master how to process the video, depth, and audio streams and build market-ready applications that control motion. The book has deliberately been kept simple and concise, which will help you quickly grasp the core and critical concepts.

Before jumping into the core of audio and visual tracking with Kinect for Windows, let's use this introductory chapter to understand the hardware and software architectures that Kinect for Windows and its SDK 1.6 use.