Kinect – Possibly Our New PC Remote
The Kinect is a peripheral device that lets users control a computer or console through gestures. It was launched for the Xbox 360 in 2010, and the final version of the SDK for the PC was released on 1 February 2012. The SDK allows us to write PC applications that use the Kinect sensors. One licensing detail is important: a Kinect bought for the Xbox 360 can be used with this SDK only for non-commercial applications, whereas the Kinect sold for the PC can be used for commercial applications as well.
From a hardware point of view, the device contains:
- a 640×480 camera
- an infrared camera used to calculate distances between objects
- a microphone array for capturing sound
The infrared camera is what separates this device from everything else currently on the market. It allows us to accurately determine the location of each object and to measure distances. The SDK available at this time does more than capture images and calculate distances with the cameras: it also automatically identifies every user in front of the device and offers data about their position.
In order to use the Kinect on a PC we need Visual Studio 2010 and the Kinect SDK which can be downloaded here: http://www.microsoft.com/en-us/kinectforwindows/develop/overview.aspx.
Once the SDK is installed, we can add the Microsoft.Kinect reference to our project. Two things are worth mentioning:
- we need an adapter to connect the Kinect to the PC (a normal USB port does not provide enough power)
- if we need to connect several devices, each one needs to be connected to a different USB controller
To access the data from the Kinect, we first need an instance of the device; several devices can be connected to the same system. Once we have an instance, we must initialize the sensors we are going to use. We are not limited to a single sensor: we can initialize as many as we need.
It is important to call Stop() when the application closes and we are done using the Kinect. If we don’t, there is a chance that we will not be able to connect to the Kinect the next time we start the application.
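The steps described above can be sketched as follows. This is a minimal outline rather than the article's original listing: it assumes SDK 1.x, a project that references the Microsoft.Kinect assembly, and a console application. The choice of the first connected sensor and of the 640×480 formats is an assumption made for illustration.

```csharp
using System;
using System.Linq;
using Microsoft.Kinect; // requires a reference to the Microsoft.Kinect assembly

static class KinectSetup
{
    static void Main()
    {
        // Pick the first connected sensor; several may be attached to the system.
        KinectSensor kinect = KinectSensor.KinectSensors
            .FirstOrDefault(s => s.Status == KinectStatus.Connected);
        if (kinect == null)
        {
            Console.WriteLine("No connected Kinect found.");
            return;
        }

        // Initialize only the sensors (streams) the application needs.
        kinect.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
        kinect.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
        kinect.SkeletonStream.Enable();

        try
        {
            kinect.Start();
            Console.WriteLine("Kinect running; press Enter to quit.");
            Console.ReadLine();
        }
        finally
        {
            // Always release the device so the next run can connect to it.
            kinect.Stop();
        }
    }
}
```

Placing Stop() in a finally block is one way to make sure the device is released even if the application exits because of an exception.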
After initialization, we have to start every sensor we want to use. In the next example we start the video stream and subscribe to the event that is raised when a new frame is available.
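kinect.ColorFrameReady += new EventHandler<ColorImageFrameReadyEventArgs>(kinect_ColorFrameReady);
void kinect_ColorFrameReady(object sender, ColorImageFrameReadyEventArgs e)
{
    // OpenColorImageFrame() returns null when the frame is no longer available.
    using (ColorImageFrame frame = e.OpenColorImageFrame()) { if (frame == null) return; }
}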
For each stream we register for, we are notified whenever a new frame becomes available. Depth frames deserve a special mention. The data arrives as an array, just like the video stream, except that each element is a 16-bit value whose upper 13 bits give the distance of that point in millimeters. The following code shows how to extract the distance to an object, in millimeters, from a frame:
DepthImageFrame frame = e.OpenDepthImageFrame();
short[] pixelInfos = new short[frame.PixelDataLength];
frame.CopyPixelDataTo(pixelInfos);
// Shift out the player-index bits to get the distance.
int distanceInMillimeters =
    pixelInfos[pixelPosition] >> DepthImageFrame.PlayerIndexBitmaskWidth;
The Kinect also automatically identifies each person located in front of the sensor, so we can isolate and identify each player. This information can be extracted from the frame in the following way:
// A value of 0 means that no player was detected at this pixel.
int playerNumber = pixelInfos[pixelPosition] & DepthImageFrame.PlayerIndexBitmask;
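To see how the two bit operations above fit together, here is a small self-contained sketch that runs without the SDK or a sensor. The constants 3 and 7 are assumed to match DepthImageFrame.PlayerIndexBitmaskWidth and DepthImageFrame.PlayerIndexBitmask in SDK 1.x. The DepthPixel class and its Pack helper are hypothetical and exist only to build a sample value.

```csharp
using System;

static class DepthPixel
{
    // Assumed to mirror DepthImageFrame.PlayerIndexBitmaskWidth and
    // DepthImageFrame.PlayerIndexBitmask in SDK 1.x.
    public const int PlayerIndexBitmaskWidth = 3;
    public const int PlayerIndexBitmask = 7;

    // Hypothetical helper: packs a distance and a player index into one raw value.
    public static short Pack(int distanceInMillimeters, int playerIndex)
    {
        return (short)((distanceInMillimeters << PlayerIndexBitmaskWidth)
                       | (playerIndex & PlayerIndexBitmask));
    }

    // The upper 13 bits hold the distance in millimeters.
    public static int Distance(short raw)
    {
        return raw >> PlayerIndexBitmaskWidth;
    }

    // The lower 3 bits hold the player index (0 means no player).
    public static int Player(short raw)
    {
        return raw & PlayerIndexBitmask;
    }
}

static class Program
{
    static void Main()
    {
        short raw = DepthPixel.Pack(1500, 2);
        Console.WriteLine(DepthPixel.Distance(raw)); // 1500
        Console.WriteLine(DepthPixel.Player(raw));   // 2
    }
}
```

The sample distance is kept well below the point where the packed value would no longer fit in a positive short, so the right shift returns the original distance unchanged.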
What makes the Kinect different from all other devices on the market right now is that it identifies each player separately, together with the image quality and the data it returns about each player. The Kinect can identify several points (joints) for each player, and a position in space (x, y, z) is returned for each of them. Without writing any algorithm of our own, we can determine a person’s pose even if their hand is stretched out, if they are crouching, or if they have lifted one leg. The same set of points is tracked for every player.
Just as we receive a new frame from the video sensor, we receive a SkeletonFrame that contains all the points. Note that every frame contains all the points, even those that are not tracked. If a point could be tracked, its TrackingState property is equal to JointTrackingState.Tracked.
Each point of a player is uniquely identified by a value of the JointType enum. For example, the head has the value JointType.Head and the spine has the value JointType.Spine.
The following code iterates through all the points and displays the positions of those being tracked:
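SkeletonFrame frame = e.OpenSkeletonFrame();
Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
frame.CopySkeletonDataTo(skeletons);
foreach (Skeleton skeleton in skeletons)
    foreach (Joint joint in skeleton.Joints)
    {
        if (joint.TrackingState != JointTrackingState.Tracked)
            continue; // skip points that are not tracked
        Console.WriteLine("{0}: {1}, {2}, {3}", joint.JointType, joint.Position.X, joint.Position.Y, joint.Position.Z);
    }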
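Once joint positions are available, a simple gesture can be detected by comparing them. The sketch below is only an illustration and is not part of the SDK. Point3 is a hypothetical stand-in for a joint position, IsHandAboveHead and its margin parameter are invented for the example, and it assumes coordinates in meters with the Y axis pointing up.

```csharp
using System;

// Hypothetical stand-in for a joint position; coordinates are assumed
// to be in meters with the Y axis pointing up.
struct Point3
{
    public float X, Y, Z;
    public Point3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

static class Gestures
{
    // True when the hand is higher than the head by at least marginMeters.
    public static bool IsHandAboveHead(Point3 hand, Point3 head, float marginMeters)
    {
        return hand.Y - head.Y >= marginMeters;
    }
}

static class Program
{
    static void Main()
    {
        var head = new Point3(0.0f, 0.6f, 2.0f);
        var raisedHand = new Point3(0.3f, 0.9f, 1.9f);
        var loweredHand = new Point3(0.3f, 0.1f, 1.9f);
        Console.WriteLine(Gestures.IsHandAboveHead(raisedHand, head, 0.1f));  // True
        Console.WriteLine(Gestures.IsHandAboveHead(loweredHand, head, 0.1f)); // False
    }
}
```

In a real application the two points would come from the tracked head and hand joints of a skeleton, and the margin would be tuned to avoid false triggers.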
The Kinect becomes very easy to use with this SDK. From capturing an image to calculating the distance to an object or the position of a person’s hand, everything is done easily through the API that accompanies the device.
There are lots of games for the Xbox 360 that use the device, but there are almost no applications for the PC. The official SDK for the PC has only just been launched, along with its licensing requirements. The potential of this device is extremely high, which is why it should be studied closely. This is just the beginning, and we will undoubtedly be hearing more about the Kinect.