Leap Motion Development Essentials

By: Mischa Spiegelmock

Overview of this book

Leap Motion is a company developing advanced motion sensing technology for human–computer interaction. Originally inspired by the level of difficulty of using a mouse and keyboard for 3D modeling, Leap Motion believe that moulding virtual clay should be as easy as moulding clay in your hands. Leap Motion now focus on bringing this motion sensing technology closer to the real world. Leap Motion Development Essentials explains the concepts and practical applications of gesture input for developers who want to take full advantage of Leap Motion technology. This guide explores the capabilities available to developers and gives you a clear overview of topics related to gesture input along with usable code samples. Leap Motion Development Essentials shows you everything you need to know about the Leap Motion SDK, from creating a working program with gesture input to more sophisticated applications covering a range of relevant topics. Sample code is provided and explained along with details of the most important and central API concepts. This book teaches you the essential information you need to design a gesture-enabled interface for your application, from specific gesture detection to best practices for this new input. You will be given guidance on practical considerations along with copious runnable demonstrations of API usage which are explained in step-by-step, reusable recipes.

Major SDK components


Now that we've written our first gesture-enabled program, let's talk about the major components of the Leap SDK. We'll visit each of these in more depth as we continue our journey.

Controller

The Leap::Controller class is a liaison between the controller and your code. Whenever you wish to do anything at all with the device you must first go through your controller. From a controller instance we can interact with the device configuration, detected displays, current and past frames, and set up event handling with our listener subclass.

Config

An instance of the Config class can be obtained from a controller. It provides a key/value interface to modify the operation of the Leap device and driver behavior. Some of the options available are:

  • Robust mode: Somewhat slower frame processing but works better with less light.

  • Low resource mode: Less accurate and responsive tracking, but uses less CPU and USB bandwidth.

  • Tracking priority: Can prioritize either precision of tracking data or the rate at which data is sampled (resulting in approximately 4x data frame-rate boost), or a balance between the two (approximately 2x faster than the precise mode).

  • Flip tracking: Allows you to use the controller with the USB cable coming out of either side. This setting simply flips the positive and negative coordinates on the X-axis.

Screen

A controller may have one or more calibratedScreens: computer displays within the controller's field of view whose position and dimensions are known. Given a pointable's direction and a screen, we can determine what the user is pointing at.

Math

Several math-related functions and types such as Leap::Vector, Leap::Matrix, and Leap::FloatArray are provided by LeapMath.h. All points in space, screen coordinates, directions, and normals are returned by the API as three-element vectors representing X, Y, and Z coordinates or unit vectors.

Frame

The real juicy information is stored inside each Frame. A Frame instance represents a point in time at which the driver was able to generate an updated view of its world and detect where screens, your hands, and pointables are.

Hand

At present the only body parts you can use with the controller are your hands. Given a frame instance we can inspect the number of hands in the frame, their position and rotation, normal vectors, and gestures. The hand motion API allows you to compare two frames and determine if the user has performed a translation, rotation, or scaling gesture with their hands in that time interval. The methods we can call to check for these interactions are:

  • Leap::Hand::translation(sinceFrame): Translation (also known as movement) returned as a Leap::Vector including the direction of the movement of the hand and the distance travelled in millimeters.

  • Leap::Hand::rotationMatrix(sinceFrame), ::rotationAxis(sinceFrame), ::rotationAngle(sinceFrame, axisVector): Hand rotation, described as a rotation matrix, as the axis vector the hand rotated around, or as a float angle around a given axis between –π and π radians (that's -180° to 180° for those of you who are a little rusty with your trigonometry).

  • Leap::Hand::scaleFactor(sinceFrame): Scaling represents the distance between two hands. If the hands are closer together in the current frame compared to sinceFrame, the return value will be less than 1.0 but greater than 0.0. If the hands are further apart the return value will be greater than 1.0 to indicate the factor by which the distance has increased.
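
To make the meaning of the scale factor concrete, here is a minimal sketch that does not use the SDK at all. It assumes the factor is simply the ratio between the current and the earlier distance separating two palms, which matches the description above but may not be how the driver actually computes it. Vec3, distanceBetween(), and scaleFactorSketch() are hypothetical names invented for this illustration.

```cpp
#include <cmath>

// Hypothetical stand-in for Leap::Vector so the sketch compiles without the SDK.
struct Vec3 {
    float x, y, z;
};

// Euclidean distance between two points, in the same units as the inputs (mm).
float distanceBetween(const Vec3 &a, const Vec3 &b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Assumed definition: ratio of the current palm separation to the earlier one.
// Returns 1.0 (no scaling) when the earlier separation is zero.
float scaleFactorSketch(const Vec3 &leftThen, const Vec3 &rightThen,
                        const Vec3 &leftNow, const Vec3 &rightNow) {
    const float before = distanceBetween(leftThen, rightThen);
    const float after = distanceBetween(leftNow, rightNow);
    if (before == 0.0f) return 1.0f;
    return after / before;
}
```

Under this assumption, palms that start 200 mm apart and end 100 mm apart give a factor of 0.5, and the reverse movement gives 2.0.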

Pointable

A Hand can also contain information about Pointable objects that were recognized in the frame as being attached to the hand. A distinction is made between the two subclasses of pointable objects: Tool, which can be any long, slender object such as a chopstick or a pencil, and Finger, whose meaning should be apparent. You can request either fingers or tools from a Hand, or a list of pointables to get both if you don't care.

Finger positioning

Suppose we want to know where a user's fingertips are in space. Here's a short snippet of code to output the spatial coordinates of the tips of the fingers on a hand that is being tracked by the controller:

    if (frame.hands().empty()) return;

    const Leap::Hand firstHand = frame.hands()[0];
    const Leap::FingerList fingers = firstHand.fingers();

Here we obtain a list of the fingers on the first hand of the frame. For an enjoyable diversion let's output the locations of the fingertips on the hand, given in the Leap coordinate system:

    for (int i = 0; i < fingers.count(); i++) {
        const Leap::Finger finger = fingers[i];

        std::cout << "Detected finger " << i << " at position (" <<
            finger.tipPosition().x << ", " <<
            finger.tipPosition().y << ", " <<
            finger.tipPosition().z << ")" << std::endl;
    }

This demonstrates how to get the position of the fingertips of the first hand that is recognized in the current frame. If you hold three fingers out the following dazzling output is printed:

Detected finger 0 at position (-119.867, 213.155, -65.763)
Detected finger 1 at position (-90.5347, 208.877, -61.1673)
Detected finger 2 at position (-142.919, 211.565, -48.6942)

While this is clearly totally awesome, the exact meaning of these numbers may not be immediately apparent. For points in space returned by the SDK the Leap coordinate system is used. Much like our forefathers believed the Earth to be the cornerstone of our solar system, your Leap device has similar notions of centricity. It measures locations by their distance from the Leap origin, a point centered on the top of the device. Negative X values represent a point in space to the left of the device, positive values are to the right. The Z coordinates work in much the same way, with positive values extending towards the user and negative values in the direction of the display. The Y coordinate is the distance from the top of the device, starting 25 millimeters above it and extending to about 600 millimeters (two feet) upwards. Note that the device cannot see below itself, so all Y coordinates will be positive.
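
The sign conventions just described can be captured in a few lines. The following is only an illustrative sketch: Vec3 is a hypothetical stand-in for Leap::Vector and describePosition() is a made-up helper, neither of which is part of the SDK.

```cpp
#include <string>

// Hypothetical stand-in for Leap::Vector so the sketch compiles without the SDK.
struct Vec3 {
    float x, y, z;
};

// Describe where a point lies relative to the Leap origin.
// Negative X is left of the device; negative Z is toward the display.
// A coordinate of exactly zero is grouped with the positive side.
std::string describePosition(const Vec3 &p) {
    const std::string side = (p.x < 0.0f) ? "left" : "right";
    const std::string depth =
        (p.z < 0.0f) ? "toward the display" : "toward the user";
    return side + " of the device, " + depth;
}
```

Feeding it the first fingertip from the output above, (-119.867, 213.155, -65.763), yields "left of the device, toward the display".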

An example of cursor control

By now we are feeling pretty saucy, having diligently run the sample code thus far and controlling our computer in a way never before possible. While there is certain utility and endless amusement afforded by printing out finger coordinates while waving your hands in the air and pretending to be a magician, there are even more exciting applications waiting to be written, so let's continue onwards and upwards.

Tip

Until computer-gesture interaction is commonplace, pretending to be a magician while you test the functionality of Leap SDK is not recommended in public places such as coffee shops.

In some cultures it is considered impolite to point at people. Fortunately your computer doesn't have feelings and won't mind if we use a pointing gesture to move its cursor around (you can even use a customarily offensive finger if you so choose). In order to determine where to move the cursor, we must first locate the position on the display that the user is pointing at. To accomplish this we will make use of the screen calibration and detection API in the SDK.

If you happen to leave your controller near a computer monitor it will do its best to try and determine the location and dimensions of the monitor by looking for a large, flat surface in its field of view. In addition you can use the complementary Leap calibration functionality to improve its accuracy if you are willing to take a couple of minutes to point at various dots on your screen. Note that once you have calibrated your screen, you should ensure that the relative positions of the Leap and the screen do not change.

Once your controller has oriented itself within your surroundings, hands and display, you can ask your trusty controller instance for a list of detected screens:

    // get list of detected screens
    const Leap::ScreenList screens = controller.calibratedScreens();
    
    // make sure we have a detected screen
    if (screens.empty()) return;
    const Leap::Screen screen = screens[0];

We now have a screen instance that we can use to find out the physical location in space of the screen as well as its boundaries and resolution. Who cares about all that though, when we can use the SDK to compute where we're pointing to with the intersect() method?

    // find the first finger or tool
    const Leap::Frame frame = controller.frame();
    const Leap::HandList hands = frame.hands();
    if (hands.empty()) return;
    const Leap::PointableList pointables = hands[0].pointables();
    if (pointables.empty()) return;
    const Leap::Pointable firstPointable = pointables[0];

    // get x, y coordinates on the first screen
    const Leap::Vector intersection = screen.intersect(
         firstPointable,
         true,  // normalize
         1.0f   // clampRatio
    );	

The vector intersection contains what we want to know here: the point on the screen that our pointable is aimed at. If the pointable passed to intersect() is not actually pointing at the screen, the return value will be (NaN, NaN, NaN), where NaN stands for "not a number". We can easily check for non-finite values in a vector with the isValid() method:

    if (! intersection.isValid()) return;
    // print intersection coordinates
    std::cout << "You are pointing at (" <<
        intersection.x << ", " <<
        intersection.y << ", " <<
        intersection.z << ")" << std::endl;

Prepare to be astounded when you point at the middle of your screen and the transfixing message You are pointing at (0.519522, 0.483496, 0) is revealed. Assuming your screen resolution is larger than one pixel on either side, this output may be somewhat unexpected, so let's talk about what screen.intersect(const Pointable &pointable, bool normalize, float clampRatio=1.0f) is returning.

The intersect() method draws an imaginary ray from the tip of the pointable, extending in the same direction as your finger or tool, and returns a three-element vector containing the coordinates of the point where the ray meets the screen. If the second parameter, normalize, is set to false then intersect() will return the location in the Leap coordinate system. Since we have no interest in the real world we have set normalize to true, which causes the coordinates of the returned intersection vector to be fractions of the screen width and height.
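
For the curious, the geometry behind such a query is a standard ray-plane intersection. The sketch below is a generic textbook version under simplifying assumptions (the screen is treated as an infinite plane, and no normalization or clamping is done); it is not the SDK's implementation. Vec3, dot(), and intersectRayWithPlane() are hypothetical names used only for illustration.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for Leap::Vector so the sketch compiles without the SDK.
struct Vec3 {
    float x, y, z;
};

float dot(const Vec3 &a, const Vec3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Intersect the ray origin + t * direction (t >= 0) with the plane that
// passes through planePoint and has the given normal.
// Returns (NaN, NaN, NaN) when the ray is parallel to the plane or points
// away from it, mirroring the "not pointing at the screen" case.
Vec3 intersectRayWithPlane(const Vec3 &origin, const Vec3 &direction,
                           const Vec3 &planePoint, const Vec3 &planeNormal) {
    const float notANumber = std::numeric_limits<float>::quiet_NaN();
    const Vec3 miss = {notANumber, notANumber, notANumber};

    const float denom = dot(direction, planeNormal);
    if (std::fabs(denom) < 1e-6f) return miss;  // ray parallel to plane

    const Vec3 toPlane = {planePoint.x - origin.x,
                          planePoint.y - origin.y,
                          planePoint.z - origin.z};
    const float t = dot(toPlane, planeNormal) / denom;
    if (t < 0.0f) return miss;  // plane is behind the ray

    const Vec3 hit = {origin.x + t * direction.x,
                      origin.y + t * direction.y,
                      origin.z + t * direction.z};
    return hit;
}
```

A fingertip at (0, 200, 100) pointing along negative Z hits a plane at Z = -200 at the point (0, 200, -200); pointing the other way produces the NaN vector.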

Tip

When intersect() returns normalized coordinates, (0, 0, 0) is considered the bottom-left pixel, and (1, 1, 0) is the top-right pixel.

It is worth noting that many computer graphics coordinate systems define the top-left pixel as (0, 0) so use caution when using these coordinates with other libraries.

There is one last (optional) parameter to the intersect() method, clampRatio, which is used to expand or contract the boundaries of the area at which the user can point, should you want to allow pointing beyond the edges of the screen.

Now that we have our normalized screen position, we can easily work out the pixel coordinate in the direction of the user's rude gesticulations:

    unsigned int x = screen.widthPixels() * intersection.x;
    // flip y coordinate to standard top-left origin
    unsigned int y = screen.heightPixels() * (1.0f - intersection.y);
    
    std::cout << "You are offending the pixel at (" <<
        x << ", " << y << ")" << std::endl;

Since intersection.x and intersection.y are fractions of the screen dimensions, simply multiply by the boundary sizes to get our intersection coordinates on the screen. We'll go ahead and leave out the Z-coordinate since it's usually (OK, always) zero.
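
The same conversion can be wrapped in a small helper that is easy to test away from the device. This is a sketch rather than SDK code: PixelPoint, clampUnit(), and toPixel() are hypothetical names, and the clamping to the last pixel row and column is an addition of this example. It assumes the inputs are valid (non-NaN) normalized coordinates.

```cpp
#include <algorithm>

// Hypothetical result type for this sketch.
struct PixelPoint {
    unsigned x, y;
};

// Clamp a normalized coordinate to the [0, 1] range.
float clampUnit(float v) {
    return std::max(0.0f, std::min(1.0f, v));
}

// Convert normalized, bottom-left-origin coordinates into pixel coordinates
// with the conventional top-left origin. The result is clamped so that a
// normalized value of exactly 1.0 still lands on the last pixel.
PixelPoint toPixel(float nx, float ny,
                   unsigned widthPixels, unsigned heightPixels) {
    const float fx = clampUnit(nx) * widthPixels;
    const float fy = (1.0f - clampUnit(ny)) * heightPixels;  // flip Y

    PixelPoint p;
    p.x = std::min(static_cast<unsigned>(fx), widthPixels - 1);
    p.y = std::min(static_cast<unsigned>(fy), heightPixels - 1);
    return p;
}
```

On a hypothetical 1920 by 1080 display, the normalized point (0.519522, 0.483496) from earlier maps to pixel (997, 557).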

Now for the coup de grâce: moving the cursor. Here's how to do it on Mac OS X:

    CGPoint destPoint = CGPointMake(x, y);
    CGDisplayMoveCursorToPoint(kCGDirectMainDisplay, destPoint);

Note

You will need to #include <CoreGraphics/CoreGraphics.h> and link against it (-framework CoreGraphics) to make use of CGDisplayMoveCursorToPoint().

Now all of our hard efforts are rewarded, and we can while away the rest of our days making the cursor zip around with nothing more than a twitch of the finger. At least until our arm gets tired. After a few seconds (or minutes, for the easily-amused) it may become apparent that the utility of such an application is severely limited, as we can't actually click on anything.

So maybe you shouldn't throw your mouse away just yet, but read on if you are ready to escape from the shackles of such an antiquated input device.

A gesture-triggered action

Let's go all the way here and implement our first proper gesture—a mouse click. The first question to ask is, what sort of gesture should trigger a click? One's initial response might be a twitch of your pointing finger, perhaps by making a dipping or curling motion. This feels natural and similar enough to using a mouse or trackpad, but there is a major flaw—in the movement of the fingertip to execute the gesture we would end up moving the cursor, resulting in the click taking place somewhere different from where we intended. A different solution is needed.

If we take full advantage of our limbs, and assuming we are not an amputee, we can utilize not just one but both hands, using one as a "pointer" hand and one as a "clicker" hand. We'll retain the outstretched finger as the cursor movement gesture for the pointer hand, and define a "click" gesture to be the touching of two fingers together on the clicker hand.

Let's create a true Leap mouse application to support our newly defined clicking gesture. An important first step would be to choose a distance that represents two fingers touching. While at first blush a value of 0 mm would seem to be a reasonable definition of touching together, consider the fact that the controller is not always perfect in recognizing two touching fingers as being distinct from each other, or even existing at all. If we choose a suitably small distance we can call "touching" then the gesture will be triggered in one of the frames generated as the user closes their fingers together.

We'll begin with the obligatory listener class to handle frame events and keep track of our input state.

class MouseListener : public Leap::Listener {
public:
    MouseListener();
    const float clickActivationDistance = 40;
    virtual void onFrame(const Leap::Controller &);
    virtual void postMouseDown(unsigned x, unsigned y);
    virtual void postMouseUp(unsigned x, unsigned y);
…
};

On my revision 3 device, a distance of 40 mm seems to work reasonably well.

For our onFrame handler we can build on our previous code. However now we need to keep track of not just one hand but two, which introduces quite a bit of extra complexity.

For starters, the method Leap::Frame::hands() is defined as returning the hands detected in a frame in an arbitrary order, meaning we cannot always expect the same hand to correspond to the same index in the returned HandList. This makes sense, because some frames will likely fail to recognize both hands and a new list of hands will need to be constructed as the detected hands are unrecognized and recognized again, and there is no guarantee that the ordering will be the same.

A further problem is that we will need to work out which are the user's left and right hands, because we should probably use the more dexterous hand as the pointer hand and the other as the clicker.

Indeed, even determining the primary and secondary hands is not quite as simple as one might think, because the primary and secondary hands will be reversed for left-handed people. Left-handed people have had it hard enough for thousands of years, so it would not be right for us to make assumptions.

Note

The English word "dexterity" comes from the Latin root dexter, relating to the right or right hand, and also meaning "skillful", "fortunate", or "proper" and often having a positive connotation. Contrast this with the Latin word for left: "sinister".

We'll start by adding some instance variables and initializers:

protected:
    bool clickActive; // currently clicking?
    bool leftHanded;  // user setting
    int32_t clickerHandID, pointerHandID; // last recognized

leftHanded will act as a flag which we can use when we determine which hand is the pointer and which is the clicker. clickerHandID and pointerHandID will be used to keep track of which detected hand from a given frame corresponds to the pointer and clicker.

We can create an initializing constructor like so:

MouseListener::MouseListener()
  : clickActive(false), leftHanded(false),
    clickerHandID(0), pointerHandID(0) {}

Explicitly initializing variables is good practice, in particular because the rules for which types are initialized in various situations in C++ are so multitudinous that memorizing them is discouraged. Using the initializer list syntax is considered good style because it can save unnecessary constructor calls when member objects are assigned new values, although since we are only initializing primitive types, we get no such reduction in overhead here.
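
The saving only shows up with class-type members. The sketch below makes it visible by counting calls; Tracer, AssignedInBody, and InitializedInList are hypothetical types written for this illustration and have nothing to do with the Leap SDK.

```cpp
// Hypothetical helper that counts how often it is constructed and assigned.
struct Tracer {
    static int constructions;
    static int assignments;
    int value;

    Tracer() : value(0) { ++constructions; }
    explicit Tracer(int v) : value(v) { ++constructions; }
    Tracer &operator=(const Tracer &other) {
        value = other.value;
        ++assignments;
        return *this;
    }
};
int Tracer::constructions = 0;
int Tracer::assignments = 0;

// Member is default-constructed first, then overwritten in the body:
// two constructions (member plus temporary) and one assignment.
struct AssignedInBody {
    Tracer member;
    AssignedInBody() { member = Tracer(42); }
};

// Member is constructed once, directly with its final value.
struct InitializedInList {
    Tracer member;
    InitializedInList() : member(42) {}
};
```

Creating an AssignedInBody bumps the counters by two constructions and one assignment, while an InitializedInList costs a single construction and no assignment.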

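    Leap::Hand pointerHand, clickerHand;
    
    // try to find the pointer hand we were already following
    if (pointerHandID)
        pointerHand = frame.hand(pointerHandID);
    // not set yet, or lost track of it: fall back to the first hand
    if (! pointerHand.isValid())
        pointerHand = hands[0];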

If hands are detected, we will always at least have a pointer hand defined. If we've already decided on which hand to use (pointerHandID is set) then we should see if that hand is available in the current frame. When Leap::Frame::hand(int32_t id) is called with a previously detected hand's identifier, it will return a corresponding hand instance. If the controller has lost track of the hand it was following, then you'll still get a Hand back, but isValid() will be false. If we fail to locate our old hand or one hasn't been set yet, we'll assign the first detected hand for the case where we only have one hand in the frame.
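
The lookup-then-fall-back pattern can be tried out without a device. In this sketch, TrackedHand, findHand(), and choosePointerHand() are hypothetical stand-ins that only mimic the behaviour described above (an invalid object is returned when the ID is unknown); they are not SDK types, and the caller is assumed to have checked that the hand list is not empty.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Leap::Hand: just an ID and a validity flag.
struct TrackedHand {
    int id;
    bool valid;
};

// Mimics frame.hand(id): returns the matching hand, or an invalid one.
TrackedHand findHand(const std::vector<TrackedHand> &hands, int id) {
    for (std::size_t i = 0; i < hands.size(); ++i) {
        if (hands[i].id == id) return hands[i];
    }
    const TrackedHand invalid = {0, false};
    return invalid;
}

// Prefer the hand we were already following; otherwise take the first one.
// Precondition: hands is not empty.
TrackedHand choosePointerHand(const std::vector<TrackedHand> &hands,
                              int previousId) {
    TrackedHand pointer = {0, false};
    if (previousId) pointer = findHand(hands, previousId);
    if (!pointer.valid) pointer = hands[0];
    return pointer;
}
```

With hands 7 and 9 in view, asking for 9 returns hand 9, while asking for an ID that has disappeared (or for 0, meaning "none chosen yet") falls back to hand 7.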

    if (clickerHandID)
        clickerHand = frame.hand(clickerHandID);

We attempt to locate the previously detected clicker hand if possible.

    if (! clickerHand.isValid() && hands.count() == 2) {
        // figure out clicker and pointer hand
                
        // which hand is on the left and which is on the right?
        Leap::Hand leftHand, rightHand;
        if (hands[0].palmPosition()[0] <= hands[1].palmPosition()[0]) {
            leftHand = hands[0];
            rightHand = hands[1];
        } else {
            leftHand = hands[1];
            rightHand = hands[0];
        }

Before we try to work out which hand is the clicker and which should be the pointer, we'll need to know which is the left hand and which is the right hand. A simple comparison of the palms' X coordinates does the trick nicely: the hand with the smaller X value is the one on the left.

        if (leftHanded) {
            pointerHand = leftHand;
            clickerHand = rightHand;
        } else {
            pointerHand = rightHand;
            clickerHand = leftHand;
        }

Here we assign the primary hand to be the pointer, and the secondary to be the clicker.

        clickerHandID = clickerHand.id();
        pointerHandID = pointerHand.id();
    }

Now that we've decided the hands, we need to retain references to those particular hands for as long as the controller can keep track.
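
Pulled out of the listener, the role assignment boils down to one comparison and one flag. The following sketch uses hypothetical types (HandInfo, HandRoles) and a hypothetical function (assignRoles) so that it can be compiled and checked without the SDK.

```cpp
// Hypothetical stand-in for the bits of Leap::Hand we need here.
struct HandInfo {
    int id;
    float palmX;  // X coordinate of the palm in the Leap coordinate system
};

// Hypothetical result type: which hand ID plays which role.
struct HandRoles {
    int pointerId;
    int clickerId;
};

// The hand with the smaller X value is on the left. A left-handed user
// points with the left hand; everyone else points with the right.
HandRoles assignRoles(const HandInfo &a, const HandInfo &b, bool leftHanded) {
    const bool aIsLeft = (a.palmX <= b.palmX);
    const HandInfo &leftHand = aIsLeft ? a : b;
    const HandInfo &rightHand = aIsLeft ? b : a;

    HandRoles roles;
    roles.pointerId = leftHanded ? leftHand.id : rightHand.id;
    roles.clickerId = leftHanded ? rightHand.id : leftHand.id;
    return roles;
}
```

For a hand with ID 1 at X = 50 and a hand with ID 2 at X = -30, a right-handed user gets pointer 1 and clicker 2; setting leftHanded swaps them.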

    const Leap::PointableList pointables = pointerHand.pointables();

Instead of hands[0].pointables() as before, now we'll want to use pointerHand for the screen intersection. The rest of the pointer manipulation code remains the same.

    if (! clickerHand.isValid()) return;

Now it is time to handle the detection of a click, but only if there are two hands.

    const Leap::PointableList clickerFingers = clickerHand.pointables();
    if (clickerFingers.count() != 2) return;

If we don't find exactly two fingers on the clicker hand, then there is not going to be much we can do in terms of determining how far apart they are. We want to know if the user has touched two fingers together or not.

    float clickFingerDistance = clickerFingers[0].tipPosition().distanceTo(
        clickerFingers[1].tipPosition()
    );

The Leap::Vector class has a handy distanceTo() method that tells us how far apart two points in space are.

    if (! clickActive && clickFingerDistance < clickActivationDistance) {
        clickActive = true;
        std::cout << "mouseDown\n";
        postMouseDown(x, y);

If we have not already posted a mouse down event and if the clicker hand's two fingers are touching, then we will simulate a click with postMouseDown().

    } else if (clickActive && clickFingerDistance > clickActivationDistance) {
        std::cout << "mouseUp\n";
        clickActive = false;
        postMouseUp(x, y);
    }

And likewise for when the two fingers come apart, we finish the click and release the button. Unfortunately, just as with the cursor movement code, there is no simple cross-platform way to synthesize mouse events, but the OS X code is provided as follows for completeness:

void MouseListener::postMouseDown(unsigned x, unsigned y) {
    CGEventRef mouseDownEvent = CGEventCreateMouseEvent(
                                       NULL, kCGEventLeftMouseDown,
                                       CGPointMake(x, y),
                                       kCGMouseButtonLeft
                                                        );
    CGEventPost(kCGHIDEventTap, mouseDownEvent);
    CFRelease(mouseDownEvent);
}

void MouseListener::postMouseUp(unsigned x, unsigned y) {
    CGEventRef mouseUpEvent = CGEventCreateMouseEvent(
                                       NULL, kCGEventLeftMouseUp,
                                       CGPointMake(x, y),
                                       kCGMouseButtonLeft
                                                        );
    CGEventPost(kCGHIDEventTap, mouseUpEvent);
    CFRelease(mouseUpEvent);
}
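
The press-and-release logic from onFrame can also be exercised on its own, with no device or operating system calls involved. The sketch below repeats the same single-threshold rule; ClickEvent and ClickDetector are hypothetical names made up for this example, and the caller would translate the returned events into postMouseDown() and postMouseUp() calls.

```cpp
// Hypothetical event type returned by the detector.
enum ClickEvent { NoEvent, MouseDown, MouseUp };

// Tracks whether a click is in progress, based on the distance between
// two fingertips and a single activation threshold (in millimeters).
class ClickDetector {
public:
    explicit ClickDetector(float activationDistance)
        : activationDistance_(activationDistance), clickActive_(false) {}

    // Feed one distance sample per frame; returns the event to post, if any.
    ClickEvent update(float fingerDistance) {
        if (!clickActive_ && fingerDistance < activationDistance_) {
            clickActive_ = true;
            return MouseDown;
        }
        if (clickActive_ && fingerDistance > activationDistance_) {
            clickActive_ = false;
            return MouseUp;
        }
        return NoEvent;
    }

private:
    float activationDistance_;
    bool clickActive_;
};
```

With a 40 mm threshold, the distance sequence 80, 35, 20, 60 produces no event, a mouse down, no event, and a mouse up, so holding the fingers together does not generate repeated clicks.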

And now you can throw away your mouse for good! Actually, don't do that. First be sure to run the screen calibration tool.

Truth be told, there are plenty of improvements that could be made to our simple, modest mouse replacement application. Implementing right-click, a scroll wheel and click-and-drag are left as an exercise for the reader.