Learning Apache Thrift

Overview of this book

Modern software systems are increasingly complex, and building a scalable communication architecture for applications written in different languages is tedious. The Apache Thrift framework addresses this problem. It helps you build efficient, easy-to-maintain services and supports many popular programming languages, including C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, and Delphi. This book will take you through the basics of service-oriented systems with your first Apache Thrift-powered app. Then, progressing to more complex examples, it will provide you with tips for running large-scale applications in production environments. You will learn how to assess when Apache Thrift is the right tool for the job. To start with, you will run a simple example application, learning the framework's structure along the way; you will then advance to more complex systems that will help you solve various real-life problems. Moreover, you will be able to add a communication layer to any application written in one of the supported programming languages, with support for various data types and error handling. Further, you will learn how well-known companies use Apache Thrift in their popular applications. This book is a great starting point if you want to use one of the best tools available to develop cross-language applications in service-oriented architectures.

An introduction to Apache Thrift


You probably know Facebook, the popular social network. The small website, started in 2004 as a side project by a Harvard student, Mark Zuckerberg, quickly gained huge popularity. In its early years, it faced rapid growth in traffic and in the complexity of its systems and network structure. Facebook's engineering culture allowed teams to choose whatever solution was deemed optimal for a given task, without constraints or standards. This led to a situation in which they had lots of different services but no reliable way to connect them. Describing Apache Thrift, Facebook's engineers stated in the white paper (you can download it from https://thrift.apache.org/static/files/thrift-20070401.pdf):

"(...) we were presented with the challenge of building a transparent, high-performance bridge across many programming languages."

They tested the solutions available on the market and concluded that none of them fulfilled the requirements of high performance, flexibility, and simplicity. The result of their work was Thrift, a piece of software that was later open sourced and handed over to the Apache Software Foundation.

Apache Thrift's simplicity comes from the fact that the code for different programming languages is generated automatically from a single file written in the interface definition language (IDL). In other similar solutions, data has to be prepared before it is transferred to meet the limitations of the method of transport, because not all structures are easily transferred. In most cases, only simple data types such as strings and integers are transferable as they are. Because of this, a developer has to translate every more complex structure to a text form in a process called serialization. This has to be done on both ends (deserialization being the reverse process), which requires extra work, testing, and debugging. With Apache Thrift, the developer can use data types native to their programming language of choice, using the methods dedicated to that language. All serialization and deserialization is done by Apache Thrift itself and is not visible to the developer. This architecture allows programmers to focus on the actual services rather than on how the data is going to be transferred from one application to another.

Let's have a quick glance at the pillars of Apache Thrift. Some of the topics will be covered in much more detail in Chapter 4, Understanding How Apache Thrift Works, so here are just the basics that you will need to understand our first code examples.

Supported programming languages

Before starting any work with Apache Thrift, you should probably check whether it supports the programming language that you use. Of course, there is a great chance that it does—most of the popular languages are supported. The complete list for version 0.9.3 is as follows:

  • ActionScript 3

  • C++

  • C#

  • D

  • Delphi

  • Erlang

  • Haskell

  • Java

  • JavaScript

  • Node.js

  • Objective-C/Cocoa

  • OCaml

  • Perl

  • PHP

  • Python

  • Ruby

  • Smalltalk

Note

Note that Apache Thrift is still in a pre-1.0 version, so some of the languages may not be fully supported. It is best to check the current status of support for your favorite programming language on the Apache Thrift website (https://thrift.apache.org/docs/features) or in the source code.

If your language of choice is on the list (especially if it is a popular one), you can be sure that you will be able to generate all the code necessary to work with Apache Thrift.

Data types

One of the basic features of every programming language is its set of data types. Although the basic ones, such as integer or string, may be very similar across languages, the rest often differ considerably. Some languages (for example, C++) are statically typed. This means that the type of a variable has to be known at compile time, so it has to be declared in the source code when the program is written. After that, the variable can hold values of this type only. For example, consider the following line from C++:

int x = 42;

It initializes the variable x, which is an integer. This variable has to stay an integer throughout the execution of the program. If you later try to assign it a value of some other type, you will get an error as soon as you compile your program. Let's take a look at the following example:

int main()
{
   int x = 42;
   x = "forty two"; // this line produces a compilation error
   return 0;
}

If you try to compile this simple code, you will end up with the following compile error:

$ g++ -o example example.cpp
example.cpp: In function 'int main()':
example.cpp:4:6: error: invalid conversion from 'const char*' to 'int' [-fpermissive]
    x = "forty two";
      ^

Other languages are dynamically typed; that is, the type of a variable is checked at runtime, and in the source code the same variable may hold a value of any type at any time. Consider this example from PHP:

if (rand(0,1) == 1) {
    $x = 42;
} else {
    $x = "forty two";
}
var_dump($x); // var_dump() function prints type of specified
              // variable and its value

Depending on the random outcome of the condition, the value of the variable may be either an integer or a string. Run the script as follows:

$ php -f example.php

The result of running this program would be either string(9) "forty two" or int(42).

As you can see, both values are permitted because the PHP interpreter determines the type of the variable at runtime. The language allows this and, moreover, lets you assign values of different types to the same variable later on.

Without Apache Thrift, a developer would have to serialize the variables manually. This means that before the variables are transferred, they have to be mapped to the most basic data types that are understood by every programming language (most probably, integers and strings of characters). After the transmission, those serialized variables have to be translated back to the structures available in the programming language at the receiving end.
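To make the manual approach concrete, here is a minimal Python sketch of hand-written serialization. It assumes JSON as the text format and uses hypothetical helper names (serialize_point, deserialize_point); none of this is part of Apache Thrift, it only illustrates the kind of code you would otherwise write and test on both ends.

```python
import json


def serialize_point(point):
    """Flatten an (x, y, label) tuple into a JSON string for transmission."""
    x, y, label = point
    return json.dumps({"x": x, "y": y, "label": label})


def deserialize_point(payload):
    """Rebuild the tuple from the JSON string on the receiving end."""
    data = json.loads(payload)
    return (data["x"], data["y"], data["label"])


original = (3, 4, "corner")
wire_text = serialize_point(original)   # what would travel over the network
restored = deserialize_point(wire_text)

print(wire_text)
print(restored == original)
```

Every new structure needs its own pair of functions like these, in every language involved, which is exactly the repetitive work described above.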

Apache Thrift does all that dirty work for the developer. It provides its own data types that are then mapped to the ones native to the given programming language, thereby allowing the developer to focus on creating the application, not the communication interface.

Transports

Transports are a part of Apache Thrift's network stack. They allow you to transmit data over different channels, for example, HTTP, sockets, or files. Decoupling the transport layer lets you easily choose the transport that best fits your solution without many changes in the code.

The choice of transport should be dictated by the architecture of your solution.
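The idea of a decoupled transport layer can be sketched in plain Python. The classes below (MemoryTransport, FileTransport) and the send_greeting function are hypothetical and are not Apache Thrift's own classes; they only show how application code can stay the same while the channel underneath changes.

```python
import io
import os
import tempfile


class MemoryTransport:
    """Hypothetical transport that keeps written bytes in memory."""

    def __init__(self):
        self._buffer = io.BytesIO()

    def write(self, data):
        self._buffer.write(data)

    def read_all(self):
        return self._buffer.getvalue()


class FileTransport:
    """Hypothetical transport that appends written bytes to a file."""

    def __init__(self, path):
        self._path = path

    def write(self, data):
        with open(self._path, "ab") as handle:
            handle.write(data)

    def read_all(self):
        with open(self._path, "rb") as handle:
            return handle.read()


def send_greeting(transport):
    """Application code: identical regardless of the transport behind it."""
    transport.write(b"hello ")
    transport.write(b"thrift")


memory = MemoryTransport()
send_greeting(memory)

with tempfile.TemporaryDirectory() as directory:
    on_disk = FileTransport(os.path.join(directory, "messages.bin"))
    send_greeting(on_disk)
    file_contents = on_disk.read_all()

print(memory.read_all() == file_contents)
```

Swapping one transport for the other required no change to send_greeting, which is the property the transport layer is meant to provide.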

Protocols

Protocols prepare data to be transmitted over transports. This process is called serialization (when sending data) and deserialization (when receiving data). There are different protocols that can be used: JSON, binary, plain text, and so on. This means that, depending on what data you want to transfer, you can use different methods of serialization. For example, if you expect to transmit images or other binary data, the binary protocol is the best option, as there is almost zero overhead. If you chose JSON for this purpose, the binary data would have to be converted to text, increasing the payload by a third or more.

The choice of protocol should be dictated by the data you wish to transfer using Apache Thrift.
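The "third or more" figure can be checked with a short calculation. This sketch assumes Base64, a common way of carrying binary data inside text formats such as JSON, and uses an arbitrary 300-byte payload; it does not call Apache Thrift itself.

```python
import base64

# 300 bytes of arbitrary binary data standing in for an image fragment.
binary_payload = bytes(range(256)) + bytes(44)

# Base64 turns every 3 bytes of input into 4 text characters.
text_payload = base64.b64encode(binary_payload)

overhead = len(text_payload) / len(binary_payload) - 1
print(len(binary_payload), len(text_payload), f"{overhead:.0%}")  # 300 400 33%
```

Quoting and field names in the surrounding JSON document add further bytes on top of this, which is why the increase can be more than a third.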

Versioning

Versioning is an approach to managing changes in a service's API (and in software in general). As software is developed, it changes. Sometimes the changes are minuscule, and sometimes they are large. They often manifest as modifications of the methods or parameters exposed by the API.

When developing client and server software, you shouldn't assume that clients will be updated to the newest version instantly. This is not possible in practice, even if you have total control of the environment. It is therefore wise to allow older versions of the client to work with newer versions of the server.

Changes in APIs, libraries, and other externally available components pose a big challenge for developers. They lead to a problem often referred to as dependency hell: different applications are compatible with different versions of the same library or API, which makes those dependencies difficult to manage.

To alleviate this inconvenience, most software developers adopt the convention of marking the version of an application with numbers following the template MAJOR.MINOR.PATCH, where PATCH means minuscule changes (for example, bug fixes), MINOR is a larger change that remains backward compatible with previous versions, and MAJOR means a major release that might break compatibility with previous versions of the software.
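The MAJOR.MINOR.PATCH convention can be turned into a simple compatibility check. The following sketch uses hypothetical function names and assumes a deliberately simple rule: a client works with a server when the MAJOR numbers match and the server is not older than the client. Real projects may apply stricter or looser policies.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_backward_compatible(client_version, server_version):
    """Return True if a client built for client_version can use server_version."""
    client = parse_version(client_version)
    server = parse_version(server_version)
    # Same MAJOR means no breaking changes; tuple comparison handles MINOR and PATCH.
    return client[0] == server[0] and server >= client


print(is_backward_compatible("1.2.0", "1.4.3"))  # True
print(is_backward_compatible("1.2.0", "2.0.0"))  # False
```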

Apache Thrift features soft versioning. This means that there are no formal requirements as to how changes between subsequent versions should be handled or announced. However, Thrift provides a set of tools that allow you to easily keep new versions of a service backward compatible. This is achieved through the following properties:

  • The method's arguments are numbered. You can add or remove arguments; as long as a number is never reused, new versions of a method can function without the removed arguments. The number of an existing argument should never be changed.

  • You can set default values for arguments. If an older client calls a method without a newly added argument, the service receives no value for it and uses the default instead. This is useful when you want to add fields.

  • While manipulating fields is relatively easy, you shouldn't rename methods or services, because this makes them unavailable to older clients.
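The effect of numbered arguments and default values can be simulated without Thrift. In the sketch below, the field table, the "precision" argument, and the decode_arguments function are all hypothetical; the code only imitates the behavior described in the list above and is not how Apache Thrift is implemented internally.

```python
# Hypothetical field table for a newer version of a method: id -> (name, default).
NEW_FIELDS = {
    1: ("a", 0),
    2: ("b", 0),
    3: ("precision", 2),  # added in the newer version, with a default value
}


def decode_arguments(wire_fields, known_fields):
    """Map numbered wire fields to named arguments.

    Unknown ids are skipped; missing ids fall back to their defaults.
    """
    arguments = {name: default for name, default in known_fields.values()}
    for field_id, value in wire_fields.items():
        if field_id in known_fields:
            name, _ = known_fields[field_id]
            arguments[name] = value
    return arguments


old_client_call = {1: 10, 2: 32}                       # knows nothing about field 3
newer_client_call = {1: 10, 2: 32, 3: 4, 9: "future"}  # field 9 is unknown here

print(decode_arguments(old_client_call, NEW_FIELDS))
print(decode_arguments(newer_client_call, NEW_FIELDS))
```

Because arguments are matched by number rather than by position or name, the older call still succeeds, and the unknown field from a newer caller is ignored instead of causing an error.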

Security

Security is essential to every service. Although you definitely need to take extra care when exposing services to the Internet, it is also important when they are available in private networks.

Apache Thrift allows you to use TSSLTransportFactory to utilize RSA key pairs, providing security for the connection.

Another way of securing your Apache Thrift connection (although a little bit more complicated) is tunneling it over SSH.

We will discuss this in detail in Chapter 8, Advanced Usage of Apache Thrift.

Interface description language

Apache Thrift's core feature is its own IDL, which shapes its simplicity and usability. It will look familiar at first sight to anyone who has programmed in a contemporary programming language. Using the IDL, you can define the service and all the variables that it uses in one file. It is an unambiguous description of what the service will look like, without going into the implementation details.

Let's consider a very simple service, which allows you to add two integers:

namespace py thrift.example1
namespace php thrift.example1

service AddService {
    i32 add(1: i32 a, 2: i32 b),
}

This example code defines the AddService service, which contains the add method. This method takes two 32-bit signed integers (i32) as parameters and returns such an integer as a result. We want code generated for Python and PHP, but of course Apache Thrift can do this for a far wider range of languages.

Now Thrift's magic begins. If you save this code to a file (let's say, example1.thrift) and run the following commands, you will get the client and server code for this service in the desired languages (Python and PHP in this example) in the newly created folders, gen-py and gen-php:

$ thrift --gen py example1.thrift
$ thrift --gen php example1.thrift

In the simplest case, it is enough to fill in the body of the add method, and voilà, you have a fully functional client and server.
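On the Python side, filling in the add method might look like the following minimal sketch. The class name AddHandler is an assumption, and the generated code and the server wiring (transport, protocol, and processor) are left out so that the snippet runs on its own; the range check is an illustrative extra, added because Python integers are unbounded while the IDL declares an i32 result.

```python
class AddHandler:
    """Hypothetical handler for AddService; add() mirrors the IDL signature."""

    I32_MIN = -2**31
    I32_MAX = 2**31 - 1

    def add(self, a, b):
        result = a + b
        # The IDL promises a 32-bit signed integer, so reject sums that do not fit.
        if not self.I32_MIN <= result <= self.I32_MAX:
            raise OverflowError("result does not fit in a 32-bit signed integer")
        return result


handler = AddHandler()
print(handler.add(40, 2))  # 42
```

In a real server, an object like this would be handed to the generated service code, which takes care of decoding the call and encoding the result.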

This example is, of course, oversimplified, but it shows the major advantage of Apache Thrift: the ability to define a service in one place and then instantly generate the service and the corresponding client code, without having to write the code in every language from scratch. It is a great tool not only for final solutions, but also for rapid prototyping in different programming languages.

To see how much work Apache Thrift just spared you, examine the generated files that are saved in the gen-py and gen-php folders.

IDL is a very powerful tool. It has a lot of options and gives you a great deal of flexibility. We will discuss it in greater detail in Chapter 4, Understanding How Apache Thrift Works.