Learning Apache Thrift

Overview of this book

With modern software systems becoming increasingly complex, providing a scalable communication architecture for applications written in different languages is tedious. The Apache Thrift framework addresses this problem. It helps you build efficient, easy-to-maintain services and supports many popular programming languages, including C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, and Delphi. This book introduces the basics of service-oriented systems through your first Apache Thrift-powered app. Then, progressing to more complex examples, it provides tips for running large-scale applications in production environments. You will learn how to assess when Apache Thrift is the right tool to use. To start with, you will run a simple example application, learning the framework's structure along the way; you will then quickly advance to more complex systems that help you solve various real-life problems. Moreover, you will be able to add a communication layer to any application written in one of the supported programming languages, with support for various data types and error handling. Further, you will learn how prominent companies use Apache Thrift in their popular applications. This book is a great starting point if you want to use one of the best tools available for developing cross-language applications in service-oriented architectures.

Apache Thrift and others


By now, you may have come to the conclusion that Apache Thrift is the best solution for all your needs when dealing with distributed systems. Perhaps surprisingly, this is not always true. In this section, we will review similar tools so that you can understand how Apache Thrift compares to them and when to use which tool.

Custom protocols

Frequently, inventing a custom protocol is the first idea that comes to a developer's mind when they need to transfer data between two applications. Very often, it works surprisingly well in small solutions that are not expected to scale or be modified frequently.

Examples of such solutions are popular in web applications. Creating your own custom protocol is as simple as generating some text output, either plain or formatted as JSON or XML, and serving it over HTTP. On the client side, we need to connect to this service, get the content, and parse it.

To imagine such a solution better, consider a very simple example of a service adding two numbers. The request may be the following GET call:

GET /add?number1=30&number2=12

The response in the JSON format may be the following:

{"result":42}
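As a rough illustration of how little code such a protocol needs, here is a minimal Python sketch of both sides of this exchange. The function names handle_add and parse_result are hypothetical, and the HTTP transport itself is left out; only the request parsing and the JSON response are shown.

```python
import json
from urllib.parse import urlparse, parse_qs


def handle_add(request_target: str) -> str:
    """Server side: read both numbers from the query string, return JSON."""
    query = parse_qs(urlparse(request_target).query)
    total = int(query["number1"][0]) + int(query["number2"][0])
    # Compact separators reproduce the response format shown above.
    return json.dumps({"result": total}, separators=(",", ":"))


def parse_result(body: str) -> int:
    """Client side: decode the JSON body and extract the result."""
    return json.loads(body)["result"]


body = handle_add("/add?number1=30&number2=12")
print(body)                # {"result":42}
print(parse_result(body))  # 42
```

Note that the parameter names, the response key, and the error behavior exist only in this code; nothing outside it describes the contract between client and server.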

Unfortunately, the only advantage of such solutions is that, on a small scale, they are quick and easy to implement on both the server and the client side. They also have several disadvantages:

  • Text-based protocols have significant overhead. This is especially true for XML, which encapsulates everything with lots of tags.

  • Transferring binary data (for example, images) adds further overhead to the payload. As those protocols are text-based, binary data has to be converted to text. One popular technique is Base64, which encodes every three bytes of input as four printable characters. As a result, the string that is ready to be transferred is around 33% larger than the original binary data (closer to 37% once MIME line breaks are added). Extra processing is also required on both the client and the server.

  • There are no real standards for such protocols; everything has to be invented by the developer. This makes the service harder to design and complicates maintenance of client applications, because custom tools have to be prepared for every service. The lack of standards also makes debugging a lot more difficult.

  • Maintenance is another problem with such protocols. When a change is needed, both the server and the client code need to be modified separately and deployed at the same time. There is no way to modify the code once and have it work on both client and server.
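The cost of pushing binary data through a text-based protocol can be measured directly. The following sketch uses Python's standard base64 module on 3,000 random bytes, an arbitrary stand-in for an image. Base64 maps every three input bytes to four output characters, so the encoded form grows by about one third before any line breaks are added.

```python
import base64
import os

raw = os.urandom(3000)           # stand-in for binary data such as an image
plain = base64.b64encode(raw)    # one unbroken line of Base64 text
mime = base64.encodebytes(raw)   # same encoding with a newline after every 76 characters

print(len(raw), len(plain))      # 3000 4000
print(f"unbroken:   {len(plain) / len(raw):.3f}x the original size")
print(f"line-broken: {len(mime) / len(raw):.3f}x the original size")

# The receiver has to spend time undoing the conversion as well.
assert base64.b64decode(plain) == raw
```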

Of course, the spectrum of possibilities when designing custom protocols is much wider than those examples that are typical for web applications. One can design their own binary protocols working on sockets, files, queues, or another medium. This gets rid of some of the disadvantages of text-based protocols, but still leaves lots of other problems to deal with.

XML-RPC and JSON-RPC

XML-RPC is one of the early remote procedure call (RPC) protocols, which uses XML-encoded messages transferred over HTTP. JSON-RPC is its much younger cousin, which is based on the same principle, but uses JSON instead of XML.

Both protocols allow you to call remote services using a handful of data types in the relevant format. The exchanged messages are plain XML or JSON, without any additional envelope around them.

Here is an example of a typical XML-RPC request:

<?xml version="1.0"?>
<methodCall>
    <methodName>add</methodName>
    <params>
        <param>
            <value>
                <int>30</int>
            </value>
        </param>
        <param>
            <value>
                <int>12</int>
            </value>
        </param>
    </params>
</methodCall>

The corresponding response is:

<?xml version="1.0"?>
<methodResponse>
    <params>
        <param>
            <value>
                <int>42</int>
            </value>
        </param>
    </params>
</methodResponse>
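Messages like these rarely need to be written by hand. As a small sketch, Python's standard xmlrpc.client module can build and parse both of them; no network connection is involved here, and the addition is performed locally only to produce a response.

```python
import xmlrpc.client

# Build the request for add(30, 12) and parse it back, as a server would.
request_xml = xmlrpc.client.dumps((30, 12), methodname="add")
params, method = xmlrpc.client.loads(request_xml)
print(method, params)  # add (30, 12)

# Build the matching response and parse it back, as a client would.
response_xml = xmlrpc.client.dumps((sum(params),), methodresponse=True)
(result,), _ = xmlrpc.client.loads(response_xml)
print(result)  # 42
```

The generated XML has the same structure as the messages above, although whitespace and quoting may differ.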

The equivalent JSON-RPC request is much more concise:

{"method": "add", "params": [30, 12], "id": 1}

The service will return the following response:

{"result": 42, "error": null, "id": 1}
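A server for this exchange can be sketched in a few lines of Python. The METHODS table and the handle function are hypothetical names chosen for this illustration, and the transport (usually HTTP) is omitted.

```python
import json

# Hypothetical table of remotely callable methods.
METHODS = {"add": lambda a, b: a + b}


def handle(raw_request: str) -> str:
    """Decode a JSON-RPC request, call the method, encode the response."""
    request = json.loads(raw_request)
    response = {"result": None, "error": None, "id": request.get("id")}
    try:
        response["result"] = METHODS[request["method"]](*request["params"])
    except Exception as exc:  # unknown method, wrong arguments, and so on
        response["error"] = str(exc)
    return json.dumps(response)


print(handle('{"method": "add", "params": [30, 12], "id": 1}'))
# {"result": 42, "error": null, "id": 1}
```

The id field lets the client match a response to the request that caused it, which matters once several calls are in flight at the same time.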

The simplicity of both of these protocols comes at a price. While they are easy to implement, they share some disadvantages of custom protocols, such as the lack of standards and the need to maintain both server and client code, and they are not well suited to transferring binary data.

SOAP and WSDL

Simple Object Access Protocol (SOAP), which evolved from XML-RPC, addresses some of the problems of custom-designed protocols. It is used mainly by web services (over HTTP) to exchange structured information with their clients.

SOAP is a protocol based on XML. It is rather complicated with several layers of specification. The messages are structured according to this specification.

A SOAP message is built from the following elements:

  • Envelope: This is the root element of the message that identifies the message as SOAP and defines its structure.

  • Header: This is an optional field that may contain extra application-specific control information for identifying the message.

  • Body: This contains the actual payload of the message (call or response).

  • Fault: This is an optional element that is used to pass information about errors. It contains error code, description, and other application-specific information.
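To make the structure above more concrete, the following sketch builds and re-parses a minimal SOAP 1.1 message with Python's standard xml.etree.ElementTree module. The application namespace and the add, number1, and number2 element names are assumptions made for this example; a real service would define them in its WSDL file.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/calculator"               # hypothetical service namespace
ET.register_namespace("soap", SOAP_NS)

# Envelope is the root; Header is optional; Body carries the payload.
envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
call = ET.SubElement(body, f"{{{APP_NS}}}add")
ET.SubElement(call, f"{{{APP_NS}}}number1").text = "30"
ET.SubElement(call, f"{{{APP_NS}}}number2").text = "12"

message = ET.tostring(envelope, encoding="unicode")
print(message)

# The receiving side locates the Body and reads the call's arguments.
parsed = ET.fromstring(message)
payload = parsed.find(f"{{{SOAP_NS}}}Body")[0]
numbers = [int(child.text) for child in payload]
print(numbers)  # [30, 12]
```

Even for a call with two integer arguments, the envelope and namespace declarations make up most of the message.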

Web services on the Internet commonly use SOAP as the method of calling operations described in a Web Services Description Language (WSDL) file. In this file, the available messages are described using XML Schema.

Because SOAP is standardized, it is easier to debug, and there are many tools that help with that. Parsing the WSDL file is enough to be able to communicate with the given web service.

Unfortunately, SOAP still has the disadvantages discussed previously: a large overhead connected with XML processing and the need to encode binary data in text form.

RESTful APIs

WSDL-based web services using SOAP were considered cumbersome and complex, so Representational State Transfer (REST) was introduced as a simpler alternative. Web services that are developed in accordance with REST's architecture constraints are called RESTful APIs.

Features of REST can be perceived as a mix of two previously discussed topics: custom protocols and SOAP.

RESTful APIs are simpler and a lot lighter than SOAP. They make use of HTTP methods to manipulate the data (collections of elements):

  • GET: This is used to retrieve information about some collection or its elements

  • PUT: This is used to create or replace the collection or element

  • POST: This is used to create a new element in the collection

  • DELETE: This is used to delete an entire collection or a specific element

Every collection and each of its elements has its own unique Uniform Resource Identifier (URI).
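The mapping between HTTP methods and operations on a collection can be sketched without any web framework. In the following Python example, the books collection, the handle function, and the URI layout are all hypothetical; a real service would put this logic behind an HTTP server and return proper status codes.

```python
import itertools

books = {}                   # hypothetical in-memory collection, keyed by element id
_ids = itertools.count(1)    # source of ids for elements created with POST


def handle(method, uri, payload=None):
    """Apply an HTTP method to /books (the collection) or /books/<id> (an element)."""
    parts = uri.strip("/").split("/")      # "/books/1" -> ["books", "1"]
    element_id = parts[1] if len(parts) > 1 else None
    if method == "GET":                    # retrieve the collection or one element
        return dict(books) if element_id is None else books.get(element_id)
    if method == "POST":                   # create a new element in the collection
        new_id = str(next(_ids))
        while new_id in books:             # skip ids already taken through PUT
            new_id = str(next(_ids))
        books[new_id] = payload
        return new_id
    if method == "PUT":                    # create or replace
        if element_id is None:
            books.clear()
            books.update(payload)
        else:
            books[element_id] = payload
        return element_id
    if method == "DELETE":                 # delete the collection or one element
        if element_id is None:
            books.clear()
        else:
            books.pop(element_id, None)
        return None
    raise ValueError(f"unsupported method: {method}")


first = handle("POST", "/books", {"title": "Draft"})
handle("PUT", f"/books/{first}", {"title": "Final"})
print(handle("GET", f"/books/{first}"))   # {'title': 'Final'}
handle("DELETE", f"/books/{first}")
print(handle("GET", "/books"))            # {}
```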

The advantages of RESTful APIs are their simplicity and efficiency. They are also scalable and cacheable.

On the downside, there is a lack of standardization (each service's message and response format may be different), no built-in error handling, and no standardized authentication mechanism.

CORBA

Common Object Request Broker Architecture (CORBA), http://www.corba.org/, dates back to 1991 and is the oldest of the standards presented in this chapter. However, its concepts are quite similar to those of Apache Thrift (for example, it uses its own IDL).

It is considered a bit cumbersome; instead of writing code that is native to the language, a developer needs to use CORBA-specific constructs. It is hard to install and heavy to run. There are several different implementations, and they are inconsistent with one another.

Apache Avro

Apache Avro (https://avro.apache.org/) is another remote procedure call and data serialization framework developed with the support of the Apache Software Foundation. It was developed as a tool for the Apache Hadoop framework.

Its similarities to Apache Thrift include describing the interface with an IDL, support for many programming languages (Java, C, C++, C#, Scala, Python, and Ruby), and a compact, fast binary format.

The main difference is that Avro does not require code to be generated when the service is defined or when it later changes. Code generation is available as an optimization for statically typed languages, but dynamically typed languages do not need it. This is possible because Avro uses a dynamic schema, which accompanies the data when it is transferred.
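The idea of a schema that travels with the data can be illustrated with a toy example. The following Python sketch is not Avro's wire format and does not use the Avro library; the FORMATS table, the write and read functions, and the byte layout are assumptions made purely for illustration. The schema itself follows the JSON shape that Avro uses for record schemas.

```python
import json
import struct

# Hypothetical mapping from schema types to fixed-size encodings (not Avro's encoding).
FORMATS = {"int": "<i", "double": "<d"}


def write(schema, record):
    """Prefix the binary record with its schema, serialized as JSON."""
    header = json.dumps(schema).encode("utf-8")
    body = b"".join(
        struct.pack(FORMATS[field["type"]], record[field["name"]])
        for field in schema["fields"]
    )
    return struct.pack("<I", len(header)) + header + body


def read(blob):
    """Decode a record using only the schema found in the blob itself."""
    (header_len,) = struct.unpack_from("<I", blob, 0)
    schema = json.loads(blob[4:4 + header_len].decode("utf-8"))
    offset = 4 + header_len
    record = {}
    for field in schema["fields"]:
        fmt = FORMATS[field["type"]]
        record[field["name"]] = struct.unpack_from(fmt, blob, offset)[0]
        offset += struct.calcsize(fmt)
    return schema, record


schema = {
    "type": "record",
    "name": "AddRequest",
    "fields": [
        {"name": "number1", "type": "int"},
        {"name": "number2", "type": "int"},
    ],
}
blob = write(schema, {"number1": 30, "number2": 12})
print(read(blob)[1])  # {'number1': 30, 'number2': 12}
```

The reader needs no generated classes: everything it knows about the record's fields comes from the schema stored next to the data.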

As a disadvantage in comparison with Apache Thrift, Apache Avro doesn't offer such a wide selection of serialization formats (protocols, in Thrift's terminology) and transports.

Protocol Buffers

Protocol Buffers are the older sibling of Apache Thrift, and the two share many similarities. They were developed at Google as internal proprietary software and are used there for most inter-machine communication. Since their open source release in 2008, they have gained support not only for the officially implemented languages (C++, Java, and Python), but also for many more (JavaScript, Go, PHP, Ruby, Perl, and Scala).

Apart from IDL syntax and implementation details, Protocol Buffers differ from Apache Thrift in that they support fewer languages, have different base types, lack constants and containers, and have no built-in exception handling. In the open source version, there is also no RPC implementation for services (you need to implement it yourself).

On the other hand, Protocol Buffers are a little faster than Apache Thrift, and their serialized objects are smaller. Their documentation and tutorials are also considered better and more complete.