Learning Apache Thrift

Overview of this book

As modern software systems grow increasingly complex, providing a scalable communication architecture for applications written in different languages becomes tedious. The Apache Thrift framework addresses this problem. It helps you build efficient, easy-to-maintain services and supports several popular programming languages, including C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, and Delphi. This book takes you through the basics of service-oriented systems as you build your first Apache Thrift-powered app. Then, progressing to more complex examples, it provides tips for running large-scale applications in production environments. You will learn how to assess when Apache Thrift is the right tool for the job. You will start by running a simple example application, learning the framework's structure along the way, and then quickly advance to more complex systems that help you solve various real-life problems. You will also be able to add a communication layer to any application written in one of the popular programming languages, with support for various data types and error handling. Finally, you will learn how prominent companies use Apache Thrift in their popular applications. This book is a great starting point if you want to use one of the best tools available for developing cross-language applications in service-oriented architectures.

Distributed systems and their services


Imagine the typical web applications that you use every day, such as search engines, messaging platforms, or social networks. Under one web address, they deliver different services. For example, a social network delivers people search, messaging, and users' profile pages. Although you access them through one user interface (a web page written in HTML and JavaScript), what you see in your browser is only a gateway. Your request to message a friend is relayed by the underlying application to the messaging service, an application that is specifically designed to handle the exchange of messages between the social network's users.

Service-oriented architecture

The messaging service, which we use as an example here, may be written in a completely different programming language than the web application. It is a design decision. The system architect may decide that the interface of your social network (the web pages that you see every time you log in) will be easier to manage and maintain when written in, let's say, PHP or Ruby on Rails. The messaging system, however, may be written in Python because the architect decides that this language offers better libraries for the task. Search engines and other tools that need superb performance, on the other hand, are often written in C++. There may also be some internal corporate applications in Java or C#.

Those applications, of course, need to communicate with each other. But how? There is a concept in software design called service-oriented architecture (SOA). We have just discussed the first part of this principle: it focuses on creating applications around distinct tasks. If every task is performed by a different application, there is a need for some means of communication between them. To achieve this goal, applications expose services that are used by other applications. Typically, they are accessible over some medium, that is, an internal network or the Internet. They are self-contained and autonomous, which means they are independent of other services and are able to deliver a complete response when queried. They should also be well documented so that any developer can use them.
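To make the idea of an exposed service more concrete, here is a minimal sketch in plain Python. It is not Apache Thrift code, and nothing travels over a network: the two applications live in one process purely for illustration. All names (MessagingService, InMemoryMessagingService, WebGateway) are hypothetical and chosen for this example only.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class MessagingService(ABC):
    """Hypothetical interface that a messaging service exposes to other applications."""

    @abstractmethod
    def send_message(self, sender: str, recipient: str, text: str) -> None:
        """Deliver a message from sender to recipient."""

    @abstractmethod
    def inbox(self, user: str) -> List[Tuple[str, str]]:
        """Return (sender, text) pairs received by user."""


class InMemoryMessagingService(MessagingService):
    """Self-contained implementation: it depends on no other service."""

    def __init__(self) -> None:
        self._inboxes: Dict[str, List[Tuple[str, str]]] = {}

    def send_message(self, sender: str, recipient: str, text: str) -> None:
        self._inboxes.setdefault(recipient, []).append((sender, text))

    def inbox(self, user: str) -> List[Tuple[str, str]]:
        # Return a copy so callers cannot modify the service's internal state.
        return list(self._inboxes.get(user, []))


class WebGateway:
    """Stands in for the web application: it only relays requests to the service."""

    def __init__(self, messaging: MessagingService) -> None:
        self._messaging = messaging

    def handle_message_form(self, sender: str, recipient: str, text: str) -> str:
        self._messaging.send_message(sender, recipient, text)
        return f"Message sent to {recipient}"
```

The gateway knows only the interface, not the implementation behind it. In a real system, the implementation could live in another process, on another machine, and be written in another language.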

Distributed systems

When a system consists of many autonomous services, as in our social network example, we call it a distributed system. Depending on the scale, business needs, or technical constraints, such a system may be spread over many computers in a local network, across the Internet, or may run on just a single machine. Thanks to the SOA principles, you can run and test on your desktop computer a distributed system with the same logical architecture that will later be used on hundreds of servers in the production environment.

There are many advantages of SOA in distributed systems over monolithic applications. Let's discuss some of them.

Maintainability

The greatest advantage of distributed systems in SOA is their maintainability, which means the ease of performing all the tasks related to the upkeep of the software. If the system consists of many applications, each dedicated to one task or type of task, instead of one big monolith, some actions become a lot easier to perform:

  • You can select the tools (that is, programming languages, libraries, and services) that are best for a given task. You can use different toolsets for a search engine, message queues, or data-intensive calculations.

  • Instead of having all the developers working on one application (that is, one code base), you can split the team to work on many applications separately. You can even outsource some of the work to external teams or companies. This way, they won't get in each other's way. Smaller teams tend to be more agile and often yield better results.

  • Communication between the different components of the system is narrowed to only one specified interface, which is easier to comprehend, monitor, and debug than lots of convoluted classes and methods.

  • It is easier to respond to failures and fix bugs. Let's say a bug is introduced that would cause a whole monolithic application to crash. In a distributed system, only one service may be down, while the rest of the system stays operational. System operators or developers can replace the service with a stable version, run tests to identify the bug, or perform other actions without affecting the rest of the system.

  • Introducing changes is a lot easier too. In a common workflow, when a new version of a service is to be deployed, it can run as a separate instance alongside the old version. System operators can switch the client application from the old service to the new one and see whether everything performs correctly. If it does, the old service is turned off; otherwise, it is easy to switch back to the old service and fix the new one. This is even easier in cloud environments.
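The last point, switching between service versions and rolling back on failure, can be sketched as follows. This is an illustrative in-process model under simplifying assumptions, not a deployment tool: services are represented by plain Python functions, and the names ServiceSwitch, deploy, search_v1, and search_v2 are hypothetical.

```python
class ServiceSwitch:
    """Routes every call to whichever service version is currently active."""

    def __init__(self, stable):
        self._active = stable
        self._previous = None

    def switch_to(self, candidate):
        # Keep the old version around so that we can return to it.
        self._previous, self._active = self._active, candidate

    def roll_back(self):
        if self._previous is None:
            raise RuntimeError("no previous version to roll back to")
        self._active, self._previous = self._previous, None

    def call(self, *args, **kwargs):
        return self._active(*args, **kwargs)


def deploy(switch, candidate, smoke_test):
    """Switch to candidate; roll back if the smoke test raises an exception."""
    switch.switch_to(candidate)
    try:
        smoke_test(switch)
    except Exception:
        switch.roll_back()
        return False
    return True


def search_v1(query):
    """Old, stable version of a hypothetical user search service."""
    return [name for name in ("alice", "bob", "alicia") if query in name]


def search_v2(query):
    """New version with a deliberately injected bug, for demonstration."""
    raise ValueError("index not loaded")
```

Calling deploy(ServiceSwitch(search_v1), search_v2, lambda s: s.call("ali")) returns False and leaves the old version active, so clients of the switch never see the faulty version for longer than the smoke test takes.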

Scalability

Many systems are required to perform well under a high load. This is not only a concern for web applications, but they illustrate it best: popular websites receive hundreds of millions of page views per day, which constitutes a high traffic load. To withstand such growing stress, systems need to scale. The most obvious way, known to every computer user, is to add RAM or switch to a better CPU when applications don't run smoothly. But there is a limit to this kind of scaling (called vertical scaling). You don't expect Google to run on a single powerful computer, do you?

The other type of scalability is horizontal scaling, which means adding more computers (called nodes) to the system. For example, our imaginary social network system may consist of several web application nodes, a few database nodes, and also some user search nodes. In properly designed systems, operators can add or remove nodes depending on the expected load and other circumstances. More sophisticated systems can even scale themselves, starting or stopping nodes in the cloud automatically, based on the traffic analysis.

SOA allows multiple nodes with the same function to be accessible to clients. As services are self-contained, independent of the state of other services, and documented, developers can prepare their software without worrying whether it will be dealing with one node or a hundred. In most scenarios, traffic to the services is managed by software or hardware load balancers, which makes the number of nodes completely invisible to the client.
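As an illustration of how a load balancer can hide the number of nodes from the client, here is a minimal round-robin sketch. Round-robin is only one of many balancing strategies and is chosen here for simplicity. The nodes are modeled as plain Python functions rather than real servers, and RoundRobinBalancer and make_node are hypothetical names.

```python
class RoundRobinBalancer:
    """Hands each request to the next node in turn."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = 0  # number of requests dispatched so far

    def add_node(self, node):
        """Horizontal scaling: operators add a node without touching clients."""
        self._nodes.append(node)

    def call(self, request):
        if not self._nodes:
            raise RuntimeError("no nodes available")
        node = self._nodes[self._next % len(self._nodes)]
        self._next += 1
        return node(request)


def make_node(name):
    """Create a stand-in service node that reports which node handled a request."""
    def handle(request):
        return f"{name} handled {request}"
    return handle


balancer = RoundRobinBalancer([make_node("node-1"), make_node("node-2")])
responses = [balancer.call(r) for r in ("a", "b", "c")]
# responses == ["node-1 handled a", "node-2 handled b", "node-1 handled c"]
```

The client only ever calls balancer.call(); whether one node or a hundred sit behind it makes no difference to the client code.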

Testability

Another advantage of distributed systems is the ease of testing them and of finding and fixing bugs. The independence of services means that they can be tested in isolation from the rest of the system: only a particular service's operation is tested, without any influence from other components. Because services should be well documented, it is easy to predict the desired output for a given input. If bugs are found, they can be evaluated and fixed without the need to consider them in the scope of the whole system.
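A test of a single service in isolation might look like the following sketch, which uses Python's standard unittest module. The UserSearchService class and its documented behavior (case-insensitive substring matching, sorted results, empty result for a blank query) are assumptions made up for this example.

```python
import unittest


class UserSearchService:
    """Hypothetical user search service with a small, documented contract."""

    def __init__(self, user_names):
        self._user_names = list(user_names)

    def search(self, query):
        """Return sorted names containing query, ignoring case; [] for a blank query."""
        needle = query.strip().lower()
        if not needle:
            return []
        return sorted(n for n in self._user_names if needle in n.lower())


class UserSearchServiceTest(unittest.TestCase):
    """Exercises the service alone; no other component is involved."""

    def setUp(self):
        self.service = UserSearchService(["Bob", "alice", "Alicia"])

    def test_matches_regardless_of_case(self):
        self.assertEqual(self.service.search("ALI"), ["Alicia", "alice"])

    def test_blank_query_returns_nothing(self):
        self.assertEqual(self.service.search("   "), [])

    def test_unknown_name_returns_nothing(self):
        self.assertEqual(self.service.search("zed"), [])


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserSearchServiceTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the expected outputs follow directly from the service's documented contract, the test needs no knowledge of the web application, the messaging service, or any other part of the system.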