Storm Real-time Processing Cookbook

By: Quinton Anderson

Overview of this book

Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

Storm Real Time Processing Cookbook will have basic to advanced recipes on Storm for real-time computation.

The book begins with setting up the development environment and then teaches log stream processing. This will be followed by real-time payments workflow, distributed RPC, integrating it with other software such as Hadoop and Apache Camel, and more.

Distributed version control


Traditional version control systems are centralized. Each client holds a checkout of the files at their current version, depending on which branch the client is using, while all previous versions are stored on the server. This model has worked well: it allows teams to collaborate closely and to know, to some degree, what other members of the team are doing.

Centralized servers have some distinct drawbacks that have led to the rise of distributed version control systems. Firstly, the centralized server represents a single point of failure; if the server goes down or becomes unavailable for any reason, it becomes difficult for developers to work using their existing workflows. Secondly, if the data on the server is corrupted or lost for any reason, the history of the code base is lost with it.

Open source projects have been a large driver of distributed version control, for both of these reasons, but mostly because of the collaboration models that distribution enables. Developers can follow a disciplined set of workflows in their local environments and then distribute these changes to one or many remote repositories when it is convenient to do so, in either a flat or a hierarchical manner.

An additional advantage is that many backups of the repository naturally exist, because each client has a complete mirror of the repository. If any client or server dies, the repository can simply be replicated back to it once it has been restored.

How to do it…

Git is used in this book as the distributed version control system. To obtain a repository, you either clone an existing one or initialize a new one. For a new project that you create, the repository should be initialized.

  1. First, let's create our project directory and initialize a repository in it, as follows:

    mkdir FirstGitProject
    cd FirstGitProject
    git init
    
  2. In order to test if the workflow is working, we need some files in our repository.

    touch README.txt
    vim README.txt
    

    Using vim, or any other text editor, add some descriptive text. In vim, press the Insert key (or i) to enter insert mode before typing. Once you have finished typing, hit the Esc key, type a colon followed by wq, and hit the Enter key to save and quit.

  3. Before you commit, review the status of the repository.

    git status
    

    This should give you an output that looks similar to the following:

    # On branch master
    # Initial commit
    # Untracked files:
    #    README.txt
    
  4. Git does not track new files and folders automatically; you must add them explicitly, as follows:

    git add README.txt
    
  5. Then commit the file using the following:

    git commit -a
    
  6. This will open your configured editor (often vim) so that you can enter a commit message.

    Tip

    You can specify the commit message directly while issuing the command, using the -m flag.
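
The steps above can also be run without opening an editor, which is convenient for scripting. The following is a minimal sketch, assuming Git is installed; the file contents, the commit message, and the identity passed via -c are placeholders rather than values from this recipe.

```shell
# Create and enter the project directory, then initialize the repository.
mkdir FirstGitProject
cd FirstGitProject
git init

# Write some descriptive text without opening an editor.
echo "My first Git project" > README.txt

# README.txt should be listed as untracked at this point.
git status

# Stage the file, then commit with the message given inline via -m.
# The -c options supply a placeholder identity for this one command only;
# omit them if user.name and user.email are already configured.
git add README.txt
git -c user.name="Example User" -c user.email="user@example.com" \
    commit -m "Add README"

# Show the resulting history, one line per commit.
git log --oneline
```

Because the message is passed with -m, no editor is opened during the commit.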

Without pushing this repository to a remote host, you are exposing it to the same risk as a centralized host. It is therefore important to push the repository to a remote host. Both www.github.com and www.bitbucket.org are good options for free hosted Git services, provided that you aren't pushing your corporate intellectual property there for public consumption. This book uses bitbucket.org. To push your repository to this remote host, first navigate there in your browser and sign up for an account.

Once the registration process is complete, create a new repository using the menu system.

Enter the following values in order to create the repository:

Once the repository is created, you need to add the remote repository to your local repository and push the changes to the remote repository.

    git remote add origin https://[user]@bitbucket.org/[user]/firstgitproject.git
    git push origin master

You must replace [user] in the preceding command with your registered username.
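
To see what the push does without needing a hosted account, a bare repository on the local disk can stand in for the remote. This is only a sketch under that assumption: the directory names remote-demo.git and push-demo, the commit message, and the placeholder identity are hypothetical, and a real Bitbucket remote would use an HTTPS URL of the form shown earlier instead of a file path.

```shell
# A bare repository has no working tree; it only stores history.
git init --bare remote-demo.git

# Create a small local repository with one commit to push.
mkdir push-demo
cd push-demo
git init
echo "Push demo" > README.txt
git add README.txt
git -c user.name="Example User" -c user.email="user@example.com" \
    commit -m "Initial commit"

# Make sure the branch is called master, as the recipe assumes;
# newer Git versions may be configured with a different default name.
git branch -M master

# Register the stand-in remote under the name origin and push to it.
git remote add origin ../remote-demo.git
git push origin master

# List the configured remotes to confirm where origin points.
git remote -v
```

After the push, the bare repository holds the same commit as the local master branch, which is the redundancy that the remote host is meant to provide.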

Tip

Cloning of a repository will be covered in later recipes, as will some standard version control workflows.