Learning Continuous Integration with Jenkins

By Nikhil Pathania

The best practices of Continuous Integration


Simply having a Continuous Integration tool doesn't mean Continuous Integration is achieved. A considerable amount of time needs to be spent configuring the Continuous Integration tool.

A tool such as Jenkins works in collaboration with many other tools to achieve Continuous Integration. Let's take a look at some of the best practices of Continuous Integration.

Developers should work in their private workspace

In a Continuous Integration world, working in a private work area is always advisable. The reason is simple: isolation. Developers can do anything they like on their private branch, or, put simply, with their private copy of the code. Once branched, the private copy remains isolated from the changes happening on the mainline branch. In this way, developers get the freedom to experiment with their code and try new things.

If the code on developer A's branch fails for some reason, it will never affect the code on the branches belonging to the other developers. Working in a private workspace, whether through branching or through cloning repositories, is a great way to organize your code.

For example, let's assume that a bug fix requires changes to the A.java, B.java, and C.java files. A developer takes the latest versions of the files and starts working on them. Say the modified files end up as version 56 of A.java, version 20 of B.java, and version 98 of C.java. The developer then creates a package out of those latest files, performs a build, and runs the tests. The build and tests run successfully and everything is good.

Now consider a situation where, after several months, another bug requires the same changes. The developer will usually search for the respective files with the particular versions that contain the fix. However, those files, with their respective versions, might well be lost in the vast ocean of versions by now.

Instead, it would have been better if the file changes had been made on a separate branch at the time (with the branch name reflecting the defect number). That way, it would be easy to reproduce the fix, using the defect number to track down the code containing it.
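In Git, such a defect-specific branch might look like the following. This is a minimal sketch; the defect number and file names are purely illustrative, and the mainline branch is assumed to be called integration:

    # Create a private branch named after the defect, starting from the mainline
    git checkout -b bugfix/defect-1234 origin/integration

    # ...modify A.java, B.java, and C.java...
    git add A.java B.java C.java
    git commit -m "Fix defect 1234"

Months later, the branch name alone is enough to find exactly the file versions that contained the fix.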

Rebase frequently from the mainline

We all know about the time dilation phenomenon from relativity. It is explained with a beautiful example called the twin paradox, which is easy to understand but hard to digest. I have modified the example a little to suit our current topic. It goes like this: imagine three developers, A, B, and C. Each developer is sent into space in their own spacecraft, travelling at close to the speed of light. All are given atomic clocks that show exactly the same time. Developer B is supposed to travel to Mars to sync the date and time on a computer there. Developer C is supposed to travel to Pluto for a server installation and to sync that server's clock with Earth's.

Developer A has to stay on Earth to monitor the communication between the server on Earth and the servers on Mars and Pluto. They all set off at 6 AM one fine morning.

After a while, developers B and C finish their jobs and return to Earth. When they meet, to their surprise, they find their clocks showing different times (and, of course, they find that they have aged differently). They are totally confused as to how this happened. Then, developer A confirms that the three servers on Earth, Mars, and Pluto are not in sync.

Then, developer A recalls that while all three atomic clocks were in sync back on Earth, they forgot to consider the time dilation factor. Had they accounted for it, keeping in mind the speed and distance of travel, the out-of-sync issue could have been avoided.

This is the same situation with developers who clone the Integration branch and work on their private branches, each one absorbed in their own assignment and working at their own pace. At merge time, each one will inevitably find their code different from the others' and from the Integration branch, and will end up in something called merge hell. The question is, how do we fix it? The answer is frequent rebasing.

In the first example (developers with the task of syncing clocks on computers located across the solar system), the cause of the issue was neglecting the time dilation factor. In the second example (developers working on their individual branches), the cause was neglecting to rebase frequently. A rebase is nothing but updating your private branch with the latest version of the Integration branch.

Working on a private repository or a private branch surely has its advantages, but it also has the potential to cause lots of merge issues. In a software development project with 10 to 20 developers, each working on a private clone of the main repository, the main repository changes completely over time compared with what each developer originally cloned.

In an environment where code is frequently merged and frequently rebased, such situations are rare. This is the advantage of Continuous Integration: we integrate continuously and frequently.
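In Git terms, a frequent rebase could look like the following minimal sketch, assuming the mainline branch is called integration and the private branch is the one created earlier:

    # Fetch the latest mainline and replay your private commits on top of it
    git checkout bugfix/defect-1234
    git fetch origin
    git rebase origin/integration

Doing this daily keeps the private branch only a small step away from the mainline, so any conflicts that do appear are small and fresh in everyone's mind.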

The other situations where rebasing frequently helps are:

  • You branched off from the wrong version of the Integration branch, and you have now realized that it should have been version 55 and not 66.

  • You might want to discover early the merge issues that arise when you include code developed on another developer's branch.

  • Also, too much merging clutters the history. So, rather than merging frequently, it's better to rebase frequently and merge less often; this also helps in avoiding merge issues.

  • Frequent rebasing means less frequent merges into the Integration branch, which, in turn, means fewer versions on the Integration branch and more on the private branches. This keeps the integration history clear and easy to follow.

Check-in frequently

While rebasing should be frequent, so should check-ins: each developer should check in at least once a day on their working branch. Checking in once a week or less often is dangerous. A whole week of code that is not checked in runs the risk of merge issues, and these can be tedious to resolve. By committing or merging once a day, conflicts are discovered quickly and can be resolved instantly.
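An end-of-day routine on the working branch could be as simple as the following sketch (the branch name and commit message are illustrative):

    # Commit and push the day's work to the private branch
    git add -A
    git commit -m "Defect 1234: refactor input validation"
    git pull --rebase origin integration   # pick up today's mainline changes first
    git push origin bugfix/defect-1234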

Frequent build

The Continuous Integration tool needs to make sure that every commit or merge is built, so that the impact of each change on the system is visible. This can be achieved by constantly polling the Integration branch for changes; if changes are found, they are built and tested, and the results are quickly shared with the team. Builds can also run nightly. The idea is to give developers instant feedback on the changes they have made.
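Instead of polling, many teams have the version control server notify Jenkins on every push. The following is a sketched server-side Git post-receive hook; the Jenkins URL, job name, and credentials are placeholders, and depending on your Jenkins security settings a CSRF crumb or build token may also be required:

    #!/bin/bash
    # post-receive hook: trigger the CI job for every push
    curl -X POST "http://jenkins.example.com/job/my-app-ci/build" \
         --user "ci-user:api-token"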

Automate the testing as much as possible

While a continuous build gives instant feedback on build failures, continuous testing helps in quickly deciding whether the build is ready to go to production. We should try to include as many test cases as we can, though this again increases the complexity of the Continuous Integration system. The tests that are difficult to automate are the ones that closely reflect real-world scenarios; there is a huge amount of scripting involved, so the cost of maintaining them rises. However, the more automated testing we have, the better and sooner we get to know the results.
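As a sketch, a CI test stage often splits the fast and slow suites, running the expensive real-world tests less often. This assumes a Maven project with hypothetical profiles named unit and e2e:

    #!/bin/bash
    set -e
    mvn -q test -P unit          # fast unit suite, run on every commit

    # slower, real-world end-to-end suite, run nightly
    if [ "$NIGHTLY" = "true" ]; then
        mvn -q verify -P e2e
    fi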

Don't check-in when the build is broken

How can we do that? The answer is simple: before checking in your code, perform a build on your local machine, and if the build breaks, do not proceed with the check-in. There is another way of doing it: the version control system can be programmed to immediately trigger a build using the Continuous Integration tool, and only if the tool returns a positive result is the code checked in. Version control tools such as TFS have a built-in feature, called gated check-in, that does exactly this.
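A local, client-side version of this gate can be sketched as a Git pre-push hook; the Maven build command is an assumption, so substitute your project's own build step:

    #!/bin/bash
    # .git/hooks/pre-push: abort the push if the local build is broken
    if ! mvn -q clean verify; then
        echo "Local build failed; push aborted." >&2
        exit 1
    fi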

There are other things that can be added to the gated check-in mechanism. For example, you can add a step that performs static code analysis on the code. This, again, can be achieved by integrating the version control system with the Continuous Integration tool, which in turn is integrated with a tool that performs static code analysis. In the upcoming chapters, we will see how this can be achieved using Jenkins in collaboration with SonarQube.

Automate the deployment

In many organizations, there is a separate team to perform deployments. The process goes as follows: once the developer has successfully created a build, they raise a ticket or compose an email asking for a deployment to the respective testing environment. The deployment team then checks with the testing team whether the environment is free; in other words, whether the testing work can be halted for a few hours to accommodate a deployment. After a brief discussion, a time slot is decided and the package is deployed.

The deployment is mostly manual, with many manual checks that take time. Therefore, for a small piece of code to reach the testing environment, the developer has to wait a whole day. And if the manual deployment fails, due to human error or some technical issue, it can in some cases take another whole day for the code to get into the testing area.

This is painful for developers. Nevertheless, it can be avoided by carefully automating the deployment process. The moment a developer tries to check in code, it goes through an automated compilation check, then through automated code analysis, and is then checked in to the Integration branch. There, the code is picked up along with the latest code on the Integration branch and built. After a successful build, the code is automatically packaged and deployed to the testing environment.
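The final step of such a pipeline can be as simple as the following sketch: package the application and push it to the test server. It assumes a Maven web application, and the host name, paths, and service name are placeholders:

    #!/bin/bash
    set -e
    mvn -q clean package                                    # produces target/my-app.war
    scp target/my-app.war deploy@test-server:/opt/tomcat/webapps/
    ssh deploy@test-server 'sudo systemctl restart tomcat'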

Have a labeling strategy for releases

In my experience, some of the best practices of Continuous Integration are the same as those of software configuration management, for example, labels and baselines. While the two are technically similar, they are not the same from a usage perspective. Labeling is the task of applying a tag to a particular version of a file or a set of files. We can take the same concept a little further: what if we apply a label to particular versions of all the files? That would describe the state of the whole system, a version of the entire collective system. This is called a baseline. And why is it important?

Labels or baselines have many advantages. Imagine that a particular version of your private code fixed a production issue, say "defect number 1234". You can label that version on your private code as the defect number for later use. Labels can also be used to mark sprints, releases, and hotfixes.

A widely used scheme is a version number made up of three two-digit fields, such as 02.01.00.

Here, the first two digits are the release number: for example, 00 could be the beta, 01 the alpha, and 02 the commercial release. The next two digits are the bug fix number. Say release 02.00.00 is in production and a few bugs or improvement requests come in; the developer working on those fixes can then name their branch, or label their code, 02.01.00.

Similarly, consider another scenario: the release version in production is 03.02.00 and, all of a sudden, something fails and the issue needs to be fixed immediately. The release containing the fix can then be labeled 03.02.01, which says that it was a hotfix on top of 03.02.00.
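In Git, such labels are annotated tags. A minimal sketch for the hotfix scenario above:

    # Tag the hotfix build so it can be traced back later
    git tag -a 03.02.01 -m "Hotfix on release 03.02.00"
    git push origin 03.02.01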

Instant notifications

They say communication is incomplete without feedback. Imagine a Continuous Integration system that has an automated build and deployment solution, a state-of-the-art automated testing platform, a good branching strategy, and everything else. However, it does not have a notification system that automatically emails or messages the status of a build. What if a nightly build fails and the developers are unaware of it?

What if you check in code and leave early, without waiting for the automated build and deployment to complete? The next day, you find that the build failed due to a simple issue that occurred just 10 minutes after you left the office.

Had you been informed, say through an SMS popping up on your mobile phone, you could have fixed the issue right away.

Therefore, instant notifications are important. All Continuous Integration tools have them, including Jenkins. It is good to have notifications for build failures, deployment failures, and test results. We will see in the upcoming chapters how this can be achieved using Jenkins and the various options Jenkins provides to make life easy.
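Jenkins provides e-mail notification out of the box, as we will see later, but even a bare build script can implement the idea. A sketch, assuming a working mail command on the build machine and an illustrative team address:

    #!/bin/bash
    # E-mail the team the moment the build fails
    if ! mvn -q clean verify; then
        echo "Build failed on $(hostname) at $(date)" \
            | mail -s "CI build FAILED" dev-team@example.com
        exit 1
    fi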