Git: Version Control for Everyone

By: Ravishankar Somasundaram

Overview of this book

Git is free software that lets you maintain different versions of one or more files inside a directory (folder) and switch back and forth between them at any point in time. It also allows multiple people to work on the same file collaboratively or in parallel, without being continuously connected to a server or any other centralized system.

This book is a step-by-step, practical guide that helps you learn the routine of version controlling all your content, every day.

If you are an average computer user who wants to maintain multiple versions of files and folders, or to go back and forth in time with respect to the files' content, look no further. The workflow explained in this book will benefit anyone, no matter what kind of text or documentation they work on.

This book will also benefit developers, administrators, analysts, architects, and anyone else who wishes to perform simultaneous, collaborative work, or work in parallel on the same set of files. Git's advanced features are there to make your life easier.

Falling for Git


We came across different types of version control systems in the previous section, from which we clearly understood that a distributed version control system is what will make our lives easy, safe, and secure.

Now, there are lots of distributed systems available in the market, so which one should we choose?

Git is a relatively new software package (its first prototype dates from April 7, 2005) that was designed from the ground up to avoid the flaws found in many other version control systems.

Linus Torvalds, the man who gave us the Linux kernel, is the proud initiator of this project as well. The very architecture of Git is tailored for better speed, performance, flexibility, and usability. When I first heard that claim, I had the same thought that you have in mind right now: "It talks the talk; can it walk the walk?"

As a matter of fact, there are several live case studies; I was convinced when I saw Git handling the complex Linux kernel source code so gracefully.

For those of you who don't know the Linux kernel or why it's considered complex, just think of approximately 9 million lines of content spread across 25,000 files, subjected to all kinds of content manipulation and travelling back and forth numerous times every day among several hundred developers across the world. And still, the response time of Git's operations is measured in seconds.

They trust Git for such challenging tasks, and Git meets their expectations, because of the following properties:

  • Atomicity

  • Performance

  • Security

Atomicity

Atomicity is the property of an operation that appears to occur at a single instant between its invocation and its response: seen from outside, it either happens completely or does not happen at all.

As an example, let's take a banking system. When you transfer money from your account to another account, the operation is either completed fully or rejected. Either the money is debited from your account and credited to the recipient's account, or the entire operation is dropped and no money is debited from your account in the first place.

These systems avoid partial completions, such as the amount being debited from your account but not credited to the recipient's account.

Another example would be a seat reservation system in which the following are the possible states:

  • Both pay and reserve a seat

  • Neither pay nor reserve a seat

Git's creators understood the value of our data and built the same all-or-nothing behavior into the way Git handles content. This ensures that no data is lost and no versions are mismatched because of partial operations, which increases reliability.
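The banking example above can be sketched in a few lines of Python. This is only an illustration of the all-or-nothing idea, not how Git or any real bank implements it; the names transfer, InsufficientFunds, and the account names are hypothetical. The function works on a private copy of the balances and hands back the new state only if every step succeeds.

```python
class InsufficientFunds(Exception):
    """Raised when the sender cannot cover the transfer (hypothetical name)."""


def transfer(balances, sender, recipient, amount):
    """Return new balances with the transfer applied, or raise and change nothing."""
    # Work on a private copy so a failure midway never touches the real data.
    staged = dict(balances)
    if staged[sender] < amount:
        raise InsufficientFunds(f"{sender} cannot cover {amount}")
    staged[sender] -= amount
    staged[recipient] += amount  # raises KeyError if the recipient is unknown
    # Only a fully completed transfer is handed back to the caller.
    return staged


balances = {"alice": 100, "bob": 20}
balances = transfer(balances, "alice", "bob", 30)

try:
    # "carol" has no account, so this fails after the staged debit.
    balances = transfer(balances, "alice", "carol", 30)
except KeyError:
    pass  # balances still holds the last complete state
```

Because the caller replaces its balances only with a fully built result, a failed transfer leaves the earlier state intact, which is the behavior the seat reservation example describes as well.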

Performance

No matter how good a car's interiors are, if it isn't quick enough, it isn't fit for racing against time. Git has proven to be many times faster than its competitors.

Even when handling several million files, an operation performed using Git takes only seconds to complete. One of the main reasons for this is the way Git handles your files. Conceptually, most other systems (CVS, Subversion, Perforce, Bazaar, and so on) look at your data as a set of files and the changes made to each of them from one version to the next.

The following is a pictorial representation of how other systems handle files and their versions:

In contrast, Git sees a relation between your files and works upon it. It takes a snapshot of the entire set of files instead of storing the difference between versions of each file; this contributes to the lightning speed of Git in certain operations, such as reverting your files' contents to earlier versions (which we will see in later chapters). Each time a version is created, a snapshot is taken. This doesn't mean that Git stores multiple replicas of your files; if a file's content has not changed, the new snapshot stores only a reference to the copy of that file held in the previous snapshot, as shown in the following figure:
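The snapshot idea can be sketched with a toy store in Python. This is a simplified illustration, not Git's actual storage format; the SnapshotStore class and the file names are hypothetical. Each version records the whole set of files, but the content of an unchanged file is stored once and only referenced by later snapshots.

```python
import hashlib


class SnapshotStore:
    """Toy snapshot store: every version lists all files, content is kept once."""

    def __init__(self):
        self.blobs = {}      # content hash -> file content (bytes)
        self.snapshots = []  # one dict per version: path -> content hash

    def commit(self, files):
        """Record a snapshot of all files and return its version number."""
        snapshot = {}
        for path, content in files.items():
            key = hashlib.sha1(content).hexdigest()
            # Unchanged content is already stored, so only the reference is new.
            self.blobs.setdefault(key, content)
            snapshot[path] = key
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def checkout(self, version):
        """Rebuild the complete set of files as of the given version."""
        return {path: self.blobs[key]
                for path, key in self.snapshots[version].items()}


store = SnapshotStore()
v0 = store.commit({"a.txt": b"one", "b.txt": b"two"})
v1 = store.commit({"a.txt": b"one", "b.txt": b"two, edited"})
```

After the second commit the store holds three pieces of content rather than four, because a.txt did not change. Going back to an earlier version is a direct lookup of that snapshot, with no chain of differences to replay.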

The best part is that Git tries to occupy as little space as possible to maintain the version histories of your files (again, several times less than other version control systems). A live case study on handling the source code of Mozilla Firefox, published by Keith P. (http://keithp.com/blogs/Repository_Formats_Matter/), showed how effectively different version control systems utilize space when it comes to maintaining the history of your files.

Mozilla's CVS repository was 2.7 GB in size; when imported to Subversion, the size grew to 8.2 GB, and when put under Git, the size shrank to 450 MB. For source code that is 350 MB in size, it's fairly nice to have the whole project history (from 1998) for just 100 MB more space.
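The figures quoted above can be compared with a little arithmetic. The snippet below uses only the sizes from the text and assumes, for simplicity, that 1 GB is 1000 MB; the variable names are made up for this illustration.

```python
# Repository sizes quoted in the text, in megabytes (assuming 1 GB = 1000 MB).
cvs_mb = 2700
subversion_mb = 8200
git_mb = 450
source_mb = 350

# Extra space Git needs to keep the whole history on top of the source itself.
history_overhead_mb = git_mb - source_mb

# How many times larger the other repositories are than the Git one.
cvs_to_git_ratio = cvs_mb / git_mb
subversion_to_git_ratio = subversion_mb / git_mb

print(history_overhead_mb)
print(cvs_to_git_ratio)
print(round(subversion_to_git_ratio, 1))
```

Under that assumption, the CVS repository is six times the size of the Git one, and the Subversion repository is roughly eighteen times the size.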

Security

When you use Git, you can be sure that no one is tampering with your files' content. Everything that goes into Git is check-summed using an SHA-1 hash before it's stored, and after that it is referred to using that checksum.

This means it's impossible to change the contents of any file or directory without Git knowing about it. The SHA-1 hash used here is a string of 40 hexadecimal characters (0-9 and a-f), which is generated from the contents of a file or directory structure. The following is an example of a hash:

9e79e3e9dd9672b37ac9412e9a926714306551fe
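To get a feel for how such a checksum behaves, here is a minimal Python sketch. It assumes the layout Git uses when hashing a file's contents (the word blob, the content length, and a NUL byte placed in front of the content); the function name git_blob_hash and the sample text are made up for this illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the 40-character SHA-1 checksum for a file's content."""
    # Assumed header layout: "blob <length in bytes>" followed by a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_hash(b"version control\n")
tampered = git_blob_hash(b"version c0ntrol\n")  # a single character altered

print(original)
print(tampered)
print(original == tampered)
```

Changing even one character produces a completely different checksum, which is how a change to any stored content gets noticed.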

For those of you who would like to know more about it, you can hear from the creator himself, Linus Torvalds, in the presentation he gave at Google's tech talk event.