Git Version Control Cookbook

We have seen the different objects in Git, but how do we create them? In this recipe, we'll see how to create a blob, tree, and commit object in the repository, and we'll learn about the three stages of creating a commit.

Getting ready

We'll use the same data-model repository as seen in the last recipe:

$ git clone https://github.com/dvaske/data-model.git
$ cd data-model

How to do it…

First, we'll make a small change to another-file.txt and check git status:

$ echo "Another line" >> another-file.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

  modified:   another-file.txt

no changes added to commit (use "git add" and/or "git commit -a")

This, of course, just tells us that we have modified another-file.txt and we need to use git add to stage it. Let's add the another-file.txt file and run git status again:

$ git add another-file.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

  modified:   another-file.txt

The file is now ready to be committed, just as you have probably seen before. But what happened during the add command? Generally speaking, the add command moves files from the working directory to the staging area, but more than this happens behind the scenes. When a file is staged, Git computes the SHA-1 hash of its content and writes the blob object to its database. This happens for every file added, every time it is added; if the content of a file has not changed, however, its blob is already stored in the database and no new object is needed. It might seem that the database would grow quickly this way, but that is not the case. Garbage collection kicks in from time to time, compressing and cleaning up the database and keeping only the objects that are required.
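The effect of git add on the object database can be checked by hand. The following is a minimal sketch, assuming a throwaway repository created with mktemp and a hypothetical file named demo.txt (neither is part of the data-model repository). It computes the hash the content would get, confirms the object does not exist yet, stages the file, and then checks that a blob with exactly that hash has been written:

```shell
# Throwaway repository so the data-model clone is left untouched
# (the temporary path and the file name demo.txt are illustrative assumptions).
repo=$(mktemp -d)
git init -q "$repo"
echo "This is just another file" > "$repo/demo.txt"

# hash-object without -w only computes the hash; nothing is stored yet.
expected=$(git -C "$repo" hash-object demo.txt)
if git -C "$repo" cat-file -e "$expected" 2>/dev/null; then
    before=present
else
    before=missing
fi

# Staging the file writes the blob and records its hash in the index.
git -C "$repo" add demo.txt
staged=$(git -C "$repo" ls-files -s demo.txt | awk '{print $2}')
obj_type=$(git -C "$repo" cat-file -t "$expected")

echo "before add: $before"
echo "hash:       $expected"
echo "in index:   $staged"
echo "type:       $obj_type"

rm -rf "$repo"
```

The hash printed for the index entry should be the same one hash-object computed, which illustrates that the staging area refers to content by its hash rather than by file name alone.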

We can edit the file again and run git status:

$ echo 'Whoops almost forgot this' >> another-file.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

  modified:   another-file.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

  modified:   another-file.txt

Now the file shows up in both the Changes to be committed and Changes not staged for commit sections. This looks a bit odd at first, but there is an explanation. When we added the file the first time, its content was hashed and stored in Git's database. The second change has not yet been hashed and written to the database; it exists only in the working directory. The first change is therefore ready to be committed, while the second is not. Let's also add the second change:

$ git add another-file.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

  modified:   another-file.txt
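The intermediate state, where one file is listed as both staged and unstaged, can be looked at more closely with git diff. The following is a sketch under stated assumptions: it rebuilds that state in a throwaway repository with a hypothetical demo.txt file and a placeholder identity, then compares the index against HEAD (git diff --cached) and the working directory against the index (git diff):

```shell
# Throwaway repository; the file name and identity are illustrative assumptions.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

echo "This is just another file" > "$repo/demo.txt"
git -C "$repo" add demo.txt
git -C "$repo" commit -q -m "Initial commit"

# First change is staged, second change stays in the working directory.
echo "Another line" >> "$repo/demo.txt"
git -C "$repo" add demo.txt
echo "Whoops almost forgot this" >> "$repo/demo.txt"

# Index compared with HEAD: the added lines that are staged.
# The grep keeps added lines and drops the "+++ b/demo.txt" header.
staged=$(git -C "$repo" diff --cached | grep '^+[^+]')
# Working directory compared with the index: the added lines not yet staged.
unstaged=$(git -C "$repo" diff | grep '^+[^+]')

echo "staged:   $staged"
echo "unstaged: $unstaged"

rm -rf "$repo"
```

Each change shows up in exactly one of the two comparisons, which is why git status lists the same file in two sections until the second change is added as well.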

Now, all the changes we have made to the file are ready to be committed and we can record a commit:

$ git commit -m 'Another change to another file'
[master 55e29e4] Another change to another file
 1 file changed, 2 insertions(+)

How it works…

As we learned previously, the add command creates the blob object. The tree and commit objects, however, are created when we run the commit command. We can view these objects with the cat-file command, as we saw in the previous recipe:

$ git cat-file -p HEAD
tree 162201200b5223d48ea8267940c8090b23cbfb60
parent 34acc370b4d6ae53f051255680feaefaf7f7850d
author Aske Olsson <[email protected]> 1401744547 +0200
committer Aske Olsson <[email protected]> 1401744547 +0200

Another change to another file

The root-tree object from the commit is:

$ git cat-file -p HEAD^{tree}
100644 blob f21dc2804e888fee6014d7e5b1ceee533b222c15  README.md
040000 tree abc267d04fb803760b75be7e665d3d69eeed32f8  a_sub_directory
100644 blob 35d31106c5d6fdb38c6b1a6fb43a90b183011a4b  another-file.txt
100644 blob 92f046f17079aa82c924a9acf28d623fcb6ca727  cat-me.txt
100644 blob bb2fe940924c65b4a1cefcbdbe88c74d39eb23cd  hello_world.c
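The object creation that happens at commit time can be approximated with Git's plumbing commands. This is only a sketch of the idea, not a description of what git commit runs internally; the throwaway repository, the demo.txt file, the identity, and the commit message are all assumptions made for illustration:

```shell
# Throwaway repository; names and identity below are illustrative assumptions.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

echo "This is just another file" > "$repo/demo.txt"
git -C "$repo" add demo.txt    # the blob is written at this point

# write-tree turns the current index into a tree object.
tree=$(git -C "$repo" write-tree)
# commit-tree wraps that tree in a commit object (no parent: first commit).
commit=$(git -C "$repo" commit-tree -m "Commit built by hand" "$tree")
# Move the current branch to the new commit, as git commit would.
git -C "$repo" update-ref HEAD "$commit"

tree_type=$(git -C "$repo" cat-file -t "$tree")
commit_type=$(git -C "$repo" cat-file -t "$commit")
head_tree=$(git -C "$repo" rev-parse 'HEAD^{tree}')

echo "tree:   $tree ($tree_type)"
echo "commit: $commit ($commit_type)"
git -C "$repo" cat-file -p HEAD

rm -rf "$repo"
```

The final cat-file call prints a commit with a tree line and no parent line, since this hand-built commit is the first one in the throwaway repository.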

From the previous recipe, we know the SHA-1 of the root tree was 34fa038544bcd9aed660c08320214bafff94150b and that of the another-file.txt file was b50f80ac4d0a36780f9c0636f43472962154a11a. As expected, both changed in our latest commit when we updated another-file.txt. We added the same file, another-file.txt, twice before we created the commit that recorded the changes in the history of the repository. We also learned that the add command creates a blob object each time it is called. So in the Git database, there must be an object holding the content of another-file.txt as it was the first time we added the file to the staging area. We can use the git fsck command to check for dangling objects, that is, objects that are not referred to by other objects or references:

$ git fsck --dangling
Checking object directories: 100% (256/256), done.
dangling blob ad46f2da274ed6c79a16577571a604d3281cd6d9

Let's check the contents of the blob using the following command:

$ git cat-file -p ad46f2da274ed6c79a16577571a604d3281cd6d9
This is just another file
Another line

The blob, as expected, holds the content of another-file.txt as it was when we added it to the staging area the first time.
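The same effect can be reproduced from scratch. The sketch below assumes a throwaway repository with a hypothetical demo.txt file and a placeholder identity; it stages a file twice before committing, remembers the hash recorded by the first add, and checks that git fsck reports that hash as a dangling blob:

```shell
# Throwaway repository; the file name and identity are illustrative assumptions.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

echo "This is just another file" > "$repo/demo.txt"
git -C "$repo" add demo.txt
git -C "$repo" commit -q -m "Initial commit"

# First add: remember the hash of the blob the index points at.
echo "Another line" >> "$repo/demo.txt"
git -C "$repo" add demo.txt
first_add=$(git -C "$repo" ls-files -s demo.txt | awk '{print $2}')

# Second add replaces the index entry; the commit records only this version.
echo "Whoops almost forgot this" >> "$repo/demo.txt"
git -C "$repo" add demo.txt
git -C "$repo" commit -q -m "Another change"

# The blob from the first add is no longer referenced by anything.
dangling=$(git -C "$repo" fsck --dangling 2>/dev/null | awk '/^dangling blob/ {print $3}')

echo "first add: $first_add"
echo "dangling:  $dangling"
git -C "$repo" cat-file -p "$dangling"

rm -rf "$repo"
```

The content printed at the end is the file as it looked after the first change only, without the line that was appended before the second add.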

The following diagram describes the three stages and the commands used to move between the stages:
