We have seen the different objects in Git, but how do we create them? In this example, we'll see how to create blob, tree, and commit objects in the repository. We'll also learn about the three stages of creating a commit.
We'll use the same Git-Version-Control-Cookbook-Second-Edition repository that we saw in the last recipe:
$ git clone https://github.com/PacktPublishing/Git-Version-Control-Cookbook-Second-Edition.git
$ cd Git-Version-Control-Cookbook-Second-Edition
- First, we'll make a small change to another-file.txt and check git status:
$ echo "Another line" >> another-file.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: another-file.txt
no changes added to commit (use "git add" and/or "git commit -a")
This, of course, just tells us that we have modified another-file.txt and need to use git add to stage it.
- Let's add the another-file.txt file and run git status again:
$ git add another-file.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: another-file.txt
The file is now ready to be committed, just as you have probably seen before. But what happens during the add command? Generally speaking, the add command copies files from the working directory to the staging area; however, more happens behind the scenes. When a file is added to the staging area, the SHA-1 hash of its content is calculated and a blob object is written to Git's database. This happens every time a file is added; if the content has not changed, the blob is already stored in the database and nothing new is written. At first, it might seem that the database will grow quickly, but this is not the case. Garbage collection runs from time to time, compressing and cleaning up the database and keeping only the objects that are still required.
- We can edit the file again and run git status:
$ echo 'Whoops almost forgot this' >> another-file.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: another-file.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: another-file.txt
Now, the file shows up in both the Changes to be committed and Changes not staged for commit sections. This looks a bit weird at first, but there is, of course, a reason for this. When we added the file the first time, its content was hashed and stored in Git's database. The second change to the file has not yet been hashed and written to the database; it exists only in the working directory. Therefore, the file shows up in both sections: the first change is ready to be committed, the second is not. Let's also add the second change:
$ git add another-file.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: another-file.txt
- Now, all the changes we have made to the file are ready to be committed, and we can record a commit:
$ git commit -m 'Another change to another file'
[master 99fac83] Another change to another file
1 file changed, 2 insertions(+)
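The four steps above can be replayed in a throwaway repository to watch the object database directly. The following is a minimal sketch, not part of the recipe: it assumes a POSIX shell and a Git installation using the default SHA-1 object format, and the temporary repository, the Example identity, and the initial commit are illustrative assumptions. git hash-object computes a blob's SHA-1 without storing it (no -w flag), git cat-file -e tests whether an object exists, and git ls-files --stage shows which blob the staging area currently records.

```shell
#!/bin/sh
set -e

# Throwaway repository (illustrative; not the cookbook's repository)
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Commit a first version so that the file is tracked
echo "This is just another file" > another-file.txt
git add another-file.txt
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Initial version"

# Step 1: modify the file and compute its blob SHA-1 without writing it
echo "Another line" >> another-file.txt
first=$(git hash-object another-file.txt)
if git cat-file -e "$first" 2>/dev/null; then
  echo "blob $first already stored"
else
  echo "blob $first not stored yet"
fi

# Step 2: git add hashes the content and writes the blob to the database
git add another-file.txt
echo "after add, type of $first: $(git cat-file -t "$first")"

# Step 3: change the file again without adding it
echo "Whoops almost forgot this" >> another-file.txt
second=$(git hash-object another-file.txt)

# The staging area still records the first blob; the working copy hashes differently
git ls-files --stage another-file.txt
echo "working copy would hash to: $second"

# First column: staged change; second column: unstaged change
git status --porcelain            # MM another-file.txt
```

The final line prints MM: the first column reports the staged change and the second the unstaged one, which corresponds to the file appearing in both sections of the long git status output in step 3.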
As we learned previously, the add command creates the blob objects; the tree and commit objects are created when we run the commit command. We can view these objects using the cat-file command, as we saw in the previous recipe:
$ git cat-file -p HEAD
tree 162201200b5223d48ea8267940c8090b23cbfb60
parent 13dcada077e446d3a05ea9cdbc8ecc261a94e42d
author John Doe <[email protected]> 1524163792 +0200
committer John Doe <[email protected]> 1524163792 +0200
Making changes to another file.
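Because the tree and commit objects are written at commit time, a commit can also be assembled by hand with Git's plumbing commands. This sketch is a simplification under stated assumptions (a throwaway repository and an illustrative Example identity set through environment variables) and skips what git commit additionally does, such as running hooks and editing the message. git write-tree writes a tree object from the current index, git commit-tree creates a commit object pointing at that tree, and git update-ref moves the current branch to the new commit.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Illustrative identity, used by both git commit and git commit-tree
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

echo "This is just another file" > another-file.txt
git add another-file.txt
git commit -q -m "Initial version"

echo "Another line" >> another-file.txt
git add another-file.txt          # stores the blob and updates the index only

# Write the root tree from the index: this creates the tree object
tree=$(git write-tree)

# Create a commit object for that tree, with the current HEAD as its parent
commit=$(git commit-tree -p HEAD -m "Another change to another file" "$tree")

# Point the current branch at the new commit (HEAD is followed as a symref)
git update-ref HEAD "$commit"

git cat-file -t "$tree"           # tree
git cat-file -t "$commit"         # commit
git cat-file -p HEAD
```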
The root-tree object from the commit is as follows:
$ git cat-file -p HEAD^{tree}
100644 blob f21dc2804e888fee6014d7e5b1ceee533b222c15 README.md
040000 tree abc267d04fb803760b75be7e665d3d69eeed32f8 a_sub_directory
100644 blob 35d31106c5d6fdb38c6b1a6fb43a90b183011a4b another-file.txt
100644 blob 92f046f17079aa82c924a9acf28d623fcb6ca727 cat-me.txt
100644 blob bb2fe940924c65b4a1cefcbdbe88c74d39eb23cd hello_world.c
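Each line of the root-tree listing pairs a mode, an object type, a SHA-1, and a name; a blob entry is simply the hash of the file's content at commit time. The sketch below checks this in a throwaway repository whose file and directory names mirror part of the listing above (the contents and the Example identity are illustrative assumptions). It uses the <rev>:<path> revision syntax, which resolves a path inside a commit's tree, and git ls-tree, which prints the same format as git cat-file -p on a tree.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

mkdir a_sub_directory
echo "Readme" > README.md
echo "This is just another file" > another-file.txt
echo "nested" > a_sub_directory/readme
git add .
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Initial version"

# Same listing as `git cat-file -p HEAD^{tree}`: mode, type, SHA-1, name
git ls-tree HEAD

# Resolve entries of the committed root tree by path
blob=$(git rev-parse HEAD:another-file.txt)
sub=$(git rev-parse HEAD:a_sub_directory)

# The unchanged working copy hashes to the same blob SHA-1
echo "tree entry:  $blob"
echo "hash-object: $(git hash-object another-file.txt)"
git cat-file -t "$sub"            # tree
```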
From the previous recipe, we know that the SHA-1 of the root tree was 34fa038544bcd9aed660c08320214bafff94150b and the SHA-1 of the another-file.txt file was b50f80ac4d0a36780f9c0636f43472962154a11a; as expected, both changed in our latest commit when we updated another-file.txt. We added the same file, another-file.txt, twice before we created the commit that recorded the changes in the history of the repository. We also learned that the add command creates a blob object when called. So, the first time we added the file to the staging area, an object matching the content of another-file.txt at that point must have been written to the Git database. We can use the git fsck command to check for dangling objects, that is, objects that are not referred to by other objects or references:
$ git fsck --dangling
Checking object directories: 100% (256/256), done.
dangling blob ad46f2da274ed6c79a16577571a604d3281cd6d9
Let's check the content of the blob using the following command:
$ git cat-file -p ad46f2da274ed6c79a16577571a604d3281cd6d9
This is just another file
Another line
The blob was, as expected, identical to the content of another-file.txt when we added it to the staging area the first time.
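The whole dangling-blob scenario can be reproduced in a throwaway repository, which also shows the garbage collection mentioned earlier at work. This is a sketch under assumptions: a POSIX shell, the default SHA-1 object format, and illustrative contents and identity. The :<path> syntax passed to git rev-parse resolves to the blob currently recorded in the staging area. Note that git gc --prune=now prunes unreachable objects immediately; a plain git gc only prunes loose unreachable objects older than gc.pruneExpire, which defaults to two weeks.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "This is just another file" > another-file.txt
git add another-file.txt
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Initial version"

# First add: writes a blob for this version of the content
echo "Another line" >> another-file.txt
git add another-file.txt
first=$(git rev-parse :another-file.txt)   # blob recorded in the index

# Second add: the index now points at a new blob, orphaning the first one
echo "Whoops almost forgot this" >> another-file.txt
git add another-file.txt
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Another change to another file"

# No tree, commit, ref, or index entry refers to the first blob any more
dangling=$(git fsck --dangling 2>/dev/null || true)
echo "$dangling"
content=$(git cat-file -p "$first")
echo "$content"

# Force an immediate prune of unreachable objects
git gc --prune=now --quiet
if git cat-file -e "$first" 2>/dev/null; then
  echo "still present: $first"
else
  echo "pruned: $first"
fi
```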
The following diagram describes the three stages and the commands used to move between the stages:
For more examples and information on the cat-file and fsck commands, please consult the Git documentation at https://git-scm.com/docs/git-cat-file and https://git-scm.com/docs/git-fsck.
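The moves between the three stages described in the diagram can also be tried out in a throwaway repository. This minimal sketch uses the commands suggested in the git status hints earlier in the recipe (git add, git reset HEAD <file>, and git checkout -- <file>) plus git commit; the repository, contents, and Example identity are illustrative assumptions. git status --porcelain reports the staging area in its first column and the working directory in its second. Git 2.23 and later also offer git restore --staged and git restore for the unstage and discard steps; they are not used here.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

echo "This is just another file" > another-file.txt
git add another-file.txt
git commit -q -m "Initial version"

# Working directory -> staging area
echo "Another line" >> another-file.txt
git add another-file.txt
git status --porcelain            # "M  another-file.txt"

# Unstage: the staging area goes back to HEAD, the edit stays in the working directory
git reset -q HEAD another-file.txt
git status --porcelain            # " M another-file.txt"

# Discard the working-directory edit, restoring the staged content
git checkout -- another-file.txt
git status --porcelain            # no output: all three stages agree

# Staging area -> repository
echo "Another line" >> another-file.txt
git add another-file.txt
git commit -q -m "Another change to another file"
```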