What you have now is a Git repository (or rather, two very similar repositories; we'll just think of it as one for now) with just one branch name, my-feature. This is not an error situation. There is no requirement that any Git repository have a branch name master in it.
Still, if you'd like to have a branch name master, all you have to do is create one. There's nothing special about the name master.¹ A branch name in Git is just a way to find one specific commit. Git is not about branches. Git is all about commits.
¹ Well, almost nothing: there are a bunch of little random bits of stuff here and there. For instance, most humans will assume that the name master means something.
Git is about commits
To understand what's going on here, and why this is OK as is but you can create a master whenever you like, let's look at how Git really works. Again, Git is all about commits. So the main thing to know is what a commit is, how we find commits, and how they accumulate in a repository.
The first thing to know is that every commit is numbered. These numbers are really big and ugly looking and weird: they don't simply count up like 1-2-3. For instance, this commit is numbered 71ca53e8125e36efbda17293c50027d31681a41f. The number on any given commit is totally unique to that one commit. If you had this commit in your Git repository, it would have this same number. If you don't have this commit (and you don't, because it's a commit for Git, in a Git repository for Git), then you don't have any commit with this number.
The uniqueness property is why these numbers are so big and ugly. They're computed by running a cryptographic hash function over the contents of the commit. This has a consequence: the number is deeply attached to the contents, so the contents can never change. No part of any commit (or of any internal Git object, really) can ever change, because its number depends on its content. This is the magic: this is how two different Git programs can agree that only this commit gets this number.²
Because the numbers come out of a hashing function, they are called hash IDs. Because the original hash function was (and still is for now) SHA-1, they're also called SHA-1 IDs or SHA-1s. Because Git is in the process of going to still-larger hashes, Git is changing the internal name from SHA1 to OID, or Object ID. (Commits are one of four internal object types, and all of them use this same hashing system.)
I mostly call these hash IDs myself, but be aware of the other names.
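To make the "number depends on content" idea concrete, here is a minimal Python sketch. It assumes a SHA-1 repository and relies on the fact that Git hashes a short header (the object type and the content length, followed by a NUL byte) ahead of the content itself. The function name git_object_id is made up for this illustration, and the sketch uses the simplest object type, a blob (file contents), rather than a full commit.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 hash ID for an object with this type and content."""
    # Git hashes "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


first = git_object_id("blob", b"hello\n")
again = git_object_id("blob", b"hello\n")
changed = git_object_id("blob", b"hello!\n")

print(first == again)    # True: same content, same hash ID, in any repository
print(first == changed)  # False: changing one byte gives a different hash ID
print(len(first))        # 40: a SHA-1 hash ID is 40 hexadecimal digits
```

Since the ID is a pure function of the content, nobody can alter the content and keep the ID, which is why stored objects are effectively read-only.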
The other thing to know is that each commit stores two parts:
The main data of a commit is a snapshot of all the files Git knew about at the time the commit was made. We won't go into the details of the source of the snapshot here, but it's not your working tree, it's the thing that Git uses three names for: the index, the staging area, or the cache. All names refer to the same thing.
Besides the snapshot, each commit contains some metadata, or information about the commit itself. This includes the name and email address of the author of the commit, for instance, with a date-and-time-stamp for when the commit was made. The log message you enter, explaining why you made the commit, goes here; you'll see that log message in git log output.
Crucially for Git, Git sticks something in this metadata for its own use. Each commit stores a list of the hash IDs of some earlier commit or commits.
Most commits store just one hash ID, for one earlier commit. We can call these ordinary commits to distinguish them from commits that have no earlier hash IDs, or two-or-more earlier hash IDs. At least one commit, the very first one, literally can't store any earlier commit hash ID, so it just doesn't; we call that one the root commit.³
² The pigeonhole principle tells us that this scheme must eventually fail. The number of bits in the hash ID is designed to make it such that the failure is so many trillions of years in the future that we don't care about it. There's a small flaw in this idea, but it's fine for now.
³ A repository can have more than one root commit, but this is at least a little bit unusual. We won't get into the details here.
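As a rough model of the two parts described above, the following Python sketch represents a commit as a read-only record holding a snapshot plus metadata, with the list of earlier hash IDs stored in the metadata. The class name Commit, its field names, and the helper kind are all invented for this illustration and are not Git's actual storage format. The label "merge" for a commit with two or more earlier hash IDs is standard Git terminology, though the text above does not use it.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Commit:
    snapshot: dict        # path -> file content, standing in for the saved files
    author: str           # metadata: who made the commit
    message: str          # metadata: the log message
    parents: tuple = ()   # metadata: hash IDs of earlier commits


def kind(commit: Commit) -> str:
    """Classify a commit by how many earlier hash IDs it stores."""
    if len(commit.parents) == 0:
        return "root"
    if len(commit.parents) == 1:
        return "ordinary"
    return "merge"


root = Commit({"README": "v1"}, "A U Thor <author@example.com>", "initial")
ordinary = Commit({"README": "v2"}, "A U Thor <author@example.com>", "update", ("id-of-root",))
merge = Commit({"README": "v3"}, "A U Thor <author@example.com>", "merge", ("id-1", "id-2"))

print(kind(root), kind(ordinary), kind(merge))

# The record is frozen, mirroring the rule that a commit can never change.
try:
    root.message = "edited"
except FrozenInstanceError:
    print("commits are read-only")
```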
Hash IDs are too clunky: enter branch names
Let's draw a simple repository that has just three commits in it. Rather than their actual big ugly hash IDs, we'll call these three commits A, B, and C, and we'll draw them like this:
A <-B <-C
Remember that each one holds a snapshot and some metadata. Commit C is the last of these three commits, so it's the most recent and, in a way, the most important. Inside commit C, we have the latest snapshot, in a read-only form. We also have the metadata, including the hash ID of earlier commit B. Let's stay "on" commit C, but use this hash ID to look up commit B.
Inside commit B, we have the snapshot and metadata, including the hash ID of earlier commit A. Without yet going to commit A, we can compare the files saved in both B and C. All of the files that are the same are uninteresting, but for files that have changed, we can show the changes. That's pretty useful, so that's what git show or git log -p will do, if we're on/using commit C: it will show the changes from B to C.
If we use git log, we can now have Git go back one step, from C to B. Now we have a snapshot and metadata, including the hash ID of commit A, but this time we'll go ahead and look up commit A. By comparing its snapshot to that in B, we can see what changed. So git log -p can print the log message for commit B, then show the changes from A to B.
Once again, we can have Git step back one hop, to commit A. Commit A, being the very first commit, has no earlier commit: its list of previous commit hash IDs is empty. So commit A is our root commit, and all the files in A are "new". The git log -p command will just show them as new files, and since there's no earlier commit, it will stop here.
Note that Git works backwards. This is generally true of all things in Git: they always work backwards, from latest towards earliest. The reason for that is those embedded hash IDs: they look like arrows pointing backwards. We start with commit C because it's the most recent commit. We do, however, have to know the hash ID of commit C.
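The backwards walk just described can be sketched in a few lines of Python. This is a toy model, not how git log is implemented: the commits dictionary, the single-letter names standing in for hash IDs, and the helpers changed_files and log are all assumptions made for this illustration, and each commit here has at most one parent.

```python
# Toy repository: single letters stand in for hash IDs.
commits = {
    "A": {"parent": None, "files": {"README": "v1"}},
    "B": {"parent": "A", "files": {"README": "v2"}},
    "C": {"parent": "B", "files": {"README": "v2", "main.py": "print('hi')"}},
}


def changed_files(old, new):
    """Names of files that were added, removed, or modified between two snapshots."""
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))


def log(start):
    """Yield (commit, changed files), walking from `start` back to the root."""
    current = start
    while current is not None:
        commit = commits[current]
        parent = commit["parent"]
        # A root commit has no parent, so every file in it shows up as new.
        parent_files = commits[parent]["files"] if parent is not None else {}
        yield current, changed_files(parent_files, commit["files"])
        current = parent


for name, changes in log("C"):
    print(name, changes)
# C ['main.py']
# B ['README']
# A ['README']
```

Note that the walk can only start from a commit whose ID we already know: nothing inside A or B says that C exists.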
We could write down the hash ID of the latest commit. We could keep it on a scrap of paper, or a whiteboard, or whatever. Starting from the end, we tell Git to look at C, and Git can find all the earlier commits on its own. But this seems silly. We have a computer. Why not have the computer keep the hash ID of commit C somewhere?
This is what a branch name does. A branch name simply holds the hash ID of the latest commit that is part of that branch. We can draw that like this:
A <-B <-C <--branch
To make a new commit, we have Git package up a snapshot and metadata. The metadata for our new commit, which we'll call D, will include the actual hash ID of commit C, as found by reading the hash ID stored under the branch name branch. So new commit D will point back to existing commit C:
A--B--C
       \
        D
(I've gone to lines instead of arrows because I don't have good graphics for arrows here. We know nothing inside any commit can change, and each arrow is stored inside the commit it comes out of and always points backwards, so it can't change where it points. So the lines work just as well, as long as we remember that Git can't follow them forwards, only backwards.)
Now that commit D exists and has a hash ID (computed by hashing all the stuff in the commit, including the date-and-time-stamp for when we created commit D), we can have Git write that hash ID into the name branch, so that the name points to commit D instead of commit C:
A--B--C
       \
        D   <-- branch
and now we can straighten the whole thing back out again:
A--B--C--D <-- branch
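Here is a small Python sketch of that two-step process: build the new commit with the current branch tip as its parent, then write the new hash ID into the branch name. It is a simplified model under stated assumptions: the commits and branches dictionaries and the make_commit helper are invented for this illustration, and the hash is taken over a Python repr of the data rather than over Git's real commit format.

```python
import hashlib

commits = {}    # hash ID -> commit contents
branches = {}   # branch name -> hash ID of the latest commit on that branch


def make_commit(branch, snapshot, message, timestamp):
    """Create a commit whose parent is the current tip of `branch`, then move the name."""
    parent = branches.get(branch)  # None when the branch has no commits yet
    # Assumption: hash a repr of the data; real Git hashes its own commit format.
    record = (sorted(snapshot.items()), message, timestamp, parent)
    commit_id = hashlib.sha1(repr(record).encode("utf-8")).hexdigest()
    commits[commit_id] = {
        "snapshot": dict(snapshot),
        "message": message,
        "timestamp": timestamp,
        "parent": parent,
    }
    branches[branch] = commit_id  # the name now points to the new commit
    return commit_id


a = make_commit("branch", {"README": "v1"}, "A", 1)
b = make_commit("branch", {"README": "v2"}, "B", 2)
c = make_commit("branch", {"README": "v3"}, "C", 3)
d = make_commit("branch", {"README": "v4"}, "D", 4)

print(commits[d]["parent"] == c)  # True: D points back to C
print(branches["branch"] == d)    # True: the name now finds D
```

The existing commits A, B, and C are untouched by any of this; only the branch name's stored hash ID changes.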
Branch names find commits, regardless of how many branch names there are
Let's start with our three-commit setup again, without having made D yet. Let's call the first name main (as GitHub usually does now) instead of master, though in fact, any name will do fine. Let's draw that, but this time I want to add one more thing to our drawing:
A--B--C <-- main (HEAD)
The new thing is this HEAD, in parentheses. We have this to mark which branch name we are using. Right now there's only one name, so there is only one name we can use, but we're about to change that by adding a new name.
Now let's create a new name, develop. We must pick some existing commit to make this name exist, because a branch name is required to point to some existing commit. So let's pick commit C, which is the latest on main and is the commit we're using right now. We run:
git branch develop
and get:
A--B--C <-- develop, main (HEAD)
Note that both branch names now point to commit C. This means commit C is the last commit on both branches. That's perfectly fine in Git; it means all three commits are on both branches, too.
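A toy Python model may help show why creating a name changes so little. Everything here is an assumption made for illustration: the parents and branches dictionaries, the head variable, and the helpers create_branch and commits_on are hypothetical, and single letters stand in for hash IDs.

```python
parents = {"A": None, "B": "A", "C": "B"}  # each commit's earlier commit
branches = {"main": "C"}                   # branch name -> latest commit
head = "main"                              # the branch name we are using


def create_branch(name):
    """Model of `git branch <name>`: point a new name at the current commit."""
    if name in branches:
        raise ValueError(f"a branch named {name!r} already exists")
    branches[name] = branches[head]        # no commit is created or changed


def commits_on(branch):
    """Every commit reachable by walking backwards from the branch's latest commit."""
    found = []
    current = branches[branch]
    while current is not None:
        found.append(current)
        current = parents[current]
    return found


create_branch("develop")

print(branches)               # {'main': 'C', 'develop': 'C'}
print(head)                   # main: creating a name does not switch to it
print(commits_on("main"))     # ['C', 'B', 'A']
print(commits_on("develop"))  # ['C', 'B', 'A']
```

In this model, "being on a branch" is only a question of reachability from the name, which is why the same three commits can be on both branches at once.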
The special name HEAD is still attached to main, so we're still actually using the name main. Let's run git checkout develop, which does this:
A--B--C <-- develop (HEAD), main
We're no longer using the name main as our current name. It still exists and still points to commit C, but now HEAD is attached to the name develop. That name also points to commit C, so nothing else has to change, and nothing else does change. We're still "on" commit C, but now, we're "on" it because we're "