TL;DR
Given that you have an existing --depth 1 repository cloned from branch B, and you'd like Git to act as if you had removed it and re-cloned, you can use this sequence of commands:
git fetch --depth 1
git reset --hard origin/B
git clean -dfx
(where B is your branch name, e.g., git reset --hard origin/master). You should be able to do the git clean step at any point before or after the other two commands, but the git reset must come after the git fetch.
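For scripting, the same three steps can be wrapped in a small Python helper. This is a sketch, not part of Git: refresh_commands, refresh_shallow_clone and the run parameter are hypothetical names, and run defaults to subprocess.run so the command order can be checked without a real repository. Running the commands strictly in sequence, stopping at the first failure, keeps git reset from running against a stale origin/B if the fetch fails.

```python
import subprocess
from typing import Callable, List


def refresh_commands(branch: str) -> List[List[str]]:
    """The TL;DR commands for a --depth 1 clone of `branch`, in a safe order."""
    return [
        ["git", "fetch", "--depth", "1"],                # updates origin/<branch>
        ["git", "reset", "--hard", f"origin/{branch}"],  # must come after the fetch
        ["git", "clean", "-d", "-f", "-x"],              # -x also removes ignored files
    ]


def refresh_shallow_clone(repo_dir: str, branch: str,
                          run: Callable[..., object] = subprocess.run) -> None:
    """Hypothetical helper: make repo_dir look like a fresh shallow clone."""
    for cmd in refresh_commands(branch):
        # check=True makes subprocess.run raise CalledProcessError on failure,
        # so a failed fetch stops the sequence before the reset runs.
        run(cmd, cwd=repo_dir, check=True)
```

For example, refresh_shallow_clone("path/to/clone", "master") would run the three commands inside that clone.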
Long
[slightly reworded and formatted] Given a clone created with git clone --single-branch --depth 1 url directory, how can I update it to achieve the same result as rm -rf directory; git clone --single-branch --depth 1 url directory?
Note that --single-branch is the default when using --depth 1. The (single) branch is the one you give with -b. There's a long aside that goes here about using -b with tags, but I will leave that for later. If you don't use -b, your Git asks the "upstream" Git (the Git at url) which branch it has checked out, and pretends you used -b thatbranch. This means that when using --single-branch without -b, it is important to make sure that the upstream repository's current branch is sensible; and of course, when you do use -b, it is important to make sure that the branch argument you give really does name a branch, not a tag.
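The branch-selection rule just described can be modeled in a few lines. This is a simplified sketch with hypothetical names: remote_refs stands for the full ref names and hash IDs the upstream advertises, and remote_head for the branch it has checked out. Real Git does accept a tag name with -b (detaching HEAD in the new clone), but this model simply refuses anything that is not a branch, which is the check you should make yourself.

```python
from typing import Dict, Optional


def pick_single_branch(remote_refs: Dict[str, str],
                       remote_head: Optional[str],
                       b: Optional[str] = None) -> str:
    """Return the full ref name a --single-branch clone would follow.

    remote_refs maps full ref names (e.g. "refs/heads/master") to hash IDs;
    remote_head is the upstream's currently checked-out branch, if any.
    """
    name = b if b is not None else remote_head  # no -b: use upstream's branch
    if name is None:
        raise ValueError("upstream has no current branch; pass -b explicitly")
    ref = "refs/heads/" + name
    if ref not in remote_refs:
        # e.g. -b v1.0 where v1.0 exists only as refs/tags/v1.0
        raise ValueError(f"{name!r} is not a branch on the upstream")
    return ref
```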
The simple answer is basically this one, with two slight changes:
After https://stackoverflow.com/a/20508591/279335, I tried git fetch --depth 1; git reset --hard origin/master, but two things: first I don't understand why git reset is needed, second, although the files seems to be up to date, some old files remains, and git clean -df does not delete these files.
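The two slight changes are: make sure you use origin/branchname (the branch you actually cloned) instead of origin/master, and add -x (git clean -d -f -x or git clean -dfx) to the git clean step. The -x matters because plain git clean -df leaves alone any untracked files that match your ignore rules (.gitignore and the like), while a fresh clone would not have them at all. As for why git reset is needed in the first place, that gets a bit more complicated.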
What's going on
Without --depth 1, the git fetch step calls up the other Git and gets from it a list of branch names and corresponding commit hash IDs. That is, it finds a list of all the upstream's branches and their current commits. Then, because you have a --single-branch repository, your Git throws out all but the single branch, and brings over everything Git needs to connect that branch's current commit back to the commit(s) you already have in your repository.
With --depth 1, your Git doesn't bother connecting the new commit to older historical commits at all. Instead, it obtains just the one commit and the other Git objects needed to complete that one commit. It then writes an additional "shallow graft" entry to mark that one commit as a new pseudo-root commit.
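Here is a toy model of what a depth-limited fetch copies. It is a sketch under simplifying assumptions: commits are plain strings, parents is a hypothetical dict from each commit to its parent list, and only commits are tracked (not the trees and blobs each one needs). With depth=1 it copies just the tip and marks it shallow, as described above; --depth also accepts larger values, in which case the commits at the cut-off are the ones recorded as shallow.

```python
from typing import Dict, List, Set, Tuple


def shallow_fetch(parents: Dict[str, List[str]], tip: str,
                  depth: int) -> Tuple[Set[str], Set[str]]:
    """Return (commits copied, commits recorded as shallow) for a --depth fetch."""
    copied: Set[str] = set()
    frontier = [tip]
    for _ in range(depth):  # one generation of ancestors per level
        next_frontier: List[str] = []
        for commit in frontier:
            if commit not in copied:
                copied.add(commit)
                next_frontier.extend(parents[commit])
        frontier = next_frontier
    # A copied commit whose parents were not copied gets a shallow entry,
    # so Git will treat it as a (pseudo-)root commit.
    shallow = {c for c in copied if any(p not in copied for p in parents[c])}
    return copied, shallow


PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "F": ["C"], "G": ["D", "F"]}
copied, shallow = shallow_fetch(PARENTS, "G", 1)
print(sorted(copied), sorted(shallow))  # ['G'] ['G']
```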
Regular (non-shallow) clone and fetch
All of this builds on how Git behaves when you're using a normal (non-shallow, non-single-branch) clone: git fetch calls up the upstream Git, gets a list of everything, and then brings over whatever you don't already have. This is why an initial clone is so slow, and a fetch-to-update is usually so fast: once you have a full clone, the updates rarely have very much to bring over. Maybe there are a few commits, maybe a few hundred, and most of those commits don't need much else either.
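The "bring over whatever you don't already have" rule can be sketched as a walk backwards from the upstream's tip that stops at any commit we already have. This is a toy model with hypothetical names (parents maps each commit to its parent list, have is the set of commits already stored locally), and it relies on the repository being non-shallow, where having a commit means having all of its ancestors too.

```python
from collections import deque
from typing import Dict, List, Set


def commits_to_fetch(parents: Dict[str, List[str]], remote_tip: str,
                     have: Set[str]) -> Set[str]:
    """Commits reachable from remote_tip that are missing locally."""
    need: Set[str] = set()
    queue = deque([remote_tip])
    while queue:
        commit = queue.popleft()
        if commit in have or commit in need:
            continue  # already present (with its ancestors) or already queued
        need.add(commit)
        queue.extend(parents[commit])
    return need


LINEAR = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}
print(sorted(commits_to_fetch(LINEAR, "E", {"A", "B", "C"})))  # ['D', 'E']
```

With an empty have set (an initial clone) the same walk returns every commit, which is the slow case.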
The history of a repository is formed from the commits. Each commit names its parent commit (or for merges, parent commits, plural), in a chain that goes backwards from "the latest commit", to the previous commit, to some more-ancestral commit, and so on. The chain eventually stops when it reaches a commit that has no parent, such as the first commit ever made in the repository. This kind of commit is a root commit.
That is, we can draw a graph of commits. In a really simple repository the graph is just a straight line, with all the arrows pointing backwards:
o <- o <- o <- o <-- master
The name master points to the fourth and latest commit, which points back to the third, which points back to the second, which points back to the first.
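In code, that backwards chain is just repeated parent lookups. A minimal sketch, with the four commits named "1" through "4" and a hypothetical PARENTS dict standing in for the parent IDs stored inside each commit; it follows only the first parent, which is all a straight-line history has.

```python
from typing import Dict, List

PARENTS: Dict[str, List[str]] = {"1": [], "2": ["1"], "3": ["2"], "4": ["3"]}
BRANCHES: Dict[str, str] = {"master": "4"}


def history(parents: Dict[str, List[str]], tip: str) -> List[str]:
    """Walk from tip back to a root commit, following first parents only."""
    chain = [tip]
    while parents[chain[-1]]:  # a commit with no parent is a root: stop there
        chain.append(parents[chain[-1]][0])
    return chain


print(history(PARENTS, BRANCHES["master"]))  # ['4', '3', '2', '1']
```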
Each commit carries with it a complete snapshot of all the files that go in that commit. Files that are not changed at all are shared across these commits: the fourth commit just "borrows" the unchanged version from the third commit, which "borrows" it from the second, and so on. Hence, each commit names all the "Git objects" that it needs, and Git either finds those objects locally, because it already has them, or uses the fetch protocol to bring them over from the other, upstream Git. There's a compression format called "packing", and a special variant for network transfer called "thin packs", that lets Git do this even more efficiently, but the principle is simple: Git needs all, and only, those objects that go with the new commits it's picking up. Your Git decides whether it has those objects, and if not, obtains them from their Git.
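The sharing and the "all, and only" rule can be shown with a toy model of snapshots. This is a sketch with made-up IDs: each snapshot is a hypothetical dict from path to blob ID, leaving out the tree objects that real commits use to list their files.

```python
from typing import Dict, Iterable, Set

# Hypothetical snapshots: path -> blob ID (real Git goes through tree objects).
COMMIT_3 = {"README": "blob-r1", "main.c": "blob-m1"}
COMMIT_4 = {"README": "blob-r1", "main.c": "blob-m2"}  # README unchanged: shared


def objects_needed(new_snapshots: Iterable[Dict[str, str]],
                   have: Set[str]) -> Set[str]:
    """Blob IDs referenced by the new commits that are not stored locally."""
    needed: Set[str] = set()
    for snapshot in new_snapshots:
        needed.update(snapshot.values())
    return needed - have


have = set(COMMIT_3.values())            # we already have commit 3's files
print(objects_needed([COMMIT_4], have))  # {'blob-m2'}
```

Because the unchanged README has the same ID in both snapshots, only the new version of main.c has to travel.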
A more-complicated, more-complete graph generally has several points where it branches, some where it merges, and multiple branch names pointing to different branch tips:
        o--o   <-- feature/tall
       /
o--o--o---o   <-- master
    \    /
     o--o   <-- bug/short
Here branch bug/short has been merged back into master, while branch feature/tall is still undergoing development. The name bug/short can (probably) now be deleted entirely: we don't need it anymore if we are done making commits on it. The commit at the tip of master names two previous commits, including the commit at the tip of bug/short, so by fetching master we will fetch the bug/short commits too.
Note that the simple graph and the slightly-more-complicated graph each have just one root commit. That's pretty typical: every repository that has commits has at least one root commit, since the very first commit is always a root commit, and most repositories have only that one root commit. You can, however, have multiple root commits, as with this graph:
 o--o
     \
o--o--o   <-- master
or this one:
o--o <-- orphan
o--o <-- master
In fact, the first graph, the one with just the one name master, was probably made by merging orphan into master, then deleting the name orphan.
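Finding root commits is a matter of looking for commits with no parents. A small sketch with hypothetical commit names: the merged graph is modeled as two short chains, a1-a2 and b1-b2, joined by a merge commit m at the tip of master, and the unmerged one as two separate chains whose tips are o2 (orphan) and m2 (master).

```python
from typing import Dict, List, Set


def roots_reachable(parents: Dict[str, List[str]], tip: str) -> Set[str]:
    """Root commits (commits with no parents) reachable from tip."""
    seen: Set[str] = set()
    roots: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        if not parents[commit]:
            roots.add(commit)
        stack.extend(parents[commit])
    return roots


MERGED = {"a1": [], "a2": ["a1"], "b1": [], "b2": ["b1"], "m": ["b2", "a2"]}
SEPARATE = {"o1": [], "o2": ["o1"], "m1": [], "m2": ["m1"]}

print(sorted(roots_reachable(MERGED, "m")))     # ['a1', 'b1']
print(sorted(roots_reachable(SEPARATE, "m2")))  # ['m1']
```

Before the merge, master reaches only one root; the merge commit is what gives master's history a second one.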
Grafts and replacements
Git has for a long time had (possibly shaky) support for grafts, which was superseded by (much better, actually solid) support for generic replacements. To grasp them concretely, we need to add to the above the notion that each commit has its own unique ID. These IDs are the big ugly 40-character SHA-1 hashes, face0ff... and so on. In fact, every Git object has a unique ID, though for graph purposes, all we care about are the commits.
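To make "every Git object has a unique ID" concrete: in a repository using the default SHA-1 object format, a blob's ID is the SHA-1 of a short header (the word blob, a space, the content length in decimal, and a NUL byte) followed by the content itself. A minimal sketch; the function name is hypothetical, and commits and trees are hashed the same way but with their own type word and content format.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git would assign to a blob with this content (SHA-1 format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty blob)
```

This should agree with what git hash-object prints for a file containing the same bytes.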
For drawing graphs, those big hash IDs are too painful to use, so we can use one-letter names A through Z instead. Let's use the more complicated graph again, but with one-letter names:
        E--H   <-- feature/tall
       /
A--B--D---G   <-- master
    \    /
     C--F   <-- bug/short
Commit H refers back to commit E (E is H's parent). Commit G, which is a merge commit (meaning it has at least two parents), refers back to both D and F, and so on.
Note that the branch names feature/tall, master, and bug/short each point to one single commit. The name bug/short points to commit F. This is why commit F is on branch bug/short ... but so is commit C. Commit C is on bug/short because it is reachable from the name: the name gets us to F, and F gets us to C, so C is on branch bug/short.
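Reachability, the rule that puts C on bug/short, is a plain graph walk. A minimal sketch of the lettered graph; since a flattened drawing does not pin down every parent link, this model assumes C's parent is B and E's parent is D, with the other links taken from the A--B--D---G line and the text. The function name is hypothetical.

```python
from typing import Dict, List, Set

PARENTS: Dict[str, List[str]] = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"],
    "F": ["C"], "G": ["D", "F"], "H": ["E"],
}
BRANCHES: Dict[str, str] = {"feature/tall": "H", "master": "G", "bug/short": "F"}


def commits_on(branch: str) -> Set[str]:
    """Every commit reachable from the commit the branch name points to."""
    seen: Set[str] = set()
    stack = [BRANCHES[branch]]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


print(sorted(commits_on("bug/short")))  # ['A', 'B', 'C', 'F']
```

By the same rule, B and A are on bug/short as well as C and F.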
Note, however, that commit G, the tip of master, gets us to commit F. This means that commit F is also on branch master. This is a key concept in Git: commits may be on one, many, or even no branches. A branch name is merely a way to get started within a commit graph. There are other ways, such as tag names, refs/stash (which gets you to the current stash: each stash is actually a couple of commits), and the reflogs (which are normally hidden from view, as they are mostly just clutter).
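Turning the question around ("which branches is this commit on?") gives roughly what git branch --contains reports. A toy sketch on the lettered graph, again assuming C's parent is B and E's parent is D; reachable and branches_containing are hypothetical names.

```python
from typing import Dict, List, Set

PARENTS: Dict[str, List[str]] = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"],
    "F": ["C"], "G": ["D", "F"], "H": ["E"],
}
BRANCHES: Dict[str, str] = {"feature/tall": "H", "master": "G", "bug/short": "F"}


def reachable(tip: str) -> Set[str]:
    """All commits reachable from tip by following parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def branches_containing(commit: str) -> List[str]:
    """Names of branches from whose tip `commit` is reachable."""
    return sorted(name for name, tip in BRANCHES.items() if commit in reachable(tip))


print(branches_containing("F"))  # ['bug/short', 'master']
print(branches_containing("B"))  # ['bug/short', 'feature/tall', 'master']
```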
This also, however, gets us to grafts and replacements. A graft is just a limited kind of replacement, and shallow repositories use a limited form of graft.1 I won't describe replacements fully here as they are a bit more complicated, but in general, what Git does for all of these is to use the graft or replacement as an "instead-of". For the specific case of commits, what we want here is to be able to change—or at least, pretend to change—the parent ID or IDs of any commit ... and for shallow repositories, we want to be able to pretend that the commit in question has no parents.
1 The way shallow repositories use the graft code is not shaky. For the more general case, I recommend using git replace instead, as that also was and is not shaky. The only recommended use for grafts is (or at least was, years ago) to put them in place just long enough to run git filter-branch to copy an altered (grafted) history, after which you should just discard the grafted history entirely. You can use git replace for this purpose as well, but unlike grafts, you can use git replace permanently or semi-permanently, without needing git filter-branch.
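The "instead-of" idea behind grafts and replacements can be modeled as a parent lookup that checks an override table first. This is a toy sketch, not Git's actual data structures: grafts is a hypothetical dict from a commit to the parent list we pretend it has, and the lettered graph again assumes C's parent is B and E's parent is D.

```python
from typing import Dict, List, Set

PARENTS: Dict[str, List[str]] = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"],
    "F": ["C"], "G": ["D", "F"], "H": ["E"],
}


def effective_parents(commit: str, grafts: Dict[str, List[str]]) -> List[str]:
    """Parents as seen through the override table; the commit itself is unchanged."""
    return grafts.get(commit, PARENTS[commit])


def reachable(tip: str, grafts: Dict[str, List[str]]) -> Set[str]:
    """Commits reachable from tip when parent lookups go through grafts."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(effective_parents(commit, grafts))
    return seen


print(sorted(reachable("G", {})))            # ['A', 'B', 'C', 'D', 'F', 'G']
print(sorted(reachable("G", {"G": ["D"]})))  # ['A', 'B', 'D', 'G']
```

Pretending G has only D as a parent hides C and F from master's history without rewriting any commit; an empty list in the table would make G look like a root.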
Making a shallow clone
To make a depth-1 shallow clone of the current state of the upstream repository, we will pick one of the three branch names (feature/tall, master, or bug/short) and translate it to a commit ID. Then we will write a special graft entry that says: "When you see that commit, pretend that it has no parent commits, i.e., that it is a root commit."
Let's say we pick master. The name master points to commit G, so to make a shallow clone of commit G, we obtain commit G from the upstream Git as usual, but then write a special graft entry that claims commit G has no parents. We put that into our repository, and now our graph looks like this:
G
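The shallow clone of master can be modeled the same way. This is a sketch with hypothetical names: the upstream is the lettered graph (assuming C's parent is B and E's parent is D), and the local repository keeps a set of shallow commit IDs, loosely mirroring the .git/shallow file in which real Git lists one hash ID per line. Commit G's object still records its parents D and F; the shallow entry is what makes history stop there.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

UPSTREAM_PARENTS: Dict[str, List[str]] = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"],
    "F": ["C"], "G": ["D", "F"], "H": ["E"],
}
UPSTREAM_BRANCHES: Dict[str, str] = {"feature/tall": "H", "master": "G", "bug/short": "F"}


@dataclass
class Repo:
    branches: Dict[str, str]
    parents: Dict[str, List[str]]            # parent IDs stored in each commit we have
    shallow: Set[str] = field(default_factory=set)


def shallow_clone(branch: str) -> Repo:
    tip = UPSTREAM_BRANCHES[branch]          # translate the name to a commit ID
    return Repo(branches={branch: tip},
                parents={tip: list(UPSTREAM_PARENTS[tip])},  # only the tip commit
                shallow={tip})               # ...which we pretend is a root


def log(repo: Repo, branch: str) -> List[str]:
    """Walk history, not following parents of commits listed as shallow."""
    out: List[str] = []
    seen: Set[str] = set()
    stack = [repo.branches[branch]]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        out.append(commit)
        if commit not in repo.shallow:
            stack.extend(repo.parents[commit])
    return out


repo = shallow_clone("master")
print(log(repo, "master"))  # ['G']
print(repo.parents["G"])    # ['D', 'F']
```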