So I am new to this whole GitHub and Git thing. I recently learned the basics of Git (adding, pushing, pulling, cloning, etc.). My intro to Java professor asked me to make a GitHub repository for all my class homework. She told me to organize it so that there is a separate folder for each homework, and each homework folder contains multiple source files.
So I set up my files like this:
Java (main folder) -> Hw1 + Hw2 + Hw3 etc. How would I do this using Git? All of these folders should be in both my local and GitHub repositories, and I should be able to make changes to them separately.
Thank You in advance. I am stuck.
Let's start with some basics. You already understand that your computer uses a tree-structured file system: that is, a directory (or folder—the terms are now interchangeable) holds files and/or more directories/folders, which in turn hold more files and/or folders, etc. Windows natively uses a backwards slash \ to separate the various components, so that you might have:
java\hw1\main.java
java\hw1\sub.java
java\hw2\main.java
and so on. Windows can use forward slashes (some commands may use them for other purposes, but they do work in file names), and all non-Windows OSes tend to use forward slashes, which are easier to type. Git also uses forward slashes so that's what I'll do here.
(Aside: Windows and macOS by default use "case insensitive but case preserving" rules, so that if you create a file named readme.txt, you can later open it using the name ReadMe.txt or README.TXT but it remains named in all-lowercase. Git, by contrast, is usually case-sensitive and thinks that readme.txt, ReadMe.txt, and README.TXT are three different file names. This causes endless grief on such systems[1] and sometimes the best, or at least easiest, way to avoid all problems here is to completely avoid uppercase letters everywhere. To the extent that you can use java instead of Java, hw1 instead of Hw1, and so on, I would encourage you to do so.)
When you ask Git to create a new, empty repository using git init,[2] Git creates a hidden folder named .git. This hidden folder will contain all of Git's files: here, Git will store its main two databases. We'll talk about those in just a moment. The place where Git creates .git is whatever your current working directory is, so if you are in java/hw1 and run git init, Git creates java/hw1/.git. If you are in java and run git init, Git creates java/.git.
Note that java/.git and java/hw1/.git are different folder path names, and therefore you can create two repositories. You do not want to do this, but that's what you did. (I base this claim on this comment.) We'll come back to "how to fix this" soon.
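To check whether a folder tree holds more than one repository, it can help to list every .git directory in it. The following is only a sketch in a throwaway directory: the java/hw1 layout mirrors the question, and the scratch location is an assumption made for the demonstration.

```shell
set -e                               # stop at the first failing command
scratch=$(mktemp -d)                 # throwaway folder; nothing of yours is touched
cd "$scratch"
mkdir -p java/hw1 java/hw2
(cd java && git init -q)             # creates java/.git
(cd java/hw1 && git init -q)         # creates java/hw1/.git: a second, nested repository

# List every .git directory at or below java/ (-prune: do not descend into them).
nested=$(cd java && find . -name .git -type d -prune | sort)
echo "$nested"
```

Run from your real top-level folder, the find command alone is enough: more than one line of output means there are nested repositories.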
[1] In particular, someone using Linux can literally create three different files that differ only in case, stuff all three into a commit in a Git repository, and leave you with a problem when you go to check out this commit on a Windows system. If you're used to the system mapping from typed-in-lowercase to the matching case, and you ask an editor to create java/hw1/thing.java on Linux, it might actually create a java and hw1 right next to your existing Java and Hw1. Since those are different directories they can store different files with the exact same names as those in Java/Hw1/, including name-case. Git will happily store all these files, and Windows often cannot extract such a commit properly.
[2] Note that git init first checks whether the current directory already holds a repository (that is, whether it already contains a .git). If so, rather than creating a new repository, Git will "reinitialize" the existing one. In most cases "reinitializing" like this has no effect at all.
The main thing to know about Git and a Git repository
A Git repository—or what I sometimes call the repository proper—consists mainly of two databases. One is usually much bigger. It contains commits and other supporting Git objects. These objects all have hash IDs (or more formally, object IDs or OIDs) that Git must have in order to retrieve the objects from the database. This could force humans to memorize Git commit hash IDs, but that's a bad plan: hash IDs are very large, very random-looking, and impossible for humans to remember in general.
For this reason, a Git repository contains that second, usually much smaller, database. In this database, Git stores names: branch names, tag names, remote-tracking names, and many other kinds of names. These names are for you (and other humans) to use. Each name stores one hash ID, but that's enough to make everything work. So you'll use a branch name, like main or master. This name holds the hash ID of the latest commit, which allows Git to retrieve that commit.
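The name-to-hash-ID lookup can be watched directly. This is a minimal sketch in a scratch repository; the identity values are placeholders, and git symbolic-ref is used only so that the first branch is called main regardless of Git version.

```shell
set -e
scratch=$(mktemp -d) && cd "$scratch"
git init -q
git symbolic-ref HEAD refs/heads/main        # call the first branch main
git config user.name "Example Student"       # placeholder identity, this repository only
git config user.email "student@example.com"
echo example > README.txt
git add README.txt
git commit -q -m "Initial commit"

# The branch name main holds exactly one hash ID: that of the latest commit.
branch_hash=$(git rev-parse main)
head_hash=$(git rev-parse HEAD)
echo "main -> $branch_hash"

# Show every name in the names database next to the hash ID it stores.
git for-each-ref
```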
Each commit stores two things:
A commit stores a full snapshot of every file (that Git knew about, that is) at the time you, or whoever, made that commit. The files inside the commit are stored in a special, read-only, Git-only, compressed and de-duplicated form, that only Git can read, and literally nothing can write. (This uses some of those "supporting objects" I mentioned; the files are actually stored in the objects database as "Git objects".) Because nothing but Git can use these files, the files in a commit are useless on their own. We'll see in just a moment how we work with these files.
Meanwhile, that same commit that's storing a snapshot, also stores some metadata, or information about the commit itself: who made it (you, probably), and when, for instance. To make "branches"—a poorly-defined word in Git (see What exactly do we mean by "branch"?)—work, the commit's metadata contains the hash ID of the previous commit.
This "contains previous commit's hash ID" is how Git stores history: the branch name, e.g., main, lets Git find the last commit you made, and then by reading that commit, Git can find the hash ID of the second-to-last commit. For instance, suppose the hash ID of the last commit is H (it's actually some big ugly hexadecimal number so we're just using H to stand in for it). Then we say that the name main points to commit H. But commit H contains the hash ID of an earlier, or parent, commit: let's call that one G. We say that H points to G, and we can draw that:
<-G <-H <--main
Since G is a commit, it has one of these points-to pointers sticking out of it, too. By reading commit G's metadata, Git can find the raw hash ID of its parent; let's call that commit F:
... <-F <-G <-H <--main
So main points to H, which points to G, which points to F, which points to ... well, this goes on until we get back to the very first commit ever—commit A perhaps—which, being first, can't point backwards and therefore simply doesn't.
What this means is that instead of one hash ID, each commit stores, in its metadata, a list of previous-commit hash IDs. The list can be empty, and is for that first commit. It can also have more than one hash ID, but we won't cover this case here. Most commits in most repositories are "ordinary" commits and have exactly one parent, though.
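The backwards-pointing chain can be printed with git log. The sketch below builds three empty commits in a scratch repository (the commit messages A, B, C and the identity are placeholders) and shows each commit's parent list.

```shell
set -e
scratch=$(mktemp -d) && cd "$scratch"
git init -q
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
for msg in A B C; do
    git commit -q --allow-empty -m "commit $msg"
done

# %H is a commit's own hash ID; %P is its list of parent hash IDs.
git log --format='%s: %H  parents: %P'

tip_parent=$(git log -1 --format=%P HEAD)      # parent recorded in commit C
second=$(git rev-parse HEAD~1)                 # commit B, found by walking back one step
root_parents=$(git log -1 --format=%P HEAD~2)  # commit A: the list is empty
```

The last line of the log output has nothing after "parents:", which is how the first commit "simply doesn't" point backwards.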
Your "working tree"
A repository, then, stores names—branch names for instance—that help Git find commits for us (we only have to remember the branch names), and stores commits that then store files. But the stuff in the commits (along with the actual commits themselves) is all completely read-only. Git must do this to make the hashing scheme work. What good are stored files if we can't write on them? Moreover, only Git can read them, so what good are they if we can't even read them?
This is where your working tree comes in. Most Git repositories have a working tree.[3] The working tree of a repository is, quite simply, where you do your work. And, as we saw earlier, if you use git init in some directory to create a new, totally-empty repository and then make an initial commit:[4]
mkdir new
cd new
echo example > README.txt
git init
git add README.txt
git commit
you will wind up with a hidden .git folder here in the new/ folder we just made (mkdir new) and entered (cd new). The working tree for this Git repository in new/.git is new/, and the file we created—README.txt—in that working directory is now also stored in the first (and so far only) commit in that repository.
If we now modify the one file, and/or add a new file, and use git add and git commit appropriately, we'll get a second commit that stores (forever[5]) the new versions of that file. That second commit has, as its parent commit, the first commit, which stores (forever) the earlier version with just the one file in it.
The second commit is now our current commit, and is now the last commit on the main or master branch (whatever its name is).
Git allows us to check out any commit we have stored in the repository. When we do that, Git will erase from our working tree the files that go with the current commit. It will, instead, install into our working tree the files that go with the newly selected commit—which then becomes the current commit.
In this way, we can "go back in time", any time we like, to any older version, stored as a commit in the big database. All we have to do is find its commit hash ID (for which git log comes in handy, for instance). That's not what we'll focus on right now though.
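As a rough illustration of "going back in time", the sketch below makes two commits in a scratch repository, checks out the first one by hash ID, and then returns. The file contents and identity are placeholders; git checkout is used here, though newer Git versions also offer git switch --detach.

```shell
set -e
scratch=$(mktemp -d) && cd "$scratch"
git init -q
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo "version 1" > README.txt
git add README.txt
git commit -q -m "first version"
first=$(git rev-parse HEAD)                  # remember the first commit's hash ID
echo "version 2" > README.txt
git add README.txt
git commit -q -m "second version"

git log --oneline                            # both commits, newest first
git checkout -q "$first"                     # working tree now holds the first snapshot
old=$(cat README.txt)
git checkout -q -                            # return to the branch we were on
new=$(cat README.txt)
echo "$old / $new"
```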
[3] The exception here is a so-called bare repository. We won't cover these here.
[4] These are Unix-shell-style commands as I don't use Windows myself, but this should work in git-bash, which is just a port of bash to Windows for use with Git. You can do all this in PowerShell or even CMD.EXE instead, but some command details might change.
[5] Well, forever, or as long as the commit itself continues to exist. If we remove the commit, we remove its snapshot. This is actually kind of hard to do! However, if we remove the repository proper, we destroy the two databases, which removes all commits, and this is pretty easy to do.
"Nested" repositories: the thing you didn't want, but made
Given that the computer—the host operating system, which is in your case Windows, but this is also true of macOS and Linux—demands and uses a tree-structured file system, we can set up a structure like this:
java
    .git
        <various Git repository control files and databases>
    hw1
        .git
            <various Git repository control files and databases>
        main.java
    hw2
        .git
            <various Git repository control files and databases>
        main.java
and so on. Here we have one repository per hw directory plus one overall containing repository in the java directory.
But here's the problem: Git literally cannot store a Git repository inside a Git commit.[6] Instead of doing so, the "outer" repository (in this case the one in java/.git, whose working tree is the java/* files) will store what Git calls a submodule using what Git calls a gitlink. To store a submodule correctly, you must use git submodule add, not git add; git add creates or updates only the gitlink, which is sort of half a submodule.
If someone does want submodules (but you don't), this git submodule add method is how to make them. The result is that when you clone the java repository, you get files, plus the magic gitlinks, that Git will need in order to run additional git clone commands, one for each submodule. This way, the person who clones the java repository can run git submodule update --init to run a bunch more git clone commands. But again, that's not what you want.
[6] There are some tricks to get around this problem if you really need to do it, but it's not a good idea in general. The recent safe.directory stuff is an outgrowth of a security issue that resulted in a CVE when someone discovered such a trick. The tricks that Git allows involve renaming the .git directory; the ones it doesn't allow, or accidentally allowed in the past, result in CVEs. 😀
Fixing the mess
The observations we should make at this point are these:
Git stores commits. It doesn't store files (though commits do store files). It stores commits.
What you want is a single repository with multiple commits, where the first commit (or maybe the second; see below) contains a file named hw1/main.java,[8] but no files named hw2/whatever.
What you have now are multiple repositories: one, a superproject, with submodules (or half-submodules) named hw1, hw2, and so on, and then more repositories that get cloned into hw1, hw2, and so on, each containing a main.java and whatever other files.
Now, if we assume (or you verify) that you do not need to save any of the commits in any of these repositories so far, what we can do is simply delete all the .git folders and their contents.
That is, on a Unix-like shell, we would run:
cd java
ls # make sure we're in the right place
rm -rf .git # remove this working tree's Git repository
rm -rf hw1/.git # remove the Git repository in hw1/
rm -rf hw2/.git # and so on ...
Note that we're using the OS's remove command, with the "remove everything without asking" options, on the hidden Git folders. Git has no opportunity to stop us: we're totally bypassing Git here. All of Git's files, including the two big databases, get completely removed. This is likely to be irrecoverable (depending on your OS and whether you're using the OS's "remove irrecoverably" command, or its "move to trash so I can get it back if I change my mind" command, and also depending on whether you have good backups, e.g., macOS Time Machine).
We now have only all of the working trees, with no .git folders: there are no repositories left. But all of the files are still there because the checked-out files were, and still are, in the working trees.
Now we create one new, totally-empty repository in the java directory, that we're still in:
git init
[Git prints message: Initialized empty Git repository in ...]
We now have our initial, totally-empty repository. I like to create a first commit that contains just a README.txt (and maybe one or two similar files):
echo repository for "insert class name here" > README.txt
git add README.txt
git commit -m "Initial commit"
We're now ready to "complete" homework assignment #1:
git add hw1
git commit
(write a good, proper commit message in editor)
By running git add hw1 when there's no Git repository inside hw1, we add all the files that are in hw1 (including any files in any subdirectories inside hw1).
The git commit command commits what's been stored so far, as updated by our git add. So when we commit the addition of hw1, we get README.txt—which we didn't change, so this commit literally re-uses the previous version of the file—plus all the hw1/* files.
We can now "complete" homework assignment #2 with git add hw2 and committing, and so on. We end up with a single repository in the java/.git directory, containing multiple commits: an initial one with the README file and subsequent ones with each homework assignment added. There is just the one branch name and it holds the hash ID of the last commit.
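To confirm the end result, one option is to replay the whole sequence in a scratch directory and inspect it. This is only a sketch: the file contents, commit messages, and identity are placeholders, and the layout is the java/hw1, java/hw2 structure assumed throughout.

```shell
set -e
scratch=$(mktemp -d) && cd "$scratch"
mkdir -p java/hw1 java/hw2 && cd java
echo "// homework 1" > hw1/main.java         # placeholder source files
echo "// homework 2" > hw2/main.java
git init -q
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo "repository for my Java class" > README.txt

git add README.txt
git commit -q -m "Initial commit"
git add hw1
git commit -q -m "Add homework 1"
git add hw2
git commit -q -m "Add homework 2"

git log --oneline --name-status              # one commit per step, with the files it added
tracked=$(git ls-files)                      # every file in the latest snapshot
count=$(git rev-list --count HEAD)
echo "$tracked"
```

There is one .git folder, one branch, three commits, and the last snapshot contains all three files.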
Pushing this to GitHub
Your last problem here is that if you have already created a GitHub repository and put some commits in it, your existing GitHub repository is going to be reluctant to lose those commits. You have several options:
You can keep those commits, if you really want to.
You can tell GitHub to completely delete that repository, then create a new one with the same name.
Or, you can use git push --force from your laptop (or other computer) that has your new repository, so as to command the Git software on GitHub to go ahead and lose the old commits from the old repository.
The general idea here, with the last option, is that we (and Git) find commits by starting from some branch name like master or main. That gives us the hash ID of the last commit, and from there, we have Git work backwards.
Suppose we command (not just ask) some GitHub repository to take a new chain of commits. That is, they had:
A <-B <-C <--main
We now make a totally new (empty) repository, and put in two commits: an initial commit D and a second commit E, neither of which has the same hash ID as any of those three commits in the original repository:
D <-E <--main
We run git remote add origin url to set things up so that we can git push to GitHub. If we run:
git push origin main
our Git will send commits D and E to GitHub, then politely ask if they can add commits D-E to their repository. But that would give them:
A <-B <-C
D <-E <-- main
which, they notice, will mean they no longer have any name by which to find commit C, which means they'll "lose" all three hash IDs. So they will say No! If I do that I'll lose my access to some of my commits!
Your Git software reports this as something like ! [rejected] main -> main (non-fast-forward), or (fetch first) if your repository has never seen their commits: either way, it means they are saying they could lose commits. But that's exactly what you want: you want them to lose A-B-C; those commits are no good! So you can use git push --force origin main, which sends D-E again but this time commands them to make their main point to E.
You have to have permission (GitHub add a whole set of permissions that base Git fails to provide), but if you own this GitHub repository, you probably will already have the right permissions.[9] So they'll obey: they will make their branch name main point to commit E, and "forget" commits A-B-C.[10]
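The polite-request versus command difference can be tried without touching GitHub. In the sketch below a local bare repository (hosted.git, a made-up name) stands in for the GitHub repository, and make_repo is a hypothetical helper that creates a repository with one commit; it is an illustration of the mechanism, not of GitHub's permission checks.

```shell
set -e
scratch=$(mktemp -d) && cd "$scratch"
git init -q --bare hosted.git                # stand-in for the repository on GitHub

make_repo() {                                # hypothetical helper: $1 = folder, $2 = message
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/main
    git -C "$1" config user.name "Example Student"
    git -C "$1" config user.email "student@example.com"
    git -C "$1" commit -q --allow-empty -m "$2"
    git -C "$1" remote add origin "$scratch/hosted.git"
}

make_repo old "old history"
git -C old push -q origin main               # hosted main now points into the old history

make_repo new "new history"                  # unrelated commits, different hash IDs
if git -C new push -q origin main 2>/dev/null; then
    plain=accepted
else
    plain=rejected                           # the polite request is refused
fi
git -C new push -q --force origin main       # the command is obeyed

hosted=$(git -C hosted.git rev-parse main)
wanted=$(git -C new rev-parse main)
echo "plain push: $plain; hosted main is now $hosted"
```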
[8] Note that while your OS demands folders with subfolders and files, Git just stores "files with long names that have slashes in them". Git understands the folder-y requirements your OS makes, and can turn hw1/main.java into "file main.java in folder hw1". It will automatically save the OS's hw1/main.java (a file named main.java in a folder named hw1) as the Git file named hw1/main.java.
Normally, you don't need to worry about this whole mess. The time when you do have to worry about it is when you want to store an empty folder in Git, because Git literally can't do that. Git only stores files. There are some tricks for this though: see How can I add a blank directory to a Git repository?.
[9] If you own the repository, the only way you wouldn't have permissions is if you logged on to GitHub and told them to deny permission to yourself. To fix that, log on to GitHub again and tell them to give permission back to yourself.
[10] "Normal" Git setups really do eventually forget (or lose) commits this way. GitHub, however, have their software set up to retain all commits forever. So if you send a bad commit to GitHub, and for whatever reason, you really need it removed, you must contact GitHub support and get them to scrub it off their systems.
I am having an odd issue with Git.
I have a git repo for a small project I'm working on. There are no remotes, this is just for my own work. Up until a few days ago, my development didn't even warrant any branches.
I have finally needed to make a branch for some experimental code. When I was done, I simply checked-out the master branch to go back to where I was. And this is where problems started.
There are six files that change between the new branch and the master branch. Every time I switch / checkout either branch (switching between the two) most of those six file names change case.
For example, if the file was supposed to be someCode.py:
Sometimes the file name switches to somecode.py (Incorrect)
Sometimes the file name switches to someCode.py (Correct)
It doesn't matter which branch I pick, the result is different every time. And which files end up with which CaSe is different every time.
I suspected the Git plugin for VSCode I was using (Git Graph) at first, but it also happens with the included Git GUI, and even happens if I use "git switch" from the command line.
I read about the core.ignorecase setting; it was set to True. I tried setting it to False and the problem persists.
Does anyone have any ideas what's going on?
Thank you
Details:
GIT 2.30.0.windows.2
Windows 10
Files are on a mapped network drive
File system is NTFS
Natively, Git is case sensitive, so you could have two different files named filename.txt and FileName.txt. On some operating systems (e.g. Linux) that works fine, and on others (Windows) it does not. Note the same thing goes for branch names too; you can't have two different branches differing by case only on Windows. The reason is that each branch name is normally stored as a file on your drive (under .git/refs/heads), so it inherits the file system's case rules.
Most likely your problem is that the files in question have different names on the two branches, but differ by case only. If so, pick which one you want (perhaps master), and on the other branch rename those files to match the casing exactly. If they are actually supposed to be different files, then change their name by more than just casing.
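One way to do a case-only rename that should also behave on case-insensitive file systems is to go through a temporary name. This is a sketch in a scratch repository: someCode.py comes from the question, while the .tmp name, file contents, and identity are placeholders.

```shell
set -e
scratch=$(mktemp -d) && cd "$scratch"
git init -q
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo 'print("hi")' > somecode.py
git add somecode.py
git commit -q -m "add file with the wrong case"

# Rename in two steps so the file system never sees a case-only change.
git mv somecode.py somecode.py.tmp           # the temporary name is arbitrary
git mv somecode.py.tmp someCode.py
git commit -q -m "Fix case of someCode.py"

tracked=$(git ls-files)
echo "$tracked"
```

In a real repository you would run only the two git mv lines and the commit, on the branch whose casing is wrong.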
Another possibility is that two (or more) files with names differing by case only exist in the same branch, and switching branches simply causes the issue to come to light. Note that the difference could be in the file names themselves, or in any of the directories in the path to those files. To determine whether this is the problem, navigate to the root of your repo in Git Bash and run this command:
git ls-tree -r HEAD | grep -i [filename-without-path]
If you see two or more files with the same name but different casing, you've identified the problem.
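The same check can be done for every path at once, and across branches, by lowercasing the path lists and looking for duplicates. The sketch below is one possible approach: case_collisions is a hypothetical helper, the branch name experiment is made up, and it compares whole paths only, so two directories that differ by case but hold differently named files would not be reported.

```shell
set -e
case_collisions() {                          # hypothetical helper: one or more commits/branches
    for rev in "$@"; do
        git ls-tree -r --name-only "$rev"
    done | LC_ALL=C sort -u | tr '[:upper:]' '[:lower:]' | LC_ALL=C sort | uniq -d
}

scratch=$(mktemp -d) && cd "$scratch"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch master
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo 'print("hi")' > someCode.py
git add someCode.py
git commit -q -m "add someCode.py"

git checkout -q -b experiment                # made-up branch with the other casing
git mv someCode.py somecode.py.tmp
git mv somecode.py.tmp somecode.py
git commit -q -m "rename with different case"

single=$(case_collisions HEAD)               # nothing collides inside one commit
across=$(case_collisions master experiment)  # the two branches disagree on case
echo "within HEAD: [$single]  across branches: [$across]"
```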
It seems I have resolved the issue - at least to a point.
I created a new branch, manually fixed the file names in Windows Explorer. Now, switching between master and the newBranch does things as expected.
However, if I ever switch to the firstBranch from the question, the issue resurfaces. There seems to be an issue with that branch, and switching in and out of that branch causes the issue.
I still do not have an answer as to why.
EDIT: OK, maybe not. Repo seems to be having this problem again with other branches.
I'm new to git and have a git repository that I use with GitKraken.
In this repository I have multiple branches, and can move from branch to branch in order to make modifications where necessary.
I am now in a situation where I'll be making some large modifications to 1 branch that I do not want to commit but in the meantime I would like to make some minor modifications to another branch.
I'm used to working with TFS, and there I can just check out a branch to another folder.
I've tried to just copy the folder and my first impression is that this should work....
But, I have seen online remarks that say that I should clone a repository instead.
The Git version is lower than 2.5, so I can't use git worktree.
Is it ok to just copy the folder or can this have an unexpected effect?
Yes, if you copy the whole folder from the root of the checkout, including the hidden .git folder, then you can make changes to each working copy independently. Each contains its own copy of the repository objects, and they will behave exactly as if you had run two separate clones.
As discussed in the comments this isn't necessarily a good use case for this, though: it would be easier (and more disk-space-efficient) to commit your large changes to a local branch so that you can then switch and make other changes. There's no real downside to this; if you do want to remove that temporary commit later then that's easily done as well.
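A temporary commit on a local branch might look like the sketch below. The branch name big-change, the file, and the identity are placeholders, and git reset --soft is just one way to take the temporary commit back off again while keeping the changes.

```shell
set -e
scratch=$(mktemp -d) && cd "$scratch"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch master
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo "stable" > app.txt
git add app.txt
git commit -q -m "stable version"

git checkout -q -b big-change                # made-up branch for the large work
echo "half-finished" > app.txt
git commit -q -a -m "WIP: large modification"   # temporary commit

git checkout -q master                       # working tree is clean, so switching is safe
on_master=$(cat app.txt)                     # minor changes to other branches go here

git checkout -q big-change
git reset -q --soft HEAD~1                   # drop the temporary commit, keep the changes
after_reset=$(cat app.txt)
pending=$(git status --porcelain)
echo "$on_master / $after_reset / $pending"
```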
However if you are going to do this, then you probably want to
run a git repack -ad first, so that there are fewer files in the objects tree to copy
consider using git clone --reference instead, which might be slightly more disk-space-efficient
or, if you want a clean working copy, create a new working copy folder, copy only the hidden .git folder into it, and then run git reset --hard to check out all of the files there too.
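The copying options can be compared side by side in a scratch directory. This is a sketch under assumptions: the folder names original, copy-a, copy-b, and copy-c are made up, and a plain local git clone is shown as the third option rather than the --reference variant.

```shell
set -e
scratch=$(mktemp -d) && cd "$scratch"
git init -q original
git -C original config user.name "Example Student"     # placeholder identity
git -C original config user.email "student@example.com"
echo "hello" > original/file.txt
git -C original add file.txt
git -C original commit -q -m "first commit"
git -C original repack -a -d -q              # fewer files under .git/objects to copy

# Option A: copy the whole folder, hidden .git included.
cp -R original copy-a

# Option B: copy only .git into an empty folder, then check the files out.
mkdir copy-b
cp -R original/.git copy-b/.git
git -C copy-b reset -q --hard

# Option C: let Git make the second copy.
git clone -q original copy-c

for d in copy-a copy-b copy-c; do
    echo "$d: $(git -C "$d" rev-parse HEAD) $(cat "$d/file.txt")"
done
```

All three folders end up at the same commit with the same checked-out file, and each can be changed independently afterwards.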
You may want to see if git stashing will work for you. I don't recommend copying to a new folder. Mostly because I don't know if it's even possible and I've never seen that as a recommendation. Cloning should also work but it sounds like you are interested in shelving/stashing vs. committing your changes in branch1 before checking out branch2.
https://git-scm.com/book/en/v1/Git-Tools-Stashing
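A stash-based version of the workflow could look like this sketch. The branch names branch1 and branch2 follow the answer above; the files and identity are placeholders, and git checkout is used because the question mentions an older Git.

```shell
set -e
scratch=$(mktemp -d) && cd "$scratch"
git init -q
git symbolic-ref HEAD refs/heads/branch1     # call the first branch branch1
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo "stable" > app.txt
git add app.txt
git commit -q -m "stable version"
git branch branch2

echo "large uncommitted change" > app.txt    # work in progress on branch1
git stash -q                                 # shelve it; the working tree is clean again

git checkout -q branch2
echo "minor fix" > notes.txt
git add notes.txt
git commit -q -m "minor fix on branch2"

git checkout -q branch1
git stash pop -q                             # bring the shelved work back
restored=$(cat app.txt)
remaining=$(git stash list)
echo "$restored"
```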