View NUMBER of local uncommitted files - bash

I've learned I can use count to find the NUMBER of commits a branch is ahead/behind by, like so:
git rev-list --count HEAD..@{u}
But is there a way to do so for uncommitted files?
Just found out git status -suno shows how many files have been changed in a really concise way, so I could either count the lines of the output (with echo "$var" | wc -l) or just put a symbol to denote an arbitrary amount exist, or parse it in a weird way to see the number of deleted/added/modified.
However, do non "porcelain" and more directly-addressing commands exist to accomplish this task, as parsing commands such as these are seen as bad practice?
Also, I am using this to add to a git-bash prompt; I would normally just type in git status, but would like to have maximum convenience by just showing such.

Ironically, the --porcelain option of git status is meant to be parsed:
git status --porcelain -suno|wc -l
So while git status is porcelain, git status --porcelain does produce output suitable for consumption by porcelain scripts.
I tried to explain said option in "What does the term “porcelain” mean in Git?"
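For a prompt, the counting step can be wrapped in a small helper. This is only a sketch: count_status_lines is a made-up name, and it simply counts the lines it receives on stdin. It uses grep -c '' rather than echo "$var" | wc -l, because the echo form reports 1 even when the variable is empty.

```shell
# count_status_lines: hypothetical helper (the name is not part of Git).
# Reads `git status --porcelain` output on stdin and prints the number
# of entries, one entry per line.
count_status_lines() {
    # grep -c '' counts every line and prints 0 for empty input.
    # grep exits non-zero when nothing matched, so mask that status.
    grep -c '' || true
}
```

It would be fed like this: git status --porcelain -uno | count_status_lines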

Related

Check out git repository that contains invalid filenames in windows [duplicate]

I'm working a shared project using git for version control. I'm on windows while my partner is on Unix.
My partner has named some files with <file1>.txt. When I try to pull these files they are not accepted as the < and > are invalid characters for Windows. This is fine, I don't need to touch the files. However, they are added to my commit as deleted. So, if I push then I'll delete these files which I don't want to do.
I can't use git reset --hard as it finds an invalid path for each of these "deleted" files.
Is there a way to exclude these files from my commits? I've tried adding <file1> to my .git/info/exclude but that didn't work.
You would need to get your partner to change the names to be something that is also valid on Windows. After they have renamed them, what I'd do is this:
1. Back up any changes that you only have locally (both uncommitted AND committed but not pushed).
2. Run git reset --hard <commit> where <commit> is any commit from before the files were added.
3. Run git pull to get all the way to the latest revision (where the files are renamed).
4. Restore your backed-up changes from step 1.
This should then get the newer revision where the files aren't named in this, to Windows, illegal way, and they won't be deleted (or ever created) from under git by the OS :)
P.S. I know this is an old question, but I've been getting this issue recently, so hopefully the solution I've arrived at can help others as well.
EDIT:
To avoid this happening again, your partner can add a pre-commit hook that will stop them from committing files with names that would not be allowed on Windows. There's a sample/example often in pre-commit.sample. I've changed the bit a little in the past and end up with something like:
# Cross platform projects tend to avoid non-ASCII filenames; prevent
# them from being added to the repository. We exploit the fact that the
# printable range starts at the space character and ends with tilde.
if [ "$allownonascii" != "true" ] &&
# Note that the use of brackets around a tr range is ok here, (it's
# even required, for portability to Solaris 10's /usr/bin/tr), since
# the square bracket bytes happen to fall in the designated range.
test $(git diff --cached --name-only --diff-filter=A -z $against |
LC_ALL=C tr -d '[ !#-)+-.0-9;=@-[]-{}~]\0' | wc -c) != 0
then
cat <<\EOF
Error: Attempt to add a non-ASCII file name.
This can cause problems if you want to work with people on other platforms.
To be portable it is advisable to rename the file.
If you know what you are doing you can disable this check using:
  git config hooks.allownonascii true
EOF
exit 1
fi
The '[ !#-)+-.0-9;=@-[]-{}~]\0' bit is the important part that I've changed a little. It defines all the allowed ranges of characters. The sample one only disallows non-ASCII characters (which is what the comment at the top says), but there are also ASCII characters that are not allowed in file names on Windows (such as ? and :), so those are left out of the allowed ranges as well.
All the allowed characters are removed, and if there's anything left (wc -c != 0) it errors. It can be a bit difficult to read, as you can't see any of the disallowed characters. It helps if you have a list of the char ranges to look at when reading or editing it.
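To see the "delete everything allowed, then check what is left" idea on a single name, here is a minimal sketch. The function name has_windows_unsafe_chars is made up for illustration, and the character set is my reading of the allowed ranges (everything printable except " * / : < > ? \ |). Note that / is not in the allowed set, so this is meant for one path component, not a full path.

```shell
# has_windows_unsafe_chars: hypothetical helper, not part of Git.
# Succeeds (exit 0) when the given name contains a byte outside the
# allowed set, fails (exit 1) when every character is allowed.
has_windows_unsafe_chars() {
    # tr -d removes every allowed character; whatever survives is a
    # disallowed or non-ASCII byte.
    leftover=$(printf '%s' "$1" | LC_ALL=C tr -d '[ !#-)+-.0-9;=@-[]-{}~]')
    [ -n "$leftover" ]
}
```

For example, has_windows_unsafe_chars '<file1>.txt' succeeds because < and > survive the tr, while has_windows_unsafe_chars 'notes_v2.txt' fails because nothing is left over.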
Ignoring doesn't help if the files are tracked already.
Use Sparse checkout to skip those files.

git branch command works fine as a cli command, but fails when run from loop or script using variables

In creating setup scripts, I have several git repos that I clone locally. This is done through a temporarily available proxy that may or may not be available later on, so I need to create all the remote branches from the remote repo as local branches that can be switched to. I have a method to extract the names of the remote branches that I want, which get stored as
[user]$ nvVar=$(git branch -r | grep -v '\->' | grep -Ev 'master|spdk\-1\.6' | cut -d'/' -f2)
This gives me a variable holding a list that can be iterated through, containing the branches I need to bring down.
[user]$ echo "$nvVar"
lightnvm
nvme-cuse
spdk
If I were doing all this manually, I would use commands like:
[user]$ git branch --track lightnvm origin/lightnvm
Branch lightnvm set up to track remote branch lightnvm from origin.
Which works fine...
But when I try to loop through the variable using shell expansion, I get a failure.
(FYI, if I put quotes around $nvVar, it doesn't iterate, and just tries running the whole string and fails. I have also tried to do this with an array, which also doesn't work, as well as using a while loop using the filtered output from git branch -r)
[user]$ for i in $nvVar; do git branch --track "${i}" "origin/${i}"; done
Which is supposed to produce the following git commands:
git branch --track lightnvm origin/lightnvm
git branch --track nvme-cuse origin/nvme-cuse
git branch --track spdk origin/spdk
Which seem to be identical to the same command typed in manually.. but instead, I get these errors:
fatal: 'lightnvm' is not a valid branch name.
fatal: 'nvme-cuse' is not a valid branch name.
fatal: 'spdk' is not a valid branch name.
Which makes no sense...
OS: RHEL 7.6
Git Version: 1.8.3.1
Bash Version: GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
(Edit) Apparently I have some special characters being captured that are messing up the command.
There's a " ^[[m " being appended to the captured variable... Really not sure how to get rid of that without hard-coding the commands, which I had hoped to avoid.
Figured out a solution:
echo '#!/bin/bash' > gitShell
git branch -r | grep -v '\->' | grep -Ev 'master|spdk\-1\.6' | cut -d'/' -f2 | while read remote; do
    echo "git branch --track ${remote} origin/${remote}" >> gitShell
done
cat -v gitShell | sed 's/\^\[\[m//g' > gitShell1
if /bin/bash -ex gitShell1; then
    echo 'Git repos branched'
    rm gitShell
    rm gitShell1
fi
I simply push the output to a file, then use cat -v to force the hidden characters to get displayed as normal characters, then filter them out with sed, and run the new script.
It's cumbersome, but it works. Apparently git returns "private unicode characters" in response to remote queries.
Thanks to @Cyrus for cluing me in to the fact that I had hidden characters in the original variable.
The git branch command is not meant for writing scripts. The problem is occurring because you have color-changing text strings embedded within the branch names. For instance, ESC [ 3 2 m branch ESC [ m spells out "switch to green, print the word branch, and stop printing in green". (The git branch command uses green by default for the current branch, which is not the interesting one here, but still emits various escape sequences for non-current-branch cases.)
You should be using git for-each-ref instead of git branch. This is what Git calls a plumbing command: one that is meant for writing scripts. Its output is meant to be easily machine-parsed and not contain any traps like color-changing escape sequences. It also obviates the need for some of the subsequent tricks, as it has %(...) directives that can be used to strip the desired number of prefixes from items.
(Alternatively, it's possible to use git branch but to disable color output, e.g., git -c color.branch=never branch. But git branch does not promise not to make arbitrary changes to its output in the future, while git for-each-ref does.)
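As a rough sketch of the for-each-ref route: the helper below (track_commands is a made-up name) reads full ref names on stdin and prints one git branch --track command per branch. It assumes the remote is called origin and that HEAD and master should be skipped; both are assumptions to adjust. It prints the commands instead of running them, so the output can be inspected first.

```shell
# track_commands: hypothetical helper, not part of Git.
# Reads full ref names such as refs/remotes/origin/lightnvm on stdin
# and prints a `git branch --track` command for each one.
track_commands() {
    while IFS= read -r ref; do
        # Strip the fixed prefix; assumes the remote is named origin.
        name=${ref#refs/remotes/origin/}
        # Assumption: origin/HEAD and master need no new local branch.
        case $name in
            HEAD|master) continue ;;
        esac
        printf 'git branch --track %s origin/%s\n' "$name" "$name"
    done
}
```

The input would come from git for-each-ref --format='%(refname)' refs/remotes/origin | track_commands, and the result can be piped to sh once it looks right. Since for-each-ref does not color its output by default, no escape sequences end up in the names.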
You might also consider attacking the original problem in a different way: create a "mirror clone", but then once the clone is done, rewrite the fetch refspec and remove the mirror configuration. The difference between a regular clone and a mirror clone is, in short, that a regular clone copies all¹ the commits and none² of the branches, but a mirror clone copies all of the commits and all of the branches and all the other references as well,³ and sets remote.<remote>.mirror to true.
¹ Well, most of the commits, depending on which ones are reachable from which refs. If some objects are hidden or only findable via reflogs, you don't normally get those, but you don't normally care either, and in fact it's often desirable, e.g., after deleting an accidentally-committed 10 GB database.
² After copying the commits, a regular fetch turns branch names into remote-tracking names (origin/master for instance). The final git checkout step creates one new branch name, or if -n is given a tag name, doesn't.
³ As with the "all commits", this is a sort of polite fiction: the sender might hide certain refs, in which case you don't get those refs, and presumably don't get those commits and other objects either. On the other hand, optimizations to avoid repacking might accidentally send unneeded objects: that 10 GB database might come through even when you didn't want it. Fortunately reflogs aren't refs, so this generally shouldn't happen.

Check if my current branch on local fork exists on remote upstream

git branch -a | egrep 'remotes/upstream/master' tells me if master branch exists on upstream or not. That is good.
Let's say my current branch is named my_branch, which is the output of git rev-parse --abbrev-ref HEAD.
Question
In a bash script, how can I get the output of git rev-parse --abbrev-ref HEAD into a variable and check if that branch exists on upstream or not? I tried the following but it does not work.
#!/bin/bash
git checkout master
export MY_BRANCH=`git rev-parse --abbrev-ref HEAD`
git branch -a | egrep 'remotes/upstream/${MY_BRANCH}$'
# Above last command should output "remotes/upstream/master" but it doesn't like it does for "git branch -a | egrep 'remotes/origin/master'"?
PS:
Should I have to do git fetch --all for this to work correctly synced with upstream?
Your own answer works. There are a number of shortcuts you can use, but before we visit them, let's start with this:
Should I have to do git fetch --all for this to work correctly synced with upstream?
Given that you're querying your own Git about what it remembers about the remote you're calling upstream, it is a good idea to run git fetch upstream. Using git fetch --all is not a bad idea, and you can use that instead if you'd like to fetch from both origin and upstream, so as to update all your origin/* and upstream/* names. Given that you're only querying one of the upstream/* names, though, all you need is an update on those. Indeed, you could run git fetch upstream $MY_BRANCH here, although now we're really getting into hair-splitting.¹
Now, as to the shortcuts: git rev-parse does everything you need. You don't need git branch -a and grep. (You definitely don't need egrep: grep was originally very simple, then got complicated-up into grep, fgrep, and egrep. It's been re-simplified in most modern systems, so that when you do need grep, you can just run grep, with whatever interface flags you like for the desired behavior. The program will pick the right algorithm, and run fast on its own.)
git rev-parse --abbrev-ref HEAD: this prints the current branch name, if there is a current branch name. If there is no current branch name, this prints HEAD. That's almost certainly what you want.³
git rev-parse refs/remotes/upstream/$MY_BRANCH: this prints the hash ID corresponding to the name refs/remotes/upstream/$MY_BRANCH. If there is no such name, it prints an error message (see below).
The refs/ in front of remotes/upstream/$MY_BRANCH is simply the way we use the full, un-ambiguous name in Git. Git normally strips off refs/heads/, refs/tags/, refs/remotes/, or refs/ from ref names, all of which start with refs/. This leaves us with master as a branch name, v1.2 as a tag name, and either origin/master or remotes/origin/master as a remote-tracking name.
As you've seen, git branch -a doesn't take off the remotes/ part from refs/remotes/origin/master or refs/remotes/upstream/master. However, if you were to use git branch -r, it would take the remotes/ part off, leaving you with, e.g., origin/master and upstream/master. There is no obvious reason that git branch behaves this way; it just does. By using git rev-parse instead, you avoid having to deal with this particular inconsistency.
¹ Presumably, the point of splitting hairs here is to make this go as fast / efficiently as possible. A git fetch --all has your Git call up each remote, one at a time,² and fetch everything from each. So this will take as long as it takes to work its way all the way through every remote, one at a time. Meanwhile, a git fetch upstream calls up just the one remote (upstream) and then fetches everything there is at the one upstream. A git fetch upstream $MY_BRANCH calls up just the one remote and then asks it only for new commits and such on the one branch.
Let's compare the three:
--all dials up every "phone number". There's a certain amount of slack time in looking up the "phone number" (Internet address) and making the "call" to the server there. So this could take a few extra seconds. But once done, everything is updated and you don't have to run git fetch until things could have changed: depending on how busy these servers are, anywhere from days, to seconds.
Using one remote dials one "phone number", then gets all new commits from all branches and updates all the origin/* or upstream/* or whatever names. This takes a little longer—sometimes milliseconds, sometimes seconds—than updating just one. But once done, you'll need to repeat it for any other branch.
Using one remote and one branch name might gain you a few milliseconds, or as much as a few seconds. This is usually much smaller than the gain from avoiding --all. How much difference does it make for your particular cases? You'll have to measure.
That last one is really the bottom line, as it were. Measure, measure, measure. Or, write something convenient to write, and if it works well enough, stop. :-)
² A future Git might be able to call them up in parallel to take advantage of multiple CPUs and so on. This could change the calculus a bit, or even a lot.
³ There are several corner cases to consider, as we'll see in the "errors" section below.
Error messages and corner cases
While git rev-parse is great for things that do work, there are things that don't work. For instance, here is what happens if I try to rev-parse a name that doesn't exist:
$ git rev-parse refs/remotes/upstream/foobranch
refs/remotes/upstream/foobranch
fatal: ambiguous argument 'refs/remotes/upstream/foobranch': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
The output here is a bit messy:
The standard output from rev-parse is just refs/remotes/upstream/foobranch. That is, I gave rev-parse a name that does not parse, so it gave that name right back to me.
The stderr output from rev-parse has the fatal: ..., Use '--' ..., and final line.
If all works, we get this:
$ git rev-parse refs/remotes/origin/master
225365fb5195e804274ab569ac3cc4919451dc7f
As before, you might want to save this in a variable:
$ result=`git rev-parse refs/remotes/origin/master`
$
The shell (sh or bash here) collects the standard output and assigns that to the variable ($result here); the standard error output goes through to the terminal:
$ result=`git rev-parse refs/remotes/upstream/foobranch`
fatal: ambiguous argument 'refs/remotes/upstream/foobranch': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
Note that this time, since the standard output went into $result, we did not see refs/remotes/upstream/foobranch echoed.
The backquote construct:
var=`cmd`
is usually better expressed as:
var=$(cmd)
if for no other reason than parentheses nest. That is:
var=`cmd1 `cmd2` -- `cmd3``
might be an attempt to run cmd1 with the output from cmd2 and cmd3 as arguments. This does not work, but:
var=$(cmd1 $(cmd2) -- $(cmd3))
does work. So we should do that.
We can get or check the status of the $(...) command:
var=$(cmd) || {
echo "unable to run the given command"
exit 1
}
for instance. So with git rev-parse, if the upstream doesn't exist, we might want:
result=$(git rev-parse refs/remotes/upstream/$MY_BRANCH) || exit
which will quit the script if git rev-parse fails. The rev-parse will still print an error message to stderr. To suppress that, we have:
--quiet --verify
as options to rev-parse, or we can redirect stderr to /dev/null; this part is your choice.
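The capture-and-check pattern can be tried without a repository. In this sketch, capture_or_fail is a made-up wrapper: it keeps the command's standard output, throws the standard error away (the /dev/null variant), and reports failure through its exit status.

```shell
# capture_or_fail: hypothetical wrapper used only for illustration.
# Runs the given command, captures its stdout, discards its stderr,
# and returns non-zero without printing anything if the command failed.
capture_or_fail() {
    # The exit status of the assignment is the exit status of the
    # command inside $(...), so || sees the command's failure.
    result=$("$@" 2>/dev/null) || return 1
    printf '%s\n' "$result"
}
```

With Git, this would be called as hash=$(capture_or_fail git rev-parse --verify "refs/remotes/upstream/$MY_BRANCH") || exit, which behaves much like passing --quiet --verify directly.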
Finally, there are some corner cases to consider with git rev-parse --abbrev-ref HEAD:
You can be on a detached HEAD. In this case, as already noted, git rev-parse just prints HEAD. If that's what you'd like, go with that.
You can be in a completely empty repository, or on an unborn branch created with git checkout --orphan or git switch --orphan. In this case, git rev-parse produces an error message, and no output. You should check for this.
Optionally, you can use git symbolic-ref --short HEAD. This has different behavior: it produces the name of the branch if you are on a branch, including in the special case of an empty repository or orphan/unborn branch. But, when in detached-HEAD mode, it produces an error message.
So, in the end, the shortened script is:
MY_BRANCH=$(git rev-parse --abbrev-ref HEAD) || exit
hash=$(git rev-parse --quiet --verify refs/remotes/upstream/$MY_BRANCH) || {
echo "upstream/$MY_BRANCH does not exist"
exit 1
}
# do whatever you like with upstream/$MY_BRANCH here
If you want the hash ID corresponding to upstream/$MY_BRANCH, it is in $hash now.
Note that there's no point in exporting MY_BRANCH unless you want to access it from other programs that read an environment variable.
I figured that I needed to remove the single quotes and it works after that
git branch -a | egrep remotes/upstream/$MY_BRANCH
Following also works in a one liner:
git branch -a | egrep remotes/upstream/`git rev-parse --abbrev-ref HEAD`
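One caveat with the unquoted egrep form: without the trailing $ it also matches longer names, so my_branch would match remotes/upstream/my_branch2. A minimal sketch of an exact comparison follows; has_upstream_branch is a made-up name, and it assumes the usual git branch -a layout of a two-character marker ("* " or two spaces) in front of each name.

```shell
# has_upstream_branch: hypothetical helper, not part of Git.
# Reads `git branch -a` output on stdin and succeeds only if
# remotes/upstream/<name> appears as a whole line.
has_upstream_branch() {
    awk -v want="remotes/upstream/$1" '
        { sub(/^[* ] /, "") }      # drop the two-character marker column
        $0 == want { found = 1 }   # whole-line comparison, no regex
        END { exit !found }
    '
}
```

It would be used as git branch -a | has_upstream_branch "$MY_BRANCH". The git rev-parse approach in the other answer avoids parsing git branch output altogether and is the more robust choice.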

How can I get a list of first parent commits using git python?

I'd like to run the equivalent of this command from git python, but have not been able to figure out a way to accomplish this.
git rev-list --first-parent commit1..HEAD
I'm looking to get the result of that command into an iterable of git python's Commit objects. I tried repo.iter_commits but it doesn't appear to be capable of taking in arguments to rev-list that don't take parameters.
My use case is that "commit1" will be the commit upon which a branch was based, and I'll run this code while the branch is checked out. Thus, this command would give me the list of commits committed to the branch, even in the presence of merge commits from the branch "commit1" is on.
I have also tried
repo.iter_commits('HEAD ^commit1')
but that results in the following error:
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git rev-list HEAD ^commit1 --
stderr: 'fatal: bad revision 'HEAD ^commit1'
However, I can run
git rev-list HEAD ^commit1 --
in bash and it runs fine. And besides, the command doesn't really give me what I need.
I was able to get what I needed by working directly with the commit's parents list in git python. Here is a snippet of what worked for me:
commits = list()
c = repo.head.commit
while (True):
    firstparent = c.parents[0]
    if (firstparent != commit1):
        c = firstparent
        commits.append(c)
    else:
        break
The above code does not handle first commits (which have no parents.)
You can use:
repo.iter_commits(repo.head.commit, first_parent=True)
(I'm working with git python version 3.1.29).

list files with git status

Background
I am well aware of how git status works, and even about git ls-files. Usually git status is all I need and want, it perfectly answers the question: "What is my status, and what files need my attention?"
However, I have been unable to find a quick command that answers the following question: "What files do I have, and what is their respective status?" So, I need a full listing of the directory (like ls -la) with a column that shows the status of each file/directory.
What I have tried
git status -s --ignored comes quite close to the output format that I want, but it just won't list the files that are unchanged between HEAD, index, and working directory. Also, it will recurse into directories.
git ls-files seems to be able to provide all the required info in scriptable form, but I've been unable to stop it from recursive listing the contents of all directories.
Obviously, I could hack something together that takes the output of these two commands and provides the view I would like to have. However, I would hate to reinvent the wheel if there is already some usable command out there.
Question
Is there some way of listing all files in a directory with their respective git status?
I want a full listing showing exactly the same files that ls would show.
Notes
This other question does not answer mine, because I definitely want an ls equivalent. Including unmodified, ignored, and untracked files, but excluding directory contents.
To restrict the paths Git inspects to just the current directory, use its Unix glob pathspecs. Since git status does a lot of checking against the index and against HEAD, use that, and to fill in the rest of the files ls would show you, use ls, just munge its output to have the same format as git status's output and take only the ones git status didn't already list.
( git status -s -- ':(glob)*'; ls -A --file-type | awk '{print "   "$0}' ) \
| sort -t$'\n' -usk1.4
:(glob) tells Git the rest of the pathspec's a Unix glob, i.e. that * should match only one level, just like a (dotglob-enabled) shell wildcard¹.
The -t$'\n' tells sort that the field separator is a newline, i.e. it's all one big field, and -usk1.4 says uniquify, only take the first of a run, stable, preserve input order where it doesn't violate sort key order (which is a little slower so you have to ask for that specifically), k1.4 says the key starts at the first field, the fourth character in that field, with no end given so from there to the end.
¹ Why they decided to make pathspecs match neither like shell specs nor like gitignore specs by default, I might never bother learning, since I so much prefer ignorantly disapproving of their annoying choice.
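To watch the sort step on its own, here is a small sketch with canned input instead of real git status and ls output. The name merge_listings is made up, the newline is passed through a variable rather than $'\n' so it also works in plain sh, and LC_ALL=C is my addition to keep the ordering predictable.

```shell
# merge_listings: hypothetical helper wrapping the sort step only.
# Expects status lines first, then padded ls lines, on stdin; each line
# has the file name starting at the fourth character.
merge_listings() {
    nl='
'
    # -t "$nl": the whole line is a single field
    # -k1.4:    compare from the 4th character to the end (the name)
    # -s -u:    stable and unique, so the first line seen for each
    #           name (the git status one) is the one that survives
    LC_ALL=C sort -t "$nl" -usk1.4
}
```

Feeding it the lines " M a.txt" and "?? b" followed by the padded names "   a.txt", "   b" and "   c" keeps " M a.txt" and "?? b" and adds only "   c".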
Because the output of git status -s is enough, let's just wrap it in a bash loop. For unchanged files, which git status -s does not list, we can echo a status code ourselves. Following the short-format specification, two space characters ('  ') can serve as that code, though any other symbol would do. For directories, which Git does not track anyway, I chose '__' as the status code:
for FILE in *
do
    if [[ -f $FILE ]]
    then
        if [[ -z $(git status -s -- "$FILE") ]]
        then
            # the first two symbols below are the two-letter status code
            echo "   $FILE"
        else
            git status -s -- "$FILE"
        fi
    fi
    if [[ -d $FILE ]]
    then
        # the first two symbols are the status code chosen for directories
        echo "__ $FILE"
    fi
done
The script works in the same manner as ls. It can be written in one line using ; as well.
