Using grep -q in shell one-liners - bash

I've written a script to list commits in a repo that contain a specific file. It's working perfectly, but I don't understand why I had to write this:
for c in $(git rev-list "$rev_list"); do
git ls-tree --name-only -r "$c" | grep -q "$file"
if [ $? -eq 0 ]; then
echo "Saw $file in $c"
fi
done
Whereas I normally write the same thing like this:
[[ $(git ls-tree --name-only -r "$c" | grep -q "$file") ]] && echo "Saw $file in $c"
# or
[[ ! $(git ls-tree --name-only -r "$c" | grep -q "$file") ]] || echo "Saw $file in $c"
Neither of the short versions work: they don't output anything. When I write it so that it shows all commits that don't contain the file, I do get output:
[[ $(git ls-tree --name-only -r "$c" | grep -q "$file") ]] || echo "Did not see $file in $c"
However, if I then take a commit hash from the output and run
git ls-tree -r <the hash> | grep file
I notice the file is in the tree for some commits, leading me to believe it's just listing all the commits the script processes. Either way, I'm probably missing something, but I can't exactly work out what it is

You don't need to wrap the command in a conditional statement ([[ $(command) ]]). In fact, that will never work with grep -q, because you're actually testing whether the command prints anything. You can just do this:
git ls-tree --name-only -r "$c" | grep -q "$file" && echo "Saw $file in $c"
In general, any code block like
foreground_command
if [ $? -eq 0 ]
then
bar
fi
can be replaced with either
if foreground_command
then
bar
fi
or even
foreground_command && bar
Which of the three alternatives you should use depends on whether foreground_command, bar, or both are multi-line commands.
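To see why the [[ $(command) ]] form stays silent, here is a minimal sketch that runs both kinds of test against a made-up file listing. The listing and the variable names are assumptions for illustration, standing in for the git ls-tree output. The [ -n "$(...)" ] test is the portable spelling of [[ $(...) ]]: both only check whether the captured output is non-empty.

```shell
# Made-up listing standing in for `git ls-tree --name-only -r "$c"`.
listing='README.md
src/main.c'
file='main.c'

# Test 1: checks the captured OUTPUT. grep -q prints nothing, so the
# string is always empty and this branch is never taken.
if [ -n "$(printf '%s\n' "$listing" | grep -q "$file")" ]; then
    by_output=yes
else
    by_output=no
fi

# Test 2: checks the EXIT STATUS of the pipeline (that of grep).
if printf '%s\n' "$listing" | grep -q "$file"; then
    by_status=yes
else
    by_status=no
fi

printf 'by_output=%s by_status=%s\n' "$by_output" "$by_status"
```

This prints by_output=no by_status=yes, even though the file is in the listing.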

awk to the rescue:
git ls-tree --name-only -r "$c" | awk -v file="$file" -v c="$c" '$0 ~ file {printf "%s in %s\n", file, c}'

Related

Delete empty files - Improve performance of logic

I need to find and remove empty files. In my use case, an empty file is one that has zero lines.
I tried testing whether the file is empty, but this behaves strangely: even though the file is empty, the test doesn't detect it as such.
So the best I could come up with is the script below, which is way too slow given that it has to test several hundred thousand files.
#!/bin/bash
LOOKUP_DIR="/path/to/source/directory"
cd ${LOOKUP_DIR} || { echo "cd failed"; exit 0; }
for fname in $(realpath */*)
do
if [[ $(wc -l "${fname}" | awk '{print $1}') -eq 0 ]]
then
echo "${fname}" is empty
rm -f "${fname}"
fi
done
Is there a better way to do what I'm after or alternatively, can the above logic be re-written in a way that brings better performance please?
Your script is slow because wc reads every file to the end, which is not needed for your purpose. This might be what you're looking for:
#!/bin/bash
lookup_dir='/path/to/source/directory'
cd "$lookup_dir" || exit
for file in *; do
if [[ -f "$file" && -r "$file" && ! -L "$file" ]]; then
read < "$file" || echo rm -f -- "$file"
fi
done
Drop the echo after making sure it works as intended.
Another version, calling the rm only once, could be:
#!/bin/bash
lookup_dir='/path/to/source/directory'
cd "$lookup_dir" || exit
for file in *; do
if [[ -f "$file" && -r "$file" && ! -L "$file" ]]; then
read < "$file" || files_to_be_deleted+=("$file")
fi
done
rm -f -- "${files_to_be_deleted[@]}"
Explanation:
The core logic is in the line
read < "$file" || rm -f -- "$file"
The read < "$file" command attempts to read a line from $file. If it succeeds, that is, a line is read, then the rm command on the right-hand side of the || won't be executed (that's how || works). If it fails, the rm command will be executed. In any case, at most one line will be read. This is a great advantage over the wc command, because wc would read the whole file.
if ! read < "$file"; then rm -f -- "$file"; fi
could be used instead. The two lines are equivalent.
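A small sketch of that behaviour on two temporary files. The helper name has_no_lines and the file names are assumptions for illustration, not part of the answer above.

```shell
# Hypothetical helper: succeeds when no complete line can be read from the file.
has_no_lines() {
    ! IFS= read -r _first_line < "$1"
}

tmpdir=$(mktemp -d)
: > "$tmpdir/empty"                     # zero bytes, zero lines
printf 'one\ntwo\n' > "$tmpdir/full"    # two lines

for f in "$tmpdir/empty" "$tmpdir/full"; do
    if has_no_lines "$f"; then
        echo "${f##*/}: would be removed"
    else
        echo "${f##*/}: kept"
    fi
done

rm -rf -- "$tmpdir"
```

Only the first line of each file is ever read, however large the file is.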
To check whether "$fname" exists and is not empty (that is, has a size greater than zero bytes), use [ -s "$fname" ]:
#!/usr/bin/env sh
LOOKUP_DIR="/path/to/source/directory"
for fname in "$LOOKUP_DIR"/*/*; do
if ! [ -s "$fname" ]; then
echo "${fname}" is empty
# remove echo when output is what you want
echo rm -f "${fname}"
fi
done
See: help test:
File operators:
...
-s FILE True if file exists and is not empty.
Yet another method
wc -l ~/tmp/* 2>/dev/null | awk '$1 == 0 {print $2}' | xargs echo rm
This will break if any of your files have whitespace in the name.
To work around that, with awk still
wc -l ~/tmp/* 2>/dev/null \
| awk 'sub(/^[[:blank:]]+0[[:blank:]]+/, "")' \
| xargs echo rm
This works because the sub function returns the number of substitutions made, which can be treated as a boolean zero/not-zero condition.
Remove the echo to actually delete the files.
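To watch the sub-as-condition trick in isolation, this sketch feeds awk some made-up text in the shape of wc -l output instead of running wc on real files. The file names and counts are invented for illustration.

```shell
# Invented sample in the shape of `wc -l` output, including a name with a space.
sample='  0 empty file.txt
  3 notes.txt
  0 b.log
  3 total'

# sub() returns 1 on lines where it stripped a leading zero count, so awk
# prints those lines (now reduced to the file name) and skips the rest.
empties=$(printf '%s\n' "$sample" | awk 'sub(/^[[:blank:]]+0[[:blank:]]+/, "")')
printf '%s\n' "$empties"
```

The two names with a zero count come out intact, one per line, including the one that contains a space.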

How to use wc -l integer output in a if statement

I am trying to execute the command git diff | grep pkg/client/clientset | wc -l and check if the output is more than 0 or not. I have the following script
if [ "$(git diff | grep pkg/client/clientset | wc -l "$i")" -gt "0" ]; then
echo "Hello"
fi
I am getting the following error while executing the script:
line 29: [: : integer expression expected
Any idea of what can be going wrong?
Comparing the number of output lines to zero is almost always an antipattern. diff and grep both already tell you whether there was a difference (exit code 1) or a match (exit code 0) precisely so you can say
if ! diff old new; then
echo "There were differences"
fi
if ! git diff --exit-code; then
echo "There were differences"
fi
if ! git diff --exit-code pkg/client/clientset; then
echo "There were differences in this specific file"
fi
if git diff | grep -q pkg/client/clientset; then
echo "Hello"
fi
Notice that git diff requires an explicit option to enable this behavior.
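A minimal sketch of testing the exit status directly, using two throwaway files rather than a git checkout. The helper name files_differ and the file contents are assumptions for illustration.

```shell
# Hypothetical helper: succeeds when the two files differ.
# diff exits 0 for identical input and 1 when it finds differences
# (an exit status of 2, meaning trouble, is also treated as "differ" here).
files_differ() {
    ! diff "$1" "$2" > /dev/null
}

workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/old"
printf 'alpha\nbeta\n' > "$workdir/same"
printf 'alpha\ngamma\n' > "$workdir/new"

if files_differ "$workdir/old" "$workdir/new"; then
    changed=yes
else
    changed=no
fi

if files_differ "$workdir/old" "$workdir/same"; then
    unchanged=no
else
    unchanged=yes
fi

printf 'old vs new differ: %s, old vs same identical: %s\n' "$changed" "$unchanged"
rm -rf -- "$workdir"
```

No line counting is involved: the if statements act on the exit status alone.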
-- EDIT --
There were some incorrect statements in the answer, pointed out by commenters Gordon Davisson and iBug. They have been corrected in this version of the answer. The final conclusion (remove the "$i") remains the same, though.
wc -l "$i" will count the lines in the file $i. If you never used i as a variable, then i will be empty and the command will be wc -l "". Its output on STDOUT will be empty, and STDERR will contain wc: invalid zero-length file name. If the variable i is set, wc will most likely complain about a non-existent file. The point is that wc will not read STDIN when it is given a file name, so the command substitution expands to an empty string and [ complains that it is not an integer.
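The effect is easy to reproduce without git. In this sketch, printf stands in for the git diff | grep pipeline, and i is deliberately left empty, as it would be if it had never been assigned.

```shell
i=''   # assumption: i was never set, as in the failing script

# With a file name argument, wc ignores the pipe; the empty name is an
# error, so nothing is written to stdout.
with_arg=$(printf 'a\nb\n' | wc -l "$i" 2>/dev/null)

# Without an argument, wc counts the lines arriving on stdin.
from_stdin=$(printf 'a\nb\n' | wc -l)

printf 'with argument: [%s]\n' "$with_arg"
printf 'from stdin: [%s]\n' "$from_stdin"
```

The empty value in with_arg is what [ receives in the failing script, hence the integer expression expected message.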
I also made some incorrect statements about the quoting. As pointed out, between the ( and ), it is a different quoting context. This can be shown as follows:
$ a="$(/usr/bin/echo "hop")"
$ echo $a
hop
$ b=hop
$ a="$(/usr/bin/echo "$b")"
$ echo $a
hop
Just removing "$i" from the wc -l call will solve your issue.
if [ "$(git diff | grep pkg/client/clientset | wc -l)" -gt "0" ]; then
echo "Hello"
fi
Just a note that is too long for a comment:
if [ "$(git diff | grep pkg/client/clientset | wc -l "$i")" -gt "0" ]; then
I think you want to test for the existence of the string pkg/client/clientset in order to enter the then part. In that case you can use:
if git diff | grep -q pkg/client/clientset; then
Because of the -q option, grep prints nothing and only returns a status. The status is true as soon as the first occurrence of the string is found, at which point grep stops. This status is what if uses.

How does process substitution work with while loops?

I'm reading/editing a bash git integration script
This snippet is supposed to print ${SYMBOL_GIT_PUSH} or ${SYMBOL_GIT_PULL} alongside how many commits i am behind and/or ahead by.
local marks
while IFS= read -r line; do
if [[ $line =~ ^## ]]; then
[[ $line =~ ahead\ ([0-9]+) ]] && marks+=" ${BASH_REMATCH[1]}${SYMBOL_GIT_PUSH}"
[[ $line =~ behind\ ([0-9]+) ]] && marks+=" ${BASH_REMATCH[1]}${SYMBOL_GIT_PULL}"
else
marks="${SYMBOL_GIT_MODIFIED}${marks}"
break
fi
done < <(git status --porcelain --branch 2>/dev/null)
printf '%s' "$marks"
Example:
4↑ 10↓
It is working, but i am trying to understand it.
Why is there some IFS and how does it work with process substitution?
I've heard process substitution isn't defined in sh. Is there a way to do this the /bin/sh way, or at least more efficiently?
I was provided with a link that should explain what IFS does.
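As a rough illustration of what IFS= and -r change, here is a sketch using a made-up input line. It imitates a git status --porcelain entry, which can start with a space.

```shell
# Made-up input line: leading blanks plus a literal backslash.
sample='  M src\tmp'

# Plain read: leading whitespace is trimmed and the backslash is consumed.
plain=$(printf '%s\n' "$sample" | { read line; printf '%s' "$line"; })

# IFS= keeps the leading whitespace, -r keeps the backslash.
raw=$(printf '%s\n' "$sample" | { IFS= read -r line; printf '%s' "$line"; })

printf 'plain=[%s]\nraw=[%s]\n' "$plain" "$raw"
```

The first result is M srctmp, while the second keeps the line exactly as it was written. Setting IFS only affects how read splits and trims the line; it is independent of where the input comes from, so it works the same with a pipe, a file, or process substitution.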
I switched things around and managed to remove the process substitution:
local marks
git status --porcelain --branch 2>/dev/null |
while IFS= read -r line; do
if [[ $line =~ ^## ]]; then
[[ $line =~ ahead\ ([0-9]+) ]] && marks+=" ${BASH_REMATCH[1]}${SYMBOL_GIT_PUSH}"
[[ $line =~ behind\ ([0-9]+) ]] && marks+=" ${BASH_REMATCH[1]}${SYMBOL_GIT_PULL}"
else
marks="${SYMBOL_GIT_MODIFIED}${marks}"
break
fi
done
printf '%s\n' "$marks"
But now, the value of $marks isn't saved and it prints nothing.
I was provided with another link that explains why.
Will return and update on what i've found.
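The lost value can be reproduced without git. The sketch below counts lines twice: once in a loop at the end of a pipeline, and once in a loop fed by a here-document that wraps a command substitution, which is one POSIX sh alternative to < <(...). The printf call is a stand-in for git status.

```shell
# 1) Loop at the end of a pipeline: in bash (default settings) and dash it
#    runs in a subshell, so the parent never sees the updated counter.
count=0
printf 'a\nb\n' | while IFS= read -r line; do
    count=$((count + 1))
done
after_pipe=$count

# 2) Loop fed by a here-document: no pipeline, so it runs in the current
#    shell and the counter survives.
count=0
while IFS= read -r line; do
    count=$((count + 1))
done <<EOF
$(printf 'a\nb\n')
EOF
after_heredoc=$count

printf 'after pipe: %s, after here-document: %s\n' "$after_pipe" "$after_heredoc"
```

In bash this prints after pipe: 0, after here-document: 2. Shells that run the last pipeline stage in the current process (ksh, for example) would keep the first count as well.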
I used the command grouping workaround and wrapped the loop and the print statement inside curly braces:
Also, I made the /bin/sh version almost functional (the exception: it doesn't yet show how many commits I'm ahead or behind by; that's not hard, and I'm sure I'll do something with awk or cut).
I took advantage of fact that grep returns non-0 when nothing matches.
git status --porcelain --branch 2>/dev/null | {
SYMBOL_GIT_PUSH='↑'
SYMBOL_GIT_PULL='↓'
while IFS= read -r line
do
if echo "$line" | egrep -q '^##'
then
echo "$line" | egrep -q 'ahead' && marks="$marks $SYMBOL_GIT_PUSH"
echo "$line" | egrep -q 'behind' && marks="$marks $SYMBOL_GIT_PULL"
else
marks="*$marks"
break
fi
done
printf ' %s' "$marks"
}
This was a fun learning experience! Thanks to everyone who helped. When i find the 100% solution i'll update this.
Here's the bashism-less git info function.
__git() {
git_eng="env LANG=C git"
ref="$($git_eng symbolic-ref --short HEAD 2>/dev/null)"
[ -n "$ref" ] && ref="$SYMBOL_GIT_BRANCH$ref" || ref="$($git_eng describe --tags --always 2>/dev/null)"
[ -n "$ref" ] || return;
git status --porcelain --branch 2>/dev/null | {
SYMBOL_GIT_PUSH='↑'
SYMBOL_GIT_PULL='↓'
while IFS= read -r line
do
if echo "$line" | grep -E -q '^##'
then
echo "$line" | grep -E -q 'ahead' &&
marks="$marks $SYMBOL_GIT_PUSH$(echo "$line" | sed 's/.*ahead \([0-9][0-9]*\).*/\1/')"
echo "$line" | grep -E -q 'behind' &&
marks="$marks $SYMBOL_GIT_PULL$(echo "$line" | sed 's/.*behind \([0-9][0-9]*\).*/\1/')"
else
marks="$SYMBOL_GIT_MODIFIED$marks"
break
fi
done
printf ' %s%s' "$ref" "$marks"
}
}
Each sed call matches the whole line and replaces it with just the digits captured after ahead or behind, so only the number remains. This also works when the header reports both, as in [ahead 4, behind 10].
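A sketch for trying the number extraction on a sample header line outside the prompt function. The sample line and the helper names ahead_count and behind_count are assumptions; the helpers use a single sed capture group per number, so they also cope with a header that reports both ahead and behind.

```shell
# Hypothetical helpers: print the number after "ahead " / "behind ",
# or nothing when the header does not mention it.
ahead_count() {
    printf '%s\n' "$1" | sed -n 's/.*ahead \([0-9][0-9]*\).*/\1/p'
}
behind_count() {
    printf '%s\n' "$1" | sed -n 's/.*behind \([0-9][0-9]*\).*/\1/p'
}

# Made-up header in the shape printed by `git status --porcelain --branch`.
header='## main...origin/main [ahead 4, behind 10]'

printf '%s up, %s down\n' "$(ahead_count "$header")" "$(behind_count "$header")"
```

For the sample header this prints 4 up, 10 down.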

Can the url of the remote be included in the output of `git for-each-ref` command?

I am writing a script to list all the git repositories on my system, their branches, and the latest commit on each branch. So I have created this script, which prints out the following:
directory, branch (ref), commit hash, date, commit message, ref (again).
#!/bin/bash
IFS='
'
for directory in `ls -d $1/*/`; do
# echo "$directory : $directory.git"
if [[ -d $directory/.git ]] ; then
# filter=/refs/heads
filter=''
for branch in $(git -C $directory for-each-ref --format='%(refname)' $filter); do
echo $directory : $branch : "$(git -C $directory log -n 1 --oneline --pretty=format:'%Cred%h - %C(yellow)%ai - %C(green)%s %C(reset) %gD' $branch)" : $(echo $branch | rev | cut -d\/ -f-1 | rev)
done
fi
done
What I don't have is the repository URLs for the remotes. Can the remote's URL be printed as part of the output of git for-each-ref command?
I guess I could use git -C $directory remote -v to list the repository's remotes into a lookup table, then look up the xxxxx part of each refs/remotes/xxxxx ref and put the URL in a variable that gets added to the echo command.
You can use git for-each-ref to generate an arbitrary script that will later be evaluated with eval. Adapting to your case the longest of the examples from the documentation on git for-each-ref, I arrived at the following:
#!/bin/bash
IFS='
'
for directory in "$1"/*/ ; do
# echo "$directory : $directory.git"
if [[ -d $directory/.git ]] ; then
# filter=/refs/heads
filter=''
fmt="
dir=$directory"'
ref=%(refname)
refshort=%(refname:short)
h=%(objectname:short)
date=%(authordate:iso)
subj=%(subject)
printf "%s : %s : %s - %s - %s : %s" "$dir" "$ref" %(color:red)"$h" %(color:yellow)"$date" %(color:green)"$subj" %(color:reset)"$refshort"
if [[ $ref == refs/remotes/* ]]
then
remotename="${ref#refs/remotes/}"
remotename="${remotename%%/*}"
printf " : %s" "$(git -C "$dir" remote get-url "$remotename")"
fi
printf "\n"
'
eval="$(git -C $directory for-each-ref --shell --format="$fmt" $filter)"
eval "$eval"
fi
done
Note that I got rid of the git log from your implementation, and as a result I also dropped the %gD field, since I didn't quite understand what it meant and couldn't find how to obtain it with git for-each-ref. Besides, I changed the way the short ref is obtained from the full ref; my implementation is not equivalent to yours, but it produces unambiguous results.
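The two parameter expansions that pull the remote name out of a full ref can be tried on their own. In this sketch the helper name remote_of_ref and the sample refs are assumptions for illustration.

```shell
# Hypothetical helper: print the remote name of a refs/remotes/... ref,
# or fail for any other kind of ref.
remote_of_ref() {
    case $1 in
        refs/remotes/*) ;;
        *) return 1 ;;
    esac
    remotename=${1#refs/remotes/}    # drop the leading refs/remotes/
    remotename=${remotename%%/*}     # drop everything from the first / on
    printf '%s\n' "$remotename"
}

remote_of_ref refs/remotes/origin/feature/login           # prints: origin
remote_of_ref refs/heads/main || echo 'not a remote ref'  # prints: not a remote ref
```

Using %%/* rather than %/* matters for branch names that themselves contain slashes, such as feature/login.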

CVS branch name from tag name

I have a number of modules in CVS with different tags. How would I go about getting the name of the branch these tagged files exist on? I've tried checking out a file from the module using cvs co -r TAG and then doing cvs log but it appears to give me a list of all of the branches that the file exists on, rather than just a single branch name.
Also this needs to be an automated process, so I can't use web based tools like viewvc to gather this info.
I have the following Korn functions that you might be able to adjust to run in bash. It should be apparent what it's doing.
Use get_ver() to determine the version number for a file path and given tag. Then pass the file path and version number to get_branch_name(). The get_branch_name() function relies on a few other helpers to fetch information and slice up the version numbers.
get_ver()
{
typeset FILE_PATH=$1
typeset TAG=$2
TEMPINFO=/tmp/cvsinfo$$
/usr/local/bin/cvs rlog -r$TAG $FILE_PATH 1>$TEMPINFO 2>/dev/null
VER_LINE=`grep "^revision" $TEMPINFO | awk '{print $2}'`
echo ${VER_LINE:-NONE}
rm -Rf $TEMPINFO 2>/dev/null 1>&2
}
get_branch_name()
{
typeset FILE=$1
typeset VER=$2
BRANCH_TYPE=`is_branch $VER`
if [[ $BRANCH_TYPE = "BRANCH" ]]
then
BRANCH_ID=`get_branch_id $VER`
BRANCH_NAME=`get_tags $FILE $BRANCH_ID`
echo $BRANCH_NAME
else
echo $BRANCH_TYPE
fi
}
get_minor_ver()
{
typeset VER=$1
END=`echo $VER | sed 's/.*\.\([0-9]*\)/\1/g'`
echo $END
}
get_major_ver()
{
typeset VER=$1
START=`echo $VER | sed 's/\(.*\.\)[0-9]*/\1/g'`
echo $START
}
is_branch()
{
typeset VER=$1
# We can work out whether something is branched by looking at the version number.
# If it has only two parts (e.g. 1.123) then it's on the trunk.
# If it has more parts (e.g. 1.2.2.4) then it's on a branch.
# An odd number of parts is treated as an error.
POINTS=`echo $VER | tr -dc "." | wc -c | awk '{print $1}'`
PARTS=$(($POINTS + 1))
if [[ $PARTS -eq 2 ]]
then
print "TRUNK"
elif [[ $(($PARTS % 2)) -eq 0 ]]
then
print "BRANCH"
else
print "ERROR"
fi
}
get_branch_id()
{
typeset VER=$1
MAJOR_VER=`get_major_ver $VER`
MAJOR_VER=${MAJOR_VER%.}
BRANCH_NUMBER=`get_minor_ver $MAJOR_VER`
BRANCH_POINT=`get_major_ver $MAJOR_VER`
echo ${BRANCH_POINT}0.${BRANCH_NUMBER}
}
get_tags()
{
typeset FILE_PATH=$1
typeset VER=$2
TEMP_TAGS_INFO=/tmp/cvsinfo$$
cvs rlog -r$VER $FILE_PATH 1>${TEMP_TAGS_INFO} 2>/dev/null
TEMPTAGS=`sed -n '/symbolic names:/,/keyword substitution:/p' ${TEMP_TAGS_INFO} | grep ": ${VER}$" | cut -d: -f1 | awk '{print $1}'`
TAGS=`echo $TEMPTAGS | tr ' ' '/'`
echo ${TAGS:-NONE}
rm -Rf $TEMP_TAGS_INFO 2>/dev/null 1>&2
}
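For a quick check of the version arithmetic without a CVS server, the sketch below redoes the get_branch_id calculation with plain parameter expansion. The function name branch_id_of is an assumption; it is meant to mirror the sed-based helpers above, not replace them.

```shell
# Hypothetical re-statement of get_branch_id using parameter expansion.
# For revision 1.2.2.4 the branch revision is 1.2.2, and the branch tag is
# looked up under the number 1.2.0.2.
branch_id_of() {
    branch_rev=${1%.*}                # 1.2.2.4 -> 1.2.2
    branch_number=${branch_rev##*.}   # 1.2.2   -> 2
    branch_point=${branch_rev%.*}     # 1.2.2   -> 1.2
    printf '%s.0.%s\n' "$branch_point" "$branch_number"
}

branch_id_of 1.2.2.4   # prints: 1.2.0.2
```

The result is the value that get_tags then searches for in the symbolic names section of the cvs rlog output.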

Resources