Why is git log output redirected to a while loop not working? - bash

I am trying this command in my bash shell script:
git log --oneline --no-decorate --pretty=format:"%s" $oldrev..$newrev
git log --oneline --no-decorate --pretty=format:"%s" $oldrev..$newrev | while read -r line; do
echo "$line"
done
The first git log prints output, but the second one, piped into the while loop, prints nothing. Why?
I invoke my script like this (the second and third arguments are passed to $oldrev and $newrev):
./check master a735c2f eb23992
If I add the --no-pager option, neither prints anything.
I am using bash 4.4.23(1)-release on Fedora 28.

Instead of --pretty=format:, you should use --pretty=tformat:
'tformat:'
The 'tformat:' format works exactly like 'format:', except that it provides "terminator" semantics instead of "separator" semantics.
In other words, each commit has the message terminator character (usually a newline) appended, rather than a separator placed between entries.
This means that the final entry of a single-line format will be properly terminated with a new line, just as the "oneline" format does. For example:
$ git log -2 --pretty=format:%h 4da45bef \
| perl -pe '$_ .= " -- NO NEWLINE\n" unless /\n/'
4da45be
7134973 -- NO NEWLINE
$ git log -2 --pretty=tformat:%h 4da45bef \
| perl -pe '$_ .= " -- NO NEWLINE\n" unless /\n/'
4da45be
7134973
In addition, any unrecognized string that has a % in it is interpreted as if it has tformat: in front of it.
For example, these two are equivalent:
$ git log -2 --pretty=tformat:%h 4da45bef
$ git log -2 --pretty=%h 4da45bef
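Applied to the question's command, a minimal sketch of the fix (only the format keyword changes, plus quoting of the revision range):
git log --oneline --no-decorate --pretty=tformat:"%s" "$oldrev..$newrev" | while read -r line; do
    echo "$line"
done
With tformat: every subject line, including the last one, is newline-terminated, so the final read doesn't come up short.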

Related

How can I use Git to identify function changes across different revisions of a repository?

I have a repository with a bunch of C files. Given the SHA hashes of two commits, <commit-sha-1> and <commit-sha-2>, I'd like to write a script (probably bash/ruby/python) that detects which functions in the C files in the repository have changed across these two commits.
I'm currently looking at the documentation for git log, git commit and git diff. If anyone has done something similar before, could you give me some pointers about where to start or how to proceed?
That doesn't look too good, but you could combine git with your favorite tagging system, such as GNU global, to achieve that. For example:
#!/usr/bin/env sh
global -f main.c | awk '{print $NF}' | cut -d '(' -f1 | while read -r i
do
    # non-empty git log -L output means the function changed in HEAD^..HEAD
    if [ "$(git log -L:"$i":main.c HEAD^..HEAD | wc -l)" -gt 0 ]
    then
        printf "%s() changed\n" "$i"
    else
        printf "%s() did not change\n" "$i"
    fi
done
First, you need to create a database of functions in your project:
$ gtags .
Then run the above script to find functions in main.c that were modified since the last commit. The script could of course be more flexible; for example, it could handle all *.c files changed between two commits as reported by git diff --stat (see the sketch after the quoted documentation below).
Inside the script we use -L option of git log:
-L <start>,<end>:<file>, -L :<funcname>:<file>
Trace the evolution of the line range given by
"<start>,<end>" (or the function name regex <funcname>)
within the <file>. You may not give any pathspec
limiters. This is currently limited to a walk starting from
a single revision, i.e., you may only give zero or one
positive revision arguments. You can specify this option
more than once.
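As a hedged sketch of that more flexible variant (assuming the gtags database already exists and that the two commits are passed as arguments):
#!/usr/bin/env sh
# report changed functions in every .c file that differs
# between the two commits given as arguments
old=$1
new=$2
git diff --name-only "$old" "$new" -- '*.c' | while read -r f
do
    # list the functions defined in the file, as in the script above
    global -f "$f" | awk '{print $NF}' | cut -d '(' -f1 | while read -r i
    do
        # any output from git log -L means the function changed in the range
        if [ "$(git log -L:"$i":"$f" "$old".."$new" | wc -l)" -gt 0 ]
        then
            printf "%s: %s() changed\n" "$f" "$i"
        fi
    done
done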
Bash script:
#!/usr/bin/env bash
git diff | \
grep -E '^(@@)' | \
grep '(' | \
sed 's/@@.*@@//' | \
sed 's/(.*//' | \
sed 's/\*//' | \
awk '{print $NF}' | \
uniq
Explanation:
1: Get diff
2: Get only lines with hunk headers; if the 'optional section heading' of a hunk header exists, it will be the function definition of a modified function
3: Pick only hunk headers containing open parentheses, as they will contain function definitions
4: Get rid of '@@ [old-file-range] [new-file-range] @@' sections in the lines
5: Get rid of everything after opening parentheses
6: Get rid of '*' from pointers
7: [awk]: Print the last field (i.e., column) of each record (i.e., line).
8: Get rid of duplicate names.
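Since the question is about two arbitrary commits, the same pipeline can be parameterized by diffing them explicitly instead of using a bare git diff; a minimal sketch (changed_funcs is a hypothetical name):
changed_funcs() {
    # $1 and $2 are the two commit SHAs
    git diff "$1" "$2" | \
    grep -E '^(@@)' | \
    grep '(' | \
    sed 's/@@.*@@//' | \
    sed 's/(.*//' | \
    sed 's/\*//' | \
    awk '{print $NF}' | \
    uniq
}
Usage: changed_funcs <commit-sha-1> <commit-sha-2>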

Passing zsh function parameter to grep -E

Background...
Trying to find which commit(s) last touched a specific file.
I can do this on the CLI by piping git log to grep, but I'm trying to wrap it in a zsh function, mostly for ease of memory.
Here's my function, and then here is the output I'd like to generate with it.
# match lines from git log that start with commit or include the
# filename I'm interested in and then pipe back through grep to color the output
glpg() {
\git log --name-only | \grep -E ‘“$@”|^commit\s\S' | \grep -B1 --color -E ‘$@'
}
Desired usage and output
dwight:assets (add-analytics*) $ glpg clickouts
commit 6662418b8e68e478b95e7254faa6406abdada30f
web/assets/app/viewmodels/clickouts.js
web/assets/app/views/clickouts.html
web/client/app/viewmodels/clickouts.js
web/client/app/views/clickouts.html
--
commit cee37549f613985210c9caf90a48e2cca28d4412
web/client/app/viewmodels/clickouts.js
web/client/app/views/clickouts.html
--
commit df9ea8cd90ff80b89a0c7e2b0657141b105d5e7e
web/client/app/viewmodels/clickouts.js
web/client/app/views/clickouts.html
Three problems:
You use Unicode apostrophes and quotes, ‘ and “. Replace them with ASCII single and double quotes.
You can't use \s and \S to mean space or non-space with a standard (POSIX) grep. Use ' ' and [^ ] instead to be portable.
The list of all args is referenced with "$@", including the double quotes.
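Putting the three fixes together, the function might look like this (a sketch; it uses "$1" for the single search term, and "$@" also works as long as only one argument is passed):
glpg() {
    # ASCII quotes throughout; POSIX-portable ' ' and [^ ] instead of \s and \S
    \git log --name-only | \grep -E "$1|^commit [^ ]" | \grep -B1 --color -E "$1"
}
Invoked as in the desired usage above: glpg clickouts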

How to make a WinMerge equivalent in Linux

My friend recently asked how to compare two folders in Linux and then run meld against any text files that are different. I'm slowly catching on to the Linux philosophy of piping many granular utilities together, and I put together the following solution. My question is: how could I improve this script? There seems to be quite a bit of redundancy, and I'd appreciate learning better ways to script Unix.
#!/bin/bash
dir1=$1
dir2=$2
# show files that are different only
cmd="diff -rq $dir1 $dir2"
eval $cmd # print this out to the user too
filenames_str=`$cmd`
# remove lines that represent only one file, keep lines that have
# files in both dirs, but are just different
tmp1=`echo "$filenames_str" | sed -n '/ differ$/p'`
# grab just the first filename for the lines of output
tmp2=`echo "$tmp1" | awk '{ print $2 }'`
# convert newlines sep to space
fs=$(echo "$tmp2")
# convert string to array
fa=($fs)
for file in "${fa[@]}"
do
# drop first directory in path to get relative filename
rel=`echo $file | sed "s#${dir1}/##"`
# determine the type of file
file_type=`file -i $file | awk '{print $2}' | awk -F"/" '{print $1}'`
# if it's a text file send it to meld
if [ $file_type == "text" ]
then
# throw out error messages with &> /dev/null
meld $dir1/$rel $dir2/$rel &> /dev/null
fi
done
Please preserve/promote readability in your answers. An answer that is shorter but harder to understand won't qualify as an answer.
It's an old question, but let's work on it a bit just for fun, without thinking about the final goal (maybe SCM) or about tools that already do this better. Let's focus on the script itself.
In the OP's script, there is a lot of string processing inside bash, using tools like sed and awk, sometimes more than once in the same command line or inside a loop executing n times (once per file).
That's OK, but it's necessary to remember that:
Each time the script calls any of those programs, a new process is created in the OS, and that is expensive in time and resources. So the fewer external programs are called, the better the script performs:
diff 2 times (1 just to print to user)
sed 1 time processing diff result and 1 time for each file
awk 1 time processing sed result and 2 times for each file (processing file result)
file 1 time for each file
That doesn't apply to echo, read, test, and others that are bash builtins, so no external program is executed.
meld is the final command that will display the files to the user, so it doesn't count.
Even with builtin commands, pipelines (|) have a cost too, because the shell has to create pipes, duplicate handles, and maybe even fork itself (the shell is a process, too). So again: less is better.
The messages of the diff command are locale dependent, so if the system is not in English, the whole script won't work.
With that in mind, let's clean up the original script a bit, maintaining the OP's logic:
#!/bin/bash
dir1=$1
dir2=$2
# Set english as current language
LANG=en_US.UTF-8
# (1) show files that are different only
diff -rq $dir1 $dir2 |
# (2) remove lines that represent only one file, keep lines that have
# files in both dirs, but are just different, delete all but left filename
sed '/ differ$/!d; s/^Files //; s/ and .*//' |
# (3) determine the type of file
file -i -f - |
# (4) for each file
while IFS=":" read file file_type
do
# (5) drop first directory in path to get relative filename
rel=${file#$dir1}
# (6) if it's a text file send it to meld
if [[ "$file_type" =~ "text/" ]]
then
# throw out error messages with &> /dev/null
meld ${dir1}${rel} ${dir2}${rel} &> /dev/null
fi
done
A little explanation:
A single chain of commands cmd1 | cmd2 | ... where the output (stdout) of the previous one is the input (stdin) of the next one.
Execute sed just once to perform 3 operations (separated with ;) on the diff output:
Keep only the lines ending with " differ" (deleting the rest)
Delete "Files " at the beginning of the remaining lines
Delete from " and " to the end of the remaining lines
Execute the file command just once to process the whole file list on stdin (option -f -)
Use the bash while statement to read two values separated by : from each line of stdin.
Use bash variable substitution to extract filename from a variable
Use bash test to compare a file type with a regular expression
For clarity, I didn't consider that file and directory names may contain spaces. In such cases, both scripts will fail. To avoid that, it is necessary to enclose every reference to a file/dir name variable in double quotes.
I didn't use awk, because it is powerful enough that it could replace almost the entire script ;-)
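For the curious, a rough sketch of that idea (same no-spaces-in-filenames assumption as above; awk shells out to file and meld, so it gives back some of the process-count savings discussed earlier):
diff -rq "$dir1" "$dir2" |
awk -v d1="$dir1" -v d2="$dir2" '
/ differ$/ {
    f = $2                              # left-hand filename
    cmd = "file -i \"" f "\""
    cmd | getline type                  # e.g. "x.c: text/x-c; charset=us-ascii"
    close(cmd)
    if (type ~ /text\//) {
        rel = substr(f, length(d1) + 1) # path relative to d1
        system("meld \"" d1 rel "\" \"" d2 rel "\" >/dev/null 2>&1")
    }
}'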

Shell: How to delete mercurial files using an automated script?

I want to make a tiny script that deletes ALL the files in my Symfony project that Mercurial reports as unwanted files.
For example:
hg status:
...
? web/images/lightwindow/._arrow-up.gif
? web/images/lightwindow/._black-70.png
? web/images/lightwindow/._black.png
? web/images/lightwindow/._nextlabel.gif
? web/images/lightwindow/._pattern_148-70.png
? web/images/lightwindow/._pattern_148.gif
? web/images/lightwindow/._prevlabel.gif
? web/js/._lightwindow.js
? web/sfPropel15Plugin
? web/sfProtoculousPlugin
I would like to delete all the files that are marked with the ?. ONLY THOSE. Not the ones modified -M-, and so on.
I'm trying to do a mini-script for that:
hg status | grep '^?*' | rm -f
I don't know if it is OK. Could you help me with one?
You're missing xargs, which takes the input and gives it to a command as parameters (right now you're actually sending them to rm's standard input, which isn't meaningful). Something like:
hg status | grep '^?' | cut -d' ' -f2 | xargs rm -f
Note: it won't work if your file names contain spaces. It'd still be possible but you need to be more clever.
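One way to be more clever, assuming a Mercurial new enough to have the --print0 flag: let hg itself emit just the unknown filenames, NUL-separated, so any filename is safe:
hg status --unknown --no-status --print0 | xargs -0 rm -f
Here -u/--unknown restricts the output to the ? files, -n/--no-status drops the "? " prefix, and -0/--print0 NUL-terminates each name so that xargs -0 handles spaces (and even newlines) in filenames.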
Try this:
hg status | awk '/^\? /{gsub(/^\? /, "", $0); print;}' | while IFS= read -r line; do
    rm -f "$line"
done
The awk command matches everything starting with '?', and executes the block '{gsub(/^\? /, "", $0);print;}'. The block does a substitution on $0 (the entire line matched), replacing the starting "? " with nothing, making $0 just the filename. The print then prints $0 (print with no args defaults to printing $0)
So the awk output prints a list of filenames, one per line. This is fed into a read loop, which removes each.
This will preserve whitespace in filenames, but it will break if there are any filenames that contain newlines! Handling newlines gracefully is impossible with hg status's default newline-separated output (the --print0 variant shown earlier sidesteps this).

Yanking text from the previous stdout onto the command line

I'd like to set up my Bash in such a way that I could yank text from the previous command's stdout. The example use case I'll use is resolving conflicts during a git rebase.
$ git status
# Not currently on any branch.
# Unmerged paths:
# (use "git reset HEAD <file>..." to unstage)
# (use "git add/rm <file>..." as appropriate to mark resolution)
#
# both modified: app/views/report/index.html.erb
#
$ vim app/views/report/index.html.erb
# .... edit, resolve conflicts ....
$ git add <Alt+.>
The problem is that the easiest way to grab the filename for the 2nd command (vim ...) is to move my hand over to the mouse. One option is screen, but that has its own set of issues as a day-to-day shell. (Not the least of which is that I use and abuse Ctrl+A as a readline shortcut)
Where could I start at making this work for me? Ideally I'd like to be able to pull the Nth line from the stdout of the previous command somewhere that I can manipulate it as a command.
Other than using the mouse, the only way I can think of is to use grep, sed and/or awk, perhaps with tee, a Bash function, and process and/or command substitution:
vim $(git status | tee /dev/tty | grep ...)
or
var=$(git status | tee /dev/tty | grep ...)
vim "$var"
git add "$var"
The tee allows you to see the full output while capturing the modified output. Creating a function would allow you to easily pass an argument that would select a certain line:
var=$(some_func 14)
etc.
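As a sketch of such a function (nthline is a hypothetical name, and the command whose output is inspected is hard-coded as git status here):
nthline() {
    # show the full output on the terminal, but emit only line $1
    # on stdout for capture
    git status | tee /dev/tty | sed -n "${1}p"
}
var=$(nthline 6)
vim "$var"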
The disadvantage is that you have to do this from the start. I don't know of any way to do this after the fact without using screen or some other output logging and scripting a rummage through the log.
I don't know of a good, clean solution, but as a hack you could try the script command, which logs all input and output to a file. For GNU script:
$ script -f
Script started, file is typescript
$ ls -1
bar
baz
foo
typescript
$ echo $(tail -3 typescript | head -1)
foo
Pipe the output through sed:
git status | sed -n '5p'
to get the 5th line
