How to use awk inside git --msg-filter - bash

I am trying to clean up some Git history, for instance by trimming all lines in my commit messages. I need to be able to do something like:
git filter-branch -f --msg-filter 'cat | awk '{$1=$1;print}'' HEAD
This, of course, fails because of my incorrect use of apostrophes.
It does not work either if I try to escape them or use double apostrophes.
As an example of what I need to process take this:
Add cool service to application
Related: ISSUE-3
This is the result of appending the related issue identifier at the end of the commit message and removing it from my summary line; note the space(s) at the beginning of the commit summary. It is mostly those commit summaries that I want to trim with awk.
Can anybody help me with my limited bash skills?
Thanks in advance

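In the following command:
git filter-branch -f --msg-filter 'cat | awk '{$1=$1;print}'' HEAD
the inner single quotes close the outer single-quoted string, so the awk program {$1=$1;print} ends up unquoted and the shell expands $1 before awk ever sees it. Writing each inner quote as '\'' (close the string, add an escaped quote, then reopen the string if more follows) avoids that, so
git filter-branch -f --msg-filter 'cat | awk '\''{$1=$1;print}'\' HEAD
should work. To see what the shell actually passes to git, add echo at the beginning of the line: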
echo git filter-branch -f --msg-filter 'cat | awk '{$1=$1;print}'' HEAD
echo git filter-branch -f --msg-filter 'cat | awk '\''{$1=$1;print}'\' HEAD
or, more clearly, use printf "'%s'\n", which prints each argument on its own line wrapped in quotes:
printf "'%s'\n" git filter-branch -f --msg-filter 'cat | awk '\''{$1=$1;print}'\' HEAD
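The quoting can also be checked without rewriting any history. The sketch below builds the filter string with the '\'' idiom, prints it, and pipes a sample message through it via a shell, which is roughly what filter-branch does with the filter. The sample message is adapted from the question, and the variable names filter and msg are made up for illustration. Note that $1=$1 makes awk rebuild each line, so it also collapses runs of whitespace inside a line, not only at the ends.

```shell
# The msg-filter argument: each inner apostrophe is written as '\''
filter='awk '\''{$1=$1;print}'\'''
printf '%s\n' "$filter"

# Feed a sample commit message (with stray leading/trailing spaces)
# through the filter by handing the string to a shell.
msg=$(printf '   Add cool service to application  \n\nRelated: ISSUE-3\n' | sh -c "$filter")
printf '%s\n' "$msg"
```

Once the printed filter looks right, the same quoted string can be passed to --msg-filter.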

Related

To split the output(s) of a script into two fields and insert that output from a specific row in a csv file

I am trying to split the output of the following code into two fields and insert it from the 3rd row of a csv file
#!/bin/bash
cid=`git log -n 1 --pretty=format:"%H"`
git diff-tree --no-commit-id --name-only -r $cid | xargs -I {} echo '\'{} | xargs -I {} md5sum > final.csv
Current output: each entry comes as a single field (it needs to be separated into two fields)
title,Path
l34sdg232f00b434532196298ecf8427e /path/to/file
sg35s3456f00b204e98324998ecsdf3af /path/to/file
Expected Output
final.csv
title,Path
l34sdg232f00b434532196298ecf8427e,/path/to/file
sg35s3456f00b204e98324998ecsdf3af,/path/to/file
I am thinking of placing the output of the script in a third file and then reading that file line by line using awk. Not sure if that's the correct way to proceed.
Thanks in advance.
You seem to be overcomplicating things.
#!/bin/sh
cid=$(git log -n 1 --pretty=format:"%H")
git diff-tree --no-commit-id --name-only -r "$cid" |
xargs md5sum |
sed 's/  /,/' > final.csv
This simply replaces the two spaces in the md5sum output with a comma.
Because nothing here is Bash-specific, I changed the shebang to #!/bin/sh; obviously, still feel free to use Bash if you prefer.
I also switched from the obsolescent `backtick` syntax to modern $(command substitution) syntax.
If you absolutely require the CSV header on top, adding that in the sed script should be trivial. Generally, header lines are more of a nuisance than actually useful, so maybe don't.
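If the header is wanted after all, one possible way is to print it just before the sed output, inside the same redirection or command substitution. The sketch below feeds sed simulated md5sum output so that it runs without a repository; the hash values are the placeholders from the question, not real checksums, and the variable names are invented.

```shell
# Simulated md5sum output: hash, two spaces, path
sample='l34sdg232f00b434532196298ecf8427e  /path/to/file
sg35s3456f00b204e98324998ecsdf3af  /path/to/file'

csv=$(
  printf 'title,Path\n'                      # header row first
  printf '%s\n' "$sample" | sed 's/  /,/'    # first double space becomes a comma
)
printf '%s\n' "$csv"
```

In the real script the two commands would sit inside { ...; } > final.csv instead of a variable.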
This kind of does what you're asking:
#!/bin/bash
cid=$(git log -n 1 --pretty=format:"%H")
git diff-tree --no-commit-id --name-only -r "$cid" | while read -r path
do
md5sum "${path}"
done | awk 'BEGIN{printf "%s,%s\n", "title", "path";printf "\n"}{printf "%s,%s\n",$1,$2}' > final.csv

Transform string into Git branch name format with Bash

Lazy programmer here, I'm making a simple shell script that takes a branch name from the user input, transforms that name into proper format and creates new branch locally, then pushes it to the remote.
So the goal is to transform a string e.g. 'Mary had a little lamb' into 'mary-had-a-little-lamb', removing all characters that aren't digits or letters along the way as well as replacing all spaces, single or multiple, with -.
I have a working solution but it looks pretty ugly to me, how can I improve it?
Also, is there a way to check if the specified branch already exists locally and only proceed if it doesn't?
#!/bin/bash
currentBranch=$(git branch --show-current)
echo "Checking out from branch $currentBranch"
echo "Enter new branch name:"
read branchName
branchName=$(echo $branchName | tr -d ':-') #remove special characters
branchName=$(echo $branchName | tr -s ' ') #replace multiple spaces with one
branchName=$(echo $branchName | tr ' ' '-') #replace spaces with -
branchName=${branchName,,}
echo "Checking out new branch $branchName..."
git checkout -b $branchName
echo "Pushing new branch $branchName to the remote..."
git push origin $branchName
You can use Bash's built-in string substitution:
#!/usr/bin/env bash
# Take this for a test
branch_name='foo bar baz:: ::: qux----corge'
# Need extglob for the pattern to replace
shopt -s extglob
# Do the substitution with extglob pattern
#san_branch_name="${branch_name//+([[:punct:][:space:]-])/-}"
# this is a shorter filter for valid git identifiers
san_branch_name="${branch_name//+([^[:alnum:]])/-}"
# For debug purposes
declare -p branch_name san_branch_name
Actual output:
declare -- branch_name="foo bar baz:: ::: qux----corge"
declare -- san_branch_name="foo-bar-baz-qux-corge"
I suggest using sed to sanitize your branch name, as follows:
sanitized_branch_name=$(echo "${branchName}" | sed -E -e 's/[[:space:]]+/ /g' -e 's/[ :]/-/g')
To check whether the branch already exists, this is enough:
if git branch -a | grep -q -- "$sanitized_branch_name"; then
echo "${sanitized_branch_name} branch exists!"
fi
Edit (example output):
$ branchName="antonio petri:cca"
$ echo "${branchName}" | sed -E -e 's/[[:space:]]+/ /g' -e 's/[ :]/-/g'
antonio-petri-cca
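Pulling the thread together, here is a minimal sketch that avoids Bash-only features, so it should also run under plain sh. The function names sanitize_branch_name and branch_exists are hypothetical. The existence check uses git show-ref --verify --quiet, which tests the exact ref name instead of grepping for a substring; it is only defined here, not called, since it needs a repository to be meaningful.

```shell
# Lowercase, turn every run of non-alphanumeric characters into one dash,
# then trim a leading or trailing dash.
sanitize_branch_name() {
  printf '%s\n' "$1" |
    tr '[:upper:]' '[:lower:]' |
    sed -e 's/[^[:alnum:]]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Succeeds if a local branch with exactly this name exists
# (assumes it is run inside a Git repository).
branch_exists() {
  git show-ref --verify --quiet "refs/heads/$1"
}

sanitize_branch_name 'Mary had a little lamb'
sanitize_branch_name 'foo bar baz:: ::: qux----corge'
```

In the script from the question, the checkout and push would then run only when branch_exists fails for the sanitized name.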

sed: Argument list too long when running sed -n

I am running this command, taken from "Why is my git repository so big?", on a very large Git repository such as https://github.com/python/cpython:
git rev-list --all --objects | sed -n $(git rev-list --objects --all | cut -f1 -d' ' | git cat-file --batch-check | grep blob | sort -n -k 3 | tail -n800 | while read hash type size; do size_in_kibibytes=$(echo $size | awk '{ foo = $1 / 1024 ; print foo "KiB" }'); echo -n "-e s/$hash/$size_in_kibibytes/p "; done) | sort -n -k1;
It works fine if I replace tail -n800 by tail -n40:
1160.94KiB Lib/ensurepip/_bundled/pip-8.0.2-py2.py3-none-any.whl
1169.59KiB Lib/ensurepip/_bundled/pip-8.1.1-py2.py3-none-any.whl
1170.86KiB Lib/ensurepip/_bundled/pip-8.1.2-py2.py3-none-any.whl
1225.24KiB Lib/ensurepip/_bundled/pip-9.0.0-py2.py3-none-any.whl
...
I found the question "Bash : sed -n arguments", which says I could use awk instead of sed.
Do you know how to fix this "sed: Argument list too long" error when tail is -n800 instead of -n40?
It seems you used this answer from the linked question: "Some scripts I use:...". There is a telling comment on that answer:
This function is great, but it's unimaginably slow. It can't even finish on my computer if I remove the 40 line limit. FYI, I just added an answer with a more efficient version of this function. Check it out if you want to use this logic on a big repository, or if you want to see the sizes summed per file or per folder. – piojo Jul 28 '17 at 7:59
And luckily piojo has written another answer addressing this. Just use his code.
As an alternative, check whether git sizer works on your repository: that would help isolate what takes up space in it.
If not, there are other commands in "How to find/identify large commits in git history?", which loop over each object and avoid the sed -nxx part.
Another alternative would be to redirect your result/command to a file, then run sed on that file, as in here.
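The awk route mentioned in the question can be sketched as a two-file join: read the object sizes into an array, then look up each hash from the object list, so no long sed argument list is ever built. This is an illustration, not piojo's code. The two temporary files stand in for the output of the git commands named in the comments, and the hashes, sizes and paths in them are invented sample data.

```shell
# Stand-in for: git rev-list --objects --all | cut -d' ' -f1 | git cat-file --batch-check
sizes=$(mktemp)
# Stand-in for: git rev-list --objects --all
names=$(mktemp)
printf '%s\n' 'aaa111 blob 2048' 'bbb222 tree 100' 'ccc333 blob 1024' > "$sizes"
printf '%s\n' 'aaa111 big.bin' 'bbb222 src' 'ccc333 docs/small file.txt' > "$names"

report=$(awk '
  # First file: remember the size of every blob, keyed by hash.
  NR == FNR { if ($2 == "blob") size[$1] = $3; next }
  # Second file: for known hashes, print the size in KiB and the full path.
  ($1 in size) {
    hash = $1
    sub(/^[^ ]+ /, "")          # drop the hash, keep the path (spaces included)
    printf "%.2fKiB %s\n", size[hash] / 1024, $0
  }
' "$sizes" "$names" | sort -n)
printf '%s\n' "$report"
rm -f "$sizes" "$names"
```

Appending tail -n 800 after sort -n would keep only the largest entries, as in the original pipeline.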

Alias in bash_profile executes by itself

I have set up an alias in ~/.bash_profile as follows:
alias lcmt="git show $(git log --oneline | awk '{print $1;}' | head -n 1)"
However, whenever I open a terminal window, I see:
fatal: Not a git repository (or any of the parent directories): .git
I have been able to narrow it down to that particular alias because when I comment it out, there's no error message. Why does it evaluate by itself on OS X? Can I prevent it from doing so?
The $(...) inside a double-quoted expression gets executed at the time of the assignment, the creation of the alias. You can avoid that by escaping the $ of the $(...). And you want to do the same thing for the $1 inside the awk command:
alias lcmt="git show \$(git log --oneline | awk '{print \$1}' | head -n 1)"
Shell functions are better than aliases in a number of ways, including that there's no quoting weirdness like there is with aliases. Defining a shell function to do this is easy:
lcmt() { git show $(git log --oneline | awk '{print $1;}' | head -n 1); }
I'd make two other recommendations, though: put double quotes around the $( ) expression, and have awk take care of stopping after the first line:
lcmt() { git show "$(git log --oneline | awk '{print $1; exit}')"; }
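The difference in evaluation time is easy to reproduce without git. In this sketch, eval stands in for the alias being expanded later (Bash does not expand aliases in scripts by default), and the variable names dir, eager and lazy are invented for the demonstration.

```shell
dir=first

# Double quotes: $(...) runs right now, while dir is still "first".
eager="echo $(printf '%s' "$dir")"

# Escaped \$: the command substitution is stored as text and runs later.
lazy="echo \$(printf '%s' \"\$dir\")"

dir=second

eval "$eager"   # uses the value captured at definition time
eval "$lazy"    # uses the value at the time it is evaluated
```

The first eval prints the value captured when the string was built; the second prints the current one, which is the behavior wanted from the alias.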

Git config using shell command

I have an alias that does a short status, parses it with sed, then adds the files to the 'assume-unchanged' index of git.
However, the issue seems to be a simple problem with my understanding of escaping single quotes in OS X bash.
irm = !sh -c 'git ignore $(git st --short -u | sed '\''/^ D/s/^ D//g'\'')'
This is the full line in gitconfig. I can issue the command in the shell (with sh and the quotes), but I get "bad git config" when I try to run it via git irm.
Based on the advice below, I have configured this a little differently. However, it still doesn't work in gitconfig, so I added this to my ~/.profile:
alias irm="git ignore $(git st --short | grep '^ D' | sed 's/^ D //')"
You should be able to use double quotes, but you'll have to escape them:
irm = !sh -c 'git ignore $(git st --short -u | sed \"s/^ D//\")'
You don't need to select the line since the operation is the same as the selection. You may want to use -n and p with sed as Chris suggests in the comment if you only want to output the lines that match and exclude any others.
Also, since the pattern is anchored you don't need the global option.
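To see what -n and p change, here is a small sketch run against made-up git status --short output. The file names are hypothetical, and the variable names are only for illustration.

```shell
# Simulated "git status --short" output: one deleted, one modified, one untracked file
status=$(printf '%s\n' ' D removed.txt' ' M changed.txt' '?? new.txt')

# Without -n: every line is printed; only the matching one is rewritten.
all=$(printf '%s\n' "$status" | sed 's/^ D //')

# With -n and p: only lines where the substitution happened are printed.
only=$(printf '%s\n' "$status" | sed -n 's/^ D //p')

printf '%s\n' "--- without -n ---" "$all" "--- with -n and p ---" "$only"
```

Only the second form yields a clean list of deleted files to pass on to git ignore.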
