Transform string into Git branch name format with Bash - bash

Lazy programmer here, I'm making a simple shell script that takes a branch name from the user input, transforms that name into proper format and creates new branch locally, then pushes it to the remote.
So the goal is to transform a string e.g. 'Mary had a little lamb' into 'mary-had-a-little-lamb', removing all characters that aren't digits or letters along the way as well as replacing all spaces, single or multiple, with -.
I have a working solution but it looks pretty ugly to me, how can I improve it?
Also, is there a way to check if the specified branch already exists locally and only proceed if it doesn't?
#!/bin/bash
currentBranch=$(git branch --show-current)
echo "Checking out from branch $currentBranch"
echo "Enter new branch name:"
read branchName
branchName=$(echo $branchName | tr -d ':-') #remove special characters
branchName=$(echo $branchName | tr -s ' ') #replace multiple spaces with one
branchName=$(echo $branchName | tr ' ' '-') #replace spaces with -
branchName=${branchName,,}
echo "Checking out new branch $branchName..."
git checkout -b $branchName
echo "Pushing new branch $branchName to the remote..."
git push origin $branchName

You can use Bash's built-in string substitution:
#!/usr/bin/env bash
# Take this for a test
branch_name='foo bar baz:: ::: qux----corge'
# Need extglob for the pattern to replace
shopt -s extglob
# Do the substitution with extglob pattern
#san_branch_name="${branch_name//+([[:punct:][:space:]-])/-}"
# this is a shorter filter for valid git identifiers
san_branch_name="${branch_name//+([^[:alnum:]])/-}"
# For debug purposes
declare -p branch_name san_branch_name
Actual output:
declare -- branch_name="foo bar baz:: ::: qux----corge"
declare -- san_branch_name="foo-bar-baz-qux-corge"
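The substitution above does not lowercase the result, and a name that starts or ends with punctuation would be left with a stray hyphen at that end. A minimal sketch that wraps the same extglob pattern in a function and handles both cases follows. The function name sanitize_branch_name is made up for illustration, and ${name,,} assumes Bash 4 or newer.

```bash
#!/usr/bin/env bash
# extglob must be enabled before the +( ... ) pattern below is parsed.
shopt -s extglob

# Hypothetical helper: turn free text into a lowercase, hyphen-separated name.
sanitize_branch_name() {
  local name=$1
  name=${name//+([^[:alnum:]])/-}  # each run of non-alphanumerics becomes one hyphen
  name=${name#-}                   # drop a leading hyphen, if any
  name=${name%-}                   # drop a trailing hyphen, if any
  printf '%s\n' "${name,,}"        # lowercase (Bash 4+)
}

sanitize_branch_name 'Mary had a little lamb'
sanitize_branch_name 'foo bar baz:: ::: qux----corge'
```

With this in place, the script from the question could call branchName=$(sanitize_branch_name "$branchName") instead of chaining tr calls.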

I suggest using sed to sanitize your branch name, as follows:
sanitized_branch_name=$(echo "${branchName}" | sed -E 's/[[:space:]]+/ /g; s/[[:space:]:]/-/g')
To check whether the branch already exists, this is enough:
if git branch -a | grep -q -- "$sanitized_branch_name"; then
echo "${sanitized_branch_name} branch exists!"
fi
Edit (example output):
$ branchName="antonio petri:cca"
$ echo "${branchName}" | sed -E 's/[[:space:]]+/ /g; s/[[:space:]:]/-/g'
antonio-petri-cca
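Grepping the output of git branch matches substrings, so a branch called foo would also be reported as existing when only foo-bar does. A stricter check, sketched below, asks Git for the exact ref with git show-ref --verify --quiet. The function name branch_exists and the use of the first script argument are assumptions made for the example.

```bash
#!/usr/bin/env bash

# Succeeds (exit status 0) only if a local branch with exactly this name exists.
branch_exists() {
  git show-ref --verify --quiet "refs/heads/$1"
}

# Example use: pass the already-sanitized name as the first argument.
if [ -n "${1-}" ]; then
  if branch_exists "$1"; then
    echo "Branch $1 already exists, not creating it." >&2
  else
    git checkout -b "$1"
  fi
fi
```

This only looks at local branches; checking refs/remotes/origin/NAME in the same way would cover the remote-tracking side.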

Related

bash shell text manipulation: I can extract a domain from a URL, how would I extend this to also exclude ".com" or ".co.uk" etc

"get a domain from a url" is quite a common question here on this site and the answer I have used for a long time is from this question:
How to extract domain name from url?
The most popular answer has a comment from user "sakumatto" with a version that also handles sub-domains:
echo http://www.test.example.com:3030/index.php | sed -e "s/[^/]*\/\/\([^#]*#\)\?\([^:/]*\).*/\2/" | awk -F. '{print $(NF-1) "." $NF}'
How would I further extend this command to exclude ".com" or ".co.uk" etc???
Insight:
I am writing a bash script for an amazing feature of Termux (a terminal emulator for Android): "termux-url-opener", which lets you write a script that is launched when you use the native Android "share" feature. Let's say I'm in the browser and GitHub wants me to log in. I press "share" and select "Termux"; Termux opens, runs the script, copies the password to my clipboard and closes. Now I'm automatically back in the browser with my password ready to paste!
It's very simple and uses pass (password-store) with the pass-clip extension, gnupg and pinentry. Here is what I have so far. It works fine, but currently it's dumb: I would need to keep writing if/elif statements for every password I have in pass. I would like to automate things, and all I need is to cut ".com" or ".co.uk" etc.
Here is my script so far:
#!/data/data/com.termux/files/usr/bin/bash
URL="$1"
WEBSITE=$(echo "$URL" | sed -e "s/[^/]*\/\/\([^#]*#\)\?\([^:/]*\).*/\2/" | awk -F. '{print $(NF-1) "." $NF}')
if [[ $WEBSITE =~ "github" ]]; then
# website contains "github"
pass -c github
elif [[ $WEBSITE =~ "codeberg" ]]; then
# website contains "codeberg"
pass -c codeberg
else
# is another app or website, so list all passwords entries.
pass clip --fzf
fi
As my pass password entries are just website names e.g "github" or "codeberg" if I could cut the ".com" or ".co.uk" from the end then I could add something like:
PASSWORDS=$(pass ls)
Now I can check whether "$1" (my shared URL) is listed within pass ls, which saves me from having to write:
elif [[ $WEBSITE =~ "codeberg" ]]; then
for every single entry in pass.
Thank you! It's really appreciated!
I might be missing something, but why don't you just strip the offending TLDs from the hostname?
As in:
sed \
-e "s|[^/]*//\([^#]*#\)\?\([^:/]*\).*|\2|" \
-e 's|\.$||' \
-e 's|\.com$||' \
-e 's|\.co\.[a-zA-Z]*$||' \
-e 's|.*\.\([^.]*\.[^.]*\)|\1|'
"s|[^/]*//\([^#]*#\)\?\([^:/]*\).*|\2|" - this is your original regex, but using | as delimiter rather than / (gives you less quoting)
's|\.$||' - drop any accidental trailing dot (example.com. is a valid hostname!)
's|\.com$||' - remove trailing .com
's|\.co\.[a-zA-Z]*$||' - remove trailing .co.uk, .co.nl,...
's|.*\.\([^.]*\.[^.]*\)|\1|' - remove all components from the hostname except for the last two (this is basically your awk-script)
How about doing it entirely within bash:
if [[ $WEBSITE =~ ^(.*)([.]co)[.][a-z]+$ || $WEBSITE =~ ^(.*)[.][a-z]+$ ]]
then
pass=${BASH_REMATCH[1]}
else
echo "WARNING: Unexpected value for WEBSITE: $WEBSITE"
pass=$WEBSITE # Fallback
fi
I used two clauses (one for the .co case and one for all other cases) because bash regular expressions do not support non-greedy matching (i.e. .*?).
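The two-clause regex above can also be combined with parameter expansion so that the whole URL-to-entry-name step stays inside bash, without sed or awk. The following is only a sketch: the function name site_name_from_url is hypothetical, and it assumes the pass entry is always the last host label left after the TLD (or .co.xx pair) is removed.

```bash
#!/usr/bin/env bash

# Hypothetical helper: reduce a URL to a bare site name such as "github".
site_name_from_url() {
  local host=$1
  host=${host#*://}     # drop the scheme, if present
  host=${host%%[/?#]*}  # drop path, query string and fragment
  host=${host##*@}      # drop user:password@, if present
  host=${host%%:*}      # drop the port
  host=${host%.}        # drop a trailing dot
  # Same two clauses as above: first the .co.xx case, then any single TLD.
  if [[ $host =~ ^(.*)[.]co[.][a-z]+$ || $host =~ ^(.*)[.][a-z]+$ ]]; then
    host=${BASH_REMATCH[1]}
  fi
  printf '%s\n' "${host##*.}"  # keep only the last remaining label
}

site_name_from_url 'http://www.test.example.com:3030/index.php'
site_name_from_url 'https://www.bbc.co.uk/news'
```

In the Termux script this would replace the sed/awk pipeline, for example WEBSITE=$(site_name_from_url "$1"). Suffixes with a different shape, such as .com.au, are not covered by these two clauses.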
I suggest a very simple modification: add a grep command to the pipeline, like this:
WEBSITE=$(echo $1 | grep -vE ".com|.uk" | sed -e "s/[^/]*\/\/\([^#]*#\)\?\([^:/]*\).*/\2/" | awk -F. '{print $(NF-1) "." $NF}')
test -z $WEBSITE && exit 1 # if empty (.com or .uk generates an empty variable)
$ cat > toto
WEBSITE=$(echo $1 | grep -vE ".com|.uk" | sed -e "s/[^/]*\/\/\([^#]*#\)\?\([^:/]*\).*/\2/" | awk -F. '{print $(NF-1) "." $NF}')
test -z $WEBSITE && exit 1
echo $WEBSITE
With an example:
$ bash toto http://www.google.fr
google.fr
$ bash toto http://www.google.com
$ bash toto http://www.google.uk
$ bash toto http://www.google.gertrude
google.gertrude
$ rm toto
$
I used .uk in my example so do not just copy/paste the line.

Grep list of files returned from git status

So I am trying to clean up a script that uses git status to get the relative path of each currently staged file. I have looked at the documentation for git status to see if there is a way to make it return the absolute path of each staged file, but there doesn't seem to be an option for that, and neither git ls-files nor git diff will work here since the specific use case is during a merge.
During the merge, using git diff returns nothing, while git status does show all of the staged files, so I am stuck using git status.
From this list, I then want to grep through the files to extract any line(s) that contain the string "Path: " and output them. Basically, I have a list of staged .yml files and I want to extract all changes to the Path property in those ymls. Here's what I have so far:
IFS=$'\n'
for file in `git status -s -uno | sed s/^..//`
do
relativePath=$(echo $file | sed 's/^[ \t]*//;s/[ \t]*$//' | tr -d '"')
startPath=`pwd`
grep "Path: " "$startPath/$relativePath"
done
unset IFS
Explanation:
git status -s -uno | sed s/^..//
I am piping the result of git status into sed to remove any extra whitespace
relativePath=$(echo $file | sed 's/^[ \t]*//;s/[ \t]*$//' | tr -d '"')
I echo the file path and pipe it into sed to remove any extra spaces that weren't removed from the initial sed call in the start of the for loop. I then pipe that into tr to remove the first and last double quotes from the string since I need to combine that relative path with my starting path in order to get the complete path.
startPath=`pwd`
grep "Path: " "$startPath/$relativePath"
Store the current working directory in a variable, then combine it with our relative path and pass that into grep
This script works and extracts the data that I need, but I feel like there is a much cleaner way I could be doing this. Is there a way I can get git status to return the full path of each staged file so I don't need the second $startPath variable that I combine with $relativePath before passing it to grep?
The simplest (correct) way to do this is by using git grep combined with git ls-files. The latter is used as a selector for grep.
Recursive search of modified tracked files using a pattern:
git grep -r 'pattern' -- `git ls-files -m`
Recursive search of all tracked files using a pattern:
git grep -r "pattern" .
Note that this grep search doesn't cover untracked files. You must add them first so that git can see them.
Since you probably call git status from inside the repo, all paths will be relative to $PWD, so you can just add it in place, yes?
$: git status -s | sed "s#^...#$PWD/#"
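The same idea can be written as a small filter that also strips the double quotes Git puts around paths containing spaces. This is a sketch under a few assumptions: the function name absolute_status_paths is made up, the input is the short format (two status columns, one space, then the path), and rename lines ("old -> new") and backslash escapes inside quoted paths are not handled.

```bash
#!/usr/bin/env bash

# Hypothetical helper: read `git status -s` lines on stdin and print
# each path prefixed with the directory given as the first argument.
absolute_status_paths() {
  local base=$1 line path
  while IFS= read -r line; do
    path=${line#???}   # drop the two status columns and the separating space
    path=${path#\"}    # drop a leading double quote, if Git added one
    path=${path%\"}    # drop the matching trailing double quote
    printf '%s/%s\n' "$base" "$path"
  done
}

# Sample input standing in for real `git status -s -uno` output.
printf 'M  config/app.yml\nA  "my dir/db.yml"\n' | absolute_status_paths /repo
```

In the original loop this would be used as git status -s -uno | absolute_status_paths "$PWD", with each output line passed to grep "Path: ".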

What does the hash sign before a dollar sign (#$VAR) do in a sed string in a shell script?

What does #$VAR mean in Shell? I don't get the use of # in this case.
I encountered the following shell file while working on my dotfiles repo
#!/usr/bin/env bash
KEY="$1"
VALUE="$2"
FILE="$3"
touch "$FILE"
if grep -q "$1=" "$FILE"; then
sed "s#$KEY=.*#$KEY=\"$VALUE\"#" -i "$FILE"
else
echo "export $KEY=\"$VALUE\"" >> "$FILE"
fi
and I'm struggling with understanding the sed "s#$KEY=.*#$KEY=\"$VALUE\"#" -i "$FILE" line, especially the use of #.
When using sed, you do not have to use the / character as the delimiter for the substitute command.
Other characters, such as # or %, work just as well:
echo A | sed s/A/B/
echo A | sed s#A#B#
echo A | sed s%A%B%
In the command
sed "s#$KEY=.*#$KEY=\"$VALUE\"#" -i "$FILE"
the character # is used as a delimiter in the s command of sed. The general form of the s (substitute) command is
s<delim><searchPattern><delim><replaceString><delim>[<flags>]
where the most commonly used <delim> is /, but other characters are sometimes used, especially when either <searchPattern> or <replaceString> contain (or might contain) slashes.
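To see why # is a sensible choice in that script, here is a small sketch where the value is a path full of slashes. The variable contents are invented for the example, and the input comes from printf rather than a file so nothing is edited in place.

```bash
#!/usr/bin/env bash

# Example values (assumptions for this demo, not from the original script).
KEY=EDITOR_HOME
VALUE=/usr/local/bin
line='EDITOR_HOME="old"'

# With # as the delimiter, the slashes in $VALUE need no escaping.
result=$(printf '%s\n' "$line" | sed "s#$KEY=.*#$KEY=\"$VALUE\"#")
printf '%s\n' "$result"
```

The same substitution written with / would fail as soon as $VALUE contained a slash. The trade-off is that a value containing # would now break the command instead.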

How to use awk inside git --msg-filter

I am trying to clean up some git history, for instance by trimming all lines in my commit messages. I need to be able to do something like:
git filter-branch -f --msg-filter 'cat | awk '{$1=$1;print}'' HEAD
This, of course, fails because of my bad use of apostrophes.
It does not work either if I try to escape them or use double apostrophes.
As an example of what I need to process take this:
Add cool service to application
Related: ISSUE-3
This is the result of appending the related issue identifier at the end of the commit message and removing it from my summary line. Note the space(s) at the beginning of the commit summary. It is mostly those commit summaries that I want to trim with awk.
Can anybody help me with my limited bash skills?
Thanks in advance
In the following command :
git filter-branch -f --msg-filter 'cat | awk '{$1=$1;print}'' HEAD
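the expression between the innermost single quotes is not actually quoted: the first inner quote closes the outer string, so the shell itself expands $1 and treats ; as a command separator before awk ever runs. Escaping each inner quote as '\'' keeps the awk program intact:
git filter-branch -f --msg-filter 'cat | awk '\''{$1=$1;print}'\' HEAD
To see what the shell actually passes along, put echo at the beginning of each line: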
echo git filter-branch -f --msg-filter 'cat | awk '{$1=$1;print}'' HEAD
echo git filter-branch -f --msg-filter 'cat | awk '\''{$1=$1;print}'\' HEAD
or, to see each argument separately, use printf "'%s'\n":
printf "'%s'\n" git filter-branch -f --msg-filter 'cat | awk '\''{$1=$1;print}'\' HEAD
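The quoting can be checked without rewriting any history by storing the filter in a variable and feeding it a sample message. This is only a sketch: the variable name filter is arbitrary, and sh -c stands in for the way filter-branch evaluates the string.

```bash
#!/usr/bin/env bash

# The '\'' sequence ends the single-quoted string, adds a literal quote,
# and (except at the very end) starts a new single-quoted string.
filter='cat | awk '\''{$1=$1;print}'\'

# Show the string exactly as the msg-filter would receive it.
printf '%s\n' "$filter"

# Run it on a sample commit summary with leading spaces.
printf '   Add cool service to application\n' | sh -c "$filter"
```

Note that assigning $1=$1 makes awk rebuild each line with single spaces between fields, so runs of whitespace inside a line are collapsed as well as trimmed.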

Git config using shell command

I have an alias that does a short status, parses it with sed, then adds the files to the 'assume-unchanged' index of git.
However, the issue seems to be a simple problem with my understanding of escaping single quotes in OS X bash.
irm = !sh -c 'git ignore $(git st --short -u | sed '\''/^ D/s/^ D//g'\'')'
This is the full line in gitconfig. I can issue the command in the shell (with sh and the quotes), but I get "bad git config" when I try to run it via git irm.
Based on the advice below, I have configured this a little differently. However, it still doesn't work in gitconfig, so I added this to my ~/.profile instead:
alias irm="git ignore $(git st --short | grep '^ D' | sed 's/^ D //')"
You should be able to use double quotes, but you'll have to escape them:
irm = !sh -c 'git ignore $(git st --short -u | sed \"s/^ D//\")'
You don't need to select the line since the operation is the same as the selection. You may want to use -n and p with sed as Chris suggests in the comment if you only want to output the lines that match and exclude any others.
Also, since the pattern is anchored you don't need the global option.
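The -n and p combination mentioned above can be tried on canned input before touching the alias. In this sketch the sample status lines are invented, and the pattern includes the space after " D" so that the printed path has no leading blank.

```bash
#!/usr/bin/env bash

# Sample text standing in for `git status --short -u` output.
status_output=' D removed.txt
 M changed.txt
?? new.txt'

# -n suppresses the default printing; the p flag prints only lines
# where the substitution actually happened.
deleted=$(printf '%s\n' "$status_output" | sed -n 's/^ D //p')
printf '%s\n' "$deleted"
```

Only removed.txt is printed here. Without -n and p, the modified and untracked lines would pass through unchanged and end up as arguments to the alias.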
