Bash text manipulation: I can extract a domain from a URL; how would I extend this to also exclude ".com", ".co.uk", etc.?

"Get a domain from a URL" is quite a common question on this site, and the answer I have used for a long time comes from this question:
How to extract domain name from url?
The most popular answer has a comment from user "sakumatto" that also handles subdomains:
echo http://www.test.example.com:3030/index.php | sed -e "s/[^/]*\/\/\([^#]*#\)\?\([^:/]*\).*/\2/" | awk -F. '{print $(NF-1) "." $NF}'
How would I further extend this command to exclude ".com", ".co.uk", etc.?
Background:
I am writing a bash script for an amazing feature of Termux (a terminal emulator for Android) called "termux-url-opener", which lets you write a script that is launched when you use Android's native "share" feature. Let's say I'm in the browser and GitHub wants me to log in: I press "share" and select "Termux"; Termux opens, runs the script, copies the password to my clipboard and closes, and I'm automatically back in the browser with my password ready to paste!
It's very simple and uses pass (password-store) with the pass-clip extension, gnupg and pinentry. Here is what I have so far. It works fine, but currently it's dumb: I would have to keep writing if/elif statements for every password I have in pass, so I would like to automate things. All I need is to cut ".com", ".co.uk", etc.
Here is my script so far:
#!/data/data/com.termux/files/usr/bin/bash
URL="$1"
WEBSITE=$(echo "$URL" | sed -e "s/[^/]*\/\/\([^#]*#\)\?\([^:/]*\).*/\2/" | awk -F. '{print $(NF-1) "." $NF}')
if [[ $WEBSITE =~ "github" ]]; then
# website contains "github"
pass -c github
elif [[ $WEBSITE =~ "codeberg" ]]; then
# website contains "codeberg"
pass -c codeberg
else
# is another app or website, so list all passwords entries.
pass clip --fzf
fi
Since my pass entries are just website names, e.g. "github" or "codeberg", if I could cut the ".com" or ".co.uk" from the end, I could add something like:
PASSWORDS=$(pass ls)
Then I can check whether the site from "$1" (my shared URL) is listed in pass ls, which saves me from writing:
elif [[ $WEBSITE =~ "codeberg" ]]; then
For every single entry in pass.
Thank you, it's really appreciated!

I might be missing something, but why don't you just strip the offending TLDs from the hostname?
As in:
sed \
-e "s|[^/]*//\([^#]*#\)\?\([^:/]*\).*|\2|" \
-e 's|\.$||' \
-e 's|\.com$||' \
-e 's|\.co\.[a-zA-Z]*$||' \
-e 's|.*\.\([^.]*\.[^.]*\)|\1|'
"s|[^/]*//\([^#]*#\)\?\([^:/]*\).*|\2|" - this is your original regex, but using | rather than / as the delimiter (so the slashes need no escaping)
's|\.$||' - drop any accidental trailing dot (example.com. is a valid hostname!)
's|\.com$||' - remove trailing .com
's|\.co\.[a-zA-Z]*$||' - remove trailing .co.uk, .co.nl,...
's|.*\.\([^.]*\.[^.]*\)|\1|' - remove all components from the hostname except for the last two (this is basically your awk-script)
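A minimal sketch of the same sed idea wrapped in a function, with two assumptions that go beyond the answer above: the TLD rule is generalized to strip an optional `.co` plus any single final label (so `.org`, as in codeberg.org, is handled too), and the last step keeps only the final remaining label, because keeping two labels would turn `www.github.com` into `www.github`. The name `site_from_url` is hypothetical, and `\?` inside a basic regular expression is a GNU sed extension.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a URL to a bare site name with one sed call.
# Requires GNU sed, because "\?" in a basic regular expression is a GNU extension.
site_from_url() {
  printf '%s\n' "$1" | sed \
    -e 's|[^/]*//\([^#]*#\)\?\([^:/]*\).*|\2|' \
    -e 's|\.$||' \
    -e 's|\(\.co\)\?\.[a-zA-Z]*$||' \
    -e 's|.*\.||'
}
# What each expression does:
#   1. keep only the hostname (drop scheme, port and path)
#   2. drop a trailing dot ("example.com.")
#   3. drop ".co.xx", or otherwise the single final label (".com", ".org", ...)
#   4. keep only the last remaining label ("www.github" -> "github")
```

Usage: `WEBSITE=$(site_from_url "$1")`. Two-part suffixes other than `.co.xx` (for example `.com.au`) would still need a rule of their own.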

How about doing it entirely within bash:
if [[ $WEBSITE =~ ^(.*)([.]co)[.][a-z]+$ || $WEBSITE =~ ^(.*)[.][a-z]+$ ]]
then
pass=${BASH_REMATCH[1]}
else
echo "WARNING: Unexpected value for WEBSITE: $WEBSITE" >&2
pass=$WEBSITE # Fallback
fi
I used two clauses (one for the .co case and one for all other cases), because bash regular expressions (POSIX extended regular expressions) do not support non-greedy matching (i.e. .*?).
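A minimal sketch that takes this bash-only approach from the raw URL all the way to the lookup the question describes. Assumptions: the helper names `site_name` and `has_entry` are hypothetical, and instead of parsing the tree-formatted output of `pass ls`, the lookup relies on pass's default layout, where an entry named `github` is the file `github.gpg` inside `$PASSWORD_STORE_DIR` (default `~/.password-store`). Entries in subfolders are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a bare site name from a URL with bash built-ins only.
site_name() {
  local host=$1
  host=${host#*://}       # drop the scheme ("https://")
  host=${host%%[/?#]*}    # drop path, query string and fragment
  host=${host##*@}        # drop any "user@" part
  host=${host%%:*}        # drop the port (":3030")
  host=${host%.}          # drop a trailing dot ("example.com.")
  # Strip ".co.xx" first, otherwise just the final label (".com", ".org", ...).
  if [[ $host =~ ^(.*)[.]co[.][a-z]+$ || $host =~ ^(.*)[.][a-z]+$ ]]; then
    host=${BASH_REMATCH[1]}
  fi
  printf '%s\n' "${host##*.}"   # keep the last remaining label ("www.github" -> "github")
}

# Hypothetical helper: succeed if pass has a top-level entry with exactly this name.
# Assumes the default store layout: <store>/<name>.gpg
has_entry() {
  local store=${PASSWORD_STORE_DIR:-$HOME/.password-store}
  [[ -n $1 && -f $store/$1.gpg ]]
}

# Possible wiring in termux-url-opener (not executed here):
#   name=$(site_name "$1")
#   if has_entry "$name"; then pass -c "$name"; else pass clip --fzf; fi
```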

I propose working around it with a very simple modification: add a grep command, like this:
WEBSITE=$(echo "$1" | grep -vE '\.com|\.uk' | sed -e "s/[^/]*\/\/\([^#]*#\)\?\([^:/]*\).*/\2/" | awk -F. '{print $(NF-1) "." $NF}')
test -z "$WEBSITE" && exit 1 # if empty (a .com or .uk URL produces an empty variable)
$ cat > toto
WEBSITE=$(echo "$1" | grep -vE '\.com|\.uk' | sed -e "s/[^/]*\/\/\([^#]*#\)\?\([^:/]*\).*/\2/" | awk -F. '{print $(NF-1) "." $NF}')
test -z "$WEBSITE" && exit 1
echo "$WEBSITE"
With an example:
$ bash toto http://www.google.fr
google.fr
$ bash toto http://www.google.com
$ bash toto http://www.google.uk
$ bash toto http://www.google.gertrude
google.gertrude
$ rm toto
$
Note that I used .uk in my example, so adapt the pattern to your needs rather than just copy/pasting the line.

Related

Script does not work with ` (backticks) but works as a single command

In my bash script, the whole thing doesn't work when I use backticks (`).
My script is:
#!/bin/bash
yesterday=$(date --date "$c days ago" +%F)
while IFS= read -r line
do
dir=$(echo $line | awk -F, '{print $1 }')
country=$(echo $line | awk -F, '{print $2 }')
cd path/$dir
cat `ls -v | grep email.csv` > e.csv
done < "s.csv"
The output above is blank.
If I use double quotes ("") instead, the output is "No such file or directory".
But if I run just this one line in the terminal, it works:
cat `ls -v | grep email.csv` > e.csv
I also tried with /, but that didn't work either...
You should generally avoid ls in scripts.
Also, you should generally prefer the modern POSIX $(command substitution) syntax like you already do in several other places in your script; the obsolescent backtick `command substitution` syntax is clunky and somewhat more error-prone.
If this works in the current directory but fails in others, it means that you have a file matching the regex in the current directory, but not in the other directory.
Anyway, the idiomatic way to do what you appear to be attempting is simply
cat *email?csv* >e.csv
If you meant to match a literal dot, that's \. in a regular expression. The ? is a literal interpretation of what your grep actually did; but in the following, I will assume you actually meant to match *email.csv* (or in fact probably even *email.csv without a trailing wildcard).
If you want to check if there are any files, and avoid creating e.csv if not, that's slightly tricky; maybe try
for file in *email.csv*; do
test -e "$file" || break
cat *email.csv* >e.csv
break
done
Alternatively, look into the nullglob feature of Bash. See also Check if a file exists with wildcard in shell script.
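A minimal sketch of the nullglob route, assuming Bash: with `nullglob` set, a pattern that matches nothing expands to nothing instead of to itself, so an empty array means there is nothing to concatenate and e.csv is never created. The function name `concat_email_csvs` is hypothetical. Note that glob results come back sorted alphabetically (per the current locale), not in the version order that `ls -v` produced.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate every *email.csv* file in directory $1
# into $1/e.csv, but only if at least one such file exists.
concat_email_csvs() {
  local dir=$1
  local -a files
  local saved
  # Remember the caller's nullglob setting ("shopt -p" exits non-zero when it is off).
  saved=$(shopt -p nullglob) || true
  shopt -s nullglob              # unmatched globs now expand to nothing
  files=("$dir"/*email.csv*)
  eval "$saved"                  # restore the caller's setting
  if (( ${#files[@]} == 0 )); then
    return 1                     # no matches: do not create e.csv
  fi
  cat -- "${files[@]}" > "$dir/e.csv"
}
```

In the question's loop this would be called as `concat_email_csvs "path/$dir"`, which also removes the need to cd.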
On the other hand, if you just want to check whether email.csv exists, without a wildcard, that's easy:
if [ -e email.csv ]; then
cat email.csv >e.csv
fi
In fact, that can even be abbreviated down to
test -e email.csv && cat email.csv >e.csv
As an aside, read can perfectly well split a line into tokens.
#!/bin/bash
yesterday=$(date --date "$c days ago" +%F)
while IFS=, read -r dir country _
do
cd "path/$dir" # notice proper quoting, too
cat *email.csv* > e.csv
# probably don't forget to cd back
cd ../..
done < "s.csv"
If this is in fact all your script does, probably do away with the silly and slightly error-prone cd:
while IFS=, read -r dir country _
do
cat "path/$dir/"*email.csv* > "path/$dir/e.csv"
done < "s.csv"
See also When to wrap quotes around a shell variable.
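To illustrate the point about read splitting, here is a made-up line of s.csv (hypothetical data): each comma-separated field goes into the next variable, and the last variable receives whatever is left over, commas included. That is why the throwaway `_` at the end of `IFS=, read -r dir country _` keeps any extra columns out of `country`.

```shell
#!/usr/bin/env bash
# Hypothetical input line with more fields than variables.
line='reports,fi,extra,columns'

IFS=, read -r dir country _ <<< "$line"
printf 'dir=%s country=%s\n' "$dir" "$country"   # dir=reports country=fi

# Without the trailing "_", the last variable swallows the rest of the line.
IFS=, read -r dir country <<< "$line"
printf 'country=%s\n' "$country"                  # country=fi,extra,columns
```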

grep: compare string from file with another string

I have a list of file paths that I need to compare with a string:
git_root_path=$(git rev-parse --show-toplevel)
list_of_files=.git/ForGeneratingSBConfigAlert.txt
cd $git_root_path
echo "These files needs new OSB config:"
while read -r line
do
modfied="$line"
echo "File for compare: $modfied"
if grep -qf $list_of_files" $modfied"; then
echo "Found: $modfied"
fi
done < <(git status -s | grep -v " M" | awk '{if ($1 == "M") print $2}')
$modfied is a string variable that stores the path to a file.
Pattern file example:
SVCS/resources/
SVCS/bus/projects/busCallout/
SVCS/bus/projects/busconverter/
SVCS/bus/projects/Resources/ (ignore .jar)
SVCS/bus/projects/Teema/
SVCS/common/
SVCS/domain/
SVCS/techutil/src/
SVCS/tech/mds/src/java/fi/vr/h/service/tech/mds/exception/
SVCS/tech/mds/src/java/fi/vr/h/service/tech/mds/interfaces/
SVCS/app/cashmgmt/src/java/fi/vr/h/service/app/cashmgmt/exception/
SVCS/app/cashmgmt/src/java/fi/vr/h/service/app/cashmgmt/interfaces/
SVCS/app/customer/src/java/fi/vr/h/service/app/customer/exception/
SVCS/app/customer/src/java/fi/vr/h/service/app/customer/interfaces/
SVCS/app/etravel/src/java/fi/vr/h/service/app/etravel/exception/
SVCS/app/etravel/src/java/fi/vr/h/service/app/etravel/interfaces/
SVCS/app/hermes/src/java/fi/vr/h/service/app/hermes/exception/
SVCS/app/hermes/src/java/fi/vr/h/service/app/hermes/interfaces/
SVCS/app/journey/src/java/fi/vr/h/service/app/journey/exception/
SVCS/app/journey/src/java/fi/vr/h/service/app/journey/interfaces/
SVCS/app/offline/src/java/fi/vr/h/service/app/offline/exception/
SVCS/app/offline/src/java/fi/vr/h/service/app/offline/interfaces/
SVCS/app/order/src/java/fi/vr/h/service/app/order/exception/
SVCS/app/order/src/java/fi/vr/h/service/app/order/interfaces/
SVCS/app/payment/src/java/fi/vr/h/service/app/payment/exception/
SVCS/app/payment/src/java/fi/vr/h/service/app/payment/interfaces/
SVCS/app/price/src/java/fi/vr/h/service/app/price/exception/
SVCS/app/price/src/java/fi/vr/h/service/app/price/interfaces/
SVCS/app/product/src/java/fi/vr/h/service/app/product/exception/
SVCS/app/product/src/java/fi/vr/h/service/app/product/interfaces/
SVCS/app/railcar/src/java/fi/vr/h/service/app/railcar/exception/
SVCS/app/railcar/src/java/fi/vr/h/service/app/railcar/interfaces/
SVCS/app/reservation/src/java/fi/vr/h/service/app/reservation/exception/
SVCS/app/reservation/src/java/fi/vr/h/service/app/reservation/interfaces/
kraken_test.txt
namaker_test.txt
shmaker_test.txt
I need to compare the search patterns in that file with a string. Is it possible using grep?
I'm not sure I understand the overall logic, but a few immediate suggestions come to mind.
You can avoid grep | awk in the vast majority of cases.
A while loop with a grep on a line at a time inside the loop is an antipattern. You probably just want to run one grep on the whole input.
Your question would still benefit from an explanation of what you are actually trying to accomplish.
cd "$(git rev-parse --show-toplevel)"
git status -s | awk '!/ M/ && $1 == "M" { print $2 }' |
grep -Fxf .git/ForGeneratingSBConfigAlert.txt
I was trying to think of a way to add back your human-readable babble, but on second thought, this program is probably better without it.
The -x option to grep might be wrong, depending on what you are really hoping to accomplish.
This should work:
git status -s | grep -v " M" | awk '{if ($1 == "M") print $2}' | \
grep --file=.git/ForGeneratingSBConfigAlert.txt --fixed-strings --line-regexp
Piping the awk output directly to grep avoids the while loop entirely. In most cases you'll find you don't really need to print debug messages and the like in it.
--file takes a file with one pattern to match per line.
--fixed-strings avoids treating any characters in the patterns as special.
--line-regexp anchors the patterns so that they only match if a full line of input matches one of the patterns.
All that said, could you clarify what you are actually trying to accomplish?
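If the goal is to flag modified files that live under one of the listed directories (an assumption; the question does not say so), an exact line match with -x will not do, because SVCS/resources/foo.xml is not equal to SVCS/resources/. A minimal sketch with awk that treats every non-empty line of the pattern file as a literal path prefix; `match_prefixes` is a hypothetical name. Lines such as `SVCS/bus/projects/Resources/ (ignore .jar)` would be taken literally, annotation included, so they would need cleaning up first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read paths on stdin and print each one that starts
# with any non-empty line of the pattern file given as $1 (literal prefixes).
match_prefixes() {
  awk '
    NR == FNR { if (length($0)) prefix[++n] = $0; next }   # first file: load prefixes
    {
      for (i = 1; i <= n; i++)
        if (index($0, prefix[i]) == 1) { print; next }      # literal prefix match
    }
  ' "$1" -
}

# Possible use from the repository root (not executed here):
#   git status -s | awk '!/ M/ && $1 == "M" { print $2 }' |
#     match_prefixes .git/ForGeneratingSBConfigAlert.txt
```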

Regex multiline output variable in if clause

Consider the following on a debian based system:
VAR=$(dpkg --get-selections | awk '{print $1}' | grep linux-image)
This stores a list of installed packages with the string "linux-image" in their names. On my system, the output looks like:
linux-image-3.11.0-17-generic
linux-image-extra-3.11.0-17-generic
linux-image-generic
Now as we all know
echo $VAR
results in
linux-image-3.11.0-17-generic linux-image-extra-3.11.0-17-generic linux-image-generic
and
echo "$VAR"
results in
linux-image-3.11.0-17-generic
linux-image-extra-3.11.0-17-generic
linux-image-generic
I do not want to use external commands in a if clause, it seems rather dirty and not very elegant, so I wanted to use bash built in regex matching:
if [[ "$VAR" =~ ^linux-image-g ]]; then
echo "yes"
fi
However, that does not work, since it does not seem to consider multiple lines. How can I match the beginning of a line in a variable?
There's nothing wrong with using an external command as part of the if statement; I would skip the VAR variable altogether and use
if dpkg --get-selections | awk '{print $1}' | grep -q linux-image; then
The -q option to grep suppresses its output, and the if statement uses the exit status of grep directly. You could also drop the grep and test $1 directly in the awk script:
if dpkg --get-selections | awk '$1 ~ /^linux-image/ { found = 1; exit } END { exit !found }'; then
or you can skip awk, since there doesn't seem to be a real need to drop the other fields before calling grep:
if dpkg --get-selections | grep -q '^linux-image'; then
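If you do want to stay with `[[ =~ ]]`: the original attempt fails because `^` only anchors at the start of the whole string (bash does not compile the pattern in a newline-sensitive mode), and the first line here is linux-image-3.11.0-17-generic. A minimal sketch of a workaround: prepend a newline to the text and put a newline where the `^` would be, so the pattern can only match at the start of a line. The function name `any_line_starts_with` is hypothetical, and its second argument is treated as an extended regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if any line of $1 begins with the ERE in $2.
any_line_starts_with() {
  local nl=$'\n'
  local re="${nl}($2)"      # a newline stands in for "^" at every line start
  [[ $nl$1 =~ $re ]]        # $re left unquoted so it is treated as a regex
}

VAR=$'linux-image-3.11.0-17-generic\nlinux-image-extra-3.11.0-17-generic\nlinux-image-generic'
if any_line_starts_with "$VAR" 'linux-image-g'; then
  echo "yes"
fi
```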

How to print user1 from user1#10.129.12.121 using shell scripting or sed

I want to print the name from the entire address with a shell script. So user1#12.12.23.234 should output "user1", and similarly 11234#12.123.12.23 should output 11234.
Reading from the terminal:
$ IFS=# read user host && echo "$user"
<user1#12.12.23.234>
user1
Reading from a variable:
$ address='user1#12.12.23.234'
$ cut -d# -f1 <<< "$address"
user1
$ sed 's/#.*//' <<< "$address"
user1
$ awk -F# '{print $1}' <<< "$address"
user1
Using shell parameter expansion:
EMAIL='user#server.com'
echo "${EMAIL%#*}"
This is done by the shell itself, so it is probably faster, since it doesn't fork a process to handle the editing. The ${var%pattern} form is specified by POSIX, so it also works in plain sh, not only in Bash.
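A small illustration of the two suffix-removal forms: `${var%#*}` removes the shortest `#...` suffix and `${var%%#*}` the longest. They only differ when an address contains more than one `#`, and in that case the `%%` form gives the same result as the `cut`, `awk`, `sed` and `expr` answers. The helper name `user_part` is hypothetical.

```shell
#!/usr/bin/env bash
address='user1#12.12.23.234'
printf '%s\n' "${address%#*}"    # user1
printf '%s\n' "${address%%#*}"   # user1

odd='a#b#c'
printf '%s\n' "${odd%#*}"        # a#b  (shortest "#..." suffix removed)
printf '%s\n' "${odd%%#*}"       # a    (longest "#..." suffix removed)

# Hypothetical helper: everything before the first "#".
user_part() {
  printf '%s\n' "${1%%#*}"
}
```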
Using sed:
echo "$EMAIL" | sed -e 's/#.*//'
This tells sed to replace the # character and as many characters that it can find after it up to the end of line with nothing, ie. removing everything after the #.
This option is probably better if you have multiple emails stored in a file, then you can do something like
sed -e 's/#.*//' emails.txt > users.txt
Hope this helps =)
I tend to use expr for this kind of thing:
address='user1#12.12.23.234'
expr "$address" : '\([^#]*\)'
This is a use of expr for its pattern matching and extraction abilities. Translated, the above says: please print out the longest prefix of $address that doesn't contain a #.
The expr tool is covered by POSIX, so this should be pretty portable.
As a note, some historical versions of expr will interpret an argument with a leading - as an option. If you care about guarding against that, you can add an extra letter to the beginning of the string, and just avoid matching it, like so:
expr "x$address" : 'x\([^#]*\)'

Bash grep variable from multiple variables on a single line

I am using GNU bash, version 4.2.20(1)-release (x86_64-pc-linux-gnu). I have a music file list I dumped into a variable: $pltemp.
Example:
/Music/New/2010s/2011;Ziggy Marley;Reggae In My Head
I wish to grep for the 3rd field above in Master-Music-List.txt, then run another grep for the 2nd field. If both match, print the line; otherwise echo "Not Matched".
So the above will search for the Song Title (Reggae In My Head), then will make sure it has the artist "Shaggy" on the same line, for a success.
So far, I have had success with non-variable greps:
$ grep -i -w -E 'shaggy.*angel' Master-Music-MM-Playlist.m3u
$ if ! grep Shaggy Master-Music-MM-Playlist.m3u ; then echo "Not Found"; fi
$ grep -i -w Angel Master-Music-MM-Playlist.m3u | grep -i -w shaggy
I'm not sure how to best construct the 'entire' list to process.
I want to do this on a single line.
I used this to dump the list into the variable $pltemp...
Original: \Music\New\2010s\2011\Ziggy Marley - Reggae In My Head.mp3
$ pltemp="$(cat Reggae.m3u | sed -e 's/\(.*\)\\/\1;/' -e 's/\(.*\)\ -\ /\1;/' -e 's/\\/\//g' -e 's/\\/\//g' -e 's/.mp3//')"
If you really want to "grep this, then grep that", you need something more complex than grep by itself. How about awk?
awk -F';' '$3 ~ /title/ && $2 ~ /artist/ {print; n=1; exit} END {if (!n) print "Not matched"}'
If you want to make this search accessible as a script, the same thing simply changes form. For example:
#!/bin/sh
awk -F';' -v artist="$1" -v title="$2" '$3 ~ title && $2 ~ artist {print; n=1; exit} END {if (!n) print "Not matched"}'
Write this to a file, make it executable, and pipe your list into it, with the artist substring/regex you're looking for as the first command-line argument and the title substring/regex as the second.
On the other hand, what you're looking for might just be a slightly more complex regular expression. Let's wrap it in bash for you:
if ! echo "$pltemp" | grep -E '^[^;]+;[^;]*artist[^;]*;.*title'; then
echo "Not matched"
fi
You can compress this to a single line if you like, make it a stand-alone shell script, or make it a function in your .bashrc file.
awk -F ';' -v title="$title" -v artist="$artist" '$3 ~ title && $2 ~ artist'
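A minimal sketch of the awk approach as a reusable function, assuming the list uses the path;artist;title layout built into $pltemp above. It matches plain substrings case-insensitively with index() and tolower() instead of regexes, so characters such as ( or . in a title need no escaping. The function name `find_song` is hypothetical; note also that awk's -v option processes backslash escapes in the value it assigns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first line of a "path;artist;title" list whose
# artist and title fields contain the given substrings (case-insensitive),
# or "Not matched" if there is none.
find_song() {
  local artist=$1 title=$2 list=$3
  awk -F';' -v artist="$artist" -v title="$title" '
    index(tolower($2), tolower(artist)) && index(tolower($3), tolower(title)) {
      print; found = 1; exit
    }
    END { if (!found) print "Not matched" }
  ' "$list"
}
```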
Well, none of the above worked, so I came up with this...
for i in *.m3u; do
sed 's/.*\\//' "$i" | while IFS= read -r z; do
grep --color=never -i -w -m 1 -- "$z" Master-Music-Playlist.m3u \
|| echo "#NotFound;$z"
done > "$i"-MM-Final.txt
done
Each line (e.g. \Music\Lady Gaga - Paparazzi.mp3) is read, the path is stripped, and the song is searched for in the master music list; if it is not found, "#NotFound;" followed by the song is echoed instead. The output is saved into a new playlist.
Works {Solved}
Thanks anyway.
