need to parse between second underscore and first hyphen of the text using sed - bash

I have an rpm file, e.g. abc_defg_hijd-3.29.0-2_el6_11h.txt.
I need to parse the words between the 2nd underscore _ and first hyphen - of the above text,
so the required output will be hijd.
I managed to do this with sed, but my expression only worked for this one example. My filenames differ slightly, so I would like to match explicitly between the second underscore and the first hyphen.

Use this sed command (on Mac):
sed -E 's/^[^_]*_[^_]*_([^-]*)-.*$/\1/'
OR (on Linux):
sed -r 's/^[^_]*_[^_]*_([^-]*)-.*$/\1/'
Using awk:
awk -F '_' '{sub(/-.*$/, "", $3); print $3}'
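The -E and -r flags only switch sed to extended regular expressions. Here is a minimal sketch of the same substitution in basic regular expression syntax, which needs neither flag and so should behave the same with GNU and BSD sed. The function name field_between is made up for this illustration.

```shell
# Hypothetical helper: reads file names on stdin, one per line, and prints
# the text between the second "_" and the first "-" of each line.
# Lines that do not match the pattern are printed unchanged.
field_between() {
    sed 's/^[^_]*_[^_]*_\([^-]*\)-.*$/\1/'
}

printf '%s\n' 'abc_defg_hijd-3.29.0-2_el6_11h.txt' | field_between
```

Because it reads stdin, the same helper can be fed a whole list of names, for example from ls or find.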

$ foo='abc_defg_hijd-3.29.0-2_el6_11h.txt'
$ bar=${foo%%-*} # remove everything after the first -
$ bar=${bar#*_}; bar=${bar#*_} # remove everything before the second _
$ echo "${bar}"
hijd
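If the same extraction is needed for many file names, the three expansions can be wrapped in a small function. This is only a sketch; the name extract_name and the second sample file name are invented for the example.

```shell
# Hypothetical helper: print the text between the second "_" and the
# first "-" of the string passed as $1, using only parameter expansion.
extract_name() {
    name=${1%%-*}      # drop the first "-" and everything after it
    name=${name#*_}    # drop everything up to and including the first "_"
    name=${name#*_}    # drop everything up to and including the second "_"
    printf '%s\n' "$name"
}

extract_name 'abc_defg_hijd-3.29.0-2_el6_11h.txt'
extract_name 'x_y_longer.name-1.0-1_el7.txt'
```

Note that a name with fewer than two underscores before the first hyphen is not rejected; the expansions that find nothing to remove simply leave the string as it is.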

grep was born to extract:
grep -oP '[^_-]*_\K[^_-]*(?=-)'
example
kent$ echo 'abc_defg_hijd-3.29.0-2_el6_11h.txt'|grep -oP '[^_-]*_\K[^_-]*(?=-)'
hijd
awk is a nuclear bomb for text processing, but it can certainly kill a fly, too:
awk -F- 'split($1,a,"_")&&$0=a[3]'
or shorter (gawk only):
awk -v FPAT="[^-_]*" '$0=$3'
example
kent$ echo 'abc_defg_hijd-3.29.0-2_el6_11h.txt'|awk -F- 'split($1,a,"_")&&$0=a[3]'
hijd
kent$ echo 'abc_defg_hijd-3.29.0-2_el6_11h.txt'|awk -v FPAT="[^-_]*" '$0=$3'
hijd

with GNU sed
echo 'abc_defg_hijd-3.29.0-2_el6_11h.txt' |
sed 's/\([^_]\+_\)\{2\}\([^-]\+\)-.*/\2/g'
hijd

windows batch:
for /f "tokens=3delims=_-" %%i in ("abc_defg_hijd-3.29.0-2_el6_11h.txt") do echo %%i
hijd

Related

How do you remove a section of a file name after underscore including the underscore using bash? [duplicate]

How can I remove all text after a character, in this case a colon (":"), in bash? Can I remove the colon, too? I have no idea how to.
In Bash (and ksh, zsh, dash, etc.), you can use parameter expansion with % which will remove characters from the end of the string or # which will remove characters from the beginning of the string. If you use a single one of those characters, the smallest matching string will be removed. If you double the character, the longest will be removed.
$ a='hello:world'
$ b=${a%:*}
$ echo "$b"
hello
$ a='hello:world:of:tomorrow'
$ echo "${a%:*}"
hello:world:of
$ echo "${a%%:*}"
hello
$ echo "${a#*:}"
world:of:tomorrow
$ echo "${a##*:}"
tomorrow
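The same four operators work with any delimiter. A short sketch with a dot instead of a colon; the file name and variable names are made up for the example:

```shell
file='archive.tar.gz'

without_last_ext=${file%.*}     # shortest match removed from the end
without_any_ext=${file%%.*}     # longest match removed from the end
all_extensions=${file#*.}       # shortest match removed from the start
last_extension=${file##*.}      # longest match removed from the start

printf '%s\n' "$without_last_ext" "$without_any_ext" \
              "$all_extensions" "$last_extension"
```

This should print archive.tar, archive, tar.gz and gz, one per line.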
An example might have been useful, but if I understood you correctly, this would work:
echo "Hello: world" | cut -f1 -d":"
This will convert Hello: world into Hello.
$ echo 'hello:world:again' |sed 's/:.*//'
hello
I know some solutions:
# Our mock data:
A=user:mail:password
With awk and pipe:
$ echo $A | awk -v FS=':' '{print $1}'
user
Via bash variables:
$ echo ${A%%:*}
user
With pipe and sed:
$ echo $A | sed 's#:.*##g'
user
With pipe and grep:
$ echo $A | egrep -o '^[^:]+'
user
With pipe and cut:
$ echo $A | cut -f1 -d\:
user
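If all the fields are needed rather than just the first one, the shell's own read can split the string in one step. A sketch using the same mock data; the variable names user, mail and password are chosen for the example only:

```shell
A=user:mail:password

# IFS=: applies only to this read; the here-document feeds $A to it.
IFS=: read -r user mail password <<EOF
$A
EOF

printf '%s\n' "$user" "$mail" "$password"
```

If the string has more fields than there are variables, the last variable receives the remainder, colons included.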
egrep -o '^[^:]*:'
To trim off everything after the last instance of ":":
grep -o '^.*:' fileListingPathsAndFiles.txt
And if you wanted to drop that last ":":
grep -o '^.*:' file.txt | sed 's/:$//'
@kp123: you'd want to replace : with / (where the sed colon should be \/)
Let's say you have a path with a file in this format:
/dirA/dirB/dirC/filename.file
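Now you only want the directory part. Splitting on "/" gives an empty first field (because the path starts with "/"), followed by dirA, dirB and dirC, so you want fields 1 to 4. Type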
$ echo "/dirA/dirB/dirC/filename.file" | cut -f1-4 -d"/"
and your output will be
/dirA/dirB/dirC
The advantage of using cut is that you can also cut off the last directory as well as the file (in this example), so if you type
$ echo "/dirA/dirB/dirC/filename.file" | cut -f1-3 -d"/"
your output would be
/dirA/dirB
Though you can do the same from the other side of the string, it would not make that much sense in this case as typing
$ echo "/dirA/dirB/dirC/filename.file" | cut -f2-4 -d"/"
results in
dirA/dirB/dirC
In some other cases the last case might also be helpful. Mind that there is no "/" at the beginning of the last output.
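With cut you have to know how many fields the path has. When the depth is not known in advance, the standard dirname utility removes the last component regardless of depth. A small sketch with the same example path; the variable names are made up:

```shell
path='/dirA/dirB/dirC/filename.file'

parent=$(dirname "$path")            # directory that contains the file
grandparent=$(dirname "$parent")     # one level further up

printf '%s\n' "$parent" "$grandparent"
```

This should print /dirA/dirB/dirC followed by /dirA/dirB, matching the two cut examples.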

extract string between '$$' characters - $$extractabc$$

I am working on a shell script and am new to it. I want to extract the string between double $$ characters, for example:
input:
$$extractabc$$
output
extractabc
I tried grep and sed, but could not get it to work. Any suggestions are welcome!
You could do
awk -F"$" '{print $3}' file.txt
assuming the file contained input:$$extractabc$$ output:extractabc. awk splits your data into pieces using $ as a delimiter. First item will be input:, next will be empty, next will be extractabc.
You could use sed like so to get the same info.
sed -e 's/.*\$\$\(.*\)\$\$.*/\1/' file.txt
sed looks for the text between the $$s and outputs it. The pattern is .*\$\$\(.*\)\$\$.* (each $ is escaped so that it is always matched literally, never as an end-of-line anchor). It's greedy, but just stay with me:
.* matches any character zero or more times before the first $$
then the string must contain $$
after that $$ comes any character zero or more times
then the string must contain another $$
and possibly some more characters after it
The part between the two $$s is \(.*\); the string it matches is available in the replacement as \1
sed replaces the whole line with that captured string and prints it
Using grep PCRE (where available) and look-around:
$ echo '$$extractabc$$' | grep -oP "(?<=\\$\\$).*(?=\\$\\$)"
extractabc
echo '$$extractabc$$' | awk '{gsub(/\$\$/,"")}1'
extractabc
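Here is another variation:
echo '$$extractabc$$' | awk -F'[$][$]' 'NF==3 {print $2}'
It tests whether there are two sets of $$ and only then prints what is between them. The single quotes matter: inside double quotes the shell would expand $$ to its own process ID.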
Does also work for input like blabla$$some_data$$moreblabla
How about removing all the $ characters from the input?
$ echo '$$extractabc$$' | sed 's/\$//g'
extractabc
Same with tr
$ echo '$$extractabc$$' | tr -d '$'
extractabc
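Removing every $ also changes text outside the delimiters. As an alternative sketch, parameter expansion can keep only what lies between the first $$ and the next $$, without any external tool. The function name extract_between_dollars is invented for this example.

```shell
# Hypothetical helper: print the text between the first "$$" and the
# following "$$" of the string passed as $1.
extract_between_dollars() {
    rest=${1#*\$\$}         # drop everything up to and including the first $$
    inner=${rest%%\$\$*}    # drop the next $$ and everything after it
    printf '%s\n' "$inner"
}

extract_between_dollars '$$extractabc$$'
extract_between_dollars 'blabla$$some_data$$moreblabla'
```

The backslashes stop the shell from treating $$ as its process ID. If the input contains no $$ at all, the string is printed unchanged.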

How to do basename on second field in a file and replace it in line

I'm trying to get something like basename of second field in a file and replace it:
$ myfile=/var/lib/jenkins/myjob/myfile
$ sha512sum "$myfile" | tee myfile-checksum
$ cat myfile-checksum
deb32b1c7122fc750a6742765e0e54a821 /var/lib/jenkins/myjob/myfile
Desired output:
deb32b1c7122fc750a6742765e0e54a821 myfile
So people can easily do sha512sum -c myfile-checksum with no manual edits.
With sed or awk, that is how far i made it for now :)
awk -F/ '{print $NF}' myfile-checksum
sed -i "s|${value}|$(basename $value)|" myfile-checksum
Thanks.
You can set the field separators to both spaces and slashes and print the first and last fields:
awk -F" |/" '{print $1, $NF}'
With your input:
$ awk -F" |/" '{print $1, $NF}' <<< "deb32b1c7122fc750a6742765e0e54a821 /var/lib/jenkins/myjob/myfile"
deb32b1c7122fc750a6742765e0e54a821 myfile
In case your filename contains spaces, save the first field and remove everything up to the last slash instead, as indicated by Ed Morton:
$ awk '{hash=$1; gsub(/^.*\//,""); print hash, $0}' <<< "deb32b1c7122fc750a6742765e0e54a821 /var/lib/jenkins/myjob/myfile with spaces"
deb32b1c7122fc750a6742765e0e54a821 myfile with spaces
$ awk 'sub(".*/",$1" ")' <<< "deb32b1c7122fc750a6742765e0e54a821 /var/lib/jenkins/myjob/myfile"
deb32b1c7122fc750a6742765e0e54a821 myfile
This will work for any file name except one that contains newlines. If you have that case let us know.
sha512sum will simply use the file name you've passed to it - unchanged.
If you pass
sha512sum /path/to/file
it will give you:
123456.. /path/to/file
But if you:
pushd /path/to
sha512sum file
popd
it will give you
123456.. file
If the filename is a variable you can use parameter expansion like this:
pushd "${file%/*}"
sha512sum "${file##*/}"
popd
or even
# cd will not change the PWD of the current shell since
# the command runs in a subshell; && skips the checksum if cd fails
(cd "${file%/*}" && sha512sum "${file##*/}")
Given that $file contains the full path to the file, ${file%/*} expands to the path without the filename and ${file##*/} expands to the filename without the path.
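One caveat: ${file%/*} only yields a directory when $file actually contains a slash. For a bare name such as myfile it returns the name unchanged, and the cd would fail. A sketch of a helper that covers that case; the name split_path is made up for this illustration:

```shell
# Hypothetical helper: print the directory part and the file name part
# of the path in $1 on two separate lines.
split_path() {
    case $1 in
        */*) dir=${1%/*}; base=${1##*/} ;;   # path contains a slash
        *)   dir=.;       base=$1 ;;         # bare file name: current directory
    esac
    # An empty dir means the file sits directly under "/".
    printf '%s\n%s\n' "${dir:-/}" "$base"
}

split_path /var/lib/jenkins/myjob/myfile
split_path myfile
```

The first line of output is what you would pass to cd or pushd, the second what you would pass to sha512sum.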

how to remove last comma from line in bash using "sed or awk"

Hi I want to remove last comma from a line. For example:
Input:
This,is,a,test
Desired Output:
This,is,a test
I am able to remove the last comma if it is also the last character of the string, using the command below (however, this is not what I want):
echo "This,is,a,test," |sed 's/,$//'
This,is,a,test
Same command does not work if there are more characters past last comma in line.
echo "This,is,a,test" |sed 's/,$//'
This,is,a,test
I am able to achieve the result in a dirty way by chaining multiple commands. Is there an alternative using an awk or sed regex? (This is what I want.)
echo "This,is,a,test" |rev |sed 's/,/ /' |rev
This,is,a test
$ echo "This,is,a,test" | sed 's/\(.*\),/\1 /'
This,is,a test
$ echo "This,is,a,test" | perl -pe 's/.*\K,/ /'
This,is,a test
In both cases, .* will match as much as possible, so only the last comma will be changed.
You can use a regex that matches not-comma, and captures that group, and then restores it in the replacement.
echo "This,is,a,test" |sed 's/,\([^,]*\)$/ \1/'
Output:
This,is,a test
All the answers so far are based on regex. Here is a non-regex way to remove the last comma:
s='This,is,a,test'
awk 'BEGIN{FS=OFS=","} {$(NF-1)=$(NF-1) " " $NF; NF--} 1' <<< "$s"
This,is,a test
In GNU awk, too, since the question is tagged with it:
$ echo This,is,a,test|awk '$0=gensub(/^(.*),/,"\\1 ","g",$0)'
This,is,a test
One way to do this is by using Bash Parameter Expansion.
$ s="This,is,a,test"
$ echo "${s%,*} ${s##*,}"
This,is,a test
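When the string contains no comma, neither expansion removes anything, so the one-liner prints the string twice with a space in between. A sketch that guards against this; the function name replace_last_comma is invented for the example:

```shell
# Hypothetical helper: replace the last comma of $1 with a space,
# or print $1 unchanged when it contains no comma.
replace_last_comma() {
    case $1 in
        *,*) printf '%s %s\n' "${1%,*}" "${1##*,}" ;;
        *)   printf '%s\n' "$1" ;;
    esac
}

replace_last_comma 'This,is,a,test'
replace_last_comma 'nocomma'
```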

shell command to truncate/cut a part of string

I have a file with the below contents. I got the command to print version number out of it. But I need to truncate the last part in the version file
file.spec:
Version: 3.12.0.2
Command used:
VERSION=($(grep -r "Version:" /path/file.spec | awk '{print ($2)}'))
echo $VERSION
Current output : 3.12.0.2
Desired output : 3.12.0
There is absolutely no need for external tools like awk, sed etc. for this simple task if your shell supports POSIX parameter expansion (which it should). Note that $(<file.spec) below is a bash/ksh/zsh shorthand for $(cat file.spec):
$ cat file.spec
Version: 3.12.0.2
$ version=$(<file.spec)
$ version="${version#* }"
$ version="${version%.*}"
$ echo "${version}"
3.12.0
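$(<file.spec) reads the whole file, so the steps above assume that the file consists of the single Version: line. A sketch for spec files with more lines, still without external tools; the function name spec_version is made up for this illustration:

```shell
# Hypothetical helper: print the version from the first "Version:" line
# of the spec file given as $1, with its last dot-separated part removed.
# usage: spec_version /path/file.spec
spec_version() {
    while IFS= read -r line; do
        case $line in
            Version:*)
                v=${line#Version:}         # drop the "Version:" label
                v=${v#"${v%%[! ]*}"}       # trim leading spaces
                printf '%s\n' "${v%.*}"    # drop the last ".part"
                return 0 ;;
        esac
    done < "$1"
    return 1                               # no Version: line found
}
```

The non-zero return status lets a calling script notice a spec file that has no Version: line.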
Try this:
VERSION=($(grep -r "Version:" /path/file.spec| awk '{print ($2)}' | cut -d. -f1-3))
cut splits the string on the field delimiter (-d), and -f selects the fields you want.
You could use this single awk script awk -F'[ .]' '{print $2"."$3"."$4}':
$ VERSION=$(awk -F'[ .]' '{print $2"."$3"."$4}' /path/file.spec)
$ echo $VERSION
3.12.0
Or this single grep
$ VERSION=$(grep -Po 'Version: \K\d+[.]\d+[.]\d+' /path/file.spec)
$ echo $VERSION
3.12.0
But you never need grep and awk together.
If you only grep a single file, -r makes no sense.
Also, based on the output of your command line, this grep should work:
grep -Po '(?<=Version: )(\d+\.){2}\d+' /path/file.spec
gives you:
3.12.0
\K is also nice; it works for both fixed-length and variable-length prefixes (available since PCRE 7.2). There is another answer about it, but I find look-behind easier to read when the prefix has a fixed length.

Resources