How to print before and after dot text Unix - shell

I am trying to print only a specific part of a sentence like the one below.
The text immediately before and after the dot should be printed.
InputVar="ABC SDFSG XYZ.AFGAJK JKK"
Expected output :
XYZ.AFGAJK
I am using the cut command, but it is not working:
echo "$InputVar" | cut -d'' -f2
Is there any other approach?

Here are a few suggestions. awk with RS set to a space seems easiest. YMMV
$ echo "$InputVar" | cut -d ' ' -f 3
XYZ.AFGAJK
$ echo "$InputVar" | awk '/\./' RS=' '
XYZ.AFGAJK
$ echo "$InputVar" | awk '{for(i=1;i<=NF;i++) if(match($i,"\\.")) print $i}'
XYZ.AFGAJK
$ echo "$InputVar" | sed -n 's/.* \([^ .]*[.][^ .]*\) .*/\1/p'
XYZ.AFGAJK

Using cut:
If you really want to use cut, then you could try:
echo "$InputVar" | cut -d' ' -f3
Which uses a space character as a delimiter (you originally had an empty string, which is not allowed), and extracts field 3 rather than field 2.
Using grep:
You can use grep rather than cut, to match & extract specifically what you want:
echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+'
Explanation:
The -E option is for extended regex
The -o option is for extracting the matched component only
The regex matches a literal ., surrounded by a non-empty sequence of non-space characters
Comparing the two methods:
Either of these will work with your shown example. But, suppose the input string was instead:
InputVar="ABC SDFSG XYZ.AFGAJK JKK XYZ.ABC"
The version using grep would give all the matches (a literal . with non-space characters on either side).
Using cut, however, you would need to list the exact fields you want, and they are printed on a single line, joined by the delimiter:
$ echo "$InputVar" | cut -d' ' -f3,5
XYZ.AFGAJK XYZ.ABC
If you instead wanted just the n-th match, using the grep approach, you could use sed to select the n-th match, e.g.
$ echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+'
XYZ.AFGAJK
XYZ.ABC
$ echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+' | sed '1q;d'
XYZ.AFGAJK
$ echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+' | sed '2q;d'
XYZ.ABC
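The n-th-match idea can also be wrapped in a small function. This is only a sketch: the name nth_dotted_word is made up for illustration, it uses sed -n "Np" instead of the 'Nq;d' form (both print only line N), and it assumes a grep that supports -o (GNU and BSD grep do).

```shell
# nth_dotted_word is a hypothetical helper name, not a standard command.
# Usage: nth_dotted_word N "some text"
nth_dotted_word() {
    # grep -Eo prints each "word.word" match on its own line;
    # sed -n "Np" then prints only the N-th of those lines.
    printf '%s\n' "$2" | grep -Eo '[^ ]+\.[^ ]+' | sed -n "${1}p"
}

InputVar="ABC SDFSG XYZ.AFGAJK JKK XYZ.ABC"
nth_dotted_word 1 "$InputVar"   # prints XYZ.AFGAJK
nth_dotted_word 2 "$InputVar"   # prints XYZ.ABC
```

If there is no N-th match, the function prints nothing.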

Related

How to get version number from string in bash

I have a variable having following format
bundle="chn-pro-X.Y-Z.el8.x86_64"
X,Y,Z are numbers having any number of digits
Ex:
1.0-2 # X=1 Y=0 Z=2
12.45-9874 # X=12 Y=45 Z=9874
How can I grab X.Y and store it in another variable?
EDIT:
I wasn't right with my wording, but
I want to store X.Y into new variable not individual X & Y's
I'm looking to finally have a variable version which has X.Y grabbed from bundle:
version="X.Y"
I would use awk:
bundle="chn-pro-12.45-9874.el8.x86_64"
echo "$bundle" | awk -F "[.-]" '{print $3,$4,$5}'
12 45 9874
Now if you want to assign to x, y, z use read and process substitution:
read -r x y z < <(echo "$bundle" | awk -F "[.-]" '{print $3,$4,$5}')
echo "x=$x, y=$y, z=$z"
x=12, y=45, z=9874
If you just want the value of X.Y as a single value this is still great use for awk:
bundle="chn-pro-12.45-9874.el8.x86_64"
echo "$bundle" | awk -F "[-]" '{print $3}'
12.45
And if you then want to put that into a variable:
x_y=$(echo "$bundle" | awk -F "[-]" '{print $3}')
echo "x_y=$x_y"
x_y=12.45
Or you can use cut in this case to get the third field:
echo "$bundle" | cut -d- -f3
12.45
Like this:
$ bundle="chn-pro-1.0-2.el8.x86_64"
$ X="$(echo "$bundle" | cut -d . -f1 | cut -d- -f3)"
$ Y="$(echo "$bundle" | cut -d . -f2 | cut -d- -f1)"
$ Z="$(echo "$bundle" | cut -d . -f2 | cut -d- -f2)"
$ echo "$X"
1
$ echo "$Y"
0
$ echo "$Z"
2
You can merge X and Y into a single variable:
$ XY="$X.$Y"
$ echo $XY
1.0
Use regex to separate numbers:
numbers=$(echo $bundle | grep -Eo '([0-9]+\.[0-9]+\-[0-9]+)' | sed 's/\./\t/g;s/\-/\t/g')
Then assign them to variables using awk, tr, or cut, whichever you prefer:
X=$(echo $numbers| awk '{print $1}')
Y=$(echo $numbers| awk '{print $2}')
Z=$(echo $numbers| awk '{print $3}')
EDIT
To store X.Y in a single version variable, you can skip the previous commands and simply use:
version=$(echo $bundle | grep -Eo '([0-9]+\.[0-9]+\-[0-9]+)' | grep -Eo '([0-9]+\.[0-9]+)')
Given this input:
$ bundle="chn-pro-12.45-9874.el8.x86_64"
using GNU or BSD sed for -E:
$ foo=$(echo "$bundle" | sed -E 's/.*-([0-9]+\.[0-9]+)-[0-9].*/\1/')
$ echo "$foo"
12.45
or with any sed:
$ foo=$(echo "$bundle" | sed 's/.*-\([0-9][0-9]*\.[0-9][0-9]*\)-[0-9].*/\1/')
$ echo "$foo"
12.45
Assumptions:
the input string will always contain (at least) 3 hyphens
the desired version string will always reside between the 2nd and 3rd hyphens of the input string
we need to maintain the input string (ie, don't clobber/overwrite the variable containing the input string)
We can eliminate the subprocess calls (necessary for echo/sed/grep/awk/sed) by using some parameter expansions:
$ bundle="chn-pro-X.Y-Z.el8.x86_64"
$ temp="${bundle#*-}" # strip off 1st hyphen delimited string
$ echo "${temp}"
pro-X.Y-Z.el8.x86_64
$ temp="${temp#*-}" # strip off 2nd hyphen delimited string
$ echo "${temp}"
X.Y-Z.el8.x86_64
$ version="${temp%%-*}" # save 3rd hyphen delimited string (aka our version)
$ echo "${version}"
X.Y
NOTE: We can eliminate the temp variable by replacing all occurrences of temp with version with the understanding version does not contain what we want until after the 3rd parameter expansion has occurred, eg:
$ bundle="chn-pro-X.Y-Z.el8.x86_64"
$ version="${bundle#*-}"
$ version="${version#*-}"
$ version="${version%%-*}"
$ echo "${version}"
X.Y
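For reuse, the three expansions could be put in a function. A minimal sketch, assuming the same layout as above (version between the 2nd and 3rd hyphens); set_version is a hypothetical name, and it writes to a global variable so that no command substitution is needed:

```shell
# set_version is a hypothetical helper; it stores its result in the global
# variable "version" so that no subshell or external command is involved.
set_version() {
    version="${1#*-}"          # strip through the 1st hyphen
    version="${version#*-}"    # strip through the 2nd hyphen
    version="${version%%-*}"   # drop everything from the next hyphen onward
}

set_version "chn-pro-12.45-9874.el8.x86_64"
echo "$version"   # prints 12.45
```

The caller's input string is left untouched because only the function's own copy ($1) is expanded.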

Count how many words in file test.txt start with “tol”?

I'm new to Linux shell. I know there are tools to do this thing, such as awk. But I'm wondering if I could do it using grep or wc or other commands? awk seems intimidating to me. Thanks.
I tried grep and wc, like this:
grep tol test.txt | wc -w
But grep will give me the whole line.
If I tried the following:
grep '^tol$*' test.txt | wc -w
It only counts the words on lines that begin with tol.
How can I grep the words starting with tol?
Something like this:
grep -o '\<tol[[:alpha:]]*\>' test.txt | wc -w
\< - matches the beginning of a word,
\> - matches the end of a word.
[[:alpha:]] - to avoid match of combinations like tol123 (You said you need only words).
-o - to show only matches, not the entire line.
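To see the effect of each piece, here is a small self-contained check. The sample words are made up for this sketch (tol123 and atoll are included on purpose, since neither should be counted), and \< and \> are assumed to be supported by your grep (they are in GNU grep).

```shell
# Sample data is invented for this sketch.
tmpfile=$(mktemp)
printf '%s\n' 'tolerance topaz tolstoy tol123' \
    'bats toluene toledo atoll' > "$tmpfile"

# -o prints one match per line, so wc -l and wc -w report the same number.
count=$(grep -o '\<tol[[:alpha:]]*\>' "$tmpfile" | wc -l)
count=$((count))    # strip the padding some wc implementations add
echo "$count"       # prints 4 with GNU grep

rm -f "$tmpfile"
```

The four matches are tolerance, tolstoy, toluene and toledo.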
You can do the same fairly simply with awk, e.g.
awk '{for(i=1;i<=NF;i++) $i~/^tol/ && n++} END {print n}'
Example
$ echo -e "tolerance topaz tolstoy\nbats toluene toledo" |
> awk '{for(i=1;i<=NF;i++) $i~/^tol/ && n++} END {print n}'
4
Another option is to translate all whitespace characters into linefeeds so that each word starts on a new line, then grep can count them itself:
echo -e "tolerance topaz\ttolstoy\nbats toluene toledo" | tr '[:space:]' '\n' | grep -c "^tol"
4
Or, if using a file called words.txt:
tr '[:space:]' '\n' < words.txt | grep -c "^tol"

Count the number of whitespaces in a file

File test
musically us
challenged a goat that day
spartacus was his name
ba ba ba blacksheep
grep -oic "[\s]*" test
grep -oic "[ ]*" test
grep -oic "[\t]*" test
grep -oic "[\n]*" test
All give me 4, when I expect 11
grep --version -> grep (BSD grep) 2.5.1-FreeBSD
Running this on OSX Sierra 10.12
Repeating spaces should not be counted as one space.
If you are open to tricks and alternatives you might like this one:
$ awk '{print --NF}' <(tr -d '\n' <file)
11
The solution above counts runs of whitespace between words, so for a string such as 'fifteen-->               <--spaces' awk reports 1, like grep.
If you need to count individual spaces, you can use this:
$ awk -F"[ ]" '{print --NF}' <<<"fifteen-->               <--spaces"
15
$ awk -F"[ ]" '{print --NF}' <<<"  2  4  6  8  10"
10
$ awk -F"[ ]" '{print --NF}' <(tr -d '\n' <file)
11
One step further, to count single spaces and tabs:
$ awk -F"[ ]|\t" '{print --NF}' <(echo -e "  2  4  6  8  10\t12  14")
13
tr is generally better for this (in most cases):
tr -d -C ' ' <file | wc -c
The grep solution relies on the fact that the output of grep -o is newline-separated — it will fail miserably for example in the following type of circumstance where there might be multiple spaces:
v='fifteen-->               <--spaces'
echo "$v" | grep -o -E ' +' | wc -l
echo "$v" | tr -d -C ' ' | wc -c
grep only returns 1, when it should be 15.
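A runnable version of that comparison, as a sketch. The test string is built with printf so the 15 spaces cannot be lost in copy and paste, and the POSIX spelling -c is used for tr (for this input it behaves the same as -C):

```shell
# Build a test string with exactly 15 spaces between the two markers.
v="fifteen-->$(printf '%15s' '')<--spaces"

# grep -o with ' +' emits one line per RUN of spaces.
runs=$(printf '%s\n' "$v" | grep -o -E ' +' | wc -l)
# tr deletes everything except spaces; wc -c counts what is left.
chars=$(printf '%s\n' "$v" | tr -d -c ' ' | wc -c)

echo "runs=$((runs)) chars=$((chars))"   # prints runs=1 chars=15
```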
EDIT: If you wanted to count multiple characters (eg. TAB and SPACE) you could use:
tr -dC $'[ \t]' <<< $'one \t' | wc -c
Just use awk:
$ awk -v RS=' ' 'END{print NR-1}' file
11
or if you want to handle empty files gracefully:
$ awk -v RS=' ' 'END{print NR - (NR?1:0)}' /dev/null
0
The -c option counts the number of lines that match, not individual matches. Use grep -o and then pipe to wc -l, which will count the number of lines.
grep -o ' ' test | wc -l
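The difference is easy to check against the file from the question. A small sketch, assuming the file contains exactly the four lines shown, with single spaces between words:

```shell
# Recreate the question's sample file in a temporary location.
tmpfile=$(mktemp)
printf '%s\n' 'musically us' 'challenged a goat that day' \
    'spartacus was his name' 'ba ba ba blacksheep' > "$tmpfile"

lines=$(grep -c ' ' "$tmpfile")             # lines that contain a space
spaces=$(grep -o ' ' "$tmpfile" | wc -l)    # individual space characters

echo "lines=$((lines)) spaces=$((spaces))"  # prints lines=4 spaces=11
rm -f "$tmpfile"
```

Because the pattern is a single space rather than ' *' or ' +', repeated spaces are counted one by one.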

Retrieve an exact word- Unix

I have a file which contains the same headings for different information. I want to extract the information for one of them. How to do it?
Specifically, I want to extract the number 234874 from /membership_number="ID:234874 for the person named Sarah, but not the same field for John. The number can be anything, so I cannot simply use grep '234874'; I need to extract it without knowing its value in advance.
Try this:
grep -v '^$' <filename> | awk '/Information \/Name="Sarah"/ {getline; getline; print $1}' | cut -d':' -f2 | tr -d '"'
Here:
grep -v '^$' <filename>: This removes the blank lines.
awk '/Information \/Name="Sarah"/ {getline; getline; print $1}': This finds the name and gets the membership line.
cut -d':' -f2 | tr -d '"': This fetches the exact number.
Something like
grep -E "Name=\"Sarah\"" inputfile | grep -Eo "membership_number=\"[^\"]*" | cut -d: -f2
or put things together with
sed -n 's/.*Name="Sarah".*membership_number="ID:\([^"]*\).*/\1/p' inputfile

how to parse a string in Shell script

I want to parse the following string in shell script.
VERSION=2.6.32.54-0.11.def
Here I want to get two value.
first = 263254
second = 11
I am using following to get the first value:
first=`expr substr $VERSION 1 9| sed "s/\.//g" |sed "s/\-//g"`
to get the second:
second=`expr substr $VERSION 10 6| sed "s/\.//g" |sed "s/\-//g"`
Using above code the output is:
first=263254
second=11
The result won't be consistent if the version is changed to:
VERSION=2.6.32.54-0.1.def
Here the second value becomes 1d, but I want the output to be just 1.
How can I directly parse the number after '-' and before '.d'?
$ first=$(echo $VERSION | cut -d- -f1 | sed 's/\.//g')
$ second=$(echo $VERSION | cut -d- -f2 | cut -d. -f2)
$ first=$(echo $VERSION | cut -d- -f1 | tr -d '.')
$ second=$(echo $VERSION | cut -d- -f2 | cut -d. -f2)
$ echo $first
263254
$ echo $second
11
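To confirm that this no longer depends on character positions, the same pipelines can be run against both version strings from the question. A sketch only; parse_version is a hypothetical helper name:

```shell
# parse_version is a hypothetical helper that sets the globals
# "first" and "second" from a version string passed as $1.
parse_version() {
    first=$(printf '%s\n' "$1" | cut -d- -f1 | tr -d '.')
    second=$(printf '%s\n' "$1" | cut -d- -f2 | cut -d. -f2)
}

for v in 2.6.32.54-0.11.def 2.6.32.54-0.1.def; do
    parse_version "$v"
    echo "$v -> first=$first second=$second"
done
# prints:
# 2.6.32.54-0.11.def -> first=263254 second=11
# 2.6.32.54-0.1.def -> first=263254 second=1
```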
You don't need multiple processes (sed|sed|sed...); a single awk process should work.
if you have VERSION=xxxx as string:
to get the first:
awk -F'[-=]' '{gsub(/\./,"",$2)}$0=$2'
to get the second:
awk -F'-|\\.def' '{split($2,a,".")}$0=a[2]'
test:
first:
kent$ echo "VERSION=2.6.32.54-0.1.def"|awk -F'[-=]' '{gsub(/\./,"",$2)}$0=$2'
263254
second
kent$ echo "VERSION=2.6.32.54-0.1.def"|awk -F'-|\\.def' '{split($2,a,".")}$0=a[2]'
1
kent$ echo "VERSION=2.6.32.54-0.1234.def"|awk -F'-|\\.def' '{split($2,a,".")}$0=a[2]'
1234
if you have VERSION=xxx as variable $VERSION:
first:
awk -F'-' '{gsub(/\./,"",$1)}$0=$1'
second:
awk -F'-|\\.def' '{split($2,a,".")}$0=a[2]'
test:
VERSION=2.6.32.54-0.1234.def
kent$ echo $VERSION|awk -F'-' '{gsub(/\./,"",$1)}$0=$1'
263254
kent$ echo $VERSION|awk -F'-|\\.def' '{split($2,a,".")}$0=a[2]'
1234
You should use regular expressions instead of counting characters.
first=`echo "$VERSION" | sed 's/\(.*\)-.*/\1/' | sed 's/\.//g'`
second=`echo "$VERSION" | sed 's/.*-[0-9]*\.\([0-9]*\).*/\1/'`
\(...\) creates a capturing group, and \1 outputs the text matched by that group.
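To make the group mechanics concrete, here is a sketch that captures two groups in one sed call and prints both. The pattern is an illustration written for the version layout in the question, not a general version parser:

```shell
VERSION=2.6.32.54-0.11.def

# Group 1: everything before the hyphen.
# Group 2: the digits between the first and second dot after the hyphen.
# The replacement emits both groups separated by a space.
parts=$(printf '%s\n' "$VERSION" |
    sed 's/^\([^-]*\)-[0-9]*\.\([0-9]*\)\..*$/\1 \2/')
echo "$parts"   # prints 2.6.32.54 11
```

Removing the dots from the first group afterwards (for example with tr -d '.') gives 263254.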
first=$(echo ${VERSION} | sed -e 's/^\([^-]*\)-0\.\([0-9]*\)\.def/\1/' -e 's/\.//g')
second=$(echo ${VERSION} | sed -e 's/^\([^-]*\)-0\.\([0-9]*\)\.def/\2/' -e 's/\.//g')
$ first=$(echo $VERSION | awk -F"\." '{gsub(/-.*/,"",$4);print $1$2$3$4}')
$ second=$(echo $VERSION | awk -F"\." '{print $5}' )

Resources