Reformatting text to return text before delimiter - bash

I have been cleansing the following string of file names as follows:
#!/bin/sh
QUEUES='FILENAME1.WORLD1.J00.2D.00;FILENAME2.WORLD1.J01.2D.00;FILENAME3.WORLD1.J00.2D.00;FILENAME4.WORLD1.J01.2D.00;FILENAME5.WORLD1.J00.2D.00'
for i in $(echo "$QUEUES" | sed "s/;/ /g")
do
a=$(echo "$i" | tr '[:upper:]' '[:lower:]' | sed 's/\./_/g')
echo "$a"
done
This currently produces a lower-case version of each queue name above, with the full stop delimiter replaced by an underscore. But now I only want to grab the first section of each name. How would I do this?
I.e. instead of $a returning filename1_world1_j00_2d_00, it should simply return filename1.

Why not use awk? With:
echo "$QUEUES" | tr ';' '\n' | awk -F'.' '{print tolower($1)}'
you will get:
filename1
filename2
...
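If you would rather keep the original #!/bin/sh loop, POSIX parameter expansion can do the trimming without awk: ${name%%.*} removes the longest suffix that starts with a dot, which leaves only the first section. A minimal sketch, assuming the names are separated by ';' as in the question; first_sections is a hypothetical helper name.

```shell
#!/bin/sh
# first_sections (hypothetical helper): print the first '.'-separated section
# of each ';'-separated name, in lower case.
# The body runs in a subshell ( ... ), so the IFS and set -f changes do not leak out.
first_sections() (
    IFS=';'
    set -f                          # no globbing while splitting the unquoted $1
    for name in $1; do
        first=${name%%.*}           # drop the longest suffix starting at the first '.'
        printf '%s\n' "$first" | tr '[:upper:]' '[:lower:]'
    done
)

QUEUES='FILENAME1.WORLD1.J00.2D.00;FILENAME2.WORLD1.J01.2D.00;FILENAME3.WORLD1.J00.2D.00'
first_sections "$QUEUES"
```

Splitting on IFS=';' inside the function replaces the sed step, and set -f stops a name containing * or ? from being expanded as a glob.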

Related

How to get version number from string in bash

I have a variable with the following format:
bundle="chn-pro-X.Y-Z.el8.x86_64"
X, Y and Z are numbers with any number of digits.
Examples:
1.0-2 # X=1 Y=0 Z=2
12.45-9874 # X=12 Y=45 Z=9874
How can I grab X.Y and store it in another variable?
EDIT:
My wording wasn't right: I want to store X.Y in a new variable, not the individual X and Y values.
In the end, I'm looking for a variable version that holds X.Y taken from bundle:
version="X.Y"
I would use awk:
bundle="chn-pro-12.45-9874.el8.x86_64"
echo "$bundle" | awk -F "[.-]" '{print $3,$4,$5}'
12 45 9874
Now, if you want to assign them to x, y and z, use read with process substitution:
read -r x y z < <(echo "$bundle" | awk -F "[.-]" '{print $3,$4,$5}')
echo "x=$x, y=$y, z=$z"
x=12, y=45, z=9874
If you just want X.Y as a single value, this is still a great use for awk:
bundle="chn-pro-12.45-9874.el8.x86_64"
echo "$bundle" | awk -F "[-]" '{print $3}'
12.45
And if you then want to put that into a variable:
x_y=$(echo "$bundle" | awk -F "[-]" '{print $3}')
echo "x_y=$x_y"
x_y=12.45
Or you can use cut in this case to get the third field:
echo "$bundle" | cut -d- -f3
12.45
Like this:
$ bundle="chn-pro-1.0-2.el8.x86_64"
$ X="$(echo "$bundle" | cut -d . -f1 | cut -d- -f3)"
$ Y="$(echo "$bundle" | cut -d . -f2 | cut -d- -f1)"
$ Z="$(echo "$bundle" | cut -d . -f2 | cut -d- -f2)"
$ echo "$X"
1
$ echo "$Y"
0
$ echo "$Z"
2
You can merge X and Y into a single variable:
$ XY="$X.$Y"
$ echo $XY
1.0
Use a regex to extract the numbers:
numbers=$(echo "$bundle" | grep -Eo '[0-9]+\.[0-9]+-[0-9]+' | sed 's/[.-]/ /g')
Then assign them to variables using awk, tr or cut, whichever you prefer:
X=$(echo $numbers | awk '{print $1}')
Y=$(echo $numbers | awk '{print $2}')
Z=$(echo $numbers | awk '{print $3}')
EDIT:
To store X.Y in a single version variable, you can skip the previous commands:
version=$(echo "$bundle" | grep -Eo '[0-9]+\.[0-9]+-[0-9]+' | grep -Eo '[0-9]+\.[0-9]+')
Given this input:
$ bundle="chn-pro-12.45-9874.el8.x86_64"
using GNU or BSD sed for -E:
$ foo=$(echo "$bundle" | sed -E 's/.*-([0-9]+\.[0-9]+)-[0-9].*/\1/')
$ echo "$foo"
12.45
or with any sed:
$ foo=$(echo "$bundle" | sed 's/.*-\([0-9][0-9]*\.[0-9][0-9]*\)-[0-9].*/\1/')
$ echo "$foo"
12.45
Assumptions:
the input string will always contain (at least) 3 hyphens
the desired version string will always reside between the 2nd and 3rd hyphens of the input string
we need to maintain the input string (i.e., don't clobber/overwrite the variable containing the input string)
We can eliminate the subprocess calls (needed for the echo/sed/grep/awk pipelines) by using a few parameter expansions:
$ bundle="chn-pro-X.Y-Z.el8.x86_64"
$ temp="${bundle#*-}" # strip off 1st hyphen delimited string
$ echo "${temp}"
pro-X.Y-Z.el8.x86_64
$ temp="${temp#*-}" # strip off 2nd hyphen delimited string
$ echo "${temp}"
X.Y-Z.el8.x86_64
$ version="${temp%%-*}" # save 3rd hyphen delimited string (aka our version)
$ echo "${version}"
X.Y
NOTE: We can eliminate the temp variable by replacing every occurrence of temp with version, with the understanding that version does not contain what we want until after the 3rd parameter expansion has been applied, e.g.:
$ bundle="chn-pro-X.Y-Z.el8.x86_64"
$ version="${bundle#*-}"
$ version="${version#*-}"
$ version="${version%%-*}"
$ echo "${version}"
X.Y
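Another way to avoid subprocesses is bash's =~ operator, which captures X, Y and Z in one step through BASH_REMATCH. A sketch, assuming the version part always has the shape <digits>.<digits>-<digits> followed by a dot (as in chn-pro-12.45-9874.el8.x86_64); bundle_version is a hypothetical helper name. Unlike the approach above, it does not rely on the version sitting between the 2nd and 3rd hyphens.

```shell
#!/usr/bin/env bash
# bundle_version (hypothetical helper): print X.Y from names shaped like
# chn-pro-X.Y-Z.el8.x86_64, using bash's =~ operator (no subprocesses).
bundle_version() {
    # Keep the regex in a variable; quoting it on the right of =~ would make it literal.
    local re='-([0-9]+)\.([0-9]+)-([0-9]+)\.'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1] = X, [2] = Y, [3] = Z
        printf '%s.%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1    # no X.Y-Z pattern found
    fi
}

bundle="chn-pro-12.45-9874.el8.x86_64"
version=$(bundle_version "$bundle") && echo "version=$version"
```

If Z is needed too, it is available as ${BASH_REMATCH[3]} right after the match.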

What does this bash script line of code mean?

I am new to shell scripting and I found the following line of code in a script.
Could someone explain, with an example, what this line of code means?
Path=`echo $line | awk -F '|' '{print $1}'`
echo $line prints the value of the variable line; the | symbol means that this output is passed (piped) to another program/command/script. I will not attempt to explain awk fully here, but what the line above does is take the output of echo $line and process it with awk.
The -F option, as per the awk man page, means:
-F fs Use fs for the input field separator
So the string after -F is used to split the input given to awk into fields. For example, if your variable $line has the value a|b, it will be split into two fields, a and b. What is done with them is specified within the '{}' expression.
Again, what can be done there is nearly unlimited; here the only thing done is to print the first field, which is accessed with $1 (a in the example above; $2 would be b, as you might guess).
Finally, the output of this whole operation is then stored in the variable Path.
To summarize:
line="a|b"
echo $line | awk -F '|' '{print $1}'
> a
Path=`echo $line | awk -F '|' '{print $1}'`
echo $Path
> a
echo $line | awk -F '|' '{print $1}'
Explanation:
echo -> display a line of text
$line -> parameter expansion: replaced by the value of the variable line
| -> pipe: a pipeline is a sequence of one or more commands separated by the control operator |; here the output of echo becomes the input of awk
awk -> invoke the awk program
-F '|' -> use | as the input field separator
'{print $1}' -> print the first field
Example
echo 'a|b|c' | awk -F '|' '{print $1}'
will print a
I think this is just a complicated way to express
echo ${line%%|*}
i.e. write to stdout the part of the content of the variable line which goes up to - but not including - the first vertical bar.
Path=`echo $line | awk -F '|' '{print $1}'`
^     ^                ^        ^
|     |                |        |
|     |                |        print 1st column
|     |                |
|     |                input field separator
|     |
|     echo variable line
|
variable Path
-F'|': by default awk splits each record (line) into fields on runs of whitespace; with -F'|', awk splits on the pipe character instead.
The line above can also be written as:
Path=$( awk -F '|' '{ print $1 }' <<< "$line" )
For example:
$ line="1|2|3"
$ Path=$( awk -F '|' '{ print $1 }' <<< "$line" )
$ echo $Path; # you get first column
1
The same with cut:
$ Path=$( cut -d'|' -f1 <<< "$line" )
$ echo $Path;
1
The default field separator is ' ', which means any run of whitespace; -F '|' changes the separator to '|'.
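In bash, the first field can also be taken without starting awk at all, by letting read split the line on '|'. A sketch assuming bash (the <<< here-string is not POSIX); first and rest are illustrative variable names.

```shell
#!/usr/bin/env bash
# Split $line on '|' with read instead of awk.
# IFS='|' applies only to this read command; -r keeps backslashes literal.
line="a|b|c"
IFS='|' read -r first rest <<< "$line"
echo "first=$first"   # the first field, like awk's $1
echo "rest=$rest"     # everything after the first '|', delimiters kept
```

Setting IFS just in front of read changes it only for that command, so the rest of the script keeps the default separator.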

Bash - stdin words to file

I am trying to store all of the user's input in a bash variable (appending to it line by line),
and then sort the words, etc.
The problem is that for input such as:
sdsd fff sss
asdasds
It creates this output:
fff
sdsd
sssasdasds
Expected output is:
asdasds
fff
sdsd
sss
Code follows:
content=''
while read line
do
content+=$(echo "$line")
done
result=`echo "$content" | sed -r 's/[^a-zA-Z ]+/ /g' | tr '[:upper:]' '[:lower:]' | tr ' ' '\n' | sort -u | sed '/^$/d' | sed 's/[^[:alpha:]]/\n/g'`
echo "$result" >> "$dictionary"
You aren't providing a space when you are appending.
content+=$(echo "$line")
You need to make sure there is a space between the end of the old value and the new value.
content+=" $line"
(There's no need for echo here either, as @gniourf_gniourf correctly pointed out.)
Something that will achieve what you're showing in your example:
words_ary=()
while read -r -a line_ary; do
(( ${#line_ary[@]} )) || continue # skip empty lines
words_ary+=( "${line_ary[@],,}" ) # The ,, is to convert to lower-case
done
printf '%s\n' "${words_ary[@]}" | sort -u >> "$dictionary"
We split each input line into words at whitespace and put these words in the array line_ary.
We skip lines that contain no words.
We append each word, converted to lower case, to the array words_ary.
Finally, we sort the words in words_ary, remove duplicates (sort -u), and append the result to the file $dictionary.
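If the array itself is not needed, the same idea can be sketched as one pipeline: tr -cs '[:alpha:]' '\n' turns every run of non-letters into a single newline, so each word ends up on its own line before it is lower-cased and passed to sort -u. collect_words is a hypothetical name, and, like the question's sed expressions, it assumes words consist of letters only.

```shell
#!/bin/sh
# collect_words (hypothetical name): read text on stdin, print each distinct
# word once, lower-cased and sorted.
collect_words() {
    tr -cs '[:alpha:]' '\n' |        # every run of non-letters becomes one newline
        tr '[:upper:]' '[:lower:]' | # lower-case each word
        grep -v '^$' |               # drop the empty line left by leading non-letters
        LC_ALL=C sort -u             # sort and remove duplicates
}

printf 'sdsd fff sss\nasdasds\n' | collect_words
```

To append to the dictionary as in the question, run collect_words >> "$dictionary" with the user's input on stdin.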

sort fields within a line

input:
87 6,1,9,13
3 9,4,14,35,38,13
31 3,1,6,5
(i.e. tab-delimited columns, where the second field is a comma-delimited list of unordered integers.)
desired output:
87 1,6,9,13
3 4,9,13,14,35,38
31 1,3,5,6
Goal:
For each line separately, sort the comma-separated list in the second field, i.e. sort the second column within each line.
Note: the rows should not be re-ordered.
What I've tried:
sort - since the order of the rows should not change, sort is simply not applicable.
awk - since the file as a whole is tab-delimited, not comma-delimited, I don't see how to parse the second column as multiple "sub-fields".
There might be a Perl way? I know nothing about Perl, though...
It can be done with a simple Perl one-liner:
perl -F'/\t/' -alne'$s=join",",sort{$a<=>$b}split",",$F[1];print"$F[0]\t$s"'
and with a shell (bash) one as well:
while read a b;do echo -e "$a\t$(echo $b|tr , '\n'|sort -n|tr '\n' ,|sed 's/,$//')"; done
while read LINE; do
echo -e "$(echo $LINE | awk '{print $1}')\t$(echo $LINE | awk '{print $2}' | tr ',' '\n' | sort -n | paste -s -d,)";
done < input
Obviously there is a lot going on here, so here we go:
input contains your input
$(echo $LINE | awk '{print $1}') prints the first field, pretty straightforward
$(echo $LINE | awk '{print $2}' | tr ',' '\n' | sort -n | paste -s -d,) prints the second field, but breaks it down into lines by replacing the commas by newlines (tr ',' '\n'), then sort numerically, then assemble the lines back to comma-delimited values (paste -s -d,).
$ cat input
87 6,1,9,13
3 9,4,14,35,38,13
31 3,1,6,5
$ while read LINE; do echo -e "$(echo $LINE | awk '{print $1}')\t$(echo $LINE | awk '{print $2}' | tr ',' '\n' | sort -n | paste -s -d,)"; done < input
87 1,6,9,13
3 4,9,13,14,35,38
31 1,3,5,6
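Despite the remark in the question, awk can handle the comma-separated sub-fields: split() breaks field 2 into an array, which can then be sorted and joined back together. The sketch below uses a plain insertion sort so that it does not depend on gawk's asort(); it assumes tab-separated input with the list in field 2, and sort_second_field is a hypothetical name.

```shell
#!/bin/sh
# sort_second_field (hypothetical name): for each tab-separated line, sort the
# comma-separated integers in field 2 numerically; row order is unchanged.
sort_second_field() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        n = split($2, v, ",")           # v[1..n] hold the sub-fields
        for (i = 2; i <= n; i++) {      # insertion sort, numeric comparison
            x = v[i]; j = i - 1
            while (j > 0 && v[j] + 0 > x + 0) { v[j + 1] = v[j]; j-- }
            v[j + 1] = x
        }
        s = v[1]
        for (i = 2; i <= n; i++) s = s "," v[i]
        $2 = s                          # rebuild the record with OFS = tab
        print
    }' "$@"
}

printf '87\t6,1,9,13\n3\t9,4,14,35,38,13\n31\t3,1,6,5\n' | sort_second_field
```

It reads stdin or any files given as arguments, e.g. sort_second_field input. Rows keep their original order because each record is rewritten in place.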
Another way (asort is a gawk extension):
echo happybirthday | gawk '{split($0,A); asort(A); for (i=1;i<=length(A);i++) {print A[i]}}' FS="" | tr -d '\n'; echo
aabdhhipprtyy

How to split a string in shell and get the last field

Suppose I have the string 1:2:3:4:5 and I want to get its last field (5 in this case). How do I do that using Bash? I tried cut, but I don't know how to specify the last field with -f.
You can use string operators:
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This trims everything from the front up to and including the last ':', because ## is greedy.
${foo <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
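## is one of four related pattern-trimming expansions; applying all of them to the same value shows the front/back and shortest/longest distinctions side by side. They are POSIX, so this sketch also runs under plain sh.

```shell
#!/bin/sh
# The four pattern-trimming parameter expansions, applied to the same value.
foo=1:2:3:4:5
echo "${foo#*:}"    # shortest prefix matching *: removed  -> 2:3:4:5
echo "${foo##*:}"   # longest prefix matching *: removed   -> 5 (last field)
echo "${foo%:*}"    # shortest suffix matching :* removed  -> 1:2:3:4
echo "${foo%%:*}"   # longest suffix matching :* removed   -> 1 (first field)
```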
Another way is to reverse before and after cut:
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef
This makes it very easy to get the last but one field, or any range of fields numbered from the end.
It's difficult to get the last field using cut, but here are some solutions in awk and perl
echo 1:2:3:4:5 | awk -F: '{print $NF}'
echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5
Breakdown: find all the characters that are not the delimiter ([^:]) at the end of the line ($). -o prints only the matching part.
You could try something like this if you want to use cut (this works only when you know there are exactly 5 fields):
echo "1:2:3:4:5" | cut -d ":" -f5
You can also use grep, like this:
echo "1:2:3:4:5" | grep -o '[^:]*$'
One way:
var1="1:2:3:4:5"
var2=${var1##*:}
Another, using an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}
Yet another with an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}
Using Bash (version >= 3.2) regular expressions:
var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e
Simply translate the delimiter into a newline and choose the last entry with tail -1.
Using sed:
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://' # => (empty)
$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c
$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c
There are many good answers here, but I still want to share this one using basename:
basename $(echo "a:b:c:d:e" | tr ':' '/')
However it will fail if there are already some '/' in your string.
If a slash (/) is your delimiter, then you can (and should) just use basename.
It's not the best answer but it just shows how you can be creative using bash commands.
If your last field is a single character, you could do this:
a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}
Check string manipulation in bash.
Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0
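printf '%s' "a:b:c:d:e" | xargs -d : -n1 | tail -1
First, xargs splits the input on ":" (-d is a GNU xargs option); -n1 makes each output line contain only one part. Then tail prints the last part. printf is used instead of echo so that no trailing newline ends up inside the last item.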
Regex matching in sed is greedy (always goes to the last occurrence), which you can use to your advantage here:
$ foo=1:2:3:4:5
$ echo ${foo} | sed "s/.*://"
5
A solution using the read builtin:
IFS=':' read -a fields <<< "1:2:3:4:5"
echo "${fields[4]}"
Or, to make it more generic:
echo "${fields[-1]}" # prints the last item
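Negative subscripts such as ${fields[-1]} are not available in older bash releases (for example the bash 3.2 that macOS ships). A sketch that avoids them computes the last index from the element count; last_field is a hypothetical helper, and its second argument is assumed to be a single-character delimiter.

```shell
#!/usr/bin/env bash
# last_field (hypothetical helper): print the last field of $1 split on the
# single-character delimiter $2, without negative array subscripts.
last_field() {
    local -a fields
    IFS="$2" read -r -a fields <<< "$1"
    local count=${#fields[@]}
    (( count > 0 )) || return 1           # empty input: no fields at all
    printf '%s\n' "${fields[count-1]}"    # arithmetic is allowed inside [...]
}

last_field "1:2:3:4:5" ":"
```

One difference from ${var##*:}: read does not produce an empty trailing field, so for input such as a:b: this returns b, whereas ${var##*:} returns an empty string.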
for x in `echo $str | tr ";" "\n"`; do echo $x; done
Improving on the answers by @mateusz-piotrowski and @user3133260:
echo "a:b:c:d::e:: ::" | tr ':' ' ' | xargs | tr ' ' '\n' | tail -1
first, tr ':' ' ' -> replace each ':' with a space
then, xargs trims and squeezes the whitespace
after that, tr ' ' '\n' -> replace the remaining spaces with newlines
lastly, tail -1 -> get the last string
For those who are comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'
From the pythonpy help: -x treat each row of stdin as x.
With that tool, it is easy to write python code that gets applied to the input.
Edit (Dec 2020):
Pythonpy is no longer online.
Here is an alternative:
$ echo "a:b:c:d:e" | python -c 'import sys; sys.stdout.write(sys.stdin.read().split(":")[-1])'
It contains more boilerplate code (sys.stdin.read and sys.stdout.write) but requires only Python's standard library.
