Get substring after a special character [duplicate] - bash

I have many strings that look like the following:
word1.word2.word3.xyz
word1.word2.word3.word4.abc
word1.word2.mno
word1.word2.word3.pqr
Using bash, I would like to get just the part of each string after the last '.' (dot) character.
So the output I want:
xyz
abc
mno
pqr
Is there any way to do this?

AWK will do it. I'm using GNU AWK:
$ awk -F '.' '{print $NF}' <<EOF
word1.word2.word3.xyz
word1.word2.word3.word4.abc
word1.word2.mno
word1.word2.word3.pqr
EOF
xyz
abc
mno
pqr
AWK splits lines into fields, and we use -F to set the field separator to a dot (.). Fields are indexed from 1, so $1 would get the first one (e.g. word1 in the first line), and we can use $NF (NF stands for "number of fields") to get the value of the last field in each line.
https://www.grymoire.com/Unix/Awk.html is a great tutorial on AWK.
You can then just use a for loop to iterate over each of the resulting lines:
$ lines=$(awk -F '.' '{print $NF}' <<EOF
word1.word2.word3.xyz
word1.word2.word3.word4.abc
word1.word2.mno
word1.word2.word3.pqr
EOF
)
$ for line in $lines; do echo "$line"; done
xyz
abc
mno
pqr
I'm using command substitution here - see the Advanced Bash Scripting Guide for information on loops, command substitution and other useful things.
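An unquoted for line in $lines splits on every whitespace character, not only on newlines, so it works above only because the extensions contain no spaces. A minimal sketch of a line-by-line alternative, assuming the strings arrive on standard input (the function name print_extensions is made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text after the last dot of every input line.
print_extensions() {
    local ext
    # awk emits one extension per line; read -r consumes them one line at a time.
    awk -F '.' '{print $NF}' | while IFS= read -r ext; do
        printf '%s\n' "$ext"
    done
}

# The second input contains a space, which a plain for loop would split on.
printf '%s\n' 'word1.word2.word3.xyz' 'my file.tar.gz' | print_extensions
```

IFS= keeps leading and trailing whitespace of each line, and -r stops read from interpreting backslashes.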

One simple solution would be to split the string on . and then take the last item of the resulting array:
lines=(word1.word2.word3.xyz word1.word2.word3.xyz word1.word2.word3.word4.abc word1.word2.mno word1.word2.word3.pqr abcdef 'a * b')
for line in "${lines[@]}"
do
line_split=(${line//./ })
echo "${line_split[-1]}"
done
Another clean shell-checked way would be (the idea is the same)
lines=(word1.word2.word3.xyz word1.word2.word3.xyz word1.word2.word3.word4.abc word1.word2.mno word1.word2.word3.pqr abcdef)
for line in "${lines[@]}"; do
if [[ $line == *.* ]]; then # check if line contains dot character
IFS=. read -r -a split_array <<<"$line" # one-line solution
echo "${split_array[-1]}" # shows the results
else
echo "No dot in string: $line"
fi
done

This is a one-liner solution (after array assignment), without using an explicit loop (but using printf's implicit loop).
arr=( 'word1.word2.word3.xyz'
'word1.word2.word3.word4.abc'
'word1.word2.mno'
'word1.word2.word3.pqr' )
printf '%s\n' "${arr[@]##*.}"
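The one-liner relies on the ## parameter expansion, which removes the longest prefix matching a pattern. A small sketch contrasting it with the related operators; the helper after_last_dot is hypothetical and only adds a guard for strings without a dot, where ${s##*.} would return the whole string unchanged:

```shell
#!/usr/bin/env bash
# Hypothetical helper: text after the last dot, or an empty line if there is no dot.
after_last_dot() {
    local s=$1
    case $s in
        *.*) printf '%s\n' "${s##*.}" ;;  # ## removes the longest prefix matching '*.'
        *)   printf '\n' ;;               # no dot: nothing to extract
    esac
}

name='word1.word2.word3.xyz'
echo "${name##*.}"   # longest match of '*.' removed from the front
echo "${name#*.}"    # shortest match of '*.' removed from the front
echo "${name%.*}"    # shortest match of '.*' removed from the end
after_last_dot 'abcdef'
```

The three echo lines print xyz, word2.word3.xyz and word1.word2.word3 respectively.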

Related

How to parse multiple line output as separate variables

I'm relatively new to bash scripting and I would like someone to explain this properly, thank you. Here is my code:
#! /bin/bash
echo "first arg: $1"
echo "first arg: $2"
var="$( grep -rnw $1 -e $2 | cut -d ":" -f1 )"
var2=$( grep -rnw $1 -e $2 | cut -d ":" -f1 | awk '{print substr($0,length,1)}')
echo "$var"
echo "$var2"
The problem I have is with the output. The script I'm trying to write is a C++ function searcher, so I launch it with 2 arguments: the directory and the function name. This is what my output looks like:
first arg: Projekt
first arg: iseven
Projekt/AX/include/ax.h
Projekt/AX/src/ax.cpp
h
p
Now my question is: how can I save the line-by-line output in a variable, so that later on I can use var as a path, or use var2 as a character to compare? My plan was to use if statements to determine the type, for example: IF(last_char == p){echo:"something"}. I tried the approach from the question "Capturing multiple line output into a Bash variable" and then treated the result as an array, so my code looked like "${var[0]}". Please explain how I can use each line of the output later on as a variable.
I'd use readarray to populate an array variable, in case the command's output contains spaces that should not act as field separators; those would break a plain foo=( ... ) assignment. You can also use the shell's substring parameter expansion to get the last character of a variable, so there is no need for the awk step in your var2:
#!/usr/bin/env bash
readarray -t lines < <(printf "%s\n" "Projekt/AX/include/ax.h" "Projekt/AX/src/ax.cpp")
for line in "${lines[@]}"; do
printf "%s\n%s\n" "$line" "${line: -1}" # Note the space before the -1
done
will display
Projekt/AX/include/ax.h
h
Projekt/AX/src/ax.cpp
p
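The question also asks how to branch on the last character. A minimal sketch of that step, assuming the two file types from the sample output; the function name classify and the labels it prints are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a path by its last character.
classify() {
    local path=$1
    case ${path: -1} in   # note the space before -1
        h) echo "header" ;;
        p) echo "source" ;;
        *) echo "other" ;;
    esac
}

readarray -t lines < <(printf '%s\n' 'Projekt/AX/include/ax.h' 'Projekt/AX/src/ax.cpp')
for line in "${lines[@]}"; do
    printf '%s: %s\n' "$line" "$(classify "$line")"
done
```

In the real script the printf inside the process substitution would be replaced by the grep ... | cut pipeline from the question.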

Read each line of a column of a file and execute grep

I have file.txt exemplary here:
This line contains ABC
This line contains DEF
This line contains GHI
and here the following list.txt:
contains ABC<TAB>ABC
contains DEF<TAB>DEF
Now I am writing a script that executes the following commands for each line of this external file list.txt:
take the string from column 1 of list.txt and search in a third file file.txt
if the first command is positive, return the string from column 2 of list.txt
So my output.txt is:
ABC
DEF
This is my code for grep/echo with putting the query/return strings manually:
if grep -i -q 'contains abc' file.txt
then
echo ABC >output.txt
else
echo -n
fi
if grep -i -q 'contains def' file.txt
then
echo DEF >>output.txt
else
echo -n
fi
I have about 100 search terms, which makes the task laborious if done manually. So how do I include while read line; do [commands]; done<list.txt together with the commands about column1 and column2 inside that script?
I would like to use simple grep/echo/awk commands if possible.
Something like this?
$ awk -F'\t' 'FNR==NR { a[$1] = $2; next } {for (x in a) if (index($0, x)) {print a[x]}} ' list.txt file.txt
ABC
DEF
For the lines of the first file (FNR==NR), read the key-value pairs into array a. Then, for each line of the second file, loop through the array, check whether the key occurs in the line, and if so, print the stored value. index($0, x) looks for the contents of x as a fixed string in the current line $0; $0 ~ x would instead treat x as a regex.
If you want to do it in the shell, starting a separate grep for each and every line of list.txt, something like this:
while IFS=$'\t' read -r k v ; do
grep -qFe "$k" file.txt && echo "$v"
done < list.txt
read -r k v reads a line of input and splits it (based on IFS) into k and v; -r keeps any backslashes in the line literal.
grep -F takes the pattern as a fixed string, not a regex, and -q prevents it from outputting the matching line. grep returns true if any matching lines are found, so $v is printed if $k is found in file.txt.
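The loop can be tried end to end with throwaway files. The sketch below wraps it in a hypothetical function named lookup and adds -i, on the assumption that the case-insensitive matching from the question's grep -i calls is wanted; the XYZ entry is an extra row added to show a non-match:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print column 2 of the list file for every
# column-1 string that occurs somewhere in the data file.
lookup() {
    local list=$1 data=$2 k v
    while IFS=$'\t' read -r k v; do
        # -F: fixed string, -i: ignore case (assumption), -q: no output
        grep -qiF -e "$k" "$data" && printf '%s\n' "$v"
    done < "$list"
    return 0
}

tmp=$(mktemp -d)
printf 'This line contains %s\n' ABC DEF GHI > "$tmp/file.txt"
printf 'contains %s\t%s\n' ABC ABC DEF DEF XYZ XYZ > "$tmp/list.txt"
lookup "$tmp/list.txt" "$tmp/file.txt" > "$tmp/output.txt"
cat "$tmp/output.txt"
rm -r "$tmp"
```

This starts one grep per line of list.txt, which is fine for about 100 search terms; the awk version reads each file only once.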
Using awk and grep:
for text in `awk '{print $4}' file.txt `
do
grep "contains $text" list.txt |awk -F $'\t' '{print $2}'
done

How to read lines in bash and delimit them by a specified delimiter? [duplicate]

I need to write a script with the following behaviour:
$ echo $'one&some text\ntwo&other text' | ./my_script.sh --delimiter &
Line:
1st: one
2nd: some text
Line:
1st: two
2nd: other text
It can also be called with the default delimiter, which is \t:
$ echo $'one\tsome text\nfive\tother text' | ./my_script.sh
Output should be the same as above.
Script should take input via standard in.
What is the easiest way to do this? Possibly in pure bash.
I've tried this approach but it does not work and I don't know why:
while read -r line
do
echo "$line"
IFS=$DELIMITER
arr=(${line//$DELIMITER/ })
echo ${arr[0]}
echo ${arr[1]}
done
You can do it in bash without using external programs.
$ cat script.sh
#!/bin/bash
if [ "$1" = "--delimiter" ]
then
d=$2
else
d=$'\t'
fi
while IFS="$d" read -r first rest; do
echo "1st: $first"
echo "2nd: $rest"
done
$ echo $'one\tsome text\nfive\tother text' | ./script.sh
1st: one
2nd: some text
1st: five
2nd: other text
$ echo $'one&some text\nfive&other text' | ./script.sh --delimiter \&
1st: one
2nd: some text
1st: five
2nd: other text
Note that the ampersand must be escaped (or quoted); otherwise the shell treats it as the operator that runs the command in the background.
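One detail of read -r first rest is worth seeing in isolation: only the first delimiter splits the line, and any later delimiters stay inside rest. A minimal sketch, with the function name split_first made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split each input line at the first delimiter only.
# The delimiter defaults to a tab when no argument is given.
split_first() {
    local d=$'\t' first rest
    if [ "$#" -gt 0 ]; then d=$1; fi
    while IFS="$d" read -r first rest; do
        printf '1st: %s\n2nd: %s\n' "$first" "$rest"
    done
}

# The second '&' is not a split point; it remains part of "rest".
printf '%s\n' 'one&some text&more' | split_first '&'
```

The delimiter is quoted at the call site for the same reason the answer escapes it on the command line.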
awk to the rescue...
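echo -e "one&some text\ntwo&other text" | awk '
BEGIN {
    n = split("st,nd,rd,th", s, ",")    # s[1]="st" ... s[4]="th"
}
{
    print "Line:"
    c = split($0, r, "&")
    for (i = 1; i <= c; i++) {
        k = i % 10
        if (k < 1 || k > 3) k = n       # every digit except 1, 2, 3 gets "th"
        print i s[k] ": " r[i]
    }
}'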
will give
Line:
1st: one
2nd: some text
Line:
1st: two
2nd: other text
Note that this simple suffix lookup will break down for 11-13.
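For completeness, a sketch of a suffix lookup that also covers 11-13, written as a bash function. The name ordinal is hypothetical, and this is one possible way to handle the exception rather than part of the original answer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a positive integer with its English ordinal suffix.
ordinal() {
    local n=$1 suffix=th
    # 11, 12 and 13 (also 111, 112, ...) keep "th"; otherwise the last digit decides.
    if (( n % 100 < 11 || n % 100 > 13 )); then
        case $(( n % 10 )) in
            1) suffix=st ;;
            2) suffix=nd ;;
            3) suffix=rd ;;
        esac
    fi
    printf '%d%s\n' "$n" "$suffix"
}

for i in 1 2 3 4 11 12 13 21 112; do ordinal "$i"; done
```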

output of oddlines in sed not appearing on separate lines

I have the following file:
>A6NGG8_201_I_F
line2
>B1AK53_719_S_R
line4
>B1AK53_744_D_N
line5
>B7U540_205_R_H
line6
>B7U540_354_T_M
line7
where I want to print out all odd lines. I can do this by:
$ sed -n 1~2p file
>A6NGG8_201_I_F
>B1AK53_719_S_R
>B1AK53_744_D_N
>B7U540_205_R_H
>B7U540_354_T_M
I want to store the number in each line as a variable in bash. However, I run into a problem: storing the result of sed puts the output all on one line:
#!/bin/bash
line1=$(sed -n 1~2p)
echo ${line1}
in which the output is:
>A6NGG8_201_I_F >B1AK53_719_S_R >B1AK53_744_D_N >B7U540_205_R_H >B7U540_354_T_M
so that when I do something like:
#!/bin/bash
line1=$(sed -n 1~2p)
pos=$(echo ${line1} | awk -F"[__]" 'NF>2{print $2}')
echo ${pos}
I get
201
where I of course want:
201
719
744
205
354
How do I store the result of sed on separate lines so that they are processed properly when piped into my awk statement? I see you can use the /a notation, but when I tried sed -n '/1~2p/a' file, it did not work in my bash script. Thanks.
As said in comments, you need to quote the variable to make this happen:
echo "${line1}"
instead of
echo ${line1}
However, you can directly say:
awk -F_ 'NR%2 && NF>2 {print $2}' file
This processes the odd lines (NR%2 is nonzero for them) and, on those, prints the second _-separated field, but only if there are more than 2 fields.
From tripleee's answer I observe that a FASTA file can contain a different format. If so, I guess you will still want to get the ID in the lines starting with ">". This can be translated as:
awk -F_ '/^>/ && NF>2 {print $2}' file
See an example of how quoting preserves the format:
The file:
$ cat a
hello
bye
Read it into a variable:
$ var=$(< a)
echo without quoting:
$ echo $var
hello bye
Let's quote!
$ echo "$var"
hello
bye
If you are trying to get the header lines out of a FASTA file, your problem statement is wrong -- the data between the headers could be more than one line. You could simply do
sed -n '/^>/!d;s/^[^_]*_//;s/_.*//p' file.fasta
to get just the second underscore-delimited field out of each header line; or equivalently, in Awk,
awk -F _ '/^>/ { print $2 }' file.fasta
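If the positions are needed as separate values inside the bash script, they can be read into an array instead of one string. A minimal pure-bash sketch, assuming the headers arrive on standard input; the function name header_positions is made up, and unlike the awk version with NF>2 it prints any non-empty second field:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the second underscore-delimited field of
# every header line (a line starting with ">") read from standard input.
header_positions() {
    local line id pos rest
    while IFS= read -r line; do
        if [[ $line == '>'* ]]; then
            IFS=_ read -r id pos rest <<< "$line"
            if [[ -n $pos ]]; then printf '%s\n' "$pos"; fi
        fi
    done
}

# mapfile -t stores one output line per array element.
mapfile -t positions < <(
    printf '%s\n' '>A6NGG8_201_I_F' 'line2' '>B1AK53_719_S_R' 'line4' | header_positions
)
printf '%s\n' "${positions[@]}"
```

Each element can then be used on its own, for example "${positions[0]}" for the first header.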

How to add multiple line of output one by one to a variable in Bash?

This might be a very basic question, but I was not able to find a solution. I have a script.
If I run w | awk '{print $1}' on the command line on my server, I get:
f931
smk591
sc271
bx972
gaw844
mbihk988
laid640
smk59
ycc951
Now I need to use this list in my bash script and perform some operation on each entry. I need to check each user's groups and print the users that are in a specific group. The command to check a user's groups is id username. How can I save the names or iterate through them one by one in a loop?
What I have so far is:
tmp=$(w | awk '{print $1}')
But it only returns the first record! I'd appreciate any help.
Populate an array with the output of the command:
$ tmp=( $(printf "a\nb\nc\n") )
$ echo "${tmp[0]}"
a
$ echo "${tmp[1]}"
b
$ echo "${tmp[2]}"
c
Replace the printf with your command (i.e. tmp=( $(w | awk '{print $1}') )) and see man bash for how to work with bash arrays.
For a lengthier, more robust and complete example:
$ cat ./tstarrays.sh
# saving multi-line awk output in a bash array, one element per line
# See http://www.thegeekstuff.com/2010/06/bash-array-tutorial/ for
# more operations you can perform on an array and its elements.
oSET="$-"; set -f # save original set flags and turn off globbing
oIFS="$IFS"; IFS=$'\n' # save original IFS and make IFS a newline
array=( $(
awk 'BEGIN{
print "the quick brown"
print " fox jumped\tover\tthe"
print "lazy dogs back "
}'
) )
IFS="$oIFS" # restore original IFS value
set +f -$oSET # restore original set flags
for (( i=0; i < ${#array[@]}; i++ ));
do
printf "array[%d] of length=%d: \"%s\"\n" "$i" "${#array[$i]}" "${array[$i]}"
done
printf -- "----------\n"
printf -- "array[@]=\n\"%s\"\n" "${array[@]}"
printf -- "----------\n"
printf -- "array[*]=\n\"%s\"\n" "${array[*]}"
.
$ ./tstarrays.sh
array[0] of length=22: "the quick brown"
array[1] of length=23: " fox jumped over the"
array[2] of length=21: "lazy dogs back "
----------
array[@]=
"the quick brown"
array[@]=
" fox jumped over the"
array[@]=
"lazy dogs back "
----------
array[*]=
"the quick brown fox jumped over the lazy dogs back "
A couple of non-obvious key points to make sure your array gets populated with exactly what your command outputs:
If your command output can contain globbing characters, then you should disable globbing before the command (oSET="$-"; set -f) and re-enable it afterwards (set +f -$oSET).
If your command output can contain spaces, then set IFS to a newline before the command (oIFS="$IFS"; IFS=$'\n') and set it back to its old value after the command (IFS="$oIFS").
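As an alternative to saving and restoring IFS and the globbing flag, bash 4 and later provide mapfile (also spelled readarray), which stores one line per element without word splitting or globbing. This is a different technique from the one in the script above; the sketch reuses its awk input and adds a * to show that no filename expansion takes place:

```shell
#!/usr/bin/env bash
# mapfile -t reads lines verbatim, so IFS and set -f can stay untouched.
mapfile -t array < <(
    awk 'BEGIN{
        print "the quick brown"
        print " fox jumped\tover\tthe"
        print "lazy * dogs back "
    }'
)
printf 'elements: %d\n' "${#array[@]}"
printf '"%s"\n' "${array[@]}"
```

The -t option strips the trailing newline from each element; leading and trailing spaces within a line are kept.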
tmp=$(w | awk '{print $1}')
while IFS= read -r i
do
echo "$i"
done <<< "$tmp"
You can use a for loop, i.e.
for user in $(w | awk '{print $1}'); do echo "$user"; done
which in a script would look nicer as:
for user in $(w | awk '{print $1}')
do
echo "$user"
done
You can use the xargs command to do this:
w | awk '{print $1}' | xargs -I '{}' id '{}'
With the -I switch, xargs takes each line of its standard input separately, then constructs and executes a command line by replacing the specified string '{}' in the command template with that input line.
You should probably use who instead of w, since w also prints header lines before the user list. Try this:
who | awk '{print $1}' | xargs -n 1 id
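None of the answers shows the group check itself, so here is a minimal sketch of that step. It assumes id -nG prints a user's group names separated by spaces and that group names contain no whitespace; the function names has_group and users_in_group and the group name staff are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if group $1 appears in the
# space-separated group list $2 (the format printed by `id -nG`).
has_group() {
    local wanted=$1 g
    for g in $2; do   # intentionally unquoted: split the list into words
        [[ $g == "$wanted" ]] && return 0
    done
    return 1
}

# Hypothetical helper: read user names on stdin, print those in group $1.
users_in_group() {
    local user list
    while IFS= read -r user; do
        list=$(id -nG "$user" 2>/dev/null) || continue   # skip unknown users
        if has_group "$1" "$list"; then printf '%s\n' "$user"; fi
    done
}

# Usage (the group name "staff" is only an example):
#   who | awk '{print $1}' | sort -u | users_in_group staff
```

sort -u removes duplicates, since a user with several sessions appears once per session in the who output.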
