Grepping/awking assigned $variable - bash

Is it possible to grep strings, or compare fields with awk, in the contents of an assigned $variable?
For example
grep "word" "$foo"
only lists the complete content of $foo.
The awk command does not recognize variables but searches for a file in my folder:
awk 'FNR==NR{a[$1]++;next}a[$1]' "$foo" "$fee"
It says awk: fatal: cannot open file `$foo' for reading (No such file or directory)
@BMW suggested providing more details. Here they are:
This is the complete command:
foo=$(cat my_text.txt | grep -B5 'application' | paste -s --delimiters=" " |sed 's/--/\n/g'| awk '{print $1 " " $2 " " $3 " " $4 " " $5}')
This is the output and the content of $foo.
reaction_1 jj-cju 2 application
reaction_1 jj-cju 2 application
reaction_1 jj-cjo 2 application
reaction_4 jj-cji 2 application
reaction_5 jj-cju 2 application
reaction_5 kk-cju 2 application
reaction_7 jj-cju 2 application
reaction_7 kk-cji 2 application
reaction_7 kk-cji 2 application
reaction_7 kk-cju 2 application
reaction_7 mm-cju 2 application

You can do this by using a herestring in place of a filename:
grep "word" <<< "$foo"
This will work if your command only requires a single input file/variable. If you require more than one, like your example awk command, you need to use process substitution:
awk 'FNR==NR{a[$1]++;next}a[$1]' <(echo "$foo") <(echo "$fee")
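The <(...) construct runs the inner command and hands awk a file name (on Linux typically something like /dev/fd/63) from which that command's output can be read, so the output is treated as if it were a file.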
Examples:
$ echo "$foo"
first line
second line
last one
$ echo "$fee"
example
text
$ grep "line" <<< "$foo"
first line
second line
$ grep "last" <(echo "$foo")
last one
$ awk '{print NR": "$0}' <(echo "$foo") <(echo "$fee")
1: first line
2: second line
3: last one
4: example
5: text
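Applied to the join from the question, a minimal sketch: $foo holds a few lines copied from the output above, while the contents of $fee are hypothetical, invented for illustration. printf '%s\n' is used instead of echo because it prints any value literally, even one such as -n that echo would treat as an option.

```shell
# $foo: a few lines copied from the question's output
foo='reaction_1 jj-cju 2 application
reaction_4 jj-cji 2 application
reaction_7 kk-cji 2 application'
# $fee: hypothetical second data set, invented for illustration
fee='reaction_4 note_a
reaction_9 note_b
reaction_7 note_c'

# While reading the first input (FNR==NR), remember each first field;
# for the second input, print only lines whose first field was remembered.
matches=$(awk 'FNR==NR{a[$1]++;next}a[$1]' <(printf '%s\n' "$foo") <(printf '%s\n' "$fee"))
printf '%s\n' "$matches"
```

This prints reaction_4 note_a and reaction_7 note_c. reaction_9 is dropped because it never appears in $foo.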

If you're doing word search using awk then use:
awk -v w="word" '$0 ~ w' "$foo"
This assumes $foo holds a file name.
You can even use:
awk '/word/' "$foo"
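If $foo holds text rather than a file name, the herestring from the previous answer works with awk as well. Below is a minimal sketch with the sample text used above. Note that -v w=... makes w a dynamic regular expression, so characters such as . or * keep their regex meaning. To test for a literal substring, index($0, w) can be used instead.

```shell
foo='first line
second line
last one'

# The herestring (<<<) feeds the variable's contents to awk on standard input;
# w is matched as a regular expression against each whole line.
w_matches=$(awk -v w="line" '$0 ~ w' <<< "$foo")
printf '%s\n' "$w_matches"
```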

Related

Replace text in file with incremented text

I have a file in the directory with this text:
VERSION_NUMBER: 1
I need to get the value of VERSION_NUMBER, convert it to a number, add 1 (n+1) and write the result to a variable, for example a variable named test.
How can I do this using sed?
Assumptions:
there's only one line in the input file
there's no need to verify that the value following the : is a number
no need to update the file with the new value
Input file:
$ cat myfile
VERSION_NUMBER: 1
One sed idea:
$ x=$(sed -En 's/^.*: (.*)$/\1/p' myfile)
$ ((x++))
$ echo "${x}"
2
One cut idea:
$ x=$(cut -d: -f2 myfile)
$ ((x++))
$ echo "${x}"
2
Same thing with awk:
$ x=$(awk '{print $2}' myfile)
$ ((x++))
$ echo "${x}"
2
In a comment, the OP asked how to update the file with the new value.
Since we're only talking about a single line the following ...
$ echo "VERSION_NUMBER: ${x}" > myfile
... is probably going to be easier/simpler than running another sed or awk command to overwrite the current file.
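For comparison, here is a pure-bash sketch that needs no sed, cut or awk. It assumes the file holds exactly the one line shown. A temporary file stands in for myfile, and the variable is called test as in the question.

```shell
f=$(mktemp)                      # temporary stand-in for myfile
printf 'VERSION_NUMBER: 1\n' > "$f"

# Split the line on ':' only; value keeps its leading blank (" 1")
IFS=: read -r key value < "$f"
test=$(( value + 1 ))            # arithmetic evaluation ignores the blank
printf '%s: %s\n' "$key" "$test" > "$f"

result=$(< "$f")
rm -f "$f"
printf '%s\n' "$result"          # VERSION_NUMBER: 2
```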

Bash Shell: Infinite Loop

The problem is the following: I have a file in which each line has this form:
id|lastName|firstName|gender|birthday|joinDate|IP|browser
I want to sort all the first names in that file alphabetically and print them one per line, with each name appearing only once.
I have written the following program, but for some reason it runs in an infinite loop:
array1=()
while read LINE
do
if [ ${LINE:0:1} != '#' ]
then
IFS="|"
array=($LINE)
if [[ "${array1[@]}" != "${array[2]}" ]]
then
array1+=("${array[2]}")
fi
fi
done < $3
echo ${array1[@]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
NOTES
if [ ${LINE:0:1} != '#' ] : this test is used because the file contains comment lines that I don't want to print
$3 : the file name
array1 : holds all the distinct names
Wow, there's a MUCH simpler and cleaner way to achieve this, without having to mess with the IFS variable or use arrays. You can use "for" to do this:
First I created a file with the same structure as yours:
$ cat file
id|lastName|Douglas|gender|birthday|joinDate|IP|browser
id|lastName|Tim|gender|birthday|joinDate|IP|browser
id|lastName|Andrew|gender|birthday|joinDate|IP|browser
id|lastName|Sasha|gender|birthday|joinDate|IP|browser
#id|lastName|Carly|gender|birthday|joinDate|IP|browser
id|lastName|Madson|gender|birthday|joinDate|IP|browser
Here's the script I wrote using "for":
#!/bin/bash
for LINE in `cat file | grep -v "^#" | awk -F'|' '{print$3}' | sort -u`
do
echo $LINE
done
And here's the output of this script:
$ ./script.sh
Andrew
Douglas
Madson
Sasha
Tim
Explanation:
for LINE in `cat file`
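Creates a loop over the output of the command between the backticks. Strictly speaking, "for" iterates over whitespace-separated words rather than lines, so this works here because each name is a single word. The commands between backticks (`) are run by the shell and replaced by their output. For example, if you wanted to store the date in a variable you could use "VARDATE=`date`".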
grep -v "^#"
The option -v is used to exclude results matching the pattern, in this case the pattern is "^#". The "^" character means "line begins with". So grep -v "^#" means "exclude lines beginning with #".
awk -F'|' '{print$3}'
The -F option changes the field separator from the default (runs of blanks, i.e. spaces and tabs) to the character given after it, in this case "|".
The '{print$3}' prints the 3rd column.
sort -u
The "sort -u" command sorts the names alphabetically, and its -u (unique) option drops duplicates so each name is printed only once.
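The for loop can also be dropped entirely, since awk can skip the comment lines itself and sort -u already prints one name per line. Below is a minimal sketch. The sample data is modelled on the file above, with a duplicate Tim and a two-word name (Mary Ann) added for illustration. Because no word splitting happens, the two-word name stays on one line, whereas the for version would split it in two.

```shell
f=$(mktemp)
# Sample data modelled on the file above; the duplicate "Tim" and the
# two-word name "Mary Ann" are invented to show deduplication and spacing.
cat > "$f" <<'EOF'
id|lastName|Douglas|gender|birthday|joinDate|IP|browser
id|lastName|Tim|gender|birthday|joinDate|IP|browser
#id|lastName|Carly|gender|birthday|joinDate|IP|browser
id|lastName|Mary Ann|gender|birthday|joinDate|IP|browser
id|lastName|Andrew|gender|birthday|joinDate|IP|browser
id|lastName|Tim|gender|birthday|joinDate|IP|browser
EOF

# Skip comment lines, print field 3, then sort and drop duplicates.
# LC_ALL=C makes the sort order independent of the current locale.
names=$(awk -F'|' '!/^#/ { print $3 }' "$f" | LC_ALL=C sort -u)
rm -f "$f"
printf '%s\n' "$names"
```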

output of oddlines in sed not appearing on separate lines

I have the following file:
>A6NGG8_201_I_F
line2
>B1AK53_719_S_R
line4
>B1AK53_744_D_N
line5
>B7U540_205_R_H
line6
>B7U540_354_T_M
line7
where I want to print out all odd lines. I can do this by:
$ sed -n 1~2p file
>A6NGG8_201_I_F
>B1AK53_719_S_R
>B1AK53_744_D_N
>B7U540_205_R_H
>B7U540_354_T_M
I want to store the number in each line in a bash variable, but I run into a problem: storing the result of sed seems to put the output all on one line:
#!/bin/bash
line1=$(sed -n 1~2p file)
echo ${line1}
in which the output is:
>A6NGG8_201_I_F >B1AK53_719_S_R >B1AK53_744_D_N >B7U540_205_R_H >B7U540_354_T_M
so that when I do something like:
#!/bin/bash
line1=$(sed -n 1~2p file)
pos=$(echo ${line1} | awk -F"[__]" 'NF>2{print $2}')
echo ${pos}
I get
201
where I of course want:
201
719
744
205
354
How do I store the result of sed on separate lines so that they are processed properly when piped into my awk statement? I see you can use the /a notation; however, when I tried sed -n '/1~2p/a' file, this did not work in my bash script. Thanks
As said in comments, you need to quote the variable to make this happen:
echo "${line1}"
instead of
echo ${line1}
However, you can directly say:
awk -F_ 'NR%2 && NF>2 {print $2}' file
This processes the odd lines (NR%2 is non-zero for lines 1, 3, 5, ...) and, in them, prints the 2nd _-separated field, but only if there are more than 2 fields.
From tripleee's answer I see that a FASTA file can be formatted differently, with the sequence under a header spanning several lines. If so, you will probably still want the ID from the lines starting with ">". This can be written as:
awk -F_ '/^>/ && NF>2 {print $2}' file
See an example of how quoting preserves the format:
The file:
$ cat a
hello
bye
Read it into a variable:
$ var=$(< a)
echo without quoting:
$ echo $var
hello bye
Let's quote!
$ echo "$var"
hello
bye
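Putting this together for the question's script, here is a sketch that uses the first lines of the sample file, written to a temporary file. It uses sed -n 'p;n', a portable way to print lines 1, 3, 5, ..., in place of GNU sed's 1~2p. It also quotes the variable when passing it to awk:

```shell
f=$(mktemp)
# First lines of the sample file from the question
printf '%s\n' '>A6NGG8_201_I_F' line2 '>B1AK53_719_S_R' line4 '>B1AK53_744_D_N' line5 > "$f"

# 'p;n' prints a line, then reads the next one without printing it
odd=$(sed -n 'p;n' "$f")
# Quoting "$odd" keeps the newlines, so awk sees one header per line
pos=$(printf '%s\n' "$odd" | awk -F_ 'NF>2 {print $2}')
rm -f "$f"
printf '%s\n' "$pos"
```

This prints 201, 719 and 744 on separate lines.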
If you are trying to get the header lines out of a FASTA file, your problem statement is wrong: the data between the headers could be more than one line. You could simply do
sed -n '/^>/!d;s/^[^_]*_//;s/_.*//p' file.fasta
to get just the second underscore-delimited field out of each header line; or equivalently, in Awk,
awk -F _ '/^>/ { print $2 }' file.fasta
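To illustrate why matching headers is more robust than matching odd lines, here is a sketch with a small hypothetical FASTA file in which the first sequence spans two lines:

```shell
f=$(mktemp)
# Hypothetical FASTA: the sequence under the first header spans two lines,
# which pushes the second header onto an even line (line 4)
printf '%s\n' '>A6NGG8_201_I_F' ACGT ACGT '>B1AK53_719_S_R' TTGA > "$f"

# Odd-line approach: misses the header on line 4
odd_ids=$(awk -F_ 'NR%2 && NF>2 {print $2}' "$f")
# Header-based approach: matches every line starting with ">"
hdr_ids=$(awk -F_ '/^>/ { print $2 }' "$f")
rm -f "$f"

echo "odd-line IDs:"; printf '%s\n' "$odd_ids"
echo "header IDs:";   printf '%s\n' "$hdr_ids"
```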

How to add multiple line of output one by one to a variable in Bash?

This might be a very basic question, but I was not able to find a solution. I have a script:
If I run w | awk '{print $1}' on the command line on my server, I get:
f931
smk591
sc271
bx972
gaw844
mbihk988
laid640
smk59
ycc951
Now I need to use the names in this list one by one in my bash script and perform some operations on each. I need to check each user's groups and print those who are in a specific group. The command to check a user's groups is id username. How can I save the names, or iterate through them one by one in a loop?
what I have so far is
tmp=$(w | awk '{print $1}')
But it only returns the first record! I'd appreciate any help.
Populate an array with the output of the command:
$ tmp=( $(printf "a\nb\nc\n") )
$ echo "${tmp[0]}"
a
$ echo "${tmp[1]}"
b
$ echo "${tmp[2]}"
c
Replace the printf with your command (i.e. tmp=( $(w | awk '{print $1}') )) and see man bash for how to work with bash arrays.
For a lengthier, more robust and complete example:
$ cat ./tstarrays.sh
# saving multi-line awk output in a bash array, one element per line
# See http://www.thegeekstuff.com/2010/06/bash-array-tutorial/ for
# more operations you can perform on an array and its elements.
oSET="$-"; set -f # save original set flags and turn off globbing
oIFS="$IFS"; IFS=$'\n' # save original IFS and make IFS a newline
array=( $(
awk 'BEGIN{
print "the quick brown"
print " fox jumped\tover\tthe"
print "lazy dogs back "
}'
) )
IFS="$oIFS" # restore original IFS value
set +f -$oSET # restore original set flags
for (( i=0; i < ${#array[@]}; i++ ));
do
printf "array[%d] of length=%d: \"%s\"\n" "$i" "${#array[$i]}" "${array[$i]}"
done
printf -- "----------\n"
printf -- "array[@]=\n\"%s\"\n" "${array[@]}"
printf -- "----------\n"
printf -- "array[*]=\n\"%s\"\n" "${array[*]}"
.
$ ./tstarrays.sh
array[0] of length=22: "the quick brown"
array[1] of length=23: " fox jumped over the"
array[2] of length=21: "lazy dogs back "
----------
array[@]=
"the quick brown"
array[@]=
" fox jumped over the"
array[@]=
"lazy dogs back "
----------
array[*]=
"the quick brown fox jumped over the lazy dogs back "
A couple of non-obvious key points to make sure your array gets populated with exactly what your command outputs:
If your command output can contain globbing characters, then you should disable globbing before the command (oSET="$-"; set -f) and re-enable it afterwards (set +f -$oSET).
If your command output can contain spaces, then set IFS to a newline before the command (oIFS="$IFS"; IFS=$'\n') and set it back to its old value after the command (IFS="$oIFS").
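In bash 4.0 or later, mapfile (also called readarray) is an alternative that sidesteps both points, since it stores one line per element without word splitting or globbing. Below is a minimal sketch with printf standing in for the real command. The '*.txt' element is there to show that globbing characters survive unchanged:

```shell
# mapfile stores one line per array element; -t drops each trailing newline.
# No word splitting or globbing happens, so IFS and set -f need no changes.
mapfile -t lines < <(printf '%s\n' 'the quick brown' ' fox jumped over the' '*.txt')

for i in "${!lines[@]}"; do
  printf 'lines[%d]="%s"\n' "$i" "${lines[$i]}"
done
```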
tmp=$(w | awk '{print $1}')
while IFS= read -r i
do
echo "$i"
done <<< "$tmp"
You can use a for loop, i.e.
for user in $(w | awk '{print $1}'); do echo $user; done
which in a script would look nicer as:
for user in $(w | awk '{print $1}')
do
echo $user
done
You can use the xargs command to do this:
w | awk '{print $1}' | xargs -I '{}' id '{}'
With the -I switch, xargs takes each line of its standard input separately, then constructs and executes a command line by replacing the specified string '{}' in the command line template with the input line.
I guess you should use who instead of w: w prints two header lines, so their first fields would end up in your list. Try this out:
who | awk '{print $1}' | xargs -n 1 id
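For the group check itself, here is a sketch that uses a hypothetical group name (developers) and a helper function, print_members, invented here for illustration. The function reads user names from standard input and prints those whose id -nG output contains the group. who lists one line per login session, so sort -u removes repeated names first.

```shell
# Hypothetical helper: print each user name read from stdin (one per line)
# that belongs to the group given as the first argument.
print_members() {
  local group=$1 user groups
  while IFS= read -r user; do
    [ -n "$user" ] || continue
    # id -nG lists the user's group names separated by spaces;
    # names that id cannot resolve are skipped.
    groups=$(id -nG "$user" 2>/dev/null) || continue
    case " $groups " in
      *" $group "*) printf '%s\n' "$user" ;;
    esac
  done
}

# Usage sketch ("developers" is a hypothetical group name):
# who | awk '{print $1}' | sort -u | print_members developers
```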

using awk within loop to replace field

I have written a script that computes the MD5 hash of each word in a dictionary and writes it in the form "word:md5sum". I then have a file of names, and I would like to output each name followed by every hash value, i.e.
tom:word1hash
tom:word2hash
.
.
bob:word1hash
and so on. Everything works fine, but I cannot figure out the substitution. Here is my script:
#!/bin/bash
#/etc/dictionaries-common/words
cat words.txt | while read line; do echo -n "$line:" >> dbHashFile.txt
echo "$line" | md5sum | sed 's/[ ]-//g' >> dbHashFile.txt; done
cat users.txt | while read name
do
cat dbHashFile.txt >> nameHash.txt;
awk '{$1="$name"}' nameHash.txt;
cat nameHash.txt >> dbHash.txt;
done
the line
awk '{$1="$name"}' nameHash.txt;
is where I attempt to do the substitution.
Thank you for your help.
Try replacing the entire contents of the last loop (both cats and the awk) with:
awk -v name="$name" -F ':' '{ print name ":" $2 }' dbHashFile.txt >>dbHash.txt
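For context, here is a sketch of the whole loop after that replacement, using small hypothetical input files in a temporary directory. awk's -v option copies the shell variable into awk. In the original script "$name" sat inside single quotes, so the shell never expanded it and awk saw the literal text $name. Reading users.txt with a redirection instead of cat | while, and writing dbHash.txt once with > instead of >>, are stylistic choices here.

```shell
cd "$(mktemp -d)" || exit 1
# Hypothetical sample inputs standing in for the real files
printf '%s\n' 'word1:aaa111' 'word2:bbb222' > dbHashFile.txt
printf '%s\n' tom bob > users.txt

# For every name, print "name:hash" for each line of dbHashFile.txt
while IFS= read -r name; do
  awk -v name="$name" -F ':' '{ print name ":" $2 }' dbHashFile.txt
done < users.txt > dbHash.txt

cat dbHash.txt
```

This prints tom:aaa111, tom:bbb222, bob:aaa111 and bob:bbb222.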
