Why am I not able to store bash output to shell? - bash

I have the following script:
#!/bin/bash
…code setting array ids, etc…
for i in "${!ids[@]}" ; do
echo "#${ids[i]}_${pos[i]}_${wild[i]}_${sub[i]}"
curl -sS "http://www.uniprot.org/uniprot/"${ids[i]}".fasta";
done |
sed '/^>/ d' |
sed -r 's/[#]+/>/g' |
perl -npe 'chomp if ($.!=1 && !s/^>/\n>/)' > $id.pph.fasta
However, the results are not stored in the file. I can print the result to the terminal and store it in a file by running:
./myscript > result.txt
However, I want to do this within the script, writing to the file outside the loop.

Add
exec 1>result.txt
to the top of the script, and all output will be redirected.
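A minimal sketch of what that redirect does, assuming a throwaway file created with mktemp (the path and the echoed text are made up for illustration). The subshell is only there so the redirect does not swallow the final cat; in a real script the exec line would simply sit at the top.

```shell
# Hypothetical output file, used only for this sketch.
out=$(mktemp)

(
    # From here on, everything this subshell writes to stdout lands in "$out".
    exec 1>"$out"
    echo "first line"
    printf '%s\n' "second line"
)

# Back in the parent shell, stdout is untouched, so this prints to the terminal.
cat "$out"
```

Errors still go to the terminal, because only file descriptor 1 is redirected.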

Here is a variation of your script:
#!/bin/bash
for i in ${!ids[*]}
do
echo ">${ids[i]}_${pos[i]}_${wild[i]}_${sub[i]}"
curl -Ss www.uniprot.org/uniprot/${ids[i]}.fasta
done |
awk '
/>/ {if (z++) printf RS; print; printf RS; getline; next}
1
END {printf RS}
' ORS= > $id.pph.fasta

Related

multiline awk script inside shell script

#!/usr/bin/tcsh
cmd='BEGIN{c=0}{
if($1=="Net"){print $0}
if($1=="v14")
{
if($4>=200)
{print "Drop more than 200 at "$1}
}
}'
awk -f "$cmd" input_file.txt > output_file.txt
I am trying to execute a shell script that contains a multiline awk script.
I store the awk script in a variable cmd and then execute it with awk -f "$cmd" input_file.txt > output_file.txt.
This gives the following error:
awk: fatal: can't open source file `BEGIN{c=0}{
if($1=="Net"){print $0}
if($1=="v14")
{
if($4>=200)
{print"Drop more than 200 at $1}
}
}' for reading (No such file or directory)
My question is: how do I execute a shell script that contains a multiline awk script?
Can you please help me with this? I couldn't figure it out even after searching Google and the reference manual.
You use awk -f when you want to pass a file name for the script to execute.
Here your awk script is an inline string, so just removing the -f option will fix your issue.
awk "$cmd" input_file.txt > output_file.txt
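To illustrate the difference, here is a small sketch in a Bourne-style shell rather than tcsh. The three input lines are invented sample data, and the program is a shortened form of the one in the question.

```shell
# The awk program as a plain string, held in a shell variable.
cmd='$1 == "Net" { print }
$1 == "v14" && $4 >= 200 { print "Drop more than 200 at " $1 }'

# Without -f, awk treats its first non-option argument as the program text.
out=$(printf '%s\n' 'Net a b 1' 'v14 a b 250' 'v14 a b 10' | awk "$cmd")
printf '%s\n' "$out"
```

With -f, the same string would be taken as a file name, which is exactly what the error message in the question shows.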
Don't write [t]csh scripts (see any of the many results of https://www.google.com/search?q=csh+why+not); use a Bourne-derived shell like bash instead.
Don't store an awk script in a shell variable and then ask awk to interpret the contents of that variable; just put the script in a function and call that.
So, do something like this:
#!/usr/bin/env bash
foo() {
awk '
{ print "whatever", $0 }
' "${@:--}"
}
foo input_file.txt > output_file.txt
This is the equivalent script:
$1=="Net"
$1=="v14" && $4>=200 {print "Drop more than 200 at "$1}
Save it into a file, for example test.awk, and run it as
$ awk -f test.awk input_file > output_file
Or, for simple one-off scripts, you can just run
$ awk '$1=="Net"; $1=="v14" && $4>=200 {print "Drop more than 200 at "$1}' input_file > output_file
Obviously, the above line can be inserted into a shell script as well.
I don't know about tcsh, but in bash it is also possible with a heredoc:
#!/usr/bin/bash
awk -f <(cat - <<-'_EOF_'
BEGIN{c=0}{
if($1=="Net"){print $0}
if($1=="v14")
{
if($4>=200)
{print "Drop more than 200 at "$1}
}
}
_EOF_
) input_file.txt > output_file.txt

Starting a new line in bash scripting

I need to start a new line after each field. I know I need to use \n at the end of the command, but how would I do that if I am using the cat command at the start?
I have tried using && after the awk command, as in awk -F : 'NR==1' && '\n'. My code is:
cat /etc/shadow | awk -F : 'NR==1' && "\n"
cat /etc/shadow | awk -F : 'NR == 1 { print "Username: " $1, "\n"}'
When you want to split the fields onto separate lines, you can use
... | tr ':' '\n'
or, when you want to keep the : at the end of each line,
... | sed 's/:/:\n/g'
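A quick sketch of the tr variant on one made-up line in /etc/shadow layout (the user name and numbers are invented, not real data), next to a cleaned-up version of the labelled output the question was aiming for:

```shell
# Hypothetical sample line; real /etc/shadow content is not needed for the demo.
line='alice:x:19000:0:99999:7:::'

# One field per line: every ":" becomes a newline.
fields=$(printf '%s\n' "$line" | tr ':' '\n')
printf '%s\n' "$fields"

# Only the first field of the first line, with a label in front of it.
first=$(printf '%s\n' "$line" | awk -F: 'NR == 1 { print "Username: " $1 }')
printf '%s\n' "$first"
```

Note that \n in the replacement part of a sed s command is understood by GNU sed but not by every sed, so the tr form is the more portable of the two.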
Maybe
&& echo or
&& printf "\n"
It is not clear what you mean.

awk command has different behaviors when executing the exact same code. Why?

I have created a little shell script that receives a list of values such as "MY_VAR_NAME=var_value MY_VAR_NAME2=value2 ...", separated by spaces only. It should also be possible to use values such as MY_VAR_NAME='' or MY_VAR_NAME= (nothing).
These values are then used to change the values inside an environment variables file. For example, MY_VAR_NAME=var_value would make the script change the MY_VAR_NAME value inside the .env file to var_value, without changing anything else in the file.
The env file has the following contents:
NODE_ENV=development
APP_PATH=/media
BASE_URL=http://localhost:3000
ASSETS_PATH=http://localhost:3000
USE_CDN=false
APP_PORT=3000
WEBPACK_PORT=8080
IS_CONNECTED_TO_BACKEND=false
SHOULD_BUILD=false
USE_REDUX_TOOL=false
USE_LOG_OUTPUT_AS_JSON=false
ACCESS_KEY_ID=
SECRET_ACCESS_KEY=
BUCKET_NAME=
BASE_PATH=
MIX_PANEL_KEY=
RDSTATION_KEY=
RESOURCE_KEY=
SHOULD_ENABLE_INTERCOM=false
SHOULD_ENABLE_GTM=false
SHOULD_ENABLE_UTA=false
SHOULD_ENABLE_WOOTRIC=false
I have debugged my script and found that this is the point where it sometimes has a problem:
cat .envtemp | awk -v var_value="$VAR_VALUE" \
-v var_name="$VAR_NAME" \
-F '=' '$0 !~ var_name {print $0} $0 ~ var_name {print $1"="var_value}' | tee .envtemp
This piece of code sometimes writes the proper result to .envtemp, and sometimes writes nothing, leaving .envtemp empty.
The complete code I am using is the following:
function change_value(){
VAR_NAME=$1
VAR_VALUE=$2
cat .envtemp | awk -v var_value="$VAR_VALUE" \
-v var_name="$VAR_NAME" \
-F '=' '$0 !~ var_name {print $0} $0 ~ var_name {print $1"="var_value}' | tee .envtemp
ls -l -a .env*
}
function manage_env(){
for VAR in $@
do
var_name=`echo $VAR | awk -F '=' '{print $1}'`
var_value=`echo $VAR | awk -F '=' '{print $2}'`
change_value $var_name $var_value
done
}
function main(){
manage_env $@
cat .envtemp > .env
exit 0
}
main $@
Here is an example script for recreating the error. It does not happen every time, and when it happens, it is not always with the same input.
#!/bin/bash
ENV_MANAGER_INPUT="NODE_ENV=production BASE_URL=http://qa.arquivei.com.br ASSETS_PATH=https://d4m6agb781hapn.cloudfront.net USE_CDN=true WEBPACK_PORT= IS_CONNECTED_TO_BACKEND=true ACCESS_KEY_ID= SECRET_ACCESS_KEY= BUCKET_NAME=frontend-assets-dev BASE_PATH=qa"
cp .env.dist .env
#Removes comment lines. The script needs a .envtemp file.
cat .env.dist | grep -v '#' | grep -v '^$' > .envtemp
./jenkins_env_manager.sh ${ENV_MANAGER_INPUT}
Have you tried using two files?
mv .envtemp .envtemp.tmp
cat .envtemp.tmp | awk ... | tee .envtemp
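The reason this helps: all stages of a pipeline start at the same time, and tee truncates .envtemp as soon as it opens it. If that happens before cat has read the file, awk sees empty input and the file stays empty, which would explain the intermittent behaviour. Below is a sketch of the same idea with a temporary file and mv. The directory, the shortened .envtemp contents and the exact-match test on $1 are assumptions made for the demo (the original matched the name anywhere in the line).

```shell
# Work in a throwaway directory with a shortened, made-up .envtemp.
workdir=$(mktemp -d)
envfile="$workdir/.envtemp"
printf '%s\n' 'NODE_ENV=development' 'USE_CDN=false' 'APP_PORT=3000' > "$envfile"

change_value() {
    # $1 = variable name, $2 = new value (may be empty)
    tmp=$(mktemp "$envfile.XXXXXX") || return 1
    # Read from the real file, write to the temporary one, then swap them.
    # The input is never open for writing while awk is still reading it.
    awk -F '=' -v name="$1" -v value="$2" '
        $1 == name { print name "=" value; next }
        { print }
    ' "$envfile" > "$tmp" && mv "$tmp" "$envfile"
}

change_value NODE_ENV production
change_value USE_CDN true
cat "$envfile"
```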

Script returned '/usr/bin/awk: Argument list too long' in using -v in awk command

Here is the part of my script that uses awk.
ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
This works perfectly, but then I tried to get the data from two or more files like this:
ids=`cut -d ',' -f1 $file1 $file2 $file3 | sed ':a;N;$!ba;s/\n/,/g'`
It returned this error:
/usr/bin/awk: Argument list too long
From what I found, it is caused not by the number of files but by the number of ids fetched.
Does anybody have an idea on how to solve this? Thanks.
You could use an environment variable to pass the data to awk. In awk the environment variables are accessible via an array ENVIRON.
So try something like this:
export ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -F',' 'NR > 1 {if(index(ENVIRON["ids"],$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
Change the way you generate your ids so they come out one per line, like this, which I use as a very simple way to generate ids 2,3 and 9:
echo 2; echo 3; echo 9
2
3
9
Now pass that as the first file to awk and your $input_file as the second file to awk:
awk '...' <(echo 2; echo 3; echo 9) "$input_file"
In bash you can generate a pseudo-file with the output of a process using <(some commands), and that is what I am using.
Now, in your awk, pick up the ids from the first file like this:
awk 'FNR==NR{ids[$1]++;next}' <(echo 2; echo 3; echo 9)
which will set ids[2]=1, ids[3]=1 and ids[9]=1.
Then pass both your files and add in your original processing:
awk 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(echo 2; echo 3; echo 9) "$input_file"
So, for my final answer, your entire code will look like:
awk 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(cut ... file1 file2 file3 | sed ...) "$input_file"
As @hek2mgl alludes in the comments, you can likely just pass the files which include the ids to awk "as is" and let awk find the ids itself rather than using cut and sed. If there are many, you can make them all come to awk as the first file with:
awk '...' <(cat file1 file2 file3) "$input_file"
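Here is a small end-to-end sketch of that two-file idiom. The file names and CSV contents are invented, and it feeds the ids through a pipe and "-" (standard input) instead of <(...), which should behave the same way here while also working in shells without process substitution.

```shell
# Made-up sample files: ids live in column 1 of file1 and file2.
workdir=$(mktemp -d)
printf '%s\n' '2,x' '9,y' > "$workdir/file1"
printf '%s\n' '3,z' > "$workdir/file2"
printf '%s\n' 'name,id' 'a,2' 'b,4' 'c,3' > "$workdir/input.csv"

result=$(cat "$workdir/file1" "$workdir/file2" |
    awk -F',' '
        # First input (stdin): remember every id, print nothing.
        FNR == NR { ids[$1]; next }
        # Second input: skip its header line, then flag each row.
        FNR > 1 { print $0 "," (($2 in ids) ? "true" : "false") }
    ' - "$workdir/input.csv")
printf '%s\n' "$result"
```

Because the ids arrive as file content rather than as a command-line argument, their number no longer counts against the argument-length limit. The array lookup is also an exact match, whereas index() would treat id 2 as present whenever 12 or 20 is in the list.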
There are 2 problems in your script:
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
that could be causing that error:
-vdata=.. - that is gawk-specific, in other awks you need to leave a space between -v and data=. So if you aren't running gawk then idk what your awk will make of that statement but it might treat it as multiple args.
$input_file - you MUST quote shell variables unless you have a specific purpose in mind by leaving them unquoted. If $input_file contains globbing chars or spaces then you leaving it unquoted will cause them to be expanded into potentially multiple files/args.
So try this:
awk -v data="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' "$input_file" >> "$output_file"
and see if you still have the problem. Your script does have other unrelated issues of course, some of which have already been pointed out, and you can post a followup question if you want help with those, but just FYI that awk script could be written more concisely as:
awk -v data="$ids" 'BEGIN{FS=OFS=","} NR > 1{print $0, (index(data,$2) ? "true" : "false")}'

How can I specify a row in awk in for loop?

I'm using the following awk command:
my_command | awk -F "[[:space:]]{2,}+" 'NR>1 {print $2}' | egrep "^[[:alnum:]]"
which successfully returns my data like this:
fileName1
file Name 1
file Nameone
f i l e Name 1
So as you can see some file names have spaces. This is fine as I'm just trying to echo the file name (nothing special). The problem is calling that specific row within a loop. I'm trying to do it this way:
i=1
for num in $rows
do
fileName=$(my_command | awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]])"
echo "$num $fileName"
$((i++))
done
But my output is always null.
I've also tried using awk -v record=$i and then printing $record but I get the below results.
f i l e Name 1
EDIT
Sorry for the confusion: rows is a variable that list ids like this 11 12 13
and each one of those ids ties to a file name. My command without doing any parsing looks like this:
id  File Info    OS
11  File Name1   OS1
12  Fi leNa me2  OS2
13  FileName 3   OS3
I can only use the id field to run the command that I need, but I want to use the File Info field to notify the user of the actual file that the command is being executed against.
I think your $i does not expand as expected. You should quote your arguments this way:
fileName=$(my_command | awk -F "[[:space:]]{2,}+" "NR==$i {print \$2}" | egrep "^[[:alnum:]]")
And you forgot the other ).
EDIT
As an update to your requirement, you could just pass the rows to a single awk command instead of a repetitive one inside a loop:
#!/bin/bash
ROWS=(11 12)
function my_command {
    # This function just emulates my_command and should be removed later.
    echo " id  File Info    OS
11  File Name1   OS1
12  Fi leNa me2  OS2
13  FileName 3   OS3"
}
awk -- '
BEGIN {
    input = ARGV[1]
    # Read the first argument line by line; "> 0" stops on EOF and on read errors.
    while ((getline line < input) > 0) {
        sub(/^ +/, "", line)
        # Columns are separated by two or more spaces.
        split(line, a, /  +/)
        for (i = 2; i < ARGC; ++i) {
            if (a[1] == ARGV[i]) {
                printf "%s %s\n", a[1], a[2]
                break
            }
        }
    }
    exit
}
' <(my_command) "${ROWS[@]}"
That awk command could be condensed to one line as:
awk -- 'BEGIN { input = ARGV[1]; while ((getline line < input) > 0) { sub(/^ +/, "", line); split(line, a, /  +/); for (i = 2; i < ARGC; ++i) { if (a[1] == ARGV[i]) { printf "%s %s\n", a[1], a[2]; break; }; }; }; exit; }' <(my_command) "${ROWS[@]}"
Or better yet just use Bash instead as a whole:
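#!/bin/bash
shopt -s extglob  # needed for the +( ) pattern in the substitution below
ROWS=(11 12)
while IFS=$' ' read -r LINE; do
    # Turn each run of two or more spaces into a single | separator.
    IFS='|' read -ra FIELDS <<< "${LINE// +( )/|}"
    for R in "${ROWS[@]}"; do
        if [[ ${FIELDS[0]} == "$R" ]]; then
            echo "${R} ${FIELDS[1]}"
            break
        fi
    done
done < <(my_command)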
It should give an output like:
11 File Name1
12 Fi leNa me2
Shell variables aren't expanded inside single-quoted strings. Use the -v option to set an awk variable to the shell variable:
fileName=$(my_command | awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]")
This method avoids having to escape all the $ characters in the awk script, as required in konsolebox's answer.
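A small sketch that shows both behaviours side by side. The my_command function here is only a stand-in that prints the sample table from the question, and the separator is written as '  +' (two spaces, then +) because some awk versions do not support {2,}.

```shell
# Stand-in for the real my_command; it prints sample rows only.
my_command() {
    printf '%s\n' 'id  File Info    OS' \
                  '11  File Name1   OS1' \
                  '12  Fi leNa me2  OS2' \
                  '13  FileName 3   OS3'
}

i=3

# Single quotes: awk sees the literal text $i. Its own variable i is unset,
# so $i means $0 and NR is compared with the whole line, which never matches.
wrong=$(my_command | awk -F '  +' 'NR==$i {print $2}')

# -v copies the shell value into an awk variable, so NR==i means NR==3.
right=$(my_command | awk -v i="$i" -F '  +' 'NR==i {print $2}')

printf 'wrong=[%s] right=[%s]\n' "$wrong" "$right"
```

This also suggests why the first attempt in the question always came back empty.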
As you already heard, you need to populate an awk variable from your shell variable to be able to use the desired value within the awk script, so this:
awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]]"
should be this:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
Also, though, you don't need awk AND grep, since awk can do anything grep can do, so you can change this part of your script:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
to this:
awk -v i="$i" -F "[[:space:]]{2,}+" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
and you don't need a + after a numeric range so you can change {2,}+ to just {2,}:
awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
Most importantly, though, instead of invoking awk once for every invocation of my_command, you can just invoke it once for all of them, i.e. instead of this (assuming this does what you want):
i=1
for num in $rows
do
fileName=$(my_command | awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}')
echo "$num $fileName"
((i++))
done
you can do something more like this:
for num in $rows
do
my_command
done |
awk -F '[[:space:]]{2,}' '$2~/^[[:alnum:]]/{print NR, $2}'
I say "something like" because you don't tell us what "my_command", "rows" or "num" are so I can't be precise but hopefully you see the pattern. If you give us more info we can provide a better answer.
It's pretty inefficient to rerun my_command (and awk) every time through the loop just to extract one line from its output. Especially when all you're doing is printing out part of each line in order. (I'm assuming that my_command really is exactly the same command and produces the same output every time through your loop.)
If that's the case, this one-liner should do the trick:
paste -d' ' <(printf '%s\n' $rows) <(my_command |
awk -F '[[:space:]]{2,}' 'NR > 1 && ($2 ~ /^[[:alnum:]]/) {print $2}')

Resources