Arguments to change variable values in Bash script - bash

I have this script in bash:
#!/bin/bash
dir="/home/dortiz/Prueba"
for i in $dir/*
do
cat $i | awk '{print $1" " $2" " $3" " $4"\n " $5}' | \
awk '/gi/{print ">" $0; getline; print}' | \
awk '$3>20.00 {print $0; getline; print;}' \
> "${i}.outsel"
done
cd /home/dortiz/Prueba
mv *.outsel /home/dortiz/Prueba2
and I would like to set an argument to change the value after "awk '$3>" in an easy way from my main program that will call this script.
I have read something about getopts but I don't understand it at all.
Thanks a lot in advance

The simplest way is to just pass an argument to your script:
yourscript.sh 20.0
Then in your script
#!/bin/bash
value=$1 # store the value passed in as the first parameter.
dir="/home/dortiz/Prueba"
for i in $dir/*; do
awk '{print $1" " $2" " $3" " $4"\n " $5}' "$i" |
awk '/gi/{print ">" $0; getline; print}' |
awk -v val="$value" '$3>val {print $0; getline; print;}' > "${i}.outsel"
# ^^^^^^^^^^^^^^^
done
...
and the cat|awk|awk|awk pipeline can probably be written like this:
awk -v val="$value" '
$3 > val {
prefix = /gi/ ? ">" : ""
print prefix $1 " " $2 " " $3 " " $4 "\n " $5
}
' "$i" > "$i.outsel"

Related

Iterating through Comma Separated rows in loop in Shell

Final.txt
Failed,2021-12-07 22:30 EST,Scheduled Backup,abc,/clients/FORD_1030PM_EST_Windows2008,Windows File System
Failed,2021-12-07 22:00 EST,Scheduled Backup,def,/clients/FORD_10PM_EST_Windows2008,Windows File System
I want to iterate through these rows instead of columns
Expected Output
client=abc
client=def
group=/clients/FORD_1030PM_EST_Windows2008
group=/clients/FORD_10PM_EST_Windows2008
I tried this
while read line ; do
group=$(awk -F',' '{print $4}')
client=$(awk -F',' '{print $5}')
echo $group
echo $client
done < Final
It's not working, but when I run this individually
cat Final | awk -F',' '{print $4}'
it gives me the expected output; it just does not work inside the loop.
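The loop prints nothing useful because the two awk commands are never given $line to read: with no file argument and no here-string, the first awk reads the rest of Final from the loop's own standard input, so $line itself is never examined and the file is exhausted after one iteration. A minimal sketch that fixes only that problem (the answers below show how to also group all client= lines before the group= lines, as in the expected output):
while IFS= read -r line; do
  client=$(awk -F',' '{print $4}' <<< "$line")   # 4th field: abc / def
  group=$(awk -F',' '{print $5}' <<< "$line")    # 5th field: the /clients/... path
  echo "client=$client"
  echo "group=$group"
done < Final.txt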
With GNU awk:
awk -F ',' 'BEGINFILE{f++}
f==1{print "client=" $4}
f==2{print "group=" $5}
' Final Final
Output:
client=abc
client=def
group=/clients/FORD_1030PM_EST_Windows2008
group=/clients/FORD_10PM_EST_Windows2008
One-pass awk solution, storing field $5 in an array for printing at the end:
$ awk -F, '{print $4; groups[NR]=$5} END {for (i=1;i<=NR;i++) print groups[i]}' Final.txt
abc
def
/clients/FORD_1030PM_EST_Windows2008
/clients/FORD_10PM_EST_Windows2008
Two-pass awk that eliminates need to store field $5 in an array:
$ awk -F, 'FNR==NR {print $4;next} {print $5}' Final.txt Final.txt
abc
def
/clients/FORD_1030PM_EST_Windows2008
/clients/FORD_10PM_EST_Windows2008
With bash:
declare -a client group
while IFS=, read -ra fields; do
client+=("${fields[3]}")
group+=("${fields[4]}")
done < Final
printf 'client=%s\n' "${client[@]}"
printf 'group=%s\n' "${group[@]}"
Using miller, it's no different really from a single-pass awk
solution, collecting the values in arrays:
mlr --icsv --implicit-csv-header put '
@client[NR] = $4;
@group[NR] = $5;
filter false;
end {
emit @client, "client";
emit @group, "group";
}
' Final
The equivalent of the above, as more readable (IMO) awk code:
awk -F, '
{client[NR] = $4; group[NR] = $5}
END {
for (i=1; i<=NR; i++) print "client=" client[i]
for (i=1; i<=NR; i++) print "group=" group[i]
}
' Final
Using csvtool is nice because it has a transpose function,
but it still needs help getting to the desired output
csvtool col 4,5 Final \
| csvtool cat <(echo "client,group") - \
| csvtool transpose - \
| awk -F, -v OFS="=" '{for (i=2; i<=NF; i++) print $1, $i}'
LOG="Failed,2021-12-07 22:30 EST,Scheduled Backup,abc,/clients/FORD_1030PM_EST_Windows2008,Windows File System
Failed,2021-12-07 22:00 EST,Scheduled Backup,def,/clients/FORD_10PM_EST_Windows2008,Windows File System"
getColumnTextSed(){
log="$1"
column=$2
[ -n "$log" -a -n "$column" ] && sed -E 's/([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),(.*)/\'$column'/mg;t;d' <<<$log
}
getColumnTextAWK(){
log="$1"
column=$2
[ -n "$log" -a -n "$column" ] && awk -F',' -v "c=$column" '{print $(c)}' <<<$log
}
echo "# with sed and column number"
echo "-------------------------------"
getColumnTextSed "$LOG" 4
getColumnTextSed "$LOG" 5
echo "-------------------------------"
echo "# with awk and column number"
echo "-------------------------------"
getColumnTextAWK "$LOG" 4
getColumnTextAWK "$LOG" 5
OUTPUT
# with sed and column number
-------------------------------
abc
def
/clients/FORD_1030PM_EST_Windows2008
/clients/FORD_10PM_EST_Windows2008
-------------------------------
# with awk and column number
-------------------------------
abc
def
/clients/FORD_1030PM_EST_Windows2008
/clients/FORD_10PM_EST_Windows2008

/usr/bin/zcat: Argument list too long

I have used the following as a guide with these commands:
echo "accession\taccession.version\ttaxid\tgi" > reference_proteomes.taxid_map
zcat */*/*.idmapping.gz | grep "NCBI_TaxID" | awk '{print $1 "\t" $1 "\t" $3 "\t" 0}' >> reference_proteomes.taxid_map
I used
echo "accession\taccession.version\ttaxid\tgi" > reference_proteomes.taxid_map & zcat *.idmapping.gz | grep "NCBI_TaxID" | awk '{print $1 "\t" $1 "\t" $3 "\t" 0}' >> reference_proteomes.taxid_map
but I received the message that the argument list is too long: /usr/bin/zcat: Argument list too long
So I tried xargs:
find /Volumes/My\ Passport\ for\ Mac/uniprot | xargs zcat *.idmapping.gz | grep "NCBI_TaxID" | awk '{print $1 "\t" $1 "\t" $3 "\t" 0}' >> reference_proteomes.taxid_map
but still received the argument-list-too-long message: -bash: /usr/bin/xargs: Argument list too long. Any suggestions?
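The error comes from the shell itself: the glob */*/*.idmapping.gz expands to more file names than fit on one command line, and in the find attempt the glob is still expanded by the shell before xargs runs, so the find output is effectively ignored. A sketch of the usual workaround, letting find generate the names and xargs batch them into several zcat invocations; the search path and -name pattern are taken from the commands above and may need adjusting:
find "/Volumes/My Passport for Mac/uniprot" -name '*.idmapping.gz' -print0 \
| xargs -0 zcat \
| grep "NCBI_TaxID" \
| awk '{print $1 "\t" $1 "\t" $3 "\t" 0}' >> reference_proteomes.taxid_map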

Argument not recognised/accessed by egrep - Shell

Egrep and Awk to output columns of a line, with a specific value for the first column
I am tasked to write a shell program which, when run as
./tool.sh -f file -id id OR ./tool.sh -id id -f file
must output the name, surname and birthdate (3 columns of the file) for that specific id.
So far my code is structured as such:
elif [ "$#" -eq 4 ];
then
while [ "$1" != "" ];
do
case $1 in
-f)
cat < "$2" | egrep '"$4"' | awk ' {print $3 "\t" $2 "\t" $5}'
shift 4
;;
-id)
cat < "$4" | egrep '"$2"' | awk ' {print $3 "\t" $2 "\t" $5}'
shift 4
esac
done
(Ignoring the opening elif because there are more subtasks for later.)
My output is nothing. The program just runs.
I've tested the cat < people.dat | egrep '125' | awk ' {print $3 "\t" $2 "\t" $5}'
and it runs just fine.
I also had an instance where I got output from the program when it was run like so
cat < "$2" | egrep '["$4"]' | awk ' {print $3 "\t" $2 "\t" $5}'
but it wasn't only that specific ID.
`egrep "$4"` was correct instead of `egrep '["$4"]'` in
`cat < "$2" | egrep '["$4"]' | awk ' {print $3 "\t" $2 "\t" $5}'`
Double quotes allow variables, single quotes don't. No commands need
certain types of quotes, they are purely a shell feature that are not
passed to the command. mentioned by(#that other guy)
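Applied to the snippet above, only the quoting around the egrep pattern needs to change; a sketch of the fixed loop:
while [ "$1" != "" ]; do
  case $1 in
    -f)
      egrep "$4" < "$2" | awk '{print $3 "\t" $2 "\t" $5}'
      shift 4
      ;;
    -id)
      egrep "$2" < "$4" | awk '{print $3 "\t" $2 "\t" $5}'
      shift 4
      ;;
  esac
done
If the id is always in the first column, matching it explicitly, e.g. awk -v id="$4" '$1 == id {print $3 "\t" $2 "\t" $5}' "$2", would also avoid hits when the id happens to appear elsewhere in a line.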

How to print the remaining columns using awk?

Right now I have a command that prints my log file with a | delimiter between columns.
cat ambari-alerts.log | awk -F '[ ]' '{print $1 "|" $2 "|" $3 "|" $4 "|" $5 "|"}' |
grep "$(date +"%Y-%m-%d")"
Sample of the log file data is this:
2016-02-11 09:40:33,875 [OK] [MAPREDUCE2] [mapreduce_history_server_rpc_latency] (History Server RPC Latency) Average Queue Time:[0.0], Average Processing Time:[0.0]
The result of my command is this:
2016-02-11|09:40:33,875|[OK]|[MAPREDUCE2]|[mapreduce_history_server_rpc_latency]
I want to print the remaining columns. How can I do that? I tried this syntax, adding $0, but unfortunately it just prints the whole line again.
awk -F '[ ]' '{print $1 "|" $2 "|" $3 "|" $4 "|" $5 "|" $0}'
Hope you can help me, newbie here in using awk.
This seems to be all you need:
$ awk '{for (i=1;i<=5;i++) sub(/ /,"|")} 1' file
2016-02-11|09:40:33,875|[OK]|[MAPREDUCE2]|[mapreduce_history_server_rpc_latency]|(History Server RPC Latency) Average Queue Time:[0.0], Average Processing Time:[0.0]
This is a bit of a hassle with awk:
awk -F '[ ]' '{
printf "%s|%s|%s|%s|%s|", $1, $2, $3, $4, $5
for (i=6; i<=NF; i++) printf "%s ", $i
print ""
}'
or, replace the first 5 spaces:
awk -F '[ ]' '{
sub(/ /, "|");sub(/ /, "|");sub(/ /, "|");sub(/ /, "|");sub(/ /, "|")
print
}'
This is actually easier in bash:
while IFS=" " read -r a b c d e rest; do
echo "$a|$b|$c|$d|$e|$rest"
done < file.log
Folding in your grep:
awk -F '[ ]' -v date="$(date +%Y-%m-%d)" '
$0 ~ date {
printf "%s|%s|%s|%s|%s|", $1, $2, $3, $4, $5
for (i=6; i<=NF; i++) printf "%s ", $i
print ""
}'
Here is some awk that provides a somewhat more generalized approach than brute-forcing the first 5 columns:
awk '{
for (i = 1; i < 6; i++)
printf "%s|", $i
for (i = 6; i <= NF; i++)
printf "%s%s", $i, (i < NF ? " " : "\n")
}' ambari-alerts.log | grep "$(date +"%Y-%m-%d")"

Parse /proc/mounts and substitute only one field

I am looking for a way to print information from /proc/mounts like this:
/home /dev/md9 /dev/mapper/home home
/var/tmp /dev/md7 /dev/mapper/vartmp vartmp
I tried:
awk '{ print $2 " " $1; gsub("/","",$2); print "/dev/mapper/"$2" "$2 }' /proc/mounts
But the result is on two lines:
/home /dev/mapper/home
/dev/mapper/home home
/var/tmp /dev/md7
/dev/mapper/vartmp vartmp
Does anyone have a solution?
Fix: use printf (to avoid the implicit line feed) and add a trailing space so the printf output is separated from the output of the following print.
Adjusted command:
awk '{ printf $2 " " $1 " "; gsub("/","",$2); print "/dev/mapper/"$2" "$2 }' /proc/mounts
input.txt
/dev/mapper/home /home blah blah blah blah
output
$ awk '{ printf $2 " " $1 " "; gsub("/","",$2); print "/dev/mapper/"$2" "$2 }' input.txt
/home /dev/mapper/home /dev/mapper/home home
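One caveat with the adjusted command: it uses field data ($2 " " $1 " ") as the printf format string, so a mount point containing a % character would be misinterpreted. A slightly safer spelling of the same one-liner:
awk '{ printf "%s %s ", $2, $1; gsub("/","",$2); print "/dev/mapper/"$2" "$2 }' /proc/mounts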
