Bash script changing the output file for every file

I'm trying to run a command on every .log file in a folder, but I can't figure out how to change the output file for each file in the folder.
#!/bin/bash
N="0"
for i in "*.log"
do
echo "Processing $f file..."
cat $i | grep "test" | awk '/show/ {a = $1} !/show/{print a,$6}' > "log$N.txt"
done
How can I increment the counter N for log$N.txt?

It's bad practice to write shell loops just to process text (see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice). This is all you need:
awk '
FNR==1 { print "Processing", FILENAME; a=""; close(out); out="log" fileNr++ ".txt" }
!/test/ { next }
/show/ { a = $1; next }
{ print a, $6 > out }
' *.log

#!/bin/bash
N="0"
for i in *.log
do
echo "Processing $f file..."
cat $i | grep "test" | awk '/show/ {a = $1} !/show/{print a,$6}' > "log$N.txt"
N=$((N+1))
done
You need to increment the variable N on every iteration and remove the double quotes around *.log in the for loop (quoting the glob prevents it from expanding). Note that the echo line should also reference $i, not the undefined $f.
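For reference, here is a slightly more defensive version of the same loop (my tweak, not part of the original answers): quoting "$i" protects file names containing spaces, and grep can read the file directly so cat isn't needed:
#!/bin/bash
N=0
for i in *.log
do
echo "Processing $i file..."
# grep reads the file directly; no cat required
grep "test" "$i" | awk '/show/ {a = $1} !/show/ {print a, $6}' > "log$N.txt"
N=$((N+1))
done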

Adding data to line in CSV if value exists in external file

Here is my sample data:
1,32425,New Zealand,number,21004
1,32425,New Zealand,number,20522
1,32434,Australia,number,1542
1,32434,Australia,number,986
1,32434,Fiji,number,1
Here is my expected output:
1,32425,New Zealand,number,21004,No
1,32425,New Zealand,number,20522,No
1,32434,Australia,number,1542,No
1,32434,Australia,number,986,No
1,32434,Fiji,number,1,Yes
Basically I am trying to append Yes/No based on whether field 3 is contained in an external file. Here is what I have currently, but something is eating all the stdin in the while loop (it turns out to be the bare sed commands, which read the rest of file1.csv, not grep as I first thought). So I am only getting No added to the end of each line, as the first value is not contained in the external file.
while IFS=, read -r type id country number volume
do
if grep $country externalfile.csv
then
echo "${country}"
sed 's/$/,Yes/' >> file2.csv
else
echo "${country}"
sed 's/$/,No/' >> file2.csv
fi
done < file1.csv
I added the echo "${country}" as I was trying to troubleshoot and that's how I discovered it was only parsing the first line.
Assuming there are no headers -
awk -F, 'NR==FNR{lookup[$1]=$1; next;}
{ if ( lookup[$3] == $3 ) { print $0 ",Yes" } else { print $0 ",No" } }
' externalfile.csv file2.csv
This will parse both files in one pass.
If you just prefer to do it in pure bash,
declare -A lookup
while read c; do lookup["$c"]="$c"; done < externalfile.csv
declare -p lookup # this is just to show you what my example loaded
declare -A lookup='([USA]="USA" [Fiji]="Fiji" )'
while IFS=, read a b c d; do
[[ -n "${lookup[$c]}" ]] && echo "$a,$b,$c,$d,Yes" || echo "$a,$b,$c,$d,No"
done < file2.csv
1,32425,New Zealand,number,21004,No
1,32425,New Zealand,number,20522,No
1,32434,Australia,number,1542,No
1,32434,Australia,number,986,No
1,32434,Fiji,number,1,Yes
No grep needed.
awk -F, -v OFS=, 'NR == FNR { ++a[$1]; next } { $(++NF) = $3 in a ? "Yes" : "No" } 1' externalfile.csv file2.csv
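For readability, here is the same one-liner expanded with comments (identical logic, just spread out):
awk -F, -v OFS=, '
NR == FNR { ++a[$1]; next }              # first file: remember every country
{ $(++NF) = ($3 in a) ? "Yes" : "No" }   # append Yes/No as a new last field
1                                        # always-true condition: print the line
' externalfile.csv file2.csv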
Try this:
while read -r line
do
country=`echo "$line" | cut -d',' -f3`
if grep -q "$country" externalfile.csv
then
echo "$line,Yes" >> file2.csv
else
echo "$line,No" >> file2.csv
fi
done < file1.csv
You need to put $country inside double quotes, because some countries contain more than one word, for example New Zealand. You can also set the country variable more easily using the cut command. Note the -q so grep's matches don't pollute stdout.
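To see why the quotes matter, consider what the shell does with a two-word country (illustrative snippet using the question's own data):
country="New Zealand"
grep $country externalfile.csv    # expands to: grep New Zealand externalfile.csv -- searches for "New" in files Zealand and externalfile.csv
grep "$country" externalfile.csv  # expands to: grep "New Zealand" externalfile.csv -- correct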

Splitting out a large file

I would like to process a 200 GB file with lines like the following:
...
{"captureTime": "1534303617.738","ua": "..."}
...
The objective is to split this file into multiple files grouped by hours.
Here is my basic script:
#!/bin/sh
echo "Splitting files"
echo "Total lines"
sed -n '$=' $1
echo "First Date"
head -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
echo "Last Date"
tail -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
while read p; do
date=$(echo "$p" | sed 's/{"captureTime": "//' | sed 's/","ua":.*//' | xargs -i date -d '#{}' '+%Y%m%d%H')
echo $p >> split.$date
done <$1
Some facts:
80,000,000 lines to process
jq doesn't work well here, since some of the JSON lines are invalid.
Could you help me optimize this bash script?
Thank you
This awk solution might come to your rescue:
awk -F'"' '{file=strftime("%Y%m%d%H",$4); print >> file; close(file) }' $1
It essentially replaces your while-loop.
Furthermore, you can replace the complete script with:
# Start AWK file
BEGIN{ FS='"' }
(NR==1){tmin=tmax=$4}
($4 > tmax) { tmax = $4 }
($4 < tmin) { tmin = $4 }
{ file="split."strftime("%Y%m%d%H",$4); print >> file; close(file) }
END {
print "Total lines processed: ", NR
print "First date: "strftime("%Y%m%d%H",tmin)
print "Last date: "strftime("%Y%m%d%H",tmax)
}
You can then run it as:
awk -f <awk_file.awk> <jq-file>
Note: the use of strftime means that you need GNU awk.
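If GNU awk isn't available, one possible fallback (my sketch, not part of the original answer) is to bucket lines by epoch hour arithmetically; the file names then carry a plain hour index instead of a formatted date:
# POSIX awk: integer-divide the epoch timestamp by 3600 to get an hour bucket
awk -F'"' '{ file = "split." int($4 / 3600); print >> file; close(file) }' "$1"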
You can start optimizing by changing this:
sed 's/{"captureTime": "//' | sed 's/","ua":.*//'
with this:
sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
-n suppress automatic printing of pattern space
-E use extended regular expressions in the script
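You can sanity-check that extraction with the sample line from the question:
$ echo '{"captureTime": "1534303617.738","ua": "..."}' | sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
1534303617.738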

How can I specify a row in awk in a for loop?

I'm using the following awk command:
my_command | awk -F "[[:space:]]{2,}+" 'NR>1 {print $2}' | egrep "^[[:alnum:]]"
which successfully returns my data like this:
fileName1
file Name 1
file Nameone
f i l e Name 1
So as you can see some file names have spaces. This is fine as I'm just trying to echo the file name (nothing special). The problem is calling that specific row within a loop. I'm trying to do it this way:
i=1
for num in $rows
do
fileName=$(my_command | awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]])"
echo "$num $fileName"
$((i++))
done
But my output is always null
I've also tried using awk -v record=$i and then printing $record but I get the below results.
f i l e Name 1
EDIT
Sorry for the confusion: rows is a variable that lists ids, like this: 11 12 13,
and each one of those ids ties to a file name. My command, without any parsing, outputs this:
id File Info OS
11 File Name1 OS1
12 Fi leNa me2 OS2
13 FileName 3 OS3
I can only use the id field to run the command that I need, but I want to use the File Info field to tell the user which file the command is being executed against.
I think your $i does not expand as expected. You should quote your arguments this way:
fileName=$(my_command | awk -F "[[:space:]]{2,}+" "NR==$i {print \$2}" | egrep "^[[:alnum:]]")
And you forgot the other ).
EDIT
As an update to your requirement, you could just pass the rows to a single awk command instead of running a repetitive one inside a loop:
#!/bin/bash
ROWS=(11 12)
function my_command {
# This function just emulates my_command and should be removed later.
echo " id File Info OS
11 File Name1 OS1
12 Fi leNa me2 OS2
13 FileName 3 OS3"
}
awk -- '
BEGIN {
input = ARGV[1]
while (getline line < input) {
sub(/^ +/, "", line)
split(line, a, / +/)
for (i = 2; i < ARGC; ++i) {
if (a[1] == ARGV[i]) {
printf "%s %s\n", a[1], a[2]
break
}
}
}
exit
}
' <(my_command) "${ROWS[@]}"
That awk command could be condensed to one line as:
awk -- 'BEGIN { input = ARGV[1]; while (getline line < input) { sub(/^ +/, "", line); split(line, a, / +/); for (i = 2; i < ARGC; ++i) { if (a[1] == ARGV[i]) { printf "%s %s\n", a[1], a[2]; break; }; }; }; exit; }' <(my_command) "${ROWS[@]}"
Or better yet, just use Bash for the whole thing:
#!/bin/bash
shopt -s extglob  # the +( ) pattern below needs extglob enabled
ROWS=(11 12)
while IFS=$' ' read -r LINE; do
IFS='|' read -ra FIELDS <<< "${LINE// +( )/|}"
for R in "${ROWS[#]}"; do
if [[ ${FIELDS[0]} == "$R" ]]; then
echo "${R} ${FIELDS[1]}"
break
fi
done
done < <(my_command)
It should give an output like:
11 File Name1
12 Fi leNa me2
Shell variables aren't expanded inside single-quoted strings. Use the -v option to set an awk variable to the shell variable:
fileName=$(my_command | awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]")
This method avoids having to escape all the $ characters in the awk script, as required in konsolebox's answer.
As you already heard, you need to populate an awk variable from your shell variable to be able to use the desired value within the awk script, so this:
awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]]"
should be this:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
Also, though, you don't need awk AND grep, since awk can do anything grep can do, so you can change this part of your script:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
to this:
awk -v i="$i" -F "[[:space:]]{2,}+" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
and you don't need a + after a numeric range, so you can change {2,}+ to just {2,}:
awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
Most importantly, though, instead of invoking my_command (and awk) once for every iteration of the loop, you can just invoke them once in total, i.e. instead of this (assuming this does what you want):
i=1
for num in $rows
do
fileName=$(my_command | awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}')
echo "$num $fileName"
((i++))
done
you can do something more like this:
for num in $rows
do
my_command
done |
awk -F '[[:space:]]{2,}' '$2~/^[[:alnum:]]/{print NR, $2}'
I say "something like" because you don't tell us what "my_command", "rows" or "num" are so I can't be precise but hopefully you see the pattern. If you give us more info we can provide a better answer.
It's pretty inefficient to rerun my_command (and awk) every time through the loop just to extract one line from its output. Especially when all you're doing is printing out part of each line in order. (I'm assuming that my_command really is exactly the same command and produces the same output every time through your loop.)
If that's the case, this one-liner should do the trick:
paste -d' ' <(printf '%s\n' $rows) <(my_command |
awk -F '[[:space:]]{2,}+' '($2 ~ /^[[:alnum:]]/) {print $2}')

Put awk code in bash and sort the result

I have awk code for combining 2 files and adding the result to the end of file.txt using >>.
My code:
NR==FNR && $2!=0 {two[$0]++;j=1; next }{for(i in two) {split(i,one,FS); if(one[3] == $NF){x=$4;sub( /[[:digit:]]/, "A", $4); print j++,$1,$2,$3,x,$4 | "column -t" ">>" "./Desktop/file.txt"}}}
I want to put my awk code in a bash script, and finally sort file.txt and save the sorted result back to file.txt using >.
I tried this:
#!/bin/bash
command=$(awk '{NR==FNR && $2!=0 {two[$0]++;j=1; next }{for(i in two) {split(i,one,FS); if(one[3] == $NF){x=$4;sub( /[[:digit:]]/, "A", $4); print $1,$2,$3,$4 | "column -t" ">>" "./Desktop/file.txt"}}}}')
echo -e "$command" | column -t | sort -s -n -k4 > ./Desktop/file.txt
but it gives me the error "for reading (no such file or directory)".
Where is my mistake?
Thanks in advance
1) You aren't specifying the input files for your awk script. This:
command=$(awk '{...stuff...}')
needs to be:
command=$(awk '{...stuff...}' file1 file2)
2) You moved your awk condition "NR==FNR && ..." inside the action part, so it no longer behaves as a condition.
3) Your awk script's output goes into "file.txt", so "command" is empty when you echo it on the subsequent line.
4) You have unused variables x and j.
5) You pass the arg FS to split() unnecessarily.
etc...
I THINK what you want is:
command=$( awk '
NR==FNR && $2!=0 { two[$0]++; next }
{
for(i in two) {
split(i,one)
if(one[3] == $NF) {
sub(/[[:digit:]]/, "A", $4)
print $1,$2,$3,$4
}
}
}
' file1 file2 )
echo -e "$command" | column -t >> "./Desktop/file.txt"
echo -e "$command" | column -t | sort -s -n -k4 >> ./Desktop/file.txt
but it's hard to tell.

I have two files, how should I compare them using shell and awk scripts?

I have two files
Content of file A
paybackFile_537214-760887_000_20120801.xml
paybackFile_354472-544899_000_20120801.xml
paybackFile_62-11033_000_20120801.xml
paybackFile_831669-837544_000_20120801.xml
===========================================
Total file(s) - 4
===========================================
Content of file B
14/08/2012 12:36:01: MSG: File paybackFile_537214-760887_000_20120801.xml.gpg decrypted successfully.
13/08/2012 11:36:01: MSG: File paybackFile_62-11033_000_20120801.xml.gpg not decrypted successfully.
Here I have names of .xml files.
Using file A, I want to check that each .xml file is present in file B and also check whether it was decrypted successfully.
Could you please help me with this?
Thanks in advance.
awk 'FNR==NR{a[$2".gpg"];next}(($5 in a) && ($0~/decrypted/))' filea fileb
Create a script named compare.awk. Paste this inside:
FILENAME=="fileB" && $5 ~ /xml/ {
if ($6 == "decrypted" && $7 == "successfully.") {
decrypted_file[$5] = 1;
} else {
decrypted_file[$5] = 2;
}
}
FILENAME=="fileA" && $2 ~ /xml/ {
if (decrypted_file[$2".gpg"] == 1) {
print $2" exist and decrypted";
} else if (decrypted_file[$2".gpg"] == 2) {
print $2" exist but not decrypted";
} else {
print $2" not exist in fileB";
}
}
Call it by:
awk -F' ' -f compare.awk fileB fileA
EDIT: For a shell version without an awk script (it still needs grep, sed, cut and wc, though):
#!/bin/bash
TESTA=`grep ".xml" fileA | cut -d' ' -f2`
TESTB=`grep ".xml" fileB | cut -d' ' -f5,6,7 | sed 's/ /-/g'`
DECRYPT_YES=""
DECRYPT_NO=""
for B in ${TESTB}
do
DECRYPT_B=`echo ${B} | sed 's/.*gpg-decrypted-successfully\./1/'`
if [ ${DECRYPT_B} == "1" ]
then
DECRYPT_YES=${DECRYPT_YES}" "`echo ${B} | sed 's/\.gpg.*//g'`
else
DECRYPT_NO=${DECRYPT_NO}" "`echo ${B} | sed 's/\.gpg.*//g'`
fi
done
for FILE_A in ${TESTA}
do
if [ `echo ${DECRYPT_YES} | grep "${FILE_A}" | wc -l` == 1 ]
then
echo ${FILE_A}" exist and decrypted"
elif [ `echo ${DECRYPT_NO} | grep "${FILE_A}" | wc -l` == 1 ]
then
echo ${FILE_A}" exist but not decrypted"
else
echo ${FILE_A}" not exist"
fi
done
Here's a script:
#!/bin/sh
FILEA=fileA
FILEB=fileB
awk -F" " ' { print $2 } ' $FILEA > .tmpfileA
awk -F" " ' { print $5 } ' $FILEB | sed 's/\.gpg//' | grep 'decrypted successfully' > .tmpfileB
diff .tmpfileA .tmpfileB
rm -f .tmpfileA
rm -f .tmpfileB
All you'll need to change are the variables FILEA and FILEB.
When executed with the inputs you provided, it gives the following result:
$ testAB.ksh
2d1
< paybackFile_521000-845442_000_20120701.xml
$
