bash variables from nested for loops in awk - bash

I want to simply use the two for loop variables in my awk code but I can't. Please help or guide me in the right direction.
for i in {30,60,100};
do
for j in {7,8};
do
awk -v x=$i -v y=$j '{if ($NF <=x) print $0}' S_$i.txt > S_$i_$j.txt;
done;
done
This was the error I received:
awk: fatal: cannot open file S_.txt for reading (No such file or directory)

S_$i_$j.txt is trying to access a variable named $i_. Use S_${i}_${j}.txt instead, but also always quote your shell variables, so it should really be:
awk -v x="$i" -v y="$j" '{if ($NF <= x) print $0}' "S_${i}.txt" > "S_${i}_${j}.txt"
or more awkishly:
awk -v x="$i" -v y="$j" '$NF <= x' "S_${i}.txt" > "S_${i}_${j}.txt"
and note that you never use y inside your awk script so it could just be:
awk -v x="$i" '$NF <= x' "S_${i}.txt" > "S_${i}_${j}.txt"
but then it's not clear why you'd want to create 2 identical copies of your output, one for each value of j in the inner loop.
Whatever you're doing, though, could almost certainly be done much faster with a single call to awk than calling it multiple times within shell loops!
The problem you asked about has absolutely nothing to do with "for loop variables in my awk code", by the way; it's all shell fundamentals.

Thanks for your quick response.
However, I tried the following and it worked:
for i in {30,60,100};
do
for j in {7,8};
do
awk -v x=$i -v y=$j '{if ($NF <=x) print $0}' "S_"$j".txt" > "S_"$j"_"$i".txt";
done;
done;
Additionally, I realized that S_30.txt didn't exist. So when I changed it to "S_"$j".txt" it worked fine. My bad on that one.
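For what it's worth, the whole nested loop could likely be collapsed into a single awk pass, something along these lines (an untested sketch that assumes the same S_7.txt / S_8.txt inputs and S_<j>_<i>.txt output names as the working loop above):
awk '
BEGIN { split("30 60 100", th) }               # the thresholds from the outer loop
{
    j = FILENAME
    sub(/^S_/, "", j); sub(/\.txt$/, "", j)    # recover 7 or 8 from the input filename
    for (k in th)
        if ($NF <= th[k])
            print > ("S_" j "_" th[k] ".txt")  # e.g. S_7_30.txt
}
' S_7.txt S_8.txt
That way each input file is read once instead of three times.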

Related

splitting file with awk command

I was trying to split a file into a training data set and a test data set, and I got this error:
awk: can't open file -v source line number 1.
The command line was as follows:
awk -v lines=$(wc -l < data/yelp/yelp_review.v8.csv) -v fact=0.80 'NR <= lines * fact {print > "train.txt"; next} {print > "val.txt"}' data/yelp/yelp_review.v8.csv
Can anybody enlighten me as to why this was a problem on my MacBook?
Well, miken32 has already identified what went wrong with your first attempt; I can't improve on his explanation of the problem.
My suggestion would be that rather than having wc provide your line count, you just do that job with awk itself. Something like this:
awk -v fact=0.8 'NR==FNR{lines++;next} FNR<=lines*fact{print>"train.txt";next} {print>"val.txt"}' "$file" "$file"
Though I'd probably write it more like this:
awk -v fact=0.8 'NR==FNR{lines++;next} {out="val.txt"} FNR<=lines*fact{out="train.txt"} {print > out}' "$file" "$file"
You can decide whether greater elegance is gained by brevity or avoidance of a next. :-)
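Spelled out with comments (a minimal sketch, assuming the path from the question for $file; the file is simply read twice, once to count and once to split):
file=data/yelp/yelp_review.v8.csv
awk -v fact=0.8 '
NR==FNR { lines++; next }                 # first pass over $file: just count the lines
{ out = "val.txt" }                       # second pass: default destination
FNR <= lines*fact { out = "train.txt" }   # ...unless we are still within the first 80%
{ print > out }
' "$file" "$file"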
What does the output from wc -l < data/yelp/yelp_review.v8.csv look like? Something like this perhaps, with the count right-justified behind leading spaces?
      74
So what's going to happen when you drop that into your command?
awk -v lines= 74 -v fact=0.80 ...
As you can see, this isn't going to parse well. Always quote any variable data you use:
awk -v lines="$(wc -l < data/yelp/yelp_review.v8.csv)" -v fact=0.80 ...
Awk is smart enough to trim the spaces from the number before using it.
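You can watch the word splitting happen with something like this (assuming the 74-line file above; the exact padding depends on your wc):
$ printf '<%s>\n' lines=$(wc -l < data/yelp/yelp_review.v8.csv)
<lines=>
<74>
$ printf '<%s>\n' lines="$(wc -l < data/yelp/yelp_review.v8.csv)"
<lines=      74>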

Assign bash variables from one awk command?

Hoping someone can help me make my awk commands more efficient please!
Let's say my text file has around 30 lines of this type of thing:
ENTIRE:11.3.28.4.0
OSVER:Solaris11
VARFREE:3G
I'm assigning these to variables in a bash script like this:
ENTIRE=$(awk -F\: '$1 ~ /ENTIRE/ {print $2}' $HOSTFILE)
RELEASE=$(awk -F\: '$1 ~ /RELEASE/ {print $2}' $HOSTFILE)
OSVER=$(awk -F\: '$1 ~ /OSVER/ {print $2}' $HOSTFILE)
Because I have around 30 of these, it means awk is run 30 times, which is slow, and clearly not the best way.
Can anyone suggest how I can build these into one awk command please?
Thanks in advance!
You don't need awk at all. If modifying the original file (so that it could simply be sourced, for example) isn't an option, use a while loop and the declare command to define each variable.
while IFS=: read name value; do
declare "$name=$value"
done < "$HOSTFILE"
An example:
$ IFS=: read name value <<< "foo:bar"
$ declare "$name=$value"
$ echo "$foo"
bar
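Applied to the sample lines from the question (hypothetical file name, and with -r added so backslashes are left alone):
$ cat hostfile.txt
ENTIRE:11.3.28.4.0
OSVER:Solaris11
VARFREE:3G
$ while IFS=: read -r name value; do declare "$name=$value"; done < hostfile.txt
$ echo "$ENTIRE $OSVER $VARFREE"
11.3.28.4.0 Solaris11 3G
This works in the current shell because the redirection, unlike a pipe, does not run the loop in a subshell.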

I need to use a variable instead of a direct date in shell script awk

I need to use a variable instead of a direct date.
cat file | awk -F, '{ if ($1>"2012-08-20 11:30" && $1<"2012-08-22 16:00") print }'
Thanks in advance.
Based on the code you have shown, could you please try the following and let me know if it helps you. (For lack of samples I haven't tested it.)
awk -v date1="2012-08-20 11:30" -v date2="2012-08-22 16:00" -F, '($1>date1 && $1<date2)' Input_file
In case your variables are coming from the shell into awk, the following could help you do the same; you can change the comparison order as per your need too:
date1="2012-08-20 11:30"
date2="2012-08-22 16:00"
awk -v date_1="$date1" -v date_2="$date2" -F, '($1>date_1 && $1<date_2)' Input_file
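A quick way to see it work, with a tiny made-up Input_file (plain string comparison is enough here because the fixed YYYY-MM-DD HH:MM format sorts lexically in chronological order):
printf '%s\n' '2012-08-19 10:00,before' '2012-08-21 09:15,inside' '2012-08-22 17:45,after' > Input_file
date1="2012-08-20 11:30"
date2="2012-08-22 16:00"
awk -v date_1="$date1" -v date_2="$date2" -F, '($1>date_1 && $1<date_2)' Input_file
# prints only: 2012-08-21 09:15,inside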

File name substitution using awk and for loop

Hi, I am trying to write dynamic filenames using variable substitution and I am unable to figure out what I am missing here.
for i in `cat justPid.csv`
do
awk -v var="$i" -F"," '{if ($1==var) {print $0 }}' uniqPid.csv > "$i"file.txt
done
I have also tried the one below and many other combinations, but it won't produce multiple files named after $i.
for i in `cat justPid.csv`
do
awk -v var="$i" -F"," '{if ($1==var) {print $0 }}' uniqPid.csv > ${i}_file.txt
done
Any suggestions?
Edit:
My original intent is to split a 27 GB file into manageable chunks based on PID (an identifier in the file) so that it can be loaded into RStudio for analysis. I am working on my laptop and not on a server, hence the need to break it into small files.
Also, I am using the ("new") Ubuntu bash shell on Windows.
The smaller test files I am working on look like what Jithin has posted. I will try out the suggestions and will update this post!
$cat justPid.csv
aaaa
bbbb
cccc
$cat uniqPid.csv
aaaa,1234567890
aaaa,aaaaaaaaaa
aaaa,bbbbbbbbbb
bbbb,1234567890
cccc,1234567890
dddd,cccccccccc
ffff,1234567890
I am not quite sure this is what you are looking for, but let me know.
Input files:
$cat justPid.csv
aaaa
bbbb
cccc
$cat uniqPid.csv
aaaa,1234567890
aaaa,aaaaaaaaaa
aaaa,bbbbbbbbbb
bbbb,1234567890
cccc,1234567890
dddd,cccccccccc
ffff,1234567890
Script using a for loop:
for i in $(cat justPid.csv)
do
awk -v var=${i} -F, '$1==var' uniqPid.csv > ${i}_file.txt
done
Script using a while loop:
while read -r i
do
awk -v var=${i} -F, '$1==var' uniqPid.csv > ${i}_file.txt
done < justPid.csv
Output:
$ cat aaaa_file.txt
aaaa,1234567890
aaaa,aaaaaaaaaa
aaaa,bbbbbbbbbb
$ cat bbbb_file.txt
bbbb,1234567890
$ cat cccc_file.txt
cccc,1234567890
Note: it is not advised to use a for loop to read lines; see Use a while loop and the read command and Don't Read Lines With For.
Without sample input/output it's just an untested guess, but I THINK all you need is either:
awk -F, '{print > ($1"_file.txt")}' uniqPid.csv
or maybe:
awk -F, 'NR==FNR{a[$1];next} $1 in a{print > ($1"_file.txt")}' justPid.csv uniqPid.csv
So far I don't see any reason for a loop at all. You might need to close the output files as you go but we can address that if/when you provide sample input/output and tell us whether or not you have GNU awk.
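If you do end up with many distinct values in $1 and you are not on GNU awk, you can hit the limit on simultaneously open files; here is a sketch of the close-as-you-go variant (untested, same hypothetical file names as above, and remove any old *_file.txt before re-running since it appends):
awk -F, '
NR==FNR { a[$1]; next }        # first file: remember the wanted PIDs
$1 in a {
    out = $1 "_file.txt"
    print >> out               # append, so reopening after close() does not clobber earlier lines
    close(out)                 # keep at most one output file open at a time
}
' justPid.csv uniqPid.csv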

Find string in col 1, print col 2 in awk

I'm on a Mac, and I want to find the field in a CSV file adjacent to a search string.
This is going to be a single file with a hard path; here's a sample of it:
84:a5:7e:6c:a6:b0, AP-ATC-151g84
84:a5:7e:6c:a6:b1, AP-A88-131g84
84:a5:7e:73:10:32, AP-AG7-133g56
84:a5:7e:73:10:30, AP-ADC-152g81
84:a5:7e:73:10:31, AP-D78-152e80
so if my search string is "84:a5:7e:73:10:32"
I want "AP-AG7-133g56" returned.
I had been working within an Applescript, but maybe a shell script will do.
I just need the proper syntax for opening the file and having awk search it. Again, I'm weak conceptually on how shell commands run, how they must be executed, etc.
This errors out, giving me "command not found":
set the_file to "/Users/Paw/Desktop/AP-Decoder 3.app/Contents/Resources/BSSIDtable.csv"
set the_val to "70:56:81:cb:a2:dc"
do shell script "'awk $1 ~ the_val {print $2} the_file'"
Thank you for coddling me...
This is relatively simple:
awk '$1 == "70:56:81:cb:a2:dc," {print "The answer is "$2}' 'BSSIDtable.csv'
(the "The answer is " text can be omitted if you only wish to see only the data, but this shows you how to get more user-friendly output if desired).
The comma is included since awk uses white space for separators so the comma becomes part of column 1.
If the thing you're looking for is in a shell variable, you can use -v to provide that to awk as an awk variable:
lookfor="70:56:81:cb:a2:dc,"
awk -v mac="$lookfor" '$1 == mac {print "The answer is "$2}' 'BSSIDtable.csv'
As an aside, your AppleScript solution is probably not working because the $1/$2 are being interpreted as shell variables rather than awk variables. If you insist on using AppleScript, you will have to figure out how to construct a shell command that quotes the awk command correctly.
My advice is to just use the shell directly, the number of people proficient in that almost certainly far outnumber those proficient in AppleScript :-)
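For reference, the command that the do shell script call ultimately has to hand to the shell would look something like this, with the comma folded into the search value as explained above:
the_val="70:56:81:cb:a2:dc"
the_file="/Users/Paw/Desktop/AP-Decoder 3.app/Contents/Resources/BSSIDtable.csv"
awk -v mac="${the_val}," '$1 == mac {print $2}' "$the_file"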
If sed is available (normally it is on a Mac, even if not tagged in the OP):
Simple, but reads the whole file:
sed -n 's/84:a5:7e:73:10:32,[[:blank:]]*//p' YourFile
Quit after the first occurrence (so on average about 50% faster on a huge file):
sed -n -e '/84:a5:7e:73:10:32,[[:blank:]]*/!b' -e 's///p;q' YourFile
With awk:
awk '/^84:a5:7e:73:10:32/ {print $2}' YourFile
# OR using a variable for batch interaction (note the trailing comma: awk's default FS leaves it as part of $1)
awk -v Src='84:a5:7e:73:10:32,' '$1 == Src {print $2}' YourFile
# OR assuming that case is unknown (IGNORECASE is GNU awk only)
awk -v Src='84:a5:7e:73:10:32,' 'BEGIN{IGNORECASE=1} $1 == Src {print $2}' YourFile
By default a bare regex is tested against $0; adding the ^ anchors it to the start of the line, which here amounts to testing the first field's content.
