Using a variable in awk command to name file - shell

I am splitting up a file with an awk command. I want to name the output files using a variable, but I have not had much luck. Here is the line:
awk '/STX/ {f="$tp"++i;} {print > f}' $tp.mixed
This just creates files literally named $tp1, $tp2, and so on, because awk does not expand the shell variable inside the single-quoted program.
I read the post "How to use shell variables in awk script" but was unable to figure out how to apply that to my question.

You can use this awk command, passing the variable in from the command line:
awk -v tp="$tp" '/STX/{close(f); f=tp (++i)} f{print > f} END{close(f)}' "$tp.mixed"
It is also important to close the files you open for output; calling close avoids keeping a large number of output files open at once, which can exhaust memory or hit the open-file limit in some awk implementations. The leading f in f{print > f} also skips any lines that appear before the first STX, while f is still unset.
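For example, with tp=part and a small made-up input, each STX line starts a new output file:
printf 'STX header one\nalpha\nbeta\nSTX header two\ngamma\n' > part.mixed
tp=part
awk -v tp="$tp" '/STX/{close(f); f=tp (++i)} f{print > f} END{close(f)}' "$tp.mixed"
# part1 now contains the first STX block (3 lines), part2 the second (2 lines)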

anubhava answered my question in the comments:
awk -v tp="$tp" '/STX/ {if (f) close(f); f = tp (++i)} {print > f} END{if (f) close(f)}' "$tp.mixed"
It works. Thank you, everyone, for your help.

Related

bash variables from nested for loops in awk

I want to simply use the two for loop variables in my awk code but I can't. Please help or guide me in the right direction.
for i in {30,60,100};
do
for j in {7,8};
do
awk -v x=$i -v y=$j '{if ($NF <=x) print $0}' S_$i.txt > S_$i_$j.txt;
done;
done
This was the error I received:
awk: fatal: cannot open file S_.txt for reading (No such file or directory)
S_$i_$j.txt is trying to expand a variable named $i_. Use S_${i}_${j}.txt instead, but also always quote your shell variables, so it should really be:
awk -v x="$i" -v y="$j" '{if ($NF <= x) print $0}' "S_${i}.txt" > "S_${i}_${j}.txt"
or more awkishly:
awk -v x="$i" -v y="$j" '$NF <= x' "S_${i}.txt" > "S_${i}_${j}.txt"
and note that you never use y inside your awk script so it could just be:
awk -v x="$i" '$NF <= x' "S_${i}.txt" > "S_${i}_${j}.txt"
but then it's not clear why you'd want to create 2 copies of your output with each inner loop.
Whatever you're doing, though, could almost certainly be done much faster with a single call to awk than calling it multiple times within shell loops!
The problem you asked about has absolutely nothing to do with "for loop variables in my awk code", by the way; it's all shell fundamentals.
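You can see how the shell parses the two forms with a quick echo test:
i=30; j=7
echo S_$i_$j.txt       # the shell looks up a variable named "i_", which is unset, so this prints S_7.txt
echo S_${i}_${j}.txt   # the braces delimit the names, so this prints S_30_7.txt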
Thanks for your quick response.
However, I tried the following and it worked:
for i in {30,60,100};
do
for j in {7,8};
do
awk -v x=$i -v y=$j '{if ($NF <=x) print $0}' "S_"$j".txt" > "S_"$j"_"$i".txt";
done;
done;
Additionally, I realized that S_30.txt didn't exist. So when I changed it to "S_"$j".txt" it worked fine. My bad on that one.
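For what it's worth, the whole nested loop can usually be collapsed into a single awk call, along these lines (just a sketch, assuming the inputs are S_7.txt and S_8.txt and the thresholds 30, 60 and 100 as above; it writes S_7_30.txt, S_7_60.txt, and so on):
awk -v xs='30 60 100' '
    FNR == 1 {                                   # once per input file
        n = split(xs, th, " ")                   # thresholds -> th[1..n]
        base = FILENAME
        sub(/\.txt$/, "", base)                  # S_7.txt -> S_7
    }
    {
        for (k = 1; k <= n; k++)
            if ($NF <= th[k])
                print > (base "_" th[k] ".txt")  # e.g. S_7_30.txt
    }
' S_7.txt S_8.txt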

Find string in col 1, print col 2 in awk

I'm on a Mac, and I want to find the field in a CSV file adjacent to a search string.
This is going to be a single file with a hard path; here's a sample of it:
84:a5:7e:6c:a6:b0, AP-ATC-151g84
84:a5:7e:6c:a6:b1, AP-A88-131g84
84:a5:7e:73:10:32, AP-AG7-133g56
84:a5:7e:73:10:30, AP-ADC-152g81
84:a5:7e:73:10:31, AP-D78-152e80
so if my search string is "84:a5:7e:73:10:32"
I want to get returned "AP-AG7-133g56"
I had been working within an AppleScript, but maybe a shell script will do.
I just need the proper syntax for opening the file and having awk search it. Again, I'm weak conceptually on how shell commands run, how they must be executed, etc.
This errors out with "command not found":
set the_file to "/Users/Paw/Desktop/AP-Decoder 3.app/Contents/Resources/BSSIDtable.csv"
set the_val to "70:56:81:cb:a2:dc"
do shell script "'awk $1 ~ the_val {print $2} the_file'"
Thank you for coddling me...
This is relatively simple:
awk '$1 == "70:56:81:cb:a2:dc," {print "The answer is "$2}' 'BSSIDtable.csv'
(the "The answer is " text can be omitted if you only wish to see only the data, but this shows you how to get more user-friendly output if desired).
The comma is included since awk uses whitespace as the default field separator, so the comma becomes part of column 1.
If the thing you're looking for is in a shell variable, you can use -v to provide that to awk as an awk variable:
lookfor="70:56:81:cb:a2:dc,"
awk -v mac="$lookfor" '$1 == mac {print "The answer is "$2}' 'BSSIDtable.csv'
As an aside, your AppleScript solution is probably not working because the $1/$2 are being interpreted as shell variables rather than awk variables. If you insist on using AppleScript, you will have to figure out how to construct a shell command that quotes the awk program correctly.
My advice is to just use the shell directly; the number of people proficient in that almost certainly far outnumbers those proficient in AppleScript :-)
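Putting the pieces together as a small standalone shell script (the path and MAC are the ones from the question; treat the rest as a sketch):
#!/bin/sh
# Look up the second CSV field for a given MAC address.
file="/Users/Paw/Desktop/AP-Decoder 3.app/Contents/Resources/BSSIDtable.csv"
mac="84:a5:7e:73:10:32"
awk -v mac="$mac," '$1 == mac {print $2}' "$file"
# With the sample data from the question this prints: AP-AG7-133g56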
If sed is available (it normally is on a Mac, even if not tagged in the OP):
Simple, but it reads the whole file:
sed -n 's/84:a5:7e:73:10:32,[[:blank:]]*//p' YourFile
This one quits after the first occurrence (so on average about 50% faster on a huge file):
sed -n -e '/84:a5:7e:73:10:32,[[:blank:]]*/!b' -e 's///p;q' YourFile
awk
awk '/^84:a5:7e:73:10:32/ {print $2}' YourFile
# OR using a variable for batch interaction (the trailing comma in the data is part of $1)
awk -v Src='84:a5:7e:73:10:32' '$1 == (Src ",") {print $2}' YourFile
# OR if the case is unknown (IGNORECASE is a GNU awk extension)
awk -v Src='84:a5:7e:73:10:32' 'BEGIN{IGNORECASE=1} $1 == (Src ",") {print $2}' YourFile
By default, a bare regex is tested against $0; adding the ^ anchor effectively restricts the match to the start of the line, i.e. the content of the first field here.
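A slightly stricter variant binds the regex to the first field itself instead of the whole line; just a variation on the above:
awk -v Src='84:a5:7e:73:10:32' '$1 ~ "^" Src {print $2}' YourFile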

How do I write an awk print command in a loop?

I would like to write a loop creating various output files with the first column of each input file, respectively.
So I wrote
for i in $(\ls -d /home/*paired.isoforms.results)
do
awk -F"\t" '{print $1}' $i > $i.transcript_ids.txt
done
As an example if there were 5 files in the home directory named
A_paired.isoforms.results
B_paired.isoforms.results
C_paired.isoforms.results
D_paired.isoforms.results
E_paired.isoforms.results
I would like to print the first column of each of these files into a separate output file, i.e. I would like to have 5 output files called
A.transcript_ids.txt
B.transcript_ids.txt
C.transcript_ids.txt
D.transcript_ids.txt
E.transcript_ids.txt
or any other name as long as it is 5 different names and I can still link them back to the original files.
I understand, that there is a problem with the double usage of $ in both the awk and the loop command, but I don't know how to change that.
Is it possible to write a command like this in a loop?
This should do the job:
for file in /home/*paired.isoforms.results
do
base=${file##*/}      # strip the directory part
base=${base%%_*}      # keep everything before the first "_" (e.g. A)
awk -F"\t" '{print $1}' "$file" > "$base.transcript_ids.txt"
done
I assume that there can be spaces in the first field since you set the delimiter explicitly to tab. This runs awk once per file. There are ways to do it running awk once for all files, but I'm not convinced the benefit is significant. You could consider using cut instead of awk '{print $1}', too. Note that using ls as you did is less satisfactory than using globbing directly; it runs foul of file names with oddball characters (spaces, tabs, etc) in the name.
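For reference, the cut equivalent of the awk line inside the loop would be (cut's default delimiter is already the tab):
cut -f1 "$file" > "$base.transcript_ids.txt"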
You can do that entirely in awk:
awk -F"\t" '{split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; print $1 > out}' *_paired.isoforms.results
If your input files don't have names as indicated in the question, you'd have to split on something else (as well as use a different pattern match for the input files).
My original answer is actually doing extra name resolution every time something is printed. Here's a version that only updates the output filename when FILENAME changes:
awk -F"\t" 'FILENAME!=lf{split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; lf=FILENAME} {print $1 > out}' *_paired.isoforms.results
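If you are not running gawk and there are many input files, you may also want to close each output file when moving on to the next, as in the first answer above; a minor variation:
awk -F"\t" 'FILENAME!=lf{if (out) close(out); split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; lf=FILENAME} {print $1 > out}' *_paired.isoforms.results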

Self-contained awk script: Saving to file, calling file

For a lab, I wrote a shell script that used awk to do some stuff. Rereading the lab's directions, it seems that I was supposed to write a self-contained awk script. I'm working on translating my bash script into awk, and I'm having a problem right now:
I want to save the output of an awk command to a new file, and then I want to use that output as input for another awk command.
In my bash script, I have this:
awk '/Blocked SPAM/' maillog > spamlog
cat spamlog | awk '{print $0}' RS=' '
It takes all the lines from maillog that contain the string "Blocked SPAM" and saves this to a new file titled spamlog. Then it opens spamlog and replaces every space character ' ' with a new line.
For my awk script, maillog is the file that is passed to the script from shell. My attempt at writing analogous code:
/Blocked SPAM/ > spamlog
-f spamlog {print $0} RS=' '
I don't really know what I'm doing with my awk script since I'm having trouble finding useful resources for self-contained awk scripts.
awk '/Blocked SPAM/{ print > "spamlog"; gsub( " ","\n"); print }' maillog
Personally, I prefer to invoke that directly from a shell script, but you can easily make it an awk script by writing:
#!/usr/bin/awk -f
/Blocked SPAM/{ print > "spamlog"; gsub( " ","\n"); print }
Invoke that script with 'maillog' as an argument.
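For example (the script file name here is only a placeholder):
chmod +x spamsplit.awk   # "spamsplit.awk" holds the two lines above
./spamsplit.awk maillog  # matching lines are copied to spamlog and also echoed one word per line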

AWK: redirecting script output from script to another file with dynamic name

I know I can redirect awk's print output to another file from within a script, like this:
awk '{print $0 >> "anotherfile" }' 2procfile
(I know that's a dummy example, but it's just an example...)
But what I need is to redirect the output to another file, which has a dynamic name, like this:
awk -v MYVAR="somedinamicdata" '{print $0 >> "MYVAR-SomeStaticText" }' 2procfile
And the output should be redirected to somedinamicdata-SomeStaticText.
I know I can do it via:
awk '{print $0 }' 2procfile >> "$MYVAR-somedinamicdata"
But the problem is that it's a bigger awk script, and I have to output to several files depending on certain conditions (and this awk script is called from another bash script, which passes some dynamic variables via the -v switch... and so on).
Is it possible anyhow?
Thanks in advance.
I think
awk -v MYVAR="somedinamicdata" '{print $0 >> (MYVAR "-SomeStaticText") }' 2procfile
should do it. String concatenation in awk is done simply by writing one string expression after another.
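Since you mention writing to several files depending on conditions, the same concatenation works there too; here is a sketch with a made-up condition on $3 and made-up file suffixes:
awk -v MYVAR="$MYVAR" '
    $3 == "ERR"  { print > (MYVAR "-errors.txt");   next }   # hypothetical condition
    $3 == "WARN" { print > (MYVAR "-warnings.txt"); next }
                 { print > (MYVAR "-other.txt") }
' 2procfile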
