awk, printf, and an equation in a variable - bash

I am dynamically generating an equation for awk. Here's an example:
# normally eq is dynamically generated
eq='$1+$2'
awk -v eq="${eq}" '{printf "%f\n", eq}' myTwoColumns.txt
Output
0
0
.
.
.
0
If it helps, when I add a "$" in front of "eq" in the awk statement:
awk -v eq="${eq}" '{printf "%f\n", $eq}' myTwoColumns.txt
the output is the first column. How do I get this to behave like I want it to? Thanks.
Edit: Here's what worked for me
eq='$1+$2'
awk "{printf \"%f\n\", $eq}" myTwoColumns.txt

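The awk program is in single quotes, so the shell does not touch it, and the eq passed with -v reaches awk as a plain string. awk never evaluates a string as code, so "$1+$2" is just text, which formats as 0 under %f.
You will probably need to wrap your command in double quotes so that the shell expands the variable into the program text. For example, for a file like this: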
$ cat file
1 2
1 3
2 4
$ eq='$1+$2'
$ awk "{ print $eq }" file
3
4
6
The shell expands the variable first, and awk then receives the resulting text as its program.
As Ed Morton suggested in the comments, you can also keep the single quotes and splice the variable in:
awk '{ print '"$eq"' }' file
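To see the difference side by side, here is a minimal sketch. The sample data is a hypothetical stand-in for myTwoColumns.txt, and the names as_value and as_code are made up for the illustration. Bear in mind that splicing shell text into the program means whatever is in eq runs as awk code, so this is only reasonable when you control that string.

```shell
# Hypothetical sample data standing in for myTwoColumns.txt.
data=$(mktemp)
printf '1 2\n1 3\n2 4\n' > "$data"

eq='$1+$2'

# Passed with -v, awk sees the literal text "$1+$2"; as a number it is 0.
as_value=$(awk -v eq="$eq" '{ printf "%d\n", eq }' "$data")

# Spliced into the program text, the same string is compiled as an expression.
as_code=$(awk '{ printf "%d\n", '"$eq"' }' "$data")

printf 'as value:\n%s\nas code:\n%s\n' "$as_value" "$as_code"
rm -f "$data"
```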

Related

Prepend text to specific line numbers with variables

I have spent hours trying to solve this. There are a bunch of answers as to how to prepend to all lines or specific lines but not with a variable text and a variable number.
while [ $FirstVariable -lt $NextVariable ]; do
#sed -i "$FirstVariables/.*/$FirstVariableText/" "$PWD/Inprocess/$InprocessFile"
cat "$PWD/Inprocess/$InprocessFile" | awk 'NR==${FirstVariable}{print "$FirstVariableText"}1' > "$PWD/Inprocess/Temp$InprocessFile"
FirstVariable=$[$FirstVariable+1]
done
Essentially I am looking for a particular string delimiter, figuring out where the next one is, and prepending the first result to the lines in between. Note that I have already figured out the logic; I am just having issues prepending the variables to the line.
Example:
This >
Line1:
1
2
3
Line2:
1
2
3
Would turn into >
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
You can do all of that with the awk one-liner below.
It assumes that your pattern lines start with Line.
> awk '{if ($1 ~ /Line/ ){var=$1;print $0;}else{ if ($1 !="")print var $1}}' $PWD/Inprocess/$InprocessFile
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
Here is how the above script works:
If the first field of a record matches Line, that field is stored in the awk variable var and the record is printed unchanged. For every following record that is not empty, var is prepended to the first field and the result is printed, which produces the desired output.
If you need to pass the variables dynamically from shell to awk you can use -v option. Like below:
awk -v var1="$FirstVariable" -v var2="$FirstVariableText" 'NR==var1{print var2}1' "$PWD/Inprocess/$InprocessFile" > "$PWD/Inprocess/Temp$InprocessFile"
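As a minimal sketch of the -v approach applied to the actual goal (prepending text to one specific line rather than printing it on a line of its own), something like the following should work. The names line_no, prefix and n, and the three-line sample file, are hypothetical and only there for illustration.

```shell
# Hypothetical sample input.
input=$(mktemp)
printf 'a\nb\nc\n' > "$input"

line_no=2        # line to modify (made-up value)
prefix='Line1:'  # text to prepend (made-up value)

# -v copies the shell values into awk variables; single quotes keep the
# awk program itself away from the shell.
result=$(awk -v n="$line_no" -v prefix="$prefix" 'NR == n { $0 = prefix $0 } 1' "$input")

printf '%s\n' "$result"
rm -f "$input"
```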
The way you addressed the problem is by parsing everything both with bash and awk to process the file. You make use of bash to extract a line, and then use awk to manipulate this one line. The whole thing can actually be done with a single awk script:
awk '/^Line/{str=$1; print; next}{print (NF ? str $0 : "")}' inputfile > outputfile
or
awk 'BEGIN{RS="";ORS="\n\n";FS=OFS="\n"}{gsub(FS,OFS $1)}1' inputfile > outputfile
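For reference, here is the first of these single-awk scripts run against the sample input from the question. It is a sketch that assumes, as in the example, that every header line starts with "Line"; the variable name label is just a more descriptive stand-in for str.

```shell
# Sample input copied from the question.
input=$(mktemp)
printf 'Line1:\n1\n2\n3\nLine2:\n1\n2\n3\n' > "$input"

# Remember the most recent header; prefix every non-empty line that follows.
result=$(awk '/^Line/ { label = $1; print; next }
              { print (NF ? label $0 : "") }' "$input")

printf '%s\n' "$result"
rm -f "$input"
```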

cut string in a specific column in bash

How can I cut the leading zeros in the third field so it will only be 6 characters?
xxx,aaa,00000000cc
rrr,ttt,0000000yhh
desired output
xxx,aaa,0000cc
rrr,ttt,000yhh
or here's a solution using awk
echo " xxx,aaa,00000000cc
rrr,ttt,0000000yhh"|awk -F, -v OFS=, '{sub(/^0000/, "", $3)}1'
output
xxx,aaa,0000cc
rrr,ttt,000yhh
awk uses -F (or FS, the field separator) to split the input, and you must set OFS (the output field separator) so that the modified line is rejoined with commas.
sub(/srchtarget/, "replacementstring", stringToFix) uses a regular expression to look for four 0s at the front (^) of the third field ($3).
The 1 is a shorthand for the print statement. A longhand version of the script would be
echo " xxx,aaa,00000000cc
rrr,ttt,0000000yhh"|awk -F, -v OFS=, '{sub(/^0000/, "", $3);print}'
# ---------------------------------------------------------^^^^^^
It's all related to awk's /pattern/{action} idiom.
IHTH
If you can assume there are always three fields and you want to strip off the first four zeros in the third field you could use a monstrosity like this:
$ cat data
xxx,0000aaa,00000000cc
rrr,0000ttt,0000000yhh
$ cat data | sed 's/\([^,]\+\),\([^,]\+\),0000\([^,]\+\)/\1,\2,\3/'
xxx,0000aaa,0000cc
rrr,0000ttt,000yhh
Another more flexible solution if you don't mind piping into Python:
cat data | python -c '
import sys
for line in sys.stdin:
    print(",".join([f[4:] if i == 2 else f for i, f in enumerate(line.strip().split(","))]))
'
This says "remove the first four characters of the third field but leave all other fields unchanged".
Using awk's substr should also work:
awk -F, -v OFS=, '{$3=substr($3,5,6)}1' file
xxx,aaa,0000cc
rrr,ttt,000yhh
It just takes 6 characters starting at position 5 of field 3 and assigns them back to field 3.
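If the third field is not always ten characters long, a variant that keeps the last six characters regardless of length may be safer. This is a sketch under the assumption that "only 6 characters" means the rightmost six; fields of six characters or fewer are left alone.

```shell
# Keep only the rightmost 6 characters of field 3, whatever its length.
result=$(printf 'xxx,aaa,00000000cc\nrrr,ttt,0000000yhh\n' |
  awk -F, -v OFS=, '{ n = length($3); if (n > 6) $3 = substr($3, n - 5) } 1')

printf '%s\n' "$result"
```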

Print a comma except on the last line in Awk

I have the following script
awk '{printf "%s", $1"-"$2", "}' $a >> positions;
where $a stores the name of the file. I am actually writing multiple column values into one row. However, I would like to print a comma only if I am not on the last line.
Single pass approach:
cat "$a" | # look, I can use this in a pipeline!
awk 'NR > 1 { printf(", ") } { printf("%s-%s", $1, $2) }'
Note that I've also simplified the string formatting.
Enjoy this one:
awk '{printf t $1"-"$2} {t=", "}' $a >> positions
Yes, it looks a bit tricky at first sight, so I'll explain. First of all, let's change printf to print for clarity:
awk '{print t $1"-"$2} {t=", "}' file
and have a look at what it does for a file with this simple content:
1 A
2 B
3 C
4 D
so it will produce the following:
1-A
, 2-B
, 3-C
, 4-D
The trick is the leading t variable, which is empty at the beginning. It is only set by {t=...} after it has already been printed by {print t ...}, so it is empty for the first record and ", " for every later one. As awk keeps iterating, we get the desired sequence.
I would do it by finding the number of lines before running the script, e.g. with coreutils and bash:
awk -v nlines=$(wc -l < $a) '{printf "%s", $1"-"$2} NR != nlines { printf ", " }' $a >>positions
If your file only has 2 columns, the following coreutils alternative also works. Example data:
paste <(seq 5) <(seq 5 -1 1) | tee testfile
Output:
1 5
2 4
3 3
4 2
5 1
Now, after replacing tabs with newlines, paste easily assembles the data into the desired format:
<testfile tr '\t' '\n' | paste -sd-,
Output:
1-5,2-4,3-3,4-2,5-1
You might think that awk's ORS and OFS would be a reasonable way to handle this:
$ awk '{print $1,$2}' OFS="-" ORS=", " input.txt
But this leaves a trailing ORS: print appends ORS after every record, including the last one, so the output ends with ", " instead of a newline. You can work around this with a bit of hackery, but the resultant complexity eliminates the elegance of the one-liner.
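A quick way to see the trailing separator, next to a printf-based variant that avoids it, is sketched here. The names with_ors, with_sep and sep are made up for this illustration, and the brackets in the final printf are only there to make the trailing ", " visible.

```shell
# ORS is emitted after every record, so the result ends in ", ".
with_ors=$(printf '3 2\n5 4\n1 8\n' |
  awk -v OFS='-' -v ORS=', ' '{ print $1, $2 }')

# Emitting the separator *before* each record (empty the first time) avoids that.
with_sep=$(printf '3 2\n5 4\n1 8\n' |
  awk '{ printf "%s%s-%s", sep, $1, $2; sep = ", " } END { print "" }')

printf '[%s]\n[%s]\n' "$with_ors" "$with_sep"
```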
So here's my take on this. Since you say you're "writing multiple column values", it's possible that mucking with ORS and OFS would cause problems. So we can achieve the desired output entirely with formatting.
$ cat input.txt
3 2
5 4
1 8
$ awk '{printf "%s%d-%d",t,$1,$2; t=", "} END{print ""}' input.txt
3-2, 5-4, 1-8
This is similar to Michael's and rook's single-pass approaches, but it uses a single printf and correctly uses the format string for formatting.
This will likely perform negligibly better than Michael's solution because an assignment should take less CPU than a test, and noticeably better than any of the multi-pass solutions because the file only needs to be read once.
Here's a better way, without resorting to coreutils:
awk 'FNR==NR { c++; next } { ORS = (FNR==c ? "\n" : ", "); print $1, $2 }' OFS="-" file file
Or buffer the pairs in an array and join them in the END block:
awk '{a[NR]=$1"-"$2} END{for(i=1;i<=NR;i++) printf "%s%s", a[i], (i<NR ? ", " : "\n")}' "$a" > positions

Awk changes tabs to spaces

Data:
Sandnes<space>gecom<tab>Hansen<tab>Ola<space>Timoteivn<space>10
I am substituting a specific column (ex:2th column) value with a variable in a file. So I am using the command:
varz="zipval"
awk -v VAR=$varz '{$2=VAR}1' OutputFile.log
Awk replaces all the tabs with spaces after processing, so I used OFS="\t".
But that turns every space into a tab:
Sandnes<tab>gecom<tab>Hansen<tab>zipval<tab>Timoteivn<tab>10
How to handle it.
Thanks
Your problem is that awk splits your input on FS=[ \t]+ and then reassembles it with OFS=' ' or OFS='\t'. I don't think you can get around doing an extra split. Something like this works:
<data awk -v VAR="$varz" 'BEGIN { FS=OFS="\t" } { split($1, a, " +"); $1 = a[1]" "VAR } 1'
Output:
Sandnes zipval^IHansen^IOla Timoteivn 10
Use this script to pass the column number to your awk script:
varz="zipval"
awk -v VAR="$varz" -v N=6 '{sub($N, VAR)}1' OutputFile.log
The below is working fine at my place:
> setenv var "hi"
> echo "1 2 3 4 5 6 7" | awk -v var1=$var '{$6=var1}1'
1 2 3 4 5 hi 7
>
You didn't post your desired output or even tell us which specific text you wanted replaced ("2th field" could mean several things) so this is a guess, but assuming your input file is tab-separated fields, you just need to quote your shell variable and assign FS as well as OFS:
varz="zipval"
awk -v VAR="$varz" 'BEGIN{FS=OFS="\t"} {$2=VAR} 1' OutputFile.log
I'd also recommend you don't use all-upper case for your variable name since that's used to identify awk builtin variables (NR, NF, etc.).
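A minimal sketch of that suggestion, assuming the record really is tab-separated with spaces inside some fields (the sample line is adapted from the question, and the lower-case awk variable name var is my own choice):

```shell
varz='zipval'

# One tab-separated record; fields 1 and 3 contain spaces.
line=$(printf 'Sandnes gecom\tHansen\tOla Timoteivn 10')

# With FS and OFS both set to a tab, only tabs split fields, and the record
# is rebuilt with tabs, so the spaces inside fields survive.
result=$(printf '%s\n' "$line" |
  awk -v var="$varz" 'BEGIN { FS = OFS = "\t" } { $2 = var } 1')

printf '%s\n' "$result"
```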

Deleting the first two lines of a file using BASH or awk or sed or whatever

I'm trying to delete the first two lines of a file by just not printing it to another file. I'm not looking for something fancy. Here's my (failed) attempt at awk:
awk '{ (NR > 2) {print} }' myfile
That throws out the following error:
awk: { NR > 2 {print} }
awk: ^ syntax error
Example:
contents of 'myfile':
blah
blahsdfsj
1
2
3
4
What I want the result to be:
1
2
3
4
Use tail:
tail -n+3 file
from the man page:
-n, --lines=K
output the last K lines, instead of the last 10; or use -n +K
to output lines starting with the Kth
How about:
tail +3 file
OR
awk 'NR>2' file
OR
sed '1,2d' file
You're nearly there. Try this instead:
awk 'NR > 2 { print }' myfile
awk is rule based, and the rule appears bare (i.e., without braces) before the block it would execute if it passes.
Also as Jaypal has pointed out, in awk if all you want to do is print the line that matches the rules you can even omit the action, thus simplifying the command to:
awk 'NR > 2' myfile
awk is based on pattern{action} statements. In your case, the pattern is NR>2 and the action you want to perform is print. This action is also the default action of awk.
So even though
awk 'NR>2{print}' filename
would work fine, you can shorten it to
awk 'NR>2' filename
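As a quick sanity check, the awk, sed and tail variants suggested in this thread can be compared on the sample file from the question. This is only a sketch; the variable names are made up, and tail is called with the -n +3 spelling.

```shell
# Sample file from the question.
f=$(mktemp)
printf 'blah\nblahsdfsj\n1\n2\n3\n4\n' > "$f"

via_awk=$(awk 'NR > 2' "$f")    # pattern only; the default action is print
via_sed=$(sed '1,2d' "$f")      # delete lines 1 through 2
via_tail=$(tail -n +3 "$f")     # start output at line 3

printf '%s\n' "$via_awk"
rm -f "$f"
```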
