I have the following script and am trying to run it:
BEGIN {
start = 0
}
{
if (match($0, "<WorkflowProcess ")) {
startTag++
}
if ((startTag < 2) || (endTag == startTag)) {
print
}
if (match($0, "</WorkflowProcess>")) {
endTag++
}
}
However I always get this error:
awk: syntax error near line 6
awk: illegal statement near line 6
awk: syntax error near line 10
awk: bailing out near line 10
Any thoughts? I have tried converting it with dos2unix and also with tr -d '\r', but the issue is still the same. The input parameter is, in my opinion, correct: I am passing a full path with file name and extension (/export/home/test/file.txt). All files have mode 0777.
How do you try to run that program?
If you use awk "... all that program ...", then the shell will expand $0 to its own path, which probably has a leading /... Although, now that I look at it, that should fail earlier with the internal ". Still, it would be useful to see the precise command line.
By the way, why are you calling match? It would be much more idiomatic to write:
awk '
/<WorkflowProcess / { ++startTag }
startTag < 2 || startTag == endTag { print }
/<\/WorkflowProcess>/ { ++endTag }
'
which avoids the explicit use of $0 altogether.
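A quick way to check the pattern-only version is to run it on a small inline sample. This is only a sketch: the XML-like sample lines and the variable name result are made up for illustration, and the slash in the closing tag is written as \/ so that awk does not take it as the end of the regex.

```shell
#!/bin/sh
# Hypothetical sample input: two WorkflowProcess elements inside a wrapper.
sample='<Package>
<WorkflowProcess Id="1">
<Activity/>
</WorkflowProcess>
<WorkflowProcess Id="2">
<Activity/>
</WorkflowProcess>
</Package>'

# Keep everything except the second (and any later) WorkflowProcess element.
result=$(printf '%s\n' "$sample" | awk '
/<WorkflowProcess /                 { ++startTag }
startTag < 2 || startTag == endTag  { print }
/<\/WorkflowProcess>/               { ++endTag }
')

printf '%s\n' "$result"
```

With this sample, the first element and the wrapper lines are printed and the second element is dropped.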
On SunOS, nawk is often the better choice:
nawk -f script.awk /export/home/test/file.txt
Just an idea, in the BEGIN rule you initialize start, not startTag, but then you increment startTag in the next rule. I know, this works in GNU awk and all, but maybe you should try initializing startTag.
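For what it's worth, here is a small check of how an awk treats a variable that was never assigned. It is a sketch for a POSIX-style awk (the very old default awk on SunOS may behave differently, which is why trying nawk is worthwhile); the variable name result is just for illustration.

```shell
#!/bin/sh
# startTag is never assigned before it is compared or incremented.
result=$(awk 'BEGIN {
    if (startTag == 0 && startTag == "")
        print "unset: equals both 0 and the empty string"
    startTag++
    print "after ++: " startTag
}')

printf '%s\n' "$result"
```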
Related
I'm studying awk pretty fiercely to write a git diffn implementation which will show line numbers for git diff, and I want confirmation on whether or not this Wikipedia page on awk is wrong [Update: I've now fixed this part of that Wikipedia page, but this is what it used to say]:
(pattern)
{
print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2)
}
Output may be sent to a file:
(pattern)
{
print "expression" > "file name"
}
or through a pipe:
(pattern)
{
print "expression" | "command"
}
Notice (pattern) is above the opening brace. I'm pretty sure this is wrong but need to know for certain before editing the page. What I think that page should look like is this:
/regex_pattern/ {
print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2)
}
Output may be sent to a file:
/regex_pattern/ {
print "expression" > "file name"
}
or through a pipe:
/regex_pattern/ {
print "expression" | "command"
}
Here's a test to "prove" it. I'm on Linux Ubuntu 18.04.
1. test_awk.sh
gawk \
'
BEGIN
{
print "START OF AWK PROGRAM"
}
'
Test and error output:
$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
gawk: cmd. line:3: BEGIN blocks must have an action part
But with this:
2. test_awk.sh
gawk \
'
BEGIN {
print "START OF AWK PROGRAM"
}
'
It works fine!:
$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
START OF AWK PROGRAM
Another example (fails to provide expected output):
3. test_awk.sh
gawk \
'
/hey/
{
print $0
}
'
Erroneous output:
$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
hey1
hey1
hello
hey2
hey2
But like this:
4. test_awk.sh
gawk \
'
/hey/ {
print $0
}
'
It works as expected:
$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
hey1
hey2
Updates: after solving this problem, I just added these sections below:
Learning material:
In the process of working on this problem, I just spent several hours and created these examples: https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world/tree/master/awk. These examples, comments, and links would prove useful to anyone getting started learning awk/gawk.
Related:
git diff with line numbers and proper code alignment/indentation
"BEGIN blocks must have an action part" error in awk script
The whole point of me learning awk at all in the first place was to write git diffn. I just got it done: Git diff with line numbers (Git log with line numbers)
I agree with you that the Wikipedia page is wrong. It's right in the awk manual:
A pattern-action statement has the form
pattern { action }
A missing { action } means print the line; a missing pattern always matches. Pattern-action statements are separated by newlines or semicolons.
...
Statements are terminated by semicolons, newlines or right braces.
This is the man page for the default awk on my Mac. The same information is in the GNU awk manual, it's just buried a little deeper. And the POSIX specification of awk states
An awk program is composed of pairs of the form:
pattern { action }
Either the pattern or the action (including the enclosing brace characters) can be omitted.
A missing pattern shall match any record of input, and a missing action shall be equivalent to:
{ print }
You can see in your examples that instead of semicolons at the end of statements you can separate them with newlines. When you have
/regex/
{ ...
}
it's equivalent to /regex/; {...}, which is the same as /regex/ {print $0} {...}, exactly the behavior your test showed.
Note that BEGIN and END are special patterns that need an explicit action, since a default {print $0} makes no sense for BEGIN (no record has been read yet). That's why the opening curly brace has to be on the same line. It may look like mere convention, but it's all consistent.
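The equivalence can be checked directly. The sketch below feeds the same three lines from the question to three programs; the shell variable names are only for illustration.

```shell
#!/bin/sh
input='hey1
hello
hey2'

# Pattern and brace on separate lines: two independent rules.
split_rules=$(printf '%s\n' "$input" | awk '
/hey/
{ print $0 }
')

# The same two rules written out explicitly.
explicit_rules=$(printf '%s\n' "$input" | awk '/hey/ { print $0 } { print $0 }')

# Pattern and brace on the same line: a single rule.
single_rule=$(printf '%s\n' "$input" | awk '/hey/ { print $0 }')

printf 'split:\n%s\nexplicit:\n%s\nsingle:\n%s\n' \
    "$split_rules" "$explicit_rules" "$single_rule"
```

The first two programs print matching lines twice and every other line once; only the third prints just the matching lines.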
I am trying to collate a series of .csv log files that are named by date (e.g., 2019-02-24.csv). There are a bunch of them, so I'm trying to script the process. I've crafted an AWK script that combines individual files:
awk ' FNR==1 { while (/"_time",PIN,FULLNAME,OFFICE,Acronym,Name/) getline; } 1 { print } ' 2019-01-01.csv >> usage_history.csv
But I am failing when I try to string the AWK commands together with a control loop in BASH:
for i in {01..28}; do echo "awk ' FNR==1 { while (/\"_time\",PIN,FULLNAME,OFFCODE,Acronym,Name/) getline; } 1 { print } ' 2019-01-$i.csv >> user_history.csv"; done
When I run this, it prints out the correct commands to the command line, but the awk scripts are not executed (they only get printed). If I run it without echo, I get errors telling me that the file doesn't exist; though all files are present:
bash: awk ' FNR==1 { while (/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/) getline; } 1 { print } ' 2019-01-01.csv >> user_history.csv: No such file or directory
What am I missing in my loop?
Here is a condensed sample of the command and the error messages:
$ for i in {01..02}; do "awk ' FNR==1 { while (/\"_time\",PIN,FULLNAME,OFFCODE,Acronym,Name/) getline; } 1 { print } ' 2019-01-$i.csv >> user_history.csv"; done
bash: awk ' FNR==1 { while (/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/) getline; } 1 { print } ' 2019-01-01.csv >> user_history.csv: No such file or directory
bash: awk ' FNR==1 { while (/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/) getline; } 1 { print } ' 2019-01-02.csv >> user_history.csv: No such file or directory
Could you please try the following:
awk '!/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/' 2019-01-[0-9]*.csv >> user_history.csv
Here are the reasons for this approach:
1- Using a for loop and calling awk once per file is overkill. awk can read multiple files in a single invocation, so we should use that.
2- As for the getline part of your code: to skip lines containing a string, simply negate the match with !/string_to_be_skipped/, which selects only the lines that do NOT contain that string.
3- To pass multiple files to a single awk command I used 2019-01-[0-9]*.csv. You have NOT said whether a file is created every day, and if we list the files loop-style and one of them is NOT present, we get an error. For example, say I run the following awk command after intentionally removing the file named 2019-01-02.csv:
awk '........' 2019-01-{01..29}.csv
awk: cannot open 2019-01-02.csv (No such file or directory)
To avoid this kind of situation I used 2019-01-[0-9]*.csv, which the shell expands only to files that actually exist (names with a digit right after 2019-01-), so awk never complains that some file is missing.
Try this:
for i in {01..28}; do awk '!/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/' 2019-01-$i.csv >>user_history.csv;done
The command after do should not be quoted; when it is quoted, the whole string is treated as the name of a single command.
What your script does is essentially skip the title lines.
The {print} after 1 is unnecessary: a lone 1 implies {print}. The 1 just provides a true condition.
-- When there's only an expression and no block, the block defaults to {print}.
-- And a bare regexp is equivalent to $0 ~ /regex/, which I negated here.
If there's no other command inside the loop, you can simplify the loop with one awk command:
awk '!/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/' 2019-01-{01..28}.csv >>user_history.csv
But this one will throw an error and stop executing when one of the files does not exist.
Another way is:
awk '!/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/' 2019-01-[0-3][0-9].csv >>user_history.csv
This one relies on the shell matching existing filenames instead of looping over them.
It won't stop executing or throw an error, so if a file is missing you won't know. It will also match extra files if they exist.
For example, it will read 2019-01-34.csv if it exists.
So if you want the warnings (warnings won't affect the results) but don't want the commands to stop, use the first for loop.
Pitfalls:
[0-3][1-9] won't match 10,20 and 30, but will match 32 to 39.
[0-9]* will also match longer names, and the files are read in string order, not numeric order (e.g. 20 to 29 come before 3).
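If you do want a loop that reports missing days without stopping, one option is an explicit existence test. The following is a sketch only: it runs in a scratch directory created with mktemp, and the two sample rows and the warning text are invented for the demonstration.

```shell
#!/bin/sh
# Work in a scratch directory so no real logs are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

header='"_time",PIN,FULLNAME,OFFCODE,Acronym,Name'
# Hypothetical daily files; 2019-01-02.csv is deliberately missing.
printf '%s\n%s\n' "$header" '"t1",1,Alice,A1,AA,Alpha' > 2019-01-01.csv
printf '%s\n%s\n' "$header" '"t3",3,Carol,C3,CC,Gamma' > 2019-01-03.csv

for day in 01 02 03; do
    f="2019-01-$day.csv"
    if [ ! -f "$f" ]; then
        echo "warning: $f is missing, skipping" >&2
        continue
    fi
    # The command itself is NOT wrapped in quotes; only the awk program is.
    awk '!/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/' "$f" >> user_history.csv
done

cat user_history.csv
```

The header lines are dropped, the two data rows end up in user_history.csv, and the missing day only produces a warning on stderr.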
Thanks to @Tiw and @RavinderSingh13 for their guidance. Here is the final awk script that is working well for my case where I have daily files from multiple days, months, and years (only 2018 and 2019 in this case):
awk '!/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/' 201[8-9]-[0-1][0-9]-[0-3][0-9].csv >> user_history.csv
I need to change the 'data_directory' attribute in the postgresql.conf file permanently. I tried to get this done using awk command as shown below. (I have commented the existing attribute to make room for the new entry below.)
sudo awk '/data_directory = '/var/lib/postgresql/10/main'/ { print; print "data_directory = 'someDir/postgresql/10/main'"; next }1' /home/ubuntu/postgresql.conf
But I receive the syntax error right below the first print as shown below.
awk: cmd. line:1: /data_directory = /var/lib/postgresql/10/main/ { print; print "data_directory = AquilaData/postgresql/10/main"; next }1
awk: cmd. line:1: ^ syntax error
Can't figure out why. Any help is appreciated. Thank you.
Edit: I need to retain the single quotes around directory path.
There are two issues to take care of here: 1. the single quotes, and 2. the '/' characters in the path.
I defined a sample test.conf as below, /var/lib will be replaced with /sample inside the single quotes.
$ cat test.conf
data_directory = '/var/lib/postgresql/10/main'
Then defined two variables
$ SRC="data_directory = '/var/lib/postgresql/10/main'"
$ DST="data_directory = '/sample/postgresql/10/main'"
Single line sed command does the job. I used '+' as separator and double quote for sed.
$ sed -i "s+$SRC+$DST+g" test.conf
See the result.
$ cat test.conf
data_directory = '/sample/postgresql/10/main'
/ is used to delimit a regex, but the / character also occurs in the pathname /var/lib/postgresql/10/main, which is what causes the syntax error.
Either try to define a variable line in the BEGIN which is executed only once before actually parsing the file:
sudo awk 'BEGIN { line="data_directory = '/var/lib/postgresql/10/main'" } ; $0 ~ line { ....
Or escape the / in the pathname used in the regex:
sudo awk '/data_directory = '[/]var[/]lib[/]postgresql[/]10[/]main'/ { ...
/data_directory = /var/lib/postgresql/10/main/
The filter command works as /string/. There are multiple / characters in the string you want to filter - you need to escape them. Try:
/data_directory = \/var\/lib\/postgresql\/10\/main/
The \/ tells awk that this / is not the end of filter command, but it's a part of string to filter.
sudo awk '/data_directory = \/var\/lib\/postgresql\/10\/main/ { print "data_directory = someDir/postgresql/10/main"; next }1' /home/ubuntu/postgresql.conf
However I would probably go with:
awk -v VAR="someDir/postgresql/10/main" '/^data_directory = /{ print "data_directory = " VAR; next }1'
An example can be found on tutorialspoint.
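Another way to sidestep both the slashes and the single quotes is to avoid a regex altogether and compare whole lines against strings passed in with -v. This is a sketch under a few assumptions: it works on a temporary copy with made-up contents rather than the real postgresql.conf, it replaces the line instead of keeping the old one, and the names old, new, and updated are only illustrative.

```shell
#!/bin/sh
conf=$(mktemp)
cat > "$conf" <<'EOF'
# sample configuration (hypothetical contents)
data_directory = '/var/lib/postgresql/10/main'
port = 5432
EOF

# The single quotes live in shell double-quoted strings, so they survive intact.
old="data_directory = '/var/lib/postgresql/10/main'"
new="data_directory = 'someDir/postgresql/10/main'"

updated=$(awk -v old="$old" -v new="$new" '
$0 == old { print new; next }   # exact string comparison, no regex involved
{ print }
' "$conf")

printf '%s\n' "$updated"
```

To change the file itself, the output would still have to be written to a temporary file and moved into place, since awk does not edit in place here.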
I'm trying to split a file using an AWK one-liner, but the code below that I came up with is not working properly.
awk '
BEGIN { idx=0; file="original_file.split." }
/^REC_DELIMITER.(HIGH|TOP)$/ { idx++ }
/^REC_DELIMITER.TOP$/,/^REC_DELIMITER.(HIGH|TOP)$/ { print > file sprintf("%03d", idx) }
' original_file
Test file is "original_file":
REC_DELIMITER.TOP
lineA1
lineA2
lineA3
REC_DELIMITER.HIGH
lineB1
lineB2
lineB3
REC_DELIMITER.TOP
lineC1
lineC2
lineC3
REC_DELIMITER.HIGH
lineD1
lineD2
lineD3
AWK code above is for REC_DELIMITER.TOP and it is giving me these files:
original_file.split.001:
REC_DELIMITER.TOP
original_file.split.003:
REC_DELIMITER.TOP
however, I'm trying to get this:
original_file.split.001:
REC_DELIMITER.TOP
lineA1
lineA2
lineA3
original_file.split.003:
REC_DELIMITER.TOP
lineC1
lineC2
lineC3
There will be other record delimiters, and when needed, we can run for them like REC_DELIMITER.HIGH, this way getting files like below:
original_file.split.002:
REC_DELIMITER.HIGH
lineB1
lineB2
lineB3
original_file.split.004:
REC_DELIMITER.HIGH
lineD1
lineD2
lineD3
Any help is very much appreciated. I have been trying to get this working for the past few days, and the AWK code above is the best I was able to get. Now I need help from the AWK masters. :)
Thank you!
You can try something like this:
awk '
/REC_DELIMITER\.TOP/ {
a=1
b=0
file = sprintf (FILENAME".split.%03d",++n)
}
/REC_DELIMITER\.HIGH/ {
b=1
a=0
file = sprintf (FILENAME".split.%03d",++n)
}
a {
print $0 > file
}
b {
print $0 > file
}' file
You need something like this (untested):
awk -v dtype="TOP" '
BEGIN { dbase = "^REC_DELIMITER\\."; delim = dbase dtype "$" }
$0 ~ dbase { inBlock=0 }
$0 ~ delim { inBlock=1; idx++ }
inBlock { print > sprintf("original_file.split.%03d", idx) }
' original_file
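Here is a variant of the same idea that can be run as-is in a scratch directory. Note the assumptions: the sample file is a shortened copy of the one in the question, and the counter is advanced on every delimiter (not only the wanted one) so that the TOP blocks land in .001 and .003 as the question asked; the variable name out is only illustrative.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened copy of the test file from the question.
cat > original_file <<'EOF'
REC_DELIMITER.TOP
lineA1
REC_DELIMITER.HIGH
lineB1
REC_DELIMITER.TOP
lineC1
REC_DELIMITER.HIGH
lineD1
EOF

awk -v dtype="TOP" '
BEGIN { dbase = "^REC_DELIMITER\\."; delim = dbase dtype "$" }
$0 ~ dbase { inBlock = 0; idx++ }   # every delimiter advances the counter
$0 ~ delim { inBlock = 1 }          # only the wanted type switches output on
inBlock    { out = sprintf("original_file.split.%03d", idx); print > out }
' original_file

ls original_file.split.*
```

Running it again with dtype="HIGH" would be expected to produce the .002 and .004 files instead.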
awk -vRS=REC_DELIMITER '/^.TOP\n/{print RS $0 > sprintf("original_file.split.%03d",n)};!++n' original_file
(Give or take an extra newline at the end.)
Generally, when input is supposed to be treated as a series of multi-line records with a special line as delimiter, the most direct approach is to set RS (and often ORS) to that delimiter.
Normally you'd want to add newlines to its beginning and/or end, but this case is a little special so it's easier without them.
Edited to add: You need GNU Awk for this. Standard Awk considers only the first character of RS.
I made some changes so that the different delimiters each go to their own file, even when they occur later in the file. Make a file like splitter.awk with the contents below, then chmod +x it and run it with ./splitter.awk original_file
#!/usr/bin/awk -f
BEGIN {
idx=0;
file="original_file.split.";
out=""
}
{
if($0 ~ /^REC_DELIMITER.(TOP|HIGH)/){
if (!cnt[$0]) {
cnt[$0] = ++idx;
}
out=cnt[$0];
}
print > file sprintf("%03d", out)
}
I'm not very used to AWK; however, plasticide's answer pointed me in the right direction and I finally got the AWK script working as required.
In the code below, the first IF sets echo to 0 if any delimiter is found. The second IF sets echo to 1 if a wanted delimiter is found, so the wanted records are split out of the file.
I know the regex could be something like /^REC_(DELIMITER\.(TOP|HIGH|LOW)|NO_CATEGORY)$/, but since the regex is created dynamically by a shell script that reads a list of delimiters from a specific file, it will look more like the one in the AWK code below.
awk 'BEGIN {
idx=0; echo=1; file="original_file.split."
}
{
#All the delimiters to consider in given file
if($0 ~ /^(REC_DELIMITER.TOP|REC_DELIMITER.HIGH|REC_DELIMITER.LOW|REC_NO_CATEGORY)$/) {
echo=0
}
#Delimiters that should actually be pulled
if($0 ~ /^(REC_DELIMITER.HIGH|REC_DELIMITER.LOW)$/) {
idx++; echo=1
}
#Print to a file if the wanted delimiter matched
if(echo) {
print > (file idx)
}
}' original_file
Thank you all. I really appreciate it very much.
I have an awk script that I have defined thus:
#!/usr/bin/env awk
BEGIN { if (!len) len = 1; end = start + len }
{ for (i = start; i < end; i++) { print $1 } }
I have saved it as columns and chmod +x'd it. I want to invoke it so that start and end are defined as it traverses over a file. I was thinking this should work:
cat some_file | columns -v start=2
But it doesn't. Help!
Try using:
#!/usr/bin/awk -f
as an interpreter
env is the easiest way to handle this problem:
#!/usr/bin/env -S awk -f
to add more options, and to ensure no interference with your arguments, and awk's arguments:
#!/usr/bin/env -S awk -F: -f ${_} --
BEGIN {
# delete argv[1], which == ENVIRON[_]
delete ARGV[1]
} # rest of my awk program
As env has a POSIX standard, this shebang should get you around the difficulties of non-standard shebang implementations across unixen.
EDIT
after having written this I realized that '-S' is a non-POSIX compliant FreeBSD env extension. So shell wrapper is probably the way to go, unfortunate as that is.
Unfortunately, this is not easy to solve in a portable way. The standard technique looks like this (substitute /usr/bin/awk for your awk path):
#!/usr/bin/awk -f
BEGIN { if (!len) len = 1; end = start + len }
{ for (i = start; i < end; i++) { print $1 } }
The hard-coded awk path and the non-standard -f flag make this not portable across all *nixes. If you are only ever going to run your script on one machine, then this may work fine. However, to make a portable awk script, you will need to wrap it in a shell script. Here are two ways that you can do it in one file:
The first way is standard and easy to read:
#!/bin/sh
awk '
BEGIN { if (!len) len = 1; end = start + len }
{ for (i = start; i < end; i++) { print $1 } }
' "$@"
Unfortunately, this falls short in two key ways:
If your awk script contains a ' character, you will need to type it like this: '"'"' to "escape" it.
If you are using a text editor with syntax highlighting, you will not get your awk script properly highlighted.
Another solution is to use sed to strip out the sh wrapper:
#!/bin/sh
exec awk "$(sed '1,2d' "$0")" "$@"
BEGIN { if (!len) len = 1; end = start + len }
{ for (i = start; i < end; i++) { print $1 } }
This is something like a two-line shebang header. It calls awk using the file from line 3 down as the script argument. This allows you to keep your pretty syntax highlighting and you can still use ' characters to your heart's content. The two downsides I see are:
Arguably this is non-intuitive to read.
If you are using a code linter, it may not like this.
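One thing to keep in mind with either wrapper: the arguments arrive after the program text, where awk treats them as operands (file names or var=value assignments), so an option such as -v placed there is not parsed as an option. The sketch below therefore passes start=2 as an operand assignment. It makes several assumptions that differ from the question: the range is computed in the main rule (operand assignments are applied after BEGIN), it prints $i rather than $1, and the scratch path is invented for the demonstration.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Write the wrapper to a scratch location; "columns" is the name used in the question.
cat > "$workdir/columns" <<'EOF'
#!/bin/sh
exec awk '
# start and len arrive as operand assignments, which awk applies after BEGIN,
# so the range is computed in the main rule instead of in BEGIN.
{
    if (!len) len = 1
    end = start + len
    for (i = start; i < end; i++) print $i
}
' "$@"
EOF
chmod +x "$workdir/columns"

# "-" tells awk to read standard input after applying start=2.
result=$(printf 'a b c\nd e f\n' | "$workdir/columns" start=2 -)
printf '%s\n' "$result"
```

With the #!/usr/bin/awk -f form the -v start=2 invocation from the question is the more natural choice, since the arguments then follow -f rather than the program text.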
Below is the answer for this problem -
#!/bin/awk -f