AWK - Unable to update the files - bash

I have 300 txt files in my directory in the following format:
regional_vol_WM_atlas[1-300].txt
651328 651328
553949 553949
307287 307287
2558 2558
The following awk script was supposed to create a new file by performing a calculation on the fourth row of each existing file in my directory.
#!/bin/bash
awk=/usr/bin/awk
awkcommand='
FNR == 1 {
newfilename = FILENAME ; sub(".txt", "_prop.txt", newfilename)
printf "" > newf
ilename
}
FNR == 4 {
$1=($1/0.824198)*0.8490061
$2=($2/0.824198)*0.8490061
}
{
print >> newfilename
}
'regional_vol_WM_atlas[0-9].txt regional_vol_WM_atlas[0-9][0-9].txt regional_vol_WM_atlas1[0-4][0-9].txt regional_vol_WM_atlas15[02].txt
Unfortunately I could not update any file in the directory. When I run the script, I get the following error:
dev#dev-OptiPlex-780:/media/dev/Daten/Task1/subject1/t1$ '/media/dev/Daten/Task1/subject1/t1/Method'
/media/dev/Daten/Task1/subject1/t1/Method: line 18: regional_vol_WM_atlas10.txt: command not found
Could you please point out where I am wrong?

Your script is not calling awk. It defines a variable named awk and then tries to execute the file regional_vol_WM_atlas10.txt with the variable awkcommand set in its environment. Alas, that file is not in your PATH, so bash cannot find it. You need to instead do:
awk "$awkcommand" file1 file2 ...
(where file1, file2, etc. are the files you want to use as input.)
Also, note that your current script is appending the literal text regional_vol_WM_atlas[0-9].txt to the end of the awk command (or if a file exists which matches that glob, the name of that file is being appended), which you do not want. Overall, what you were trying to do should have been written:
#!/bin/bash
awkcommand='
FNR == 1 {
newfilename = FILENAME ; sub(".txt", "_prop.txt", newfilename)
printf "" > newfilename
}
FNR == 4 {
$1=($1/0.824198)*0.8490061
$2=($2/0.824198)*0.8490061
}
{
print >> newfilename
}
'
awk "$awkcommand" regional_vol_WM_atlas[0-9].txt \
regional_vol_WM_atlas[0-9][0-9].txt \
regional_vol_WM_atlas1[0-4][0-9].txt \
regional_vol_WM_atlas15[02].txt
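One more hardening worth a line, since the first argument to sub() is a regular expression rather than a fixed string: in sub(".txt", ...) the dot matches any character. Anchoring the suffix keeps the substitution from ever touching an unexpected part of the name (a minimal sketch of just that statement):
newfilename = FILENAME
sub(/\.txt$/, "_prop.txt", newfilename)    # "." escaped, matches only a literal ".txt" at the end of the name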

The problem is that a variable can be assigned just for the duration of a single command, for example:
x='hello' some_command
Which in effect is what bash thinks you are trying to do. The culprit is the whitespace, which acts as a word separator, so just escape (prefix with a \) the whitespace in the list of filenames:
#!/bin/bash
awk=/usr/bin/awk
awkcommand='
FNR == 1 {
newfilename = FILENAME ; sub(".txt", "_prop.txt", newfilename)
printf "" > newf
ilename
}
FNR == 4 {
$1=($1/0.824198)*0.8490061
$2=($2/0.824198)*0.8490061
}
{
print >> newfilename
}
'\ regional_vol_WM_atlas[0-9].txt\ regional_vol_WM_atlas[0-9][0-9].txt\ regional_vol_WM_atlas1[0-4][0-9].txt\ regional_vol_WM_atlas15[02].txt
The only thing I have altered is the final line.
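For reference, the assignment-prefix behaviour described above is easy to see in isolation; the variable exists only in the environment of that single command (illustrative one-liner, FOO is an arbitrary name):
$ FOO=hello bash -c 'echo "$FOO"'
hello
$ echo "$FOO"    # prints an empty line: FOO was never set in this shell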

Retrieve specific values from file

I have a file test.cf containing:
process {
withName : teq {
file = "/path/to/teq-0.20.9.txt"
}
}
process {
withName : cad {
file = "/path/to/cad-4.0.txt"
}
}
process {
withName : sik {
file = "/path/to/sik-20.0.txt"
}
}
I would like to retrieve the value at the end of the file path for teq, cad and sik.
I was first thinking about something like
grep -E 'teq' test.cf
and get only the second matching line and then remove the repeated part of the line
But it may be easier to do something like:
for a in test.cf
do
line=$(sed -n '{$a}p' test.cf)
if line=teq
#next line using sed -n?
do print nextline &> teq.txt
else if line=cad
do print nextline &> cad.txt
else if line=sik
do print nextline &> sik.txt
done
(obviously it doesn't work)
EDIT:
output wanted:
teq.txt containing teq-0.20.9, cad.txt containing cad-4.0 and sik.txt containing sik-20.0
Is there a good way to do that? Thank you for your comments
Based on your given sample:
awk '/withName/{close(f); f=$3 ".txt"}
/file/{sub(/.*\//, ""); sub(/\.txt".*/, "");
print > f}' test.cf
/withName/{close(f); f=$3 ".txt"} if line contains withName, save filename in f using the third field. close() will close any previous file handle
/file/{sub(/.*\//, ""); sub(/\.txt".*/, ""); if line contains file, remove everything except the value required
print > f print the modified line and redirect to filename in f
if you can have multiple entries, use >> instead of >
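For the sample test.cf above, a run and check would look like this (output inferred from the two sub() calls, shown for illustration):
$ awk '/withName/{close(f); f=$3 ".txt"}
/file/{sub(/.*\//, ""); sub(/\.txt".*/, ""); print > f}' test.cf
$ cat teq.txt
teq-0.20.9
$ cat cad.txt
cad-4.0
$ cat sik.txt
sik-20.0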
Here is a solution in awk:
awk '/withName/{name=$3} /file =/{print $3 > (name ".txt")}' test.cf
/withName/{name=$3}: when I see the line containing "withName", I save that name
When I see the line with "file =", I print the third field into the file named after the saved name
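Note that $3 here is the full quoted path, so that is what lands in each output file. If only the version-bearing basename is wanted (teq.txt containing just teq-0.20.9), the stripping idea from the answer above can be bolted on; a sketch:
awk '/withName/ { name = $3 }
/file =/ { p = $3
sub(/.*\//, "", p)        # drop everything through the last "/"
sub(/\.txt"$/, "", p)     # drop the trailing .txt and closing quote
print p > (name ".txt") }' test.cf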

Split CSV into two files based on column matching values in an array in bash / posh

I have an input CSV that I would like to split into two CSV files. If the value of column 4 matches any value in WLTarray it should go in output file 1; if it doesn't, it should go in output file 2.
WLTarray:
"22532" "79994" "18809" "21032"
input CSV file:
header1,header2,header3,header4,header5,header6,header7,header8
"83","6344324","585677","22532","Entitlements","BX","22532:718","36721"
"83","1223432","616454","79994","Compliance Stuff","DR","79994:64703","206134"
"83","162217","616454","83223","Data Enrichment","IEO","83223:64701","206475"
"83","267216","616457","79994","Compliance Engine","ABC","79994:64703","206020"
output CSV file1:
header1,header2,header3,header4,header5,header6,header7,header8
"83","6344324","585677","22532","Entitlements","BX","22532:718","36721"
"83","1223432","616454","79994","Compliance Stuff","DR","79994:64703","206134"
"83","267216","616457","79994","Compliance Engine","ABC","79994:64703","206020"
output CSV file2:
header1,header2,header3,header4,header5,header6,header7,header8
"83","162217","616454","83223","Data Enrichment","IEO","83223:64701","206475"
I've been looking at awk to filter this (python & perl are not an option in my environment) but I think there is probably a much smarter way:
declare -a WLTarray=("22532" "79994" "18809" "21032")
for WLTvalue in "${WLTarray[@]}" #Everything in the WLTarray will go to $filename-WLT.tmp
do
awk -F, '($4=='$WLTvalue'){print}' $filename.tmp >> $filename-WLT.tmp #move the lines to the WLT file
# now filter to remove non matching values? why not just move the rows entirely?
done
With regular awk you can make use of split and substr (to handle double-quote removal for comparison) and split the csv file as you indicate. For example you can use:
awk 'BEGIN { FS=","; s="22532 79994 18809 21032"
split (s,a," ") # split s into array a
for (i in a) # loop over each index in a
b[a[i]]=1 # use value in a as index for b
}
FNR == 1 { # first record, write header to both output files
print $0 > "output1.csv"
print $0 > "output2.csv"
next
}
substr($4,2,length($4)-2) in b { # 4th field w/o quotes in b?
print $0 > "output1.csv" # write to output1.csv
next
}
{ print $0 > "output2.csv" } # otherwise write to output2.csv
' input.csv
Where:
in the BEGIN {...} rule you set the field separator (FS) to break on comma, split the string containing your desired output1.csv field-4 values into the array a, then loop over the values in a, using them as the indexes of array b (to allow a simple in b membership check);
the first rule applies to the first record in the file (the header line), which is simply written out to both output files;
the next rule uses substr() to compare field 4, without its surrounding double quotes, against the indexes of array b. If the value is present, the record is written to output1.csv; otherwise it is written to output2.csv.
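The quote-stripping piece can be sanity-checked on its own (illustrative one-liner):
$ echo '"22532"' | awk '{ print substr($1, 2, length($1) - 2) }'
22532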
Example Input File
$ cat input.csv
header1,header2,header3,header4,header5,header6,header7,header8
"83","6344324","585677","22532","Entitlements","BX","22532:718","36721"
"83","1223432","616454","79994","Compliance Stuff","DR","79994:64703","206134"
"83","162217","616454","83223","Data Enrichment","IEO","83223:64701","206475"
"83","267216","616457","79994","Compliance Engine","ABC","79994:64703","206020"
Resulting Output Files
$ cat output1.csv
header1,header2,header3,header4,header5,header6,header7,header8
"83","6344324","585677","22532","Entitlements","BX","22532:718","36721"
"83","1223432","616454","79994","Compliance Stuff","DR","79994:64703","206134"
"83","267216","616457","79994","Compliance Engine","ABC","79994:64703","206020"
$ cat output2.csv
header1,header2,header3,header4,header5,header6,header7,header8
"83","162217","616454","83223","Data Enrichment","IEO","83223:64701","206475"
You can use gawk like this:
test.awk
#!/usr/bin/gawk -f
BEGIN {
split("22532 79994 18809 21032", a)
for(i in a) {
WLTarray[a[i]]
}
FPAT="[^\",]+"
}
NR == 1 { # header row: copy it to both output files
print > "output1.csv"
print > "output2.csv"
next
}
{
if ($4 in WLTarray) {
print >> "output1.csv"
} else {
print >> "output2.csv"
}
}
Make it executable and run it like this:
chmod +x test.awk
./test.awk input.csv
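The convenience here is FPAT, which defines what a field is rather than what separates fields: each field is a maximal run of characters that are neither a double quote nor a comma, so $4 arrives already unquoted and can be compared directly. A quick illustration:
$ echo '"83","6344324","585677","22532"' | gawk 'BEGIN{FPAT="[^\",]+"} {print $4}'
22532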
Using grep with a filter file as input was the simplest answer.
declare -a WLTarray=("22532" "79994" "18809" "21032")
for WLTvalue in "${WLTarray[@]}"
do
awkstring="'\$4 == "\"\\\"$WLTvalue\\\"\"" {print}'"
eval "awk -F, $awkstring input.csv >> output.WLT.csv"
done
grep -v -x -f output.WLT.csv input.csv > output.NonWLT.csv
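One hedge on the grep step: -f treats each line of the filter file as a regular expression, and the CSV rows are full of dots and quotes, which could in principle match more than intended even with -x. Adding -F makes the comparison literal:
grep -v -x -F -f output.WLT.csv input.csv > output.NonWLT.csv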

Using awk to manipulate data from two sources

As part of the CI/CD process in my team, I want to generate a dynamic commands script from a file containing paths to some resources.
The file paths.txt contains the paths, separated by new lines. For every line in this file, a command should be generated, unless it starts with "JarPath/..."
example:
JarPath/DontTouchMe.jar
path/to/some/resource/View/PutMeInScript.msgflow
path/to/some/resource/Control/MeAlso.map
The file mapping.txt contains key-value pairs. The key is a phrase to be matched against a path from paths.txt, and its value is required for the generated command.
example:
View viewEG.bar
Control controlEG.bar
Lines in paths.txt are not sorted, and several paths can match a single value in mapping.txt.
Only the first match in the mapping.txt file that matches the first possible part in the path should be considered. I don't care if a later line in mapping.txt also matches, nor if a later directory in the path matches another line.
The to-be-matched part of the path is not at a fixed location (e.g. after the 4th "/").
Final result in the script file should be:
mqsicreatebar -data ./ -b viewEG.bar -o /path/to/some/resource/View/PutMeInScript.msgflow
mqsicreatebar -data ./ -b controlEG.bar -o /path/to/some/resource/Control/MeAlso.map
Since the command line takes data from two sources (paths.txt and a value pair from mapping.txt) I couldn't wrap it into a single awk command, nor pipe it into a single bash line. I wrote:
pathVar="paths.txt"
touch deltaFile.txt
while IFS= read -r line
do
awk -v var="$line" 'var ~ $1 && var !~ /^JarPath/ {print $2, " ", var; exit}' mapping.txt >> deltaFile.txt
done < "$pathVar"
IFS=$'\n'
awk '{print "mqsicreatebar -data ./ -b", $1, "-o", $2 }' deltaFile.txt > script.sh
Well, it works, but is there a better way to do this?
Given your comment below that "Only the first match in the mapping.txt file that matches the first possible part in the path should be considered. The key dir can appear anywhere", this is what you need:
$ cat tst.awk
NR==FNR {
keys[++numKeys] = $1
map[$1] = $2
next
}
!/^JarPath/ {
numDirs = split($0,dirs,"/")
val = ""
for (dirNr=1; (dirNr<=numDirs) && (val==""); dirNr++) {
dir = dirs[dirNr]
for (keyNr=1; (keyNr<=numKeys) && (val==""); keyNr++) {
key = keys[keyNr]
if (dir == key) {
val = map[dir]
}
}
}
printf "mqsicreatebar -data ./ -b \047%s\047 -o \047%s\047\n", val, $0
}
$ awk -f tst.awk mapping.txt paths.txt
mqsicreatebar -data ./ -b 'viewEG.bar' -o 'path/to/some/resource/View/PutMeInScript.msgflow'
mqsicreatebar -data ./ -b 'controlEG.bar' -o 'path/to/some/resource/Control/MeAlso.map'
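The load-bearing idiom is NR==FNR in the first block: FNR resets to 1 for each input file while NR keeps counting across files, so the condition is true only while mapping.txt (the first file named) is being read. A stripped-down illustration with invented files keys.txt and data.txt:
$ printf 'a 1\nb 2\n' > keys.txt
$ printf 'b\na\n' > data.txt
$ awk 'NR==FNR { map[$1] = $2; next }    # first file only: remember pairs
{ print $1, map[$1] }' keys.txt data.txt
b 2
a 1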

Find, Replace, Remove - within a file

I'm currently using this code:
awk "BEGIN { s = \"${CNEW}\" } /WORD_MATCH/ { \$0 = s; n = 1 } 1; END { if(!n) print s }" filename > new_filename
This finds a match on WORD_MATCH and replaces that line with $CNEW in a file called filename; the results are written to new_filename.
This all works well. But I have an issue where I may want to DELETE the line instead of replacing it.
So I set $CNEW to '', which works in that I get a blank line in the file, but that does not actually remove the line.
Is there any way to adapt the awk command to allow removal of the line?
The total aim is :
If there isn't a line in the file containing WORD_MATCH, add one based on $CNEW
If there is a line in the file containing WORD_MATCH, update that line with the new value from $CNEW
If $CNEW = '' then delete the line containing WORD_MATCH.
There will only be one line in the file containing WORD_MATCH.
Thanks
awk -v s="$CNEW" '/WORD_MATCH/ { n=1; if (s) $0=s; else next; } 1; END { if(s && !n) print s }' file
How it works
-v s="$CNEW"
This creates s as an awk variable with the value $CNEW. Note that the use of -v neatly eliminates the quoting problems that can occur by trying to define s in a BEGIN block.
/WORD_MATCH/ { n=1; if (s) $0=s; else next; }
If the current line matches WORD_MATCH, then set n to 1. If s is non-empty, then set the current line to s. If not, skip the rest of the commands and start over on the next line.
1
This is cryptic shorthand for printing the line: 1 is a pattern that is always true, and a pattern with no action triggers the default action, print $0.
END { if(s && !n) print s }
At the end of the file, if n is still not 1 and s is non-empty, then print s.

Persistent AWK Program

I have been tasked with writing a BASH script to filter log4j files and pipe them over netcat to another host. One of the requirements is that the script must keep track of what it has already sent to the server and not send it again due to licensing constraints on the receiving server (the product on the server is licensed on a data-per-day model).
To achieve the filtering I'm using AWK encapsulated in a BASH script. The BASH component works fine - it's the AWK program that's giving me grief when I try to get it to remember what has already been sent to the server. I am doing this by grabbing the time stamp of a line each time a line matches my pattern. At the end of the program the last time stamp is written to a hidden file in the current working directory. On successive runs of the program AWK reads this file into a variable. Now each time a line matches the pattern, its time stamp is also compared to the one in the variable. If it is newer, the line is printed; otherwise it is not.
Desired Output:
INFO 2012-11-07 09:57:12,479 [[artifactid].connector.http.mule.default.receiver.02] org.mule.api.processor.LoggerMessageProcessor: MsgID=5017f1ff-1dfa-48c7-a03c-ed3c29050d12 InteractionStatus=Accept InteractionDateTime=2012-08-07T16:57:33.379+12:00 Retailer=CTCT RequestType=RemoteReconnect
Hidden File:
2012-10-11 12:08:19,918
So that's the theory, now my issue.
The script works fine for contrived/trivial examples such as:
INFO 2012-11-07 09:57:12,479 [[artifactid].connector.http.mule.default.receiver.02] org.mule.api.processor.LoggerMessageProcessor: MsgID=5017f1ff-1dfa-48c7-a03c-ed3c29050d12 InteractionStatus=Accept InteractionDateTime=2012-08-07T16:57:33.379+12:00 Retailer=CTCT RequestType=RemoteReconnect
However, if I run it over a full-blown log file with stack traces etc. in it, then the indentation levels appear to wreak havoc on my program. The first run of the program produces the desired results - matching lines are printed and the latest time stamp is written to the hidden file. Running it again is when the problem crops up. The output of the program contains the indented lines from stack traces etc. (see the block below) and I can't figure out why. This then corrupts the hidden file, as the last matching line doesn't contain a time stamp, so garbage is written to it, making any further runs pointless.
Undesired output:
at package.reverse.domain.SomeClass.someMethod(SomeClass.java:233)
at package.reverse.domain.processor.SomeClass.process(SomeClass.java:129)
at package.reverse.domain.processor.someClass.someMethod(SomeClassjava:233)
at package.reverse.domain.processor.SomeClass.process(SomeClass.java:129)
Hidden file after:
package.reverse.domain.process(SomeClass.java:129)
My awk program:
FNR == 1 {
CMD = "basename " FILENAME
CMD | getline FILE;
FILE = "." FILE ".last";
if (system("[ -f "FILE" ]") == 0) {
getline FIRSTLINE < FILE;
close(FILE);
print FIRSTLINE;
}
else {
FIRSTLINE = "1970-01-01 00:00:00,000";
}
}
$0 ~ EXPRESSION {
if (($2 " " $3) > FIRSTLINE) {
print $0;
LASTLINE=$2 " " $3;
}
}
END {
if (LASTLINE != "") {
print LASTLINE > FILE;
}
}
Any assistance with finding out why this is happening would be greatly appreciated.
UPDATE:
BASH Script:
#!/bin/bash
while getopts i:r:e:h:p: option
do
case "${option}"
in
i) INPUT=${OPTARG};;
r) RULES=${OPTARG};;
e) PATFILE=${OPTARG};;
h) HOST=${OPTARG};;
p) PORT=${OPTARG};;
?) printf "Usage: %s: -i <\"file1.log file2.log\"> -r <\"rules1.awk rules2.awk\"> -e <\"patterns.pat\"> -h <host> -p <port>\n" $0;
exit 1;
esac
done
#prepare expression with sed
EXPRESSION=`cat $PATFILE | sed ':a;N;$!ba;s/\n/|/g'`;
EXPRESSION="^(INFO|DEBUG|WARNING|ERROR|FATAL)[[:space:]]{2}[[:digit:]]{4}\\\\-[[:digit:]]{1,2}\\\\-[[:digit:]]{1,2}[[:space:]][[:digit:]]{1,2}:[[:digit:]]{2}:[[:digit:]]{2},[[:digit:]]{3}.*"$EXPRESSION".*";
#Make sure the temp file is empty
echo "" > .temp;
#input through awk.
for file in $INPUT
do
awk -v EXPRESSION="$EXPRESSION" -f $RULES $file >> .temp;
done
#send contents of file to splunk indexer over udp
cat .temp;
#cat .temp | netcat -t $HOST $PORT;
#cleanup temporary files
if [ -f .temp ]
then
rm .temp;
fi
Patterns File (The stuff I want to match):
Warning
Exception
Awk script as above.
Example.log
info 2012-09-04 16:00:11,638 [[adr-com-adaptor-stub].connector.http.mule.default.receiver.02] nz.co.amsco.interop.multidriveinterop: session not initialised
error 2012-09-04 16:00:11,639 [[adr-com-adaptor-stub].connector.http.mule.default.receiver.02] nz.co.amsco.adrcomadaptor.processor.comadaptorprocessor: nz.co.amsco.interop.exceptions.systemdownexception
nz.co.amsco.interop.exceptions.systemdownexception
at nz.co.amsco.adrcomadaptor.processor.comadaptorprocessor.getdeviceconfig(comadaptorprocessor.java:233)
at nz.co.amsco.adrcomadaptor.processor.comadaptorprocessor.process(comadaptorprocessor.java:129)
at org.mule.processor.chain.defaultmessageprocessorchain.doprocess(defaultmessageprocessorchain.java:99)
at org.mule.processor.chain.abstractmessageprocessorchain.process(abstractmessageprocessorchain.java:66)
at org.mule.processor.abstractinterceptingmessageprocessorbase.processnext(abstractinterceptingmessageprocessorbase.java:105)
at org.mule.processor.asyncinterceptingmessageprocessor.process(asyncinterceptingmessageprocessor.java:90)
at org.mule.processor.chain.defaultmessageprocessorchain.doprocess(defaultmessageprocessorchain.java:99)
at org.mule.processor.chain.abstractmessageprocessorchain.process(abstractmessageprocessorchain.java:66)
at org.mule.processor.AbstractInterceptingMessageProcessorBase.processNext(AbstractInterceptingMessageProcessorBase.java:105)
at org.mule.interceptor.AbstractEnvelopeInterceptor.process(AbstractEnvelopeInterceptor.java:55)
at org.mule.processor.AbstractInterceptingMessageProcessorBase.processNext(AbstractInterceptingMessageProcessorBase.java:105)
Usage:
./filter.sh -i "Example.log" -r "rules.awk" -e "patterns.pat" -h host -p port
Note that host and port are both unused in this version as the output is just thrown onto stdout.
So if I run this I get the following output:
info 2012-09-04 16:00:11,638 [[adr-com-adaptor-stub].connector.http.mule.default.receiver.02] nz.co.amsco.interop.multidriveinterop: session not initialised
error 2012-09-04 16:00:11,639 [[adr-com-adaptor-stub].connector.http.mule.default.receiver.02] nz.co.amsco.adrcomadaptor.processor.comadaptorprocessor: nz.co.amsco.interop.exceptions.systemdownexception
at nz.co.amsco.adrcomadaptor.processor.comadaptorprocessor.getdeviceconfig(comadaptorprocessor.java:233)
at nz.co.amsco.adrcomadaptor.processor.comadaptorprocessor.process(comadaptorprocessor.java:129)
If I run it again on the same unchanged file I should get no output; however, I am seeing:
nz.co.amsco.adrcomadaptor.processor.comadaptorprocessor.process(comadaptorprocessor.java:129)
I have been unable to determine why this is happening.
You didn't provide any sample input that could reproduce your problem so let's start by just cleaning up your script and go from there. Change it to this:
BEGIN{
expression = "^(INFO|DEBUG|WARNING|ERROR|FATAL)[[:space:]]{2}[[:digit:]]{4}-[[:digit:]]{1,2}-[[:digit:]]{1,2}[[:space:]][[:digit:]]{1,2}:[[:digit:]]{2}:[[:digit:]]{2},[[:digit:]]{3}.*Exception|Warning"
# Do you really want "(Exception|Warning)" in brackets instead?
# As written "Warning" on its own will match the whole expression.
}
FNR == 1 {
tstampFile = "/" FILENAME ".last"
sub(/.*\//,".",tstampFile)
if ( (getline prevTstamp < tstampFile) > 0 ) {
close(tstampFile)
print prevTstamp
}
else {
prevTstamp = "1970-01-01 00:00:00,000"
}
nextTstamp = ""
}
$0 ~ expression {
currTstamp = $2 " " $3
if (currTstamp > prevTstamp) {
print
nextTstamp = currTstamp
}
}
END {
if (nextTstamp != "") {
print nextTstamp > tstampFile
}
}
Now, do you still have a problem? If so, show us how you run the script, i.e. the bash command you are executing, and post some small sample input that reproduces your problem.
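One property both the original and the cleaned-up script quietly rely on: timestamps in YYYY-MM-DD HH:MM:SS,mmm form compare correctly as plain strings, because every component is fixed-width and ordered most-significant first. A small sanity check (values taken from the question):
$ awk 'BEGIN {
prev = "2012-10-11 12:08:19,918"
curr = "2012-11-07 09:57:12,479"
print ((curr > prev) ? "newer" : "not newer")
}'
newer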
