gawk nextfile with compressed input - bash

I'm trying to use awk's nextfile statement with multiple gzipped input files.
I've googled for this before posting but it looks like I'm the only one who want to do this :D
This is what i need to do:
awk '
BEGIN{
print "start",strftime();
}
/match/{
print FILENAME,"->",$0
count++
nextfile
}
END{
print count
print "stop",strftime();
}
' /var/log/*.2015-01-23.gz
Suddenly awk can't read by itself gzipped files, so I have to use zcat and I've modified my syntax as follow:
zcat /var/log/*.2015-01-23.gz | awk '
BEGIN{
print "start",strftime();
}
/match/{
print FILENAME,"->",$0
count++
nextfile
}
END{
print count
print "stop",strftime();
}
'
But this way nextfile statement won't work because awk see just one input data flow.
My awk version:
# awk --version
GNU Awk 3.1.7
Note: what exposed is a resume of what I need to do in the END action, so don't propose to use zgrep or something else. I need awk.
Note2: Files will be elaborated togheter.
Thanks

You can fetch the files one by one, hold the Name of the file in a shell variable and print this in awk:
#!/bin/bash
for LOGFILE in /var/log/*.2015-01-23.gz
do
zcat "$LOGFILE" |
awk '
BEGIN{
print "start",strftime();
}
/match/{
print "'$LOGFILE'","->",$0
last
}
END{
print "stop",strftime();
}
'
done

Related

combining numbers from multiple text files using bash

I'm strugling to combine some data from my txt files generated in my jenkins job.
on each of the files there is 1 line, this is how each file look:
testsuite name="mytest" cars="201" users="0" bus="0" bike="0" time="116.103016"
What I manage to do for now is to extract the numbers for each txt file:
awk '/<testsuite name=/{print $3, $4, $5, $6}' my-output*.txt
Result are :
cars="193" users="2" bus="0" bike="0"
cars="23" users="2" bus="10" bike="7"
cars="124" users="2" bus="5" bike="0"
cars="124" users="2" bus="0" bike="123"
now I have a random number of files like this:
my-output1.txt
my-output2.txt
my-output7.txt
my-output*.txt
I would like to create single command just like the one I did above and to sum all of the files to have the following echo result:
cars=544 users=32 bus=12 bike=44
is there a way to do that? with a single line of command?
Using awk
$ cat script.awk
BEGIN {
FS="[= ]"
} {
gsub(/"/,"")
for (i=1;i<NF;i++)
if ($i=="cars") cars+=$(i+1)
else if($i=="users") users+=$(i+1);
else if($i=="bus") bus+=$(i+1);
else if ($i=="bike")bike+=$(i+1)
} END {
print "cars="cars,"users="users,"bus="bus,"bike="bike
}
To run the script, you can use;
$ awk -f script.awk my-output*.txt
Or, as a ugly one liner.
$ awk -F"[= ]" '{gsub(/"/,"");for (i=1;i<NF;i++) if ($i=="cars") cars+=$(i+1); else if($i=="users") users+=$(i+1); else if($i=="bus") bus+=$(i+1); else if ($i=="bike")bike+=$(i+1)}END{print"cars="cars,"users="users,"bus="bus,"bike="bike}' my-output*.txt
1st solution: With your shown samples please try following awk code, using match function in here. Since awk could read multiple files within a single program itself and your files have .txt format you can pass as .txt format to awk program itself.
Written and tested in GNU awk with its match function's capturing group capability to create/store values into an array to be used later on in program.
awk -v s1="\"" '
match($0,/[[:space:]]+(cars)="([^"]*)" (users)="([^"]*)" (bus)="([^"]*)" (bike)="([^"]*)"/,tempArr){
temp=""
for(i=2;i<=8;i+=2){
temp=tempArr[i-1]
values[i]+=tempArr[i]
indexes[i-1]=temp
}
}
END{
for(i in values){
val=(val?val OFS:"") (indexes[i-1]"=" s1 values[i] s1)
}
print val
}
' *.txt
Explanation:
In start of GNU awk program creating variable named s1 to be set to " to be used later in the program.
Using match function in main program of awk.
Mentioning regex [[:space:]]+(cars)="([^"]*)" (users)="([^"]*)" (bus)="([^"]*)" (bike)="([^"]*)"(explained at last of this post) which is creating 8 groups to be used later on.
Then once condition is matched running a for loop which runs only even numbers in it(to get required values only).
Creating array values with index of i and keep adding its own value + tempArr values to it, where tempArr is created by match function.
Similarly creating indexes array to store only key values in it.
Then in END block of this program traversing through values array and printing the values from indexes and values array as per requirement.
Explanation of regex:
[[:space:]]+ ##Matching spaces 1 or more occurrences here.
(cars)="([^"]*)" ##Matching cars=" till next occurrence of " here.
(users)="([^"]*)" ##Matching spaces followed by users=" till next occurrence of " here.
(bus)="([^"]*)" ##Matching spaces followed by bus=" till next occurrence of " here.
(bike)="([^"]*)" ##Matching spaces followed by bike=" till next occurrence of " here.
2nd solution: In GNU awk only with using RT and RS variables power here. This will make sure the sequence of the values also in output should be same in which order they have come in input.
awk -v s1="\"" -v RS='[[:space:]][^=]*="[^"]*"' '
RT{
gsub(/^ +|"/,"",RT)
num=split(RT,arr,"=")
if(arr[1]!="time" && arr[1]!="name"){
if(!(arr[1] in values)){
indexes[++count]=arr[1]
}
values[arr[1]]+=arr[2]
}
}
END{
for(i=1;i<=count;i++){
val=(val?val OFS:"") (indexes[i]"=" s1 values[indexes[i]] s1)
}
print val
}
' *.txt
You may use this awk solution:
awk '{
for (i=1; i<=NF; ++i)
if (split($i, a, /=/) == 2) {
gsub(/"/, "", a[2])
sums[a[1]] +=a[2]
}
}
END {
for (i in sums) print i "=" sums[i]
}' file*
bus=15
cars=464
users=8
bike=130
found a way to do so a bit long:
awk '/<testsuite name=/{print $3, $4, $5, $6}' my-output*.xml | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | awk '{bus+=$1;users+=$2;cars+=$3;bike+=$4 }END{print "bus=" bus " users="users " cars=" cars " bike=" bike}'
M. Nejat Aydin answer was good fit:
awk -F '[ "=]+' '/testsuite name=/{ cars+=$5; users+=$7; buses+=$9; bikes+=$11 } END{ print "cars="cars, "users="users, "buses="buses, "bikes="bikes }' my-output*.xml

awk output to file based on filter

I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that data is ,data, no spaces.
action,action_type, Result
up,1,stringA
down,1,strinB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected but I am basicaly travesing my original dataset twice. My original dataset is about 5GB and I have 30 action_type categories. I need to do this everyday, so, I need to script the thing to run on its own efficiently.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, no even errors. Which sort of tell me that my code is simply not matching anything or my print / pipe statement is wrong.
You may try this awk to do this in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' '{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
As currently coded the input field separator has not been defined.
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
There are a couple ways to address this:
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
$ awk -f myfilter.awk dataset.csv

awk OFS not producing expected value

I have a file
[root#nmk~]# cat file
abc>
sssd>
were>
I run both these variations of the awk commands
[root#nmk~]# cat file | awk -F\> ' { print $1}' OFS=','
abc
sssd
were
[root#nmk~]# cat file | awk -F\> ' BEGIN { OFS=","} { print $1}'
abc
sssd
were
[root#nmk~]#
But my expected output is
abc,sssd,were
What's missing in my commands ?
You're just a bit confused about the meaning/use of FS, OFS, RS and ORS. Take another look at the man page. I think this is what you were trying to do:
$ awk -F'>' -v ORS=',' '{print $1}' file
abc,sssd,were,$
but this is probably closer to the output you really want:
$ awk -F'>' '{rec = rec (NR>1?",":"") $1} END{print rec}' file
abc,sssd,were
or if you don't want to buffer the whole output as a string:
$ awk -F'>' '{printf "%s%s", (NR>1?",":""), $1} END{print ""}' file
abc,sssd,were
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1}' file
to print newline at the end:
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1} END{print "\n"}' file
output:
abc,sssd,were
Each line of input in awk is a record, so what you want to set is the Output Record Separator, ORS. The OFS variable holds the Output Field Separator, which is used to separate different parts of each line.
Since you are setting the input field separator, FS, to >, and OFS to ,, an easy way to see how these work is to add something on each line of your file after the >:
awk 'BEGIN { FS=">"; OFS=","} {$1=$1} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc,def
sssd,dsss
were,wolf
So you want to set the ORS. The default record separator is newline, so whatever you set ORS to effectively replaces the newlines in the input. But that means that if the last line of input has a newline - which is usually the a case - that last line will also get a copy of your new ORS:
awk 'BEGIN { FS=">"; ORS=","} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc>def,sssd>dsss,were>wolf,
It also won't get a newline at all, because that newline was interpreted as an input record separator and turned into the output record separator - it became the final comma.
So you have to be a little more explicit about what you're trying to do:
awk 'BEGIN { FS=">" } # split input on >
(NR>1) { printf "," } # if not the first line, print a ,
{ printf "%s", $1 } # print the first field (everything up to the first >)
END { printf "\n" } # add a newline at the end
' <<<$'abc>\nsssd>\nwere>'
Which outputs this:
abc,sssd,were
Through sed,
$ sed ':a;N;$!ba;s/>\n/,/g;s/>$//' file
abc,sssd,were
Through Perl,
$ perl -00pe 's/>\n(?=.)/,/g;s/>$//' file
abc,sssd,were

Shell script to retrieve the status of objects from the given output

I am trying to retrive the status of the objects from the below output(the value of Name field and the value of OpState field corresponding to the same) using shell script.For example in the above output the status of 'DP-UID-FSH' is 'up'. I want to produce an output like:
Platform: Bash on Solaris.
DP-UID-FSH is up.
DP-Cert-FSH is up.
Below is the content of the file which nees to be parsed to produce above output.
<ConfigState>saved</ConfigState></ObjectStatus><ObjectStatus xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<Class>HTTPSSourceProtocolHandler</Class>
<OpState>up</OpState>
<AdminState>enabled</AdminState>
<Name>DP-UID-FSH</Name>
<EventCode>0x00000000</EventCode>
<ErrorCode/>
<ConfigState>saved</ConfigState></ObjectStatus><ObjectStatus xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<Class>SLMAction</Class>
<OpState>up</OpState>
<AdminState>enabled</AdminState>
<Name>DP-Cert-FSH</Name>
<EventCode>0x00000000</EventCode>
<ErrorCode/>
<ConfigState>saved</ConfigState></ObjectStatus><ObjectStatus xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<Class>SLMAction</Class>
<OpState>up</OpState>
<AdminState>enabled</AdminState>
<Name>shape</Name>
<EventCode>0x00000000</EventCode>
<ErrorCode/>
saved
I am a newbee in shell script and doesnt have a clue on how this can be achieved?
The Awk solutions have gotten messy so I'd just add another answer that uses Perl. I'm not well-versed in Perl but I learn easy and this could solve it as well:
perl -lane '$state = (split(/[<>]/))[2] if /OpState/; print ((split(/[<>]/))[2] . " is $state.") if /<Name>/' file
Output:
DP-UID-FSH is up.
DP-Cert-FSH is up.
shape is up.
As jaypal suggested (thanks), split is not needed since autosplit (-a) is enabled:
perl -F'[<>]' -lane '$state = $F[2] if /OpState/; print "$F[2] is $state" if /<Name>/' file
With GNU Awk or Mawk:
awk -v RS='<OpState>' -F '[<>]' 'NR > 1 { printf "%s is %s.\n", $9, $1 }' file
Another:
awk '/OpState/ { gsub(/<\/?OpState>/, ""); s = $0; } /<Name>/ { gsub(/<\/?Name>/, ""); printf "%s is %s.\n", $0, s; }' file
Yet another:
awk -F '[<>]' '/OpState/ { s = $3; } /<Name>/ { printf "%s is %s.\n", $3, s; }' file
Output:
DP-UID-FSH is up.
DP-Cert-FSH is up.
shape is up.

How to replace full column with the last value?

I'm trying to take last value in third column of a CSV file and replace then the whole third column with this value.
I've been trying this:
var=$(tail -n 1 math_ready.csv | awk -F"," '{print $3}'); awk -F, '{$3="$var";}1' OFS=, math_ready.csv > math1.csv
But it's not working and I don't understand why...
Please help!
awk '
BEGIN { ARGV[2]=ARGV[1]; ARGC++; FS=OFS="," }
NR==FNR { last = $3; next }
{ $3 = last; print }
' math_ready.csv > math1.csv
The main problem with your script was trying to access a shell variable ($var) inside your awk script. Awk is not shell, it is a completely separate language/tool with it's own namespace and variables. You cannot directly access a shell variable in awk, just like you couldn't access it in C. To access the VALUE of a shell variable you'd do:
shellvar=27
awk -v awkvar="$shellvar" 'BEGIN{ print awkvar }'`
Some additional cleanup:
When FS and OFS have the same value, don't assign them each to that value separately, use BEGIN{ FS=OFS="," } instead for clarity and maintainability.
Do not iniatailize variables AFTER the script that uses those variables unless you have a very specifc reason to do so. Use awk -F... -v OFS=... 'script' to init those variables to separate values, not awk -F... 'script' OFS=... as it's very unnatural to init variables in the code segment AFTER you've used them and variables inited in the args list at the end are not initialized when the BEGIN section is executed which can cause bugs.
A shell variable is not expandable internally in awk. You can do this instead:
awk -F, -v var="$var" '{ $3 = var } 1' OFS=, math_ready.csv > math1.cs
And you probably can simplify your code with this:
awk -F, 'NR == FNR { r = $3; next } { $3 = r } 1' OFS=, math_ready.csv math_ready.csv > math1.csv
Example input:
1,2,1
1,2,2
1,2,3
1,2,4
1,2,5
Output:
1,2,5
1,2,5
1,2,5
1,2,5
1,2,5
Try this one liner. It doesn't depend on the column count
var=`tail -1 sample.csv | perl -ne 'm/([^,]+)$/; print "$1";'`; cat sample.csv | while read line; do echo $line | perl -ne "s/[^,]*$/$var\n/; print $_;"; done
cat sample.csv
24,1,2,30,12
33,4,5,61,3333
66,7,8,91111,1
76,10,11,32,678
Out:
24,1,2,30,678
33,4,5,61,678
66,7,8,91111,678
76,10,11,32,678

Resources