I am using awk to compute some sums and I want to store them in a file.
Here is my input file:
misses 15
hit 18
misses 20
hit 31
And I want to print the total misses, and total hits in a file.
If I run this:
awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt
I see them in the terminal.
Now I want to write them to a file:
I tried this:
#!/bin/bash
awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt
echo misses $misses > $1; #first one overwrites the previous $1 is the argument given in the command line
echo hits $hits>> $1; # all the others append to the .txt file
but $misses, and $hits do not have value.
I also tried this:
#!/bin/bash
result= $(echo $output | awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt )
# $NF if I want the last column
echo $result
that I saw on the web, in order to see what $result will return me but I get this error:
./test2.sh: line 2: Hits:: command not found
hits and misses are only variables inside awk, not in the shell after awk exits. Just do the following:
#!/bin/bash
awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt > "$1"
In your second attempt, you cannot put a space after the '=':
result=$(echo $output | awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt )
simply redirect the output of the awk command:
awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt >file.txt
the redirection operator > can be appended to any shell command to redirect its standard output to a file. changing it to >> appends the command's output to the file instead of completely overwriting the file, as you noticed.
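A minimal sketch of the difference between > and >>, assuming a scratch file created with mktemp (the file and its contents are only for illustration):

```shell
# Scratch file, used only for this illustration
out=$(mktemp)

echo "first"  > "$out"    # creates or overwrites the file
echo "second" > "$out"    # overwrites again, so "first" is gone
echo "third" >> "$out"    # appends to what is already there

contents=$(cat "$out")
echo "$contents"
rm -f "$out"
```

After the three commands the file holds only the lines "second" and "third".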
edit:
the reason this didn't work:
#!/bin/bash
awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt
echo misses $misses > $1; #first one overwrites the previous $1 is the argument given in the command line
echo hits $hits>> $1; # all the others append to the .txt file
is because hits and misses are variables inside the awk script only. the shell has no knowledge of them outside that statement, so when you try to echo $misses and $hits, you get blanks.
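If the totals really are needed as shell variables, one option is to have awk print them and let the shell capture that output. This is only a sketch: the temporary input file recreates the sample data from the question, and the names totals, hits and misses on the shell side are illustrative.

```shell
# Recreate the sample input from the question in a scratch file
input=$(mktemp)
printf 'misses 15\nhit 18\nmisses 20\nhit 31\n' > "$input"

# awk prints both totals on one line; the shell captures that line
totals=$(awk '/^hit/{h+=$2} /^misses/{m+=$2} END{print h+0, m+0}' "$input")

# Split "hits misses" at the single space using parameter expansion
hits=${totals% *}
misses=${totals#* }

echo "hits $hits"
echo "misses $misses"
rm -f "$input"
```

The values now live in the shell, so the echo lines from the first attempt would have something to print.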
and this doesn't work:
#!/bin/bash
result= $(echo $output | awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt )
# $NF if I want the last column
echo $result
for multiple reasons.
1) when assigning variables in bash, you cannot have whitespace around the equal sign, so the second line must begin:
`result=$(echo...`
2) the echo statement inside your substitution (echo $output) is unnecessary. this is because a) $output is undefined so echo produces no output, and b) the second statement in the pipeline (the awk statement) completely ignores the standard output of the command preceding it in the pipeline anyway since you specified a filename for it to act on (t.txt). so the second line could just be:
result=$(awk 'BEGIN { hits=0; misses=0; } /^hit/{ hits+=$2}; /^misses/{misses+=$2}; END {print "Hits: " hits "\nMisses: " misses }' t.txt)
3) the echo statement at the end will display the results all on one line, despite the fact that the awk statement prints two lines. this is because you did not quote the variable in your echo statement. try this instead:
echo "$result"
as a rule in bash scripting, you should ALWAYS quote the variables you are passing or printing, unless you know for sure why you don't want to.
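A small sketch of what quoting changes, assuming a two-line value like the one the awk command prints (the variable names unquoted and quoted are only for illustration):

```shell
# A two-line value, similar to what the awk command prints
result=$(printf 'Hits: 49\nMisses: 35\n')

# Unquoted: the shell splits on whitespace, so the newline becomes a space
unquoted=$(echo $result)

# Quoted: the value is passed as one argument, newline intact
quoted=$(echo "$result")

echo "$unquoted"
echo "$quoted"
```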
hope that helps you learn a bit more about what you were trying!
Here is a more compact solution:
#!/bin/bash
awk '
{tally[$1] += $2}
END {
for (outcome in tally) {
print outcome ":", tally[outcome]
}
}' t.txt > "$1"
You don't have to initialize variables in AWK. The first time you use one, AWK treats it as 0 in a numeric context or "" in a string context.
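A quick way to see this behaviour, as a sketch (the variable names n, s and sum are arbitrary and never assigned before use):

```shell
# n and s are never assigned: s prints as an empty string, n + 0 as 0
empty=$(awk 'BEGIN { print "[" s "]", n + 0 }')
echo "$empty"

# sum starts out uninitialized and still accumulates correctly
total=$(printf 'hit 18\nhit 31\n' | awk '{ sum += $2 } END { print sum }')
echo "$total"
```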
I'm struggling to combine some data from the txt files generated in my Jenkins job.
Each file contains one line. This is how each file looks:
testsuite name="mytest" cars="201" users="0" bus="0" bike="0" time="116.103016"
What I have managed to do so far is extract the numbers from each txt file:
awk '/<testsuite name=/{print $3, $4, $5, $6}' my-output*.txt
The results are:
cars="193" users="2" bus="0" bike="0"
cars="23" users="2" bus="10" bike="7"
cars="124" users="2" bus="5" bike="0"
cars="124" users="2" bus="0" bike="123"
now I have a random number of files like this:
my-output1.txt
my-output2.txt
my-output7.txt
my-output*.txt
I would like to create a single command, like the one above, that sums the values across all of the files and echoes a result like this:
cars=544 users=32 bus=12 bike=44
Is there a way to do that with a single command line?
Using awk
$ cat script.awk
BEGIN {
FS="[= ]"
} {
gsub(/"/,"")
for (i=1;i<NF;i++)
if ($i=="cars") cars+=$(i+1)
else if($i=="users") users+=$(i+1);
else if($i=="bus") bus+=$(i+1);
else if ($i=="bike")bike+=$(i+1)
} END {
print "cars="cars,"users="users,"bus="bus,"bike="bike
}
To run the script, you can use;
$ awk -f script.awk my-output*.txt
Or, as an ugly one-liner:
$ awk -F"[= ]" '{gsub(/"/,"");for (i=1;i<NF;i++) if ($i=="cars") cars+=$(i+1); else if($i=="users") users+=$(i+1); else if($i=="bus") bus+=$(i+1); else if ($i=="bike")bike+=$(i+1)}END{print"cars="cars,"users="users,"bus="bus,"bike="bike}' my-output*.txt
1st solution: With your shown samples, please try the following awk code, which uses the match function. Since awk can read multiple files in a single invocation and your files all end in .txt, you can pass *.txt to the awk program directly.
Written and tested in GNU awk, using the capturing-group capability of its match function to store values in an array for later use in the program.
awk -v s1="\"" '
match($0,/[[:space:]]+(cars)="([^"]*)" (users)="([^"]*)" (bus)="([^"]*)" (bike)="([^"]*)"/,tempArr){
temp=""
for(i=2;i<=8;i+=2){
temp=tempArr[i-1]
values[i]+=tempArr[i]
indexes[i-1]=temp
}
}
END{
for(i in values){
val=(val?val OFS:"") (indexes[i-1]"=" s1 values[i] s1)
}
print val
}
' *.txt
Explanation:
At the start of the GNU awk program, a variable named s1 is set to " for later use in the program.
The main block uses the match function.
The regex [[:space:]]+(cars)="([^"]*)" (users)="([^"]*)" (bus)="([^"]*)" (bike)="([^"]*)" (explained at the end of this post) creates 8 capturing groups to be used later on.
Once the condition matches, a for loop runs over the even numbers only (to pick out the required values).
The values array is indexed by i and accumulates the matching tempArr values, where tempArr is filled in by the match function.
Similarly, the indexes array stores only the key names.
Finally, the END block traverses the values array and prints the keys from indexes together with the totals from values.
Explanation of regex:
[[:space:]]+ ##Matching spaces 1 or more occurrences here.
(cars)="([^"]*)" ##Matching cars=" till next occurrence of " here.
(users)="([^"]*)" ##Matching spaces followed by users=" till next occurrence of " here.
(bus)="([^"]*)" ##Matching spaces followed by bus=" till next occurrence of " here.
(bike)="([^"]*)" ##Matching spaces followed by bike=" till next occurrence of " here.
2nd solution: In GNU awk only, using the RS and RT variables. This keeps the values in the output in the same order in which they appear in the input.
awk -v s1="\"" -v RS='[[:space:]][^=]*="[^"]*"' '
RT{
gsub(/^ +|"/,"",RT)
num=split(RT,arr,"=")
if(arr[1]!="time" && arr[1]!="name"){
if(!(arr[1] in values)){
indexes[++count]=arr[1]
}
values[arr[1]]+=arr[2]
}
}
END{
for(i=1;i<=count;i++){
val=(val?val OFS:"") (indexes[i]"=" s1 values[indexes[i]] s1)
}
print val
}
' *.txt
You may use this awk solution:
awk '{
for (i=1; i<=NF; ++i)
if (split($i, a, /=/) == 2) {
gsub(/"/, "", a[2])
sums[a[1]] +=a[2]
}
}
END {
for (i in sums) print i "=" sums[i]
}' file*
bus=15
cars=464
users=8
bike=130
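The order in which for (i in sums) visits keys is unspecified in awk, and the loop sums every key=value pair it sees. A hedged variant that prints only the four requested keys, in a fixed order and on one line, might look like the sketch below. The two sample files are hypothetical stand-ins for my-output*.txt, and the key list "cars users bus bike" is an assumption taken from the question.

```shell
# Hypothetical sample files standing in for my-output*.txt
dir=$(mktemp -d)
printf '%s\n' '<testsuite name="mytest" cars="193" users="2" bus="0" bike="0" time="116.1">' > "$dir/my-output1.txt"
printf '%s\n' '<testsuite name="mytest" cars="23" users="2" bus="10" bike="7" time="3.5">' > "$dir/my-output2.txt"

summary=$(awk '
{
  # Sum every key="value" pair found on the line
  for (i = 1; i <= NF; ++i)
    if (split($i, a, /=/) == 2) {
      gsub(/"/, "", a[2])
      sums[a[1]] += a[2]
    }
}
END {
  # Print only the wanted keys, in this fixed order
  n = split("cars users bus bike", keys, " ")
  for (k = 1; k <= n; ++k)
    printf "%s=%d%s", keys[k], sums[keys[k]], (k < n ? OFS : ORS)
}' "$dir"/my-output*.txt)

echo "$summary"
rm -rf "$dir"
```

For the two sample lines this prints cars=216 users=4 bus=10 bike=7.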
I found a way to do it, though it is a bit long:
awk '/<testsuite name=/{print $3, $4, $5, $6}' my-output*.xml | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | awk '{cars+=$1;users+=$2;bus+=$3;bike+=$4 }END{print "cars=" cars " users=" users " bus=" bus " bike=" bike}'
M. Nejat Aydin's answer was a good fit:
awk -F '[ "=]+' '/testsuite name=/{ cars+=$5; users+=$7; buses+=$9; bikes+=$11 } END{ print "cars="cars, "users="users, "buses="buses, "bikes="bikes }' my-output*.xml
I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that data is ,data, no spaces.
action,action_type, Result
up,1,stringA
down,1,strinB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected but I am basically traversing my original dataset twice. My original dataset is about 5GB and I have 30 action_type categories. I need to do this every day, so I need to script the thing to run on its own efficiently.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, not even errors. Which sort of tells me that my code is simply not matching anything or my print / redirect statement is wrong.
You may try this awk to do this in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' '{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
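To try the one-pass split on something small before pointing it at the 5GB file, a sketch like the following can help. It recreates the sample rows from the question in a scratch directory (the directory and the line-count variables are illustrative) and closes each output file after every write, so it does not depend on how many files awk can keep open.

```shell
# Scratch directory with the sample rows from the question
dir=$(mktemp -d)
printf 'action,action_type,Result\nup,1,stringA\ndown,1,strinB\nleft,2,stringC\n' > "$dir/dataset.csv"

( cd "$dir" && awk -F',' '
NR==1 { hdr = $0; next }                       # remember the header line
{ out = $2 "_dataset.csv" }                    # output name from column 2
!seen[$2]++ { print hdr > out; close(out) }    # first time: start file with header
{ print >> out; close(out) }                   # then append the data row
' dataset.csv )

# Count lines per output file (header plus data rows)
one_lines=$(wc -l < "$dir/1_dataset.csv" | tr -d ' ')
two_lines=$(wc -l < "$dir/2_dataset.csv" | tr -d ' ')
echo "1_dataset.csv: $one_lines lines, 2_dataset.csv: $two_lines lines"
rm -rf "$dir"
```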
As currently coded, the input field separator has not been defined, so awk splits on whitespace and $2 never equals "1" or "2".
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
There are a couple ways to address this:
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
# Output file names must be quoted strings in awk
if (action_type=="1") print $0 >> "1_dataset.csv";
else if (action_type=="2") print $0 >> "2_dataset.csv";
}
$ awk -f myfilter.awk dataset.csv
I have a file like this :
"A";"1"
"A";""
"A";""
"B";"1"
"C";"1"
"C";""
"C";""
When the first part of the current line matches the first part of the previous line, I want to increment the second part of my line,
like this :
"A";"1"
"A";"2"
"A";"3"
"B";"1"
"C";"1"
"C";"2"
"C";"3"
or if second part is empty I take the previous line and I increment it.
Do you have any idea how I can do this with a shell script or maybe with awk or sed command?
With perl:
$ perl -F';' -lane 'if ($F[1] =~ /"(\d+)"/) { $saved = $1; } else { $saved++; $F[1] = qq/"$saved"/; }
print join(";", @F)' example.txt
"A";"1"
"A";"2"
"A";"3"
"B";"1"
"C";"1"
"C";"2"
"C";"3"
With awk:
$ awk -F';' -v OFS=';' '
$2 ~ /"[0-9]+"/ { saved = substr($2, 2, length($2) - 2) }
$2 == "\"\"" { $2 = "\"" ++saved "\"" }
{ print }' example.txt
"A";"1"
"A";"2"
"A";"3"
"B";"1"
"C";"1"
"C";"2"
"C";"3"
I'm writing a small BASH script that reads a csv file with names on it and prompts the user for a name to be removed. The csv file looks like this:
Smith,John
Jackie,Jackson
The first and last name of the person to be removed from the list are saved in the bash variables $first_name and $last_name.
This is what I have so far:
cat file.csv | awk -F',' -v last="$last_name" -v first="$first_name" ' ($1 != last || $2 != first) { print } ' > tmpfile1
This works fine. However, it still outputs to tmpfile1 even if no employee matches that name. What I would like is to have something like:
if ($1 != last || $2 != first) { print } > tmpfile1 ; else { print "No Match Found." }
I'm new to awk and can't get that last part to work.
NOTE: I do not want to use something like grep -v "$last_name,$first_name"; I want to use a filtering function.
You can redirect right inside the awk script, and only output matches found.
awk -F',' -v last="$last_name" -v first="$first_name" '
$1==last && $2==first {next}
{print > "tmpfile"}
' file.csv
Here are some differences between your script and this....
This has awk reading your CSV directly, rather than through a useless use of cat (UUOC).
This actively skips the records you want to skip,
and prints everything else through a redirect.
Note that you could, if you wanted, specify the target to which to redirect in a variable you pass in using -v as well.
If you really want the "No match found" error, you can set a flag, then use the END special condition in awk...
awk -F',' -v last="$last_name" -v first="$first_name" '
$1==last && $2==first { found=1; next }
{ print > "tmpfile" }
END { if (!found) print "No match found." > "/dev/stderr" }
' file.csv
And if you want no tmpfile to be created if a match wasn't found, you would either need to scan the file TWICE, once to verify that there's a match, and once to print, or if there's no risk that the size of the file would be too great for available memory, you could keep a buffer:
awk -F',' -v last="$last_name" -v first="$first_name" '
$1==last && $2==first { found=1; next }
{ output = (output ? output ORS : "" ) $0 }
END {
if (!found)
print "No match found." > "/dev/stderr"
else if (output)
print output > "tmpfile"
}
' file.csv
Disclaimer: I haven't tested any of these. :)
You can do two passes over the file, or you can queue up all of the file so far in memory and then just fail if you reach the END block with no match.
awk -F',' -v first="$first_name" -v last="$last_name" '
# Match found: flush the lines queued so far and drop the matching line
$1 == last && $2 == first {
for (i=1; i<=n; ++i) print a[i] >"tempfile"; p=1; split("", a); n=0; next }
# No match yet, remember this line for later
!p { a[++n] = $0; next }
# If we get through to here, there was a match
p { print >"tempfile" }
END { if (!p) { print "no match" >"/dev/stderr"; exit 1 } }' filename
This requires you to have enough memory to store the entire file (this will be required when there is no match).
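The two-pass alternative mentioned above avoids holding the file in memory, at the cost of reading it twice. A minimal sketch, assuming the sample names from the question and a scratch directory (the out variable and the path it points at are illustrative):

```shell
# Sample data from the question in a scratch directory
dir=$(mktemp -d)
printf 'Smith,John\nJackie,Jackson\n' > "$dir/file.csv"
last_name=Smith
first_name=John

# The same file is given twice: NR==FNR is true only during the first pass
awk -F',' -v last="$last_name" -v first="$first_name" -v out="$dir/tmpfile1" '
  # Pass 1: only look for a match, print nothing
  NR == FNR { if ($1 == last && $2 == first) found = 1; next }
  # Pass 2: give up before writing anything if pass 1 found no match
  !found { exit }
  $1 != last || $2 != first { print > out }
  END { if (!found) { print "No match found." > "/dev/stderr"; exit 1 } }
' "$dir/file.csv" "$dir/file.csv"
status=$?

remaining=$(cat "$dir/tmpfile1")
echo "exit status: $status"
echo "$remaining"
rm -rf "$dir"
```

With no match, the output file is never created and awk exits with status 1, which a calling script can test.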
With a bash script, you can test whether awk printed something.
If it did (meaning there was no match), remove the tmpfile.
c=$(awk -F',' -v a="$last_name" -v b="$first_name" '
$1==a && $2==b {c=1;next}
{print > "tmpfile"}
END{if (!c){print "no match"}}' infile)
[ -n "$c" ] && { echo "$c"; rm tmpfile;}
Job = grep 'Job:' | awk '{ print $3 }'
Status = grep 'Job Status:' | awk '{ print $3 }'
Both variables are printed correctly when I use two echo statements. I want a result like Job name - status on a single line. I have tried the commands below, but they print only the 2nd variable, like - status
echo "$Job - $Status"
echo "${Job} - ${Status}"
echo -e "${Job} - ${Status}"
please help!
You can do it with a single awk command:
awk '/Job:/ { job = $3 } /Job Status:/ { status = $3 } END { print job " - " status }' file
If Job: comes before Job Status:
awk '/Job:/ { job = $3 } /Job Status:/ { print job " - " $3; exit }' file
Or vice versa:
awk '/Job Status:/ { status = $3 } /Job:/ { print $3 " - " status; exit }' file
I think that should work:
echo $(awk ' /Job:/ { print $3} ' file)" - "$(awk ' /Job Status:/ { print $3} ' file)
but konsolebox's version is probably better, as there is only one awk invocation.
I think you are trying to find out how to get the result of running some command and store it in a variable. Then you want to do that twice and print both variables on the same line.
So the basic syntax is:
result=$(some command)
e.g. if
date +'%Y'
tells you the year is 2014, but you want 2014 in a variable called year, you can do
year=$(date +'%Y')
then you can echo $year like this:
echo $year
2014
So, coming to your actual question, you want two variables, one for the output of each of two commands:
job=$(grep "Job:" someFile | awk '{print $3}')
status=$(grep "Job Status:" someFile | awk '{print $3}')
then you can do:
echo $job $status
and get both things on the same line.
The other answers say you can avoid invoking awk twice, which is true, but they don't explain how to capture the result of running a command into a variable. In general, you don't need to use both awk and grep, because this:
grep xyz | awk ...
is equivalent to
awk '/xyz/ {...}'
but uses one fewer process (i.e. no grep) and therefore fewer resources.
And by the way, you must not put spaces on either side of = in bash. It is
variable=something
not
variable = something