Adding constant values using awk - shell
I have a requirement to add a constant value to the 4th column if its value is less than 240000. The constant value is 010000. I have written a command, but it gives no output. Sample data and script are below. Please help me with this. Thanks in advance.
Command:
awk '{
if($4 -lt 240000)
$4= $4+010000;
}' Test.txt
Sample data:
1039,1018,20180915,000000,0,0,A
1039,1018,20180915,010000,0,0,A
1039,1018,20180915,020000,0,0,A
1039,1018,20180915,030000,0,0,A
1039,1018,20180915,240000,0,0,A
1039,1018,20180915,050000,0,0,A
1039,1018,20180915,060000,0,0,A
1039,1018,20180915,070000,1,0,A
1039,1018,20180915,080000,0,1,A
1039,1018,20180915,090000,2,0,A
1039,1018,20180915,241000,0,0,A
1039,1018,20180915,240500,0,0,A
$ awk '
BEGIN { FS=OFS="," }                 # input and output field separators
{
    if($4<240000)                    # if comparison
        $4=sprintf("%06d",$4+10000)  # I assume 10000, not 010000; also zero-padded to 6 chars
        # $4+=10000                  # use this instead if zero-padding is not required
    print                            # output
}' file
Output:
1039,1018,20180915,010000,0,0,A
1039,1018,20180915,020000,0,0,A
1039,1018,20180915,030000,0,0,A
1039,1018,20180915,040000,0,0,A
1039,1018,20180915,240000,0,0,A
1039,1018,20180915,060000,0,0,A
1039,1018,20180915,070000,0,0,A
1039,1018,20180915,080000,1,0,A
1039,1018,20180915,090000,0,1,A
1039,1018,20180915,100000,2,0,A
1039,1018,20180915,241000,0,0,A
1039,1018,20180915,240500,0,0,A
I used $4+10000, not $4+010000: a leading zero makes awk treat the constant as octal, and awk 'BEGIN{ print 010000+0 }' prints 4096, the decimal value of octal 010000. The original script had three further problems: the data is comma-separated, so FS must be set to "," (otherwise $4 is not the 4th column); awk has no -lt operator (that is shell test syntax; the comparison is $4 < 240000); and there is no print statement, which is why the script produced no output at all.
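For reference, here is a minimal corrected version of the original script, assuming the same Test.txt input, with all of those fixes applied:

awk 'BEGIN { FS=OFS="," }
{
    if ($4 < 240000)                      # awk numeric comparison, not the shell test -lt
        $4 = sprintf("%06d", $4 + 10000)  # decimal 10000, zero-padded back to 6 digits
    print                                 # without an explicit print there is no output
}' Test.txt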
Related
BASH: Performing decimal division on a column in file and printing result in another file
I have a file (in.txt) with the following columns:

# DM Sigma Time (s) Sample Downfact
78.20 7.36 134.200512 2096883 70
78.20 7.21 144.099904 2251561 70
78.20 9.99 148.872384 2326131 150
78.20 10.77 283.249664 4425776 45

I want to write a bash script to divide all values in column 'Time' by 0.5867, get a precision up to 2 decimal points, and print the resulting values in another file out.txt. I tried using bc/awk but it gives this error:

awk: cmd. line:1: fatal: division by zero attempted
awk: fatal: cannot open file `file' for reading (No such file or directory)

Could someone help me with this? Thanks. This is the bash script that I attempted:

cat in.txt | while read DM Sigma Time Sample Downfact; do
    echo "$DM $Sigma $Time $Sample $Downfact"
    pperiod = 0.5867
    awk -v n=$Time 'BEGIN {printf "%.2f\n", (n/$pperiod)}'
    #echo "scale=2 ; $Time / $pperiod" | bc
    #echo "$subint" > out.txt
done

I expected the script to divide column 'Time' by pperiod and get the result with a precision of 2 decimal places. This result should be printed to a file named out.txt.
Lots of issues with the current awk code:

- need to pass in the value of the $pperiod variable
- need to reference the Time column by its position ($3 in this case)
- the BEGIN{} block is applied before any input lines are processed and has nothing to do with processing of actual input lines
- there is no code to perform processing on actual input lines
- need to decide what to do in a divide-by-zero scenario (in this case we'll default the answer to 0.00)

NOTE: the current code generates the divide-by-zero error because $pperiod is an undefined (awk) variable, which in turn defaults to 0; additionally, pperiod = 0.5867 is invalid bash syntax (assignments cannot have spaces around the =).

One idea for fixing the current issues:

pperiod=0.5867
awk -v pp="${pperiod}" 'NR>1 {printf "%.2f\n", (pp==0 ? 0 : ($3/pp))}' in.txt > out.txt

Where:

- -v pp="${pperiod}" - assign the awk variable pp the value of the bash variable "${pperiod}"
- NR>1 - skip the header line
- NR>1 {printf "%.2f\n" ...} - for each input line other than the header, print the result of dividing the Time column (aka $3) by the awk variable pp (which holds the value of the bash variable "${pperiod}")
- (pp==0 ? 0 : ($3/pp)) - if pp equals 0 we print 0, else we print the result of $3/pp (this keeps us from generating a divide-by-zero error)

NOTE: this also eliminates the need for the cat | while loop.

This generates:

$ cat out.txt
228.74
245.61
253.75
482.78
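If you would rather stay close to the original bc attempt, here is a minimal sketch under the same assumptions (header on the first line, Time in the 3rd whitespace-separated column). Note that bc truncates at scale=2 rather than rounding, so the last digit can differ from the printf "%.2f" results above:

pperiod=0.5867                    # note: no spaces around = in a bash assignment
tail -n +2 in.txt | while read -r DM Sigma Time Sample Downfact; do
    echo "scale=2; $Time / $pperiod" | bc
done > out.txt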
Parsing multiline program output
I've recently been working on some lab assignments, and in order to collect and analyze the results I prepared a bash script to automate my job. It was my first attempt to create such a script, thus it is not perfect, and my question is strictly connected with improving it. Exemplary output of the program is shown below, but I would like to make the script more general for other purposes.

>>> VARIANT 1 <<<
Random number generator seed is 0xea3495cc76b34acc
Generate matrix 128 x 128 (16 KiB)
Performing 1024 random walks of 4096 steps.
> Total instructions: 170620482
> Instructions per cycle: 3.386
Time elapsed: 0.042127 seconds
Walks accrued elements worth: 534351478

All the data I want to collect is always on different lines. My first attempt was running the same program twice (or more times, depending on the amount of data) and then using grep in each run to extract the data I need by looking for a keyword. It is very inefficient, as there must be some way to parse the whole output of one run, but I could not come up with any idea. At the moment the script is:

#!/bin/bash

write() {
    o1=$(./progname args | grep "Time" | grep -o -E '[0-9]+.[0-9]+')
    o2=$(./progname args | grep "cycle" | grep -o -E '[0-9]+.[0-9]+')
    o3=$(./progname args | grep "Total" | grep -o -E '[0-9]+.[0-9]+')
    echo "$1 $o1 $o2 $o3"
}

for ((i = 1; i <= 10; i++)); do
    write $i >> times.dat
done

It is worth mentioning that echoing the results on one line is crucial, as I am using gnuplot later, and having the data in columns is perfect for that use. Sample output should be:

1 0.019306 3.369 170620476
2 0.019559 3.375 170620475
3 0.021971 3.334 170620478
4 0.020536 3.378 170620480
5 0.019692 3.390 170620475
6 0.020833 3.375 170620477
7 0.019951 3.450 170620477
8 0.019417 3.381 170620476
9 0.020105 3.374 170620476
10 0.020255 3.402 170620475

My question is: how could I improve the script to collect such data in just one program execution?
You could use awk here to get the values into an array and later access them by index 0, 1 and 2, in case you want to do this in a single command:

myarr=($(your_program args | awk '/Total/{print $NF;next} /cycle/{print $NF;next} /Time/{print $(NF-1)}'))

OR use the following to forcefully print all elements on a single line, so the values do not arrive on separate lines if the command substitution is quoted in a way that preserves newlines:

myarr=($(your_program args | awk '/Total/{val=$NF;next} /cycle/{val=(val?val OFS:"")$NF;next} /Time/{print val OFS $(NF-1)}'))

Explanation: adding a detailed explanation of the first awk program above.

awk '               ##Starting awk program from here.
/Total/{            ##Checking if a line has the Total keyword in it, then do the following.
  print $NF         ##Printing the last field of the line that has Total in it.
  next              ##next will skip all further statements from here.
}
/cycle/{            ##Checking if a line has cycle in it, then do the following.
  print $NF         ##Printing the last field of the line that has cycle in it.
  next              ##next will skip all further statements from here.
}
/Time/{             ##Checking if a line has Time in it, then do the following.
  print $(NF-1)     ##Printing the 2nd last field of the line that has Time in it.
}'

To access individual items you could use echo ${myarr[0]}, echo ${myarr[1]} and echo ${myarr[2]} for Total, cycle and Time respectively. An example of accessing all elements in a loop, in case you need it:

for i in "${myarr[@]}"; do
    echo "$i"
done
You can execute your program once and save the output in a variable:

o0=$(./progname args)

Then you can grep that saved string as many times as you like:

o1=$(echo "$o0" | grep "Time" | grep -o -E '[0-9]+.[0-9]+')
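Applied to the write() function from the question, that idea might look like this (a sketch that keeps the original grep patterns unchanged):

write() {
    local out
    out=$(./progname args)    # run the program only once
    o1=$(echo "$out" | grep "Time"  | grep -o -E '[0-9]+.[0-9]+')
    o2=$(echo "$out" | grep "cycle" | grep -o -E '[0-9]+.[0-9]+')
    o3=$(echo "$out" | grep "Total" | grep -o -E '[0-9]+.[0-9]+')
    echo "$1 $o1 $o2 $o3"
}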
Assumptions:

- each of the 3 search patterns (Time, cycle, Total) occurs just once in a set of output from ./progname
- the format of the ./progname output is always the same (i.e., the same number of space-separated items on each line of output)

I've created my own progname script that just echoes the sample output:

$ cat progname
echo ">>> VARIANT 1 <<<
Random number generator seed is 0xea3495cc76b34acc
Generate matrix 128 x 128 (16 KiB)
Performing 1024 random walks of 4096 steps.
> Total instructions: 170620482
> Instructions per cycle: 3.386
Time elapsed: 0.042127 seconds
Walks accrued elements worth: 534351478"

One awk solution to parse and print the desired values:

$ i=1
$ ./progname | awk -v i=${i} '   # assign awk variable "i" = ${i}
/Time/  { o1 = $3 }              # o1 = field 3 of the line containing "Time"
/cycle/ { o2 = $5 }              # o2 = field 5 of the line containing "cycle"
/Total/ { o3 = $4 }              # o3 = field 4 of the line containing "Total"
END     { printf "%s %s %s %s\n", i, o1, o2, o3 }   # print the 4 variables to stdout
'
1 0.042127 3.386 170620482
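Plugged into the loop from the question, the same awk program might be used like this (a sketch; ./progname args and times.dat are the names from the original script):

for ((i = 1; i <= 10; i++)); do
    ./progname args | awk -v i=$i '
        /Time/  { o1 = $3 }
        /cycle/ { o2 = $5 }
        /Total/ { o3 = $4 }
        END     { print i, o1, o2, o3 }'
done > times.dat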
Simplify an awk "nth column sum"
Could you help me simplify:

awk 'BEGIN{FS=OFS=","}{rank=1/((1/$6)+(1/$10)+(1/$14)+(1/$18)+(1/$22));print $0,rank}' test.csv

I know the for loop should be:

for(i=6; i<=NF; i+=4)

But I don't know how to make a repeating pattern in awk. Also, I am not sure how awk handles dividing by zero.

Sample data:

04/12/10 01:15,1291425300,279,41,6,24,71,39,12,1,356,25,4,29,32,10,1,1,170,27,16,8
21/05/14 16:45,1400690700,147,28,80,13,99,7,121,11,107,19,132,12,119,24,40,10,154,25,161,20
09/10/07 09:45,1191923100,152,56,201,35,115,47,157,29,149,47,119,19,131,40,30,11,216,136,213,64
08/06/07 00:30,1181262600,133,47,268,41,93,26,282,40,151,30,249,39,160,46,191,45,164,64,216,42
13/11/09 06:15,1258092900,1043,1462,1163,1456,789,1111,930,1143,954,1460,1366,1469,831,891,728,954,1092,1316,1381,1492
10/03/98 19:30,889558200,789,1240,1176,1262,,,,,,,,,,,,,162,271,1006,283

Sample output:

04/12/10 01:15,1291425300,279,41,6,24,71,39,12,1,356,25,4,29,32,10,1,1,170,27,16,8,0.454308093994778
21/05/14 16:45,1400690700,147,28,80,13,99,7,121,11,107,19,132,12,119,24,40,10,154,25,161,20,2.49273678094131
09/10/07 09:45,1191923100,152,56,201,35,115,47,157,29,149,47,119,19,131,40,30,11,216,136,213,64,4.50004789527607
08/06/07 00:30,1181262600,133,47,268,41,93,26,282,40,151,30,249,39,160,46,191,45,164,64,216,42,8.2601610016789
13/11/09 06:15,1258092900,1043,1462,1163,1456,789,1111,930,1143,954,1460,1366,1469,831,891,728,954,1092,1316,1381,1492,252.467979545275
10/03/98 19:30,889558200,789,1240,1176,1262,,,,,,,,,,,,,162,271,1006,283,#DIV/0!
Like this, accumulating the reciprocals in the loop and taking the reciprocal of the sum at the end (with a #DIV/0! marker when a divisor is zero or empty, to match the sample output):

BEGIN{FS=OFS=","}{rank=0;bad=0;for(i=6;i<=22;i+=4){if($i+0==0)bad=1;else rank+=1/$i}print $0,(bad?"#DIV/0!":1/rank)}
$ awk '
BEGIN { FS=OFS="," }
{
    for(i=6;i<=NF;i+=4)        # every 4th column, starting from $6
        if($i+0==0) {          # if there is a 0 (or empty) divisor
            rank="#DIV/0!"     # set rank to something static
            break              # break from the for loop
        } else
            rank+=1/$i         # sum the reciprocals
    if(rank!="#DIV/0!")
        rank=1/rank            # the final rank is the reciprocal of the sum
    print $0,rank              # output
    rank=0                     # reset
}' file

Output (matching the sample output at awk's default numeric output precision):

04/12/10 01:15,1291425300,279,41,6,24,71,39,12,1,356,25,4,29,32,10,1,1,170,27,16,8,0.454308
21/05/14 16:45,1400690700,147,28,80,13,99,7,121,11,107,19,132,12,119,24,40,10,154,25,161,20,2.49274
09/10/07 09:45,1191923100,152,56,201,35,115,47,157,29,149,47,119,19,131,40,30,11,216,136,213,64,4.50005
08/06/07 00:30,1181262600,133,47,268,41,93,26,282,40,151,30,249,39,160,46,191,45,164,64,216,42,8.26016
13/11/09 06:15,1258092900,1043,1462,1163,1456,789,1111,930,1143,954,1460,1366,1469,831,891,728,954,1092,1316,1381,1492,252.468
10/03/98 19:30,889558200,789,1240,1176,1262,,,,,,,,,,,,,162,271,1006,283,#DIV/0!
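If you need the long decimals shown in the question's sample output rather than awk's default 6 significant digits, one small tweak (an optional variation, not something the question strictly requires) is to widen OFMT in the BEGIN block:

BEGIN { FS=OFS=","; OFMT="%.15g" }   # print floats with up to 15 significant digits

With that, the first row's rank prints as 0.454308093994778 instead of 0.454308.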
Carving data from log file
I have a log file containing data like the line below:

time=1460196536.247325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=13ms requests=517 option1=0 option2=0 errors=0 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278

I am trying to write a bash script that carves these values out of each line in the log file and writes them to a second file:

- time (converted to local time, GMT+2)
- latency99
- requests
- errors

Desired output in the second file:

time latency99 requests errors
12:08:56 13 517 0

Is using regex the easiest way to do this?
Here's a Bash solution for version 4 and above, using an associative array:

#!/bin/bash

# Assoc array to hold data.
declare -A data

# Log file (the input file).
logfile=$1

# Output file.
output_file=$2

# Print column names for the required values.
printf '%-20s %-10s %-10s %-10s\n' time latency99 requests errors > "$output_file"

# Iterate over each line in $logfile
while read -ra arr; do

    # Insert keys and values into the 'data' array.
    for i in "${arr[@]}"; do
        data["${i%=*}"]="${i#*=}"
    done

    # Convert time to GMT+2
    gmt2_time=$(TZ=GMT+2 date -d "@${data[time]}" '+%T')

    # Print results to the output file.
    printf '%-20s %-10s %-10s %-10s\n' "$gmt2_time" "${data[latency99]%ms}" "${data[requests]}" "${data[errors]}" >> "$output_file"

done < "$logfile"

As you can see, the script accepts two arguments. The first one is the file name of the logfile, and the second is the output file into which parsed data is written, line by line, for each row in the logfile.

Please notice that I used GMT+2 as the value of the TZ variable. Use the exact area as the value instead, for example TZ="Europe/Berlin". You might want to use the tool tzselect to find the correct string value for your area.

In order to test it, I created the following logfile containing 3 different rows of input:

time=1260196536.242325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=10ms requests=100 option1=0 option2=0 errors=1 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278
time=1460246536.244325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=20ms requests=200 option1=0 option2=0 errors=2 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278
time=1260236536.147325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=30ms requests=300 option1=0 option2=0 errors=3 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278

Let's run the test (the script name is sof):

$ ./sof logfile parsed_logfile
$ cat parsed_logfile
time                 latency99  requests   errors
12:35:36             10         100        1
22:02:16             20         200        2
23:42:16             30         300        3

EDIT: Per the OP's request in the comments, and as discussed further in chat, I edited the script to include the following features:

- Remove the ms suffix from latency99's value.
- Read input from a logfile, line by line; parse and output the results to a selected file.
- Include column names only in the first row of output.
- Convert the time value to GMT+2.
Here is an awk script for you. Say the logfile is mc.log and the script is saved as mc.awk; you would run it like this with GNU awk:

awk -f mc.awk mc.log

mc.awk:

BEGIN {
    OFS="\t"
    # some "" to align header and values in output
    print "time", "", "latency99", "requests", "errors"
}

function getVal(str) {
    # strip leading "key=" and trailing "ms" from str
    gsub(/^.*=/, "", str)
    gsub(/ms$/, "", str)
    return str
}

function fmtTime(timeStamp) {
    val = getVal(timeStamp)
    return strftime("%H:%M:%S", val)
}

{
    # some "" to align header and values in output
    print fmtTime($1), getVal($4), "", getVal($5), "", getVal($8)
}
Here's an awk version (not GNU). Converting the date would require a call to an external program:

#!/usr/bin/awk -f

BEGIN {
    FS="([[:alpha:]]+)?[[:blank:]]*[[:alnum:]]+="
    OFS="\t"
    print "time", "latency99", "requests", "errors"
}

{ print $2, $5, $6, $9 }
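If you do want the converted time from this non-GNU variant, one possible sketch is to call the external date command from the main action block (this assumes GNU date's -d "@epoch" syntax is available and reuses the TZ value from the bash answer above; with the FS set in the BEGIN block, $2 holds the raw epoch value from the time= field):

{
    cmd = "TZ=GMT+2 date -d @" $2 " +%T"   # build the external command
    cmd | getline t                        # read its single line of output
    close(cmd)                             # close the pipe before the next record
    print t, $5, $6, $9
}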
Bash script to read a file and add the contents
I have the following contents in a file, and I want to filter out Executor Deserialize Time and add all of its values to get the final result. How can I do that?

{"Event":"SparkListenerTaskEnd","Stage ID":0,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"Success"},"Task Info":{"Task ID":29,"Index":29,"Attempt":0,"Launch Time":1453927221831,"Executor ID":"1","Host":"172.17.0.226","Locality":"ANY","Speculative":false,"Getting Result Time":0,"Finish Time":1453927230401,"Failed":false,"Accumulables":[]},"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":9,"Executor Run Time":8550,"Result Size":2258,"JVM GC Time":18,"Result Serialization Time":0,"Memory Bytes Spilled":0,"Disk Bytes Spilled":0,"Shuffle Write Metrics":{"Shuffle Bytes Written":0,"Shuffle Write Time":4425,"Shuffle Records Written":0},"Input Metrics":{"Data Read Method":"Hadoop","Bytes Read":134283264,"Records Read":100890}}}
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"Success"},"Task Info":{"Task ID":30,"Index":30,"Attempt":0,"Launch Time":1453927222232,"Executor ID":"1","Host":"172.17.0.226","Locality":"ANY","Speculative":false,"Getting Result Time":0,"Finish Time":1453927230493,"Failed":false,"Accumulables":[]},"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":7,"Executor Run Time":8244,"Result Size":2258,"JVM GC Time":16,"Result Serialization Time":0,"Memory Bytes Spilled":0,"Disk Bytes Spilled":0,"Shuffle Write Metrics":{"Shuffle Bytes Written":0,"Shuffle Write Time":4190,"Shuffle Records Written":0},"Input Metrics":{"Data Read Method":"Hadoop","Bytes Read":134283264,"Records Read":100886}}}
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"Success"},"Task Info":{"Task ID":31,"Index":31,"Attempt":0,"Launch Time":1453927222796,"Executor ID":"1","Host":"172.17.0.226","Locality":"ANY","Speculative":false,"Getting Result Time":0,"Finish Time":1453927230638,"Failed":false,"Accumulables":[]},"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":5,"Executor Run Time":7826,"Result Size":2258,"JVM GC Time":18,"Result Serialization Time":0,"Memory Bytes Spilled":0,"Disk Bytes Spilled":0,"Shuffle Write Metrics":{"Shuffle Bytes Written":0,"Shuffle Write Time":3958,"Shuffle Records Written":0},"Input Metrics":{"Data Read Method":"Hadoop","Bytes Read":134283264,"Records Read":101004}}}
grep -P -o "Executor Deserialize Time.:[0-9]+" file.txt | cut -d: -f2 | awk '{ sum+=$1} END {print sum}' grep that bit the part of each line with the field you want. Split it to just grab the number. Use awk to sum up all the values
awk -v RS=, '/^"Executor Deserialize Time":/ {split($0,a,":"); tot+=a[2]} END{print tot}' file Set RS (the record separator) to ,. Match records that match the desired field name. Split the current record on :. Add the second split field to our total. Print the total at the END. Or same idea but set FS (the field separator) instead awk -F , '{for (i=1;i<=NF;i++) {if ($i ~ /^"Executor Deserialize Time":/) {split($i,a,":"); tot+=a[2]}}} END{print tot}' file Set FS to ,. Loop over every field from 1 to NF. Match the desired fields. Split the current record on :. Add the second split field to our total. Print the total at the END. If you only want this for a given value of Stage ID then you could use this: awk -v stage=0 -F , '{ ds=0; val=0 for (i=1;i<=NF;i++) { split($i,a,":") if (a[1] == "\"Executor Deserialize Time\"") { val=a[2] } if ((a[1] == "\"Stage ID\"") && (a[2] == stage)) { ds++ } if (ds && val) { tot+=val next } } } END{print tot}' file Which tracks whether we've seen the both necessary values for each line and only sums when we have. It uses the stage variable to do this so you can control this from outside the awk script (the -v stage=0 argument).