Using bash to extract numbers and convert to CSV file

I am quite new to using bash for extraction and I am not sure what search terms to look for. I would like to extract data for some variables from a very large log file.
Sample of logfile
temp[min,max]=[ 24.0000000000000 .. 834.230000000000 ]
CHANGE working on TEMS
RMS(TEMS)= 6.425061887244621E-002 DIFMAX: 0.896672707535103
765 1 171
CHANGE working on PHI
RMS(PHI )= 1.92403467949391 DIFMAX: 62.3113693145351
765 1 170
CHANGE working on TEMP
RMS(TEMP)= 6.425061887244621E-002 DIFMAX: 0.896672707535103
765 1 171
PMONI working
TIMSTP working
COPEQE working : INFO
DELT = 630720000.000000 sec
Courant-Number in x,y,z:
Max. : 5.05 , 0.00 , 6.93
Min. : 0.00 , 0.00 , 0.00
Avg. : 0.568E-02, 0.00 , 0.383
PROBLEM: Courant-Number(s) greater than 1 : 11.9802093558263
max. TEMP-Peclet in X: 653 1
170
max. TEMP-Peclet in Y: 653 1
170
Temperature-Peclet-Number in x,y,z:
Max. : 0.357 , 0.00 , 0.313E-01
Min. : 0.00 , 0.00 , 0.00
Avg. : 0.307E-03, 0.00 , 0.435E-03
Temperature-Neumann-Number in x,y,z:
Max.: 64.9 , 64.9 , 64.9
Min.: 0.619E-02, 0.619E-02, 0.619E-02
Avg.: 35.5 , 35.5 , 35.5
PROBLEM: Temp-Neumann-Number greater than 0.5 : 194.710793368065
(Dominating: Courant-Number)
DRUCK working
KOPPX working
#########################################################################
STRESS PERIOD: 1 1
1 of 100 <<<<<
Time Step: 50 ( 1.0% of 0.315E+13 sec )(0.631E+09 sec )
#########################################################################
### Continues on ###
I managed to extract the lines relating to the variables I am looking for using bash.
grep -A 3 'Courant-Number in x,y,z' logfile.log > courant.txt
grep -A 2 'Max.' courant.txt > courant_max.txt
to get this...
Max. : 0.146E+04, 0.00 , 0.169E+04
Min. : 0.00 , 0.00 , 0.00
Avg. : 1.15 , 0.00 , 0.986
--
Max. : 0.184E+04, 0.00 , 0.175E+04
Min. : 0.00 , 0.00 , 0.00
Avg. : 1.13 , 0.00 , 1.05
--
Max. : 0.163E+04, 0.00 , 0.172E+04
Min. : 0.00 , 0.00 , 0.00
Avg. : 1.13 , 0.00 , 1.17
I would like to convert this data to a CSV file with the following columns, thus making a total of 9 columns.
Max_x | Max_y | Max_z | Min_x | Min_y | Min_z | Avg_x | Avg_y | Avg_z
I would like to continue to use bash to get this data. Any inputs will be most appreciated.
Thanks!

You've got a good start. I had a much worse solution a bit earlier, but then I learned about paste -d.
grep -A 3 'Courant-Number in x,y,z' logfile.log |
grep -A 2 'Max.' |
grep -v -- '--' |
sed 's/^.*://' |
paste -d "," - - - |
sed 's/ *//g'
find 'Courant-Number in x,y,z' plus the 3 following lines
keep 'Max.' plus the 2 following lines
drop the '--' separator lines
strip the 'Max. :' / 'Min. :' / 'Avg. :' labels
join every three lines with commas
remove the remaining whitespace
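If you also want the nine-column header from the question and a file on disk, a minimal sketch built on the same pipeline (the courant.csv name is just an assumption):
{
  echo "Max_x,Max_y,Max_z,Min_x,Min_y,Min_z,Avg_x,Avg_y,Avg_z"
  grep -A 3 'Courant-Number in x,y,z' logfile.log |
  grep -A 2 'Max.' |
  grep -v -- '--' |
  sed 's/^.*://' |
  paste -d "," - - - |
  sed 's/ *//g'
} > courant.csv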

Split multiple lines after matching a pattern

Sorry for a newbie question, but I have a file that looks like this, and I wanted to capture the lines after a certain string, which is /aggr.
/aggr0_usts_nz_3001/plex0/rg0:
9g.10.0 0 4.08 0.00 .... . 4.08 1.00 41 0.00 .... .
1a.10.1 0 4.08 0.00 .... . 4.08 1.00 10 0.00 .... .
9g.10.4 0 4.08 0.00 .... . 4.08 1.00 49 0.00 .... .
1a.10.1 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
9g.10.4 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
/aggr1_usts_nz_3001/plex0/rg0:
1e.00.0 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
9o.01.44 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
1e.00.1 4 994.04 994.04 1.44 119 0.00 .... . 0.00 .... .
9o.01.41 4 981.91 981.91 1.41 141 0.00 .... . 0.00 .... .
1e.00.4 4 811.19 811.19 1.14 149 0.00 .... . 0.00 .... .
9o.01.14 4 809.99 809.99 1.14 119 0.00 .... . 0.00 .... .
1e.00.1 4 980.86 980.86 1.19 144 0.00 .... . 0.00 .... .
9o.01.11 4 998.89 998.89 1.11 140 0.00 .... . 0.00 .... .
/aggr1_usts_nz_3001/plex0/rg1:
9a.10.14 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
9e.40.14 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
1g.11.14 4 999.10 999.10 1.16 110 0.00 .... . 0.00 .... .
1o.41.14 4 996.90 996.90 1.44 118 0.00 .... . 0.00 .... .
9a.10.11 4 911.11 911.11 1.44 116 0.00 .... . 0.00 .... .
9e.40.11 4 919.48 919.48 1.11 141 0.00 .... . 0.00 .... .
1g.11.11 4 900.44 900.44 1.16 146 0.00 .... . 0.00 .... .
1o.41.11 1 694.19 694.19 1.19 109 0.00 .... . 0.00 .... .
9a.10.14 4 941.44 941.44 1.61 111 0.00 .... . 0.00 .... .
I wanted to take the lines after, say for example, /aggr0 and redirect them to a file. So, as a sample, file1 would have this information:
/aggr0_usts_nz_3001/plex0/rg0:
9g.10.0 0 4.08 0.00 .... . 4.08 1.00 41 0.00 .... .
1a.10.1 0 4.08 0.00 .... . 4.08 1.00 10 0.00 .... .
9g.10.4 0 4.08 0.00 .... . 4.08 1.00 49 0.00 .... .
1a.10.1 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
9g.10.4 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
and then file2 would be this information and so on.
/aggr1_usts_nz_3001/plex0/rg0:
1e.00.0 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
9o.01.44 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
1e.00.1 4 994.04 994.04 1.44 119 0.00 .... . 0.00 .... .
9o.01.41 4 981.91 981.91 1.41 141 0.00 .... . 0.00 .... .
1e.00.4 4 811.19 811.19 1.14 149 0.00 .... . 0.00 .... .
9o.01.14 4 809.99 809.99 1.14 119 0.00 .... . 0.00 .... .
1e.00.1 4 980.86 980.86 1.19 144 0.00 .... . 0.00 .... .
9o.01.11 4 998.89 998.89 1.11 140 0.00 .... . 0.00 .... .
So it's like segregating the information in the file.
I had the command below, but since the number of lines after each aggr is not the same, it only shows the fixed number of lines that was defined (7 here):
for i in `cat sample.txt`; do
echo $i | grep aggr* -A 7
done
But it's only showing what I grepped.
The command below prints 2 lines after matching the pattern; however, what I want is to redirect it to a file:
awk '/aggr/{x=NR+2}(NR<=x){print}' sample.txt
Any idea how I can accomplish this?
You may use this awk:
awk '/^\/aggr/ {close(fn); fn = "file" ++fNo ".txt"} {print > fn}' file
Each time a line starting with /aggr is seen, it closes the previous output file and switches to the next one (file1.txt, file2.txt, ...); every line, including the /aggr line itself, is then printed to the current file.
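If you would rather stay in plain bash, a minimal sketch of the same idea (assuming the same file1.txt, file2.txt, ... naming):
n=0
out=""
while IFS= read -r line; do
    if [[ $line == /aggr* ]]; then
        n=$((n+1))
        out="file$n.txt"
        : > "$out"              # start a fresh file for this section
    fi
    [[ -n $out ]] && printf '%s\n' "$line" >> "$out"
done < sample.txt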
This might work for you (GNU csplit):
csplit -n1 -ffile file '/^\/aggr/' '{*}'
This will produce 4 files from your example (file0, file1, file2, file3), where file0 is empty. If you don't mind the numbering starting from zero, use:
csplit -zn1 -ffile file '/^\/aggr/' '{*}'
This will elide the first empty file.
For a sed solution using bash:
sed -En '/^\/aggr/!b;x;s/^/1/;x;:a;H;$!{n;/^\/aggr/!ba};x
s/^(\S+)\n(.*)/echo "\2" > file\1;echo $((\1+1))/e;x;$!ba' file
In essence, this gathers up a section of the file in the hold space and writes it out when it encounters the next section or the end of the file.
The file number is seeded as the first line of the hold space when the first section is encountered; after a section is written out, the file number is incremented using standard bash arithmetic and replaces the contents of the hold space.

Time.time in Unity

I saw a video on how to move a cube like in a snake game.
In this video ( https://www.youtube.com/watch?v=aT2zNLSFQEk&list=PLLH3mUGkfFCVNs51eK8ftCAlI3hZQ95tC&index=11 ) he declares a float named **lastMove** with no value (zero by default), uses it in a condition where it is subtracted from Time.time, and then assigns Time.time to **lastMove**.
My question is: what is the effect of lastMove in the condition when it has no value?
If I remove it from the if statement the game runs fast, but if it stays in the if statement time passes much more slowly.
What he does is check continuously whether time - lastMove is bigger than a given predefined interval (timeBetweenMoves). Time keeps increasing each frame while lastMove stays fixed, so at some point this condition becomes true. When it does, he updates lastMove with the value of time to "reset the loop", i.e. to make the difference lower than the interval again. The point of doing this is to move only at a fixed interval (0.25 secs) instead of every frame. Like this:
interval = 0.25 (timeBetweenMoves)
time (secs) | lastMove | time - lastMove
------------+----------+----------------
0.00        | 0        | 0
0.05        | 0        | 0.05
0.10        | 0        | 0.10
0.15        | 0        | 0.15
0.20        | 0        | 0.20
0.25        | 0        | 0.25
0.30        | 0        | 0.30  ---> bigger than interval: MOVE and set lastMove to this (0.30)
0.35        | 0.30     | 0.05
0.40        | 0.30     | 0.10
0.45        | 0.30     | 0.15
0.50        | 0.30     | 0.20
0.55        | 0.30     | 0.25
0.60        | 0.30     | 0.30  ---> bigger than interval: MOVE and set lastMove to time (0.60)
0.65        | 0.60     | 0.05
0.70        | 0.60     | 0.10
...
This is a kind of throttling.

Why does if condition print zero value for non-zero condition?

I am unable to get the required output from my bash code.
I have a text file:
1 0.00 0.00
2 0.00 0.00
3 0.00 0.08
4 0.00 0.00
5 0.04 0.00
6 0.00 0.00
7 -3.00 0.00
8 0.00 0.00
The required output should be only non-zero values:
0.08
0.04
-3.0
This is my code:
z=0.00
while read line
do
    line_o="$line"
    openx_de=`echo $line_o|awk -F' ' '{print $2,$3}'`
    IFS=' ' read -ra od <<< "$openx_de"
    for i in "${od[@]}"; do
        if [ $i != $z ]
        then
            echo "openx_default_value is $i"
        fi
    done
done < /openx.txt
but it also gets the zero values.
To get only the nonzero values from columns 2 and 3, try:
$ awk '$2+0!=0{print $2} $3+0!=0{print $3}' openx.txt
0.08
0.04
-3.00
How it works:
$2+0 != 0 {print $2} tests to see if the second column is nonzero. If it is nonzero, then the print statement is executed.
We want to do a numeric comparison between the second column, $2, and zero. To tell awk to treat $2 as a number, we first add zero to it and then we do the comparison.
The same is done for column 3.
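For example, running the same test on a single sample line (just an illustration):
$ echo '3 0.00 0.08' | awk '$2+0!=0{print $2} $3+0!=0{print $3}'
0.08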
Using column names
Consider this input file:
$ cat openx2.txt
n first second
1 0.00 0.00
2 0.00 0.00
3 0.00 0.08
4 0.00 0.00
5 0.04 0.00
6 0.00 0.00
7 -3.00 0.00
8 0.00 0.00
To print the column name with each value found, try:
$ awk 'NR==1{two=$2; three=$3; next} $2+0!=0{print two,$2} $3+0!=0{print three,$3}' openx2.txt
second 0.08
first 0.04
first -3.00
awk '{$1=""}{gsub(/0.00/,"")}NF{$1=$1;print}' file
0.08
0.04
-3.00
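If you want to stay closer to the original bash loop, here is a minimal sketch (it assumes the input file is openx.txt in the current directory; printf '%g' normalizes strings such as 0.00 and -3.00, so the test is effectively numeric rather than a string comparison):
while read -r _ col2 col3; do
    for v in "$col2" "$col3"; do
        # print the field only if it does not normalize to zero
        [ "$(printf '%g' "$v")" != "0" ] && echo "$v"
    done
done < openx.txt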

iostat & steal time

I am trying to catch some data from iostat output:
# iostat -m
avg-cpu: %user %nice %system %iowait %steal %idle
9.92 0.00 14.17 0.01 0.00 75.90
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
sda 6.08 0.00 0.04 2533 261072
dm-0 1.12 0.00 0.00 1290 30622
dm-1 0.00 0.00 0.00 1 0
dm-2 1.22 0.00 0.00 0 33735
dm-3 7.22 0.00 0.03 1213 196713
How can I match the "0.00" value?
Numbers aren't separated by a tab or a constant number of spaces.
Also the value can be 3 digits 0.00 or 4 digits 45.00, etc.
Any idea how to match it using bash?
Try this, using awk:
iostat | awk 'NR==3 { print $5 }'
NR==3 operates on the third line, and $5 prints column 5. Verify that the proper line and column are being selected by playing around with the numbers; e.g. with your output, print $4 should yield 0.01.
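A hedged variant (just a sketch, not the only way): locate the %steal column by its header name, so the command does not depend on the column position. Because the values line has no leading "avg-cpu:" label, the data field index is one less than the header field index.
iostat -m | awk '/avg-cpu/ {for (i=1; i<=NF; i++) if ($i=="%steal") col=i-1; next} col && NF {print $col; exit}'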

awk search and calculate standard deviation different results

I am working on taking the output of sar and calculating the standard deviation of a column. I can do this successfully with a file that contains only the single column. However, when I calculate the same column from the full file, stripping out the 'bad' lines (the title lines and the average line) inside the awk command, it gives me a different value.
Here are the files I am performing this on:
/tmp/saru.tmp
# cat /tmp/saru.tmp
Linux 2.6.32-279.el6.x86_64 (progserver) 09/06/2012 _x86_64_ (4 CPU)
11:09:01 PM CPU %user %nice %system %iowait %steal %idle
11:10:01 PM all 0.01 0.00 0.05 0.01 0.00 99.93
11:11:01 PM all 0.01 0.00 0.06 0.00 0.00 99.92
11:12:01 PM all 0.01 0.00 0.05 0.01 0.00 99.93
11:13:01 PM all 0.01 0.00 0.05 0.00 0.00 99.93
11:14:01 PM all 0.01 0.00 0.04 0.00 0.00 99.95
11:15:01 PM all 0.01 0.00 0.06 0.00 0.00 99.92
11:16:01 PM all 0.01 0.00 2.64 0.01 0.01 97.33
11:17:01 PM all 0.02 0.00 21.96 0.00 0.08 77.94
11:18:01 PM all 0.02 0.00 21.99 0.00 0.08 77.91
11:19:01 PM all 0.02 0.00 22.10 0.00 0.09 77.78
11:20:01 PM all 0.02 0.00 22.06 0.00 0.09 77.83
11:21:01 PM all 0.02 0.00 22.10 0.03 0.11 77.75
11:22:01 PM all 0.01 0.00 21.94 0.00 0.09 77.95
11:23:01 PM all 0.02 0.00 22.15 0.00 0.10 77.73
11:24:01 PM all 0.02 0.00 22.02 0.00 0.09 77.87
11:25:01 PM all 0.02 0.00 22.03 0.00 0.13 77.82
11:26:01 PM all 0.02 0.00 21.96 0.01 0.14 77.86
11:27:01 PM all 0.02 0.00 22.00 0.00 0.09 77.89
11:28:01 PM all 0.02 0.00 21.91 0.00 0.09 77.98
11:29:01 PM all 0.03 0.00 22.02 0.02 0.08 77.85
11:30:01 PM all 0.14 0.00 22.23 0.01 0.13 77.48
11:31:01 PM all 0.02 0.00 22.26 0.00 0.16 77.56
11:32:01 PM all 0.03 0.00 22.04 0.01 0.10 77.83
Average: all 0.02 0.00 15.29 0.01 0.07 84.61
/tmp/sarustriped.tmp
# cat /tmp/sarustriped.tmp
0.05
0.06
0.05
0.05
0.04
0.06
2.64
21.96
21.99
22.10
22.06
22.10
21.94
22.15
22.02
22.03
21.96
22.00
21.91
22.02
22.23
22.26
22.04
The Calculation based on /tmp/saru.tmp:
# awk '$1~/^[01]/ && $6~/^[0-9]/ {sum+=$6; array[NR]=$6} END {for(x=1;x<=NR;x++){sumsq+=((array[x]-(sum/NR))**2);}print sqrt(sumsq/NR)}' /tmp/saru.tmp
10.7126
The Calculation based on /tmp/sarustriped.tmp ( the correct one )
# awk '{sum+=$1; array[NR]=$1} END {for(x=1;x<=NR;x++){sumsq+=((array[x]-(sum/NR))**2);}print sqrt(sumsq/NR)}' /tmp/sarustriped.tmp
9.96397
Could someone tell me why these results are different, and is there a way to get the correct result with a single awk command? I am trying to do this for performance, so not using a separate command like grep or another awk command is preferable.
Thanks!
UPDATE
so I tried this ...
awk '
$1~/^[01]/ && $6~/^[0-9]/ {
    numrec += 1
    sum += $6
    array[numrec] = $6
}
END {
    for(x=1; x<=numrec; x++)
        sumsq += ((array[x]-(sum/numrec))^2)
    print sqrt(sumsq/numrec)
}
' saru.tmp
and it works correctly for the sar -u output I was working with. I do not see why it would not work with other 'lists'. In short: when I try to work with sar -r column 5, it gives a wrong answer again. The output is 1.68891 but the actual deviation is .107374. This is the same command that worked with sar -u. If you need the files I can provide them. I was not sure how to add a new 'full' comment, so I just edited the old one. Thanks!
I think the bug is that your first awk command (the one that operates on saru.tmp) skips the invalid lines when accumulating, but still does its math with NR, which counts every line, so the result depends on the number of skipped lines. When you remove all of the invalid lines beforehand, the result is the same from both programs. So in the first command, you should use the number of valid lines rather than NR in your math.
How about this?
awk '
$1 ~ /^[01]/ && $6 ~ /^[0-9]/ {
    numrec += 1
    sum += $6
    array[numrec] = $6
}
END {
    for(x=1; x<=numrec; x++)
        sumsq += (array[x]-(sum/numrec))^2
    print sqrt(sumsq/numrec)
}
' saru.tmp
For debugging problems like this, the simplest technique is to print out some basic data. You might print the number of items, and the sum of the values, and the sum of the squares of the values (or sum of the squares of the deviations from the mean). This will likely tell you what's different between the two runs. Sometimes, it might help to print out the values you're accumulating as you're accumulating the data. If I had to guess, I'd suspect you are counting inappropriate lines (blanks, or the decoration lines), so the counts are different (and maybe the sums too).
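A minimal debugging sketch along those lines (hedged; it reuses the valid-line filter from the answer above and runs against saru.tmp). Printing the count, the sum, and the sum of squares makes it obvious when the two runs are accumulating different sets of lines:
awk '
    $1~/^[01]/ && $6~/^[0-9]/ { n++; sum += $6; sumsq += $6*$6 }
    END { printf "count=%d sum=%.4f sumsq=%.4f\n", n, sum, sumsq }
' saru.tmp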
I have a couple of (non-standard) programs to do the calculations. Given the 23 relevant lines from the multi-column output in a file data, I ran:
$ colnum -c 6 data | pstats
# Count = 23
# Sum(x1) = 3.557200e+02
# Sum(x2) = 7.785051e+03
# Mean = 1.546609e+01
# Std Dev = 1.018790e+01
# Variance = 1.037934e+02
# Min = 4.000000e-02
# Max = 2.226000e+01
$
The standard deviation here is the sample standard deviation rather than the population standard deviation; the difference is dividing by (N-1) for the sample and N for the population.
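To see that difference on the sar -u data above, a one-pass sketch (hedged, using the same valid-line filter as the answers) that prints both forms:
awk '
    $1~/^[01]/ && $6~/^[0-9]/ { n++; sum += $6; sumsq += $6*$6 }
    END {
        mean = sum/n
        printf "population sd: %g\n", sqrt(sumsq/n - mean*mean)          # divide by N
        printf "sample sd:     %g\n", sqrt((sumsq - n*mean*mean)/(n-1))  # divide by N-1
    }
' saru.tmp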
