Bash: search output for similar lines and perform a calculation between the two

I am working on a script that runs pm2 list and assigns the output to a variable, waits X seconds, then runs it again and assigns the output to a different variable. I then run both through comm <(echo "$pm2_1") <(echo "$pm2_2") -3, which gives me only the lines that differ between the two, in a nice format:
name ID restart count
prog-name 0 1
prog-name 0 2
prog-name-live 10 1
prog-name-live 10 8
prog-name-live 3 1
prog-name-live 3 4
prog-name-live 6 1
prog-name-live 6 6
What I need is a way to compare the restart counts on the two lines that share the same ID, e.g.:
name ID restart count
prog-name 0 1
prog-name 0 2
prog-name-worker 10 1
prog-name-worker 10 8
Any ideas would be very helpful!
Thanks

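awk supports associative arrays (hashes); hope that helps:
awk '{k=$1" "$2; a[k]=$3; print k, a[k]}'
Here is an example that uses one to find the difference; you can substitute any logic. It tests (k in a) rather than a[k]==0, so a restart count of 0 on the first line is still handled:
awk '{k=$1" "$2; if (!(k in a)) a[k]=$3; else {d=a[k]-$3; print k, (d>0 ? d : -d)}}'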
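As a rough end-to-end sketch of the same idea, the function below pairs lines by name and ID and prints how much the restart count changed. The name restart_deltas is made up for illustration, and it assumes the input has the three-column "name ID restarts" layout shown in the question, with the first occurrence of each pair being the older reading. awk's default field splitting ignores the leading tab that comm adds to lines from the second input.

```shell
# Hypothetical helper: reads "name ID restarts" lines on stdin and prints,
# for every name/ID pair seen a second time, the change in restart count.
restart_deltas() {
  awk '
    NR == 1 && $1 == "name" { next }   # skip the header line if present
    {
      key = $1 " " $2
      if (key in first) print key, $3 - first[key]
      else first[key] = $3
    }'
}

# Demo with a few lines from the question.
printf '%s\n' 'name ID restart count' \
  'prog-name 0 1' 'prog-name 0 2' \
  'prog-name-live 10 1' 'prog-name-live 10 8' | restart_deltas
# prints:
# prog-name 0 1
# prog-name-live 10 7
```

In the original script this would presumably sit at the end of the comm pipeline, but that wiring is an assumption on my part.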

Related

Loop through a file and paste columns next to one another

Given I have a python script as follows:
#!/usr/bin/python
for i in range(1,4):
    print i
I want to run it 3 times in a bash loop, but I want each run's output added as a new column rather than appended below the previous one. Is there a way to achieve this?
Output:
1 1 1
2 2 2
3 3 3
Like this?:
$ for i in {1..3} ; do echo $i $i $i ; done
1 1 1
2 2 2
3 3 3
You are looking for the pr command:
for i in 1 2 3 ; do
    python a.py
done | pr -t -3
Output:
1 1 1
2 2 2
3 3 3
Btw, to get the numbers from 1 to 3 in Python you need to use:
range(1,4) # <-- 4, not 3!
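If pr's column widths get in the way, one alternative sketch is to let awk place the runs side by side. This is a different technique from the pr answer above, not a description of what pr does internally. The function name columnize and its argument are made up for illustration, and it assumes every run prints the same number of lines.

```shell
# Hypothetical helper: columnize ROWS
# Reads the concatenated output of several runs on stdin, where each run
# printed ROWS lines, and prints the runs as space-separated columns.
columnize() {
  awk -v rows="$1" '
    {
      r = (NR - 1) % rows                       # row within the current run
      line[r] = (NR <= rows) ? $0 : line[r] " " $0
    }
    END { for (r = 0; r < rows; r++) print line[r] }'
}

# Demo: printf stands in for "python a.py" here.
for i in 1 2 3; do
  printf '%s\n' 1 2 3
done | columnize 3
# prints:
# 1 1 1
# 2 2 2
# 3 3 3
```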

Add up every 5 rows in a column of integers BASH

I am writing a parser, and have to do some fancy stuff. I am trying not to use python, but I might have to at this point.
Given an STDOUT that looks like this:
1
0
2
3
0
0
1
0
0
2
0
3
0
4
0
5
0
2
.
.
.
For about 100,000 lines. What I need to do is add up every 5, like so:
1 - start
0 |
2 | - 6
3 |
0 - end
0 - start
1 |
0 | - 3
0 |
2 - end
0 - start
3 |
0 | - 7
4 |
0 - end
5
0
2
.
.
.
The -, |, start, end, are all for visual representation, I just need it in a column list:
6
3
7
.
.
.
I currently do this by using an incrementing head -n $i and tail -n 5 to cut 5 rows out of the list, then paste -sd+ - | bc to add up all the values. But this is way too slow because there are 100,000 rows.
If anyone has anything to add I would appreciate it. Let me know if more info is needed.
Thank you
It looks like awk is a natural tool to use:
awk '{ sum += $1 } NR % 5 == 0 { print sum; sum = 0 }'
Add values in column 1 to sum. If the record number modulo 5 is 0, print the sum and reset it to 0. Note that if the last group of records is short (1-4 elements in the group), their sum is not printed. If you want the sum for the short group printed, add END { if (NR % 5 != 0) print sum } to the script.
Since this makes a single pass over the data file using a single command, it will be hard to beat it. Using Perl might be a little faster. I don't know how Python would fare against either Awk or Perl.
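Putting the main rule and the optional END clause together, a minimal sketch could look like the following. The wrapper function sum_every and its group-size argument are illustrative additions, not part of the answer above.

```shell
# Hypothetical helper: sum_every N
# Sums column 1 of stdin in groups of N lines (default 5) and prints one
# total per group, including a short final group if there is one.
sum_every() {
  awk -v n="${1:-5}" '
    { sum += $1 }
    NR % n == 0 { print sum; sum = 0 }
    END { if (NR % n != 0) print sum }'
}

# Demo with the 18 sample values from the question.
printf '%s\n' 1 0 2 3 0 0 1 0 0 2 0 3 0 4 0 5 0 2 | sum_every 5
# prints:
# 6
# 3
# 7
# 7
```

The last 7 comes from the short trailing group (5, 0, 2); drop the END clause if incomplete groups should be ignored.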
You can use awk for it.
Say a file named file1 contains:
1
0
2
3
0
0
1
0
0
2
0
3
0
4
0
5
0
.
.
.
So the awk command goes like:
awk 'BEGIN{sum=0} {sum=sum+$1; if(NR%5==0){print sum; sum=0}}' file1

Way to grab a line based on lines value

I have an example like so:
1 2 3 4 5 6 7 8 9 10 2.2
1 3 2 3 2 3 2 3 2 33 1.1
11 values per line, all single spaced.
There is the occasional random character thrown in, but that's it. I'm trying to find a way to copy each line whose last value is less than some user-supplied or predetermined value. Something akin to 'grep if $last <= 2', but I can't think of a way to do it, nor can I find one.
Thanks for any help!
Simple awk use case:
awk -v val=2 '$NF < val' file
Output:
1 3 2 3 2 3 2 3 2 33 1.1
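Since the question mentions the occasional stray character, a slightly more defensive sketch could check that the last field looks like a plain decimal number before comparing it. The function name lines_below is hypothetical, and the pattern assumes values such as 2, -3 or 1.1; it deliberately rejects forms like .5 or 1e3.

```shell
# Hypothetical helper: lines_below LIMIT
# Prints the lines of stdin whose last field is a plain decimal number
# strictly smaller than LIMIT.
lines_below() {
  awk -v limit="$1" '$NF ~ /^-?[0-9]+(\.[0-9]+)?$/ && $NF + 0 < limit + 0'
}

# Demo: the third line ends in a non-numeric field and is skipped.
printf '%s\n' \
  '1 2 3 4 5 6 7 8 9 10 2.2' \
  '1 3 2 3 2 3 2 3 2 33 1.1' \
  '1 2 3 4 5 6 7 8 9 10 x' | lines_below 2
# prints:
# 1 3 2 3 2 3 2 3 2 33 1.1
```

Without the pattern check, a non-numeric last field would be compared as a string, which can match or fail in surprising ways.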

Length of a sequence of numbers using seq in shell

I am new to shell scripting and I am trying a simple task of getting the length of a sequence of numbers generated using seq.
With the help of a related post here: How to find the array length in unix shell? I was able to do this -
a=(1 2 3 4 5)
echo ${#a[@]} #length of a
5 #length of a = 5 (This is fine !!)
However when I try to do a similar thing using seq ..
b=$(seq 1 1 10)
echo $b
1 2 3 4 5 6 7 8 9 10
echo ${#b[@]}
1 #the length of b is 1, while I expect it to be 10
Why does this happen? Are the variable types of a and b different? Is b not an array?
I am sure I am missing something very trivial here, help is greatly appreciated.
Thanks
Ashwin
You need to store the output in an array to find the length of the array:
$ b=($(seq 1 1 10))
$ echo ${#b[@]}
10
Saying b=$(seq 1 1 10) doesn't produce an array.
Try
echo ${b[0]}
It will print 1 2 3 4 5 6 7 8 9 10, because all of your values are stored in b as a single string; b is a plain variable, which bash treats like the first element of an array.
b=($(seq 1 1 10))
will do what you want.
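If only the count is needed and an array is not otherwise required, a portable sketch is to let the shell split the string into positional parameters and read $#. The helper name count_words is made up here, and this assumes the values contain no glob characters such as * (which would be expanded during splitting).

```shell
# Hypothetical helper: count the whitespace-separated words in a string.
count_words() {
  # $1 is left unquoted on purpose so the shell splits it into words.
  set -- $1
  echo "$#"
}

b=$(seq 1 1 10)     # one string holding ten numbers
count_words "$b"    # prints 10
```

This avoids arrays entirely, so it also works in shells that do not have them.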

Converting a series of matrix files into an index of coordinates in awk

I have a time series of files 0000.vx.dat, 0000.vy.dat, 0000.vz.dat; ...; 0077.vx.dat, 0077.vy.dat, 0077.vz.dat... Each file is a space-separated 2D matrix. I would like to take each triplet of files and combine them all into a coordinate-based data format, i.e.:
[timestep + 1] [i] [j] [vx(i,j)] [vy(i,j)] [vz(i,j)]
Each file number corresponds to a particular time step. Given the amount of data I have in this time series (~ 4 GB), bash wasn't cutting it so it seemed to be time to head over to awk... specifically mawk. It was pretty stupid to try this in bash but here is my ill-fated attempt:
for x in $(seq 1 78)
do
    tfx=${tf[$x]} # an array of padded zeros
    for y in $(seq 1 1568)
    do
        for z in $(seq 1 1344)
        do
            echo $x $y $z $(awk -v i=$z -v j=$y "FNR == i {print j}" $tfx.vx.dat) $(awk -v i=$z -v j=$y "FNR == i {print j}" $tfx.vy.dat) $(awk -v i=$z -v j=$y "FNR == i {print j}" $tfx.vz.dat) >> $file
        done
    done
done
edit: Thank you, ruakh, for pointing out that I had kept j in shell variable format with a $ in front! This is just a snippet of the original script, but I guess would be considered the guts of it!
Suffice it to say this would have taken about six months because of all the overhead in bash associated with O(MxN) algorithms, subshells, pipes and whatnot. I was looking for something more along the lines of a day at most. Each file is around 18 MB, so it should not be that much of a problem. I would be happy with doing this one timestep at a time in awk, provided that I get one output file per timestep. I could just cat them all together without much issue afterwards, I think. It is important, though, that the time step number be the first item on the coordinate list. I could achieve this with an awk -v argument (see above) within a bash routine. I do not know how to look up specific elements of matrices in three separate files and put them all together into one output. This is the main hurdle I would like to overcome. I was hoping mawk could provide a nice balance between effort and computational speed. If this seems to be too much for an awk script, I could go to something lower level, and would appreciate it if anyone answering let me know that I should just go to C instead.
Thank you in advance! I really like awk, but am afraid I am a novice.
The three files, 0000.vx.dat, 0000.vy.dat, and 0000.vz.dat would read as follows (except huge and of the correct dimensions):
0000.vx.dat:
1 2 3
4 5 6
7 8 9
0000.vy.dat:
10 11 12
13 14 15
16 17 18
0000.vz.dat:
19 20 21
22 23 24
25 26 27
I would like to be able to input:
awk -v t=1 -f stackoverflow.awk 0000.vx.dat 0000.vy.dat 0000.vz.dat
and get the following output:
1 1 1 1 10 19
1 1 2 2 11 20
1 1 3 3 12 21
1 2 1 4 13 22
1 2 2 5 14 23
1 2 3 6 15 24
1 3 1 7 16 25
1 3 2 8 17 26
1 3 3 9 18 27
edit: Thank you, shellter, for suggesting I put the desired input and output more clearly!
Personally, I use gawk to process most of my text files. However, since you have requested a mawk compatible solution, here's one way to solve your problem. Run, in your present working directory:
for i in *.vx.dat; do nawk -f script.awk "$i" "${i%%.*}.vy.dat" "${i%%.*}.vz.dat"; done
Contents of script.awk:
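# At the start of each input file, turn FILENAME into the timestep number:
# "0000.vx.dat" evaluates numerically to 0, so ++ makes it 1 (timestep + 1).
FNR==1 {
    FILENAME++
    c=0
}
# Number the cells in row-major order. The first file seeds each entry with
# "timestep row column"; every file then appends its value for that cell.
{
    for (i=1;i<=NF;i++) {
        c++
        a[c] = (a[c] ? a[c] : FILENAME FS NR FS i) FS $i
    }
}
# Write one output file per triplet, named after the zero-padded timestep.
END {
    for (j=1;j<=c;j++) {
        print a[j] > sprintf("%04d.dat", FILENAME)
    }
}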
When you run the above, the result should be a single file for each set of three files, containing your coordinates. The output filenames take the form: timestep + 1 followed by ".dat". I decided to zero-pad the number to four digits for your convenience, but you can change this to whatever format you like. Here are the results I get from the sample data you've posted. Contents of 0001.dat:
1 1 1 1 10 19
1 1 2 2 11 20
1 1 3 3 12 21
1 2 1 4 13 22
1 2 2 5 14 23
1 2 3 6 15 24
1 3 1 7 16 25
1 3 2 8 17 26
1 3 3 9 18 27
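For comparison, here is a rough sketch of a different, streaming approach (not the script above): paste joins the three matrices line by line, and awk then emits one coordinate row per cell, so nothing has to be held in memory. The function name triplet_to_coords and its argument order are made up for illustration. It assumes all three files have identical dimensions and are whitespace-separated.

```shell
# Hypothetical helper: triplet_to_coords TIMESTEP VX_FILE VY_FILE VZ_FILE
# Prints "timestep row column vx vy vz" for every matrix cell.
triplet_to_coords() {
  paste -d ' ' "$2" "$3" "$4" |
    awk -v t="$1" '{
      n = NF / 3                      # number of columns in each matrix
      for (j = 1; j <= n; j++)
        print t, NR, j, $j, $(j + n), $(j + 2 * n)
    }'
}

# Demo on the 3x3 sample matrices, written to a scratch directory.
d=$(mktemp -d)
printf '%s\n' '1 2 3' '4 5 6' '7 8 9'          > "$d/0000.vx.dat"
printf '%s\n' '10 11 12' '13 14 15' '16 17 18' > "$d/0000.vy.dat"
printf '%s\n' '19 20 21' '22 23 24' '25 26 27' > "$d/0000.vz.dat"
triplet_to_coords 1 "$d/0000.vx.dat" "$d/0000.vy.dat" "$d/0000.vz.dat"
rm -r "$d"
```

On the sample data this prints the same nine rows as 0001.dat above, starting with 1 1 1 1 10 19. Redirecting the output to a per-timestep file would be left to the calling loop.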
