How can I sort a set of points using awk or sed within a given interval - bash

How can I sort the following file.txt, which contains a set of points, using awk or sed?
1.0 -0.6486
0.8 -0.2384
-0.2 0.0750
-0.2 0.0750
0.6 0.0754
0.4 0.3150
0.2 0.4985
0.1 0.5742
-0.1 0.7003
-0.2 0.7528
-0.4 0.8416
-0.6 0.9133
-0.8 0.9721
-1.0 1.0208

1.0 2.4600
0.8 2.5526
0.6 2.6431
0.4 2.7286
0.2 2.8070
0.1 2.8433
-0.1 2.9098
-0.2 2.9400
-0.4 2.9948
-0.6 3.0428
-0.8 3.0849
-1.0 3.1218
I want to delete the two identical lines -0.2 0.0750 and have the final form of the file be:
1.0 -0.6486
0.8 -0.2384
0.6 0.0754
0.4 0.3150
0.2 0.4985
0.1 0.5742
-0.1 0.7003
-0.2 0.7528
-0.4 0.8416
-0.6 0.9133
-0.8 0.9721
-1.0 1.0208

1.0 2.4600
0.8 2.5526
0.6 2.6431
0.4 2.7286
0.2 2.8070
0.1 2.8433
-0.1 2.9098
-0.2 2.9400
-0.4 2.9948
-0.6 3.0428
-0.8 3.0849
-1.0 3.1218
I need a script that checks from 1.0 to -1.0 in steps of 0.1, deletes any line that does not follow the decreasing order (in this case the two points with x = -0.2), and writes the 'sorted' set to a new file.
I am new to Linux.

First off, your question has a problem: you say you want to "delete any line not following the decreasing order", but -0.2 is a decrease from 0.8. And if by decreasing order you meant "in steps of 0.1", there is another problem: most of your data points decrease in steps of 0.2, and only a few decrease by 0.1.
The unstated requirement from your data is that an empty line resets the script so that it starts from 1.0 again.
So I've parameterized your question as "The x value in each sub-list must decrease by 1 or 2 steps of 0.1" and written the following awk one-liner (I added the newlines for clarity, but you can leave them out to use it directly from the command line, as I did):
awk 'function NewList() {
         xPrev = 1.0 + delta
     };
     BEGIN {
         delta = 0.1;
         maxSteps = 2;
         epsilon = 0.0000001;
         NewList()
     };
     NF {
         if (($1 - epsilon <= xPrev - delta) && ($1 + epsilon >= xPrev - delta * maxSteps)) {
             print;
             xPrev = $1
         }
         next
     };
     {
         NewList();
         print ""
     }'
Note that the "epsilon" variable works around the inaccuracy of floating-point arithmetic; without it, the program may stop printing output after a few points (it did on my system), because of tiny, undisplayed differences from the decimal values.
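For anyone who wants to check the logic outside awk, here is a rough Python port of the same rule. It is a sketch, not the answerer's code, and `filter_points` is a made-up name. Like the awk script, it assumes a blank line separates the two sub-lists.

```python
def filter_points(lines, delta=0.1, max_steps=2, epsilon=1e-7):
    """Python port of the awk filter: keep a line only if its x value is
    1..max_steps steps of `delta` below the last kept x. A blank line
    resets the sub-list and is output as an empty line."""
    start = 1.0 + delta  # same starting value as NewList() in the awk script
    x_prev = start
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:  # blank line (awk: NF == 0): start a new sub-list
            x_prev = start
            kept.append("")
            continue
        x = float(fields[0])
        # epsilon absorbs binary floating-point error, as in the awk version
        if x - epsilon <= x_prev - delta and x + epsilon >= x_prev - delta * max_steps:
            kept.append(line)
            x_prev = x
    return kept


sample = ["1.0 -0.6486", "0.8 -0.2384", "-0.2 0.0750", "-0.2 0.0750",
          "0.6 0.0754", "", "1.0 2.4600", "0.8 2.5526"]
print("\n".join(filter_points(sample)))
```

Both -0.2 lines are rejected because -0.2 is more than two steps below 0.8, and the blank line lets the second sub-list start again at 1.0.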

How to sort for most negative values and most positive values across columns?

I am trying to create a new column in my dataframe based on the extreme values across 3 columns. However, depending on the values within each row, I want it to pick either the most negative value or the most positive value. If the average for an individual row across the 3 columns is greater than 0, I want it to report the most positive value. If it is less than 0, I want it to report the most negative value.
Here is an example of the dataframe
A B C
-0.30 -0.45 -0.25
0.25 0.43 0.21
-0.10 0.10 0.25
-0.30 -0.10 0.05
And here is the desired output
A B C D
-0.30 -0.45 -0.25 -0.45
0.25 0.43 0.21 0.43
-0.10 0.10 0.25 0.25
-0.30 -0.10 0.05 -0.30
I had first tried playing around with something like
data %>%
mutate(D = pmax(abs(A), abs(B), abs(C)))
But that just returns a column with the greatest of the absolute values where everything is positive.
Thanks in advance for your help, and apologies if the formatting of the question is off, I don't use this site a lot. Happy to clarify anything as well.
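The rule itself (largest value if the row mean is positive, otherwise smallest) can be sketched in a few lines of Python; in R the same idea would presumably combine `pmax`/`pmin` with a row mean instead of taking absolute values. `pick_extreme` is a hypothetical helper, and sending a mean of exactly 0 to the "smallest" branch is an assumption, since the question does not cover that case.

```python
def pick_extreme(row):
    """Return max(row) if the row mean is > 0, otherwise min(row).
    Assumption: a mean of exactly 0 falls into the 'otherwise' branch."""
    mean = sum(row) / len(row)
    return max(row) if mean > 0 else min(row)


rows = [
    (-0.30, -0.45, -0.25),
    (0.25, 0.43, 0.21),
    (-0.10, 0.10, 0.25),
    (-0.30, -0.10, 0.05),
]
D = [pick_extreme(r) for r in rows]
print(D)  # [-0.45, 0.43, 0.25, -0.3]
```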

Given transition matrix for Markov chain of 5 states, find first passage time and recurrence time

Transition matrix for a Markov chain:
0.5 0.3 0.0 0.0 0.2
0.0 0.5 0.0 0.0 0.5
0.0 0.4 0.4 0.2 0.0
0.3 0.0 0.2 0.0 0.5
0.5 0.2 0.0 0.0 0.3
This is a transition matrix with states {1,2,3,4,5}. States {1,2,5} are recurrent and states {3,4} are transient. How can I (without using the fundamental matrix trick):
Compute the expected number of steps needed to first return to state 1, conditioned on starting in state 1
Compute the expected number of steps needed to first reach any of the states {1,2,5}, conditioned on starting in state 3.
If you don't want to use the fundamental matrix, you can do two things:
Create a function that simulates the Markov chain until the stopping condition is met and that returns the number of steps. Take the average over a large number of runs to get the expectation.
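Introduce dummy absorbing states in your transition matrix and repeatedly calculate p = pP, where p is a row vector with 1 at the index of the starting state and 0 elsewhere (the rows of P sum to 1, so the distribution is propagated as pP, not Pp). With some accounting you can get the expected values that you want: for example, the expected number of steps until absorption is the sum over n >= 0 of the probability mass not yet absorbed after n steps.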
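Both suggestions can be sketched in plain Python without NumPy. States are 0-based in the code, so the question's state 1 is index 0 and state 3 is index 2. The function names are illustrative only; in `iterate_absorbing` the dummy absorbing state is implemented by simply dropping any mass that enters a target state, and the "accounting" is E[T] = sum over n >= 0 of P(T > n).

```python
import random

# Transition matrix from the question: row i is the current state, column j the
# next state. Indices are 0-based, so question state k is index k - 1.
P = [
    [0.5, 0.3, 0.0, 0.0, 0.2],
    [0.0, 0.5, 0.0, 0.0, 0.5],
    [0.0, 0.4, 0.4, 0.2, 0.0],
    [0.3, 0.0, 0.2, 0.0, 0.5],
    [0.5, 0.2, 0.0, 0.0, 0.3],
]


def simulate_steps(P, start, targets, rng):
    """Method 1: run one path from `start` until it enters `targets`
    (at least one step is always taken) and return the number of steps."""
    state, steps = start, 0
    while True:
        state = rng.choices(range(len(P)), weights=P[state])[0]
        steps += 1
        if state in targets:
            return steps


def monte_carlo(P, start, targets, runs=50_000, seed=1):
    """Average of `runs` simulated first-passage times."""
    rng = random.Random(seed)
    return sum(simulate_steps(P, start, targets, rng) for _ in range(runs)) / runs


def iterate_absorbing(P, start, targets, tol=1e-12, max_iter=100_000):
    """Method 2: transitions into `targets` go to a dummy absorbing state
    (their mass is dropped). p is a row vector pushed forward as p <- pP,
    and E[T] is the sum over n >= 0 of the mass not yet absorbed."""
    n = len(P)
    p = [0.0] * n
    p[start] = 1.0
    expected = 0.0
    for _ in range(max_iter):
        alive = sum(p)  # P(T > number of steps taken so far)
        if alive < tol:
            break
        expected += alive
        new_p = [0.0] * n
        for i in range(n):
            for j in range(n):
                if j not in targets:
                    new_p[j] += p[i] * P[i][j]
        p = new_p
    return expected


print(iterate_absorbing(P, 0, {0}))        # about 3.0: mean return time to state 1
print(iterate_absorbing(P, 2, {0, 1, 4}))  # about 2.142857: state 3 -> {1, 2, 5}
print(monte_carlo(P, 0, {0}))              # close to 3; digits depend on seed and runs
```

For reference, first-step analysis gives the exact values: from state 3, h3 = 1 + 0.4 h3 + 0.2 h4 and h4 = 1 + 0.2 h3, so h3 = 15/7; and the recurrent class {1,2,5} has a uniform stationary distribution, so the mean return time to state 1 is 3. Both methods should land near those numbers.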

In MATLAB, generate n random numbers between 0 and 1 whose sum is less than or equal to one

I want to generate n random numbers between 0 and 1 whose sum is less than or equal to one.
sum(n random numbers between 0 and 1) <= 1
n?
For example, 3 random numbers between 0 and 1:
0.2 , 0.3 , 0.4
0.2 + 0.3 + 0.4 = 0.9 <=1
It sounds like you would need to generate the numbers separately while keeping track of the previous numbers. We'll use your example:
Generate the first number between 0 and 1 = 0.2
1.0 - 0.2 = 0.8: Generate the next number between 0 and 0.8 = 0.3
0.8 - 0.3 = 0.5: Generate the next number between 0 and 0.5 = 0.4
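The step-by-step idea above can be sketched as a loop that draws each number uniformly from [0, remaining budget]; in MATLAB the same loop would scale `rand` by the remaining budget. `sequential_numbers` is a hypothetical name. One design caveat: later numbers tend to be smaller than earlier ones, so this guarantees the constraint but does not sample uniformly from all vectors whose sum is <= 1.

```python
import random


def sequential_numbers(n, total=1.0, rng=random):
    """Draw n numbers, each uniform on [0, remaining budget]. With the default
    total=1.0 every number lies in [0, 1] and the sum does not exceed 1
    (up to floating-point rounding)."""
    remaining = total
    numbers = []
    for _ in range(n):
        x = rng.uniform(0.0, remaining)
        numbers.append(x)
        remaining -= x  # budget left for the next draw
    return numbers


nums = sequential_numbers(3, rng=random.Random(0))
print(nums, sum(nums))
```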

Delete a row if the absolute values in all columns are less than 1?

I need to delete rows from a table (.csv) only if the absolute values in all columns of that row are less than 1. How can I accomplish this?
Example
Year Parameter1 Parameter2 Parameter3 Parameter4
1 -0.3 0.1 -2.5 1.0
2 -0.3 0.1 0.8 0.1
3 -0.3 0.1 -3.8 1.6
4 -0.6 0.5 -0.2 0.4
5 0.3 -0.1 -0.5 1.3
And I want the output to be:
Year Parameter1 Parameter2 Parameter3 Parameter4
1 -0.3 0.1 -2.5 1.0
3 -0.3 0.1 -3.8 1.6
5 0.3 -0.1 -0.5 1.3
Thanks in advance!
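A minimal sketch with Python's `csv` module, assuming the file is comma-separated and that the Year column is excluded from the test (every year is at least 1, and the expected output keeps only rows 1, 3 and 5). Deleting a row when all values satisfy |x| < 1 is the same as keeping it when any value has |x| >= 1. `keep_rows` is a hypothetical helper.

```python
import csv
import io


def keep_rows(rows, skip_cols=1):
    """Yield the header, then every row in which at least one value
    (ignoring the first `skip_cols` columns, here Year) has |value| >= 1."""
    header, *body = rows
    yield header
    for row in body:
        values = [float(v) for v in row[skip_cols:]]
        if any(abs(v) >= 1 for v in values):
            yield row


# The example table from the question, assumed to be comma-separated.
data = """Year,Parameter1,Parameter2,Parameter3,Parameter4
1,-0.3,0.1,-2.5,1.0
2,-0.3,0.1,0.8,0.1
3,-0.3,0.1,-3.8,1.6
4,-0.6,0.5,-0.2,0.4
5,0.3,-0.1,-0.5,1.3
"""

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(keep_rows(list(csv.reader(io.StringIO(data)))))
result = out.getvalue()
print(result)
```

For a real file, the same `csv.reader` call would read from a file opened with `newline=""`; for single-space-separated data like the table above, `delimiter=" "` is the adjustment.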

Multiple palettes and empty labels from file entries using matrix with image in gnuplot

I have a file with a 4x4 score matrix and I'd like to plot the upper triangle with one color palette and the lower triangle with a different one, overlaying the score values (MWE at the bottom).
The original file looks like this:
0.00 0.65 0.65 0.25
0.25 0.00 0.75 0.25
0.50 0.60 0.00 0.25
0.75 0.25 0.10 0.00
First, I created two separate files and used multiplot to have 2 different palettes.
FILE1 (upper triangular)
0.00 0.65 0.65 0.25
nan 0.00 0.75 0.25
nan nan 0.00 0.25
nan nan nan 0.00
FILE2 (lower triangular)
0.00 nan nan nan
0.25 0.00 nan nan
0.50 0.60 0.00 nan
0.75 0.25 0.10 0.00
Second, I plotted the score values with
using 1:2:( sprintf('%.2f', $3 ) )
However, 'nan' isn't interpreted as blank/empty and skipped; it is written onto the plot as text.
Any idea how to skip the nans and make gnuplot plot empty labels from individual entries of the data files?
Using the ternary operator in the following fashion does not seem to do the job:
using 1:2:( $3 == 'nan' ? 1/0 : sprintf('%.2f', $3 ))
Thanks.
set multiplot
set autoscale fix
unset key
set datafile missing "nan"
set cbrange [0:1]
unset colorbox
set palette defined (0 "white", 0.1 "#9ecae1", 1.0 "#3182bd")
plot FILE1 matrix with image, \
FILE1 matrix using 1:2:( sprintf('%.2f', $3) ) with labels font ',16'
set palette defined (0 "white", 0.1 "#a1d99b", 1.0 "#31a354")
plot FILE2 matrix with image, \
FILE2 matrix using 1:2:( sprintf('%.2f', $3) ) with labels font ',16'
unset multiplot
You don't need to use multiplot and two separate files (I also couldn't get this working with the labels).
Just define a single palette whose negative range holds one color scheme and whose positive range holds the other. Using the x and y values (column and row indices) of the single file you show first, you can then decide whether a cell's color is taken from the negative or from the positive part of the palette:
set autoscale fix
set cbrange [-1:1]
unset colorbox
unset key
set palette defined (-1.0 "#31a354", -0.1 "#a1d99b", 0 "white", 0.1 "#9ecae1", 1.0 "#3182bd")
plot 'FILE' matrix using 1:2:($1<$2 ? -$3 : $3) with image,\
'' matrix using 1:2:(sprintf('%.2f', $3)) with labels font ',16'
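To check which cells end up in which half of the palette, the expression ($1<$2 ? -$3 : $3) can be mimicked in Python on the 4x4 matrix from the question. This assumes gnuplot's matrix convention that $1 is the column index and $2 the row index, so cells with column index < row index (below the diagonal in the file layout) are negated and take the green colors, while the diagonal zeros stay at 0 and map to white. `signed_matrix` is a hypothetical helper for illustration only.

```python
def signed_matrix(m):
    """Mimic ($1<$2 ? -$3 : $3) with $1 = column index, $2 = row index:
    entries below the diagonal are negated, all others are kept."""
    return [[-v if col < row else v for col, v in enumerate(r)]
            for row, r in enumerate(m)]


scores = [
    [0.00, 0.65, 0.65, 0.25],
    [0.25, 0.00, 0.75, 0.25],
    [0.50, 0.60, 0.00, 0.25],
    [0.75, 0.25, 0.10, 0.00],
]
for r in signed_matrix(scores):
    print(" ".join(f"{v:5.2f}" for v in r))
```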
