Fortran sort method from Numerical Recipes - sorting

I'm using C to call Fortran, and my Fortran code calls the sort() subroutine below.
*-----------------------------------------------------------------------
* SUBROUTINE sort(A,n)
* Subroutine from the "Numerical Recipes" library
* (C) Copr. 1986-92 Numerical Recipes Software
*-----------------------------------------------------------------------
      SUBROUTINE sort(arr,n)
      INTEGER n,M,NSTACK
      REAL arr(n)
      PARAMETER (M=7,NSTACK=50)
      INTEGER i,ir,j,jstack,k,l,istack(NSTACK)
      REAL a,temp
      jstack=0
      l=1
      ir=n
1     if(ir-l.lt.M)then
        do 12 j=l+1,ir
          a=arr(j)
          do 11 i=j-1,1,-1
            if(arr(i).le.a)goto 2
            arr(i+1)=arr(i)
11        continue
          i=0
2         arr(i+1)=a
12      continue
        if(jstack.eq.0)return
        ir=istack(jstack)
        l=istack(jstack-1)
        jstack=jstack-2
      else
        k=(l+ir)/2
        temp=arr(k)
        arr(k)=arr(l+1)
        arr(l+1)=temp
        if(arr(l+1).gt.arr(ir))then
          temp=arr(l+1)
          arr(l+1)=arr(ir)
          arr(ir)=temp
        endif
        if(arr(l).gt.arr(ir))then
          temp=arr(l)
          arr(l)=arr(ir)
          arr(ir)=temp
        endif
        if(arr(l+1).gt.arr(l))then
          temp=arr(l+1)
          arr(l+1)=arr(l)
          arr(l)=temp
        endif
        i=l+1
        j=ir
        a=arr(l)
3       continue
          i=i+1
        if(arr(i).lt.a)goto 3
4       continue
          j=j-1
        if(arr(j).gt.a)goto 4
        if(j.lt.i)goto 5
        temp=arr(i)
        arr(i)=arr(j)
        arr(j)=temp
        goto 3
5       arr(l)=arr(j)
        arr(j)=a
        jstack=jstack+2
        if(jstack.gt.NSTACK)pause 'NSTACK too small in sort'
        if(ir-i+1.ge.j-l)then
          istack(jstack)=ir
          istack(jstack-1)=i
          ir=j-1
        else
          istack(jstack)=j-1
          istack(jstack-1)=l
          l=i
        endif
      endif
      goto 1
      END
If I call this sort subroutine many times, I eventually get a segfault inside it :(
It's legacy code, but I trusted it because it comes from Numerical Recipes.
Still, I'm suspicious about a few things, in particular this line:
if(jstack.gt.NSTACK)pause 'NSTACK too small in sort'
If that branch is ever taken, will my program just sit there paused? How can a sort routine do such a thing?
And if that line is questionable, how can I trust the rest of the code?
Does anyone know of a problem with this sort subroutine? Or another way to sort in Fortran? I could replace this routine with a different one, but I'm new to Fortran and cannot write one myself.
I should add that there is no problem when I run this routine single-threaded; the problem only appears in a multithreaded environment. Sorry for not mentioning that in the original question, I only noticed it after writing it.
DEBUG information
current thread: t#41
[1] __lwp_kill(0x0, 0x6, 0x0, 0x6, 0xffbffeff, 0x0), at 0xff2caa58
[2] raise(0x6, 0x0, 0xff342f18, 0xff2aa378, 0xffffffff, 0x6), at 0xff265a5c
[3] abort(0x7400, 0x1, 0x0, 0xfcb78, 0xff3413d8, 0x0), at 0xff24194c
[4] os::abort(0x1, 0x0, 0xff011084, 0xfefdc000, 0x7d94, 0x7c00), at 0xfee7d3cc
[5] VMError::report_and_die(0x0, 0xff038640, 0xff031ff4, 0x1, 0xfee81b94, 0xff031ff4), at 0xfef0cd58
[6] JVM_handle_solaris_signal(0xb, 0xacffefe0, 0xacffed28, 0x8000, 0xff030fa0, 0x2013d8), at 0xfea73d48
[7] __sighndlr(0xb, 0xacffefe0, 0xacffed28, 0xfea7325c, 0x0, 0x1), at 0xff2c6e78
---- called from signal handler with signal 11 (SIGSEGV) ------
[8] sort_(0xfe2b1350, 0xfe2b135c, 0xfe2b1000, 0x1c00, 0x443bfc7b, 0xfe292484), at 0xfe27e498
[9] mediane_(0xa9c1624c, 0xacfff2ac, 0xa9c16060, 0xa9c05c34, 0x0, 0x19), at 0xfe27a38c
(dbx) frame 8
0xfe27e498: sort_+0x01d8: ld [%l4 + %l1], %f4
(dbx) disassemble
0xff2caa58: __lwp_kill+0x0008: bcc,a,pt %icc,__lwp_kill+0x18 ! 0xff2caa68
0xff2caa5c: __lwp_kill+0x000c: clr %o0
0xff2caa60: __lwp_kill+0x0010: cmp %o0, 91
0xff2caa64: __lwp_kill+0x0014: move %icc,0x4, %o0
0xff2caa68: __lwp_kill+0x0018: retl
0xff2caa6c: __lwp_kill+0x001c: nop
0xff2caa70: __lwp_self : mov 164, %g1
0xff2caa74: __lwp_self+0x0004: ta %icc,0x00000008
0xff2caa78: __lwp_self+0x0008: retl
0xff2caa7c: __lwp_self+0x000c: nop
I'm on Solaris, using dbx (its equivalent of gdb).
I tried inspecting the addresses, but what should I type to get useful information?
After adding the -g option to the f90 compiler, I can inspect variable values in dbx. Here is the result:
t#88 (l#88) terminated by signal ABRT (Abort)
0xff2caa58: __lwp_kill+0x0008: bcc,a,pt %icc,__lwp_kill+0x18 ! 0xff2caa68
Current function is sort
578 temp=arr(k)
(dbx) print n
n = 19
(dbx) print arr
arr =
(1) 725.0666
(2) 741.5034
(3) 730.8196
(4) 754.3707
(5) 741.718
(6) 741.718
(7) 741.8914
(8) 745.9141
(9) 744.6705
(10) 741.718
(11) 745.8358
(12) 743.3788
(13) 746.2706
(14) 746.2706
(15) 750.1498
(16) 754.3707
(17) 754.3707
(18) 754.3707
(19) 748.2084
(dbx) print istack
istack =
(1) 7
(2) 12
(3) 17
(4) 18
(5) 8
(6) 9
(7) 1
(8) 4
(9) 0
(10) 0
(11) 0
(12) 0
(13) 0
(14) 0
(15) 0
(16) 0
(17) 0
(18) 0
(19) 0
(20) 0
(21) 0
(22) 0
(23) 0
(24) 0
(25) 0
(26) 0
(27) 0
(28) 0
(29) 0
(30) 0
(31) 0
(32) 0
(33) 0
(34) 0
(35) 0
(36) 0
(37) 0
(38) 0
(39) 0
(40) 0
(41) 0
(42) 0
(43) 0
(44) 0
(45) 0
(46) 0
(47) 0
(48) 0
(49) 0
(50) 0
(dbx) print jstack
jstack = -31648
(dbx)
How is it possible that jstack has the value -31648? istack only has 50 elements, so istack(jstack) returns me a bad value! How can that happen? :)
Thanks in advance.

Too large for a comment:
Unfortunately the stack trace does not show the exact line of failure. What is the evidence that the error is at the line you indicated? I could imagine an error in the call into the run-time system, so you should try changing the PAUSE statement to a write, a read *, or something similar.
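For example, one possible replacement for that guard (just a sketch; it prints the message and aborts instead of suspending the program, which is usually what you want in library code):

      if(jstack.gt.NSTACK)then
        write(*,*) 'NSTACK too small in sort'
        stop
      endif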
I did some tests with your subroutine in gfortran:
      parameter (n = 100000)
      dimension b(n)
      do i = 1, 1000
        call random_number(b)
        call sort(b, n)
      end do
      end
with different array sizes and loop bounds, so the sort is called with many different inputs. I enabled all checks and sanitizers and didn't encounter a single problem.
Edit:
It works with OpenMP too:
      parameter (n = 100000)
      real, allocatable :: b(:)
!$omp parallel private(b)
      allocate(b(n))
!$omp do
      do i = 1, 1000
        call random_number(b)
        call sort(b, n)
      end do
!$omp end do
!$omp end parallel
      end
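Since your crash only shows up with several threads, one more thing worth checking (a guess, not something I can verify from here): some compilers give local variables static storage by default, in which case two threads calling sort at the same time share jstack and istack, and a garbage jstack value like -31648 is exactly what that would look like. Declaring the routine RECURSIVE forces the locals onto the stack so each call gets its own copies; Sun's f90 also has a -stackvar option that is supposed to have a similar effect. A minimal sketch, where only the declaration changes:

*     Locals of a RECURSIVE procedure are automatic (stack-allocated),
*     so concurrent calls no longer share jstack/istack.
      RECURSIVE SUBROUTINE sort(arr,n)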

Related

Reduce cache misses by increasing the size of an array - why does this work?

Given this piece of code from the textbook Computer Systems: A Programmer's Perspective, 3rd ed., by Randal E. Bryant and David R. O'Hallaron (Pearson, 2016):
float dotprod(float x[8], float y[8])
{
    float sum = 0.0;
    int i;
    for (i = 0; i < 8; i++)
        sum += x[i] * y[i];
    return sum;
}
And this is the cache information:
sizeof(float) = 4.
Array x begins at memory address 0x0 and is stored in row-major order. y follows x.
In each case below, the cache is initially empty.
The only memory accesses are to the entries of the array x and y. All other variables are stored in registers.
The cache size is 32 bytes and the cache block size is 16 bytes. The cache is direct-mapped.
We're given the explanation that the miss rate is 100%, due to constant thrashing. The book then suggests redefining x as float x[12] to fix this.
I am wondering why float x[12] would be their choice instead of one of these options, for example:
x[9]
x[10]
x[11]
x[16]
Why would none of these 4 options work (I was told they wouldn't work well, but the person didn't explain why), with only x[12] working as a replacement for the x[8] array? Or do they just give varying changes in the percentage of cache misses present in the code?
In a direct mapped cache of 32 bytes with 16-byte blocks you will have 2 blocks. This means that for a given address you will have 1 bit to identify the block index, 4 bits for the block offset used to identify the particular byte inside the block (2 for word offset and 2 for byte offset), and the rest will be the tag.
The cache would look like this:
Valid? | Tag | Index | Data
-------+-----+-------+-----
| | 0 | (16 bytes)
-------+-----+-------+-----
| | 1 | (16 bytes)
-------+-----+-------+-----
Let's assume 8 bit addresses for simplicity (with longer addresses you only get a longer tag, nothing else changes). In case of 8 bit addresses you would have 3 bits for the tag, 1 for the block index and 4 for the block offset. So, given an address, e.g. 0x34, you can decompose it like so:
   tag
    |      block offset
  001 | 1 | 0100
        |
   block index
Now assume that the two arrays are in memory one right after the other, like this:
x[0] | x[1] | ... | x[7] | y[0] | y[1] | ... | y[7]
If x starts at address 0x0, we would have the following situation:
element   address           element   address
 x[0]    000|0|0000  (0)     y[0]    001|0|0000 (32)
 x[1]    000|0|0100  (4)     y[1]    001|0|0100 (36)
 x[2]    000|0|1000  (8)     y[2]    001|0|1000 (40)
 x[3]    000|0|1100 (12)     y[3]    001|0|1100 (44)
 x[4]    000|1|0000 (16)     y[4]    001|1|0000 (48)
 x[5]    000|1|0100 (20)     y[5]    001|1|0100 (52)
 x[6]    000|1|1000 (24)     y[6]    001|1|1000 (56)
 x[7]    000|1|1100 (28)     y[7]    001|1|1100 (60)
              |                           |
         block index                 block index
As you can see, the problem here is that the block index is always the same between any two elements of x and y with the same array index. This means a cache miss will happen for every single array access, since you first access x[i] and then y[i] (or the opposite), and each time you have values for the wrong array in the cache block.
Now suppose you add the appropriate padding after the end of x:
x[0] | x[1] | ... | x[7] | ...PADDING... | y[0] | y[1] | ... | y[7]
You are now in a much better situation:
element   addr              element   addr
 x[0]    000|0|0000  (0)     y[0]    001|1|0000 (48)
 x[1]    000|0|0100  (4)     y[1]    001|1|0100 (52)
 x[2]    000|0|1000  (8)     y[2]    001|1|1000 (56)
 x[3]    000|0|1100 (12)     y[3]    001|1|1100 (60)
 x[4]    000|1|0000 (16)     y[4]    010|0|0000 (64)
 x[5]    000|1|0100 (20)     y[5]    010|0|0100 (68)
 x[6]    000|1|1000 (24)     y[6]    010|0|1000 (72)
 x[7]    000|1|1100 (28)     y[7]    010|0|1100 (76)
              |                           |
         block index                 block index
Indeed, this situation is optimal: you will only have 4 cache misses (x[0], y[0], x[4], y[4]).
Why wouldn't the other options work? Well, let's see.
=== WITH x[9] =======================================
element address element address
x[0] 000|0|0000 (0) y[0] 001|0|0100 (36)
x[1] 000|0|0100 (4) y[1] 001|0|1000 (40)
x[2] 000|0|1000 (8) y[2] 001|0|1100 (44)
x[3] 000|0|1100 (12) y[3] 001|1|0000 (48)
x[4] 000|1|0000 (16) y[4] 001|1|0100 (52)
x[5] 000|1|0100 (20) y[5] 001|1|1000 (56)
x[6] 000|1|1000 (24) y[6] 001|1|1100 (60)
x[7] 000|1|1100 (28) y[7] 010|0|0000 (64)
=== WITH x[10] ======================================
element address element address
x[0] 000|0|0000 (0) y[0] 001|0|1000 (40)
x[1] 000|0|0100 (4) y[1] 001|0|1100 (44)
x[2] 000|0|1000 (8) y[2] 001|1|0000 (48)
x[3] 000|0|1100 (12) y[3] 001|1|0100 (52)
x[4] 000|1|0000 (16) y[4] 001|1|1000 (56)
x[5] 000|1|0100 (20) y[5] 001|1|1100 (60)
x[6] 000|1|1000 (24) y[6] 010|0|0000 (64)
x[7] 000|1|1100 (28) y[7] 010|0|0100 (68)
=== WITH x[11] =====================================
element address element address
x[0] 000|0|0000 (0) y[0] 001|0|1100 (44)
x[1] 000|0|0100 (4) y[1] 001|1|0000 (48)
x[2] 000|0|1000 (8) y[2] 001|1|0100 (52)
x[3] 000|0|1100 (12) y[3] 001|1|1000 (56)
x[4] 000|1|0000 (16) y[4] 001|1|1100 (60)
x[5] 000|1|0100 (20) y[5] 010|0|0000 (64)
x[6] 000|1|1000 (24) y[6] 010|0|0100 (68)
x[7] 000|1|1100 (28) y[7] 010|0|1000 (72)
=== WITH x[16] =====================================
element address element address
x[0] 000|0|0000 (0) y[0] 010|0|0000 (64)
x[1] 000|0|0100 (4) y[1] 010|0|0100 (68)
x[2] 000|0|1000 (8) y[2] 010|0|1000 (72)
x[3] 000|0|1100 (12) y[3] 010|0|1100 (76)
x[4] 000|1|0000 (16) y[4] 010|1|0000 (80)
x[5] 000|1|0100 (20) y[5] 010|1|0100 (84)
x[6] 000|1|1000 (24) y[6] 010|1|1000 (88)
x[7] 000|1|1100 (28) y[7] 010|1|1100 (92)
As you can easily tell from the above, none of those choices of padding is optimal. x[16] is basically identical to the worst case: you have too much padding. x[9] only solves the problem for 1/4 of the elements, x[10] for 2/4 of the elements and x[11] for 3/4 of the elements.
So, exactly as you say, those options give varying changes in the percentage of cache misses present in the code, but none of them is optimal (i.e. lowest miss rate possible).
The optimal configuration is one where the padding makes x[0] fall at block index 0 and y[0] at block index 1, so that x[4] then falls at block index 1 and y[4] at block index 0 (or exactly the opposite). Good values for the size of array x are therefore 8 * N + 4 for any N >= 1, with the smallest being 12.
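If you want to check these tables without working through every address by hand, here is a small, self-contained sketch that simulates the 2-block, 16-bytes-per-block direct-mapped cache described above and counts the misses of the interleaved x[i], y[i] access pattern for different sizes of x (the helper name count_misses and the x_len parameter are just for this sketch). Note that a raw miss count also picks up evictions caused by neighbouring iterations, so some counts come out a bit worse than the per-pair conflict fractions alone would suggest, but the conclusion is the same: among these sizes only x[12] reaches the 4-miss optimum.

#include <stdio.h>

/* Direct-mapped cache from the question: 32 bytes total, 16-byte blocks,
 * so 2 blocks.  x starts at address 0 and y follows x immediately.      */
static int count_misses(int x_len)
{
    long tag_in_slot[2] = { -1, -1 };      /* tag currently cached in each slot */
    int  misses = 0;

    long x_base = 0;
    long y_base = x_base + 4L * x_len;     /* sizeof(float) == 4 */

    for (int i = 0; i < 8; i++) {
        long addr[2] = { x_base + 4L * i, y_base + 4L * i };  /* x[i], then y[i] */
        for (int k = 0; k < 2; k++) {
            long block = addr[k] / 16;     /* memory block number        */
            int  slot  = (int)(block % 2); /* direct-mapped: 1 index bit */
            long tag   = block / 2;
            if (tag_in_slot[slot] != tag) {
                tag_in_slot[slot] = tag;   /* miss: fetch the block      */
                misses++;
            }
        }
    }
    return misses;                         /* out of 16 accesses */
}

int main(void)
{
    int sizes[] = { 8, 9, 10, 11, 12, 16 };
    for (int k = 0; k < (int)(sizeof sizes / sizeof sizes[0]); k++)
        printf("x[%2d]: %2d misses / 16 accesses\n", sizes[k], count_misses(sizes[k]));
    return 0;
}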

What does << stand for in Ruby when used with an integer?

What is the use of <<? I understand that with an array it is used for push, but I am not clear what its purpose is in the following code, where it is used with an integer.
def array_pack(a)
  a.reverse.reduce(0) { |x, b| (x << 8) + b }
end

array_pack([24, 85, 0]) # => 21784
For example, if I write 8 << 8 it gives me 2048, so is it converting to bytes? Or what exactly is its purpose?
It is a Bitwise LEFT shift operator.
Definition:
The LEFT SHIFT operator << shifts each bit of a number to the left by n positions.
Example:
If you do 7 << 2, you get 28.

7 in base 2: 0000 0111

      128  64  32  16   8   4   2   1
    ----------------------------------
 7:     0   0   0   0   0   1   1   1

Now shift each bit to the left by 2 positions:

      128  64  32  16   8   4   2   1
    ----------------------------------
28:     0   0   0   1   1   1   0   0
Why?
Bitwise operators are widely used in low-level programming on embedded systems to apply a mask (in this case to an integer).
Benefits
See this SO answer: link
View Source for more details: link
As the documentation says, Integer#<< returns the integer shifted left "X" positions, or right if "X" is negative. In your scenario it shifts 8 positions to the left.
Here is how it works:
8.to_s(2) => "1000"
Now let's shift "1000" 8 positions to the left
(8 << 8).to_s(2) => "100000000000"
If you count the 0s above, you will see that it added 8 of them after "1000".
Now, let's see how it returns 2048
"100000000000".to_i(2) => 2048

cross validated predictions using glmnet

Does anyone know if glmnet produces cross-validated predictions, i.e. predictions for each observation made by the model fitted with that observation's fold left out (what one usually thinks of as cross-validated predictions), rather than predictions that all come from one model, fitted to all the data at an optimal lambda established by cross-validation?
predict.cv.glmnet just passes the 'glmnet' fit for all of the data to predict.glmnet as you suspect.
However, with the argument keep = TRUE, cv.glmnet also returns predictions for the training data (fitted values) based on the left-out folds, in the element fit.preval. The fold each record is assigned to is recorded in the element foldid.
> library(glmnet)
> # keep prevalidated array
> cvf1 <- cv.glmnet(x = as.matrix(mtcars[, c("disp", "hp", "mpg")]),
+ y = mtcars$am, family = "binomial", keep = TRUE)
> dim(mtcars)
# [1] 32 11
> length(cvf1$lambda)
# [1] 84
> # leave-n out fitted predictions
> # 84 columns, 2 columns padded with NAs
> dim(cvf1$fit.preval)
# [1] 32 86
> # performance of cross-validated model predictions
> round(mtcars$am - cvf1$fit.preval[, cvf1$lambda == cvf1$lambda.min])
# [1] 1 1 0 0 0 0 0 0 -1 0 0 0 0 0 0
# [16] 0 0 0 0 0 -1 0 0 0 0 0 0 0 1 0
# [31] 0 0
> cvf1$foldid
# [1] 1 6 6 1 1 8 9 6 2 5 9 4 4 2 2
# [16] 10 5 2 3 4 10 3 1 3 10 9 7 8 7 8
# [31] 7 5

Convert binary values to a decimal matrix

Suppose that I have a matrix a = [1 3; 4 2]. I convert this matrix to binary format using this code:
a=magic(2)
y=dec2bin(a,8)
e=str2num(y(:))';
The result is :
y =
00000001
00000100
00000011
00000010
e =
Columns 1 through 17
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Columns 18 through 32
0 0 0 0 1 0 0 0 0 1 1 1 0 1 0
Now, when I want to get my original matrix back, I invert the functions:
s=num2str(e(:))';
r=bin2dec(s)
The result I get is:
r =
1082
What can I do to get the original matrix back, and not just a single number?
Thank you in advance
You are doing extra processing, which destroys the original structure:
a=magic(2)
y=dec2bin(a,8)
r=bin2dec(y)
Here r is your answer, since y no longer has the matrix structure of a. To recreate your matrix, you need to:
originalmatrix = reshape(r,size(a))
originalmatrix =
1 3
4 2
I finally found the right solution for my problem, and I want to share it in case anyone needs it:
a_back=reshape(bin2dec(num2str(reshape(e, 4, []))), 2, 2)
a_back =
1 3
4 2
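For anyone who finds the one-liner hard to read, here is the same computation unpacked step by step (no new behaviour, just the intermediate results named):

e_mat  = reshape(e, 4, []);   % 4x8: one row per element of a (column-major order 1,4,3,2), one column per bit
s      = num2str(e_mat);      % character array with one 8-bit pattern per row
d      = bin2dec(s);          % 4x1 vector of the original values: [1; 4; 3; 2]
a_back = reshape(d, 2, 2);    % restore the original 2x2 shape: [1 3; 4 2]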

How can I draw a triangle in an image in MATLAB?

I need to draw a triangle in an image I have loaded. The triangle should look like this:
1 0 0 0 0 0
1 1 0 0 0 0
1 1 1 0 0 0
1 1 1 1 0 0
1 1 1 1 1 0
1 1 1 1 1 1
But the main problem I have is that I do not know how I can create a matrix like that. I want to multiply this matrix with an image, and the image matrix has 3 dimensions (W, H, RGB).
You can create a matrix like the one in your question by using the TRIL and ONES functions:
>> A = tril(ones(6))
A =
1 0 0 0 0 0
1 1 0 0 0 0
1 1 1 0 0 0
1 1 1 1 0 0
1 1 1 1 1 0
1 1 1 1 1 1
EDIT: Based on your comment below, it sounds like you have a 3-D RGB image matrix B and that you want to multiply each color plane of B by the matrix A. This will have the net result of setting the upper triangular part of the image (corresponding to all the zeroes in A) to black. Assuming B is a 6-by-6-by-3 matrix (i.e. the rows and columns of B match those of A), here is one solution that uses indexing (and the function REPMAT) instead of multiplication:
>> B = randi([0 255],[6 6 3],'uint8'); % A random uint8 matrix as an example
>> B(repmat(~A,[1 1 3])) = 0; % Set upper triangular part to 0
>> B(:,:,1) % Take a peek at the first plane
ans =
8 0 0 0 0 0
143 251 0 0 0 0
225 40 123 0 0 0
171 219 30 74 0 0
48 165 150 157 149 0
94 96 57 67 27 5
The call to REPMAT replicates a negated version of A 3 times so that it has the same dimensions as B. The result is used as a logical index into B, setting the non-zero indices to 0. By using indexing instead of multiplication, you can avoid having to worry about converting A and B to the same data type (which would be required to do the multiplication in this case since A is of type double and B is of type uint8).
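If you do want to go the multiplication route mentioned above instead, a rough sketch would be to cast the mask to B's class before multiplying (A and B as above):

>> mask = repmat(uint8(A),[1 1 3]);  % 0/1 mask with the same class and size as B
>> B = B.*mask;                      % zeros out the upper triangular part of each plane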
