I'm just starting off in Fortran and I'm trying to generate a very simple 4x4 Hilbert matrix but my output is a bunch of zeroes, apart from a 1 at the start. Any help would be much appreciated.
Here's the code:
PROGRAM HILBERT
IMPLICIT NONE
REAL :: a(4,4)
REAL :: i, j
!Initialise matrix a with non zeroes
i = 1.0
j = 1.0
DO i = 1,4,1
a(i,4) = 0.0
DO j = 1,4,1
a=1.0/(i+j-1.0) !Attempt to generate Hilbert matrix
WRITE(6,'(f3.0)',advance='no') a(i,j) !formatted output
END DO
write(6,*) !formatted output
END DO
END PROGRAM HILBERT
Starting from your given code:
PROGRAM HILBERT
IMPLICIT NONE
REAL :: a(4,4)
INTEGER :: i, j ! loop counters and array subscripts must be integers
DO i = 1,4 ! The third argument is not necessary if equal to 1
DO j = 1,4
a(i,j) = 1.0/(i+j-1.0) ! generate Hilbert matrix
WRITE(6,'(f8.5)',advance='no') a(i,j) !formatted output
END DO
write(6,*) !formatted output
END DO
END PROGRAM HILBERT
The initialization of i and j is not necessary, because the loop does this automatically.
Your initialization of a(i,4) = 0 should be a(i,j) = 0, but this is not necessary. Following M.S.B.'s advice, you could do this initialization at the beginning with a = 0, but this is not necessary in this case.
A very important thing: Your format code doesn't show the fractional part of the numbers.
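And: according to the newer standards, loop variables (i and j in this case) have to be integers. Real loop variables were removed in Fortran 95 because rounding errors make the number of iterations ambiguous, and array subscripts must be integers in any case.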
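To see why a real-valued loop counter is fragile, here is a minimal sketch in Python (used only for illustration; count_steps is a made-up helper, not anything from the code above). It drives a loop with a floating-point counter and counts the trips:

```python
def count_steps(start, stop, step):
    """Count the iterations of a loop driven by a (possibly real) counter."""
    x = start
    trips = 0
    while x <= stop:
        trips += 1
        x += step
    return trips

# Looks like it should visit 0.0, 0.1, 0.2, 0.3 (four trips), but
# 0.1 + 0.1 + 0.1 is slightly larger than 0.3 in binary floating point.
trips_real = count_steps(0.0, 0.3, 0.1)

# The same loop with an integer counter has an unambiguous trip count.
trips_int = count_steps(0, 3, 1)

print(trips_real, trips_int)
```

With an integer counter the number of iterations is exact, and the real value can always be computed from the counter inside the loop if it is needed.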
array=scalar sets all elements of the array to the scalar. Try a(i,j)=...
In addition to the corrections @M.S.B. and @Stefan point out, you can also create such a matrix more efficiently (in terms of code and error-proneness) with the forall statement:
program hilbert
implicit none
real a(4,4)
integer i,j
forall(i=1:4, j=1:4) a(i,j) = 1/(i+j-1.0)
! As @Stefan says, the format is an important part of the answer
print '(4F8.5)', a
end program hilbert
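If you want reference values to compare the Fortran output against, a small Python sketch like the following may help. It is only a cross-check under the same definition H(i,j) = 1/(i+j-1) with 1-based indices; the function name hilbert is chosen here for illustration.

```python
from fractions import Fraction

def hilbert(n):
    """Return the n x n Hilbert matrix as exact fractions, H[i][j] = 1/(i+j-1)
    for 1-based i and j (stored in 0-based Python lists)."""
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]

H = hilbert(4)
for row in H:
    # Same field width and precision as the f8.5 edit descriptor
    print("".join(f"{float(x):8.5f}" for x in row))
```

Because the matrix is symmetric, printing it row by row or column by column gives the same table.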
I am looking for information about the so-called "Strauss-Shamir method" for scalar multiplication on elliptic curves. It is a method to compute k1 · P + k2 · Q in around log2(k) additions and doublings, where k1, k2 < k.
Unfortunately, while this algorithm is cited left right and center, the actual algorithm (and, dare I hope, its analysis) is not cited anywhere. Is there someone that could explain it to me, or at least give me a link on pseudocode/analysis?
Many thanks in advance!
To multiply a number P by an n-bit scalar k, you can use doubling and addition according to the binary expansion of k. Let's say that k=9. In binary, that's 1001, and you can calculate 9P like this:
R=0
R=R*2+P //the most significant '1' bit
R=R*2 //next bit is 0
R=R*2 //next bit is 0
R=R*2+P //next bit is 1
The Strauss-Shamir trick is just calculating aP + bQ by doing the additions inside the chain instead of outside. Let's say that, in binary, a=1001 and b=0011. Then we just do this:
R=0
R=R*2+P //bits from a,b = 1,0
R=R*2 //bits from a,b = 0,0
R=R*2+Q //bits from a,b = 0,1
R=R*2+P+Q //bits from a,b = 1,1
If you precompute P+Q, then this doesn't take much longer than a single multiplication.
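The chain above can be sketched generically in Python. This is only an illustration under stated assumptions: the group is passed in as three hypothetical callables (add, double, zero), and the toy group used in the example is the integers under addition rather than a real elliptic curve.

```python
def shamir_double_mul(a, P, b, Q, add, double, zero):
    """Compute a*P + b*Q with one shared double-and-add chain.

    add(X, Y), double(X) and zero describe the group; they are
    placeholders for the real curve operations.
    """
    PQ = add(P, Q)  # precomputed once, reused whenever both bits are 1
    R = zero
    nbits = max(a.bit_length(), b.bit_length())
    for bit in range(nbits - 1, -1, -1):  # most significant bit first
        R = double(R)
        bit_a = (a >> bit) & 1
        bit_b = (b >> bit) & 1
        if bit_a and bit_b:
            R = add(R, PQ)
        elif bit_a:
            R = add(R, P)
        elif bit_b:
            R = add(R, Q)
    return R

# Toy example with integers under addition: 9*5 + 3*7
result = shamir_double_mul(9, 5, 3, 7,
                           add=lambda x, y: x + y,
                           double=lambda x: 2 * x,
                           zero=0)
print(result)
```

The loop performs one doubling per bit and at most one addition per bit, which is where the "about log2(k) doublings and additions" estimate comes from.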
As a simple example to illustrate my point, I am trying to solve the following equation f(t+1) = f(t) + f(t)*Tr (f^2) starting at t=0 where Tr is the trace of a matrix (sum of diagonal elements). Below I provide a basic code. My code compiles with no errors but is not updating the solution as I want. My expected result is also below which I calculated by hand (it's very easy to check by hand via matrix multiplication).
In my sample code below I have two variables that store solution, g is for f(t=0) which I implement, and then I store f(t+1) as f.
complex,dimension(3,3) :: f,g
integer :: k,l,m,p,q
Assume g=f(t=0) is defined as below
do l=1,3 !matrix index loops
do k=1,3 !matrix index loops
if (k == l) then
g(k,l) = cmplx(0.2,0)
else if ( k /= l) then
g(k,l) = cmplx(0,0)
end if
end do
end do
I have checked this result is indeed what I want it to be, so I know f at t=0 is defined properly.
Now I try to use this matrix at t=0 and find the matrix for all time, governed by the equation f(t+1) = f(t)+f(t)*Tr(f^2), but this is where I am not correctly implementing the code I want.
do m=1,3 !loop for 3 time iterations
do p=1,3 !loops for dummy indices for matrix trace
do q=1,3
g(1,1) = g(1,1) + g(1,1)*g(p,q)*g(p,q) !compute trace here
f(1,1) = g(1,1)
!f(2,2) = g(2,2) + g(2,2)*g(p,q)*g(p,q)
!f(3,3) = g(3,3) + g(3,3)*g(p,q)*g(p,q)
!assume all other matrix elements are zero except diagonal
end do
end do
end do
Printing this result is done by
print*, "calculated f where m=", m
do k=1,3
print*, (f(k,l), l=1,3)
end do
This is when I realize my code is not being implemented correctly.
When I print f(k,l) I expect a result of 0.224*identity matrix for t=1, and that is what I get. However, for t=2 the output is not right. So my code updates correctly for the first time iteration, but not after that.
I am looking for a solution as to how to properly implement the equation I want to obtain the result I am expecting.
I'll answer a couple things you seem to be having trouble with. First, the trace. The trace of a 3x3 matrix is A(1,1)+A(2,2)+A(3,3). The first and second indexes are the same, so we use one loop variable. To compute the trace of an NxN matrix A:
trace = 0.
do i=1,N
trace = trace + A(i,i)
enddo
I think you're trying to loop over p and q to compute your trace which is incorrect. In that sum, you'll add in terms like A(2,3) which is wrong.
Second, to compute the update, I recommend you compute the updated f into fNew, and then your code would look something like:
do m=1,3 ! time
! -- Compute f^2 (matmul is the intrinsic matrix product)
f2 = matmul(f, f)
! -- Compute trace of f2 (sum of its diagonal elements)
trace = 0.
do i=1,3
trace = trace + f2(i,i)
enddo
! -- Compute new f
do j=1,3
do i=1,3
fNew(i,j) = f(i,j) + trace*f(i,j)
enddo
enddo
! -- Now update f, perhaps recording fNew-f for some residual
! -- The LHS and RHS are both arrays of dimension (3,3),
! -- so fortran will automatically perform an array operation
f = fNew
enddo
This method has two advantages. First, your code actually looks like the math you're trying to do, and is easy to follow. This is very important for realistic problems, which are not so simple. Second, if fNew(i,j) depended on f(i+1,j), for example, you would not overwrite values of the current time level while they are still needed.
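As a quick numerical check of the update f(t+1) = f(t) + f(t)*Tr(f^2), here is a minimal Python sketch of the same structure (compute f^2, take its trace, build the new matrix, then replace the old one). It uses plain real numbers instead of complex ones, and the helper names matmul, trace and step are chosen for this illustration only.

```python
def matmul(a, b):
    """Plain triple-loop matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    """Sum of the diagonal elements; note the single index."""
    return sum(a[i][i] for i in range(len(a)))

def step(f):
    """One time step: returns a new matrix, leaving f untouched."""
    tr = trace(matmul(f, f))
    return [[x + tr * x for x in row] for row in f]

# f(t=0) = 0.2 * identity, as in the question
f = [[0.2 if i == j else 0.0 for j in range(3)] for i in range(3)]
history = [f]
for _ in range(3):
    f = step(f)
    history.append(f)

for t, ft in enumerate(history):
    print(t, ft[0][0])
```

For t=1 the diagonal entries are 0.2 + 0.2*0.12 = 0.224, matching the hand calculation in the question; later steps follow from the same rule.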
I am trying to build a block tridiagonal matrix in Fortran. Now I have this piece of code that would deal with just the matrices that are placed in the main diagonal of the A_matrix, one new matrix for every step in i.
do i = gs+1, total_mesh_points
start_line = (3*i)-2
start_colu = (3*i)-2
final_line = (3*i)
final_colu = (3*i)
do ii = 1, 3
do jj = 1, 3
A_matrix(start_line:final_line,start_colu:final_colu) = &
impflux(ii,jj)
end do
end do
end do
Here my A_matrix(i,j) is a big matrix that will receive another three by three matrix (impflux) in its main diagonal. Note that for each step in i I will have a new impflux matrix that needs to be positioned in the main diagonal of the A_matrix.
I can't think of a simpler solution for this problem. How do people usually build block diagonal matrices in Fortran?
Here's one way to build a block tridiagonal matrix. I'm not sure that there is, outside some well-known libraries, a usual way. This is a program, I'll leave it up to you to turn it into a function.
PROGRAM test
USE iso_fortran_env
IMPLICIT NONE
INTEGER :: k ! submatrix size
INTEGER :: n ! number of submatrices along main diagonal
INTEGER :: ix ! loop index
! the submatrices, a (lower diagonal) b (main diagonal) c (upper diagonal)
REAL(real64), DIMENSION(:,:,:), ALLOCATABLE :: amx, bmx, cmx
! the block tridiagonal matrix
REAL(real64), DIMENSION(:,:), ALLOCATABLE :: mat_a
k = 3 ! set these values as you wish
n = 4
ALLOCATE(amx(n-1,k,k), bmx(n,k,k), cmx(n-1,k,k))
ALLOCATE(mat_a(n*k,n*k))
mat_a = 0.0
! populate these as you wish
amx = 1.0
bmx = 2.0
cmx = 3.0
! first the lower diagonal
DO ix = 1,k*(n-1),k
mat_a(ix+k:ix+2*k-1,ix:ix+k-1) = amx(CEILING(REAL(ix)/REAL(k)),:,:)
END DO
! now the main diagonal
DO ix = 1,k*n,k
mat_a(ix:ix+k-1,ix:ix+k-1) = bmx(CEILING(REAL(ix)/REAL(k)),:,:)
END DO
! finally the upper diagonal
DO ix = 1,k*(n-1),k
mat_a(ix:ix+k-1,ix+k:ix+2*k-1) = cmx(CEILING(REAL(ix)/REAL(k)),:,:)
END DO
END PROGRAM test
Be warned, there's no error checking here at all and I've only made a few tests.
One obvious alternative would be to loop over the rows of mat_a only once, inserting amx, bmx, cmx at the same iteration, but this would require special handling for the first and last iterations and probably look a lot more complicated. As for performance, if it matters to you run some tests.
Note also that this produces a dense matrix. If your matrix gets very large then an approach which stores only the blocks on the three diagonals might be more useful. That takes us towards derived types and operations on them, and that's a whole other question.
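For comparison, the single-pass alternative mentioned above can be sketched as follows. This is a Python illustration with 0-based indices, not a translation of the Fortran program; block_tridiagonal and place are hypothetical names, and the blocks are passed as lists of k x k lists.

```python
def block_tridiagonal(lower, diag, upper):
    """Assemble a dense block tridiagonal matrix.

    diag holds n blocks of size k x k; lower and upper hold n-1 blocks each.
    """
    n = len(diag)
    k = len(diag[0])
    mat = [[0.0] * (n * k) for _ in range(n * k)]

    def place(block, row0, col0):
        # Copy one k x k block with its top-left corner at (row0, col0)
        for r in range(k):
            for c in range(k):
                mat[row0 + r][col0 + c] = block[r][c]

    for b in range(n):
        place(diag[b], b * k, b * k)
        if b < n - 1:  # the last block row has no neighbours further down/right
            place(lower[b], (b + 1) * k, b * k)
            place(upper[b], b * k, (b + 1) * k)
    return mat

# Same test values as the Fortran program: k = 3, n = 4
k, n = 3, 4
const = lambda v: [[v] * k for _ in range(k)]
mat = block_tridiagonal([const(1.0)] * (n - 1), [const(2.0)] * n, [const(3.0)] * (n - 1))
```

Handling the edge with one condition inside the loop keeps the special casing small, at the cost of a test on every iteration.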
Does somebody know an algorithm for endpoint detection in continuous speech? I can't find one; the existing algorithms are for isolated words, not continuous speech. Please help. If possible, MATLAB source code would be helpful.
this is my algorithm
index1=[];
for i=1:length(spektral)
if abs(spektral(i))> 0.025
y(i)=spektral(i);
index1=[index1 i];
else y(i)=0;
end
end
spasi=[];
for i=2:length(index1)-1
if index1(i)>(index1(i-1)+1)
spasi=[spasi ; index1(i-1) index1(i)]; % determine the gaps between words
end
end
The first loop can be omitted completely:
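y = spektral .* (abs(spektral)>0.025);
[row,col] = find(y);
This gives the same y as your loop. Note that the third output of find applied to a logical expression such as spektral>0.025 is just a vector of ones, not the values of spektral, which is why y is built from the mask directly. Depending on the shape of spektral, either row or col will contain your index1: if spektral is a column vector it will be row; if spektral is a row vector it will be col.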
The second loop you can omit as well:
[row,col,val] = find(index1(2:end,:)>index1(1:end-1,:)+1);
Note that index1 has to be a column vector here (use index1(:) if necessary), taken from either row or col as output by the first find command.
If I understand correctly, you want to have the spectral energy below the threshold to be considered as noise and want to have more than four seconds of this spectral energy below the threshold to classify it as a silence. In that case:
[row,col,val] = find(spektral<0.025);
tmp = cummin(row); % use cummin(col) if spektral is a row vector
Here I always struggle to find a short, vectorised way to count consecutive ones in a column; I'll add it when I find the solution.
You can do this with a nested while loop, but there's bound to be a vectorised way:
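kk = 1;
while kk<length(tmp)-1
silence1 = 0;
% == compares; a single = is an assignment and a syntax error here
while kk<length(tmp) && tmp(kk) == tmp(kk+1)
silence1 = silence1+1; % Sum the length of each silence
kk = kk+1;
end
silence(kk) = silence1;
kk = kk+1; % step past this run so the outer loop terminates
end
silence(silence==0)=[]; % Remove zero entries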
TotalSilences = (sum(silence>4)); % Find the total number of silences
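The run-counting idea can also be written as a single pass over the samples. The sketch below is in Python rather than MATLAB and rests on assumptions not in the question: silence is taken to mean abs(sample) below the threshold, and min_len is a minimum run length in samples (convert from seconds with your sampling rate). The name silent_runs is made up for this example.

```python
def silent_runs(samples, threshold, min_len):
    """Return (start, end) index pairs of runs where abs(sample) < threshold
    for at least min_len consecutive samples (indices are 0-based, inclusive)."""
    runs = []
    start = None  # index where the current quiet run began, if any
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # A quiet run may extend to the end of the signal
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples) - 1))
    return runs

signal = [0.5, 0.0, 0.01, 0.0, 0.3, 0.0, 0.4, 0.0, 0.0, 0.0]
runs = silent_runs(signal, threshold=0.025, min_len=3)
print(runs)
```

The boundaries of the returned runs are candidate endpoints between words; the short quiet stretch of length one in the example is ignored because it is below min_len.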
I am trying to write a function in Fortran that multiplies a number of matrices with different weights and then adds them together to form a single matrix. I have identified that this process is the bottleneck in my program (this weighting will be made many times for a single run of the program, with different weights). Right now I'm trying to make it run faster by switching from Matlab to Fortran. I am a newbie at Fortran so I appreciate all help.
In Matlab the fastest way I have found to make such a computation looks like this:
function B = weight_matrices()
n = 46;
m = 1800;
A = rand(n,m,m);
w = rand(n,1);
tic;
B = squeeze(sum(bsxfun(@times,w,A),1));
toc;
The line where B is assigned runs in about 0.9 seconds on my machine (Matlab R2012b, MacBook Pro 13" retina, 2.5 GHz Intel Core i5, 8 GB 1600 MHz DDR3). It should be noted that for my problem, the tensor A will be the same (constant) for the whole run of the program (after initialization), but w can take any values. Also, typical values of n and m are used here, meaning that the tensor A will have a size of about 1 GB in memory.
The clearest way I can think of writing this in Fortran is something like this:
pure function weight_matrices(w,A) result(B)
implicit none
integer, parameter :: n = 46
integer, parameter :: m = 1800
double precision, dimension(n), intent(in) :: w
double precision, dimension(n,m,m), intent(in) :: A
double precision, dimension(m,m) :: B
integer :: i
B = 0
do i = 1,n
B = B + w(i)*A(i,:,:)
end do
end function weight_matrices
This function runs in about 1.4 seconds when compiled with gfortran 4.7.2, using -O3 (function call timed with "call cpu_time(t)"). If I manually unwrap the loop into
B = w(1)*A(1,:,:)+w(2)*A(2,:,:)+ ... + w(46)*A(46,:,:)
the function takes about 0.11 seconds to run instead. This is great and means that I get a speedup of about 8 times compared to the Matlab version. However, I still have some questions on readability and performance.
First, I wonder if there is an even faster way to perform this weighting and summing of matrices. I have looked through BLAS and LAPACK, but can't find any function that seems to fit. I have also tried to put the dimension in A that enumerates the matrices as the last dimension (i.e. switching from (i,j,k) to (k,i,j) for the elements), but this resulted in slower code.
Second, this fast version is not very flexible, and actually looks quite ugly, since it is so much text for such a simple computation. For the tests I am running I would like to try to use different numbers of weights, so that the length of w will vary, to see how it affects the rest of my algorithm. However, that means a quite tedious rewrite of the assignment of B every time. Is there any way to make this more flexible, while keeping the performance the same (or better)?
Third, the tensor A will, as mentioned before, be constant during the run of the program. I have set constant scalar values in my program using the "parameter" attribute in their own module, importing them with the "use" expression into the functions/subroutines that need them. What is the best way to do the equivalent thing for the tensor A? I want to tell the compiler that this tensor will be constant, after init., so that any corresponding optimizations can be done. Note that A is typically ~1 GB in size, so it is not practical to enter it directly in the source file.
Thank you in advance for any input! :)
Perhaps you could try something like
do k=1,m
do j=1,m
B(j,k)=sum( [ ( (w(i)*A(i,j,k)), i=1,n) ])
enddo
enddo
The square brackets are the newer form of the array constructor (/ ... /), which builds a rank-one array (vector). The argument of sum is an array of n elements, and sum adds them all up. This is precisely what your unwrapped code does (and it is not exactly equal to the do loop you have).
I tried to refine Kyle Vanos' solution.
Therefore I decided to use sum and Fortran's whole-array capabilities.
I don't know whether the results are correct, because I only looked at the timings!
Version 1: (for comparison)
B = 0
do i = 1,n
B = B + w(i)*A(i,:,:)
end do
Version 2: (from Kyle Vanos)
do k=1,m
do j=1,m
B(j,k)=sum( [ ( (w(i)*A(i,j,k)), i=1,n) ])
enddo
enddo
Version 3: (mixed-up indices, work on one row/column at a time)
do j = 1, m
B(:,j)=sum( [ ( (w(i)*A(:,i,j)), i=1,n) ], dim=1)
enddo
Version 4: (complete matrices)
B=sum( [ ( (w(i)*A(:,:,i)), i=1,n) ], dim=1)
Note: an array constructor [ ... ] always yields a rank-one array, so in Versions 3 and 4 sum(..., dim=1) reduces everything to a single scalar, which is then assigned to every element on the left-hand side. The values of B should be checked before relying on these two timings.
Timing
As you can see, I had to mix up the indices to get faster execution times. The third solution is really strange because the matrix number is the middle index, but this is necessary for memory-ordering reasons.
V1: 1.30s
V2: 0.16s
V3: 0.02s
V4: 0.03s
In conclusion, you can get a massive speedup if you are free to change the order of the array indices.
I would not hide any looping as this is usually slower. You can write it explicitly, then you'll see that the inner loop access is over the last index, making it inefficient. So, you should make sure your n dimension is the last one by storing A as A(m,m,n):
B = 0
do i = 1,n
w_tmp = w(i)
do j = 1,m
do k = 1,m
B(k,j) = B(k,j) + w_tmp*A(k,j,i)
end do
end do
end do
this should be much more efficient as you are now accessing consecutive elements in memory in the inner loop.
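The reason the index order matters is that Fortran stores arrays in column-major order, so the first index is the one that is contiguous in memory. The following Python sketch only computes element offsets to make that concrete; column_major_offset is a hypothetical helper and uses 0-based indices.

```python
def column_major_offset(idx, shape):
    """Linear offset (in elements) of a 0-based index tuple in a
    column-major array of the given shape, as Fortran lays it out."""
    offset = 0
    stride = 1
    for i, extent in zip(idx, shape):
        offset += i * stride
        stride *= extent  # the next index jumps over everything before it
    return offset

m, n = 1800, 46

# Layout A(m,m,n): stepping the first index moves by one element,
# stepping the last (weight) index moves by m*m elements.
print(column_major_offset((1, 0, 0), (m, m, n)))
print(column_major_offset((0, 0, 1), (m, m, n)))

# Layout A(n,m,m): now the weight index is the contiguous one.
print(column_major_offset((1, 0, 0), (n, m, m)))
print(column_major_offset((0, 1, 0), (n, m, m)))
```

Whichever index the innermost loop runs over should be the one with stride 1, which is why the loop nest and the storage layout have to be chosen together.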
Another solution is to use the level 1 BLAS subroutines _AXPY (y = a*x + y):
B = 0
do i = 1,n
CALL DAXPY(m*m, w(i), A(1,1,i), 1, B(1,1), 1)
end do
With Intel MKL this should be more efficient, but again you should make sure the last index is the one which changes in the outer loop (in this case the loop you're writing). The arguments for this call are listed in the MKL documentation.
EDIT: you might also want to use some parallelization? (I don't know if Matlab takes advantage of that)
EDIT2: In the answer of Kyle, the inner loop is over different values of w, which is more efficient than n times reloading B as w can be kept in cache (using A(n,m,m)):
B = 0
do i = 1,m
do j = 1,m
B(j,i)=0.0d0
do k = 1,n
B(j,i) = B(j,i) + w(k)*A(k,j,i)
end do
end do
end do
This explicit looping performs about 10% better than Kyle's code, which uses whole-array operations. Bandwidth with ifort -O3 -xHost is ~6600 MB/s, with gfortran -O3 it's ~6000 MB/s, and the whole-array version with either compiler is also around 6000 MB/s.
I know this is an old post, however I will be glad to bring my contribution as I played with most of the posted solutions.
Adding a local unroll of the weights loop (from Steabert's answer) gives me a small speed-up compared to the complete unroll version (from 10% to 80%, depending on the size of the matrices). The partial unrolling may help the compiler vectorize the 4 operations into one SSE instruction.
pure function weight_matrices_partial_unroll_4(w,A) result(B)
implicit none
integer, parameter :: n = 46
integer, parameter :: m = 1800
real(8), intent(in) :: w(n)
real(8), intent(in) :: A(n,m,m)
real(8) :: B(m,m)
real(8) :: Btemp(4)
integer :: i, j, k, l, ndiv, nmod, roll
!==================================================
roll = 4
ndiv = n / roll
nmod = mod( n, roll )
do i = 1,m
do j = 1,m
B(j,i)=0.0d0
k = 1
do l = 1,ndiv
Btemp(1) = w(k )*A(k ,j,i)
Btemp(2) = w(k+1)*A(k+1,j,i)
Btemp(3) = w(k+2)*A(k+2,j,i)
Btemp(4) = w(k+3)*A(k+3,j,i)
k = k + roll
B(j,i) = B(j,i) + sum( Btemp )
end do
do l = 1,nmod !---- process the rest of the loop
B(j,i) = B(j,i) + w(k)*A(k,j,i)
k = k + 1
enddo
end do
end do
end function
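The index bookkeeping of the partial unroll (ndiv full groups of 4, then nmod leftover terms) is easy to get wrong, so here is a minimal Python sketch of just that part for a single output element. It says nothing about speed, since Python will not vectorize anything; weighted_sum_unrolled is a name made up for this check.

```python
def weighted_sum_unrolled(w, a, roll=4):
    """Compute sum(w[k] * a[k]) in groups of 4 plus a remainder loop.

    The body is written out for roll = 4 only, mirroring the Fortran version.
    """
    assert roll == 4, "the unrolled body below is written for 4 terms"
    ndiv, nmod = divmod(len(w), roll)
    total = 0.0
    k = 0
    for _ in range(ndiv):
        # Four independent products, summed once per group
        total += (w[k] * a[k] + w[k + 1] * a[k + 1]
                  + w[k + 2] * a[k + 2] + w[k + 3] * a[k + 3])
        k += roll
    for _ in range(nmod):  # process the rest of the loop
        total += w[k] * a[k]
        k += 1
    return total

# n = 46 as in the question: 11 full groups and 2 leftover terms
w = [float(i) for i in range(46)]
a = [2.0] * 46
print(weighted_sum_unrolled(w, a))
```

With n = 46 the remainder loop handles the last two weights, which is the case a plain unroll by 4 would silently skip.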