Erratic behavior with compiled legacy code using ifort - macos

How I wish I had a minimum working example for this!
I'm doing a bunch of linear algebra using the HSL libraries. I've turned on every debugging flag I can think of.
On my workstation, my "deterministic" code rarely runs to completion. Most of the time, it aborts with an indexing error:
On my workstation (Mac OS 10.7.5 and ifort 12):
forrtl: severe (408): fort: (3): Subscript #1 of the array W has value 0 which is less than the lower bound of 1
Image PC Routine Line Source
libintlc.dylib 0000000103C83E04 Unknown Unknown Unknown
libintlc.dylib 0000000103C8259E Unknown Unknown Unknown
libifcore.dylib 00000001031FBDA1 Unknown Unknown Unknown
libifcore.dylib 000000010316BA4E Unknown Unknown Unknown
libifcore.dylib 000000010316BFB3 Unknown Unknown Unknown
On my laptop (Mac OS 10.10.5 and ifort 16):
forrtl: severe (408): fort: (3): Subscript #1 of the array A has value 0 which is less than the lower bound of 1
Image PC Routine Line Source
libifcore.dylib 000000010ABDCC96 Unknown Unknown Unknown
Uniform2DSimplifi 00000001068851EE _ma48bd_ 1461 ma48d.f
Uniform2DSimplifi 000000010693619C _solve_sparse_mat 142 solve_sparse_matrix_d.f90
Uniform2DSimplifi 000000010693A7D8 _scale_and_solve_ 128 scale_and_solve_sparse_matrix_d.f90
Uniform2DSimplifi 000000010685740D _calc_simplified_ 598 calc_simplified_equations_B.f90
Uniform2DSimplifi 0000000106832176 _MAIN__ 161 uniform_2D_simplified_B.f90
Uniform2DSimplifi 000000010683175E Unknown Unknown Unknown
(You may notice that these are actually two different errors, even though I haven't changed a line of code between them.)
My code runs successfully ~70% of the time using the newer version of ifort on my laptop, but only ~20% of the time using the older version of ifort on my workstation. Oddly, the times that it does work are often after a fresh compilation, and after working one time, it gives that error every time after that. One time it worked, didn't work the second time, then worked the third time. (Sometimes on my laptop, it works for the first 2-3 runs, but throws an error the fourth time.)
My own code is entirely deterministic: it sets up and solves linear algebra problems. It also calls the HSL routines, which evidently call MKL. I would assume that both HSL and MKL are deterministic -- that is, identical inputs produce identical outputs. (They don't call RAND() or do file I/O....) Still, I'm not sure.
I looked up line 1461 of ma48d.f:
On my laptop, it's complaining because k has a value -1 (causing the error) while it normally has a value of 0 (leading to success). What's bizarre about this is that I'm giving these routines the exact same inputs, and the code appears to be deterministic, so they should execute the exact same lines...yet they don't.
My question:
What could be causing this erratic behavior?
So far, I've thought of the following possibilities:
Internal compiler error (Supporting evidence: the newer version of ifort seems to produce better--but even just different--results.)
Something related to the stack/heap (Supporting evidence: it works the first time, but not afterward.)
MKL (BLAS) is non-deterministic (not likely) (Supporting evidence: the traceback points to libintlc.dylib, which is an Intel library...possibly related to MKL?)
HSL is non-deterministic (likely??) (Supporting evidence: the code appears to be deterministic, though at least one of the errors is in this code.)
Something about my installation of ifort or my configuration of Mac OS X? (Supporting evidence: it's an old machine and something may have gone wrong somewhere?)

In my experience, compilers are far more reliable than programmers. That is, I would suspect the program of having a programming error unless it can be proved that bad code was generated.
This kind of error is almost certainly due to using an uninitialized value. Look for a variable that is not explicitly set to some value before being used.
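program x
integer :: i, arr(10)
! i is never initialized, so the loop starts from whatever value happens to be in memory
do while (i < 10)
   arr (i) = 0
   i = i + 1
end do
print *, arr
end program x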
Sometimes this code will set all the elements to zero. Other times it won't change a thing.
A directly related, but more subtle lack-of-initialization error occurs in this logic:
program y
integer :: sum, i, arrA(10), arrB(10)
real :: ave(2)
do i = 1, 10
   arrA(i) = i * 343
   arrB(i) = i * 121
end do
sum = 0
do i = 1, 10
   sum = sum + arrA(i)
end do
ave(1) = sum / 10.0
! BUG (intentional): sum is not reset to 0 here, so it still holds the arrA total
do i = 1, 10
   sum = sum + arrB(i)
end do
ave(2) = sum / 10.0
print *, 'Averages are', ave
end program y
No compiler warning will show up for failing to reinitialize sum, though this sort of error is reproducible and deterministic.
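For comparison, here is a minimal sketch of the second example with the accumulator reset before each pass. The names averages_fixed and total are my own (total avoids shadowing the SUM intrinsic); everything else follows the example above.

```fortran
program averages_fixed
  implicit none
  integer :: total, i, arrA(10), arrB(10)
  real :: ave(2)

  do i = 1, 10
     arrA(i) = i * 343
     arrB(i) = i * 121
  end do

  ! First average: start the accumulator from zero.
  total = 0
  do i = 1, 10
     total = total + arrA(i)
  end do
  ave(1) = total / 10.0

  ! Second average: reset the accumulator before reusing it.
  total = 0
  do i = 1, 10
     total = total + arrB(i)
  end do
  ave(2) = total / 10.0

  print *, 'Averages are', ave
end program averages_fixed
```

With the reset in place, the second average depends only on arrB.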

I cannot add a comment - hence the answer.
You can also try -ftrapuv (initializes stack variables to an unusual value). If you are using Intel 15 or higher you can set -init=snan, which initializes SAVEd variables to signaling NaN.


Can I add a watch to a COMMON block?

I have a very large, very old, very byzantine, very undocumented set of Fortran code that I am trying to troubleshoot. It is giving me divide-by-zero problems at run time due to a section that's roughly like this:
subroutine badsub(ainput)
implicit double precision (a-h,o-z)
include 'commonincludes2.h'
This code hits a divide-by-zero on the last line: y is equal to w because x is zero, and thus dlog(dsqrt(1)) is zero.
The include file looks something like this:
common /cblk/ r(12),z(12),otherstuff
There are actually 3 include headers with a /cblk/ declaration, which I found by running grep -in "/cblk/" *.h *.f *.F: "commonincludes.h", "commonincludes2.h", and "commonincludes3.h". As an added bonus, the section of memory corresponding to r and z is named x and y in "commonincludes.h", i.e. "commonincludes.h" looks like:
common /cblk/ x(12),y(12),otherstuff
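To illustrate why this aliasing makes the writes hard to grep for, here is a minimal sketch, assuming a cut-down /cblk/ with only the two arrays (no otherstuff). The program and subroutine names are hypothetical. A routine that sees the block as x and y changes what another routine sees as r and z:

```fortran
program common_alias_demo
  implicit none
  double precision :: r(12), z(12)
  common /cblk/ r, z          ! same storage, declared as r and z here

  r = 0.0d0
  z = 0.0d0
  call writer()               ! never mentions r or z by name
  print *, 'r(8) =', r(8), ' z(6) =', z(6)
end program common_alias_demo

subroutine writer()
  implicit none
  double precision :: x(12), y(12)
  common /cblk/ x, y          ! same storage, declared as x and y here

  x(8) = 1.5d0                ! lands in r(8) of the main program
  y(6) = 2.5d0                ! lands in z(6) of the main program
end subroutine writer
```

Searching the source for assignments to r or z would never find the writes in writer, which is why a search by name can come up empty.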
My problem is, I have NO IDEA where r and z are set. I've used grep to find every place where each of the headers is included, and I don't see any place where the variables are written to.
If I inspect the actual values of r and z in gdb where the error occurs, the values look reasonable: they're non-zero, not-garbage-looking vectors of real numbers. It's just that r(6) equals r(8) and z(6) equals z(8), and that is what causes the problem.
I need to find where z and r get written, but I can't find any instruction in the gdb documentation for attaching a watchpoint to COMMON block. How can I find where these are written to?
I think I have figured out how to do what I'm trying to do. Because COMMON variables are allocated statically, their addresses shouldn't change from run to run. Therefore, when my program stops due to my divide-by-zero error, I'm able to find the memory address of (in this example) r(8), which is global in scope and shouldn't change on subsequent runs. I can then re-run the code with a watchpoint on that address and it will flag when the value changes anywhere in the code.
In my example, the gdb session looks like this, with process names and directories filed off to protect the guilty:
Reading symbols from myprogram...
(gdb) r
Starting program: ************
Program received signal SIGFPE, Arithmetic exception.
0x00000000004df96d in badsub (ainput=1875.0000521766287) at badsub.f:109
109 v=2./dlog(dsqrt(w/y))
(gdb) p &r(8)
$1 = (PTR TO -> ( real(kind=8) )) 0xcbf7618 <cblk_+56>
(gdb) watch *(double precision *) 0x0cbf7618
Hardware watchpoint 1: *(double precision *) 0x0cbf7618
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: *************
Hardware watchpoint 1: *(double precision *) 0x0cbf7618
Old value = 0
New value = 6.123233995736766e-17
0x00007ffff6f2be2d in __memmove_avx_unaligned_erms () from /lib64/
I have confirmed from running a backtrace that this is indeed a place (presumably the first place) where my common block variable is being set.

Can I make GhostScript use more than 2 GB of RAM?

I'm running a 64-bit version of GhostScript (9.50) on 64-bit processor with 16gb of RAM under Windows 7.
GhostScript returns a random-ish error message (it tells me that I have a typecheck error in the array operator) when I try to allocate one array too many, with the arrays totaling more than 2 GB of RAM.
To be clear, I am watching the memory usage grow in Windows Task Manager, not from within GhostScript.
I'd like to know why this is so.
More importantly, I'd like to know if I can override this behavior.
Edit: This code produces the error --
/TL 25000 def
/TL- TL 1 sub def
/G TL array def
0 1 TL- { dup == flush G exch TL array put }for
Here's the last bit of the messages I get:
Unrecoverable error: typecheck in array
Operand stack: --nostringval--
--- Begin offending input ---
/TL 25000 def /TL- TL 1 sub def /G TL array def 0 1 TL- { dup == flush G exch TL array put }for
--- End offending input ---
file offset = 0
gsapi_run_string_continue returns -20
The amount of RAM is almost certainly not the limiting factor, but it would help if you were to post the actual error message. It may be 'random-ish' to you, but it's meaningful to people who program in PostScript.
More than likely you've tripped over some other internal limit, for example the operand stack size but without seeing the PostScript program or the error message I cannot say any more than that. I can say that (64-bit) Ghostscript will happily address more than 2GB of RAM, I was running a file last week which had Ghostscript using 8.1GB.
Note that PostScript itself is basically a 32-bit language; while Ghostscript has extended many of the architectural limitations documented in the PostScript Language Reference Manual (such as 64K elements in arrays and strings) moving beyond 32-bit limits is essentially unspecified.
As to whether you can change the behaviour, that depends on exactly what the problem is, and I can't tell from what's here.
Here's a screenshot of Ghostscript running the test file to completion, along with the Task Manager display showing the amount of memory the process is using. Not shown is the vmstatus which I ran from the PostScript environment afterwards. This showed that Ghostscript thinks it's using 10,010,729,850 bytes from a maximum of 10,012,037,312. My calculator says that 9,562.8MB comes out at 10,027,322,572.8 bytes, so a pretty close match.
To answer the points in the comments this is (as you can probably tell) on a 64-bit Windows 10 installation with quite a lot of memory.
The difference is, almost certainly, something which has been fixed since the release of 9.52. The 9.52 64-bit binary does exit with a VMerror after (for me) 5360 iterations. Obviously trying to use vast amounts of PostScript memory (as opposed to, say, canvas memory) is not a common occurrence, not least because many PostScript interpreters simply won't allow it, so this doesn't get exercised much.
The Ghostscript Git repository is here if you want to go through the commits and try to figure out which one caused the change. You only have to go back to March this year, anything before about the 19th March would have been in 9.52.
Beyond simple curiosity, is there a reason to try to use up loads of memory in PostScript?

Maximum elements to loop over in fortran in linux and windows [duplicate]

I am writing some parallel Fortran90/95 code and I just came across some thing I can't understand.
I work on a Toshiba laptop with 6 GB of RAM.
In Windows 10, I use code::blocks. I have imported gfortran from MinGW as a compiler and compile my code with the -fopenmp flag.
I have Ubuntu 18.04 inside VirtualBox. I let it use half of my RAM, that is 3 GB. I compile my code on this one using gfortran -fopenmp as well.
A minimal version of the encountered code causing issue is:
program main
implicit none
integer :: i
integer, parameter :: n=500000
real, dimension(n) :: A, B
real :: som
som = 0.0   ! initialize the accumulator before summing
do i =1, n
   A(i)= 1.0
   B(i)= 2.0
end do
do i=1, n
   som = som + A(i)*B(i)
end do
print *,"somme:", som
end program main
I then varied the value of the parameter n.
Running on Windows: for n up to approx. 200,000 everything's fine. Above that, I get "Process returned -1073741571 (0xC00000FD)".
Running on Ubuntu: I can go up to 1,000,000 with no issue. The barrier seems to be around 2,000,000, after which I get a segfault.
My question is: how can one explain that Ubuntu, in spite of having far less memory available, can handle 10 times more iterations?
Is there anything I can do on the Windows side to make it able to handle more loop iterations?
Following Rodrigo Rodrigues' comment, I added one more flag to my compiler settings:
Documentation says default is 32767 but I assume there is a different setting in code blocks and in ubuntu's native gfortran.
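program main
implicit none
real,allocatable :: A(:),B(:)
real :: som
integer :: i
integer, parameter :: n=500000
! allocatable arrays live on the heap and must be allocated before use
allocate(A(n), B(n))
som = 0.0
do i =1, n
   A(i)= 1.0
   B(i)= 2.0
end do
do i=1, n
   som = som + A(i)*B(i)
end do
print *,"somme:", som
deallocate(A, B)
end program main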
Replacing
real, dimension(n) :: A, B
with
real,allocatable :: A(:),B(:)
solved the problem, please check.
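A rough way to see why the fixed-size version fails is to compute how much storage A and B need if the compiler places them on the stack. The sketch below is only an estimate under that assumption; the program name and the variables probe, bytes_per_real and total_bytes are hypothetical, and the actual stack limit depends on the linker and OS settings.

```fortran
program stack_footprint
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: n = 500000      ! same n as in the question
  real :: probe                         ! only used to query the size of a default real
  integer(i8) :: bytes_per_real, total_bytes

  ! storage_size returns the size in bits; convert to bytes.
  bytes_per_real = storage_size(probe) / 8

  ! Two arrays (A and B) of n default reals each.
  total_bytes = 2_i8 * int(n, i8) * bytes_per_real

  print '(a,i0,a)', 'A and B together need ', total_bytes, ' bytes'
  print '(a,f0.2,a)', 'That is ', real(total_bytes) / 1048576.0, ' MiB'
end program stack_footprint
```

The Windows exit code 0xC00000FD is STATUS_STACK_OVERFLOW, which is consistent with the arrays outgrowing the stack. A larger default stack limit on Linux would explain why the same code survives a larger n there before it segfaults.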

How can I debug a Fortran READ/WRITE statement with an implicit DO loop?

The Fortran program I am working is encountering a runtime error when processing an input file.
At line 182 of file ../SOURCE_FILE.f90 (unit = 1, file = 'INPUT_FILE.1')
Fortran runtime error: Bad value during integer read
Looking to line 182 I see a READ statement with an implicit/implied DO loop:
182: READ(IT4, 310 )((IPPRM2(IP,I),IP=1,NP),I=1,16) ! read 6 integers
183: READ(IT4, 320 )((PPARM2(IP,I),IP=1,NP),I=1,14) ! read 5 reals
Format statement:
310 FORMAT(1X,6I12)
When I reach this code in the debugger NP has a value of 2. I has a value of 6, and IP has a value of 67. I think I and IP should be reinitialized in the loop.
My problem is that when I try to step through in the debugger once I get to the READ statement it seems to execute and then throw the error. I'm not sure how to follow it as it reads. I tried stepping into the function, but it seems like that may be a difficult route to take since I am unfamiliar with the gfortran library. The input file looks OK, I think it should be read just fine. This makes me think this READ statement isn't looping as intended.
I am completely new to Fortran and implicit DO loops like this, but from what I can gather line 182 should read in 6 integers according to the format string #310. However, when I arrive NP has a value of 2 which makes me think it will only try to read 2 integers 16 times.
How can I debug this read statement to examine the values read into IPPARM as they are read from the file? Will I have to step through the Fortran library?
Any tips that can clear up my confusion regarding these implicit loops would be appreciated!
NOTE: I'm using gfortran/gcc and gdb on Linux.
Is there any reason you need specific formatting on the read? I would use READ(IT4, *) where feasible...
Later versions of gfortran support an unlimited repeat count in formats (see link).
Then it may be helpful to specify
310 FORMAT(*(1X,6I12))
Or for older compilers
310 FORMAT(1000(1X,6I12))
The variables IP and I are loop indices, so they are reinitialized by the implied DO loops. With NP=2 the first statement reads a total of 32 integers: NP helps determine the list of items to read, while the format determines how they are read. With "1X,6I12" they are read as 6 integers per line of the input file. Once the first 6 of the requested 32 integers have been read from a line/record, Fortran considers that line/record complete and advances to the next record.
With a format of "1X,6I12" the integers must be precisely arranged in the file. There should be a single blank, then the integers should each be right-justified in fields of 12 columns. If they get out of alignment you could get the wrong value read or a runtime error.
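To watch this behaviour without stepping into the runtime library, you can try the same kind of READ on a small internal file. This is a minimal sketch with hypothetical names and sizes (implied_do_demo, ipprm, records, np = 2 and 4 values of i instead of 16); it only mirrors the structure of the statement in the question.

```fortran
program implied_do_demo
  implicit none
  integer, parameter :: np = 2, ni = 4          ! 2 x 4 = 8 integers in total
  integer :: ipprm(np, ni)
  integer :: ip, i, k
  character(len=73) :: records(2)               ! 1X + 6*I12 = 73 columns per record

  ! Write 8 integers with the format 1X,6I12: 6 go to the first record,
  ! then the format is reused and the last 2 go to the second record.
  records = ' '
  write (records, '(1X,6I12)') (k, k = 1, 8)

  ! Same shape as the READ in the question: IP varies fastest, then I.
  read (records, '(1X,6I12)') ((ipprm(ip, i), ip = 1, np), i = 1, ni)

  do i = 1, ni
     print '(a,i0,a,2i6)', 'i = ', i, ':', (ipprm(ip, i), ip = 1, np)
  end do
end program implied_do_demo
```

Printing the array after the READ, as done here, is usually easier than following the read inside the debugger. In the real program the same can be done with a temporary PRINT right after line 182, or by adding IOSTAT= to the READ to see whether it fails.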

Stack overflow on subroutine call only when compiled with Intel Visual Fortran and fine when compiled by Compaq Visual Fortran

Using identical source files for a Fortran .dll I can compile them with Compaq Visual Fortran 6.6C or Intel Visual Fortran (IA-32). The problem is that the execution fails on the Intel binary, but works well with Compaq. I am compiling 32-bit on a Windows 7 64-bit system. The .dll calling driver is written in C#.
The failure message comes from the dreaded _chkstk() call when an internal subroutine is called (called from the .dll entry routine). (SO answer on chkstk())
The procedure in question is declared as (pardon the fixed file format)
SUBROUTINE SRF(den, crpm, icrpm, inose, qeff, rev,
& qqmax, lvtyp1, lvtyp2, avespd, fridry, luin,
& luout, lurtpo, ludiag, ndiag, n, nzdepth,
& unit, unito, ier)
INTEGER*4 lvtyp1, lvtyp2, luin, luout, lurtpo, ludiag, ndiag, n,
& ncp, inose, icrpm, ier, nzdepth
REAL*8 den, crpm, qeff, rev, qqmax, avespd, fridry
CHARACTER*2 unit, unito
and called like this:
CALL SRF(den, crpm(i), i, inose, qeff(i), rev(i),
& qqmax(i), lvtyp1, lvtyp2, avespd, fridry,
& luin, luout, lurtpo, ludiag, ndiag, n, nzdepth,
& unit, unito, ier)
with similar variable specifications except for crpm, qeff, rev and qqmax are arrays of which only the i-th elements is used for each SRF() call.
I understand possible stack issues if the arguments are more than 8kb in size, but in this case we have 7 x real(64) + 11 x int(32) + 2 x 2 x char(8) = 832 bits only in passed arguments.
I have worked really hard to move arguments (especially arrays) into a module, but I keep getting the same error
The disassembly from the Intel .dll is
The disassembly from the Compaq .dll is
Can anyone offer any suggestions on what is causing the SO, or how to debug it?
PS. I have increased the reserved stack space to hundreds of MB and the problem persists. I have tried skipping the chkstk() call in the disassembler but it crashes the program. The stack check starts from address 0x354000 and iterates down to 0x2D2000 where it crashes accessing a guard page. The stack bottom address is 0x282000.
You are shooting the messenger. The Compaq generated code also calls _chkstk(), the difference is that it inlined it. A common optimization. The key difference between the two snippets is:
mov eax, 0D3668h
sub esp, 233E4h
The values you see used here are the amount of stack space required by the function. The Intel code requires 0xd3668 bytes = 865896 bytes. The Compaq code requires 0x233e4 = 144356. Big difference. In both cases that's rather a large amount but the Intel one is getting critical, a program normally has a one megabyte stack. Gobbling up 0.86 megabytes of it is pushing it very close, nest a couple of function calls and you're looking at this site's name.
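If you want to double-check the hex conversions, a few lines of Fortran will do it. This is just an illustrative sketch; the program and variable names are made up, and the 1 MiB figure is the usual default stack size assumed above rather than something read from the DLL.

```fortran
program frame_sizes
  implicit none
  integer :: intel_frame, compaq_frame

  ! Stack frame sizes taken from the two disassembly snippets.
  intel_frame  = int(z'D3668')
  compaq_frame = int(z'233E4')

  print '(a,i0,a)', 'Intel frame:  ', intel_frame, ' bytes'
  print '(a,i0,a)', 'Compaq frame: ', compaq_frame, ' bytes'

  ! Share of an assumed 1 MiB (1048576-byte) stack used by the Intel frame.
  print '(a,f0.2)', 'Intel frame / 1 MiB stack: ', real(intel_frame) / 1048576.0
end program frame_sizes
```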
What you need to find out, I can't help because it is not in your snippet, is why the Intel generated function needs so much space for its local variables. Workarounds are to use the free store to find space for large arrays. Or use the linker's /STACK option to ask for more stack space (guessing at the option name).
The problem wasn't at the function call where the stack overflow occurred.
Earlier in the code, some global matrices were initialized and placed on the stack. Due to a bug in the code, they were still in scope and had already almost filled the stack. When the function call happened, the compiler tried to store the return address on the stack and the program crashed.
The solution was to make the global matrices allocatable and to make sure the "Heap Arrays" option was set to an appropriate value.
Quite the rabbit hole this was, when it was 100% my buggy code that caused the issue.
