I am using a gcc command recommended by Maxim which has the -fsingle-precision-constant option. (For the purposes of this question, assume I have to use that option, for compatibility with Maxim stuff. Although, I must say, things seem to work just fine without that option.) Is there a documented way to write a double-precision constant when using this option?
I have found that appending "L" to the constant seems to make it double-precision. For example:
double d = 0.123456789123456L;
But I did a quick Google search and did not find any documentation for this syntax. As KamilCuk points out, the "L" suffix is documented in the C spec. However, the gcc documentation seems to override it:
-fsingle-precision-constant causes floating-point constants to be loaded in single precision even when this is not exact.
To make sense of the option in the first place, we have to assume that the gcc documentation overrides the C spec. Therefore, in my reading, constants ending in "L" are not exempt from the demotion effected by -fsingle-precision-constant.
Is there a documented way to write a double-precision constant when using -fsingle-precision-constant?
Total := 60 + 10;
LD #60
ADD #10
ST Total
00101000 00111100
00111000 00001010
01100000 00101001
Going from the HLL code to the binary uses a translator. An A-level question asks which translator has been used, but "interpreter" is not accepted as an answer; only "compiler" is marked correct. Why is that? There isn't any other information.

Well, you haven't given the full wording of the question so I don't know why you think that, unless it is multiple choice or something.
But that data suggests they want 'compiler' as the answer because the first translation is from high level language to some sort of assembly, and then from assembly to binary code.
So only translation has been done; the code has not been executed yet, and therefore can't have been interpreted.
An interpreter may include such a translation as a first pass (though more usually it is done in one step), and that process may itself be called compilation: for example, translating a high-level language to byte-code.
For the program I'm working on, I'd like to limit the length of each compiled function, so as to provide a hard upper-bound on the distance1 required to reach a function boundary2. Is there an option in GCC or Clang (or really any compiler framework/toolchain) that will enable function splitting to do this? Or are there limitations that I'm not aware of preventing this?
1 Distance here defined as any discrete unit smaller than a function - i.e., number of instructions, number of basic blocks, number of grey hairs on Jon Skeet's head3, etc.
2 I'm defining function boundary as "location where a new stack frame is pushed on to the CPU's stack". To my understanding, this happens almost exclusively when a new function is called (except occasionally for leaf functions that don't themselves call other functions).
3 This is just a joke. We all know that Jon Skeet's hair doesn't turn grey - it just garbage collects and a new hair is instantiated, good as new.
I'm not aware of any compiler switch, but you don't need one. The size of symbols in the text segment is easily obtained with nm:
$ nm -AP a.out|awk '$3=="T" {print $2 " " $5}'
main 000000000000005b
Note that this requires an unstripped executable. Many nm implementations provide additional options, such as printing decimal numbers instead of hex, which makes comparing the sizes a little easier. Turning this into a script that outputs functions larger than X is left as an exercise :-)
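A sketch of that exercise (the 64-byte default threshold and the file name a.out are placeholders; the POSIX -t d option requests decimal sizes so no hex conversion is needed):

```shell
#!/bin/sh
# Print text-segment functions in a.out whose size exceeds a threshold (bytes).
limit=${1:-64}
nm -AP -t d a.out | awk -v max="$limit" '$3 == "T" && $5 + 0 > max { print $2, $5 }'
```

With -A and -P, each nm output line is "file: name type value size", so $3 is the symbol type and $5 the size, matching the awk fields used above.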
If I have the follow snippet in my header file:
#define banana 4
#define orange 2
#define fruit banana|orange
Is the compiler smart enough to just use 6 wherever "fruit" appears in the program?
I assume so, but I hate to assume. It seems idiotic that it would perform a bitwise OR between two constant numbers every time.
If so, ditto with other operators? e.g. banana * orange, etc
#define lines are directives to perform text substitution. This happens in a separate phase of compilation, called preprocessing. The name should hint that it happens before normal processing.
The compiler textually replaces #defined names with their definitions at a very early stage. In your example, it replaces banana with 4, orange with 2, and fruit with banana|orange, which then becomes 4|2. For the rest of the compilation it only sees 4|2, and deals with it exactly like any other constant expression.
Are compilers smart enough to deal with constant expressions intelligently? Well, compilers have been around for some 50 years, and they have dealt with constant expressions like this all that time. Rest assured they know constant folding quite well. If you doubt it, you can always look at the generated assembly language.
I'm new to Fortran and to gfortran. I learned that whole-array expressions are calculated in parallel, but I see that the calculation only takes place on one core of my computer.
I use the following code:
program prueba_matrices
implicit none
integer, parameter :: num = 5000
double precision, dimension(1:num,1:num) :: A, B, C
double precision, dimension (num*num) :: temp
integer :: i
temp = (/ (i/2.0, i=1,num*num) /)
A = reshape(temp, (/ num, num/) )
B = reshape(temp, (/ num, num/) )
C = matmul(A , B)
end program prueba_matrices
I compile like this:
gfortran prueba_matrices.f03 -o prueba_gfortran
And, watching the graphs produced in real time by gnome-system-monitor, I can see that there is only one core working. If I substitute the line with the calculation
C = matmul(A , B)
for
C = A * B
It yields the same behaviour.
What am I doing wrong?
GFortran/GCC does have some automatic parallelization features, see http://gcc.gnu.org/wiki/AutoParInGCC . They are frequently not that good, so they are not enabled at any of the -ON optimization levels; you have to select it specifically with -ftree-parallelize-loops=N, where N is the number of threads you want to use. Note however that in your example above a loop like A*B is likely constrained by memory bandwidth (for sufficiently large arrays), and thus adding cores might not help that much. Furthermore, the MATMUL intrinsic leads to an implementation in the gfortran runtime library, which is not compiled with the autopar options (unless you have specifically built it that way).
What could help your example code above more is to actually enable any optimization at all. With -O3 Gfortran automatically enables vectorization, which can be seen as a way to parallelize loops as well, although not over several cpu cores.
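Putting those two suggestions into commands (the flag names are from the GCC documentation; the thread count of 4 is an arbitrary choice):

```shell
# Auto-parallelize eligible loops across 4 threads:
gfortran -O2 -ftree-parallelize-loops=4 prueba_matrices.f03 -o prueba_autopar

# Or just enable -O3, which turns on vectorization (SIMD lanes on one core):
gfortran -O3 prueba_matrices.f03 -o prueba_o3
```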
If you want your call to matmul from gfortran to be multithreaded, the easiest way is to link to an external BLAS package that has been compiled with multithreading support. Candidates include OpenBLAS (née GotoBLAS), ATLAS, or commercial packages like Intel's MKL, AMD's ACML, or Apple's Accelerate framework.
So for instance, for this simple example:
program timematmult
real, allocatable, dimension(:,:) :: A, B, C
integer, parameter :: N = 2048
allocate( A(N,N) )
allocate( B(N,N) )
allocate( C(N,N) )
call random_seed
call random_number(A)
call random_number(B)
C = matmul(A,B)
print *, C(1,1)
deallocate(C)
deallocate(B)
deallocate(A)
end program timematmult
With the base matmul:
$ gfortran -o matmult matmult.f90
$ time ./matmult
514.38751
real 0m6.518s
user 0m6.374s
sys 0m0.021s
and with the multithreaded gotoblas library:
$ gfortran -o matmult matmult.f90 -fexternal-blas -lgoto2
$ time ./matmult
514.38696
real 0m0.564s
user 0m2.202s
sys 0m0.964s
Note in particular here that the real time is less than the user time, indicating multiple cores are being used.
I think that a key sentence in the course that you cited is "With array assignment there is no implied order of the individual assignments, they are performed, conceptually, in parallel." The key word is "conceptually". It isn't saying that whole array expressions are actually executed in parallel; you shouldn't expect more than one core to be used. For that, you need to use OpenMP or MPI (outside of Fortran itself) or the coarrays of Fortran 2008.
EDIT: Fortran didn't have actual parallel execution as part of the language until the coarrays of Fortran 2008. Some compilers might provide parallelization otherwise, and some language features make it easier for compilers to implement parallel execution (optionally). The sentence I cited from the web article states reality better than the portion you cite. Whole-array expressions were not intended to require parallel execution; they are a syntactic convenience for the programmer, making the language higher level, so that array operations can be expressed in single statements without writing do loops. In any case, no article on the web is definitive. Your observation of the lack of parallel execution shows which statement is correct. It does not contradict the Fortran language.