I'm trying to optimize FFTW plans for a 65536x65536 matrix.
When I run the fftwf-wisdom tool:
fftwf-wisdom -n -m -T 12 -o wisdomfile.fftw rob65536x65536
it fails with:
bench: util.c:217: assertion failed: p
I believe this is caused by the large matrix size.
What is the maximum supported input size?
Is there any workaround?
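For scale, the arithmetic behind the size suspicion can be sketched as follows. This is only a back-of-the-envelope estimate, not a statement about FFTW's actual limits. It assumes 4 bytes per single-precision real value (the fftwf build), 8 bytes per single-precision complex value, and uses the range of a 32-bit signed int purely as a comparison point. Whether either figure is what triggers the assertion is not confirmed here.

```python
# Rough size estimate for a 65536 x 65536 single-precision transform.
# Assumptions (not taken from FFTW's documentation): 4-byte real samples,
# 8-byte complex samples, and a 32-bit signed int as the comparison point.
ROWS = COLS = 65536

points = ROWS * COLS              # total number of samples
real_bytes = points * 4           # one array of float32 values
complex_bytes = points * 8        # one array of (float32, float32) pairs
INT32_MAX = 2**31 - 1

print(f"points            : {points}")
print(f"real array        : {real_bytes / 2**30:.0f} GiB")
print(f"complex array     : {complex_bytes / 2**30:.0f} GiB")
print(f"fits in 32-bit int: {points <= INT32_MAX}")
```

If the machine has less free memory than one such array needs, a failed allocation would be one plausible explanation, and trying a smaller size first would help to confirm it.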
make -j$(nproc)
When compiling GCC from source, I saw a step that uses -j to match the number of available cores. Must the argument be equal to the number of cores? Does it do any harm if it is less than the actual number of cores? For example, if I have 8 cores, but I use:
make -j 4
Any consequences?
I'm using the gfortran compiler and trying to do parallel programming without MPI. Even though I have spent a lot of time reading about Fortran, gfortran and parallel programming, I haven't managed to use several processors at the same time.
My goal is to write a matrix multiplication that runs on several processors to reduce the run time. I have many ideas for doing that, but first of all I have to get the processors working in parallel. Even when I run existing example code, my computer reports only one image. For example:
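program hello_image
  implicit none
  integer :: a
  ! Each image reports its own index and the total number of images.
  write(*,*) "Hello from image ", this_image(), &
             "out of ", num_images(), " total images"
  ! Wait for input so the output stays visible.
  read(*,*) a
end program hello_image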
This is a very simple program taken from a PDF about parallel programming in Fortran. It should give the output:
Hello from image 1 out of 8
Hello from image 2 out of 8
Hello from image 3 out of 8
Hello from image 4 out of 8
Hello from image 5 out of 8
Hello from image 6 out of 8
Hello from image 7 out of 8
Hello from image 8 out of 8
But my program only gives the output:
Hello from image 1 out of 1.
I use gfortran as a compiler with the command
gfortran "codename" -fcoarray=single
I have spent far too much time on this "probably simple problem", but I just couldn't solve it.
This is the output I get when I try -fcoarray=lib. That's why I was using -fcoarray=single: it is the only option that produces an executable. What should I do to solve this? Thank you for your help.
/tmp/ccvnPvRc.o: In function `MAIN__':
hew.f08:(.text+0x62): undefined reference to `_gfortran_caf_this_image'
hew.f08:(.text+0xa8): undefined reference to `_gfortran_caf_num_images'
/tmp/ccvnPvRc.o: In function `main':
hew.f08:(.text+0x175): undefined reference to `_gfortran_caf_init'
hew.f08:(.text+0x19f): undefined reference to `_gfortran_caf_finalize'
collect2: error: ld returned 1 exit status
Even though I installed Linuxbrew as described on the OpenCoarrays website, I still have the same issue. brew doctor says there is no problem at all, but when I use the line
gfortran hew.f08 -fcoarray=lib -lcaf_mpi
the same error appears. Should I use another package with another command line? Which package should I download, and how? How do I use gfortran to get an executable file? (I'm using Ubuntu.)
I used
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install)"
to install it. Then I used both of these lines to set the path:
PATH=/home/[username]/.linuxbrew/bin:$PATH
PATH=/home/linuxbrew/.linuxbrew/bin:$PATH
Now, when I use the line
gfortran hew.f08 -fcoarray=lib -lcaf_mpi
The output is:
/usr/bin/ld: cannot find -lcaf_mpi
collect2: error: ld returned 1 exit status
I'm attempting to profile an Android NDK 14b clang-based application with Google's simpleperf sampling profiler. The recorded callstack samples aren't actually unwound: only the top frame of the callstack seems to be recorded, so the profiling reports aren't very useful. I've specified -fno-omit-frame-pointer in most of the code, but this seems to make no difference.
What am I missing? Is there a more current profiler for Android NDK projects I should be using?
If you are doing frame-pointer-based unwinding (using the --call-graph fp option), please use the aarch64 architecture, because arm mixes arm and thumb code and can't be unwound well even if you use -fno-omit-frame-pointer everywhere.
If you are doing dwarf-based unwinding (using the -g or --call-graph dwarf option), -fno-omit-frame-pointer doesn't help, and you should use shared libraries containing debug info in the apk.
It is also possible that the unwinding stops at Java code. To unwind through Java code, you need to fully compile it into native code and use dwarf-based unwinding.
In any case, you can use app_profiler.py, which is included in the NDK r14b. It tries to handle the details for you: fully compiling the Java code and downloading libraries with debug info to the device. It is also easy to inspect and change if it doesn't work well in your environment.
There are some simpleperf options I've found I need to specify (or not specify) which seem to make it more likely that I get the expected call-graph.
If I specify '-a --cpu 1' for instance, then the binary I'm profiling won't even appear in the call graph.
For instance, if I run the following (where perf_test.x mostly spins for 1 second on CPU 1):
simpleperf record -g -a -e cpu-cycles --cpu 1 ./perf_test.x -C 1 -w bw -t 1
simpleperf report -g caller
then perf_test.x won't appear at all (for me) in the output.
So drop the --cpu x option if you are using it.
Also, a high sampling rate increases the overhead. The run below uses the (current) default sampling rate of 4000 samples/sec.
simpleperf record -g -a -e cpu-cycles -F 4000 ./perf_test.x -C 1 -w bw -t 1
simpleperf report -g caller
The report above shows simpleperf as the top process, using 40-70% of the samples.
Reducing the sampling rate:
simpleperf record -g -a -e cpu-cycles -F 1000 ./perf_test.x -C 1 -w bw -t 1
simpleperf report -g caller
This brought perf_test.x up to the top percentage of total samples, with the first simpleperf entry coming in at 24% of total samples.
Hope this is helpful.
I have used a makefile to build my code and I have produced an ELF file.
To make it understandable for my attiny85, I usually use avr-objcopy -O ihex -R .eeprom -R .fuse main.elf main_all.hex. I get a hex file containing fuse settings. I flash the hex file with avrdude -p t85 -c avrispmkII -P usb -U flash:w:main_all.hex.
I am using an avrispmkII connected via a working and tested SPI.
This time I got an error.
ERROR: address 0x820003 out of range
I guess the problem is that I've played with the fuses in the code. According to Contiki compile error, " ERROR: address 0x820003 out of range at line 1740 of...",
I've noticed that you can make avr-objcopy create a hex file without the fuses.
avr-objcopy -O ihex -R .eeprom -R .fuse main.elf main_ohne.hex
This worked, and the attiny85 can now be flashed completely normally.
Now the real question.
How do I still get the fuses on the attiny85?
Is there any way to see which fuses I am setting, and to what values, before I actually set them? I explicitly ask about "before" because I have no experience with 12V (HV) programming, and I'm not sure the avrispmkII can do it either (yes, I should check the datasheet to see whether it can).
My main concern is to get the fuses onto the attiny. I am a graduate electrical engineer who programs in my spare time, so I'm fine with links to further reading and the magic command.
(Rough translation from the German original)
You can set the fuse bytes on the avrdude command line.
There are only 3 fuse bytes on the attiny: low, high, and extended. They can be found on p. 148 of the datasheet.
Just compute the fuse setting as a hex number and include -U switches like
-U efuse:w:0xff:m -U hfuse:w:0x89:m -U lfuse:w:0x2e:m
for the extended, high, and low fuses.
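As a sketch of what "compute the fuse setting as a hex number" means: each fuse byte is just eight bits, and on AVR parts a programmed fuse reads as 0 while an unprogrammed one reads as 1. The bit names and positions below for the low fuse byte are an assumption recalled from the ATtiny85 datasheet and must be checked against it before flashing anything. fuse_byte and describe are hypothetical helpers, not part of avrdude.

```python
# Bit positions of the ATtiny85 low fuse byte.
# ASSUMED layout: verify against the datasheet before using the result.
LFUSE_BITS = {
    "CKDIV8": 7, "CKOUT": 6, "SUT1": 5, "SUT0": 4,
    "CKSEL3": 3, "CKSEL2": 2, "CKSEL1": 1, "CKSEL0": 0,
}

def fuse_byte(programmed, layout):
    """Return the byte value with the named fuses programmed (bit = 0)."""
    value = 0xFF                                  # start with every fuse unprogrammed
    for name in programmed:
        value &= ~(1 << layout[name]) & 0xFF      # clear the bit for this fuse
    return value

def describe(value, layout):
    """Map each fuse name to True if it is programmed (bit = 0) in `value`."""
    return {name: not (value >> bit) & 1 for name, bit in layout.items()}

# Example selection; which fuses you actually want is up to your design.
lfuse = fuse_byte(["CKDIV8", "SUT0", "CKSEL3", "CKSEL2", "CKSEL0"], LFUSE_BITS)
print(f"-U lfuse:w:0x{lfuse:02x}:m")
```

Running describe on a candidate value before passing it to avrdude is a cheap way to see which fuses it would program, which also addresses the "see it before I set it" part of the question.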
Can I measure the code size with the help of an fseek() function and store it to a shell variable?
Is it possible to extract the code size, compilation time and execution time using MILEPOST GCC or a GNU profiler tool? If yes, how can I store them in shell variables?
Since my aim is to find the best set of optimization techniques on the basis of compilation time, execution time and code size, I am looking for some function that can return these parameters.
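MyPgm=/root/Project/Programs/test.c
# -O1 (capital O) selects the optimization level; a lowercase -o names the output file
gcc -Wall -O1 -fauto-inc-dec "$MyPgm" -o output
# GNU time understands -f and -o; the shell's built-in time keyword does not
/usr/bin/time -f "%e" -o Output.log ./output
while read -r line
do
echo "$line"
Val=$line
done < Output.log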
This will store the execution time to the variable Val. Similarly, I want to get the values of code size as well as compilation time.
I would prefer a way to accomplish this without using an external program!
For code size on Linux, you can use the size command in a terminal.
$ size file-name.out
It gives the sizes of the different sections. Use the text section for code size. You can add data and bss if you want to account for global data size as well.
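A minimal sketch of pulling the text figure out of that output programmatically. It assumes the Berkeley-style layout that GNU size prints by default (a header line naming the columns text, data, bss, and so on, followed by one row per file). parse_size_text and code_size are hypothetical helper names, and other size implementations may format their output differently.

```python
import subprocess

def parse_size_text(output):
    """Return the 'text' column from Berkeley-format `size` output."""
    rows = [ln.split() for ln in output.strip().splitlines() if ln.strip()]
    header, first = rows[0], rows[1]          # column names, then the first file's row
    return int(first[header.index("text")])

def code_size(path):
    """Run `size` on an object file or executable and return its text size in bytes."""
    result = subprocess.run(["size", path], capture_output=True, text=True, check=True)
    return parse_size_text(result.stdout)
```

Called as code_size("./output"), this returns the value as an integer. Printing it from a small script lets a shell capture it with command substitution.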
You can use the size(1) command http://www.linuxmanpages.com/man1/size.1.php
Or open the ELF file, walk over the section headers, and sum the sizes of all sections of type SHT_PROGBITS that have the SHF_EXECINSTR flag set.
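The section-header walk can be sketched roughly like this. To stay short it handles only 64-bit little-endian ELF files and ignores the extended section-count case (where e_shnum is 0), so treat it as an illustration of the idea rather than a complete reader. exec_code_size is a hypothetical name.

```python
import struct

SHT_PROGBITS = 1       # section holds program-defined data
SHF_EXECINSTR = 0x4    # section contains executable instructions

def exec_code_size(data):
    """Sum sh_size over executable SHT_PROGBITS sections of a 64-bit LE ELF image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled in this sketch")
    # In the ELF64 header, e_shoff sits at offset 0x28,
    # e_shentsize at 0x3A and e_shnum at 0x3C.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x3A)
    total = 0
    for i in range(e_shnum):
        # Elf64_Shdr starts with: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
        _, sh_type, sh_flags, _, _, sh_size = struct.unpack_from(
            "<IIQQQQ", data, e_shoff + i * e_shentsize)
        if sh_type == SHT_PROGBITS and sh_flags & SHF_EXECINSTR:
            total += sh_size
    return total
```

For a real file, the bytes would come from something like open("output", "rb").read().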
On non-Linux / non-GNU-utils systems (where you may have neither GNU size nor readelf), the nm program can be used to dump symbol information (including sizes) from object files (libraries / executables). The syntax is slightly system-dependent:
OpenGroup manpage for nm (the "portable subset")
Linux/BSD manpage for nm (GNU version)
Solaris manpage for nm
AIX manpage for nm
nm usage on HP/UX (this says "PA-RISC" but the utility is present / usable on Itanium)
Windows: Doesn't have nm as such, but see: Microsoft equivalent of the nm command
Unfortunately, while the utility is available almost everywhere, its output format is not as portable as could be, so some system-specific scripting is necessary.