I am using FFTW for FFTs, it's all working well but the optimisation takes a long time with the FFTW_PATIENT flag. However, according to the FFTW docs, I can improve on this by reusing wisdom between runs, which I can import and export to file. (I am using the floating point fftw routines, hence the fftwf_ prefix below instead of fftw_)
So, at the start of my main(), I have :
char wisdom_file[] = "optimise.fft";
fftwf_import_wisdom_from_filename(wisdom_file);
and at the end, I have:
fftwf_export_wisdom_to_filename(wisdom_file);
(I've also got error-checking to check the return is non-zero, omitted for simplicity above, so I know the files are reading and writing correctly)
After one run I get a file optimise.fft with what looks like ASCII wisdom. However, subsequent runs do not get any faster, and if I create my plans with the FFTW_WISDOM_ONLY flag, I get a null plan, showing that it doesn't see any wisdom there.
I am using 3 different FFTs (2 real to complex and 1 inverse complex to real), so have also tried import/export in each FFT, and to separate files, but that doesn't help.
I am using FFTW-3.3.3, I can see that FFTW-2 seemed to need more setting up to reuse wisdom, but the above seems sufficient now- what am I doing wrong?
Related
I have a sample code :
#include<iostream>
main()
{
int i = 10; //Line 1
std::cout<<" i : "<<i; //Line 2
}
I want to somehow insert another statement (lets say one more std::cout) between Line-1 and Line 2.
Direct way is to change the source code and add required line. But for my source code compilation takes lot of time, so i can't afford to change the code frequently. So i want an automatic way such that i will be able to execute any statement in any function from outside so that upon hitting that function it execute my newly given statement.
I am not sure if this is possible or not. But any direction in writing the original code in such a way that my requirement can be fulfilled would be helpful.
[for debugging prupose]
If you want new C++ code to be executed, it must first be compiled. I don't think you can avoid that. You can however try to reduce how long the compilation takes, through various practices such as using header guards and being selective with headers.
There is a lot you can do in gdb to modify the behaviour of your program when it hits a nonstop breakpoint. The print command can also be used to change values, eg print i=0 actually sets i to zero.
Just remember that all these changes and hacks need to be ported back into the source code and tested again! I have lost many excellent edits over the years doing inline hacks in running code, and then exiting without reviewing the changes.
After developing an elaborate TCL code to do smoothing based on Gabriel Taubin's smoothing without shape shrinkage, the code runs extremely slow. This is likely due to the size of unstructured grid I am smoothing. I have to use TCL because the grid generator I am using is Pointwise and Pointwise's "macro language" is TCL based. I'm still a bit new to this, but is there a way to run an external code from TCL where TCL sends the data to the software, the software runs the smoothing operation, and output is sent back to TCL to update the internal data inside the Pointwise grid generation tool? I will be writing the smoothing tool in another language which is significantly faster.
There are a number of options to deal with code that "runs extremely show". I would start with determining how fast it must run. Are we talking milliseconds, seconds, minutes, hours or days. Next it is necessary to determine which part is slow. The time command is useful here.
But assuming you have decided that more performance is necessary and you have some metrics for your current program so you will know if you are improving, here are some things to try:
Try to improve the existing code. If you are using the expr command, make sure your expressions are given to the command as a single argument enclosed in braces. Beginners sometimes forget this and the improvement can be substantial.
Use the critcl package to code parts of the program in "C". Critcl allows you to put "C" code directly into your Tcl program and have that code pulled out, compiled and loaded into your program.
Write a traditional "C" based Tcl extension. Tcl is very extensible and has a clean API for building extensions. There is sample code for extensions and source to many extensions is readily available.
Write a program to do the time consuming part of the job and execute it as a separate process and obtain the output back into your Tcl script. This is where the exec command comes in useful. Presumably you will have to write data out to some where the program can get it and read the output of the program back into your Tcl script. If you want to get fancy you can do two-way communications across a localhost TCP port. The set up in Tcl is quite simple. The "C" code in a program to do it is a bit more tedious, but many examples exist out on the Internet.
Which option to choose depends very much on how much improvement is required and the amount of code that must be improved. You haven't given us much idea what those things are in your case, so all I can offer is rather vague general solutions.
For a loadable module, you can write a Tcl extension. An example is here:
File Last Modified Time with Milliseconds Precision
Alternatively, just write your program to take input from a file. Have Tcl write the input data to the file, run the program, then collect the output from the external program.
I am preparing LaTex/Tex fragments with lua programs getting information from SQL request to a database (LuaSQL).
I wish I could see intermediate states of expansion for debugging purpose but also to control what has been brought from SQL requests and lua processings.
My dream would be for instance to see the code of my Latex page as if I had typed it myself manually with all the information given by the SQL requests and the lua processing.
I would then have a first processing with my lua programs and SQL request to build a valid and readable luaLatex code I could amend if necessary ; then I would compile again that file to obtain the wanted pdf document.
Today, I use a lua development environment, ZeroBrane Studio, to execute and test the lua chunk before I integrate it in my luaLatex code. For instance:
my lua chunk :
for k,v in pairs(data.param) do
print('\\def\\'..k..'{'..data.param[k]..'}')
end
lua print out:
\gdef\pB{0.7}
\gdef\pAinterB{0.5}
\gdef\pA{0.4}
\gdef\pAuB{0.6}
luaLaTex code :
nothing visible ! except that now I can use \pA for instance in the code
my dream would be to have, in the luaLatex code :
\gdef\pB{0.7}
\gdef\pAinterB{0.5}
\gdef\pA{0.4}
\gdef\pAuB{0.6}
May be a solution would stand in the use of the expl3 extension ? But since I am not familiar with it nor with the precise Tex expansion process, I prefer to ask you experts before I invest heavily in the understanding of this module.
Addition :
Pushing forward the reflection, a consequence would be that from a Latex code I get a Latex code instead of a pdf file for instance. This implies that we use only the first steps of the four TeX processors as described by Veijkhout in "TeX by Topic" : the input processor, the expansion processor (but with a controlled depth of expansion), not the execution processor nor the visual processor. Moreover, there would be need to show the intermediate state, that means a new processor able to show tokens back into readable strings and correct Tex/Latex code that can be processed later.
Unless somebody has already done or seen something like that, I feel that my wish may be unfeasible in the short and middle terms. What is your feeling, should I abandon any hope ?
I have a C program, which I compiled with the -g options, and then runned with:
valgrind --tool=cachegrind --branch-sim=yes ./myexecutable
This let me know which function contains a bottleneck. However this is a pretty long function and it is not clear to me from which part of the function I get most of the cache missings. I cannot (do not want to) divide it in two different parts.
Is there a way (maybe including a valgrind.h or with some magical #pragma stuff), to instruct Valgrind to make different statistics for different parts of a function?
To check function-by-function values, you have probably used cg_annotate like this:
cg_annotate cachegrind.out.1234
If you add the "--auto=yes" flag to that command, the values will be displayed for each line:
cg_annotate --auto=yes cachegrind.out.1234
You can print the result into a file so that you can search for your function. Note that only lines and functions that have a major performance impact will be displayed, so if you can't find certain lines they likely have a negligible impact on the execution.
I am making an app with a TON of features. My problem is that applescript seems to have a cut-off point. After a certain number of lines, the script stopps working. It basically only works until the end. Once it gets to that point it stops. I have moved the code around to make sure that it is not an error within the code. Help?
I might be wrong, but I believe a long script is not a good way to put your code.
It's really hard to read, to debug or to maintain as one slight change in a part can have unexpected consequences at the other part of you file.
If your script is very long, I suggest you break your code in multiple parts.
First, you may use functions if some part of the code is reused several times.
Another benefit of the functions is that you can validate them separately from the rest of the execution code.
Besides, it makes your code easier to read.
on doWhatYouHaveTo(anArgument)
say "hello!"
end doWhatYouHaveTo
If the functions are used by different scripts, you may want to have your functions in a seperate library that you will call at need.
set cc to load script alias ((path to library folder as string) & "Scripts:Common:CommonLibrary.app")
cc's doWhatYouHaveTo(oneArgument)
At last, a thing that I sometimes do is calling a different script with some arguments, if a long code fits for slightly different purposes:
run script file {mainFileName} with parameters {oneWay}
This last trick has a great yet curious benefit : it can accelerate the execution time for a reason I never explained (and when I say accelerate, I say reduce execution time by 17 or so for the very same code).