Currently I am working on performance improvements in a Delphi calculation module (a BPL). Over the past few days I found several slow code lines, and we improved execution time from 8 minutes to 3.
I found the slow lines by adding stopwatches all over the units, but making these changes is time-consuming:
procedure TCalculator.Execute;
begin
  TStopwatchWrapper.Start(1);
  CollectValues;
  TStopwatchWrapper.Pause(1);

  TStopwatchWrapper.Start(2);
  CalculateOrderLines;
  TStopwatchWrapper.Pause(2);
  // ...
  TStopwatchWrapper.ShowTimes;
end;

procedure TCalculator.CollectValues;
begin
  for {..} do
  begin
    {more timing}
  end;
end;
The calculation units are hundreds of lines long. I would love to be able to decorate a unit to find these slow code lines. I was thinking of Spring4D interceptors, but those only intercept external calls.
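Something like the following sketch (written from memory of the Spring.Interception API, so treat the exact names as assumptions) would time whole method calls, but it only fires at the proxy boundary, not on the lines inside a method:

uses
  System.Diagnostics, Spring.Interception;

type
  TTimingInterceptor = class(TInterfacedObject, IInterceptor)
  public
    procedure Intercept(const invocation: IInvocation);
  end;

procedure TTimingInterceptor.Intercept(const invocation: IInvocation);
var
  sw: TStopwatch;
begin
  sw := TStopwatch.StartNew;
  invocation.Proceed;  // the intercepted method itself
  Writeln(invocation.Method.Name, ': ', sw.ElapsedMilliseconds, ' ms');
end;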
Is it possible to measure cumulative execution time per line or per procedure, without adding stopwatch or profiling calls all over the code?
Thank you in advance.
Personally, I recommend using SamplingProfiler (the website is quite outdated, but it works fine with executables from the latest Delphi version; it just might not pick up the search paths automatically). It usually works flawlessly with a single executable (it needs either TD32 debug info or a map file) but not so well with additional runtime packages or DLLs.
For those cases, I recommend using Intel VTune (completely free to use). For it to work you need to produce .pdb files for any binary you want proper information for (otherwise it will just report raw addresses). That can be done with map2pdb.
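The rough workflow: have the linker emit a detailed map file (Project > Options > Linking > Map file = Detailed), then convert and bind it with map2pdb; the module name below is a placeholder, and the flag spelling is from the map2pdb README as I remember it, so double-check it there:

map2pdb -bind MyCalcModule.map

Then point VTune at the directory containing the binaries and the generated .pdb files.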
If you have an AMD CPU you might want to try AMD μProf. I have no personal experience with it, but from what I have heard it's not as good as VTune.
I have profiled my Rust code and see one processor-intensive function that takes a large portion of the time. Since I cannot break the function into smaller parts, I hope I can see which line in the function takes what portion of the time. Currently I have tried CLion's Rust profiler, but it does not have that feature.
It would be best if the tool ran on macOS, since I do not have a Windows/Linux machine (except via virtualization).
P.S. Visual Studio seems to have this feature, but I am using Rust: https://learn.microsoft.com/en-us/visualstudio/profiling/how-to-collect-line-level-sampling-data?view=vs-2017 . It says:
Line-level sampling is the ability of the profiler to determine where in the code of a processor-intensive function, such as a function that has high exclusive samples, the processor has to spend most of its time.
Thanks for any suggestions!
EDIT: With C++, I do see source-code-line-level information. For example, a toy program showed me that the "for" loop takes most of the time within the big function. But I am using Rust...
To get source code annotation in perf annotate or perf report, you need to compile with debug=2 in your Cargo.toml.
If you also want source annotations for standard-library functions, you additionally need to pass -Zbuild-std to cargo (requires nightly).
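For example (these are the standard Cargo profile keys; debug = 2 is equivalent to debug = true):

# Cargo.toml
[profile.release]
debug = 2    # emit full debug info even in release builds

Then record and inspect with perf record ./target/release/yourapp followed by perf annotate (or perf report, zooming into the hot function).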
Once compiled, "lines" of Rust do not exist. The optimiser does its job by completely reorganising the code you wrote and finding the minimal machine code that behaves the same as what you intended.
Functions are often inlined, so even measuring the time spent in a function can give incorrect results; and preventing inlining so that you can measure it changes the performance characteristics of your program.
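To illustrate the trade-off, this is what pinning a function out of line looks like; the attribute makes the function visible to a sampling profiler as a real call, but it also disables an optimisation the compiler may have wanted:

// Keep the function out-of-line so samplers can attribute time to it.
#[inline(never)]
fn squared_sum(data: &[f64]) -> f64 {
    data.iter().map(|x| x * x).sum()
}

fn main() {
    let v: Vec<f64> = (0..1_000_000).map(|i| i as f64).collect();
    println!("{}", squared_sum(&v));
}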
A while ago I wrote a fractal program in Lua. It works fine on my old Windows 7 and Windows 8 PCs.
I got a new laptop with Windows 10, and I run the program with LuaJIT 2.1.0, which had worked best for me in the past. Unfortunately, some runs with various parameters failed with no error or apparent cause. Then I noticed that even a successful run uses too much memory. I tried my best to track down the problem using various builds of LuaJIT from 2.0 to 2.1, but the only thing I can see is that the memory leak is somehow connected to FFI calls into the OpenCL API.
If someone is willing to take a look, I will provide the complete code of the program and the required API DLLs.
Attaching old demo Lua code that also leaks on Windows 10, based on and using: https://github.com/malkia/luajit-opencl
cl-demo.lua : https://github.com/LuaJIT/LuaJIT/files/4366334/cl-demo.txt
I'm sure my post was not worded correctly (English is not my first language) and maybe it's not in the right place. I didn't mean for someone to waste time debugging my cobbled-together old code, but rather to point me in the right direction, or suggest some easily available memory-error inspection tool, like Purify. I used to use that to find memory bugs; unfortunately it's no longer available for a single user.
Also, I now have a strong suspicion that something is overwriting memory after the calls to OpenCL; as a result, calls to os.time, math.random, etc. end in unexplained execution stops.
The following code starts to behave weirdly after the OpenCL calls: the ii value in the last loop's if statement suddenly takes a value like 0.13900852449513657!
clfns[1] = true;
for ii = 2, 34, 1 do clfns[ii] = false; end
for ii = 2, 34, 1 do
  if string.find(formula, c2fns[ii] .. "[(]") then
    for jj = 1, 4, 1 do
      if cdfns[ii][jj] then clfns[cdfns[ii][jj]] = true; end
    end
  end
end
for ii = 1, 34, 1 do
  if clfns[ii] then
    cFuns = cFuns .. cfns[ii];
  end
end
I have created a workaround for the various problems my program had running on Windows 10 and 8.1 with the latest updates from Microsoft.
First, to prevent LuaJIT's abrupt, unexplained terminations when running various versions of my Lua code, I rebuilt LuaJIT 2.1 under Microsoft Developer Studio 2008 with alternative optimization options (/Ox /Ot). I was able to use similar compile options under the 2015 and 2020 versions of MSDev, but had to add /guard:cf and /D_CRTDBG_MAP_ALLOC, resulting in slower execution (up to 32%) and still some very weird sporadic aberrations.
To combat the memory leak (over 100 MB per fractal image generated), I had to add collectgarbage() after every CL kernel program completion, and release, free, and recreate the memory buffer for every queue of results from the OpenCL code execution. This solves most of the memory problems, but slows execution by 19% to 41%, depending on the size and complexity of the formula being run.
Added code:
-- Blocking read of the kernel's results (CL_TRUE makes it synchronous)
clEnqueueReadBuffer(commands, output[jb], cl.CL_TRUE, 0, ressize, results, 0, nil, nil)
-- ...
-- Release the buffer explicitly instead of waiting for the GC
clReleaseMemObject(output[jbo]);
output[jbo] = nil;
-- Recreate the buffer; ffi.gc attaches free as a finalizer so the
-- allocation is reclaimed when the handle is collected
output[jbo] = ffi.gc(clCreateBuffer(context, cl.CL_MEM_WRITE_HOST_PTR, ressize), ffi.C.free);
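The garbage-collection half of the workaround is simply a full collection after each kernel completes; schematically (the wrapper name here is only a placeholder for the calls above):

for frame = 1, nframes do
  run_kernel_and_read(frame)   -- placeholder for the enqueue/read/release calls above
  collectgarbage("collect")    -- force a full GC cycle after every kernel completion
end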
There are still some memory issues when I run my program with OpenGL 3D output using the IUP libraries, mostly due to garbage collection not keeping up, but I can probably solve that by implementing four parallel threads sharing the same context and program but using separate kernels.
Still, any suggestions, explanations or corrections would be greatly appreciated.
I've been asked to tune the performance of a specific function which runs every time a worksheet is opened (so it's important that it doesn't make things slow). One of the things that seems to make this function slow is a long call to the database (which is remote), but there are a bunch of other possibilities too. So far, I've been stepping through the code and, when something seems to take a long time, making a note of it as a candidate for tuning.
I'd like a more objective way to tell which calls are slowing me down. Searching for timing and VBA yields a lot of results which basically amount to "write a counter, and start and stop it either side of the critical section" (often with the macro called explicitly). I was wondering whether there was a way (in the debugger) to do something like "step to the next line, and tell me the time elapsed".
If not, can someone suggest a reasonable macro that I could use from the Immediate window to get what I'm after? Specifically, I would like to be able to time an arbitrary line of code within a larger procedure (rather than a whole procedure at once, which is what I found through Google).
Keywords for your further search would be "profiler" and "VBA". I've heard of VB Watch and the VBA Code Profiler System (VBACP), as well as Stephen Bull's PerfMon; sparing the latter, they're mostly not free.
So far for the official part of my answer, and I'll toss in some extras in terms of maybe-useless suggestions:
Identifying "slow" code by human measurement (run a line and say "whoa, that takes forever") in the debugger is certainly helpful, and you can then start looking into why it's slow. Your remote database call may take quite long if it has to transmit a lot of data, in which case it may be a good idea to timestamp the data on both ends and ask the DB whether the data has been modified before you grab it.
Writing the data into the sheet may be slow depending on the way you write it, which can sometimes be improved by writing arrays to a range instead of using some form of iteration.
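For example, filling a Variant array and writing it in one assignment is usually much faster than assigning cell by cell (the sheet name and sizes here are made up):

Dim data As Variant, r As Long, c As Long
ReDim data(1 To 1000, 1 To 10)
For r = 1 To 1000
    For c = 1 To 10
        data(r, c) = r * c          ' fill from your recordset instead
    Next c
Next r
Sheet1.Range("A1").Resize(1000, 10).Value = data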
And I probably don't need to tell you about ScreenUpdating, EnableEvents and so on?
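As for timing an arbitrary line without a profiler, here is a crude but serviceable sketch (Timer only has roughly 10 ms resolution, and the procedure name is mine):

Public Sub TimeThis()
    Dim t As Single
    t = Timer
    ' paste the suspect line here, e.g. the remote database call
    Debug.Print "Elapsed:"; Timer - t; "seconds"
End Sub

You can then run TimeThis straight from the Immediate window.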
I am not a DBA. However, I work on a web application that lives entirely in an Oracle database (yes, it uses PL/SQL procedures to write HTML to CLOBs and then vomits the CLOB at your browser; no, it wasn't my idea; yes, I'll wait while you go cry).
We're having some performance issues, and I've been assigned to find some bottlenecks and remove them. How do I go about measuring Oracle performance and finding these bottlenecks? Our unhelpful sysadmin says that Grid Control wasn't helpful, and that he had to rely on "his experience" and queries against the data dictionary and v$ views.
I'd like to run some tests against my local Oracle instance and see if I can replicate the problems he found, so I can make sure my changes actually improve things. Could someone please point me in the direction of learning how to do this?
Not too surprisingly, there are entire books written on this topic.
Really what you need to do is divide and conquer.
The first thing is to ask yourself some standard common-sense questions. For example: has performance slowly degraded, or was there a big drop in performance recently?
After the obvious, a good starting point is to narrow down where to spend your time: the top queries are a decent start, as they will give you the particular queries that run for a long time.
If you know specifically which screens in your front-end are slow and which stored procedures go with them, I'd put in some logging: simple DBMS_OUTPUT.PUT_LINE calls with some wall-clock information at key points. Then I'd run those interactively in SQL Navigator to see which part of the stored procedure is going slow.
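A minimal sketch of that kind of logging (DBMS_UTILITY.GET_TIME returns elapsed time in hundredths of a second; the procedure name is made up):

DECLARE
  t0 PLS_INTEGER := DBMS_UTILITY.GET_TIME;
BEGIN
  render_order_page(42);  -- hypothetical slow step
  DBMS_OUTPUT.PUT_LINE('render_order_page: '
      || (DBMS_UTILITY.GET_TIME - t0) || ' cs');
END;
/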
Once you start narrowing it down, you can evaluate why a particular query is going slow. EXPLAIN PLAN will be your best friend to start with.
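The mechanics are two statements (DBMS_XPLAN is the standard way to read back the plan table; the query itself is a stand-in):

EXPLAIN PLAN FOR
  SELECT * FROM orders WHERE status = 'OPEN';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);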
It can be overwhelming to analyze database performance with Grid Control, and I would suggest starting with the simpler AWR report; you can find the scripts to generate one in $ORACLE_HOME/rdbms/admin on the DB host. The report ranks the SQL seen in the database by various categories (e.g. CPU time, disk I/O, elapsed time) and gives you an idea where the bottlenecks are on the database side.
One advantage of the AWR report is that it is a SQL*Plus script and can be run from any client; it will spool HTML or text files to your client.
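For reference, the interactive report script there is awrrpt.sql; from SQL*Plus (where ? expands to $ORACLE_HOME) it can be run as:

@?/rdbms/admin/awrrpt.sql

It prompts for the snapshot range and the output format.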
edit:
There's a package called DBMS_PROFILER that lets you do what you want, I think. I found out my IDE will profile PL/SQL code, as I would guess many other IDEs do; they probably use this package.
http://www.dba-oracle.com/t_dbms_profiler.htm
http://www.databasejournal.com/features/oracle/article.php/2197231/Oracles-DBMSPROFILER-PLSQL-Performance-Tuning.htm
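Driving it by hand is a short sketch; this assumes the profiler tables have been created with ?/rdbms/admin/proftab.sql, and the procedure under test is made up:

BEGIN
  DBMS_PROFILER.start_profiler('tuning run 1');
  my_slow_procedure;
  DBMS_PROFILER.stop_profiler;
END;
/

-- per-line totals end up in the plsql_profiler_* tables
SELECT u.unit_name, d.line#, d.total_occur, d.total_time
  FROM plsql_profiler_units u
  JOIN plsql_profiler_data  d
    ON u.runid = d.runid AND u.unit_number = d.unit_number
 ORDER BY d.total_time DESC;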
edit 2:
I just tried the profiler out in PL/SQL Developer. It creates a report of the total time and number of occurrences of code snippets during a run, and gives each code location as a unit name and line number.
original:
I'm in the same boat as you, as far as the crazy PL/SQL-generated pages go.
I work in a small office with no programmer particularly versed in the advanced features of Oracle. We don't have any established methods for measuring and improving performance, but the best bet, I'd guess, is to try out different PL/SQL IDEs.
I use PL/SQL Developer by Allround Automations. It's got testing functionality that lets you debug your PL/SQL code, and it may have some benchmarking feature I haven't used yet.
Hope you find a better answer. I'd like to know too. :)
"I work on a web application that
lives entirely in an Oracle database
(Yes, it uses PL/SQL procedures to
write HTML to clobs and then vomits
the clob at your browser"
Is it the APEX product? That's the web application environment now included as a standard part of the Oracle database (although technically it doesn't spit out CLOBs).
If so, there is a whole bunch of instrumentation already built into the product/environment (e.g. it keeps a rolling two-week history of activity).
I've been testing out the performance and memory profiler AQTime to see if it's worthwhile spending those big $$$ for it for my Delphi application.
What amazes me is how it can give you source-line-level performance tracing (including the number of times each line was executed and the amount of time that line took) without modifying the application's source code and without adding an inordinate amount of time to the debug run.
The way they do this so efficiently makes me think there might be some techniques/technologies used here that I don't know about and that would be useful to know about.
Do you know what kind of methods they use to capture execution line by line without code changes?
Are there other profiling tools that also do non-invasive line-by-line checking, and if so, do they use the same techniques?
I've made an open-source profiler for Delphi which does the same:
http://code.google.com/p/asmprofiler/
It's not perfect, but it's free :-). It also uses the detour technique.
It stores every call (you must manually set which functions you want to profile), so it can build an exact call-history tree, including a time chart (!).
This is just speculation, but perhaps AQTime is based on a technology that is similar to Microsoft Detours?
"Detours is a library for instrumenting arbitrary Win32 functions on x86, x64, and IA64 machines. Detours intercepts Win32 functions by re-writing the in-memory code for target functions."
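For illustration, the core of such a detour is a small patch. Here is a minimal 32-bit Delphi sketch, assuming the hook is reachable with a near jump; real libraries also copy the overwritten bytes into a trampoline so the original function can still be called:

uses Windows;

procedure PatchWithJump(Target, Hook: Pointer);
var
  OldProtect: DWORD;
begin
  // Make the first 5 bytes of the target function writable
  VirtualProtect(Target, 5, PAGE_EXECUTE_READWRITE, OldProtect);
  // Write JMP rel32: opcode $E9 followed by a signed 32-bit displacement
  PByte(Target)^ := $E9;
  PInteger(NativeUInt(Target) + 1)^ :=
    Integer(NativeInt(Hook) - NativeInt(Target) - 5);
  VirtualProtect(Target, 5, OldProtect, OldProtect);
  // Make sure the CPU doesn't execute stale instruction bytes
  FlushInstructionCache(GetCurrentProcess, Target, 5);
end;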
I don't know about Delphi in particular, but a C application debugger can do line-by-line profiling relatively easily: it can load the code, associate every code path with a block of code, then break on all the conditional jump instructions and just watch which path is taken. Debuggers like gdb can operate relatively efficiently because they work through the kernel and don't modify the code; they just get informed when each line is executed. If something causes a block to be exited early (longjmp), the debugger can hook that, figure out how far into the block execution got when it happened, and increment only those lines.
Of course, it would still be tough to code, but when I say "easily" I mean that you could do it without wasting time breaking on each and every instruction to update a counter.
The long-since-defunct TurboPower also had a great profiling/analysis tool for Delphi called Sleuth QA Suite. I found it a lot simpler than AQTime, and far easier to get meaningful results from. Might be worth trying to track down; eBay, maybe?