After an update in Debian 10, the PDF viewer Atril (a fork of Evince) takes about 25 seconds to launch even with no document. Previously it was almost instantaneous. Now I need to find out what causes this delay. When I run Atril through the strace command, it pauses at different system calls each time so I cannot draw any conclusions from that. Next I built Atril from source and run it in the gdb debugger but there I can only see a couple of threads being created and exited. How can I find out where in the source code the delay is?
You could run the program in the debugger and then interrupt it a few times using Ctrl+C. Each time, use where or bt to see where you are in the execution. There is a good explanation of this approach here. You probably want to enable debugging symbols when compiling (GCC: -g). Also, the compiled code can differ significantly from your original source code if you have optimizations turned on, so it might make sense to only use a debugging optimization level (GCC: -Og).
If there is a single step that causes this huge delay, this should allow you to instantly identify it.
Otherwise, you could check out other answers of this question. Profiling tools on linux include valgrind and the perf tools.
For perf, keep in mind that the sampling based approach could mislead you if your program is actually not executing, but waiting, e.g. for IO.
Related
I have 3 ocaml modules, the last one of the tree does the actual computation and uses functions defined in the other 2.
The basic idea of the program is to have a port as initial state, which contains boats, and moving those boats until we reach either a winning situation or no more moves are possible.
The actual code is not really a problem here. It is probably not very efficient, but finds a solution... except when it does not.
While the program takes a few seconds for complex situations, in some cases, when I add a bit more complexity, it takes forever to find a solutions.
When I execute the program like this:
$ ocamlc -o test.exe port.ml moves.ml solver.ml
$ ./test.exe > file
The resulting file is really huge, but its size stops increasing after some time..
It seems to me that after a while the program stops running but without terminating, no Stackoverflow or out of memory error is thrown. The program simply does not continue the execution. In other words, the command
$ ./test.exe > file
is still executed, but no more new lines are added to the file. If I log to the shell itself and not a file, I get the same result: no new lines keep getting added after some time.
What could this be?
The main function (that is responsible for finding a solution) uses a Depth-First-Search algorithm, and contains a lot of List operations, such as List.fold, List.map, List.iter, List.partition, List.filter. I was thinking that maybe these functions have problems dealing with huge lists of complex types at some moment, but again, no error is thrown, the execution just stops.
I explained this very vaguely, but I really don't understand the problem here. I don't know whether the problem has to do with my shell (Ubuntu subsystem on windows) running out of memory, or ocaml List functions being limited at some point... If you have any suggestions, feel free to comment
To debug such cases you should use diagnostic utilities provided by your operating system and the OCaml infrastructure itself.
First of all, you shall look into the state of your process. You can use top or htop utilities if you're running a Unix machine. Otherwise, you can use the task manager.
If the process is running out of the physical memory, it could be swapped by the operating system. In that case, all memory operations will turn into hard drive reads and writes. Therefore, garbage collecting a heap stored in a hard drive will take some time. If this was the case, then you can use a memory profiler to identify the crux of the problem.
If the process is constantly running without a change in the memory footprint, then it looks like that you either hit a bug in your code, i.e., an infinite loop, or that some of your algorithms have exponential complexity, as Konstantin has mentioned in the comment. Use a debugging output or tracing to identify the location where the program stalled.
Finally, if your program is in the sleeping state, then it could be a deadlock. For example, if you're reading and writing to the same file, this can end up in a race condition. In general, if your program is multithreaded or operates multiple processes, there are lots of possibilities to induce a race condition.
First of all, here's just something I'm curious about
I've made a little program which fills some templates with values and I noticed that every time I run it the execution time changes a little bit, it ranges from 0.550s to 0.600s. My CPU is running at 2.9GHZ if that could be useful.
The instructions are always the same, is it maybe something that has to do with physics or something more software oriented?
it has to do with java running on a virtual machine; even a c program might run different times slightly longer/shorter, also the operating system steers when a program has resources (cpu time, memory …) to be executed.
Following some changes my Arduino sketch became unstable, it only run 1-2 hours and crashes. It's now a month that I am trying to understand but do not make sensible progress: the main difficulty is that the slightest change make it run apparently "ok" for days...
The program is ~1500 lines long
Can someone suggest how to progress?
Thanks in advance for your time
Well, the embedded systems are wery well known for continuous fight against Universe forth dimension: time. It is known that some delays must be added inside code - this does not imply the use of a sistem delay routine always - just operation order may solve a lot.
Debugging a system with such problem is difficult. Some techniques could be used:
a) invasive ones: mark (i.e. use some printf statements) in various places of your software, entry or exit of some routines or other important steps and run again - when the application crashes, you must note the last message seen and conclude the crash is after that software step marked by the printf.
b) less invasive: use an available GPIO pin as output and set it high at the entry of some routine and low at the exit; the crasing point will leave the pin either high or low. You can use several pins if available and watch the activity with an oscilloscope.
c) non invasive - use the JTAG or SWD debugging - this is the best one - if your micro support faults debugging, then you have the methods to locate the bug.
I'm a big fan of speeding up my builds using "make -j8" (replacing 8 with whatever my current computer's number of cores is, of course), and compiling N files in parallel is usually very effective at reducing compile times... unless some of the compilation processes are sufficiently memory-intensive that the computer runs out of RAM, in which case all the various compile processes start swapping each other out, and everything slows to a crawl -- thus defeating the purpose of doing a parallel compile in the first place.
Now, the obvious solution to this problem is "buy more RAM" -- but since I'm too cheap to do that, it occurs to me that it ought to be possible to have an implementation of 'make' (or equivalent) that watches the system's available RAM, and when RAM gets down to near zero and the system starts swapping, make would automatically step in and send a SIGSTOP to one or more of the compile processes it had spawned. That would allow the stopped processes to get fully swapped out, so that the other processes could finish their compile without further swapping; then, when the other processes exit and more RAM becomes available, the 'make' process would send a SIGCONT to the paused processes, allowing them to resume their own processing. That way most swapping would be avoided, and I could safely compile on all cores.
Is anyone aware of a program that implements this logic? Or conversely, is there some good reason why such a program wouldn't/couldn't work?
For GNU Make, there's the -l option:
-l [load], --load-average[=load]
Specifies that no new jobs (commands) should be started if there are others jobs running and the load average is at least load (a floating-
point number). With no argument, removes a previous load limit.
I don't think there's a standard option for this, though.
The profilers I have experience with (mainly the Digital Mars D profiler that comes w/ the compiler) seem to massively slow down the execution of the program being profiled. This has a major effect on my willingness to use a profiler, as it makes profiling a "real" run of a lot of my programs, as opposed to testing on a very small input, impractical. I don't know much about how profilers are implemented. Is a major (>2x) slowdown when profiling pretty much a fact of life, or are there profilers that avoid it? If it can be avoided, are there any fast profilers available for D, preferrably for D2 and preferrably for free?
I don't know about D profilers, but in general there are two different ways a profiler can collect profiling information.
The first is by instrumentation, by injecting logging calls all over the place. This slows down the application more or less. Typically more.
The second is sampling. Then the profiler breaks the application at regular intervals and inspects the call stack. This does not slow down the application very much at all.
The downside of a sampling profiler is that the result is not as detailed as with an instrumenting profiler.
Check the documentation for your profiler if you can run with sampling instead of instrumentation. Otherwise you have some new Google terms in "sampling" and "instrumenting".
My favorite method of profiling slows the program way way down, and that's OK. I run the program under the debugger, with a realistic load, and then I manually interrupt it. Then I copy the call stack somewhere, like to Notepad. So it takes on the order of a minute to collect one sample. Then I can either resume execution, or it's even OK to start it over from the beginning to get another sample.
I do this 10 or 20 times, long enough to see what the program is actually doing from a wall-clock perspective. When I see something that shows up a lot, then I take more samples until it shows up again. Then I stop and really study what it is in the process of doing and why, which may take 10 minutes or more. That's how I find out if that activity is something I can replace with more efficient code, i.e. it wasn't totally necessary.
You see, I'm not interested in measuring how fast or slow it's going. I can do that separately with maybe only a watch. I'm interested in finding out which activities take a large percentage of time (not amount, percentage), and if something takes a large percentage of time, that is the probability that each stackshot will see it.
By "activity" I don't necessarily mean where the PC hangs out. In realistic software the PC is almost always off in a system or library routine somewhere. Typically more important is call sites in our code. If I see, for example, a string of 3 calls showing up on half of the stack samples, that represents very good hunting, because if any one of those isn't truly necessary and can be done away with, execution time will drop by half.
If you want a grinning manager, just do that once or twice.
Even in what you would think would be math-heavy scientific number crunching apps where you would think low-level optimization and hotspots would rule the day, you know what I often find? The math library routines are checking arguments, not crunching. Often the code is not doing what you think it's doing, and you don't have to run it at top speed to find that out.
I'd say yes, both sampling and instrumenting forms of profiling will tax your program heavily - regardless of whose profiler you are using, and on what language.
You could try h3r3tic's xfProf, which is a sampling profiler. Haven't tried it myself, but that guy always makes cool stuff :)
From the description:
If the program is sampled only a few hundred (or thousand)
times per second, the performance overhead will not be noticeable.