Why do different homographies affect running time?

I am applying OpenCV's warpPerspective() function to an image and timing this task (only the call to the function, nothing else). I noticed that the running time changes if I use different homographies.
For example, I found that the identity matrix is faster than another homography that I generated with OpenCV's findHomography(), specifically this one:
[ -4.2374501377308356e+00, -4.1373817174321941e+00, 1.6044389922446646e+03,
-1.6805996938549963e+00, -9.0838245171456080e+00, 1.9901208871396577e+03,
-2.4454046226610403e-03, -8.2658343249518724e-03, 1. ]
Please note that the output is not my concern; I am only asking about the running time. So why is it different?
Thanks
EDIT: I'm using OpenCV 3.4 on a PowerVR GX6650. I tested it with and without OpenCL and the pattern is still the same.

As mentioned by @Micka in the comments, the difference seems to come from the different number of times that the interpolation method is called.
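For reference, here is a minimal C++ timing sketch (my own harness, not code from the original post; the image file name is a placeholder) that times warpPerspective() with the identity matrix and with the homography quoted above:

#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    cv::Mat src = cv::imread("input.png");   // placeholder test image
    cv::Mat dst;

    // Identity homography and the homography quoted in the question.
    cv::Mat identity = cv::Mat::eye(3, 3, CV_64F);
    cv::Mat H = (cv::Mat_<double>(3, 3) <<
        -4.2374501377308356e+00, -4.1373817174321941e+00, 1.6044389922446646e+03,
        -1.6805996938549963e+00, -9.0838245171456080e+00, 1.9901208871396577e+03,
        -2.4454046226610403e-03, -8.2658343249518724e-03, 1.0);

    for (const cv::Mat &M : {identity, H}) {
        double t0 = (double)cv::getTickCount();
        cv::warpPerspective(src, dst, M, src.size());
        double ms = 1000.0 * ((double)cv::getTickCount() - t0) / cv::getTickFrequency();
        std::cout << ms << " ms" << std::endl;
    }
    return 0;
}

Comparing the two timings on the target device should reproduce the difference described above.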

Related

Using `callgrind` to count function calls in Linux

I am trying to track function call counts in a program I'm interested in. If I run the program on its own, it runs fine. If I try to run it with valgrind using the command seen below, I seem to get a different result.
Command run:
It produces this output immediately, even though the execution is normally slow.
I'd say that this is more likely to be related to this issue. However, to be certain, you will need to tell us:
what compilation options are being used - specifically, are you using anything related to AVX or x87?
what hardware this is running on.
It would help if you could cut this down to a small example and either update this question or the frexp bugzilla items.
valgrind has limited floating point support. You're probably using non-standard or very large floats.
UPDATE: since you're using long double, you're out of luck. Unfortunately, your least-worst option is to find a way to make your code work using only standard IEEE 754 64-bit double precision. This probably isn't easy, considering you're working with an existing project.
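To make that concrete, here is a small C++ illustration (my own example, not from the question). On x86-64 Linux a long double normally carries a 64-bit mantissa, while Valgrind's documentation says it performs floating-point arithmetic with only 64-bit double precision internally, so extended-precision results can change under it:

#include <cstdio>

int main() {
    // Non-zero with native 80-bit x87 long double (64-bit mantissa);
    // typically collapses to 0 under Valgrind, which computes with
    // 53-bit double precision internally.
    volatile long double one  = 1.0L;
    volatile long double tiny = 1e-18L;
    long double diff = (one + tiny) - one;
    std::printf("%.20Lg\n", diff);
    return 0;
}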

Profiling Rust with execution time for each *line* of code?

I have profiled my Rust code and see one processor-intensive function that takes a large portion of the time. Since I cannot break the function into smaller parts, I hope I can see which line in the function takes what portion of the time. So far I have tried CLion's Rust profiler, but it does not have that feature.
It would be best if the tool ran on macOS, since I do not have a Windows/Linux machine (except through virtualization).
P.S. Visual Studio seems to have this feature, but I am using Rust. https://learn.microsoft.com/en-us/visualstudio/profiling/how-to-collect-line-level-sampling-data?view=vs-2017 It says:
Line-level sampling is the ability of the profiler to determine where in the code of a processor-intensive function, such as a function that has high exclusive samples, the processor has to spend most of its time.
Thanks for any suggestions!
EDIT: With C++, I do see source-code line-level information. For example, the following toy example shows that the "for" loop takes most of the time within the big function. But I am using Rust...
To get source code annotation in perf annotate or perf report you need to compile with debug=2 in your Cargo.toml.
If you also want source annotations for standard library functions, you additionally need to pass -Zbuild-std to cargo (requires nightly).
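As a minimal sketch (the binary name myprog is just a placeholder), the profile setting and the perf invocation could look like this.

In Cargo.toml:

[profile.release]
debug = 2        # keep full debug info (line tables) in the optimised build

Then record and annotate:

cargo build --release
perf record --call-graph dwarf ./target/release/myprog
perf annotate          # or: perf report, then zoom into the hot function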
Bear in mind, though, that once compiled, "lines" of Rust do not exist as such. The optimiser does its job by completely reorganising the code you wrote and finding the minimal machine code that behaves the same as what you intended.
Functions are often inlined, so even measuring the time spent in a function can give misleading results - or change the performance characteristics of your program if you prevent inlining in order to measure it.

Deterministic OpenSimplexNoise across Godot versions

For a given version of Godot, you can deterministically generate OpenSimplexNoise using seed (documentation).
However, in the documentation for RandomNumberGenerator there is the following:
Note: The underlying algorithm is an implementation detail. As a
result, it should not be depended upon for reproducible random streams
across Godot versions.
Some workarounds for the above issue are described here; the short answer is to write a custom, portable RNG.
Is there any way to insert a custom RNG for OpenSimplexNoise to manage determinism? Is there another solution?
The Godot developers are warning you that they might decide to change RandomNumberGenerator in a future version (and some changes have happened already).
And no, you can't insert a custom random number generator for OpenSimplexNoise.
Anyway, OpenSimplexNoise does not use RandomNumberGenerator or rand. Instead it pulls in this library as a git submodule: smcameron/open-simplex-noise-in-c.
Does OpenSimplexNoise change? Rarely. However, there is a breaking change in OpenSimplexNoise for Godot 4 and 3.4: the axis has been swapped (this is a fix).
So that leads us to adding a custom noise solution, which could be a port of OpenSimplex noise to C# or GDScript.
See open-simplex-noise.c from Godot source and digitalshadow/OpenSimplexNoise.cs (which is a port to C#, there are some other ports linked in the comments there too).
And for the texture, there are a couple options:
You can create a script that draws the image (I suggest using lockbits) and set it.
Or you can extend Viewport (which entails adding a Viewport to the scene and attaching a script to it). Then you take advantage of ViewportTexture, and you can use shaders to create the texture you want. See BastiaanOlij/godot-noise-texture-asset for reference.

Zstandard levels in hadoop

The compression level in org.apache.hadoop.io.compress.zstd.ZStandardCompressor doesn't seem to work. I see the reset function getting called in the ZStandardCompressor constructor, which in turn calls init(level, stream) to invoke a native function, which I believe is the only place the zstd level parameter is set.
In my test, I made sure this is being called, but using different levels like 1, 5, 10, 20, etc. did not make any difference: the output size is exactly the same.
Hadoop doesn't seem to use zstd-jni and instead uses its own code to drive zstd. I am sure people are using different levels in Hadoop. Could someone point me to what I should chase next?
Given that people are finding this question without an answer, I am adding the solution I used. InternalParquetRecordWriter takes a compressor as an argument, so I integrated the zstd-jni library there by creating a compressor that extends BytesInputCompressor.

Octave taking too long to save image files

I'm running Octave 3.8.1 on an i3 notebook with 2 GB of DDR3 RAM, on Ubuntu 14.04 dual-booted with Windows 7.
I'm having a really hard time saving the plots I use in my seismology research. They are quite simple, and still I wait almost 5 minutes to save a single plot; the plot is built within seconds, but the saving...
Is it purely a problem with my notebook's performance?
When I run a program for the first time I get the following warnings about shadowed functions; does one of them have anything to do with it?
warning: function /home/michel/octave/specfun-1.1.0/x86_64-pc-linux-gnu-api-v49+/ellipj.oct shadows a built-in function
warning: function /home/michel/octave/specfun-1.1.0/erfcinv.m shadows a built-in function
warning: function /home/michel/octave/specfun-1.1.0/ellipke.m shadows a core library function
warning: function /home/michel/octave/specfun-1.1.0/expint.m shadows a core library function
Also, this started to happen when I upgraded from a very old version of Octave (2.8, if I'm not mistaken). It seems that the old one used the default Linux plotting functions, while the new one (3.8.1) uses its own; is that correct? Plots used to take a little longer on this notebook than on the lab PC, but nowhere near 5+ minutes each.
Is there anything I can do, like upgrading something within Octave or "unshadowing" the functions mentioned above?
Thanks a lot for the help.
Shadowing functions is just a name clash, which is explained for example here: Warnings after octave installation
As for the low performance, Octave's renderer doesn't seem to be well optimized for writing plots with a huge number of points. For example, the following:
tic; x=1:10000; plot(sin(x)); saveas(gcf,'2123.png'); toc;
will put Octave into a coma for quite a while, even though the plot itself is made in an instant. If your data is of comparable size, consider making it sparser before putting it on the graph.
There's no default Linux plot maker; there's gnuplot. You may try your luck with it by invoking
graphics_toolkit gnuplot
before plotting. (To me it didn't do much good, though. graphics_toolkit fltk will bring back Octave's usual plotter.)
If the slowness you refer to is in saving three-dimensional plots (like mesh), the only workaround I've found on a system similar to yours is to use Alt+PrtScr.
Alternatively, you could try obtaining Octave 4.0, which has been released by now. Its changelog mentions yet another graphics toolkit.
