What is the motivation behind PixelOffsetModeHighSpeed and PixelOffsetModeHighQuality?

I do quite a bit of manual GDI+ drawing in C# and was always annoyed by the apparent (0.5, 0.5) pixel offset that GDI+ uses by default (my mind is more compatible with the IMO simpler definition of (0, 0) being the upper left corner of the upper left pixel). Until recently I thought it was probably just a stupid .NET thing to make things """easier""", so I simply translated by (-0.5, -0.5) before doing anything else - until I stumbled upon the PixelOffsetMode enum.
Compare the .NET definition with the C API definition:
typedef enum {
    PixelOffsetModeInvalid     = QualityModeInvalid,
    PixelOffsetModeDefault     = QualityModeDefault,
    PixelOffsetModeHighSpeed   = QualityModeLow,
    PixelOffsetModeHighQuality = QualityModeHigh,
    PixelOffsetModeNone        = QualityModeHigh + 1,
    PixelOffsetModeHalf        = QualityModeHigh + 2
} PixelOffsetMode;
It seems that the "off by (0.5, 0.5)" is a deliberate GDI+ thing.
There are also these 2 answers on SO:
Looking for details on the PixelOffsetMode Enumeration in .Net, WinForms
What is PixelOffsetMode?
The answer to the latter question seems to be subtly incorrect as well. There is no difference between HighQuality and Half (the mode that puts the origin in the upper left corner of the upper left pixel), nor between HighSpeed and None (the mode that puts the origin in the center of the upper left pixel). The documentation of the C API enum definition even confirms this.
What bugs me most is that even though two of the options contain the words "Speed" and "Quality", the value you choose has nothing at all to do with speed or quality - it is just a different definition of the coordinate system used for drawing. Both can produce the exact same result at the exact same speed. In practice this is very obscure, and knowing the precise location of the origin is crucial for writing correct drawing code; vague terms like "Quality" or "Speed" aren't helpful here. Using the wrong enum value doesn't make the drawing slow or low-quality, it simply makes it wrong.
Yet someone must have come up with those enum values when GDI+ was developed and may have thought of a reason for HighQuality and HighSpeed to exist. I'd like to know that reason - maybe there is a subtle difference, or there used to be a difference but it's not relevant anymore.
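To make the equivalence concrete, here is a minimal C# sketch of the two setups I mean (g stands for any System.Drawing.Graphics instance; the two options are alternatives, not meant to be combined):

using System.Drawing.Drawing2D;

// Option 1: keep the default offset mode and shift everything
// manually, as I have been doing so far.
g.TranslateTransform(-0.5f, -0.5f);

// Option 2: let GDI+ apply the same half-pixel offset itself.
// Half (== HighQuality) puts (0, 0) at the corner of the first
// pixel; None (== HighSpeed) leaves it at the pixel center.
g.PixelOffsetMode = PixelOffsetMode.Half;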

I don’t know the motivation but I can make a guess.
GDI+ is a very old API; it appeared around the Windows 2000 era. The recommended hardware requirements for that OS were a Pentium II at 300 MHz with 128 MB of RAM; the minimum was a Pentium at 133 MHz with 32 MB. By today's standards that is extremely slow hardware, and very likely that is why you aren't observing any difference in rendering speed on a modern Windows PC.


InterlockedAdd HLSL potential optimization

I was wondering if anyone knows whether there is some kind of optimization going on with HLSL InterlockedAdd, specifically when a large number of threads use it on a single global atomic counter (with the added value constant across all threads).
Some information I dug up on the web says that atomic adds can create significant contention issues:
https://developer.nvidia.com/blog/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/
Granted, the article above is written for CUDA (and is a little old, dating to 2014), whereas I am interested in HLSL InterlockedAdd. To that end, I wrote a dummy HLSL shader for Unity (compiled to D3D11 via FXC, to my knowledge), where I call InterlockedAdd on a single global atomic counter, such that the added value is always the same across all shaded fragments. The snippet in question (run in http://shader-playground.timjones.io/, compiled via FXC, optimization level 3, shading model 5.0):
**HLSL**:
RWStructuredBuffer<int> counter : register(u1);

void PSMain()
{
    InterlockedAdd(counter[0], 1);
}
----
**Assembly**:
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_uav_structured u1, 4
atomic_iadd u1, l(0, 0, 0, 0), l(1)
ret
I then slightly modified the code, and instead of always adding some constant value, I now add a value that varies across fragments, so something like this:
**HLSL**:
RWStructuredBuffer<int> counter : register(u1);

void PSMain(float4 pixel_pos : SV_Position)
{
    InterlockedAdd(counter[0], int(pixel_pos.x));
}
----
**Assembly**:
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_uav_structured u1, 4
dcl_input_ps_siv linear noperspective v0.x, position
dcl_temps 1
ftoi r0.x, v0.x
atomic_iadd u1, l(0, 0, 0, 0), r0.x
ret
I implemented the equivalents of the aforementioned snippets in Unity and used them as my fragment shaders for rendering a full-screen quad (granted, there is no output semantic, but that is irrelevant). I profiled the resulting shaders with Nsight Graphics. Suffice it to say that the difference between the two draw calls was massive, with the fragment shader based on the second snippet (InterlockedAdd with a varying value) being considerably slower.
I also made captures with RenderDoc to check the assembly, and it looks identical to what is shown above. Nothing in the assembly code suggests such a dramatic difference. And yet, the difference is there.
So my question is: is there some kind of optimization taking place when using HLSL InterlockedAdd on a single global atomic counter, such that the added value is a constant? Is it, perhaps, possible that the GPU driver can somehow rearrange the code?
System specs:
NVIDIA Quadro P4000
Windows 10
Unity 2019.4
The pixel shader on the GPU runs pixels in SIMD groups called wavefronts. If the code being executed does not change based on which pixel is being rendered, it only has to be run once for the entire group. If it does change based on the pixel, then each of the pixels needs to run unique code.
In the first version, a 64-pixel wavefront would execute the code as a single SIMD InterlockedAdd<64>(counter[0], 1); or might even optimize it into InterlockedAdd(counter[0], 64);
In the second example it turns into a series of serial, non-SIMD adds and becomes 64 times as expensive.
This is an oversimplification, and there are other tricks the GPU uses to share computing resources. But a good general rule of thumb is to make as much code as possible sharable by every nearby pixel.
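For illustration, here is roughly what such a wave-level aggregation looks like when written by hand with Shader Model 6.0 wave intrinsics (this needs DXC rather than FXC, and it is a sketch of the idea, not a claim about what your driver actually emits):

RWStructuredBuffer<int> counter : register(u1);

void PSMain(float4 pixel_pos : SV_Position)
{
    int value = int(pixel_pos.x);
    // Sum the per-pixel values across the whole wave once...
    int waveTotal = WaveActiveSum(value);
    // ...then issue a single atomic for the entire wave.
    if (WaveIsFirstLane())
        InterlockedAdd(counter[0], waveTotal);
}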

Two's complement bit shifting

I'm confused by bit shifting, since I have seen two different results from shifting the same number. I know there are tons of questions about this, but it seems I still couldn't find what I was looking for (feel free to post a link to a question or a website that could help).
First, I have seen the number 13 written in binary as 001101 (not a whole word of bits). When it was shifted left by 2, the first bit (probably the sign bit) was kept, giving 0|10100 = 20. However, elsewhere I have seen 13 represented as 01101, and there 01101 << 2 was 0|0100 = 4. I know shifting left is the same as multiplying by the base, but this confused me: should I represent 13 as 001101 or as 01101 before shifting?
I suspect the difference comes down to how overflow is handled.
Thank you!!
This behaviour corresponds to integers of length 5 and 4 bits (not counting the sign bit), so it seems overflow is indeed the explanation. If it isn't, could you add some context as to where these strange results occur?
001101, 01101 and also 1101 and 00001101 and other sizes have equal claim to "being" 13. You can't really say that 13 has one definitive size, rather it is the operation that has a size (which may be infinite, then a left shift never wraps).
So you have to decide what size of shift you're doing, independently of the value you're shifting. Common choices are 32 or 64 bits, but you're certainly not limited to that, although "strange" sizes take more effort to implement on typical machines and in typical programming languages.
The sign is never deliberately kept in left shifts, by the way; there is no useful way to do so. Forcefully keeping it means the wrapping happens in a really odd way, instead of the usual wrapping modulo a power of two (which has nice properties).
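A minimal C sketch of that view, reproducing both results from the question by doing one full-width shift and then reducing modulo 2^5 and 2^4:

#include <stdio.h>

int main(void)
{
    unsigned v = 13u << 2;             /* 52: no wrapping at this width */
    printf("5-bit: %u\n", v & 0x1Fu);  /* 52 mod 32 = 20 */
    printf("4-bit: %u\n", v & 0x0Fu);  /* 52 mod 16 = 4  */
    return 0;
}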

Changing the least significant bit of a pixel using steganography

I am implementing an image encryption algorithm, and in one phase I would like to change the least significant bit of each pixel. In steganography there is a stego-key which can be used to overwrite the LSB of pixels. But how is the stego-key determined at the receiver end? Also, I would like to know whether changing the least significant bit from 1 to 0 or from 0 to 1 is also considered steganography.
But how is the stego-key determined at the receiver end?
Key management, and even encryption, is not specifically part of steganography. You may perform key agreement by hiding that as well, but again, steganography is only about the hiding of the information. Encryption may be used to make the message appear random, as well as to add an additional layer of security. Data that appears to be random may be easier to hide.
See the following definition from Wikipedia:
the practice of concealing messages or information within other non-secret text or data.
Also, I would like to know whether changing the least significant bit from 1 to 0 or from 0 to 1 is also considered steganography.
That is likely the case, yes. But note that on a completely blue background your message would still be visible - as random changes, if it is encrypted. In general, if the chance of the least significant bit being set is more or less random, then the image makes a prime candidate for steganography.
You might, however, question how often raw RGB (or any other lossless format) is actually exchanged where the pixel values are more or less random; that in itself could be considered a hint that something strange is going on. As long as you try to hide the message, it would probably still be called steganography.
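For reference, the LSB manipulation being discussed is just bit masking; a minimal C sketch (the function names are illustrative, not from any particular library):

#include <stdio.h>

/* Hide one message bit in the LSB of an 8-bit channel value.
   Clearing and then setting the LSB covers both the 1 -> 0
   and 0 -> 1 cases asked about. */
unsigned char embed_bit(unsigned char channel, unsigned bit)
{
    return (unsigned char)((channel & ~1u) | (bit & 1u));
}

/* The receiver simply reads the LSB back. */
unsigned extract_bit(unsigned char channel)
{
    return channel & 1u;
}

int main(void)
{
    unsigned char px = 0xFF;                 /* a fully set channel value */
    px = embed_bit(px, 0);                   /* the 1 -> 0 case */
    printf("%u %u\n", px, extract_bit(px));  /* prints: 254 0 */
    return 0;
}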

OLA FFT windows: Blackman-Nuttall or Dolph–Chebyshev?

I found a web page describing all the existing window functions for FFT; it's here:
http://en.wikipedia.org/wiki/Window_function
It's very interesting, as it shows the frequency response of each window.
Looking at the frequency responses, I found that the Blackman-Nuttall and Dolph–Chebyshev windows seem the best, but which is the best of the best? And are they really better for audio processing than Hamming or Hanning?
Many thanks
Jeff
Blow your mind here:
http://www.rssd.esa.int/SP/LISAPATHFINDER/docs/Data_Analysis/GH_FFT.pdf
I can tell you a couple of things on the matter.
There is no "best" window function because it depends on what your application is about. The common parameters on which you should focus your choice are:
Scalloping loss
Main lobe width (of a sine wave)
Sidelobes max level/decrease
Computational cost
For example, the simple rectangular window requires no computational cost and provides the thinnest possible main lobe, but at the expense of big scalloping loss and very noisy sidelobes.
Blackman-style windows are usually built to minimize sidelobe levels, but they tend to have heavy scalloping. You might instead choose one of the so-called "flat-top" windows if you need more precise peak measurements, since their scalloping is usually below 1% even for the simplest ones, but their lobes are very fat (perhaps 6-10 bins wide).
Example Nuttall window on [0, 1] (with z = 2πx):
1 - 1.369982685*cos(z) + 0.4054102674*cos(2*z) - 0.03542758202*cos(3*z)
Example flat-top window (SFT3M) on [0, 1] (with z = 2πx):
1 - 1.84540464*cos(z) + 0.6962635*cos(2*z)
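Both series are trivial to evaluate directly; a small C sketch (assuming z = 2πx as above, with the coefficients copied verbatim):

#include <math.h>
#include <stdio.h>

#define TWO_PI 6.28318530717958647692

/* Normalized Nuttall window, x in [0, 1]. */
double nuttall(double x)
{
    double z = TWO_PI * x;
    return 1.0 - 1.369982685 * cos(z)
               + 0.4054102674 * cos(2.0 * z)
               - 0.03542758202 * cos(3.0 * z);
}

/* Normalized SFT3M flat-top window, x in [0, 1]. */
double sft3m(double x)
{
    double z = TWO_PI * x;
    return 1.0 - 1.84540464 * cos(z) + 0.6962635 * cos(2.0 * z);
}

int main(void)
{
    /* Print a few samples; both windows should be ~0 at the
       edges and peak at x = 0.5. */
    for (int i = 0; i <= 4; i++)
        printf("%.2f: %f %f\n", i / 4.0, nuttall(i / 4.0), sft3m(i / 4.0));
    return 0;
}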
If there were a window function with no scalloping loss, a very narrow main lobe, and no sidelobes, it would be extremely expensive to calculate.

Correct use of Simplify in Mathematica (with multiphase trig)

I just started working with Mathematica (5.0) for the first time, and while the manual has been helpful, I'm not entirely sure I've been using (Full)Simplify correctly. I am using the program to check my work on a derived transform for changing between reference frames, which consisted of multiplying a trio of relatively large square matrices.
A colleague and I each did the work by hand, separately, to make sure there were no mistakes. We hoped to get a third check from the program, which seemed that it would be simple enough to ask. The hand calculations took some time due to matrix size, but we came to the same conclusions. The fact that we had the same answer made me skeptical when the program produced different results.
I've checked and double checked my inputs.
I am definitely using . (Dot) for the matrix multiplication.
FullSimplify made no difference.
Neither have combinations of TrigReduce or algebraic expansion before simplifying.
I've taken entries from the final matrix and tried to simplify them in isolation, to no avail, so the problem isn't due to the use of matrices.
I've also tried to multiply the first two matrices, simplify, and then multiply that with the third matrix; however, this produced the same results as before.
I thought Simplify automatically descended into all levels of an expression (so I didn't need to worry about mapping it), but even where zeros would be expected in the output matrix there are leftover terms, and where we would expect terms there are only near-misses, plus a host of sine and cosine terms that do not reduce.
Is there any technique people regularly use with Simplify to get better results than applying plain Simplify alone?
If there are assumptions on parameter ranges you will want to feed them to Simplify. The following simple examples will indicate why this might be useful.
In[218]:= Simplify[a*Sqrt[1 - x^2] - Sqrt[a^2 - a^2*x^2]]
Out[218]= a Sqrt[1 - x^2] - Sqrt[-a^2 (-1 + x^2)]
In[219]:= Simplify[a*Sqrt[1 - x^2] - Sqrt[a^2 - a^2*x^2],
Assumptions -> a > 0]
Out[219]= 0
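A couple more toy cases in the same spirit, this time with trig, which is presumably closer to your matrices (outputs are what I would expect; worth re-checking in your own session):
In[220]:= Simplify[Sin[x]^2 + Cos[x]^2]
Out[220]= 1
In[221]:= TrigReduce[2 Sin[a] Cos[b]]
Out[221]= Sin[a - b] + Sin[a + b]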
Assuming this and other responses miss the mark, if you could provide an example that in some way shows the possibly bad behavior, that would be very helpful. Disguise it howsoever necessary in order to hide proprietary features: bleach out watermarks, file down registration numbers, maybe dress it in a moustache.
Daniel Lichtblau
Wolfram Research
As you didn't give many details to chew on, I can only give you a few tips:
Mma 5 is pretty old; the current version is 8. If you have access to someone with version 8, you might ask them to try it and see whether that makes a difference. You could also try Wolfram|Alpha online (http://www.wolframalpha.com/), which also understands some (all?) Mma syntax.
Have you tried comparing your own and Mma's results numerically? Generate a Table of differences for various parameter values, or use Plot. If the differences are negligible (use Chop to cut off small residuals), the results are probably equivalent.
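For instance (mine and mma below are placeholders for your hand-derived and Mathematica-computed matrices, and a, b stand for whatever angles they depend on):

diff = Flatten[mine - mma];
Table[Max[Abs[Chop[diff /. {a -> RandomReal[{0, 2 Pi}],
    b -> RandomReal[{0, 2 Pi}]}]]], {10}]

If that returns a list of zeros, the two results are almost certainly equivalent.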
Cheers -- Sjoerd
