OpenCL - GPU Vector Math (Instruction Level Parallelism) - parallel-processing

This article discusses code optimization and instruction-level parallelism. It gives an example of GPU vector math where float4 operations are performed on the whole vector rather than on the individual scalars. The example given:
float4 x_neighbor = center.xyxy + float4(-1.0f, 0.0f, 1.0f, 0.0f);
My question is: can this be used for comparisons as well? For instance, in the reduction example, can I do this:
accumulator.xyz = (accumulator.xyz < element.xyz) ? accumulator.xyz : element.xyz;
Thank you.

As already stated by Austin, comparison operators apply to vectors as well.
Point d. of section 6.3 of the standard is the relevant part for you. It says:
The relational operators greater than (>), less than (<), greater than
or equal (>=), and less than or equal (<=) operate on scalar and
vector types.
It also lists the valid cases:
The two operands are scalars. (...)
One operand is a scalar, and the other is a vector. (...) The scalar type is then widened to a vector that has the same number of
components as the vector operand. The operation is done component-wise
resulting in the same size vector.
The two operands are vectors of the same type. In this case, the operation is done component-wise resulting in the same size vector.
And finally, what these comparison operators return:
The result is a scalar signed integer of type int if the source
operands are scalar and a vector signed integer type of the same size
as the source operands if the source operands are vector types.
For scalar types, the relational operators shall return 0 if the
specified relation is false and 1 if the specified relation is true.
For vector types, the relational operators shall return 0 if the
specified relation is false and –1 (i.e. all bits set) if the
specified relation is true. The relational operators always return 0
if either argument is not a number (NaN).
EDIT:
To expand a bit on the return value, especially after @redrum's comment: it seems odd at first that the true value is -1 for vector types. However, since OCL behaves as much as possible like C, it doesn't change much, since everything other than 0 is true.
As an example, if you have the vector:
int2 vect = (int2)(0, -1);
This statement will evaluate to true and do something:
if(vect.y){
//Do something
}
Now, note that this isn't valid (not related to the value returned, but only to the fact it is a vector):
if(vect){
//do something
}
This won't compile. However, you can use the functions all and any to evaluate all elements of a vector in an "if statement":
if(any(vect)){
//this will evaluate to true in our example
}
Note that the returned value is (from the quick reference card):
int any (Ti x): 1 if MSB in component of x is set; else 0
So any negative number will do.
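The any/all rule quoted above can be emulated in plain Python to see which values count as true. This is only a sketch: msb_set, ocl_any and ocl_all are hypothetical helper names, and the components are assumed to be 32-bit two's-complement integers.

```python
def msb_set(component, bits=32):
    """Return True if the most significant bit of a `bits`-wide value is set."""
    return ((component >> (bits - 1)) & 1) == 1


def ocl_any(vec):
    """Mimic OpenCL any(): 1 if the MSB of at least one component is set, else 0."""
    return 1 if any(msb_set(c) for c in vec) else 0


def ocl_all(vec):
    """Mimic OpenCL all(): 1 if the MSB of every component is set, else 0."""
    return 1 if all(msb_set(c) for c in vec) else 0


print(ocl_any([0, -1]))  # 1: the -1 component has its MSB set
print(ocl_all([0, -1]))  # 0: the 0 component does not
print(ocl_any([0, 1]))   # 0: a plain 1 leaves the MSB clear
```

The last line shows why a "true" of 1 would not be picked up by any, whereas -1 is.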
But still, why not keep 1 as the value returned for true?
I think the important part is that all bits are set. My guess is that this makes bitwise operations on vectors easy. Say you want to keep only the elements smaller than a given value and zero out the rest. Because the value "true" is -1, i.e. 111111...111, you can do something like this:
int4 vect = (int4)(75, 3, 42, 105);
int ref = 50;
int4 result = (vect < ref) & vect;
and result's elements will be: 0, 3, 42, 0
On the other hand, if the returned value were 1 for true, the result would be: 0, 1, 0, 0
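The masking trick can be checked outside OpenCL with a small Python emulation. This is a sketch under assumptions: vectors are modelled as plain lists, and vec_less_than and vec_and are hypothetical helpers standing in for the component-wise < and & operators.

```python
def vec_less_than(vec, ref, true_value=-1):
    """Component-wise `vec < ref`; OpenCL yields -1 (all bits set) for true."""
    return [true_value if v < ref else 0 for v in vec]


def vec_and(a, b):
    """Component-wise bitwise AND of two equally sized vectors."""
    return [x & y for x, y in zip(a, b)]


vect = [75, 3, 42, 105]
ref = 50

result = vec_and(vec_less_than(vect, ref), vect)
print(result)  # [0, 3, 42, 0]

# With a hypothetical "true" of 1, only the lowest bit of each element survives.
result_if_one = vec_and(vec_less_than(vect, ref, true_value=1), vect)
print(result_if_one)  # [0, 1, 0, 0]
```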

The OpenCL 1.2 Reference Card from Khronos says this about operators:
Operators [6.3]
These operators behave similarly as in C99 except that
operands may include vector types when possible:
+ - * % / -- ++ == != &
~ ^ > < >= <= | ! && ||
?: >> << = , op= sizeof

Related

how post and pre increment works with multiplication operator? [duplicate]

What are "sequence points"?
What is the relation between undefined behaviour and sequence points?
I often use funny and convoluted expressions like a[++i] = i;, to make myself feel better. Why should I stop using them?
If you've read this, be sure to visit the follow-up question Undefined behavior and sequence points reloaded.
(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)
C++98 and C++03
This answer is for the older versions of the C++ standard. The C++11 and C++14 versions of the standard do not formally contain 'sequence points'; operations are 'sequenced before' or 'unsequenced' or 'indeterminately sequenced' instead. The net effect is essentially the same, but the terminology is different.
Disclaimer : Okay. This answer is a bit long. So have patience while reading it. If you already know these things, reading them again won't make you crazy.
Pre-requisites : An elementary knowledge of C++ Standard
What are Sequence Points?
The Standard says
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations
shall be complete and no side effects of subsequent evaluations shall have taken place. (§1.9/7)
Side effects? What are side effects?
Evaluation of an expression produces something and if in addition there is a change in the state of the execution environment it is said that the expression (its evaluation) has some side effect(s).
For example:
int x = y++; //where y is also an int
In addition to the initialization operation the value of y gets changed due to the side effect of ++ operator.
So far so good. Moving on to sequence points. An alternative definition of seq-points given by the comp.lang.c author Steve Summit:
Sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete.
What are the common sequence points listed in the C++ Standard?
Those are:
at the end of the evaluation of full expression (§1.9/16) (A full-expression is an expression that is not a subexpression of another expression.)1
Example :
int a = 5; // ; is a sequence point here
in the evaluation of each of the following expressions after the evaluation of the first expression (§1.9/18) 2
a && b (§5.14)
a || b (§5.15)
a ? b : c (§5.16)
a , b (§5.18) (here a , b is a comma operator; in func(a,a++) , is not a comma operator, it's merely a separator between the arguments a and a++. Thus the behaviour is undefined in that case (if a is considered to be a primitive type))
at a function call (whether or not the function is inline), after the evaluation of all function arguments (if any) which
takes place before execution of any expressions or statements in the function body (§1.9/17).
1 : Note : the evaluation of a full-expression can include the evaluation of subexpressions that are not lexically
part of the full-expression. For example, subexpressions involved in evaluating default argument expressions (8.3.6) are considered to be created in the expression that calls the function, not the expression that defines the default argument
2 : The operators indicated are the built-in operators, as described in clause 5. When one of these operators is overloaded (clause 13) in a valid context, thus designating a user-defined operator function, the expression designates a function invocation and the operands form an argument list, without an implied sequence point between them.
What is Undefined Behaviour?
The Standard defines Undefined Behaviour in Section §1.3.12 as
behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements 3.
Undefined behavior may also be expected when this
International Standard omits the description of any explicit definition of behavior.
3 : permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or with-
out the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
In short, undefined behaviour means anything can happen from daemons flying out of your nose to your girlfriend getting pregnant.
What is the relation between Undefined Behaviour and Sequence Points?
Before I get into that you must know the difference(s) between Undefined Behaviour, Unspecified Behaviour and Implementation Defined Behaviour.
You must also know that the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.
For example:
int x = 5, y = 6;
int z = x++ + y++; //it is unspecified whether x++ or y++ will be evaluated first.
Another example here.
Now the Standard in §5/4 says
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
What does it mean?
Informally it means that between two sequence points a variable must not be modified more than once.
In an expression statement, the next sequence point is usually at the terminating semicolon, and the previous sequence point is at the end of the previous statement. An expression may also contain intermediate sequence points.
From the above sentence the following expressions invoke Undefined Behaviour:
i++ * ++i; // UB, i is modified more than once btw two SPs
i = ++i; // UB, same as above
++i = 2; // UB, same as above
i = ++i + 1; // UB, same as above
++++++i; // UB, parsed as (++(++(++i)))
i = (i, ++i, ++i); // UB, there's no SP between `++i` (right most) and assignment to `i` (`i` is modified more than once btw two SPs)
But the following expressions are fine:
i = (i, ++i, 1) + 1; // well defined (AFAIK)
i = (++i, i++, i); // well defined
int j = i;
j = (++i, i++, j*i); // well defined
Furthermore, the prior value shall be accessed only to determine the value to be stored.
What does it mean? It means if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written.
For example in i = i + 1 all the access of i (in L.H.S and in R.H.S) are directly involved in computation of the value to be written. So it is fine.
This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification.
Example 1:
std::printf("%d %d", i,++i); // invokes Undefined Behaviour because of Rule no 2
Example 2:
a[i] = i++ // or a[++i] = i or a[i++] = ++i etc
is disallowed because one of the accesses of i (the one in a[i]) has nothing to do with the value which ends up being stored in i (which happens over in i++), and so there's no good way to define--either for our understanding or the compiler's--whether the access should take place before or after the incremented value is stored. So the behaviour is undefined.
Example 3 :
int x = i + i++ ;// Similar to above
Follow up answer for C++11 here.
This is a follow-up to my previous answer and contains C++11-related material.
Pre-requisites : An elementary knowledge of Relations (Mathematics).
Is it true that there are no Sequence Points in C++11?
Yes! This is very true.
Sequence Points have been replaced by Sequenced Before and Sequenced After (and Unsequenced and Indeterminately Sequenced) relations in C++11.
What exactly is this 'Sequenced before' thing?
Sequenced Before(§1.9/13) is a relation which is:
Asymmetric
Transitive
between evaluations executed by a single thread and induces a strict partial order1
Formally it means given any two evaluations(See below) A and B, if A is sequenced before B, then the execution of A shall precede the execution of B. If A is not sequenced before B and B is not sequenced before A, then A and B are unsequenced 2.
Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A, but it is unspecified which3.
[NOTES]
1 : A strict partial order is a binary relation "<" over a set P which is asymmetric, and transitive, i.e., for all a, b, and c in P, we have that:
........(i). if a < b then ¬ (b < a) (asymmetry);
........(ii). if a < b and b < c then a < c (transitivity).
2 : The execution of unsequenced evaluations can overlap.
3 : Indeterminately sequenced evaluations cannot overlap, but either could be executed first.
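The two properties in note 1 can be checked mechanically on a small example. The following Python sketch assumes a hypothetical set of four evaluations A, B, C and D with a made-up "sequenced before" relation; it is an illustration of the definitions, not something taken from the Standard.

```python
from itertools import product

EVALUATIONS = ["A", "B", "C", "D"]

# Hypothetical relation: A and B both precede C, and C precedes D.
SEQUENCED_BEFORE = {("A", "C"), ("B", "C"), ("C", "D"), ("A", "D"), ("B", "D")}


def sequenced_before(a, b):
    return (a, b) in SEQUENCED_BEFORE


def is_strict_partial_order(elements, related):
    """Check asymmetry and transitivity by brute force over all pairs and triples."""
    asymmetric = all(
        not (related(a, b) and related(b, a))
        for a, b in product(elements, repeat=2)
    )
    transitive = all(
        related(a, c)
        for a, b, c in product(elements, repeat=3)
        if related(a, b) and related(b, c)
    )
    return asymmetric and transitive


def unsequenced(a, b):
    """Neither evaluation is sequenced before the other."""
    return not sequenced_before(a, b) and not sequenced_before(b, a)


print(is_strict_partial_order(EVALUATIONS, sequenced_before))  # True
print(unsequenced("A", "B"))  # True: nothing orders A relative to B
print(unsequenced("A", "D"))  # False: A is sequenced before D
```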
What is the meaning of the word 'evaluation' in context of C++11?
In C++11, evaluation of an expression (or a sub-expression) in general includes:
value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and
initiation of side effects.
Now (§1.9/14) says:
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.
Trivial example:
int x;
x = 10;
++x;
Value computation and side effect associated with ++x is sequenced after the value computation and side effect of x = 10;
So there must be some relation between Undefined Behaviour and the above-mentioned things, right?
Yes! Right.
In (§1.9/15) it has been mentioned that
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced4.
For example :
int main()
{
int num = 19 ;
num = (num << 3) + (num >> 3);
}
Evaluation of operands of + operator are unsequenced relative to each other.
Evaluation of operands of << and >> operators are unsequenced relative to each other.
4: In an expression that is evaluated more than once during the execution
of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations.
(§1.9/15)
The value computations of the operands of an
operator are sequenced before the value computation of the result of the operator.
That means in x + y the value computation of x and y are sequenced before the value computation of (x + y).
More importantly
(§1.9/15) If a side effect on a scalar object is unsequenced relative to either
(a) another side effect on the same scalar object
or
(b) a value computation using the value of the same scalar object.
the behaviour is undefined.
Examples:
int i = 5, v[10] = { };
void f(int, int);
i = i++ * ++i; // Undefined Behaviour
i = ++i + i++; // Undefined Behaviour
i = ++i + ++i; // Undefined Behaviour
i = v[i++]; // Undefined Behaviour
i = v[++i]; // Well-defined Behavior
i = i++ + 1; // Undefined Behaviour
i = ++i + 1; // Well-defined Behaviour
++++i; // Well-defined Behaviour
f(i = -1, i = -1); // Undefined Behaviour (see below)
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [Note: Value computations and side effects associated with different argument expressions are unsequenced. — end note]
Expressions (5), (7) and (8) do not invoke undefined behaviour. Check out the following answers for a more detailed explanation.
Multiple preincrement operations on a variable in C++0x
Unsequenced Value Computations
Final Note :
If you find any flaw in the post please leave a comment. Power-users (With rep >20000) please do not hesitate to edit the post for correcting typos and other mistakes.
C++17 (N4659) includes a proposal Refining Expression Evaluation Order for Idiomatic C++
which defines a stricter order of expression evaluation.
In particular, the following sentence
8.18 Assignment and compound assignment operators:....
In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.
The right operand is sequenced before the left operand.
together with the following clarification
An expression X is said to be sequenced before an expression Y if every
value computation and every side effect associated with the expression X is sequenced before every value
computation and every side effect associated with the expression Y.
make several cases of previously undefined behavior valid, including the one in question:
a[++i] = i;
However several other similar cases still lead to undefined behavior.
In N4140:
i = i++ + 1; // the behavior is undefined
But in N4659
i = i++ + 1; // the value of i is incremented
i = i++ + i; // the behavior is undefined
Of course, using a C++17 compliant compiler does not necessarily mean that one should start writing such expressions.
I am guessing there is a fundamental reason for the change, and that it isn't merely cosmetic to make the old interpretation clearer: that reason is concurrency. An unspecified order of evaluation merely selects one of several possible serial orderings. This is quite different from before and after orderings, because if there is no specified ordering, concurrent evaluation is possible, which was not so under the old rules. For example, in:
f (a,b)
previously the order was either a then b, or b then a. Now a and b can be evaluated with instructions interleaved, or even on different cores.
In C99 (ISO/IEC 9899:TC3), which seems absent from this discussion thus far, the following statements are made regarding order of evaluation.
[...]the order of evaluation of subexpressions and the order in which
side effects take place are both unspecified. (Section 6.5 pp 67)
The order of evaluation of the operands is unspecified. If an attempt
is made to modify the result of an assignment operator or to access it
after the next sequence point, the behavior[sic] is undefined.(Section
6.5.16 pp 91)

Increment or decrement in boundaries

I'll make examples in Python, since I use Python, but the question is not about Python.
Let's say I want to increment a variable by a specific value so that it stays within given boundaries.
So for increment and decrement I have these two functions:
def up(a, s, Bmax):
    r = a + s
    if r > Bmax:
        return Bmax
    return r

def down(a, s, Bmin):
    r = a - s
    if r < Bmin:
        return Bmin
    return r
Note: the initial value of the variable "a" is assumed to be within the boundaries already (min <= a <= max), so an additional initial check does not belong in these functions. What makes me curious is that almost every program I write needs these functions.
The question is:
Are these classified as typical operations, and do they have specific names?
If yes, is there some corresponding intrinsic processor functionality, so that they are optimised by some compilers?
The reason I ask is pure curiosity; of course I cannot optimise this in Python, and I know little about CPU architecture.
To be more specific, on a lower level, I suppose the increment for an unsigned 8-bit integer would look like this:
def up(a, s, Bmax):
    counter = 0
    while True:
        if counter == s: break
        if a == Bmax: break
        if a == 255: break
        a += 1
        counter += 1
    return a
I know the latter would not make any sense in Python, so treat it as my naive attempt to imagine low-level code which adds the value in place. There are some nuances, e.g. signed vs. unsigned, but I was merely interested in unsigned integers since I come across them more often.
It is called saturation arithmetic, and it has native support on DSPs and GPUs (not a random pair: both deal with signals).
For example, the NVIDIA PTX ISA lets the programmer choose whether an addition is saturated or not:
add.type d, a, b;
add{.sat}.s32 d, a, b; // .sat applies only to .s32
.sat
limits result to MININT..MAXINT (no overflow) for the size of the operation.
The TI TMS320C64x/C64x+ DSP has support for
Dual 16-bit saturated arithmetic operations
and instructions like sadd to perform a saturated add, and even a whole register (the Saturation Status Register) dedicated to collecting precise information about saturation while executing a sequence of instructions.
Even the mainstream x86 has support for saturation with instructions like vpaddsb and similar (including conversions).
Another example is the GLSL clamp function, used to make sure color values are not outside the range [0, 1].
In general if the architecture must be optimized for signal/media processing it has support for saturation arithmetic.
Much rarer is support for saturation with arbitrary bounds, e.g. asymmetrical bounds, non-power-of-two bounds, or non-word-sized bounds.
However, saturation can be implemented easily as min(max(v, b), B), where v is the result of the unsaturated (and not overflowed) operation, b is the lower bound and B is the upper bound.
So any architecture that supports finding the minimum and the maximum without a branch can implement any form of saturation efficiently.
See also this question for a more real example of how saturated addition is implemented.
As a side note, the default behavior is wrap-around: for 8-bit quantities the sum 255 + 1 equals 0 (i.e. operations are modulo 2^8).
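The min(max(v, b), B) formula and the contrast with wrap-around can be sketched in a few lines of Python. The helper names clamp, saturating_add_u8 and wrapping_add_u8 are hypothetical, and since Python integers never overflow, the 8-bit width is imposed by hand.

```python
def clamp(v, lower, upper):
    """Saturate v into [lower, upper] using only min and max (no branch needed)."""
    return min(max(v, lower), upper)


def saturating_add_u8(a, s):
    """Unsigned 8-bit add that sticks at 255 instead of overflowing."""
    return clamp(a + s, 0, 255)


def wrapping_add_u8(a, s):
    """Unsigned 8-bit add with the default wrap-around (modulo 2**8)."""
    return (a + s) % 256


print(clamp(120, 0, 100))          # 100: arbitrary bounds work the same way
print(clamp(-5, 0, 100))           # 0
print(saturating_add_u8(250, 10))  # 255
print(wrapping_add_u8(250, 10))    # 4
print(wrapping_add_u8(255, 1))     # 0
```

The up and down functions from the question are the one-sided special cases min(a + s, Bmax) and max(a - s, Bmin).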

What's the purpose of expressing the relation of objects with integers?

When comparing objects it's common that you will end up with an integer other than -1, 0, 1.
e.g. (in Java)
Byte a = 10;
Byte b = 20;
System.out.println(a.compareTo(b)); // -10
Is there any algorithm, data-structure used in practice that takes advantage of this attribute of the comparison model?
Or in other words: why is any number > 1 or < -1 a helpful piece of info?
Edit: I'm sorry. I see how you could've misinterpreted the question as a Java problem. My mistake. I changed the tag from "java" to "language agnostic".
The contract of a Comparable object specifies that the value returned by compareTo() is:
A negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.
The above definition simplifies comparisons: we just need to test the returned value against zero using the usual comparison operators. For instance, to check if object a is greater than or equal to object b we can write:
a.compareTo(b) >= 0
Also, this is more flexible than simply returning -1, 1 or 0 as it allows each implementation to return a value with additional information. For example, String's compareTo() returns:
The difference of the two character values at position k in the two strings -- that is, the value:
this.charAt(k) - anotherString.charAt(k)
If there is no index position at which they differ, then the shorter string lexicographically precedes the longer string. In this case, compareTo returns the difference of the lengths of the strings -- that is, the value:
this.length() - anotherString.length()
No algorithm will take advantage of this "attribute", because you cannot rely on the exact value returned.
The only guarantee you have, is that it will be <0, =0, or >0, because that is the contract defined by Comparable.compareTo():
Returns a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.
The Byte implementation isn't any more specific:
Returns the value 0 if this Byte is equal to the argument Byte; a value less than 0 if this Byte is numerically less than the argument Byte; and a value greater than 0 if this Byte is numerically greater than the argument Byte (signed comparison).
Anything else is arbitrary and may change without notice.
To clarify, the returned value is defined to be <0, =0, or >0 instead of -1, 0, or +1 as a convenience to the implementation, not as a means to provide additional information to the caller.
As an example, the Byte.compareTo(Byte anotherByte) is implemented to return a number between -255 and 255 (inclusive) with this simple code:
return this.value - anotherByte.value;
The alternative would be code like:
return this.value < anotherByte.value ? -1 : this.value > anotherByte.value ? 1 : 0;
Since it's as easy for the caller to test the return value x < 0 instead of x == -1, allowing the broader range of return values provides for cleaner, more optimal code.
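That callers only look at the sign can be illustrated with a sort. This Python sketch uses functools.cmp_to_key; the two comparator functions are hypothetical stand-ins for the two Java implementations above, and the sample values are chosen from the Byte range.

```python
from functools import cmp_to_key


def compare_by_difference(a, b):
    """Returns any negative, zero or positive number, like `this.value - other.value`."""
    return a - b


def compare_by_sign(a, b):
    """Returns exactly -1, 0 or 1."""
    return (a > b) - (a < b)


values = [20, -128, 10, 127, 0, 10]

sorted_by_difference = sorted(values, key=cmp_to_key(compare_by_difference))
sorted_by_sign = sorted(values, key=cmp_to_key(compare_by_sign))

print(compare_by_difference(10, 20))           # -10
print(compare_by_sign(10, 20))                 # -1
print(sorted_by_difference == sorted_by_sign)  # True
```

The sort only tests each result against zero, so the magnitude of the returned value makes no difference to the outcome.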

Why do operators not work in C++-CLI?

According to Microsoft, the operators in C++ are the same in Visual Studio C++ 2010:
http://msdn.microsoft.com/en-us/library/x04xhy0h.aspx
However, look at the following builds:
int^ number = 32;
if (number < 100)
MessageBox::Show("The number is not greater than 100");
Build failed
'<' : 'System::Int32 ^' does not define this operator or a conversion to a type acceptable to the predefined operator
if (number <= 100)
MessageBox::Show("The number is not greater than 100");
Build failed
'<=' : 'System::Int32 ^' does not define this operator or a conversion to a type acceptable to the predefined operator
if (number == 32)
MessageBox::Show("The is equal to 32");
Build successful... However the message is not displayed.
if (number = 32)
MessageBox::Show("The is equal to 32");
Build successful.. The message is displayed. (Why? The operator of the equality is ==)
Why is this happening?
int^ declares a handle to an object. Whenever you reference number directly, you're actually referencing a boxed integer (somewhat equivalent to (object)32 in C#).
In addition, handles to objects don't define the < or <= (or > or >=) operators when comparing to an integer literal. The reason for that can be deduced from the following:
They do, however define an == operator. But in order to compare, the literal value you're comparing to will be implicitly boxed, making the comparison (somewhat) equivalent to this C# code:
object number = 32;
if (number == (object)32)
MessageBox.Show("The number is equal to 32");
That comparison will check if the references are the same. Which they aren't - they're two different objects. Hence:
int^ number = 32;
if (number == 32)
MessageBox::Show("The number is equal to 32"); // isn't displayed
... and since you're comparing references rather than values, >, >=, <=, < would make little sense.
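The difference between comparing references and comparing values can be mimicked in Python as a loose analogy. Boxed is a hypothetical wrapper class standing in for a boxed System::Int32; it is not how C++/CLI implements boxing.

```python
class Boxed:
    """Minimal stand-in for a boxed integer: an object holding a value."""

    def __init__(self, value):
        self.value = value


number = Boxed(32)
literal = Boxed(32)  # the literal gets boxed into a second, distinct object

same_reference = number is literal          # compares object identity
same_value = number.value == literal.value  # compares the wrapped integers

print(same_reference)  # False: two different objects
print(same_value)      # True: both hold 32
```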
In your last case, you're assigning 32 to number, then checking if the result of that expression (which is itself 32) is different from 0 - it is, so the message is displayed. That's what if does in C++ (and C) - in C#, number = 32 does have the result 32, but you'd get a compiler error due to the if requiring a boolean value.
"Solution": Dereference the int^:
if (*number == 32)
MessageBox::Show("The number is equal to 32");
... or simply use int:
int number = 32;
EDIT: Rewrote based on Ben Voigt's more correct explanation.

How to compute the integer absolute value

How to compute the integer absolute value without using if condition.
I guess we need to use some bitwise operation.
Can anybody help?
Same as existing answers, but with more explanations:
Let's assume a twos-complement number (as it's the usual case and you don't say otherwise) and let's assume 32-bit:
First, we perform an arithmetic right shift by 31 bits. This shifts in all 1s for a negative number or all 0s for a positive one. (Note that the behaviour of the actual >> operator in C or C++ is implementation-defined for negative numbers, though it will usually perform an arithmetic shift as well. Let's just assume pseudocode or actual hardware instructions, since it sounds like homework anyway.)
mask = x >> 31;
So what we get is 111...111 (-1) for negative numbers and 000...000 (0) for positives
Now we XOR this with x, getting the behaviour of a NOT for mask=111...111 (negative) and a no-op for mask=000...000 (positive):
x = x XOR mask;
And finally subtract our mask, which means +1 for negatives and +0/no-op for positives:
x = x - mask;
So for positives we perform an XOR with 0 and a subtraction of 0 and thus get the same number. And for negatives, we got (NOT x) + 1, which is exactly -x when using twos-complement representation.
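The three steps can be traced in Python, with one caveat: Python integers are unbounded, so the 32-bit width has to be emulated. In this sketch to_int32 and abs32 are hypothetical helpers, and x is assumed to lie in the 32-bit signed range; Python's >> on negative integers already behaves like an arithmetic shift.

```python
def to_int32(v):
    """Wrap an integer to the 32-bit two's-complement range."""
    v &= 0xFFFFFFFF
    return v - 0x100000000 if v & 0x80000000 else v


def abs32(x):
    mask = x >> 31                      # -1 for negative x, 0 otherwise
    return to_int32((x ^ mask) - mask)  # (NOT x) + 1 for negatives, x otherwise


print(abs32(-5))      # 5
print(abs32(7))       # 7
print(abs32(0))       # 0
print(abs32(-2**31))  # -2147483648: the one input with no positive counterpart
```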
Set the mask as right shift of integer by 31 (assuming integers are stored as two's-complement 32-bit values and that the right-shift operator does sign extension).
mask = n>>31
XOR the mask with number
mask ^ n
Subtract mask from result of step 2 and return the result.
(mask^n) - mask
Assume int is of 32-bit.
int my_abs(int x)
{
int y = (x >> 31);
return (x ^ y) - y;
}
One can also perform the above operation as:
return n*(((n>0)<<1)-1);
where n is the number whose absolute value needs to be calculated.
In C, you can use unions to perform bit manipulations on doubles. The following will work in C and can be used for integers, floats, and doubles alike.
#include <stdint.h>

/**
* Calculates the absolute value of a double.
* @param x An 8-byte floating-point double
* @return A positive double
* @note Uses bit manipulation and does not care about NaNs
*/
double abs(double x)
{
union{
uint64_t bits;
double dub;
} b;
b.dub = x;
//Sets the sign bit to 0
b.bits &= 0x7FFFFFFFFFFFFFFF;
return b.dub;
}
Note that this assumes that doubles are 8 bytes.
I wrote my own, before discovering this question.
My answer is probably slower, but still valid:
int abs_of_x = ((x*(x >> 31)) | ((~x + 1) * ((~x + 1) >> 31)));
If you are not allowed to use the minus sign you could do something like this:
int absVal(int x) {
return ((x >> 31) + x) ^ (x >> 31);
}
For assembly, the most efficient would be to initialize a value to 0, subtract the integer, and then take the max:
pxor mm1, mm1 ; set mm1 to all zeros
psubw mm1, mm0 ; make each mm1 word contain the negative of each mm0 word
pmaxsw mm1, mm0 ; mm1 will contain only the positive (larger) values - the absolute value
In C#, you can implement abs() without using any local variables:
public static long abs(long d) => (d + (d >>= 63)) ^ d;
public static int abs(int d) => (d + (d >>= 31)) ^ d;
Note: regarding 0x80000000 (int.MinValue) and 0x8000000000000000 (long.MinValue):
As with all of the other bitwise/non-branching methods shown on this page, this gives the single non-mathematical result abs(int.MinValue) == int.MinValue (likewise for long.MinValue). These represent the only cases where result value is negative, that is, where the MSB of the two's-complement result is 1 -- and are also the only cases where the input value is returned unchanged. I don't believe this important point was mentioned elsewhere on this page.
The code shown above depends on the value of d used on the right side of the xor being the value of d updated during the computation of the left side. To C# programmers this will seem obvious, because the C# language specifies that the operands of an expression are evaluated strictly left to right, which guarantees the correct sequence here. The reason I mention this is that in C or C++ one needs to be more cautious: there, modifying d and reading it elsewhere in the same expression without any sequencing between the two is undefined behaviour, so the same one-liner would be a correctness hazard.
If you don't want to rely on implementation of sign extension while right bit shifting, you can modify the way you calculate the mask:
mask = ~((n >> 31) & 1) + 1
then proceed as was already demonstrated in the previous answers:
(n ^ mask) - mask
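A Python sketch of this variant follows, emulating 32-bit registers by masking with 0xFFFFFFFF so that the shift is a logical one and never sign-extends. The function name abs32_no_sign_extension is hypothetical, and n is assumed to fit in 32 bits.

```python
def abs32_no_sign_extension(n):
    u = n & 0xFFFFFFFF               # view n as an unsigned 32-bit pattern
    sign = (u >> 31) & 1             # logical shift: 0 or 1
    mask = (~sign + 1) & 0xFFFFFFFF  # 0x00000000 or 0xFFFFFFFF
    result = ((u ^ mask) - mask) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed value.
    return result - 0x100000000 if result & 0x80000000 else result


print(abs32_no_sign_extension(-5))  # 5
print(abs32_no_sign_extension(7))   # 7
print(abs32_no_sign_extension(0))   # 0
```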
What is the programming language you're using? In C# you can use the Math.Abs method:
int value1 = -1000;
int value2 = 20;
int abs1 = Math.Abs(value1);
int abs2 = Math.Abs(value2);

Resources