What's the purpose of expressing the relation of objects with integers?

When comparing objects, it's common to end up with an integer other than -1, 0, or 1.
e.g. (in Java)
Byte a = 10;
Byte b = 20;
System.out.println(a.compareTo(b)); // -10
Is there any algorithm, data-structure used in practice that takes advantage of this attribute of the comparison model?
Or in other words: why is any number > 1 or < -1 a helpful piece of info?
Edit: I'm sorry. I see how you could've misinterpreted the question as a Java problem. My mistake. I changed the tag from "java" to "language agnostic".

The contract of a Comparable object specifies that the value returned by compareTo() is:
A negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.
The above definition simplifies comparisons: we just need to test the returned value against zero using the usual comparison operators. For instance, to check whether object a is greater than or equal to object b, we can write:
a.compareTo(b) >= 0
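The same pattern covers every relation; each test reads almost like the operator it replaces:
a.compareTo(b) <  0   // a less than b
a.compareTo(b) <= 0   // a less than or equal to b
a.compareTo(b) == 0   // a equal to b
a.compareTo(b) >  0   // a greater than b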
Also, this is more flexible than simply returning -1, 0, or 1, as it allows each implementation to return a value that carries additional information. For example, String's compareTo() returns:
The difference of the two character values at position k in the two strings -- that is, the value:
this.charAt(k) - anotherString.charAt(k)
If there is no index position at which they differ, then the shorter string lexicographically precedes the longer string. In this case, compareTo returns the difference of the lengths of the strings -- that is, the value:
this.length() - anotherString.length()
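Two quick examples of those two rules (easy to verify in a scratch program):
System.out.println("apple".compareTo("apply")); // -20: 'e' (101) minus 'y' (121)
System.out.println("app".compareTo("apple"));   // -2: length 3 minus length 5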

No algorithm will take advantage of this "attribute", because you cannot rely on the exact value returned.
The only guarantee you have, is that it will be <0, =0, or >0, because that is the contract defined by Comparable.compareTo():
Returns a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.
The Byte implementation isn't any more specific:
Returns the value 0 if this Byte is equal to the argument Byte; a value less than 0 if this Byte is numerically less than the argument Byte; and a value greater than 0 if this Byte is numerically greater than the argument Byte (signed comparison).
Anything else is arbitrary and may change without notice.
To clarify, the returned value is defined to be <0, =0, or >0 instead of -1, 0, or +1 as a convenience to the implementation, not as a means to provide additional information to the caller.
As an example, the Byte.compareTo(Byte anotherByte) is implemented to return a number between -255 and 255 (inclusive) with this simple code:
return this.value - anotherByte.value;
The alternative would be code like:
return this.value < anotherByte.value ? -1 : this.value > anotherByte.value ? 1 : 0;
Since it is just as easy for the caller to test x < 0 as x == -1, allowing the broader range of return values makes for cleaner, faster implementation code.
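One caveat worth adding (my note, not part of the quoted docs): the subtraction shortcut is only safe when the difference cannot overflow an int. Byte values span -128 to 127, so their difference always fits, but the same trick on arbitrary ints wraps around:
byte a = 10, b = 20;
System.out.println(a - b);                  // -10: bytes widen to int, no overflow possible

int x = Integer.MIN_VALUE, y = 1;
System.out.println(x - y);                  // 2147483647: overflow makes x look greater than y
System.out.println(Integer.compare(x, y));  // negative, as it should be (safe since Java 7)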

Should I provide consistency checks in the Huffman tree building algorithm for DEFLATE?

In RFC 1951 there is a simple algorithm that restores the Huffman tree from a list of code lengths, described in the following way:
1) Count the number of codes for each code length. Let
bl_count[N] be the number of codes of length N, N >= 1.
2) Find the numerical value of the smallest code for each
code length:
code = 0;
bl_count[0] = 0;
for (bits = 1; bits <= MAX_BITS; bits++) {
    code = (code + bl_count[bits-1]) << 1;
    next_code[bits] = code;
}
3) Assign numerical values to all codes, using consecutive
values for all codes of the same length with the base
values determined at step 2. Codes that are never used
(which have a bit length of zero) must not be assigned a
value.
for (n = 0; n <= max_code; n++) {
    len = tree[n].Len;
    if (len != 0) {
        tree[n].Code = next_code[len];
        next_code[len]++;
    }
}
But there are no data consistency checks in the algorithm. On the other hand, it is obvious that the lengths list can be invalid. The length values themselves cannot be out of range, because they are encoded in 4 bits, but, for example, there can be more codes of some length than can actually be encoded.
What is the minimal set of checks that will provide data validation? Or are such checks not needed for some reason that I missed?
zlib checks that the list of code lengths is both complete, i.e. that it uses up all bit patterns, and that it does not overflow the bit patterns. The one allowed exception is when there is a single symbol with length 1, in which case the code is allowed to be incomplete (the bit 0 means that symbol; a 1 bit is undefined).
This helps zlib reject random, corrupted, or improperly coded data with higher probability and earlier in the stream. This is a different sort of robustness than what was suggested in another answer here, where you could alternatively permit incomplete codes and only return an error when an undefined code is encountered in the compressed data.
To calculate completeness, you start with the number of bits in the code k=1, and the number of possible codes n=2. There are two possible one-bit codes. You subtract from n the number of length 1 codes, n -= a[k]. Then you increment k to look at two-bit codes, and you double n. Subtract the number of two-bit codes. When you're done, n should be zero. If at any point n goes negative, you can stop right there as you have an invalid set of code lengths. If when you're done n is greater than zero, then you have an incomplete code.
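Here is a minimal sketch of that running count (in Java, reusing the bl_count and MAX_BITS names from the RFC pseudocode above; it deliberately omits zlib's single-symbol exception mentioned earlier):
// left = number of bit patterns not yet claimed at the current length
static boolean lengthsValid(int[] bl_count, int MAX_BITS) {
    int left = 1;                       // the empty prefix, before any bits
    for (int bits = 1; bits <= MAX_BITS; bits++) {
        left <<= 1;                     // every unclaimed pattern splits in two
        left -= bl_count[bits];         // claim patterns for codes of this length
        if (left < 0) return false;     // over-subscribed: invalid lengths
    }
    return left == 0;                   // left > 0 means an incomplete code
}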
I think that checking that next_code[len] does not overflow past its respective bits is enough. So after tree[n].Code = next_code[len];, you can do the following check (note the extra parentheses: in C, == binds tighter than &, so the masking must be parenthesized explicitly):
if ((tree[n].Code & ((1<<len)-1)) == 0)
    print(Error)
If tree[n].Code & ((1<<len)-1) reaches 0, it means that there are more codes of length len than there should be, so the lengths list had an error in it.
On the other hand, if every symbol of the tree is assigned a valid (unique) code, then you have created a correct Huffman tree.
EDIT: It just dawned on me: you can simply make the same check at the end of step one. You just have to check that bl_count[N] <= 2^N - SUM(2^j * bl_count[N-j], over 1 <= j <= N) for all N >= 1 (if a binary tree has bl_count[N-1] leaves in level N-1, then it cannot have more than 2^N - 2*bl_count[N-1] leaves in level N, level 0 being the root).
This guarantees that the code you create is a prefix code, but it does not guarantee that it is the same one the original creator intended. If, for example, the lengths list is invalid in a way that still lets you create a valid prefix code, you cannot prove that it is the Huffman code, simply because you do not know the frequency of occurrence of each symbol.
You need to make sure that there is no input that will cause your code to execute illegal or undefined behavior, such as indexing off the end of an array, because such illegal inputs might be used to attack your code.
In my opinion, you should attempt to handle illegal but not dangerous inputs as gracefully as possible, so as to interoperate with programs written by others which may interpret the specification differently than you have, or which have made small errors that have only one plausible interpretation. This is the Robustness principle - you can find discussions of this starting at http://en.wikipedia.org/wiki/Robustness_principle.

The result of "Sum is floor(Col + Row + 1)" is never an integer and I don't know why

I have to write a piece of Prolog where I have to calculate which position in an array is used to store a value. The result of these calculations should be an integer, so I use floor/1 to get the integer part of the value, but this doesn't work in my code: it keeps returning a number with a decimal point, for example 3.0 instead of 3.
The following is my code:
assign_value(El, NumberArray, RowNumber, I) :-
    ground(El),
    Number is NumberArray[El],
    Col is I/3,
    Row is RowNumber/3*3,
    Sum is floor(Col + Row + 1),
    subscript(Number, [Sum], El).
assign_value(_, _, _, _).
The result of "Sum is floor(Col + Row + 1)" is never an integer and I don't know why. Can anyone help me with this?
In ISO Prolog, the evaluable functor floor/1 has as signature (9.1.1 in ISO/IEC 13211-1):
floor: F → I
So it expects a float and returns an integer.
However, I do not believe that first creating floats out of integers and then flooring them back to integers is what you want. Instead, consider using (div)/2 in place of (/)/2, e.g. Col is I div 3, thereby staying with integers all the time.
From the documentation of floor/2 (http://www.eclipseclp.org/doc/bips/kernel/arithmetic/floor-2.html)
The result type is the same as the argument type. To convert the
type to integer, use integer/2.
For example:
...,
Floor is floor(Col+Row+1), Sum is integer(Floor).
Reading the documentation for floor/2, we see that
[floor/2] works on all numeric types. The result value is the largest integral
value that is smaller than Number (rounding down towards minus infinity).
The result type is the same as the argument type. To convert the type to integer,
use integer/2.
So you get the same type you supplied as the argument. Looking further at your predicate, we see the use of the / operator. Reading the documentation further, we see that
'/'/3 is used by the ECLiPSe compiler to expand evaluable arithmetic expressions.
So the call to /(Number1, Number2, Result) is equivalent to
Result is Number1 / Number2
which should be preferred for portability.
The result type of the division depends on the value of the global flag prefer_rationals.
When it is off, the result is a float, when it is on, the result is a rational.
Your division operation never returns an integer, meaning that things get upcast to floating point.
If you want to perform integer division, you should use the operators // or div.

Finding pairs that sum to a multiple of k

Suppose I have a set of N (N <= 10^10) natural numbers. Out of these, I want to form sets of 2 numbers such that their sum is divisible by k. Suppose N=4, i.e. the numbers are 1, 2, 3, 4, and k=2. The formed sets would then be (1,3) and (2,4).
No repetitions, and the first element of the set should be less than the second element.
Following is my code and logic, but I don't know why it gives incorrect answers for large values of N:
int c[] = new int[K];
for (long j=1; j<=N; j++) {
    ++c[(int)j%K];//storing remainder in array
}
long count = 0;
if (K%2==0)
    count = (c[0]*(c[0]-1) + c[K/2]*(c[K/2]-1))/2;//modulus that have value 0 or half of k, should be paired together, in C(N,2) ways
else
    count = c[0]*(c[0]-1)/2;
for (int j=1; j<(K+1)/2; j++) {
    count += c[j]*c[K-j];//sets whose modulus form a sum of K
}
I see at least two things:
First, in this line:
++c[(int)j%K];//storing remainder in array
the cast to int binds tighter than %, so j is truncated to an int before the remainder is taken. Once j exceeds Integer.MAX_VALUE, the truncated value (and hence the remainder) is wrong; the truncated value can even be negative, which would make the array index negative.
Second, in the rest of the code, for all of the count = ... lines, you are doing arithmetic on ints and then assigning the result to a long. The implicit cast to long is not done until after the arithmetic operations are done. Thus, if the operations overflow an int, you end up overflowing and then casting to a long.
If you want to fix that, you'll have to explicitly cast to long on the right-hand side to make sure that none of the arithmetic operations operate on two ints. (Though unless you have memory constraints, it'll be better to just use longs everywhere instead of ints, with the exception of j and K.) A corrected sketch follows.
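For illustration, here is a corrected sketch (my rewrite, not the original code, with N and K as in the question): the remainder is taken in long before the cast, and the counts and products stay in long throughout:
long[] c = new long[K];                 // counts can exceed int range when N ~ 10^10
for (long j = 1; j <= N; j++) {
    c[(int)(j % K)]++;                  // remainder first (in long), cast second
}
long count = c[0] * (c[0] - 1) / 2;     // remainder 0 pairs with itself: C(n,2)
if (K % 2 == 0) {
    count += c[K/2] * (c[K/2] - 1) / 2; // remainder K/2 also pairs with itself
}
for (int j = 1; j < (K + 1) / 2; j++) {
    count += c[j] * c[K - j];           // remainders j and K-j sum to K
}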

OpenCL - GPU Vector Math (Instruction Level Parallelism)

This article talks about the optimization of code and discusses Instruction level parallelism. They give an example of GPU vector math where the float4 vector math can be performed on the vector rather than the individual scalars. Example given:
float4 x_neighbor = center.xyxy + float4(-1.0f, 0.0f, 1.0f, 0.0f);
Now my question is can it be used for comparison purposes as well? So in the reduction example, can I do this:
accumulator.xyz = (accumulator.xyz < element.xyz) ? accumulator.xyz : element.xyz;
Thank you.
As already stated by Austin, comparison operators apply to vectors as well.
Point d of section 6.3 of the standard is the relevant part for you. It says:
The relational operators greater than (>), less than (<), greater than
or equal (>=), and less than or equal (<=) operate on scalar and
vector types.
it explains as well the valid cases:
The two operands are scalars. (...)
One operand is a scalar, and the other is a vector. (...) The scalar type is then widened to a vector that has the same number of
components as the vector operand. The operation is done component-wise
resulting in the same size vector.
The two operands are vectors of the same type. In this case, the operation is done component-wise resulting in the same size vector.
And finally, what these comparison operators return:
The result is a scalar signed integer of type int if the source
operands are scalar and a vector signed integer type of the same size
as the source operands if the source operands are vector types.
For scalar types, the relational operators shall return 0 if the
specified relation is false and 1 if the specified relation is true.
For vector types, the relational operators shall return 0 if the
specified relation is false and –1 (i.e. all bits set) if the
specified relation is true. The relational operators always return 0
if either argument is not a number (NaN).
EDIT:
To complete the return-value part a bit, especially after #redrum's comment: it seems odd at first that the true value is -1 for vector types. However, since OCL behaves as much like C as possible, it doesn't change much, since anything different from 0 is true.
As an example, if you have the vector:
int2 vect = (int2)(0, -1);
This statement will evaluate to true and do something:
if(vect.y){
    //Do something
}
Now, note that the following isn't valid (not because of the value returned, but simply because vect is a vector):
if(vect){
    //do something
}
This won't compile; however, you can use the functions all and any to evaluate all the elements of a vector in an "if statement":
if(any(vect)){
    //this will evaluate to true in our example
}
Note that the returned value is (from the quick reference card):
int any (Ti x): 1 if MSB in any component of x is set; else 0
So any negative number will do.
But still, why not keep 1 as the returned value when evaluated to true?
I think that the important part is the fact that all bits are set. My guess would be that this makes bitwise operations on vectors easy: say you want to keep only the elements smaller than a given value and zero out the others. Thanks to the fact that the value "true" is -1, i.e. 111111...111, you can do something like this:
int4 vect = (int4)(75, 3, 42, 105);
int ref = 50;
int4 result = (vect < ref) & vect;
and result's elements will be: 0, 3, 42, 0
On the other hand, if the value returned for true were 1, the result would be: 0, 1, 0, 0
The OpenCL 1.2 Reference Card from Khronos says the following about the operators:
Operators [6.3]
These operators behave similarly as in C99 except that
operands may include vector types when possible:
+ - * % / -- ++ == != &
~ ^ > < >= <= | ! && ||
?: >> << = , op= sizeof

what is the advantage of using negative enums

In the Kaleidoscope parser / AST example at LLVM, the enum is given all negative values. Why the minus sign?
enum Token {
    tok_eof = -1,
    // commands
    tok_def = -2, tok_extern = -3,
    // primary
    tok_identifier = -4, tok_number = -5
};
A common C idiom with enums is to use negative values to mean one set of conditions and positive values to mean another set. For example, error conditions from the parser might be all positive values, while normal conditions have all negative values, and maybe zero is the "undefined" case. So in your code, testing for anything other than a normal condition is as simple as tok >= 0.
I believe the usage of these negative values was just a way to denote special tokens in the code.
In the example code, the lexer returns unrecognized characters as their own value, so 0 to 255 is reserved for them, and any value out of this range can be used for special tokens like tok_eof. Since 0 to 255 cannot be used for the enum, they chose negative values, although they could have used 256, 257, 258, etc. Negative values seem more intuitive than 256, 257, 258 IMO. A small sketch of the scheme follows.
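A minimal sketch of that convention (my illustration in Java-style ints; the actual Kaleidoscope lexer is C++):
// Unrecognized characters come back as their own value in [0, 255],
// so the special tokens must live outside that range.
static final int TOK_EOF = -1;
static final int TOK_DEF = -2;
static final int TOK_IDENTIFIER = -4;

static boolean isSpecialToken(int tok) {
    return tok < 0;   // anything in [0, 255] is just a plain character
}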
