Comparison method violates its general contract in Spark - sorting

I tried to sort my List[Row] data set, and here is how I did it.
def getDiffMinute(ts1:Timestamp, ts2:Timestamp) : Long = {
if(ts1==null || ts2==null) 0
else (ts1.getTime - ts2.getTime) / 60000
}
myList.sortWith( (r1: Row, r2: Row) =>
MYUtils.getDiffMinute( r1.getAs[Timestamp]("time"), r2.getAs[Timestamp]("time")) < 0
)
Since getDiffMinute returns a Long and sortWith only needs a Boolean, I don't see how this could throw an exception.
Some lists sort fine, but others (especially big ones, more than 1 GB) fail with this error:
Comparison method violates its general contract
Any idea what causes this?

I assume it's because your comparator getDiffMinute is not properly written.
In your case, let's say B is null. Then diff(A,B) = 0 and diff(B,C) = 0, so diff(A,C) should be 0 too, but it can be anything if neither A nor C is null.
More info:
http://docs.oracle.com/javase/6/docs/api/java/util/Comparator.html#compare(T,%20T)
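The broken transitivity is easy to reproduce in isolation. Below is a minimal Java sketch (the class and constant names are made up for illustration; diffMinute mirrors the question's getDiffMinute). It shows the inconsistency and one possible null-safe ordering built from the standard Comparator.nullsFirst. Whether nulls should sort first or last is an assumption; pick whichever your data needs.

```java
import java.sql.Timestamp;
import java.util.Comparator;

class ComparatorContractDemo {
    // Mirrors the question's getDiffMinute: a null on either side yields 0 ("equal").
    static long diffMinute(Timestamp a, Timestamp b) {
        if (a == null || b == null) return 0;
        return (a.getTime() - b.getTime()) / 60000;
    }

    // One consistent alternative (an assumption, not the asker's code):
    // nulls sort first, everything else by natural timestamp order.
    static final Comparator<Timestamp> NULL_SAFE =
            Comparator.nullsFirst(Comparator.<Timestamp>naturalOrder());

    public static void main(String[] args) {
        Timestamp a = new Timestamp(0L);
        Timestamp b = null;
        Timestamp c = new Timestamp(600_000L); // ten minutes after a

        // a "equals" b and b "equals" c, yet a is strictly before c.
        System.out.println(diffMinute(a, b)); // 0
        System.out.println(diffMinute(b, c)); // 0
        System.out.println(diffMinute(a, c)); // -10

        // The null-safe comparator gives one total order: null < a < c.
        System.out.println(NULL_SAFE.compare(b, a) < 0); // true
        System.out.println(NULL_SAFE.compare(a, c) < 0); // true
    }
}
```

The division by 60000 has a similar effect on timestamps less than a minute apart: 0 s vs 59 s and 59 s vs 118 s both give 0, yet 0 s vs 118 s gives -1. Comparing the timestamps directly avoids both problems.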

Related

Avoid counting values of Ints with for loop in Kotlin

I have a list of A class objects
data class A(
val abc: Abc,
val values: Int?
)
val list: List<A>
If I want to count how many objects I have in the list, I use:
val count= a.count()
or val count= a.count(it -> {})
How can I append all values in the list of A objects while avoiding a for loop? Generally, I'm looking for proper Kotlin syntax that avoids the code below:
if (a!= null) {
for (i in list) {
counter += i.values!!
}
}
Either use sumBy or sum in case you have a list of non-nullable numbers already available, i.e.:
val counter = list.sumBy { it.values ?: 0 }
// or
val counter = extractedNonNullValues.sum()
The latter only makes sense if you already mapped your A.values before to a list of non-nullable values, e.g. something like:
val extractedNonNullValues= list.mapNotNull { it.values } // set somewhere else before because you needed it...
If you do not need such an intermediate extractedNonNullValues-list then just go for the sumBy-variant.
I don't see you doing any appending to a list in the question. Based on your for loop, I believe what you meant was "How do I sum properties of objects in my list". If that's the case, you can use sumBy, the extension function on lists that takes a lambda ((T) -> Int) and returns an Int, like so:
val sum = list.sumBy { a -> a.values ?: 0 }
Also, calling an Int property values is pretty confusing, I think it should be called value. The plural indicates a list...
On another note, there is a possible NPE in your original for loop. Avoid using !! on nullable values because, if the value is null, you will get an NPE. Instead, use the null-coalescing (aka Elvis) operator ?: to fall back to a default value; this is perfectly acceptable in a sum function. If the iteration is not about summing, you may need to handle the null case differently.

Using the Haxe While Loop to Remove All of a Value from an Array

I want to remove every occurrence of a possibly duplicated value from an array. At the moment I'm using the remove(x:T):Bool function in a while loop, but I'm wondering about the expression part.
I've started by using:
function removeAll(array:Array<String>, element:String):Void
while (array.remove(element)) {}
but I'm wondering if any of these lines would be more efficient:
while (array.remove(element)) continue;
while (array.remove(element)) true;
while (array.remove(element)) 0;
or if it makes any kind of difference.
I'm guessing that using continue is less efficient because it actually has to do something, true and 0 are slightly more efficient, but still do something, and {} would probably be most efficient.
Does anyone have any background information on this?
While others suggested filter, it creates a new array instance, which may cause your other code to lose its reference.
If you loop on array.remove, it scans the elements at the front of the array on every call, which is not very performant.
IMO a better approach is to use a reverse while loop:
var i = array.length;
while(--i >= 0)
if(array[i] == element) array.splice(i, 1);
It doesn't make any difference. In fact, there's not even any difference in the generated code for the {}, 0 and false cases: they all end up generating {}, at least on the JS target.
However, you could run into issues if you have a large array with many duplicates: in that case, remove() would be called many times, and it has to iterate over the array each time (until it finds a match, that is). In that case, it's probably more efficient to use filter():
function removeAll(array:Array<String>, element:String):Array<String>
return array.filter(function(e) return e != element);
Personally, I also find this to be a bit more elegant than your while-loop with an empty body. But again, it depends on the use case: this does create a new array, and thus causes an allocation. Usually, that's not worth worrying about, but if you for instance do it in the update loop of a game, you might want to avoid it.
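The same trade-off shows up with any array-backed list, so here is a rough illustration in Java rather than Haxe (class and method names are hypothetical). It contrasts the repeated-remove loop from the question with a single-pass removal that still mutates the original list, so existing references stay valid and no second list is allocated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveAllDemo {
    // Repeated remove: every call rescans from the front, so a list
    // with many duplicates costs roughly quadratic time.
    static List<String> removeByLoop(List<String> list, String element) {
        while (list.remove(element)) { /* empty body, as in the question */ }
        return list;
    }

    // Single pass over the list, still in place.
    static List<String> removeInOnePass(List<String> list, String element) {
        list.removeIf(e -> e.equals(element));
        return list;
    }

    public static void main(String[] args) {
        List<String> first = new ArrayList<>(Arrays.asList("0", "1", "0", "1"));
        List<String> second = new ArrayList<>(first);

        System.out.println(removeByLoop(first, "0"));     // [1, 1]
        System.out.println(removeInOnePass(second, "0")); // [1, 1]
    }
}
```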
In terms of the expression part of the while loop, it seems that it's just set to empty braces ({}) when compiled, so it doesn't really matter what you do.
In terms of performance, a much better solution is Method 2 in the following:
class Test
{
static function main()
{
var thing:Array<String> = new Array<String>();
for (index in 0...1000)
{
thing.push("0");
thing.push("1");
}
var copy1 = thing.copy();
var copy2 = thing.copy();
trace("epoch");
// Method 1.
while (copy1.remove("0")) {}
trace("check");
// Method 2.
copy2 = [
for (item in Lambda.filter(copy2, function(v)
{return v != "0";}))
item
];
trace("check");
}
}
which can be seen here: https://try.haxe.org/#D0468. For 200,000 one-character elements in an Array<String>, Method 2 takes 0.017s while Method 1 takes 44.544s.
For large arrays, it should be faster to fill a temporary array and then assign it back once it is populated (method3 in the Try Haxe link below).
OR
If you don't want to use a temporary array, you can assign back in place and splice (method4 in the link).
https://try.haxe.org/#5f80c
Both are more verbose code-wise because I set up vars, but on a Mac they seem faster at runtime. A summary of my method3 approach:
while( i < l ) { if( ( s = copy[ i++ ] ) != '0' ) arr[ j++ ] = s; }
copy = arr;
Am I missing something obvious against these approaches?

Comparator.compareBoolean() the same as Comparator.compare()?

How can I write this
Comparator <Item> sort = (i1, i2) -> Boolean.compare(i2.isOpen(), i1.isOpen());
to something like this (code does not work):
Comparator<Item> sort = Comparator.comparing(Item::isOpen).reversed();
The Comparator interface does not have something like Comparator.comparingBool(). Comparator.comparing returns int and not "Item".
Why can't you write it like this?
Comparator<Item> sort = Comparator.comparing(Item::isOpen);
Underneath, Boolean.compareTo is called, which in turn is the same as Boolean.compare:
public static int compare(boolean x, boolean y) {
return (x == y) ? 0 : (x ? 1 : -1);
}
And this: "Comparator.comparing returns int and not "Item"." makes little sense. Comparator.comparing must return a Comparator<T>; in your case it correctly returns a Comparator<Item>.
The overloads comparingInt, comparingLong, and comparingDouble exist for performance reasons only. They are semantically identical to the unspecialized comparing method, so using comparing instead of comparingXXX has the same outcome but might incur boxing overhead; the actual implications depend on the particular execution environment.
In case of boolean values, we can predict that the overhead will be negligible, as the method Boolean.valueOf will always return either Boolean.TRUE or Boolean.FALSE and never create new instances, so even if a particular JVM fails to inline the entire code, it does not depend on the presence of Escape Analysis in the optimizer.
As you already figured out, reversing a comparator is implemented by swapping the arguments internally, just like you did manually in your lambda expression.
Note that it is still possible to create a comparator fusing the reversal and an unboxed comparison without having to repeat the isOpen() expression:
Comparator<Item> sort = Comparator.comparingInt(i -> i.isOpen()? 0: 1);
but, as said, it’s unlikely to have a significantly higher performance than the Comparator.comparing(Item::isOpen).reversed() approach.
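To check that the three formulations really order items the same way, here is a small self-contained sketch. The Item class is a hypothetical stand-in for the one in the question, with just a name and an isOpen() flag; the other names are illustrative too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class BooleanComparatorDemo {
    // Hypothetical stand-in for the question's Item class.
    static final class Item {
        final String name;
        final boolean open;
        Item(String name, boolean open) { this.name = name; this.open = open; }
        boolean isOpen() { return open; }
    }

    // The lambda from the question: arguments swapped by hand.
    static final Comparator<Item> MANUAL =
            (i1, i2) -> Boolean.compare(i2.isOpen(), i1.isOpen());
    // Boxed key extractor, then reversed.
    static final Comparator<Item> REVERSED =
            Comparator.comparing(Item::isOpen).reversed();
    // Unboxed variant with the reversal folded into the key.
    static final Comparator<Item> UNBOXED =
            Comparator.comparingInt(i -> i.isOpen() ? 0 : 1);

    // Sorts a fixed sample list and returns the names in sorted order.
    static List<String> sortedNames(Comparator<Item> order) {
        List<Item> items = new ArrayList<>(Arrays.asList(
                new Item("a", false), new Item("b", true),
                new Item("c", false), new Item("d", true)));
        items.sort(order); // List.sort is stable, so ties keep their input order
        List<String> names = new ArrayList<>();
        for (Item item : items) names.add(item.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sortedNames(MANUAL));   // [b, d, a, c]
        System.out.println(sortedNames(REVERSED)); // [b, d, a, c]
        System.out.println(sortedNames(UNBOXED));  // [b, d, a, c]
    }
}
```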
But note that if you have a boolean sort criterion and care about maximum performance, you may consider replacing the general-purpose sort algorithm with a bucket sort variant. E.g.
If you have a Stream, replace
List<Item> result = /* stream of Item */
.sorted(Comparator.comparing(Item::isOpen).reversed())
.collect(Collectors.toList());
with
Map<Boolean,List<Item>> map = /* stream of Item */
.collect(Collectors.partitioningBy(Item::isOpen,
Collectors.toCollection(ArrayList::new)));
List<Item> result = map.get(true);
result.addAll(map.get(false));
or, if you have a List, replace
list.sort(Comparator.comparing(Item::isOpen).reversed());
with
ArrayList<Item> temp = new ArrayList<>(list.size());
list.removeIf(item -> !item.isOpen() && temp.add(item));
list.addAll(temp);
etc.
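As a sanity check, the following sketch runs both partition-based variants next to the plain sort on a small sample and returns the resulting order. Item is again a hypothetical stand-in class, and the helper method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionInsteadOfSortDemo {
    // Hypothetical stand-in for the question's Item class.
    static final class Item {
        final String name;
        final boolean open;
        Item(String name, boolean open) { this.name = name; this.open = open; }
        boolean isOpen() { return open; }
    }

    static List<Item> sample() {
        return new ArrayList<>(Arrays.asList(
                new Item("a", false), new Item("b", true),
                new Item("c", false), new Item("d", true)));
    }

    static List<String> names(List<Item> items) {
        return items.stream().map(i -> i.name).collect(Collectors.toList());
    }

    // Baseline: general-purpose stable sort, open items first.
    static List<String> viaSort() {
        List<Item> list = sample();
        list.sort(Comparator.comparing(Item::isOpen).reversed());
        return names(list);
    }

    // Stream variant: split into two buckets, then concatenate.
    static List<String> viaPartitioningBy() {
        Map<Boolean, List<Item>> map = sample().stream()
                .collect(Collectors.partitioningBy(Item::isOpen,
                        Collectors.toCollection(ArrayList::new)));
        List<Item> result = map.get(true);
        result.addAll(map.get(false));
        return names(result);
    }

    // In-place variant: move closed items to a temporary list, then append them.
    static List<String> viaRemoveIf() {
        List<Item> list = sample();
        ArrayList<Item> temp = new ArrayList<>(list.size());
        list.removeIf(item -> !item.isOpen() && temp.add(item));
        list.addAll(temp);
        return names(list);
    }

    public static void main(String[] args) {
        System.out.println(viaSort());           // [b, d, a, c]
        System.out.println(viaPartitioningBy()); // [b, d, a, c]
        System.out.println(viaRemoveIf());       // [b, d, a, c]
    }
}
```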
Use the comparing overload that takes a key extractor and a key comparator:
Comparator<Item> comparator =
Comparator.comparing(Item::isOpen, Boolean::compare).reversed();

Range of doubles in Swift

I am currently writing a Swift application and parts of it require making sure certain user inputs add up to a specified value.
A simplified example:
Through program interaction, the user has specified that totalValue = 67 and that turns = 2. This means that in two inputs, the user will have to provide two values that add up to 67.
So let's say on turn 1 the user enters 32, and then on turn 2 he enters 35. This would be valid because 32 + 35 = 67.
This all works fine, but the moment we get into more than one decimal place, the program cannot add the numbers correctly. For example, if totalValue = 67 and on turn 1 the user enters 66.95 and then on turn 2 enters .05, the program reports an error despite the fact that 66.95 + .05 = 67. This problem does not happen with one decimal place or fewer (something like turn 1 = 55.5 and turn 2 = 11.5 works fine), only with two decimal places and beyond. I am storing the values as doubles. Thanks in advance.
Some example code:
var totalWeights = 67
var input = Double(myTextField.text.bridgeToObjectiveC().doubleValue)
/*Each turn is for a button click*/
/*For turn 1*/
if inputValid == true && turn == 1 && input < totalWeights
{
myArray[0] = input
}
else
{
//show error string
}
/*For turn 2*/
if inputValid == true && turn == 2 && input == (totalWeights - myArray[0])
{
myArray[1] = input
}
else
{
//show error string
}
If you want exact values from floating point then the float/double types will not work, as they are only ever approximations of exact numbers. Look into using the NSDecimalNumber class from within Swift, I'm not sure what the bridging would look like but it should be simple.
Here is an example of how this could work:
var a = 0.0
for num in numlist {
a += num
}
var result = false
if a == targetnum {
result = true
}
I haven't tested this out, but if numlist is an array of double then it should work for any input that is a valid number.
One problem I just realized is that comparing doubles for equality is unreliable, because rounding will cause problems for you. I am not going to show it, but while reading in the inputs you could keep track of the largest number of digits to the right of the decimal point, then multiply all of the values by the matching power of ten (so 66.95 * 100) to turn them into integers, add them, and compare the sum against targetnum multiplied by the same factor (100).
Unfortunately there is no ideal solution to this. We must use an approximate comparison.
For example, instead of checking:
if val1 == val2
we must try something like:
if val1 > (val2 - .0005) && val1 < (val2 + .0005)
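Both suggestions from this thread (a tolerance check and exact decimal arithmetic) can be sketched outside Swift as well. The Java example below is only an illustration: the method names are hypothetical, the 0.0005 tolerance is copied from the snippet above, and java.math.BigDecimal plays the role that NSDecimalNumber would play in Swift.

```java
import java.math.BigDecimal;

class DecimalSumDemo {
    // Approximate comparison: equal if the values differ by less than the tolerance.
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) < tolerance;
    }

    // Exact comparison on the decimal text the user typed, with no binary rounding.
    static boolean sumsExactly(String first, String second, String total) {
        BigDecimal sum = new BigDecimal(first).add(new BigDecimal(second));
        return sum.compareTo(new BigDecimal(total)) == 0;
    }

    public static void main(String[] args) {
        // The classic case: neither 0.1 nor 0.2 is exactly representable in binary.
        System.out.println(0.1 + 0.2 == 0.3);                    // false
        System.out.println(nearlyEqual(0.1 + 0.2, 0.3, 0.0005)); // true

        // The question's inputs, compared as exact decimals.
        System.out.println(sumsExactly("66.95", ".05", "67"));   // true
    }
}
```

compareTo is used instead of equals because BigDecimal.equals also compares scale, so 67.00 and 67 would not be equal.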

Should I test if equal to 1 or not equal to 0?

I was coding here the other day, writing a couple of if statements with integers that are always either 0 or 1 (practically acting as bools). I asked myself:
When testing for a positive result, which is better: testing for int == 1 or int != 0?
For example, given an int n, if I want to test if it's true, should I use n == 1 or n != 0?
Is there any difference at all in regards to speed, processing power, etc?
Please ignore the fact that the int may be more or less than 1 or 0; it is irrelevant and does not occur.
The human brain processes statements that don't contain negations more easily, which makes "int == 1" the better choice.
It really depends. If you're using a language that supports booleans, you should use the boolean, not an integer, ie:
if (value == false)
or
if (value == true)
That being said, with real boolean types, it's perfectly valid (and typically nicer) to just write:
if (!value)
or
if (value)
There is really very little reason in most modern languages to ever use an integer for a boolean operation.
That being said, if you're using a language which does not support booleans directly, the best option really depends on how you're defining true and false. Often, false is 0, and true is anything other than 0. In that situation, use if (i == 0) to check for false and if (i != 0) to check for true.
If you're guaranteed that 0 and 1 are the only two values, I'd probably use if (i == 1) since a negation is more complex, and more likely to lead to maintenance bugs.
If you're working with values that can only be 1 or 0, then I suggest you use boolean values to begin with and then just do if (bool) or if (!bool).
In languages like C, where any nonzero int represents the boolean value 'true' and 0 represents 'false', I tend to use if (int != 0) because it means the same thing as if (int), whereas int == 1 reads more as the integer value being equal to 1 than as the boolean true. It may be just me though. In languages that support a boolean type, always use it rather than ints.
A Daft question really. If you're testing for 1, test for 1, if you're testing for zero, test for zero.
The addition of an else statement can make the choice seem arbitrary. I'd choose whichever makes the most sense or has more contextual significance: the default or 'natural' behaviour suggested by expected frequency of occurrence, for example.
This choice between int == 0 and int != 1 may very well boil down to subjective evaluations which probably aren't worth worrying about.
Two points:
1) As noted above, being more explicit is a win. If you add something to an empty list you not only want its size to be not zero, but you also want it to be explicitly 1.
2) You may want to do
(1 == int)
That way if you forget an = you'll end up with a compile error rather than a debugging session.
To be honest if the value of int is just 1 or 0 you could even say:
if (int)
and that would be the same as saying
if (int != 0)
but you probably would want to use
if (int == 1)
because not zero would potentially let the answer be something other than 1 even though you said not to worry about it.
If only two values are possible, then I would use the first:
if(int == 1)
because it is more explicit. If there were no constraint on the values, I would think otherwise.
IF INT IS 1
NEXT SENTENCE
ELSE MOVE "INT IS NOT ONE" TO MESSAGE.
As others have said, using == is frequently easier to read than using !=.
That said, most processors have a specific compare-to-zero operation. It depends on the specific compiler, processor, et cetera, but there may be an almost immeasurably small speed benefit to using != 0 over == 1 as a result.
Most languages will let you use if (int) and if (!int), though, which is both more readable and gets you that minuscule speed bonus.
I'm paranoid. If a value is either 0 or 1 then it might be 2. Maybe not today, maybe not tomorrow, but some maintenance programmer is going to do something weird in a subclass. Sometimes I make mistakes myself [shh, don't tell my employer]. So, make the code check that the value is either 0 or 1; otherwise it cries to mummy.
if (i == 0) {
... 0 stuff ...
} else if (i == 1) {
... 1 stuff ...
} else {
throw new Error();
}
(You might prefer switch. I find its syntax in curly-brace languages too heavy.)
When using integers as booleans, I prefer to interpret them as follows: false = 0, true = non-zero.
I would write the condition statements as int == 0 and int != 0.
I would say it depends on the semantics. If your condition means
while ( ! abort ) negation is ok.
if ( quit ) break; would be also ok.
if( is_numeric( $int ) ) { its a number }
elseif( !$int ) { $int is not set or false }
else { its set but its not a number }
end of discussion :P
I agree with what most people have said in this post. It's much more efficient to use boolean values if you have one of two distinct possibilities. It also makes the code a lot easier to read and interpret.
if(bool) { ... }
I come from the C world. At first I didn't understand much about Objective-C. After a while, I came to prefer something like:
if (int == YES)
or
if (int == NO)
or, in C:
if (int == true)
if (int == false)
These days, I use varchar instead of integer for table keys too, e.g.
name    marital_status
------  --------------
john    single
joe     married
is a lot better than:
name    marital_status
------  --------------
john    S
joe     M
or
name    marital_status
------  --------------
john    1
joe     2
(Assuming your ints can only be 1 or 0) The two statements are logically equivalent. I'd recommend using the == syntax though because I think it's clearer to most people when you don't introduce unnecessary negations.
