I write my own reimplementation of LINQ using F# (thanks to Jon Skeet for inspiration).
I use a trick to generate empty sequence:
let empty<'b> =
seq {
for n = 0 to -1 do
yield Unchecked.defaultof<'b>
}
printfn "%A" empty<int> // -> seq []
Is there any idiomatic approach to do this?
(Seq.empty is not useful, I'm just reimplementing it)
The simplest implementation using sequence expressions I can think of is:
let empty() = seq { do () }
Or if you want a generic value rather than a function:
let empty<'T> : seq<'T> = seq { do () }
One would want to write just seq { } for a sequence expression that does not produce any values, but that's not syntactically valid and so we need to do something inside the sequence expression. Using do () is just a way to tell the compiler that this is a syntactically valid sequence expression that does not do anything (and does not produce any values) when evaluated.
Related
How can I write this
Comparator <Item> sort = (i1, i2) -> Boolean.compare(i2.isOpen(), i1.isOpen());
to something like this (code does not work):
Comparator<Item> sort = Comparator.comparing(Item::isOpen).reversed();
Comparing method does not have something like Comparator.comparingBool(). Comparator.comparing returns int and not "Item".
Why can't you write it like this?
Comparator<Item> sort = Comparator.comparing(Item::isOpen);
Underneath Boolean.compareTo is called, which in turn is the same as Boolean.compare
public static int compare(boolean x, boolean y) {
return (x == y) ? 0 : (x ? 1 : -1);
}
And this: Comparator.comparing returns int and not "Item". make little sense, Comparator.comparing must return a Comparator<T>; in your case it correctly returns a Comparator<Item>.
The overloads comparingInt, comparingLong, and comparingDouble exist for performance reasons only. They are semantically identical to the unspecialized comparing method, so using comparing instead of comparingXXX has the same outcome, but might having boxing overhead, but the actual implications depend on the particular execution environment.
In case of boolean values, we can predict that the overhead will be negligible, as the method Boolean.valueOf will always return either Boolean.TRUE or Boolean.FALSE and never create new instances, so even if a particular JVM fails to inline the entire code, it does not depend on the presence of Escape Analysis in the optimizer.
As you already figured out, reversing a comparator is implemented by swapping the argument internally, just like you did manually in your lambda expression.
Note that it is still possible to create a comparator fusing the reversal and an unboxed comparison without having to repeat the isOpen() expression:
Comparator<Item> sort = Comparator.comparingInt(i -> i.isOpen()? 0: 1);
but, as said, it’s unlikely to have a significantly higher performance than the Comparator.comparing(Item::isOpen).reversed() approach.
But note that if you have a boolean sort criteria and care for the maximum performance, you may consider replacing the general-purpose sort algorithm with a bucket sort variant. E.g.
If you have a Stream, replace
List<Item> result = /* stream of Item */
.sorted(Comparator.comparing(Item::isOpen).reversed())
.collect(Collectors.toList());
with
Map<Boolean,List<Item>> map = /* stream of Item */
.collect(Collectors.partitioningBy(Item::isOpen,
Collectors.toCollection(ArrayList::new)));
List<Item> result = map.get(true);
result.addAll(map.get(false));
or, if you have a List, replace
list.sort(Comparator.comparing(Item::isOpen).reversed());
with
ArrayList<Item> temp = new ArrayList<>(list.size());
list.removeIf(item -> !item.isOpen() && temp.add(item));
list.addAll(temp);
etc.
Use comparing using key extractor parameter:
Comparator<Item> comparator =
Comparator.comparing(Item::isOpen, Boolean::compare).reversed();
This is a follow up to my previous question regarding the Seq module's iter and map functions being much slower compared to the Array and List module equivalents.
Looking at the source, I can see that some functions such as isEmpty and length performs a very simple type check to optimize for arrays and lists before resorting to using IEnumerator.
[<CompiledName("IsEmpty")>]
let isEmpty (source : seq<'T>) =
checkNonNull "source" source
match source with
| :? ('T[]) as a -> a.Length = 0
| :? list<'T> as a -> a.IsEmpty
| :? ICollection<'T> as a -> a.Count = 0
| _ ->
use ie = source.GetEnumerator()
not (ie.MoveNext())
[<CompiledName("Length")>]
let length (source : seq<'T>) =
checkNonNull "source" source
match source with
| :? ('T[]) as a -> a.Length
| :? ('T list) as a -> a.Length
| :? ICollection<'T> as a -> a.Count
| _ ->
use e = source.GetEnumerator()
let mutable state = 0
while e.MoveNext() do
state <- state + 1;
state
In the case of the iter the same approach can be done to vastly improve its performance, when I shadowed the iter function it presented significant gains over the built-in version:
[<CompiledName("Iterate")>]
let iter f (source : seq<'T>) =
checkNonNull "source" source
use e = source.GetEnumerator()
while e.MoveNext() do
f e.Current;
My question is, given that some of the functions in the Seq module were optimized for use with specific collection types (arrays, list< T>, etc.) how come other functions such as iter and nth were not optimized in a similar way?
Also, in the case of map function, as #mausch pointed out, is it not possible to employ a similar approach to Enumerable.Select (see below) and build up specialized iterators for different collection types?
public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
if (source == null)
throw Error.ArgumentNull("source");
if (selector == null)
throw Error.ArgumentNull("selector");
if (source is Enumerable.Iterator<TSource>)
return ((Enumerable.Iterator<TSource>) source).Select<TResult>(selector);
if (source is TSource[])
return (IEnumerable<TResult>) new Enumerable.WhereSelectArrayIterator<TSource, TResult>((TSource[]) source, (Func<TSource, bool>) null, selector);
if (source is List<TSource>)
return (IEnumerable<TResult>) new Enumerable.WhereSelectListIterator<TSource, TResult>((List<TSource>) source, (Func<TSource, bool>) null, selector);
else
return (IEnumerable<TResult>) new Enumerable.WhereSelectEnumerableIterator<TSource, TResult>(source, (Func<TSource, bool>) null, selector);
}
Many thanks in advance.
In the case of the iter the same approach can be done to vastly improve its performance
I think this is where the answer to your question is. Your test is artificial, and doesn't actually test any real world examples of these methods. You tested 10,000,000 iterations of these methods in order to get differences in timing in ms.
Converted back to per item costs, here they are:
Array List
Seq.iter 4 ns 7 ns
Seq.map 20 ns 91 ns
These methods are typically used once per collection, meaning this cost is an additional linear factor to your algorithms performance. In the worst case you are losing less than 100 ns per item in a list (which you shouldn't be using if you care about performance that much).
Contrast this with the case of length which is always linear in the general case. By adding this optimization you provide enormous benefit to someone who forgot to manually cache the length but luckily is always given a list.
Similarly you may call isEmpty many times, and adding another object creation is silly if you can just ask directly. (This one isn't as strong an argument)
Another thing to keep in mind is that neither of those methods actually looks at more than one element of the output. What would you expect the following code to do (excluding syntax errors or missing methods)
type Custom() =
interface IEnumerable with
member x.GetEnumerator() =
return seq {
yield 1
yield 2
}
interface IList with
member x.Item with
get(index) = index
member x.Count = 12
let a = Custom()
a |> Seq.iter (v -> printfn (v.ToString()))
On the surface, the type-checks in Seq.length/isEmpty seem like mistakes. I assume most Seq functions don't perform such checks for orthogonality: type-specific versions already exist in the List/Array modules. Why duplicate them?
Those checks make more sense in LINQ since it only uses IEnumerable directly.
I am new to Scala and am trying to get a list of random double values:
The thing is, when I try to run this, it takes way too long compared to its Java counterpart. Any ideas on why this is or a suggestion on a more efficient approach?
def random: Double = java.lang.Math.random()
var f = List(0.0)
for (i <- 1 to 200000)
( f = f ::: List(random*100))
f = f.tail
You can also achieve it like this:
List.fill(200000)(math.random)
the same goes for e.g. Array ...
Array.fill(200000)(math.random)
etc ...
You could construct an infinite stream of random doubles:
def randomList(): Stream[Double] = Stream.cons(math.random, randomList)
val f = randomList().take(200000)
This will leverage lazy evaluation so you won't calculate a value until you actually need it. Even evaluating all 200,000 will be fast though. As an added bonus, f no longer needs to be a var.
Another possibility is:
val it = Iterator.continually(math.random)
it.take(200000).toList
Stream also has a continually method if you prefer.
First of all, it is not taking longer than java because there is no java counterpart. Java does not have an immutable list. If it did, performance would be about the same.
Second, its taking a lot of time because appending lists have linear performance, so the whole thing has quadratic performance.
Instead of appending, prepend, which had constant performance.
if your using mutable state anyways you should use a mutable collection like buffer which you can add too with += (which then would be the real counterpart to java code).
but why dont u use list comprehension?
val f = for (_ <- 1 to 200000) yield (math.random * 100)
by the way: var f = List(0.0) ... f = f.tail can be replaced by var f: List[Double] = Nil in your example. (no more performance but more beauty ;)
Yet more options! Tail recursion:
def randlist(n: Int, part: List[Double] = Nil): List[Double] = {
if (n<=0) part
else randlist(n-1, 100*random :: part)
}
or mapped ranges:
(1 to 200000).map(_ => 100*random).toList
Looks like you want to use Vector instead of List. List has O(1) prepend, Vector has O(1) append. Since you are appending, but using concatenation, it'll be faster to use Vector:
def random: Double = java.lang.Math.random()
var f: Vector[Double] = Vector()
for (i <- 1 to 200000)
f = f :+ (random*100)
Got it?
I mean this:
bool passed = true;
for(int i = 0; i < collection.Length; i++)
{
if(!PassesTest(collection[i]))
{
passed = false;
break;
}
}
if(passed){/*passed code*/}
requires extra variable, extra test
for(int i = 0; i < collection.Length; i++)
{
if(!PassesTest(collection[i]))
{
return;
}
}
{/*passed code*/}
neat, but requires this to be it's own function, if this it's self is inside a loop or something, not the most performant way of doing things. also, writing a whole extra function is a pain
if(passed){/*passed code*/}
for(int i = 0; i < collection.Length; i++)
{
if(!PassesTest(collection[i]))
{
goto failed;
}
}
{/*passed code*/}
failed: {}
great, but you have to screw around with label names and ugly label syntax
for(int i = 0; ; i++)
{
if(!(i < collection.Length))
{
{/*passed code*/}
break;
}
if(!PassesTest(collection[i]))
{
break;
}
}
probably the nicest, but still a bit manual, kinda wasting the functionality of the for loop construct, for instance, you can't do this with a foreach
what is the nicest way to handle this problem?
it seems to me something like this would be nice:
foreach(...)
{
...
}
finally{...} // only executed if loop ends conventionally (without break)
am I missing something? because this is a very common problem for me, and I don't really like any of the solutions I've come up with.
I use c++ and C#, so solutions in either would be great.
but would also be interested in solutions in other languages. (though a design principle that avoids this in any language would be ideal)
If your language doesn't have this feature, write a function "forall," which takes two arguments: a list and a boolean-valued function which is to be true for all elements of the list. Then you only have to write it once, and it matters very little how idiomatic it is.
The "forall" function looks exactly like your second code sample, except that now "collection" and "passesTest" are the arguments to that function.
Calling forall looks roughly like:
if (forall(myList,isGood)) {
which is readable.
As an added bonus, you could implement "exists" by calling "forall" on the negated boolean function, and negating its answer. That is, "exists x P(x)" is implemented as "not forall x not P(x)".
You can use Linq in .NET.
Here's an example in C#:
if(collection.All(item => PassesTest(item)))
{
// do my code
}
C++ and STL
if (std::all_of(collection.begin(), collection.end(), PassesTest))
{
/* passed code */
}
Ruby:
if collection.all? {|n| n.passes_test?}
# do something
end
Clojure:
(if (every? passes-test? collection)
; do something
)
Groovy:
if (!collection.find { !PassesTest(it) }) {
// do something
}
Scala:
def passesTest(i: Int) = i < 5 // example, would likely be idiomatically in-line
var seq = List(1,2,3,4);
seq.forall(passesTest) // => True
Most, if not all of the answers presented here are saying: higher-order constructs -- such as "passing functions" -- are really, really nice. If you are designing a language, don't forget about the last 60+ years of programming languages/designs.
Python (since v2.5):
if all(PassesTest(c) for c in collection):
do something
Notes:
we can iterate directly on collection, no need for an index and lookup
all() and any() builtin functions were added in Python 2.5
the argument to any(), i.e. PassesTest(c) for c in collection, is a generator expression. You could also view it as a list comprehension by adding brackets [(PassesTest(c) for c in collection]
... for c in collection causes iteration through a collection (or sequence/tuple/list). For a dict, use dict.keys(). For other datatypes, use the correct iterator method.
I am completely new to F# (and functional programming in general) but I see pattern matching used everywhere in sample code. I am wondering for example how pattern matching actually works? For example, I imagine it working the same as a for loop in other languages and checking for matches on each item in a collection. This is probably far from correct, how does it actually work behind the scenes?
How does pattern matching actually work? The same as a for loop?
It is about as far from a for loop as you could imagine: instead of looping, a pattern match is compiled to an efficient automaton. There are two styles of automaton, which I call the "decision tree" and the "French style." Each style offers different advantages: the decision tree inspects the minimum number of values needed to make a decision, but a naive implementation may require exponential code space in the worst case. The French style offers a different time-space tradeoff, with good but not optimal guarantees for both.
But the absolutely definitive work on this problem is Luc Maranget's excellent paper "Compiling Pattern Matching to Good Decisions Trees from the 2008 ML Workshop. Luc's paper basically shows how to get the best of both worlds. If you want a treatment that may be slightly more accessible to the amateur, I humbly recommend my own offering When Do Match-Compilation Heuristics Matter?
Writing a pattern-match compiler is easy and fun!
It depends on what kind of pattern matching do you mean - it is quite powerful construct and can be used in all sorts of ways. However, I'll try to explain how pattern matching works on lists. You can write for example these patterns:
match l with
| [1; 2; 3] -> // specific list of 3 elements
| 1::rest -> // list starting with 1 followed by more elements
| x::xs -> // non-empty list with element 'x' followed by a list
| [] -> // empty list (no elements)
The F# list is actually a discriminated union containing two cases - [] representing an empty list or x::xs representing a list with first element x followed by some other elements. In C#, this might be represented like this:
// Represents any list
abstract class List<T> { }
// Case '[]' representing an empty list
class EmptyList<T> : List<T> { }
// Case 'x::xs' representing list with element followed by other list
class ConsList<T> : List<T> {
public T Value { get; set; }
public List<T> Rest { get; set; }
}
The patterns above would be compiled to the following (I'm using pseudo-code to make this simpler):
if (l is ConsList) && (l.Value == 1) &&
(l.Rest is ConsList) && (l.Rest.Value == 2) &&
(l.Rest.Rest is ConsList) && (l.Rest.Rest.Value == 3) &&
(l.Rest.Rest.Rest is EmptyList) then
// specific list of 3 elements
else if (l is ConsList) && (l.Value == 1) then
var rest = l.Rest;
// list starting with 1 followed by more elements
else if (l is ConsList) then
var x = l.Value, xs = l.Rest;
// non-empty list with element 'x' followed by a list
else if (l is EmptyList) then
// empty list (no elements)
As you can see, there is no looping involved. When processing lists in F#, you would use recursion to implement looping, but pattern matching is used on individual elements (ConsList) that together compose the entire list.
Pattern matching on lists is a specific case of discriminated union which is discussed by sepp2k. There are other constructs that may appear in pattern matching, but essentially all of them are compiled using some (complicated) if statement.
No, it doesn't loop. If you have a pattern match like this
match x with
| Foo a b -> a + b
| Bar c -> c
this compiles down to something like this pseudo code:
if (x is a Foo)
let a = (first element of x) in
let b = (second element of x) in
a+b
else if (x is a Bar)
let c = (first element of x) in
c
If Foo and Bar are constructors from an algebraic data type (i.e. a type defined like type FooBar = Foo int int | Bar int) the operations x is a Foo and x is a Bar are simple comparisons. If they are defined by an active pattern, the operations are defined by that pattern.
If you compile your F# code to an .exe then take a look with Reflector you can see what the C# "equivalent" of the F# code.
I've used this method to look at F# examples quite a bit.