Filtering subsets using Linq - linq

Imagine a have a very long enunumeration, too big to reasonably convert to a list. Imagine also that I want to remove duplicates from the list. Lastly imagine that I know that only a small subset of the initial enumeration could possibly contain duplicates. The last point makes the problem practical.
Basically I want to filter out the list based on some predicate and only call Distinct() on that subset, but also recombine with the enumeration where the predicate returned false.
Can anyone think of a good idiomatic Linq way of doing this? I suppose the question boils down to the following:
With Linq how can you perform selective processing on a predicated enumeration and recombine the result stream with the rejected cases from the predicate?

You can do it by traversing the list twice, once to apply the predicate and dedup, and a second time to apply the negation of the predicate. Another solution is to write your own variant of the Where extension method that pushes non-matching entries into a buffer on the side:
IEnumerable<T> WhereTee(this IEnumerable<T> input, Predicate<T> pred, List<T> buffer)
{
foreach (T t in input)
{
if (pred(t))
{
yield return t;
}
else
{
buffer.Add(t);
}
}
}

Can you give a little more details on how you would like to recombine the elments.
One way i can think of solving this problem is by using the Zip operator of .Net 4.0 like this.
var initialList = new List<int>();
var resjectedElemnts = initialList.Where( x=> !aPredicate(x) );
var accepetedElements = initialList.Where( x=> aPredicate(x) );
var result = accepetedElements.Zip(resjectedElemnts,(accepted,rejected) => T new {accepted,rejected});
This will create a list of pair of rejected and accepeted elements. But the size of the list will be contrained by the shorter list between the two inputs.

Related

Comparator.compareBoolean() the same as Comparator.compare()?

How can I write this
Comparator <Item> sort = (i1, i2) -> Boolean.compare(i2.isOpen(), i1.isOpen());
to something like this (code does not work):
Comparator<Item> sort = Comparator.comparing(Item::isOpen).reversed();
Comparing method does not have something like Comparator.comparingBool(). Comparator.comparing returns int and not "Item".
Why can't you write it like this?
Comparator<Item> sort = Comparator.comparing(Item::isOpen);
Underneath Boolean.compareTo is called, which in turn is the same as Boolean.compare
public static int compare(boolean x, boolean y) {
return (x == y) ? 0 : (x ? 1 : -1);
}
And this: Comparator.comparing returns int and not "Item". make little sense, Comparator.comparing must return a Comparator<T>; in your case it correctly returns a Comparator<Item>.
The overloads comparingInt, comparingLong, and comparingDouble exist for performance reasons only. They are semantically identical to the unspecialized comparing method, so using comparing instead of comparingXXX has the same outcome, but might having boxing overhead, but the actual implications depend on the particular execution environment.
In case of boolean values, we can predict that the overhead will be negligible, as the method Boolean.valueOf will always return either Boolean.TRUE or Boolean.FALSE and never create new instances, so even if a particular JVM fails to inline the entire code, it does not depend on the presence of Escape Analysis in the optimizer.
As you already figured out, reversing a comparator is implemented by swapping the argument internally, just like you did manually in your lambda expression.
Note that it is still possible to create a comparator fusing the reversal and an unboxed comparison without having to repeat the isOpen() expression:
Comparator<Item> sort = Comparator.comparingInt(i -> i.isOpen()? 0: 1);
but, as said, it’s unlikely to have a significantly higher performance than the Comparator.comparing(Item::isOpen).reversed() approach.
But note that if you have a boolean sort criteria and care for the maximum performance, you may consider replacing the general-purpose sort algorithm with a bucket sort variant. E.g.
If you have a Stream, replace
List<Item> result = /* stream of Item */
.sorted(Comparator.comparing(Item::isOpen).reversed())
.collect(Collectors.toList());
with
Map<Boolean,List<Item>> map = /* stream of Item */
.collect(Collectors.partitioningBy(Item::isOpen,
Collectors.toCollection(ArrayList::new)));
List<Item> result = map.get(true);
result.addAll(map.get(false));
or, if you have a List, replace
list.sort(Comparator.comparing(Item::isOpen).reversed());
with
ArrayList<Item> temp = new ArrayList<>(list.size());
list.removeIf(item -> !item.isOpen() && temp.add(item));
list.addAll(temp);
etc.
Use comparing using key extractor parameter:
Comparator<Item> comparator =
Comparator.comparing(Item::isOpen, Boolean::compare).reversed();

Filter a list and get also its opposite with linq

I have a simple question about list with LINQ. I searched about it and I haven't found anything like it.
I don't know which of them is better (in performance, for example).
list1 = list.Where(x => x.property).ToList();
Now I want the opposite list
list2 = list.Where(x=> !x.property).ToList();
I used this, but I am not sure.
list2 = list.Except(list1).ToList();
I've also thought getting a group and get both lists at the same time. It is a list with two lists. Iterate over the first list is heavy.
What do you think about it?
Thanks a lot.
Honestly, I'd just go with:
var listEqual = new List<YourType>();
var listNotEqual = new List<YourType>();
foreach(YourType item in list)
if (item.Property)
listEqual.Add(item)
else
listNotEqual.Add(item);
You could use
var lookup = list.ToLookup(x => x.Property);
var list1 = lookup[true].ToList();
var list2 = lookup[false].ToList();
(even dropping the ToList if you don't strictly need lists) - this does only iterate your sequence once, although you need a little more code to handle the case where everything in your input is either true or false.

Good algorithm to turn stl map into sorted list of the keys based on a numeric value

I have a stl map that's of type:
map<Object*, baseObject*>
where
class baseObject{
int ID;
//other stuff
};
If I wanted to return a list of objects (std::list< Object* >), what's the best way to sort it in order of the baseObject.ID's?
Am I just stuck looking through for every number or something? I'd prefer not to change the map to a boost map, although I wouldn't be necessarily against doing something that's self contained within a return function like
GetObjectList(std::list<Object*> &objects)
{
//sort the map into the list
}
Edit: maybe I should iterate through and copy the obj->baseobj into a map of baseobj.ID->obj ?
What I'd do is first extract the keys (since you only want to return those) into a vector, and then sort that:
std::vector<baseObject*> out;
std::transform(myMap.begin(), myMap.end(), std::back_inserter(out), [](std::pair<Object*, baseObject*> p) { return p.first; });
std::sort(out.begin(), out.end(), [&myMap](baseObject* lhs, baseObject* rhs) { return myMap[lhs].componentID < myMap[rhs].componentID; });
If your compiler doesn't support lambdas, just rewrite them as free functions or function objects. I just used lambdas for conciseness.
For performance, I'd probably reserve enough room in the vector initially, instead of letting it gradually expand.
(Also note that I haven't tested the code, so it might need a little bit of fiddling)
Also, I don't know what this map is supposed to represent, but holding a map where both key and value types are pointers really sets my "bad C++" sense tingling. It smells of manual memory management and muddled (or nonexistent) ownership semantics.
You mentioned getting the output in a list, but a vector is almost certainly a better performing option, so I used that. The only situation where a list is preferable is really when you have no intention of ever iterating over it, and if you need the guarantee that pointers and iterators stay valid after modification of the list.
The first thing is that I would not use a std::list, but rather a std::vector. Now as of the particular problem you need to perform two operations: generate the container, sort it by whatever your criteria is.
// Extract the data:
std::vector<Object*> v;
v.reserve( m.size() );
std::transform( m.begin(), m.end(),
std::back_inserter(v),
[]( const map<Object*, baseObject*>::value_type& v ) {
return v.first;
} );
// Order according to the values in the map
std::sort( v.begin(), v.end(),
[&m]( Object* lhs, Object* rhs ) {
return m[lhs]->id < m[rhs]->id;
} );
Without C++11 you will need to create functors instead of the lambdas, and if you insist in returning a std::list then you should use std::list<>::sort( Comparator ). Note that this is probably inefficient. If performance is an issue (after you get this working and you profile and know that this is actually a bottleneck) you might want to consider using an intermediate map<int,Object*>:
std::map<int,Object*> mm;
for ( auto it = m.begin(); it != m.end(); ++it )
mm[ it->second->id ] = it->first;
}
std::vector<Object*> v;
v.reserve( mm.size() ); // mm might have less elements than m!
std::transform( mm.begin(), mm.end(),
std::back_inserter(v),
[]( const map<int, Object*>::value_type& v ) {
return v.second;
} );
Again, this might be faster or slower than the original version... profile.
I think you'll do fine with:
GetObjectList(std::list<Object*> &objects)
{
std::vector <Object*> vec;
vec.reserve(map.size());
for(auto it = map.begin(), it_end = map.end(); it != it_end; ++it)
vec.push_back(it->second);
std::sort(vec.begin(), vec.end(), [](Object* a, Object* b) { return a->ID < b->ID; });
objects.assign(vec.begin(), vec.end());
}
Here's how to do what you said, "sort it in order of the baseObject.ID's":
typedef std::map<Object*, baseObject*> MapType;
MapType mymap; // don't care how this is populated
// except that it must not contain null baseObject* values.
struct CompareByMappedId {
const MapType &map;
CompareByMappedId(const MapType &map) : map(map) {}
bool operator()(Object *lhs, Object *rhs) {
return map.find(lhs)->second->ID < map.find(rhs)->second->ID;
}
};
void GetObjectList(std::list<Object*> &objects) {
assert(objects.empty()); // pre-condition, or could clear it
// or for that matter return a list by value instead.
// copy keys into list
for (MapType::const_iterator it = mymap.begin(); it != mymap.end(); ++it) {
objects.push_back(it->first);
}
// sort the list
objects.sort(CompareByMappedId(mymap));
}
This isn't desperately efficient: it does more looking up in the map than is strictly necessary, and manipulating list nodes in std::list::sort is likely a little slower than std::sort would be at manipulating a random-access container of pointers. But then, std::list itself isn't very efficient for most purposes, so you expect it to be expensive to set one up.
If you need to optimize, you could create a vector of pairs of (int, Object*), so that you only have to iterate over the map once, no need to look things up. Sort the pairs, then put the second element of each pair into the list. That may be a premature optimization, but it's an effective trick in practice.
I would create a new map that had a sort criterion that used the component id of your objects. Populate the second map from the first map (just iterate through or std::copy in). Then you can read this map in order using the iterators.
This has a slight overhead in terms of insertion over using a vector or list (log(n) time instead of constant time), but it avoids the need to sort after you've created the vector or list which is nice.
Also, you'll be able to add more elements to it later in your program and it will maintain its order without need of a resort.
I'm not sure I completely understand what you're trying to store in your map but perhaps look here
The third template argument of an std::map is a less functor. Perhaps you can utilize this to sort the data stored in the map on insertion. Then it would be a straight forward loop on a map iterator to populate a list

Explain the below Linq Query?

results.Where(x=>x.Members.Any(y=>members.Contains(y.Name.ToLower())
I happened to see this query in internet. Can anyone explain this query please.
suggest me a good LINQ tutorial for this newbie.
thank you all.
Edited:
what is this x and y stands for?
x is a single result, of the type of the elements in the results sequence.
y is a single member, of the type of the elements in the x.Members sequence.
These are lambda expressions (x => x.whatever) that were introduced into the language with C# 3, where x is the input, and the right side (x.whatever) is the output (in this particular usage scenario).
An easier example
var list = new List<int> { 1, 2, 3 };
var oddNumbers = list.Where(i => i % 2 != 0);
Here, i is a single int item that is an input into the expression. i % 2 != 0 is a boolean expression evaluating whether the input is even or odd. The entire expression (i => i % 2 != 0) is a predicate, a Func<int, bool>, where the input is an integer and the output is a boolean. Follow? As you iterate over the query oddNumbers, each element in the list sequence is evaluated against the predicate. Those that pass then become part of your output.
foreach (var item in oddNumbers)
Console.WriteLine(item);
// writes 1, 3
Its a lambda expression. Here is a great LINQ tutorial
Interesting query, but I don't like it.
I'll answer your second question first. x and y are parameters to the lambda methods that are defined in the calls to Where() and Any(). You could easy change the names to be more meaningful:
results.Where(result =>
result.Members.Any(member => members.Contains(member.Name.ToLower());
And to answer your first question, this query will return each item in results where the Members collection has at least one item that is also contained in the Members collection as a lower case string.
The logic there doesn't make a whole lot of sense to me with knowing what the Members collection is or what it holds.
x will be every instance of the results collection. The query uses lambda syntax, so x=>x.somemember means "invoke somemember on each x passed in. Where is an extension method for IEnumerables that expects a function that will take an argument and return a boolean. Lambda syntax creates delegates under the covers, but is far more expressive for carrying out certain types of operation (and saves a lot of typing).
Without knowing the type of objects held in the results collection (results will be something that implements IEnumerable), it is hard to know exactly what the code above will do. But an educated guess is that it will check all the members of all the x's in the above collection, and return you an IEnumerable of only those that have members with all lower-case names.

An efficient technique to replace an occurence in a sequence with mutable or immutable state

I am searching for an efficient a technique to find a sequence of Op occurences in a Seq[Op]. Once an occurence is found, I want to replace the occurence with a defined replacement and run the same search again until the list stops changing.
Scenario:
I have three types of Op case classes. Pop() extends Op, Push() extends Op and Nop() extends Op. I want to replace the occurence of Push(), Pop() with Nop(). Basically the code could look like seq.replace(Push() ~ Pop() ~> Nop()).
Problem:
Now that I call seq.replace(...) I will have to search in the sequence for an occurence of Push(), Pop(). So far so good. I find the occurence. But now I will have to splice the occurence form the list and insert the replacement.
Now there are two options. My list could be mutable or immutable. If I use an immutable list I am scared regarding performance because those sequences are usually 500+ elements in size. If I replace a lot of occurences like A ~ B ~ C ~> D ~ E I will create a lot of new objects If I am not mistaken. However I could also use a mutable sequence like ListBuffer[Op].
Basically from a linked-list background I would just do some pointer-bending and after a total of four operations I am done with the replacement without creating new objects. That is why I am now concerned about performance. Especially since this is a performance-critical operation for me.
Question:
How would you implement the replace() method in a Scala fashion and what kind of data structure would you use keeping in mind that this is a performance-critical operation?
I am happy with answers that point me in the right direction or pseudo code. No need to write a full replace method.
Thank you.
Ok, some considerations to be made. First, recall that, on lists, tail does not create objects, and prepending (::) only creates one object for each prepended element. That's pretty much as good as you can get, generally speaking.
One way of doing this would be this:
def myReplace(input: List[Op], pattern: List[Op], replacement: List[Op]) = {
// This function should be part of an KMP algorithm instead, for performance
def compare(pattern: List[Op], list: List[Op]): Boolean = (pattern, list) match {
case (x :: xs, y :: ys) if x == y => compare(xs, ys)
case (Nil, Nil) => true
case _ => false
}
var processed: List[Op] = Nil
var unprocessed: List[Op] = input
val patternLength = pattern.length
val reversedReplacement = replacement.reverse
// Do this until we finish processing the whole sequence
while (unprocessed.nonEmpty) {
// This inside algorithm would be better if replaced by KMP
// Quickly process non-matching sequences
while (unprocessed.nonEmpty && unprocessed.head != pattern.head) {
processed ::= unprocessed.head
unprocessed = unprocessed.tail
}
if (unprocessed.nonEmpty) {
if (compare(pattern, unprocessed)) {
processed :::= reversedReplacement
unprocessed = unprocessed drop patternLength
} else {
processed ::= unprocessed.head
unprocessed = unprocessed.tail
}
}
}
processed.reverse
}
You may gain speed by using KMP, particularly if the pattern searched for is long.
Now, what is the problem with this algorithm? The problem is that it won't test if the replaced pattern causes a match before that position. For instance, if I replace ACB with C, and I have an input AACBB, then the result of this algorithm will be ACB instead of C.
To avoid this problem, you should create a backtrack. First, you check at which position in your pattern the replacement may happen:
val positionOfReplacement = pattern.indexOfSlice(replacement)
Then, you modify the replacement part of the algorithm this:
if (compare(pattern, unprocessed)) {
if (positionOfReplacement > 0) {
unprocessed :::= replacement
unprocessed :::= processed take positionOfReplacement
processed = processed drop positionOfReplacement
} else {
processed :::= reversedReplacement
unprocessed = unprocessed drop patternLength
}
} else {
This will backtrack enough to solve the problem.
This algorithm won't deal efficiently, however, with multiply patterns at the same time, which I guess is where you are going. For that, you'll probably need some adaptation of KMP, to do it efficiently, or, otherwise, use a DFA to control possible matchings. It gets even worse if you want to match both AB and ABC.
In practice, the full blow problem is equivalent to regex match & replace, where the replace is a function of the match. Which means, of course, you may want to start looking into regex algorithms.
EDIT
I was forgetting to complete my reasoning. If that technique doesn't work for some reason, then my advice is going with an immutable tree-based vector. Tree-based vectors enable replacement of partial sequences with low amount of copying.
And if that doesn't do, then the solution is doubly linked lists. And pick one from a library with slice replacement -- otherwise you may end up spending way too much time debugging a known but tricky algorithm.

Resources