Linq query recursion - linq

I work in C# and Entity framework.
I have a table in my database named Genre. Here are its attributes:
idGenre, name, idParentGenre.
For example, the values would be:
(idGenre = 1, name = "acoustic", idParentGenre=2)
(idGenre = 2, name = "rock", idParentGenre=2)
(idGenre = 3, name = "country", idParentGenre=4)
(idGenre = 4, name = "folk", idParentGenre=5)
(idGenre = 5, name = "someOtherGenre", idParentGenre=5)
As you can see, it's kind of a tree.
Now, I have a method for searching through this table. The input parameters are idGenre and idParentGenre. It should return whether the genre (idGenre) is a child/grandchild/great-grandchild/... of idParentGenre.
For example, I get idGenre=3, idParentGenre=5, I should return true.
However, there isn't recursion in Linq. Is there a way I can do this?

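I would write a method to handle this instead of using LINQ:
bool HasParent(int genre, int parent)
{
    Genre item = db.Genres.FirstOrDefault(g => g.idGenre == genre);
    if (item == null)
        return false;
    // If there is no parent, return false;
    // this assumes idParentGenre is defined as int?
    if (!item.idParentGenre.HasValue)
        return false;
    if (item.idParentGenre.Value == parent)
        return true;
    // A root that is its own parent (as in the sample data) has no further
    // ancestors; without this check the recursion would never end
    if (item.idParentGenre.Value == genre)
        return false;
    return HasParent(item.idParentGenre.Value, parent);
}
This lets you handle the check in a single recursive function, at the cost of one query per level of the tree.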

It looks like you're trying to implement a tree without using a tree.
Have you considered... using a tree? Here's a great question with some answers you could build on (including one with code):
delegate void TreeVisitor<T>(T nodeData);
class NTree<T>
{
T data;
LinkedList<NTree<T>> children;
public NTree(T data)
{
this.data = data;
children = new LinkedList<NTree<T>>();
}
public void addChild(T data)
{
children.AddFirst(new NTree<T>(data));
}
public NTree<T> getChild(int i)
{
foreach (NTree<T> n in children)
if (--i == 0) return n;
return null;
}
public void traverse(NTree<T> node, TreeVisitor<T> visitor)
{
visitor(node.data);
foreach (NTree<T> kid in node.children)
traverse(kid, visitor);
}
}

Bring the table of genres into memory (it cannot be that big), and recursively traverse it to create a mapping between an idGenre and a transitive closure of its descendants, like this:
1: {1, 2}
2: {2}
3: {3, 4, 5}
4: {4, 5}
5: {5}
The data above stays only in memory. You recompute it every time on start-up, and on updates to the genres table.
When it is time to query for all songs in a specific genre, use the pre-calculated table in an idGenre in ... query, like this:
IEnumerable<Song> SongsWithGenreId(int idGenre) {
var idClosure = idToIdClosure[idGenre];
return context.Songs.Where(song => idClosure.Contains(song.idGenre));
}
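A minimal sketch of how such an in-memory mapping could be built, assuming the genres have already been loaded into a dictionary from idGenre to idParentGenre and that a root is its own parent, as in the question's sample data. The names GenreClosure, Build and IsDescendantOf are hypothetical. Each set is collected by following parent links, which reproduces the listing above (for example 3: {3, 4, 5}):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GenreClosure
{
    // parentOf maps idGenre -> idParentGenre (assumption: roots map to themselves).
    public static Dictionary<int, HashSet<int>> Build(IDictionary<int, int> parentOf)
    {
        var closure = new Dictionary<int, HashSet<int>>();
        foreach (var id in parentOf.Keys)
        {
            var chain = new HashSet<int>();
            var current = id;
            // HashSet.Add returns false on a repeated id, which stops the walk
            // at a self-parented root and also protects against cycles.
            while (chain.Add(current) && parentOf.TryGetValue(current, out var next))
            {
                current = next;
            }
            closure[id] = chain;
        }
        return closure;
    }

    // True when idParentGenre is reachable from idGenre via parent links.
    public static bool IsDescendantOf(
        Dictionary<int, HashSet<int>> closure, int idGenre, int idParentGenre)
    {
        return idGenre != idParentGenre
            && closure.TryGetValue(idGenre, out var chain)
            && chain.Contains(idParentGenre);
    }
}
```

With the sample rows from the question, IsDescendantOf(closure, 3, 5) answers the original check without any further database round trips.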

Related

Linq join two lists: is it more efficient to use Dictionary?

Final rephrase
Below I join two sequences and I wondered if it would be faster to create a Dictionary of one sequence with the keySelector of the join as key and iterate through the other collection and find the key in the dictionary.
This only works if the key selector is unique. A real join has no problem with two records having the same key, but a dictionary requires unique keys.
I measured the difference, and I noticed that the dictionary method is about 13% faster. In most use cases that is negligible. See my answer to this question
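If the keys are not unique, a lookup can take the place of the dictionary. The following is a sketch under that assumption; LookupJoinSketch and LookupJoin are hypothetical names. As far as I know, Enumerable.Join in LINQ to Objects works in roughly this way internally (it builds a hash-based lookup of the inner sequence and then streams the outer one), which would be consistent with the small difference measured:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupJoinSketch
{
    // Joins two sequences on a key, allowing duplicate keys on both sides.
    public static IEnumerable<TResult> LookupJoin<TOuter, TInner, TKey, TResult>(
        IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKey,
        Func<TInner, TKey> innerKey,
        Func<TOuter, TInner, TResult> resultSelector)
    {
        // One pass over inner: group its items by key in a hash-based lookup.
        ILookup<TKey, TInner> lookup = inner.ToLookup(innerKey);

        // One pass over outer: indexing a lookup with a missing key yields an
        // empty sequence, so no ContainsKey check is needed.
        return outer.SelectMany(o => lookup[outerKey(o)], resultSelector);
    }
}
```

Both passes are linear in the size of their input, so the total work grows with N + M plus the number of matches rather than N * M.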
Rephrased question
Some suggested that this question is the same question as LINQ - Using where or join - Performance difference?, but this one is not about using where or join, but about using a Dictionary to perform the join.
My question is: if I want to join two sequences based on a key selector, which method would be faster?
Put all items of one sequence in a Dictionary and enumerate the other sequence to see if the item is in the Dictionary. This would mean to iterate through both sequences once and calculate hash codes on the keySelector for every item in both sequences once.
The other method: use System.Linq.Enumerable.Join.
The question is: Would Enumerable.Join for each element in the first list iterate through the elements in the second list to find a match according to the key selector, having to compare N * N elements (is this called second order?) or would it use a more advanced method?
Original question with examples
I have two classes, both with a property Reference. I have two sequences of these classes and I want to join them based on equal Reference.
class ClassA
{
    public string Reference {get; set;}
    ...
}
class ClassB
{
    public string Reference {get; set;}
    ...
}
var listA = new List<ClassA>()
{
    new ClassA() {Reference = "1", ...},
    new ClassA() {Reference = "2", ...},
    new ClassA() {Reference = "3", ...},
    new ClassA() {Reference = "4", ...},
};
var listB = new List<ClassB>()
{
    new ClassB() {Reference = "1", ...},
    new ClassB() {Reference = "3", ...},
    new ClassB() {Reference = "5", ...},
    new ClassB() {Reference = "7", ...},
};
After the join I want combinations of ClassA objects and ClassB objects that have an equal Reference. This is quite simple to do:
var myJoin = listA.Join(listB, // join listA and listB
a => a.Reference, // from listA take Reference
b => b.Reference, // from listB take Reference
(objectA, objectB) => // if references equal
new {A = objectA, B = objectB}); // return combination
I'm not sure how this works, but I can imagine that for each a in listA the listB is iterated to see if there is a b in listB with the same reference as A.
Question: if I know that the references are Distinct wouldn't it be more efficient to convert B into a Dictionary and compare the Reference for each element in listA:
var dictB = listB.ToDictionary(b => b.Reference);
var myJoin = listA
    .Where(a => dictB.ContainsKey(a.Reference))
    .Select(a => new {A = a, B = dictB[a.Reference]});
This way, every element of listB has to be accessed once to put it in the dictionary, every element of listA has to be accessed once, and the hash code of each Reference has to be calculated once.
Would this method be faster for large collections?
I created a test program for this and measured the time it took.
Suppose I have a class Person; each person has a Name and a Father property, which is of type Person. If the father is not known, the Father property is null.
I have a sequence of Bastards (no father) that each have exactly one Son and one Daughter. All daughters are put in one sequence. All sons are put in another sequence.
The query: join the sons and the daughters that have the same father.
Results: Joining 1 million families using Enumerable.Join took 1.169 sec. Joining them using Dictionary join used 1.024 sec. Ever so slightly faster.
The code:
class Person : IEquatable<Person>
{
public string Name { get; set; }
public Person Father { get; set; }
// + a lot of equality functions get hash code etc
// for those interested: see the bottom
}
const int nrOfBastards = 1000000; // one million
var bastards = Enumerable.Range (0, nrOfBastards)
.Select(i => new Person()
{ Name = 'B' + i.ToString(), Father = null })
.ToList();
var sons = bastards.Select(father => new Person()
{Name = "Son of " + father.Name, Father = father})
.ToList();
var daughters = bastards.Select(father => new Person()
{Name = "Daughter of " + father.Name, Father = father})
.ToList();
// join on same parent: Traditionally and using Dictionary
var stopwatch = Stopwatch.StartNew();
this.TraditionalJoin(sons, daughters);
var time = stopwatch.Elapsed;
Console.WriteLine("Traditional join of {0} sons and daughters took {1:F3} sec", nrOfBastards, time.TotalSeconds);
stopwatch.Restart();
this.DictionaryJoin(sons, daughters);
time = stopwatch.Elapsed;
Console.WriteLine("Dictionary join of {0} sons and daughters took {1:F3} sec", nrOfBastards, time.TotalSeconds);
}
private void TraditionalJoin(IEnumerable<Person> boys, IEnumerable<Person> girls)
{ // join on same parent
var family = boys
.Join(girls,
boy => boy.Father,
girl => girl.Father,
(boy, girl) => new { Son = boy.Name, Daughter = girl.Name })
.ToList();
}
private void DictionaryJoin(IEnumerable<Person> sons, IEnumerable<Person> daughters)
{
var sonsDictionary = sons.ToDictionary(son => son.Father);
var family = daughters
.Where(daughter => sonsDictionary.ContainsKey(daughter.Father))
.Select(daughter => new { Son = sonsDictionary[daughter.Father], Daughter = daughter })
.ToList();
}
For those interested in the equality of Persons, needed for a proper dictionary:
class Person : IEquatable<Person>
{
public string Name { get; set; }
public Person Father { get; set; }
public bool Equals(Person other)
{
if (other == null)
return false;
else if (Object.ReferenceEquals(this, other))
return true;
else if (this.GetType() != other.GetType())
return false;
else
return String.Equals(this.Name, other.Name, StringComparison.OrdinalIgnoreCase);
}
public override bool Equals(object obj)
{
return this.Equals(obj as Person);
}
public override int GetHashCode()
{
// Must agree with Equals, which compares only Name and ignores case:
// equal persons have to produce equal hash codes
return StringComparer.OrdinalIgnoreCase.GetHashCode(this.Name);
}
public override string ToString()
{
return this.Name;
}
public static bool operator==(Person x, Person y)
{
if (Object.ReferenceEquals(x, null))
return Object.ReferenceEquals(y, null);
else
return x.Equals(y);
}
public static bool operator!=(Person x, Person y)
{
return !(x==y);
}
}

How to use LINQ to find all items in list which have the most members in another list?

Given:
class Item {
public int[] SomeMembers { get; set; }
}
var items = new []
{
new Item { SomeMembers = new [] { 1, 2 } }, //0
new Item { SomeMembers = new [] { 1, 2 } }, //1
new Item { SomeMembers = new [] { 1 } } //2
};
var secondList = new int[] { 1, 2, 3 };
I need to find all the Items in items with most of their SomeMembers occurring in secondList.
In the example above I would expect Items 0 and 1 to be returned but not 2.
I know I could do it with things like loops or Contains() but it seems there must be a more elegant or efficient way?
This can be written pretty easily:
var result = items.Where(item => item.SomeMembers.Count(secondList.Contains) * 2
>= item.SomeMembers.Length);
Or possibly (I can never guess whether method group conversions will work):
var result = items.Where(item => item.SomeMembers.Count(x => secondList.Contains(x)) * 2
>= item.SomeMembers.Length);
Or to pull it out:
Func<int, bool> inSecondList = secondList.Contains;
var result = items.Where(item => item.SomeMembers.Count(inSecondList) * 2
>= item.SomeMembers.Length);
If secondList becomes large, you should consider using a HashSet<int> instead.
EDIT: To avoid evaluating SomeMembers twice, you could create an extension method:
public static bool MajoritySatisfied<T>(this IEnumerable<T> source,
Func<T, bool> condition)
{
int total = 0, satisfied = 0;
foreach (T item in source)
{
total++;
if (condition(item))
{
satisfied++;
}
}
return satisfied * 2 >= total;
}
Then:
var result = items.Where(item => item.SomeMembers.MajoritySatisfied(secondList.Contains));
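As a sketch of the HashSet<int> suggestion, the second list can be copied into a set once so that each membership test is a hash lookup instead of a scan. The Item class mirrors the one in the question; MajorityDemo and MostlyIn are hypothetical names, and "most" is read as "at least half", as in the queries above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int[] SomeMembers { get; set; }
}

public static class MajorityDemo
{
    // Returns the items for which at least half of SomeMembers occur in secondList.
    public static List<Item> MostlyIn(IEnumerable<Item> items, IEnumerable<int> secondList)
    {
        // Built once; Contains on a HashSet does not scan the whole collection.
        var lookup = new HashSet<int>(secondList);
        return items
            .Where(item => item.SomeMembers.Count(m => lookup.Contains(m)) * 2
                           >= item.SomeMembers.Length)
            .ToList();
    }
}
```

Note that under this reading an item whose only member is in secondList also qualifies, since one out of one is a majority.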

Algorithm for unique combinations

I've been trying to find a way to get a list of unique combinations from a list of objects nested in a container. Objects within the same group cannot be combined. Objects will be unique across all the groups
Example:
Group 1: (1,2)
Group 2: (3,4)
Result
1
2
3
4
1,3
1,4
2,3
2,4
If we add another group like so:
Group 1: (1,2)
Group 2: (3,4)
Group 3: (5,6,7)
The result would be
1
2
3
4
5
6
7
1,3
1,4
1,5
1,6
1,7
2,3
2,4
2,5
2,6
2,7
3,5
3,6
3,7
4,5
4,6
4,7
1,3,5
1,3,6
1,3,7
1,4,5
1,4,6
1,4,7
2,3,5
2,3,6
2,3,7
2,4,5
2,4,6
2,4,7
I may have missed a combination above, but the combinations mentioned should be enough indication.
I may have up to 7 groups, with up to 20 objects in each group.
I'm trying to avoid having code that knows that it's doing combinations of doubles, triples, quadruples etc, but I'm hitting a lot of logic bumps along the way.
To be clear, I'm not asking for code, and more for an approach, pseudo code or an indication would do great.
UPDATE
Here's what I have after seeing those two answers.
From @Servy's answer:
public static IEnumerable<IEnumerable<T>> GetCombinations<T>(this IEnumerable<IEnumerable<T>> sequences)
{
var defaultArray = new[] { default(T) };
return sequences.Select(sequence =>
sequence.Select(item => item).Concat(defaultArray))
.CartesianProduct()
.Select(sequence =>
sequence.Where(item => !item.Equals(default(T)))
.Select(item => item));
}
public static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] { item })
);
}
From @AK_'s answer:
public static IEnumerable<IEnumerable<T>> GetCombinations<T>(this IEnumerable<IEnumerable<T>> groups)
{
if (groups.Count() == 0)
{
yield return new T[0];
}
else if (groups.Count() == 1)
{
foreach (var t in groups.First())
{
yield return new T[] { t };
}
}
else
{
var furtherResult = GetCombinations(groups.Where(x => x != groups.Last()));
foreach (var result in furtherResult)
{
yield return result;
}
foreach (var t in groups.Last())
{
yield return new T[] { t };
foreach (var result in furtherResult)
{
yield return result.Concat(new T[] { t });
}
}
}
}
Usage for both
List<List<int>> groups = new List<List<int>>();
groups.Add(new List<int>() { 1, 2 });
groups.Add(new List<int>() { 3, 4, 5 });
groups.Add(new List<int>() { 6, 7 });
groups.Add(new List<int>() { 8, 9 });
groups.Add(new List<int>() { 10, 11 });
var x = groups.GetCombinations().Where(g => g.Count() > 0).ToList().OrderBy(y => y.Count());
What would be considered the best solution? To be honest, I find @AK_'s solution much easier to read (I had to look up how to get a Cartesian product).
So first off, consider the problem of a Cartesian product of N sequences. That is, every single combination of one value from each of the sequences. Here is an example of an implementation of that problem, with an amazing explanation.
But how do we handle the cases where the output combination has a size smaller than the number of sequences? On its own, the Cartesian product only yields combinations whose size equals the number of sequences. Well, imagine for a second that every single input sequence has a "null" value. That null value gets paired with every single combination of values from the other sequences (including all of their null values). We can then remove these null values at the very end, and voila, we have every combination of every size.
To do this, while still allowing the input sequences to actually use the C# literal null values, or the default value for that type (if it's not nullable), we'll need to wrap the type. We'll create a wrapper that wraps the real value, while also having its own definition of a default/null value. From there we map each of our sequences into a sequence of wrappers, append the actual default value onto the end, compute the Cartesian product, and then map the combinations back to "real" values, filtering out the default values while we're at it.
If you don't want to see the actual code, stop reading here.
public class Wrapper<T>
{
public Wrapper(T value) { Value = value; }
public static Wrapper<T> Default = new Wrapper<T>(default(T));
public T Value { get; private set; }
}
public static IEnumerable<IEnumerable<T>> Foo<T>
(this IEnumerable<IEnumerable<T>> sequences)
{
return sequences.Select(sequence =>
sequence.Select(item => new Wrapper<T>(item))
.Concat(new[] { Wrapper<T>.Default }))
.CartesianProduct()
.Select(sequence =>
sequence.Where(wrapper => wrapper != Wrapper<T>.Default)
.Select(wrapper => wrapper.Value));
}
In C#
this is actually a monad... I think...
IEnumerable<IEnumerable<int>> foo(IEnumerable<IEnumerable<int>> groups)
{
    if (groups.Count() == 0)
    {
        // no groups: nothing to combine
        yield break;
    }
    if (groups.Count() == 1)
    {
        foreach (var num in groups.First())
        {
            yield return new List<int>() { num };
        }
    }
    else
    {
        // all combinations that do not use the first group
        var furtherResult = foo(groups.Where(x => x != groups.First()));
        foreach (var result in furtherResult)
        {
            yield return result;
        }
        foreach (var num in groups.First())
        {
            // the item on its own, then appended to every other combination
            yield return new List<int>() { num };
            foreach (var result in furtherResult)
            {
                yield return result.Concat(new[] { num });
            }
        }
    }
}
A better version:
public static IEnumerable<IEnumerable<T>> foo<T> (IEnumerable<IEnumerable<T>> groups)
{
if (groups.Count() == 0)
{
return new List<List<T>>();
}
else
{
var firstGroup = groups.First();
var furtherResult = foo(groups.Skip(1));
IEnumerable<IEnumerable<T>> myResult = from x in firstGroup
select new [] {x};
myResult = myResult.Concat( from x in firstGroup
from result in furtherResult
select result.Concat(new T[]{x}));
myResult = myResult.Concat(furtherResult);
return myResult;
}
}
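For comparison, here is an iterative sketch of the same idea: every existing combination either skips the next group or takes exactly one item from it. CombinationSketch and AllCombinations are hypothetical names, and the results are materialized as lists purely to keep the example simple:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CombinationSketch
{
    public static List<List<T>> AllCombinations<T>(IEnumerable<IEnumerable<T>> groups)
    {
        // Start with the single empty combination.
        var results = new List<List<T>> { new List<T>() };
        foreach (var group in groups)
        {
            var items = group.ToList();
            // Extend every combination found so far with one item of this group.
            var extended = (from combo in results
                            from item in items
                            select new List<T>(combo) { item }).ToList();
            // Keep the old combinations too: they represent "skip this group".
            results = results.Concat(extended).ToList();
        }
        // Drop the empty combination that seeded the process.
        return results.Where(combo => combo.Count > 0).ToList();
    }
}
```

The number of results is the product of (group size + 1) over all groups, minus one. For the groups (1,2), (3,4), (5,6,7) from the question that gives 3 * 3 * 4 - 1 = 35, which matches the list in the question. With 7 groups of 20 objects it would be 21^7 - 1, roughly 1.8 billion, so a lazily evaluated version would be preferable at that size.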

Linq - 'Saving' OrderBy operation (c#)

Assume I have generic list L of some type in c#. Then, using linq, call OrderBy() on it, passing in a lambda expression.
If I then re-assign the L, the previous order operation will obviously be lost.
Is there any way I can 'save' the lambda expression I used on the list before I reassigned it, and re-apply it?
Use a Func delegate to store your ordering then pass that to the OrderBy method:
Func<int, int> orderFunc = i => i; // func for ordering
var list = Enumerable.Range(1,10).OrderByDescending(i => i); // 10, 9 ... 1
var newList = list.OrderBy(orderFunc); // 1, 2 ... 10
As another example consider a Person class:
public class Person
{
public int Id { get; set; }
public string Name { get; set; }
}
Now you want to preserve a sort order that sorts by the Name property. In this case the Func operates on a Person type (T) and the TResult will be a string since Name is a string and is what you are sorting by.
Func<Person, string> nameOrder = p => p.Name;
var list = new List<Person>
{
new Person { Id = 1, Name = "ABC" },
new Person { Id = 2, Name = "DEF" },
new Person { Id = 3, Name = "GHI" },
};
// descending order by name
foreach (var p in list.OrderByDescending(nameOrder))
Console.WriteLine(p.Id + ":" + p.Name);
// 3:GHI
// 2:DEF
// 1:ABC
// re-assigning the list
list = new List<Person>
{
new Person { Id = 23, Name = "Foo" },
new Person { Id = 14, Name = "Buzz" },
new Person { Id = 50, Name = "Bar" },
};
// reusing the order function (ascending by name in this case)
foreach (var p in list.OrderBy(nameOrder))
Console.WriteLine(p.Id + ":" + p.Name);
// 50:Bar
// 14:Buzz
// 23:Foo
EDIT: be sure to add ToList() after the OrderBy calls if you need a List<T> since the LINQ methods will return an IEnumerable<T>.
Calling ToList() or ToArray() on your IEnumerable<T> will cause it to be immediately evaluated. You can then assign the resulting list or array to "save" your ordered list.
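A Func<T, TKey> stores only the key selector, not the direction or any secondary keys. If the whole ordering should be saved, one option is to store a function from a sequence to an ordered sequence. This is a sketch; SavedOrdering and ByLengthThenAlpha are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SavedOrdering
{
    // The complete ordering (primary key, secondary key, comparer) in one delegate.
    public static readonly Func<IEnumerable<string>, IOrderedEnumerable<string>> ByLengthThenAlpha =
        source => source.OrderBy(s => s.Length)
                        .ThenBy(s => s, StringComparer.Ordinal);
}
```

The delegate can then be re-applied to whatever the list variable currently holds, for example SavedOrdering.ByLengthThenAlpha(list).ToList().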

What does ExpressionVisitor.Visit<T> Do?

Before someone shouts out the answer, please read the question through.
What is the purpose of the method in .NET 4.0's ExpressionVisitor:
public static ReadOnlyCollection<T> Visit<T>(ReadOnlyCollection<T> nodes, Func<T, T> elementVisitor)
My first guess as to the purpose of this method was that it would visit each node in each tree specified by the nodes parameter and rewrite the tree using the result of the elementVisitor function.
This does not appear to be the case. Actually this method appears to do a little more than nothing, unless I'm missing something here, which I strongly suspect I am...
I tried to use this method in my code and when things didn't work out as expected, I reflectored the method and found:
public static ReadOnlyCollection<T> Visit<T>(ReadOnlyCollection<T> nodes, Func<T, T> elementVisitor)
{
T[] list = null;
int index = 0;
int count = nodes.Count;
while (index < count)
{
T objA = elementVisitor(nodes[index]);
if (list != null)
{
list[index] = objA;
}
else if (!object.ReferenceEquals(objA, nodes[index]))
{
list = new T[count];
for (int i = 0; i < index; i++)
{
list[i] = nodes[i];
}
list[index] = objA;
}
index++;
}
if (list == null)
{
return nodes;
}
return new TrueReadOnlyCollection<T>(list);
}
So where would someone actually go about using this method? What am I missing here?
Thanks.
It looks to me like a convenience method to apply an arbitrary transform function to an expression tree, and return the resulting transformed tree, or the original tree if there is no change.
I can't see how this differs from the standard expression visitor pattern, except that it uses a function instead of a visitor type.
As for usage:
Expression<Func<int, int, int>> addLambdaExpression= (a, b) => a + b;
// Change add to subtract
Func<Expression, Expression> changeToSubtract = e =>
{
if (e is BinaryExpression)
{
return Expression.Subtract((e as BinaryExpression).Left,
(e as BinaryExpression).Right);
}
else
{
return e;
}
};
var nodes = new Expression[] { addLambdaExpression.Body }.ToList().AsReadOnly();
var subtractExpression = ExpressionVisitor.Visit(nodes, changeToSubtract);
You don't explain how you expected it to behave, nor why you think it does little more than nothing.
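The decompiled code in the question suggests two observable properties: the function is applied once to each element of the collection (it does not descend into child nodes by itself), and the original collection instance is returned when no element changes. A small sketch to check both; VisitCollectionDemo and its method names are hypothetical:

```csharp
using System;
using System.Collections.ObjectModel;
using System.Linq;
using System.Linq.Expressions;

public static class VisitCollectionDemo
{
    // An identity visitor changes nothing, so the same collection should come back.
    public static bool ReturnsSameInstanceWhenUnchanged()
    {
        ReadOnlyCollection<Expression> nodes =
            new Expression[] { Expression.Constant(1), Expression.Constant(2) }
                .ToList().AsReadOnly();
        var visited = ExpressionVisitor.Visit(nodes, e => e);
        return ReferenceEquals(nodes, visited);
    }

    // Replaces top-level constants with 0 and leaves every other node untouched.
    public static ReadOnlyCollection<Expression> ReplaceConstants()
    {
        ReadOnlyCollection<Expression> nodes =
            new Expression[] { Expression.Constant(1), Expression.Parameter(typeof(int), "x") }
                .ToList().AsReadOnly();
        return ExpressionVisitor.Visit(
            nodes, e => e is ConstantExpression ? Expression.Constant(0) : e);
    }
}
```

Inside a custom ExpressionVisitor this is typically useful for rewriting a node's child collection (arguments, variables and so on) while avoiding a new allocation when nothing changed.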
