I query a database for records with the following structure:
ID | Term | ParentID
In my C# code I have the following class:
public class Tree
{
    public string Id { get; set; }
    public string Term { get; set; }
    public string ParentId { get; set; }
    public int Level { get; set; }
    public IList<Tree> ChildItems { get; set; }
}
The query returns 5,000,000 records.
I need to build a tree of Tree items and populate it.
First of all, I select all items where ParentID is null, and then for every element I search for its parent (if the parent doesn't exist, I build the parent of the parent, and so on) and build the tree using recursion.
I'm not happy with my algorithm because it takes more than 5 minutes.
Please give me some advice on how to do this, what to use, and so on.
This is how the code is currently implemented:
private string Handle2(List<Tree> originalTree)
{
    IList<Tree> newTree = new List<TreeTest.Models.Tree>();
    IList<Tree> treeWithoutParents = originalTree.Where(x => String.IsNullOrEmpty(x.ParentID)).OrderBy(x => x.Term).ToList();
    foreach(Tree item in treeWithoutParents)
    {
        Tree newItem = new Tree { Id = item.ID, Term = item.Term, ParentId = item.ParentID, Level = 0 };
        newTree.Add(newItem);
        InsertChilds(newItem, originalTree, 0);
    }
    return "output";
}
private void InsertChilds(Tree item, List<Tree> origTree, int level)
{
    ++level;
    IList<Tree> childItems = origTree.Where(x => x.ParentID == item.Id).ToList();
    origTree.RemoveAll(x => x.ParentID == item.Id);
    foreach (Tree i in childItems)
    {
        origTree.Remove(i);
    }
    foreach (Tree tItem in childItems)
    {
        if (item.ChildTree == null)
        {
            item.ChildTree = new List<TreeTest.Models.Tree>();
        }
        Tree itemToAdd = new Tree { Id = tItem.ID, Term = tItem.Term, ParentId = tItem.ParentID, Level = level };
        this.InsertChilds(itemToAdd, origTree, level);
        item.ChildTree.Add(itemToAdd);
    }
}
Try using a map (C# Dictionary) of ID (string, although I'm curious why that isn't int) to node (Tree object) to store your tree nodes.
This would allow you to get the node corresponding to an ID with expected O(1) complexity, rather than your current O(n) complexity.
Beyond that, I suggest you rethink your approach a bit. Try to write code that goes through the input data only once and uses a single Dictionary. If the parent doesn't exist yet, you can create a placeholder item for it and populate its members only when you reach that item in the input.
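A minimal sketch of that single-pass idea, under some assumptions: the input rows arrive as (Id, Term, ParentId) tuples in any order, ChildItems is initialized to an empty list, and the class name TreeBuilder is made up for illustration. Level is not filled in here; it could be assigned in a later traversal from the roots. A placeholder whose own row never arrives keeps a null Term and is not reachable from the returned roots.

```csharp
using System.Collections.Generic;

public class Tree
{
    public string Id { get; set; }
    public string Term { get; set; }
    public string ParentId { get; set; }
    public int Level { get; set; }
    // Assumption: initialized up front so children can be added at any time.
    public IList<Tree> ChildItems { get; set; } = new List<Tree>();
}

// Hypothetical helper, not part of the original question.
public static class TreeBuilder
{
    public static List<Tree> Build(IEnumerable<(string Id, string Term, string ParentId)> rows)
    {
        var nodes = new Dictionary<string, Tree>();
        var roots = new List<Tree>();

        foreach (var row in rows)
        {
            // Reuse the placeholder if a child already created one for this Id.
            if (!nodes.TryGetValue(row.Id, out var node))
            {
                node = new Tree { Id = row.Id };
                nodes[row.Id] = node;
            }
            node.Term = row.Term;
            node.ParentId = row.ParentId;

            if (string.IsNullOrEmpty(row.ParentId))
            {
                roots.Add(node);
                continue;
            }

            // Parent not seen yet: create a placeholder that is filled in later.
            if (!nodes.TryGetValue(row.ParentId, out var parent))
            {
                parent = new Tree { Id = row.ParentId };
                nodes[row.ParentId] = parent;
            }
            parent.ChildItems.Add(node);
        }

        return roots;
    }
}
```

Each row costs a constant number of expected O(1) dictionary operations, so the whole build is linear in the number of records.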
I would use a dictionary (hash table) to make this faster. Here is my algorithm in pseudocode:
- create a dictionary mapping a parent ID to an IList<Tree> of its children (items with no parent go under the null key) // mapping a node to its children
- create a queue of (ID, parent ID) pairs // item (3,5) in the queue corresponds to a node with ID=3 that has a parent with ID=5
- initialize the queue with all the nodes that have no parent:
    - List<Tree> withoutParent = dictionary[null]
    - for each item in withoutParent:
        - add (item.Id, null) to the queue
- while the queue is not empty:
    - (id, pid) = remove an item from the queue
    - make a new node t
    - t.Id = id
    - t.ParentId = pid
    - t.ChildItems = dictionary[id]
    - for each child in t.ChildItems:
        - add (child.Id, id) to the queue
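The pseudocode above could be sketched in C# roughly as follows. This is one possible reading, not the answerer's exact code: it uses Enumerable.ToLookup to group items by parent once, uses the empty string as the "no parent" key, reuses the input objects instead of creating new nodes, and also fills in Level. The class name QueueTreeBuilder is invented for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Tree
{
    public string Id { get; set; }
    public string Term { get; set; }
    public string ParentId { get; set; }
    public int Level { get; set; }
    public IList<Tree> ChildItems { get; set; }
}

// Hypothetical helper, not part of the original answer.
public static class QueueTreeBuilder
{
    public static List<Tree> Build(IEnumerable<Tree> flatItems)
    {
        // Group every item under its parent's ID in one pass.
        // Assumption: null and "" both mean "no parent".
        var childrenByParent = flatItems.ToLookup(x => x.ParentId ?? "");

        var roots = childrenByParent[""].OrderBy(x => x.Term).ToList();
        var queue = new Queue<Tree>();
        foreach (var root in roots)
        {
            root.Level = 0;
            queue.Enqueue(root);
        }

        // Breadth-first walk: attach children, then queue them.
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            // A lookup returns an empty sequence for unknown keys.
            node.ChildItems = childrenByParent[node.Id].ToList();
            foreach (var child in node.ChildItems)
            {
                child.Level = node.Level + 1;
                queue.Enqueue(child);
            }
        }

        return roots;
    }
}
```

Items whose parent chain never reaches a root are simply not visited, so they do not appear in the result.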
Is the ID column a unique identifier? If it is, you can try the following. Instead of using a List, use a Set or a hash map. If a parent has many children, lookups in a list can slow your operations down. With a Set you get fast lookups and fast insertions.
Also, can you check how much time an ORDER BY clause would take? This might really help you speed up your process. If ID is a clustered index, the sort will be fast (the data is already sorted); otherwise your query will still use the same index.
When a parent does not exist, you create the parent of the parent. I would try to avoid that. Instead, when a child's parent does not exist in the tree yet, add the child to a separate list. After you have gone through the original list, make a second pass over that list to place the orphaned elements. The advantage is that you do not need to restructure your tree every time you create a parent of a parent, only to find out that the parent was simply at the end of the list.
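The deferred-list idea could look something like the sketch below. It is only an illustration under assumptions: IDs are unique, the nodes are indexed in a Dictionary as they are read, and the names TwoPassTreeBuilder, Roots and Orphans are invented here. Items whose parent is still missing after the second pass are returned separately instead of being attached anywhere.

```csharp
using System.Collections.Generic;

public class Tree
{
    public string Id { get; set; }
    public string Term { get; set; }
    public string ParentId { get; set; }
    public int Level { get; set; }
    public IList<Tree> ChildItems { get; set; } = new List<Tree>();
}

// Hypothetical helper, not part of the original answer.
public static class TwoPassTreeBuilder
{
    public static (List<Tree> Roots, List<Tree> Orphans) Build(IEnumerable<Tree> items)
    {
        var byId = new Dictionary<string, Tree>();
        var roots = new List<Tree>();
        var deferred = new List<Tree>();
        var orphans = new List<Tree>();

        // First pass: attach what can be attached, defer the rest.
        foreach (var item in items)
        {
            byId[item.Id] = item;
            if (string.IsNullOrEmpty(item.ParentId))
                roots.Add(item);
            else if (byId.TryGetValue(item.ParentId, out var parent))
                parent.ChildItems.Add(item);
            else
                deferred.Add(item);
        }

        // Second pass: every node is indexed now, so a missing parent
        // means the item is a true orphan.
        foreach (var item in deferred)
        {
            if (byId.TryGetValue(item.ParentId, out var parent))
                parent.ChildItems.Add(item);
            else
                orphans.Add(item);
        }

        return (roots, orphans);
    }
}
```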
Related
Hard question to understand perhaps, but let me explain. I have a List of Channel objects that all have a ChannelId property (int). I also have a different List<int>, SelectedChannelIds, that contains a subset of the ChannelIds.
I want to select (through LINQ?) all the Channel objects that have a ChannelId property matching one in the second List.
In other words, I have the following structure:
public class Lists
{
    public List<Channel> AllChannels = ChannelController.GetAllChannels();
    public List<int> SelectedChannelIds = ChannelController.GetSelectedChannels();
    public List<Channel> SelectedChannels; // = ?????
}
public class Channel
{
    // ...
    public int ChannelId { get; set; }
    // ...
}
Any ideas on what that LINQ query would look like? Or is there a more effective way? I'm coding for the Windows Phone 7, fyi.
You can use List.Contains in a Where clause:
public Lists()
{
    SelectedChannels = AllChannels
        .Where(channel => SelectedChannelIds.Contains(channel.ChannelId))
        .ToList();
}
Note that it would be more efficient to use a HashSet<int> instead of a List<int> for SelectedChannelIds. With a list, each Contains call is a linear scan, so the query is O(n·m) for n channels and m selected IDs (O(n²) in the worst case); with a HashSet it drops to O(n). If your lists are always quite small, this may not be a significant issue.
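As a rough sketch of the HashSet variant, assuming HashSet<int> is available on your target platform (the helper class ChannelFilter and the method SelectByIds are names made up for this example):

```csharp
using System.Collections.Generic;
using System.Linq;

public class Channel
{
    public int ChannelId { get; set; }
}

// Hypothetical helper, not part of the original answer.
public static class ChannelFilter
{
    public static List<Channel> SelectByIds(IEnumerable<Channel> allChannels, IEnumerable<int> selectedIds)
    {
        // Build the set once; each Contains is then an expected O(1) lookup.
        var idSet = new HashSet<int>(selectedIds);
        return allChannels.Where(c => idSet.Contains(c.ChannelId)).ToList();
    }
}
```

The result keeps the order of AllChannels, the same as the List-based query.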
SelectedChannels = new List<Channel>(AllChannels.Where(c => SelectedChannelIds.Contains(c.ChannelId)));
So I have a collection of objects in one list, but each object in that list contains another list.
Consider the following:
class Parent
{
    public Parent(string parentName)
    {
        this.ParentName = parentName;
    }
    public string ParentName { get; set; }
    public List<Child> Children { get; set; }
}
class Child
{
    public Child(string name)
    {
        this.ChildName = name;
    }
    public string ChildName { get; set; }
}
By the nature of the application, all Parent objects in the list of parents are unique. Multiple parents can contain the same child, and I need to get the parents that contain child x.
So, say the child with ChildName of "child1" belongs to both parents with ParentName of "parent1" and "parent5". If there are 100 parents in the collection, I want to get only the ones that have the Child with ChildName of "child1"
I would prefer to do this with a lambda expression, but I'm not sure where to start as I don't really have too much experience using them. Is it possible, and if so, what is the correct approach?
If the Child class has defined an equality operation by implementing IEquatable<Child>, you can do this easily by using a lambda, the Enumerable.Where method of LINQ and the List.Contains method:
var parents = new List<Parent> { ... }; // fully populated list of parents
Child child = ...; // the child you are looking for goes here ("var child = null" would not compile)
var filtered = parents.Where(p => p.Children.Contains(child));
You can now iterate over filtered and perform your business logic.
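For illustration, here is one way Child might implement IEquatable<Child> so that List.Contains compares by ChildName. This is a sketch under assumptions: equality is defined purely by ChildName, Children is initialized to an empty list, and the helper ParentSearch.WithChild is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Child : IEquatable<Child>
{
    public Child(string name) { ChildName = name; }
    public string ChildName { get; set; }

    // Assumption: two children are "the same" when their names match.
    public bool Equals(Child other) => other != null && ChildName == other.ChildName;
    public override bool Equals(object obj) => Equals(obj as Child);
    public override int GetHashCode() => ChildName == null ? 0 : ChildName.GetHashCode();
}

public class Parent
{
    public Parent(string parentName) { ParentName = parentName; }
    public string ParentName { get; set; }
    public List<Child> Children { get; set; } = new List<Child>();
}

// Hypothetical helper, not part of the original answer.
public static class ParentSearch
{
    public static List<Parent> WithChild(IEnumerable<Parent> parents, Child child)
    {
        // List<T>.Contains uses IEquatable<T>.Equals when T implements it.
        return parents.Where(p => p.Children.Contains(child)).ToList();
    }
}
```

Because GetHashCode depends on the mutable ChildName, changing a name while the object sits in a hash-based collection would be a problem; for plain lists it does not matter.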
If the Child class does not have an equality operation explicitly defined (which means that it will use reference equality rules instead of checking for identical ChildName), then you would need to include the "what passes for equal" check into the lambda yourself:
var filtered = parents.Where(p => p.Children.Any(c => c.ChildName == "child1"));
Note: There are of course many other ways to do the above, including the possibly easier to read
parents.Where(p => p.Children.Count(c => c.ChildName == "child1") > 0);
However, this is not as efficient as the Any version even though it will produce the same results.
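The difference comes from short-circuiting: Any stops at the first match, while Count has to visit every element. The small sketch below makes that visible by counting predicate calls; the class AnyVersusCount and its Measure method are invented for this demonstration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo helper, not part of the original answer.
public static class AnyVersusCount
{
    public static (bool ViaAny, int AnyCalls, bool ViaCount, int CountCalls) Measure(
        IEnumerable<string> names, string target)
    {
        int anyCalls = 0;
        int countCalls = 0;

        // Any returns as soon as one element satisfies the predicate.
        bool viaAny = names.Any(n => { anyCalls++; return n == target; });

        // Count evaluates the predicate for every element before comparing.
        bool viaCount = names.Count(n => { countCalls++; return n == target; }) > 0;

        return (viaAny, anyCalls, viaCount, countCalls);
    }
}
```

With three names where the first one matches, the Any predicate runs once and the Count predicate runs three times, although both report true.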
In both cases, the lambdas very much "read like" what they are intended to do:
I want those parents where the Children collection contains this item
I want those parents where at least one of the Children has ChildName == "child1"
You can do it like this:
var result = parents.Where(p => p.Children.Any(c => c.ChildName == "child1"));
This would do it
IEnumerable<Parent> parentsWithChild1 = parents.Where(p => p.Children.Any(c => c.ChildName == "child1"));
The big picture:
I'm working on a search form where the user can choose one or more criteria to filter the search results. One of the criteria is related to a child relationship.
I'm trying to create an extension method on IQueryable<Parent> so I can use it as part of my "chaining".
The method signature (as of now) is:
public static IQueryable<Parent> ContainsChild(this IQueryable<Parent> qry, int[] childrenIDs)
The parent table and a child table:
Parent
ParentID
Name
Description
Child
ParentID (FK)
AnotherID (from a lookup table)
Selection criteria:
int[] ids = new int[3] {1,2,3};
Usage would be something like this:
public IQueryable<Parent> Search(Search search){
    var parents = repository.All(); // returns IQueryable<Parent>
    if (search.Criteria1 != null){
        parents = parents.FilterByFirstCriteria(search.Criteria1);
    }
    if (search.ChildrenIDs != null){ // ChildrenIDs is an int[] with values 1,2,3
        parents = parents.ContainsChild(search.ChildrenIDs);
    }
    return parents;
}
What I'm trying to figure out is how to create the ContainsChild method that returns an IQueryable<Parent> where the parents have at least one child with the AnotherID in the ids array.
(I'm trying to use EF4 to accomplish this)
Any help is greatly appreciated.
Perhaps this:
public static IQueryable<Parent> ContainsChild(this IQueryable<Parent> qry,
    int[] childrenIDs)
{
    return qry.Where(p => p.Children.Any(c => childrenIDs.Contains(c.AnotherID)));
}
Edit
Just for fun another way which should give the same result:
public static IQueryable<Parent> ContainsChild(this IQueryable<Parent> qry,
    int[] childrenIDs)
{
    return qry.Where(p => p.Children.Select(c => c.AnotherID)
        .Intersect(childrenIDs).Any());
}
The generated SQL for the first version looks more friendly though, so I'd probably prefer the first version.
I have a piece of code which combines an in-memory list with some data held in a database. This works just fine in my unit tests (using a mocked Linq2SqlRepository which uses List).
public IRepository<OrderItem> orderItems { get; set; }
private List<OrderHeld> _releasedOrders = null;
private List<OrderHeld> releasedOrders
{
    get
    {
        if (_releasedOrders == null)
        {
            _releasedOrders = new List<OrderHeld>();
        }
        return _releasedOrders;
    }
}
.....
public int GetReleasedCount(OrderItem orderItem)
{
    int? total =
    (
        from item in orderItems.All
        join releasedOrder in releasedOrders
        on item.OrderID equals releasedOrder.OrderID
        where item.ProductID == orderItem.ProductID
        select new
        {
            item.Quantity,
        }
    ).Sum(x => (int?)x.Quantity);
    return total.HasValue ? total.Value : 0;
}
I am getting an error I don't really understand when I run it against a database.
Exception information:
Exception type: System.NotSupportedException
Exception message: Local sequence cannot be used in LINQ to SQL
implementation of query operators
except the Contains() operator.
What am I doing wrong?
I'm guessing it's to do with the fact that orderItems is on the database and releasedItems is in memory.
EDIT
I have changed my code based on the answers given (thanks all)
public int GetReleasedCount(OrderItem orderItem)
{
    var releasedOrderIDs = releasedOrders.Select(x => x.OrderID);
    int? total =
    (
        from item in orderItems.All
        where releasedOrderIDs.Contains(item.OrderID)
        && item.ProductID == orderItem.ProductID
        select new
        {
            item.Quantity,
        }
    ).Sum(x => (int?)x.Quantity);
    return total.HasValue ? total.Value : 0;
}
"I'm guessing it's to do with the fact that orderItems is on the database and releasedItems is in memory."
You are correct: LINQ to SQL can't join a database table to an in-memory List.
Take a look at this link:
http://flatlinerdoa.spaces.live.com/Blog/cns!17124D03A9A052B0!455.entry
He suggests using the Contains() method but you'll have to play around with it to see if it will work for your needs.
It looks like you need to formulate the db query first, because it can't create the correct SQL representation of the expression tree for objects that are in memory. It might be down to the join, so is it possible to get a value from the in-memory query that can be used as a simple primitive? For example using Contains() as the error suggests.
Your unit tests work because you're comparing an in-memory list to an in-memory list.
To combine an in-memory list with the database, you will either need to use memoryVariable.Contains(...) or make the db call first and return a list, so you can compare an in-memory list to an in-memory list as before. The second option would return too much data, so you're forced down the Contains() route.
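public int GetReleasedCount(OrderItem orderItem)
{
    // Project to plain IDs first: Contains() needs a local list of simple values,
    // not a list of OrderHeld objects.
    var releasedOrderIDs = releasedOrders.Select(x => x.OrderID).ToList();
    int? total =
    (
        from item in orderItems.All
        where item.ProductID == orderItem.ProductID
        && releasedOrderIDs.Contains(item.OrderID)
        select new
        {
            item.Quantity,
        }
    ).Sum(x => (int?)x.Quantity);
    return total.HasValue ? total.Value : 0;
}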
Given a class:
class Control
{
    public Control Parent { get; set; }
    public List<Control> Children { get; set; }
}
and a list:
List<Control> myControls;
Is it possible to write a linq query that will select all children & grandchildren for a given control? For example if a tree looks like this:
GridA1
    PanelA1
        TextBoxA1
        TextBoxA2
        PanelA2
            ListBoxA1
            ListBoxA2
GridB1
    PanelB1
        TextBoxB1
I'd like a query that, given a list myControls that contains all of the above controls with the Parent and Children properties set as appropriate, can be parameterized with PanelA1 and return TextBoxA1, TextBoxA2, PanelA2, ListBoxA1 and ListBoxA2. Is there an efficient way to do this with LINQ? I'm selecting a tree structure out of a database and looking for a better way to pull apart subtrees than a recursive function.
It's hard to do this in a tremendously pretty way with LINQ, since lambda expressions can't be self-recursive before they're defined. A recursive function (perhaps using LINQ) is your best bet.
How I'd implement it:
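// Extension methods must be static and declared in a static class.
public static IEnumerable<Control> ChildrenOf(this IEnumerable<Control> controls)
{
    // Yields each control followed by all of its descendants, depth-first.
    return controls.SelectMany(c =>
        new Control[] { c }.Concat(ChildrenOf(c.Children)));
}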
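If you would rather avoid recursion, the same depth-first walk can be written with an explicit stack. The sketch below is an alternative to the recursive answer, not a replacement for it; the Name property and the Descendants method are assumptions added for the example, and Children is assumed to be non-null.

```csharp
using System.Collections.Generic;

public class Control
{
    public string Name { get; set; } // hypothetical, only used to identify nodes
    public Control Parent { get; set; }
    public List<Control> Children { get; set; } = new List<Control>();
}

public static class ControlExtensions
{
    // Returns all children, grandchildren and so on of root (root itself excluded).
    public static IEnumerable<Control> Descendants(this Control root)
    {
        var pending = new Stack<Control>();

        // Push in reverse so children are popped in their original order.
        for (int i = root.Children.Count - 1; i >= 0; i--)
            pending.Push(root.Children[i]);

        while (pending.Count > 0)
        {
            var current = pending.Pop();
            yield return current;

            for (int i = current.Children.Count - 1; i >= 0; i--)
                pending.Push(current.Children[i]);
        }
    }
}
```

Because the method is an iterator, the result composes with LINQ operators, for example panelA1.Descendants().Where(...).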