LINQ query optimization?

I have a large unsorted list of items. Some items are important and need to be listed first, followed by the unimportant items. Within each of the two groups, the items should be sorted by name. I have a solution, but I believe it can be optimized: first it gets a list of the important items, then a list of everything else, and then it concatenates the results. Any suggestions on how to optimize this?
Here is a simplified version of the problem for LINQPad:
var doc = XDocument.Parse(@"
<items>
<item id='a'>not important4</item>
<item id='b'>important2</item>
<item id='c'>not important2</item>
<item id='d'>not important3</item>
<item id='e'>important1</item>
<item id='f'>not important1</item>
</items>");
// identify which items are important
string[] importantItemIDs = new string[] { "b", "e" };
var items = doc.Root.Elements("item");
// get a list of important items (inner join)
var importantList = from itemID in importantItemIDs
from item in items
orderby (string) item.Value
where itemID == (string) item.Attribute("id")
select item;
// get items that are not important items
var notImportantList = items.Except(importantList).OrderBy(i => (string) i.Value);
// concatenate both sets of results into one list
var fullList = importantList.Concat(notImportantList);
fullList.Select(v => v.Value).Dump();
Here's the correct output:
important1
important2
not important1
not important2
not important3
not important4

One approach that immediately comes to mind is to utilize OrderBy as well as ThenBy to avoid querying the original data source multiple times. Something like:
var list = items
    .OrderBy(i => importantItemIDs.Contains((string) i.Attribute("id")) ? 0 : 1)
    .ThenBy(i => i.Value)
    .Select(i => i.Value);
The ternary operator isn't strictly necessary: OrderBy sorts false before true, so ordering by !importantItemIDs.Contains(...) would give the same result. Either way it shouldn't be a major performance concern, and the ternary is arguably a bit clearer.
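A self-contained sketch of the single-sort idea, reusing the XML from the question and assuming a plain .NET console project instead of LINQPad (so Console.WriteLine stands in for Dump()). It also swaps the string array for a HashSet<string>, which is an assumption not in the original, so each membership test is a hash lookup rather than an array scan. It orders by the bool itself: OrderBy puts false before true, so negating the check moves the important items to the front.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

var doc = XDocument.Parse(@"
<items>
  <item id='a'>not important4</item>
  <item id='b'>important2</item>
  <item id='c'>not important2</item>
  <item id='d'>not important3</item>
  <item id='e'>important1</item>
  <item id='f'>not important1</item>
</items>");

// Assumption: a HashSet instead of the original string[] for O(1) Contains.
var importantItemIDs = new HashSet<string> { "b", "e" };

List<string> fullList = doc.Root.Elements("item")
    // bool keys sort false before true, so important items (key == false) come first
    .OrderBy(i => !importantItemIDs.Contains((string)i.Attribute("id")))
    // within each group, sort by the element text
    .ThenBy(i => i.Value)
    .Select(i => i.Value)
    .ToList();

foreach (var value in fullList)
    Console.WriteLine(value);
```

The source is enumerated once and sorted once, instead of the cross join plus Except used in the question.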

var prioritized =
from item in items
select new {
Importance = importantItemIDs.Contains((string) item.Attribute("id")) ? 1 : 2,
Item = item
};
var fullList = from pitem in prioritized
orderby pitem.Importance, pitem.Item.Value
select pitem.Item.Value;

Related

Dart custom sorting of a list

I want to build a suggestion builder that displays search suggestions as the text in a TextField changes. I want to search using the contains method, but sort the resulting list based on startsWith. If I only use startsWith, it drops all the other contains matches. How can I apply both at the same time?
I have a list:
List<String> list = ["apple", "orange", "aaaaorange", "bbbborange", "cccccorange"];
Now if I search for only "ora", it returns the items in the following order:
aaaaorange
bbbborange
cccccorange
orange
What I want:
orange
aaaaorange
bbbborange
cccccorange
Code:
return list
.where((item) {
return item.toLowerCase().contains(query.toLowerCase());
}).toList(growable: false)
..sort((a, b) {
return a.toLowerCase().compareTo(b.toLowerCase());
});
It may be easiest to think of the two queries separately, and then combine the results:
var list = <String>[
'apple',
'orange',
'aaaaorange',
'bbbborange',
'cccccorange',
];
var pattern = 'ora';
var starts = list.where((s) => s.startsWith(pattern)).toList()
  ..sort((a, b) => a.toLowerCase().compareTo(b.toLowerCase()));
var contains = list
.where((s) => s.contains(pattern) && !s.startsWith(pattern))
.toList()
..sort((a, b) => a.toLowerCase().compareTo(b.toLowerCase()));
var combined = [...starts, ...contains];
print(combined);
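For comparison, the approach from the LINQ question above (one sort with a compound key instead of two separate queries) can express this ordering as well. The sketch below is C#, not Dart, and assumes case-insensitive matching as in the question's code; `list` and `pattern` mirror the answer's variable names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string> { "apple", "orange", "aaaaorange", "bbbborange", "cccccorange" };
var pattern = "ora";

List<string> combined = list
    // keep only items that contain the pattern anywhere (case-insensitive)
    .Where(s => s.IndexOf(pattern, StringComparison.OrdinalIgnoreCase) >= 0)
    // false sorts before true, so prefix matches come first
    .OrderBy(s => !s.StartsWith(pattern, StringComparison.OrdinalIgnoreCase))
    // then alphabetically within each group, ignoring case
    .ThenBy(s => s, StringComparer.OrdinalIgnoreCase)
    .ToList();

Console.WriteLine(string.Join(", ", combined));
```

In Dart, the same effect would come from a single sort comparator that compares the startsWith flags first and the lowercase strings second.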

LINQ: Improving performance of "query to find all dictionaries from list of dictionaries where given key has at least one value from list of values"

I tried searching for existing questions, but I could not find anything, so I apologize if this is a duplicate question.
I have the following piece of code. It runs in a loop for different values of key and listOfValues (listOfDict does not change and is built only once; key and listOfValues vary with each iteration). The code currently works, but the profiler shows that 50% of the execution time is spent in this LINQ query. Can I improve the performance, perhaps by using a different LINQ construct?
// List of dictionary that allows multiple values against one key.
List<Dictionary<string, List<string>>> listOfDict = BuildListOfDict();
// Following code & LINQ query runs in a loop.
List<string> listOfValues = BuildListOfValues();
string key = GetKey();
// LINQ query to find all dictionaries from listOfDict
// where given key has at least one value from listOfValues.
List<Dictionary<string, List<string>>> result = listOfDict
.Where(dict => dict[key]
.Any(lhs => listOfValues.Any(rhs => lhs == rhs)))
.ToList();
Using HashSet will perform significantly better. You can create a HashSet<string> like so:
IEnumerable<string> strings = ...;
var hashSet = new HashSet<string>(strings);
I assume you can change your methods to return HashSets and make them run like this:
List<Dictionary<string, HashSet<string>>> listOfDict = BuildListOfDict();
HashSet<string> listOfValues = BuildListOfValues();
string key = GetKey();
List<Dictionary<string, HashSet<string>>> result = listOfDict
.Where(dict => listOfValues.Overlaps(dict[key]))
.ToList();
Here HashSet's instance method Overlaps is used. HashSet is optimized for set operations like this. In a test using one dictionary of 200 elements, this runs in 3% of the time of your method.
UPDATED: Per @GertArnold, switched from Any/Contains to HashSet.Overlaps for a slight performance improvement.
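A small runnable comparison of the original nested Any query and the Overlaps version. The three dictionaries and the "color" key are made-up sample data standing in for BuildListOfDict(), BuildListOfValues() and GetKey(); it only checks that both queries select the same dictionaries, not how fast they are.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for BuildListOfDict(), BuildListOfValues() and GetKey().
var listOfDict = new List<Dictionary<string, HashSet<string>>>
{
    new() { ["color"] = new HashSet<string> { "red", "blue" } },
    new() { ["color"] = new HashSet<string> { "green" } },
    new() { ["color"] = new HashSet<string> { "blue", "yellow" } },
};
var listOfValues = new HashSet<string> { "blue", "black" };
var key = "color";

// Original shape: nested Any, up to |dict[key]| * |listOfValues| comparisons per dictionary.
var slow = listOfDict
    .Where(dict => dict[key].Any(lhs => listOfValues.Any(rhs => lhs == rhs)))
    .ToList();

// HashSet version: Overlaps does one hash lookup per element of dict[key].
var fast = listOfDict
    .Where(dict => listOfValues.Overlaps(dict[key]))
    .ToList();

Console.WriteLine($"{slow.Count} {fast.Count}");
```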
Depending on whether listOfValues or the average value for a key is longer, you can either convert listOfValues to a HashSet<string> or build your list of dictionaries to have a HashSet<string> for each value:
// optimize testing against listOfValues
var valHS = listOfValues.ToHashSet();
var result2 = listOfDict.Where(dict => valHS.Overlaps(dict[key]))
.ToList();
// change structure to optimize query
var listOfDict2 = listOfDict.Select(dict => dict.ToDictionary(kvp => kvp.Key, kvp => kvp.Value.ToHashSet())).ToList();
var result3 = listOfDict2.Where(dict => dict[key].Overlaps(listOfValues))
.ToList();
Note: if the query is repeated with differing listOfValues, it probably makes more sense to build the HashSet in the dictionaries once, rather than computing a HashSet from each listOfValues.
@LasseVågsætherKarlsen's suggestion in the comments to invert the structure intrigued me, so, with a further refinement to handle the multiple keys, I created an index structure and tested lookups. With my test harness, this is about twice as fast as using a HashSet for one of the List<string>s and four times faster than the original method:
var listOfKeys = listOfDict.First().Select(d => d.Key);
var lookup = listOfKeys.ToDictionary(k => k, k => listOfDict.SelectMany(d => d[k].Select(v => (v, d))).ToLookup(vd => vd.v, vd => vd.d));
Now to filter for a particular key and list of values:
var result4 = listOfValues.SelectMany(v => lookup[key][v]).Distinct().ToList();
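A runnable sketch of the inverted index on made-up data. It assumes, as the answer's First() call does, that every dictionary has the same set of keys; the "color"/"size" keys and their values are hypothetical. The index is built once, and each query then costs one lookup per value in listOfValues.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data; every dictionary has the same keys.
var listOfDict = new List<Dictionary<string, List<string>>>
{
    new() { ["color"] = new List<string> { "red", "blue" }, ["size"] = new List<string> { "S" } },
    new() { ["color"] = new List<string> { "green" },       ["size"] = new List<string> { "M", "L" } },
    new() { ["color"] = new List<string> { "blue" },        ["size"] = new List<string> { "L" } },
};

// Build once: key -> (value -> dictionaries whose list for that key contains the value).
var listOfKeys = listOfDict.First().Select(d => d.Key);
var lookup = listOfKeys.ToDictionary(
    k => k,
    k => listOfDict.SelectMany(d => d[k].Select(v => (v, d))).ToLookup(vd => vd.v, vd => vd.d));

// Per query: one lookup per value, then drop dictionaries matched by more than one value.
var listOfValues = new List<string> { "blue", "red" };
var key = "color";
var result4 = listOfValues.SelectMany(v => lookup[key][v]).Distinct().ToList();

Console.WriteLine(result4.Count);
```

Indexing an ILookup with a value that never occurs returns an empty sequence, so unknown values are harmless; an unknown key, however, makes lookup[key] throw a KeyNotFoundException.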

Take one and skip other duplicate item in a child table

I have a list of Items, and every item has a list of Tags. Now I want to select the distinct child items. I have tried the following, but it's not working:
var items = await _context.Items.
Include(i => i.Tags.Distinct()).
Include(i => i.Comments).
OrderBy(i => i.Title).ToListAsync();
//Tag items
TagId - tag
------------------
1 --- A
2 --- B
3 --- B
4 --- C
5 --- D
6 --- D
7 --- F
//Expected Result
Item.Tags -> [A,B,C,D,F]
How can I do this in EF Core? Thanks.
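You can use the MoreLinq library to get DistinctBy, or write your own using this post (.NET 6 and later also include Enumerable.DistinctBy).
Then apply it to each item's Tags in memory, after the items have been loaded. DistinctBy belongs on the Tags collection, not on the Items query: Tags is a collection, so d.Tags.tag does not compile, and deduplicating the Items would not remove duplicate Tags.
var items = await _context.Items
    .Include(i => i.Tags)
    .Include(i => i.Comments)
    .OrderBy(i => i.Title)
    .ToListAsync();
// Keep one Tag per tag value (assumes Tags is a settable collection such as ICollection<Tag>).
foreach (var item in items)
    item.Tags = item.Tags.DistinctBy(t => t.tag).ToList();
You want to get distinct records based on one column, so that should do it.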
Apparently you have a table of Items, where every Item has zero or more Tags. The Items also have a property Comments, of which we do not know whether it is one string or a collection of zero or more strings. Finally, every Item has a Title.
Now you want all properties of the Items, each with its Comments and a list of unique Tags, ordered by Title.
One of the slower parts of database queries is the transport of the selected data from the database management system to your local process. Hence it is wise to limit the amount of data to the minimum you are really using.
It seems that the Tags of the Items are in a separate table. Every Item has zero or more Tags, and every Tag belongs to exactly one Item: a simple one-to-many relation with a foreign key Tag.ItemId.
If the Item with Id 300 has 1000 Tags, then you know that every one of these 1000 Tags has a foreign key ItemId with the value 300. It would be a waste to transport all these foreign keys to your local process.
Whenever you query data to inspect it, Select only the properties you really plan to use. Only use Include if you plan to update the included item.
So your query will be:
var query = myDbContext.Items
    .Where(item => ...)            // only if you do not want all items
    .OrderBy(item => item.Title)   // if you sort here and do not need the Title,
                                   // you don't have to Select it
    .Select(item => new
    {   // select only the properties you plan to use
        Id = item.Id,
        Title = item.Title,

        // If Comments is a single string, use this instead:
        // Comments = item.Comments,
        // If an Item has zero or more Comments, use this version:
        Comments = item.Comments
            .Where(comment => ...) // only if you do not want all comments
            .Select(comment => new
            {   // again, select only the Comment properties you plan to use
                Id = comment.Id,
                Text = comment.Text,
                // no need for the foreign key, you already know the value:
                // ItemId = comment.ItemId,
            })
            .ToList(),

        Tags = item.Tags
            .Select(tag => new
            {   // Select only the properties that define a duplicate; including the
                // unique tag.Id would make every row distinct and defeat Distinct()
                Type = tag.Type,
                Text = tag.Text,
                // No need for the foreign key, you already know the value
                // ItemId = tag.ItemId,
            })
            .Distinct()
            .ToList(),
    });
var fetchedData = await query.ToListAsync();
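Why the selected properties matter for Distinct(): anonymous types compare by value over all of their properties, so if the unique Tag Id is part of the projection, no two rows are equal. A LINQ-to-objects sketch using the question's tag table, with (Id, Text) tuples as a stand-in for the Tag entity, which isn't shown:

```csharp
using System;
using System.Linq;

// The question's tag rows as (Id, Text) pairs; duplicates differ only by Id.
var tags = new[]
{
    (Id: 1, Text: "A"), (Id: 2, Text: "B"), (Id: 3, Text: "B"), (Id: 4, Text: "C"),
    (Id: 5, Text: "D"), (Id: 6, Text: "D"), (Id: 7, Text: "F"),
};

// Projecting the Id keeps every row distinct: 7 results.
var withId = tags.Select(t => new { t.Id, t.Text }).Distinct().ToList();

// Projecting only the text collapses the duplicates: A, B, C, D, F.
var textOnly = tags.Select(t => new { t.Text }).Distinct().ToList();

Console.WriteLine($"{withId.Count} vs {textOnly.Count}");
```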
I haven't tried it, but I'd say you put .Distinct() in the wrong place.
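var items = await _context.Items
    .Include(i => i.Tags)
    .Include(i => i.Comments)
    .OrderBy(i => i.Title)
    .ToListAsync();
// A statement lambda can't be converted to an expression tree, so remove the
// duplicate tags in memory after loading (assumes Tags is a settable collection).
foreach (var i in items)
    i.Tags = i.Tags.GroupBy(x => x.Tag).Select(x => x.First()).ToList();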
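An in-memory sketch of the GroupBy/First step, using the question's tag table as (TagId, Tag) tuples in place of the real Tag entity (an assumption; the entity class isn't shown). GroupBy yields the groups in the order their keys first appear, so the first row for each tag value is the one kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory tags for one item, as (TagId, Tag) pairs from the question's table.
var tags = new List<(int TagId, string Tag)>
{
    (1, "A"), (2, "B"), (3, "B"), (4, "C"), (5, "D"), (6, "D"), (7, "F"),
};

// Group rows with the same Tag text and keep the first row of each group.
var distinctTags = tags
    .GroupBy(x => x.Tag)
    .Select(g => g.First())
    .ToList();

Console.WriteLine(string.Join(",", distinctTags.Select(t => t.Tag)));
```

Unlike a Distinct over a projection, this keeps whole rows (including TagId), which matters if the caller needs more than the tag text.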

LINQ: single list which consists of multiple properties of same type of object collection

var userIds = books.Select( i => new{i.CreatedById,i.ModifiedById}).Distinct();
var lst = from i in userIds select i.CreatedById;
var lst1 = from i in userIds select i.ModifiedById;
var lstfinal = lst.Concat(lst1).Distinct();
Is there any other way to get the same result?
Here, books is a collection of book objects, i.e. an IEnumerable.
Thanks
Alternative solution: SelectMany over an array of the required properties:
var lstfinal = books.SelectMany(b => new [] { b.CreatedById, b.ModifiedById })
.Distinct();
Or enumerate the books collection twice. Union excludes duplicates, so you don't need to call Distinct (this is the way I'd go):
var lstfinal = books.Select(b => b.CreatedById)
.Union(books.Select(b => b.ModifiedById));
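A quick check, on made-up data, that both approaches return the same set of IDs; the named tuples stand in for the book objects, whose class isn't shown.

```csharp
using System;
using System.Linq;

// Hypothetical books as (CreatedById, ModifiedById) pairs.
var books = new[]
{
    (CreatedById: 1, ModifiedById: 2),
    (CreatedById: 2, ModifiedById: 2),
    (CreatedById: 3, ModifiedById: 1),
};

// Option 1: flatten both properties per book, then remove duplicates.
var viaSelectMany = books.SelectMany(b => new[] { b.CreatedById, b.ModifiedById }).Distinct().ToList();

// Option 2: Union already removes duplicates, so no Distinct call is needed.
var viaUnion = books.Select(b => b.CreatedById).Union(books.Select(b => b.ModifiedById)).ToList();

Console.WriteLine(string.Join(",", viaSelectMany.OrderBy(x => x)));
```

The two options can list the IDs in a different order, so compare them as sets if order matters to you.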

LINQ create multiple new entities using a single ID field

I have a newbie LINQ question. I need to create two objects of the same type from each string in a list. I need to append the text "-Direct" and "-InDirect" to the string and use the results as IDs to create the two unique objects.
var vStrings = new List<string> { "Milk", "Eggs", "Cheese" };
var vProducts = (from s in vStrings
select new Product { ID = s + "-Direct" })
.Union(
from s in vStrings
select new Product { ID = s + "-InDirect" });
As you can see in the example above, I am using a Union to create two different objects. Is there a better way to write this LINQ query?
Thanks for your suggestions.
If you ever needed more suffixes, this might be a better way:
var strings = new List<string> { "Milk", "Eggs", "Cheese" };
var suffixes = new List<string> {"-Direct", "-InDirect"};
var products = strings
.SelectMany(_ => suffixes, (x, y) => new Product() {ID = x + y});
And it would only iterate over the original set of strings once.
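A runnable version of the SelectMany approach, with an anonymous type standing in for the question's Product class (which isn't shown). One difference worth knowing: SelectMany groups the output by source string, whereas the Union and Concat versions list all "-Direct" IDs before all "-InDirect" ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var strings = new List<string> { "Milk", "Eggs", "Cheese" };
var suffixes = new List<string> { "-Direct", "-InDirect" };

// For each string, pair it with every suffix; the anonymous type stands in for Product.
var products = strings
    .SelectMany(_ => suffixes, (x, y) => new { ID = x + y })
    .ToList();

foreach (var p in products)
    Console.WriteLine(p.ID);
```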
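This way isn't much shorter, but I think it is a little better because it uses Concat, which simply appends the second sequence, instead of Union, which also has to check for duplicates (and, for Product objects without overridden equality, only compares references anyway):
var vProducts2 = (from s in vStrings
                  select new Product { ID = s + "-Direct" })
    .Concat(from s in vStrings
            select new Product { ID = s + "-InDirect" });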
