LINQ: Ignore non-joinable items from two lists without throwing an error?

In the code below, the two lists are joined on index. But either list could have more items than the other, and I just want to join up to the list with the fewest items and throw the rest of the other list away. So, if list 1 has 5 items and list 2 has 7 items, I want to join both up to item 5 and ignore list 2's remaining items (and vice versa).
var joinLbxs = lbxShtCols.Items
.Cast<ListItem>()
.Select((xlFldList, index) => new
{
xlFldList,
tblFldList = lbxSqlTablesCols.Items[index]
});

Zip is not too complicated to implement yourself:
public static IEnumerable<TResult> Zip<TSource, TOther, TResult>(
    this IEnumerable<TSource> source,
    IEnumerable<TOther> other,
    Func<TSource, TOther, TResult> resultSelector)
{
    using (var e1 = source.GetEnumerator())
    {
        using (var e2 = other.GetEnumerator())
        {
            // Stop as soon as either sequence is exhausted.
            while (e1.MoveNext() && e2.MoveNext())
            {
                yield return resultSelector(e1.Current, e2.Current);
            }
        }
    }
}
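Applied to the question's two ListBox item collections, a usage sketch might look like this (the Cast<ListItem>() calls mirror the question's own code and are an assumption about the item types):
// Sketch only: pairs the two item collections positionally; the extra items
// of the longer list are simply never enumerated.
var joinLbxs = lbxShtCols.Items
    .Cast<ListItem>()
    .Zip(lbxSqlTablesCols.Items.Cast<ListItem>(),
         (xlFldList, tblFldList) => new { xlFldList, tblFldList });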

As @Steven suggested in a comment, if you are using .NET 4.0, use the built-in Zip() method. If you aren't, you could use MoreLinq to provide the same functionality.
Or you could do it yourself (assuming both lists are IList<T> and have a fast indexer):
from i in Enumerable.Range(0, new[] { list1.Count, list2.Count }.Min())
select new
{
item1 = list1[i],
item2 = list2[i]
}

Try intersecting the 2 lists; that will give you the common items. Then Take() the count of the smaller of the 2 lists. It's not clear whether you know up front which list will be the smaller one, so compute that first. Optionally sort the result if you need to BEFORE the Take().
int numToTake = (lbxShtCols.Count <= lbxSqlTablesCols.Count)
    ? lbxShtCols.Count
    : lbxSqlTablesCols.Count;
var commons = lbxShtCols.Intersect(lbxSqlTablesCols)
    .Take(numToTake);

Related

LINQ for nested for each

Can you please give me a LINQ solution for this method?
I need a LINQ equivalent of the method below:
private List<Model> ConvertMethod(List<Model> List1, List<Model> List2)
{
foreach (var Firstitem in List1)
{
foreach (var Seconditem in List2)
{
if (Firstitem.InnerText.Trim() == Seconditem.InnerText.Trim())
{
Seconditem.A= Firstitem.A;
Seconditem.B= Firstitem.B;
Seconditem.C= Firstitem.C;
Seconditem.D= Firstitem.D;
Seconditem.E= Firstitem.E;
Seconditem.F= Firstitem.F;
}
}
}
return List2;
}
Your task is to assign values, i.e. to modify objects. That's not the purpose of LINQ, which is meant for querying data sources. But you could use LINQ to build a query that returns all the items that need to be updated, and then use a foreach to assign the values (as you did):
var sameItems = from l1 in List1 join l2 in List2
on l1.InnerText.Trim() equals l2.InnerText.Trim()
select new { l1, l2 };
foreach(var itemsToUpdate in sameItems)
{
itemsToUpdate.l2.A = itemsToUpdate.l1.A;
// ...
}
It helps if you think about what this code is supposed to do - update records in the second list with the values of matching records from the first list.
There are various ways you can do that. One option is to replace each foreach with from and filter the rows:
var matches = from Firstitem in List1
              from Seconditem in List2
              where Firstitem.InnerText.Trim() == Seconditem.InnerText.Trim()
              select (Firstitem, Seconditem);
foreach(var (Firstitem,Seconditem) in matches)
{
Seconditem.A= Firstitem.A;
Seconditem.B= Firstitem.B;
Seconditem.C= Firstitem.C;
Seconditem.D= Firstitem.D;
Seconditem.E= Firstitem.E;
Seconditem.F= Firstitem.F;
}
I'm "cheating" a bit here, using tuples and decomposition to reduce noise
Another option is to use join. In this case, the two options produce identical results:
var matches = from Firstitem in List1
join Seconditem in List2
on Firstitem.InnerText.Trim() equals Seconditem.InnerText.Trim()
select (Firstitem,Seconditem);
The rest of the code remains the same.

How to convert forEach to lambda

Iterator<Rate> rateIt = rates.iterator();
int lastRateOBP = 0;
while (rateIt.hasNext())
{
Rate rate = rateIt.next();
int currentOBP = rate.getPersonCount();
if (currentOBP == lastRateOBP)
{
rateIt.remove();
continue;
}
lastRateOBP = currentOBP;
}
How can I convert the code above to Java 8 streams and lambdas, e.g. list.stream().filter()...? Note that I still need to modify the list itself.
The simplest solution is
Set<Integer> seen = new HashSet<>();
rates.removeIf(rate -> !seen.add(rate.getPersonCount()));
It utilizes the fact that Set.add returns false if the value is already in the Set, i.e. it has already been encountered. Since these are the elements you want to remove, all you have to do is negate it.
If keeping an arbitrary Rate instance for each group with the same person count is sufficient, there is no sorting needed for this solution.
Like with your original Iterator-based solution, it relies on the mutability of your original Collection.
If you really want distinct and sorted as you say in your comments, then it is as simple as:
TreeSet<Rate> sorted = rates.stream()
.collect(Collectors.toCollection(() ->
new TreeSet<>(Comparator.comparing(Rate::getPersonCount))));
But notice that in your example with an iterator you are not removing all duplicates, only duplicates that are consecutive (I've exemplified that in the comment to your question).
EDIT
It seems that you want distinct by a Function; or in simpler words, you want elements that are distinct by personCount, but in case of a clash you want to take the one with the larger los.
Such a thing is not yet available in the JDK, but it might be added later (see this).
Since you want them sorted and distinct by key, we can emulate that with:
Collection<Rate> sorted = rates.stream()
.collect(Collectors.toMap(Rate::getPersonCount,
Function.identity(),
(left, right) -> {
return left.getLos() > right.getLos() ? left : right;
},
TreeMap::new))
.values();
System.out.println(sorted);
On the other hand, if you absolutely need to return a TreeSet to actually denote that these are unique, sorted elements:
TreeSet<Rate> sorted = rates.stream()
.collect(Collectors.collectingAndThen(
Collectors.toMap(Rate::getPersonCount,
Function.identity(),
(left, right) -> {
return left.getLos() > right.getLos() ? left : right;
},
TreeMap::new),
map -> {
TreeSet<Rate> set = new TreeSet<>(Comparator.comparing(Rate::getPersonCount));
set.addAll(map.values());
return set;
}));
This should work if your Rate type has natural ordering (i.e. implements Comparable):
List<Rate> l = rates.stream()
.distinct()
.sorted()
.collect(Collectors.toList());
If not, use a lambda as a custom comparator:
List<Rate> l = rates.stream()
.distinct()
.sorted( (r1,r2) -> ...some code to compare two rates... )
.collect(Collectors.toList());
It may be possible to remove the call to sorted if you just need to remove duplicates.

Most efficient way to determine if there are any differences between specific properties of 2 lists of items?

In C# .NET 4.0, I am struggling to come up with the most efficient way to determine if the contents of 2 lists of items contain any differences.
I don't need to know what the differences are, just true/false whether the lists are different based on my criteria.
The 2 lists I am trying to compare contain FileInfo objects, and I want to compare only the FileInfo.Name and FileInfo.LastWriteTimeUtc properties of each item. All the FileInfo items are for files located in the same directory, so the FileInfo.Name values will be unique.
To summarize, I am looking for a single Boolean result for the following criteria:
Does ListA contain any items with FileInfo.Name not in ListB?
Does ListB contain any items with FileInfo.Name not in ListA?
For items with the same FileInfo.Name in both lists, are the FileInfo.LastWriteTimeUtc values different?
Thank you,
Kyle
I would use a custom IEqualityComparer<FileInfo> for this task:
public class FileNameAndLastWriteTimeUtcComparer : IEqualityComparer<FileInfo>
{
public bool Equals(FileInfo x, FileInfo y)
{
if(Object.ReferenceEquals(x, y)) return true;
if (x == null || y == null) return false;
return x.FullName.Equals(y.FullName) && x.LastWriteTimeUtc.Equals(y.LastWriteTimeUtc);
}
public int GetHashCode(FileInfo fi)
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
hash = hash * 23 + fi.FullName.GetHashCode();
hash = hash * 23 + fi.LastWriteTimeUtc.GetHashCode();
return hash;
}
}
}
Now you can use a HashSet<FileInfo> with this comparer and HashSet<T>.SetEquals:
var comparer = new FileNameAndLastWriteTimeUtcComparer();
var uniqueFiles1 = new HashSet<FileInfo>(list1, comparer);
bool anyDifferences = !uniqueFiles1.SetEquals(list2);
Note that I've used FileInfo.FullName instead of Name, since names alone aren't unique in general.
Side note: another advantage is that you can use this comparer with many LINQ methods such as GroupBy, Except, Intersect or Distinct.
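For example, a quick sketch reusing it with Except to see which files actually differ (same list1, list2 and comparer as in the snippet above):
// Files considered different under the FullName/LastWriteTimeUtc comparer.
var onlyInList1 = list1.Except(list2, comparer); // present in list1, missing or changed in list2
var onlyInList2 = list2.Except(list1, comparer); // present in list2, missing or changed in list1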
This is not the most efficient way (probably ranks a 4 out of 5 in the quick-and-dirty category):
// Anonymous types compare by value, so Except matches on Name + LastWrite.
var comparableListA = ListA.Select(a =>
    new { Name = a.Name, LastWrite = a.LastWriteTimeUtc });
var comparableListB = ListB.Select(b =>
    new { Name = b.Name, LastWrite = b.LastWriteTimeUtc });
var diffList = comparableListA.Except(comparableListB);
var youHaveDiff = diffList.Any();
Explanation:
Anonymous types are compared by property values, which is exactly what you're looking to do, and that's what led me to a LINQ projection along those lines.
P.S.
You should double check the syntax, I just rattled this off without the compiler.
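As a quick check of that anonymous-type behaviour (made-up values, not from the question):
var x = new { Name = "a.txt", LastWrite = new DateTime(2020, 1, 1) };
var y = new { Name = "a.txt", LastWrite = new DateTime(2020, 1, 1) };
Console.WriteLine(x.Equals(y)); // True: same anonymous type, equal property values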

How to sort IEnumerable with limited result count? (another implementation of .OrderBy.Take)

I have a binary file which contains more than 100 million objects, and I read the file using BinaryReader and return (yield) the objects (the file reader and IEnumerable implementation are here: Performance comparison of IEnumerable and raising event for each item in source?).
One of the object's properties indicates the object's rank (like A5). Assume that I want to get the sorted top n objects based on that property.
I looked at the code for the OrderBy function: it uses the quicksort algorithm. I tried to sort the IEnumerable result with OrderBy and Take(n) together, but I got an OutOfMemory exception, because OrderBy creates an array the size of the total object count in order to run the quicksort.
Actually, the total memory I need is proportional to n, so there is no need to create a big array. For instance, Take(1000) returns only 1000 objects and does not depend on the total count of objects.
How can I get the result of OrderBy combined with Take? In other words, I need a bounded sorted list whose capacity is defined by the end user.
If you want the top N of an ordered source with the default LINQ operators, then the only option is loading all the items into memory, sorting them and selecting the first N results:
items.OrderBy(keySelector).Take(N) // Out of memory
If you only need to sort the first N items of the source (not the top N overall), simply take them first and then sort:
items.Take(N).OrderBy(keySelector)
UPDATE: you can use a buffer to keep the N largest items in sorted order:
public static IEnumerable<T> TakeOrdered<T, TKey>(
this IEnumerable<T> source, int count, Func<T, TKey> keySelector)
{
Comparer<T, TKey> comparer = new Comparer<T,TKey>(keySelector);
List<T> buffer = new List<T>();
using (var iterator = source.GetEnumerator())
{
while (iterator.MoveNext())
{
T current = iterator.Current;
if (buffer.Count == count)
{
// check if current item is less than minimal buffered item
if (comparer.Compare(current, buffer[0]) <= 0)
continue;
buffer.Remove(buffer[0]); // remove the minimal item
}
// find index of current item
int index = buffer.BinarySearch(current, comparer);
buffer.Insert(index >= 0 ? index : ~index, current);
}
}
return buffer;
}
This solution also uses a custom comparer for the items (to compare them by key):
public class Comparer<T, TKey> : IComparer<T>
{
private readonly Func<T, TKey> _keySelector;
private readonly Comparer<TKey> _comparer = Comparer<TKey>.Default;
public Comparer(Func<T, TKey> keySelector)
{
_keySelector = keySelector;
}
public int Compare(T x, T y)
{
return _comparer.Compare(_keySelector(x), _keySelector(y));
}
}
Sample usage:
string[] items = { "b", "ab", "a", "abcd", "abc", "bcde", "b", "abc", "d" };
var top5byLength = items.TakeOrdered(5, s => s.Length);
var top3byValue = items.TakeOrdered(3, s => s);
LINQ does not have a built-in class that lets you take the top n elements without loading the whole collection into memory, but you can definitely build it yourself.
One simple approach would be using a SortedDictionary of lists: keep adding elements to it until you hit the limit of n. After that, check each element that you are about to add against the smallest element you have found so far (i.e. dict.Keys.First()). If the new element is smaller, discard it; otherwise, remove the smallest element and add the new one.
At the end of the loop your sorted dictionary will have at most n elements, and they would be sorted according to the comparator that you set on the dictionary.
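A minimal sketch of that idea, under the assumption of a key selector delegate and default key comparison (the names TopNExtensions, TakeTop and keySelector are made up for illustration):
using System;
using System.Collections.Generic;
using System.Linq;

static class TopNExtensions
{
    // Keeps at most n items, the ones with the largest keys, using a
    // SortedDictionary of lists so duplicate keys are preserved.
    public static IEnumerable<T> TakeTop<T, TKey>(
        this IEnumerable<T> source, int n, Func<T, TKey> keySelector)
    {
        if (n <= 0) return Enumerable.Empty<T>();
        var comparer = Comparer<TKey>.Default;
        var buffer = new SortedDictionary<TKey, List<T>>();
        int count = 0;
        foreach (T item in source)
        {
            TKey key = keySelector(item);
            if (count == n)
            {
                TKey smallest = buffer.Keys.First();
                // Not larger than the current minimum: it cannot enter the top n.
                if (comparer.Compare(key, smallest) <= 0)
                    continue;
                // Evict one element with the smallest key to make room.
                List<T> smallestBucket = buffer[smallest];
                smallestBucket.RemoveAt(smallestBucket.Count - 1);
                if (smallestBucket.Count == 0)
                    buffer.Remove(smallest);
                count--;
            }
            List<T> bucket;
            if (!buffer.TryGetValue(key, out bucket))
            {
                bucket = new List<T>();
                buffer[key] = bucket;
            }
            bucket.Add(item);
            count++;
        }
        // At most n items, ordered by key ascending; memory stays O(n).
        return buffer.Values.SelectMany(b => b);
    }
}
Usage would mirror the TakeOrdered example above, e.g. hugeSequence.TakeTop(1000, o => o.Rank), where Rank stands in for whatever ranking property the objects actually expose.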

LINQ OrderBy versus ThenBy

Can anyone explain what the difference is between:
tmp = invoices.InvoiceCollection
.OrderBy(sort1 => sort1.InvoiceOwner.LastName)
.OrderBy(sort2 => sort2.InvoiceOwner.FirstName)
.OrderBy(sort3 => sort3.InvoiceID);
and
tmp = invoices.InvoiceCollection
.OrderBy(sort1 => sort1.InvoiceOwner.LastName)
.ThenBy(sort2 => sort2.InvoiceOwner.FirstName)
.ThenBy(sort3 => sort3.InvoiceID);
Which is the correct approach if I wish to order by 3 items of data?
You should definitely use ThenBy rather than multiple OrderBy calls.
I would suggest this:
tmp = invoices.InvoiceCollection
.OrderBy(o => o.InvoiceOwner.LastName)
.ThenBy(o => o.InvoiceOwner.FirstName)
.ThenBy(o => o.InvoiceID);
Note how you can use the same name each time. This is also equivalent to:
tmp = from o in invoices.InvoiceCollection
orderby o.InvoiceOwner.LastName,
o.InvoiceOwner.FirstName,
o.InvoiceID
select o;
If you call OrderBy multiple times, it will effectively reorder the sequence completely three times... so the final call will effectively be the dominant one. You can (in LINQ to Objects) write
foo.OrderBy(x).OrderBy(y).OrderBy(z)
which would be equivalent to
foo.OrderBy(z).ThenBy(y).ThenBy(x)
as the sort order is stable, but you absolutely shouldn't:
It's hard to read
It doesn't perform well (because it reorders the whole sequence)
It may well not work in other providers (e.g. LINQ to SQL)
It's basically not how OrderBy was designed to be used.
The point of OrderBy is to provide the "most important" ordering projection; then use ThenBy (repeatedly) to specify secondary, tertiary etc ordering projections.
Effectively, think of it this way: OrderBy(...).ThenBy(...).ThenBy(...) allows you to build a single composite comparison for any two objects, and then sort the sequence once using that composite comparison. That's almost certainly what you want.
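A tiny illustration of that difference with made-up data (requires System.Linq; not from the original question):
var people = new[]
{
    new { Last = "Smith", First = "Anna" },
    new { Last = "Adams", First = "Zoe" },
    new { Last = "Smith", First = "Ben" }
};

// Repeated OrderBy: the last call dominates, so this is effectively sorted
// by First only -> Anna, Ben, Zoe.
var lastCallWins = people.OrderBy(p => p.Last).OrderBy(p => p.First);

// OrderBy + ThenBy: one composite comparison, Last then First
// -> Adams/Zoe, Smith/Anna, Smith/Ben.
var composite = people.OrderBy(p => p.Last).ThenBy(p => p.First);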
I found this distinction annoying in trying to build queries in a generic manner, so I made a little helper to produce OrderBy/ThenBy in the proper order, for as many sorts as you like.
public class EFSortHelper
{
public static EFSortHelper<TModel> Create<TModel>(IQueryable<TModel> query)
{
return new EFSortHelper<TModel>(query);
}
}
public class EFSortHelper<TModel> : EFSortHelper
{
protected IQueryable<TModel> unsorted;
protected IOrderedQueryable<TModel> sorted;
public EFSortHelper(IQueryable<TModel> unsorted)
{
this.unsorted = unsorted;
}
public void SortBy<TCol>(Expression<Func<TModel, TCol>> sort, bool isDesc = false)
{
if (sorted == null)
{
sorted = isDesc ? unsorted.OrderByDescending(sort) : unsorted.OrderBy(sort);
unsorted = null;
}
else
{
sorted = isDesc ? sorted.ThenByDescending(sort) : sorted.ThenBy(sort);
}
}
public IOrderedQueryable<TModel> Sorted
{
get
{
return sorted;
}
}
}
There are a lot of ways you might use this depending on your use case, but if you were, for example, handed a list of sort columns and directions (as strings and bools), you could loop over them and apply each one in a switch:
var query = db.People.AsNoTracking();
var sortHelper = EFSortHelper.Create(query);
foreach(var sort in sorts)
{
switch(sort.ColumnName)
{
case "Id":
sortHelper.SortBy(p => p.Id, sort.IsDesc);
break;
case "Name":
sortHelper.SortBy(p => p.Name, sort.IsDesc);
break;
// etc
}
}
var sortedQuery = sortHelper.Sorted;
The result in sortedQuery is sorted in the desired order, instead of resorting over and over as the other answer here cautions.
If you want to sort by more than one field, then go for ThenBy, like this:
list.OrderBy(person => person.LastName)
    .ThenBy(person => person.FirstName)
Yes, you should never chain multiple OrderBy calls when you are sorting on multiple keys.
ThenBy is the safer bet, since it runs after OrderBy and refines the previous ordering instead of replacing it.
