LINQ Partition List into Lists of 8 members [duplicate] - linq

This question already has answers here:
Split List into Sublists with LINQ
(34 answers)
Closed 10 years ago.
How would one take a List (using LINQ) and break it into a List of Lists partitioning the original list on every 8th entry?
I imagine something like this would involve Skip and/or Take, but I'm still pretty new to LINQ.
Edit: Using C# / .Net 3.5
Edit2: This question is phrased differently than the other "duplicate" question. Although the problems are similar, the answers here are superior: the accepted answer (using a yield-based iterator) is very solid, as is Jon Skeet's suggestion to use MoreLinq (which is not recommended in the "other" question). Sometimes duplicates are good in that they force a re-examination of a problem.

Use the following extension method to break the input into subsets
public static class IEnumerableExtensions
{
    public static IEnumerable<List<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
    {
        List<T> toReturn = new List<T>(max);
        foreach (var item in source)
        {
            toReturn.Add(item);
            if (toReturn.Count == max)
            {
                yield return toReturn;
                toReturn = new List<T>(max);
            }
        }
        if (toReturn.Any())
        {
            yield return toReturn;
        }
    }
}
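For example, calling the extension above with a set size of 3 (assuming it is in scope):

```csharp
var numbers = Enumerable.Range(1, 10);     // 1..10
var sets = numbers.InSetsOf(3).ToList();   // materialize the chunks

// sets[0] = {1,2,3}, sets[1] = {4,5,6}, sets[2] = {7,8,9},
// and the final partial set is {10}
Console.WriteLine(sets.Count);        // 4
Console.WriteLine(sets.Last().Count); // 1
```

Because the method yields lazily, chunks are produced as you enumerate; each yielded List<T> is a fresh list, so it is safe to hold on to.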

We have just such a method in MoreLINQ as the Batch method:
// As IEnumerable<IEnumerable<T>>
var items = list.Batch(8);
or
// As IEnumerable<List<T>>
var items = list.Batch(8, seq => seq.ToList());

You're better off using a library like MoreLinq, but if you really had to do this using "plain LINQ", you can use GroupBy:
var sequence = new[] {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
var result = sequence.Select((x, i) => new {Group = i/8, Value = x})
.GroupBy(item => item.Group, g => g.Value)
.Select(g => g.Where(x => true));
// result is: { {1,2,3,4,5,6,7,8}, {9,10,11,12,13,14,15,16} }
Basically, we use the version of Select() that provides an index for the value being consumed, and divide the index by 8 to identify which group each value belongs to. Then we group the sequence by this grouping key. The last Select just reduces the IGrouping<> down to an IEnumerable<IEnumerable<T>> (and isn't strictly necessary, since IGrouping is an IEnumerable).
It's easy enough to turn this into a reusable method by factoring out the constant 8 in the example and replacing it with a specified parameter.
It's not necessarily the most elegant solution, and it is no longer a lazy, streaming solution ... but it does work.
You could also write your own extension method using iterator blocks (yield return) which could give you better performance and use less memory than GroupBy. This is what the Batch() method of MoreLinq does IIRC.
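Factoring out the constant, as suggested, might look like this (a sketch; the name ChunkBy is arbitrary, and recent .NET versions ship a built-in Enumerable.Chunk that makes a hand-rolled version unnecessary):

```csharp
public static class ChunkExtensions
{
    // Groups elements into consecutive chunks of the given size using the
    // index-division trick above. Note: GroupBy buffers the whole source,
    // so this is not a lazy, streaming solution either.
    public static IEnumerable<IEnumerable<T>> ChunkBy<T>(this IEnumerable<T> source, int size)
    {
        return source
            .Select((value, index) => new { Group = index / size, Value = value })
            .GroupBy(item => item.Group, item => item.Value)
            .Select(g => g.AsEnumerable());
    }
}
```

Usage is then simply `var result = sequence.ChunkBy(8);`.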

It's not at all what the original Linq designers had in mind, but check out this misuse of GroupBy:
public static IEnumerable<IEnumerable<T>> BatchBy<T>(this IEnumerable<T> items, int batchSize)
{
    var count = 0;
    return items.GroupBy(x => (count++ / batchSize)).ToList();
}
[TestMethod]
public void BatchBy_breaks_a_list_into_chunks()
{
    var values = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    var batches = values.BatchBy(3);
    batches.Count().ShouldEqual(4);
    batches.First().Count().ShouldEqual(3);
    batches.Last().Count().ShouldEqual(1);
}
I think it wins the "golf" prize for this question. The ToList is very important since you want to make sure the grouping has actually been performed before you try doing anything with the output. If you remove the ToList, you will get some weird side effects.

Take won't be very efficient, because it doesn't remove the entries taken.
Why not use a simple loop:
public static IEnumerable<IList<T>> Partition<T>(this IEnumerable<T> src, int num) // extension method
{
    using (IEnumerator<T> enu = src.GetEnumerator())
    {
        while (true)
        {
            List<T> result = new List<T>(num);
            for (int i = 0; i < num; i++)
            {
                if (!enu.MoveNext())
                {
                    if (i > 0) yield return result;
                    yield break;
                }
                result.Add(enu.Current);
            }
            yield return result;
        }
    }
}

from b in Enumerable.Range(0, 8) select items.Where((x, i) => (i % 8) == b);
(Note: this produces strided groups, every 8th element together, rather than consecutive chunks, and it enumerates items once per group.)

The simplest solution is given by Mel:
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items,
                                                       int partitionSize)
{
    int i = 0;
    return items.GroupBy(x => i++ / partitionSize).ToArray();
}
Concise but slower. The above method splits an IEnumerable into chunks of desired fixed size with total number of chunks being unimportant. To split an IEnumerable into N number of chunks of equal sizes or close to equal sizes, you could do:
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items,
                                                   int numOfParts)
{
    int i = 0;
    return items.GroupBy(x => i++ % numOfParts);
}
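Note that Split deals items out round-robin rather than keeping them consecutive, since it keys on i % numOfParts; for example (assuming the Split extension above is in scope):

```csharp
var parts = Enumerable.Range(1, 10)
                      .Split(3)
                      .Select(g => g.ToList())
                      .ToList();
// Indexes 0,3,6,9 land in part 0; 1,4,7 in part 1; 2,5,8 in part 2:
// parts[0] = {1,4,7,10}, parts[1] = {2,5,8}, parts[2] = {3,6,9}
```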
To speed up things, a straightforward approach would do:
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items,
                                                       int partitionSize)
{
    if (partitionSize <= 0)
        throw new ArgumentOutOfRangeException("partitionSize");
    int innerListCounter = 0;
    int numberOfPackets = 0;
    foreach (var item in items)
    {
        innerListCounter++;
        if (innerListCounter == partitionSize)
        {
            yield return items.Skip(numberOfPackets * partitionSize).Take(partitionSize);
            innerListCounter = 0;
            numberOfPackets++;
        }
    }
    if (innerListCounter > 0)
        yield return items.Skip(numberOfPackets * partitionSize);
}
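For example (assuming the Partition extension above is in scope):

```csharp
var chunks = Enumerable.Range(1, 10)
                       .Partition(4)
                       .Select(c => c.ToList())
                       .ToList();
// chunks[0] = {1,2,3,4}, chunks[1] = {5,6,7,8}, chunks[2] = {9,10}
```

Be aware that each yielded chunk is produced with Skip/Take over the original sequence, so the source ends up being enumerated once per chunk; that's fine for an in-memory list, but costly for an expensive or one-shot sequence.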
This is faster than anything else currently on the planet :) The equivalent methods for a Split operation are here.

Related

Algorithm for unique combinations

I've been trying to find a way to get a list of unique combinations from a list of objects nested in a container. Objects within the same group cannot be combined. Objects will be unique across all the groups.
Example:
Group 1: (1,2)
Group 2: (3,4)
Result
1
2
3
4
1,3
1,4
2,3
2,4
If we add another group like so:
Group 1: (1,2)
Group 2: (3,4)
Group 3: (5,6,7)
The result would be
1
2
3
4
5
6
7
1,3
1,4
1,5
1,6
1,7
2,3
2,4
2,5
2,6
2,7
3,5
3,6
3,7
4,5
4,6
4,7
1,3,5
1,3,6
1,3,7
1,4,5
1,4,6
1,4,7
2,3,5
2,3,6
2,3,7
2,4,5
2,4,6
2,4,7
I may have missed a combination above, but the combinations mentioned should be enough indication.
I may have up to 7 groups, and up to 20 objects in each group.
I'm trying to avoid having code that knows that it's doing combinations of doubles, triples, quadruples etc, but I'm hitting a lot of logic bumps along the way.
To be clear, I'm not asking for code, and more for an approach, pseudo code or an indication would do great.
UPDATE
Here's what I have after seeing those two answers.
From #Servy's answer:
public static IEnumerable<IEnumerable<T>> GetCombinations<T>(this IEnumerable<IEnumerable<T>> sequences)
{
    var defaultArray = new[] { default(T) };
    return sequences.Select(sequence =>
            sequence.Select(item => item).Concat(defaultArray))
        .CartesianProduct()
        .Select(sequence =>
            sequence.Where(item => !item.Equals(default(T)))
                .Select(item => item));
}
public static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
{
    IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
    return sequences.Aggregate(
        emptyProduct,
        (accumulator, sequence) =>
            from accseq in accumulator
            from item in sequence
            select accseq.Concat(new[] { item })
    );
}
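A quick sanity check of CartesianProduct on its own (assuming the extension above is in scope):

```csharp
var sequences = new[]
{
    new[] { 1, 2 },
    new[] { 3, 4 },
};
var product = sequences.CartesianProduct()
                       .Select(p => p.ToList())
                       .ToList();
// product = {1,3}, {1,4}, {2,3}, {2,4}
```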
From #AK_'s answer
public static IEnumerable<IEnumerable<T>> GetCombinations<T>(this IEnumerable<IEnumerable<T>> groups)
{
    if (groups.Count() == 0)
    {
        yield return new T[0];
    }
    if (groups.Count() == 1)
    {
        foreach (var t in groups.First())
        {
            yield return new T[] { t };
        }
    }
    else
    {
        var furtherResult = GetCombinations(groups.Where(x => x != groups.Last()));
        foreach (var result in furtherResult)
        {
            yield return result;
        }
        foreach (var t in groups.Last())
        {
            yield return new T[] { t };
            foreach (var result in furtherResult)
            {
                yield return result.Concat(new T[] { t });
            }
        }
    }
}
Usage for both
List<List<int>> groups = new List<List<int>>();
groups.Add(new List<int>() { 1, 2 });
groups.Add(new List<int>() { 3, 4, 5 });
groups.Add(new List<int>() { 6, 7 });
groups.Add(new List<int>() { 8, 9 });
groups.Add(new List<int>() { 10, 11 });
var x = groups.GetCombinations().Where(g => g.Count() > 0).ToList().OrderBy(y => y.Count());
What would be considered the best solution? To be honest, I am able to read what's happening with #AK_'s solution much easier (had to look for a solution on how to get Cartesian Product).
So first off, consider the problem of a Cartesian product of N sequences, that is, every single combination of one value from each of the sequences. Here is an example of an implementation of that problem, with an amazing explanation.
But how do we handle the cases where the output combination has a size smaller than the number of sequences? Alone, that only handles combinations that are the same size as the number of sequences. Well, imagine for a second that every single input sequence has a "null" value. That null value gets paired with every single combination of values from the other sequences (including all of their null values). We can then remove these null values at the very end, and voila, we have every combination of every size.
To do this, while still allowing the input sequences to actually use the C# literal null, or the default value for the type (if it's not nullable), we'll need to wrap the type. We'll create a wrapper that wraps the real value while also having its own definition of a default/null value. From there we map each of our sequences into a sequence of wrappers, append the default wrapper onto the end, compute the Cartesian product, and then map the combinations back to "real" values, filtering out the defaults as we go.
If you don't want to see the actual code, stop reading here.
public class Wrapper<T>
{
    public Wrapper(T value) { Value = value; }
    public static Wrapper<T> Default = new Wrapper<T>(default(T));
    public T Value { get; private set; }
}
public static IEnumerable<IEnumerable<T>> Foo<T>
    (this IEnumerable<IEnumerable<T>> sequences)
{
    return sequences.Select(sequence =>
            sequence.Select(item => new Wrapper<T>(item))
                .Concat(new[] { Wrapper<T>.Default }))
        .CartesianProduct()
        .Select(sequence =>
            sequence.Where(wrapper => wrapper != Wrapper<T>.Default)
                .Select(wrapper => wrapper.Value));
}
In C#, this is actually a monad... I think...
IEnumerable<IEnumerable<int>> foo(IEnumerable<IEnumerable<int>> groups)
{
    if (!groups.Any())
    {
        yield break;
    }
    if (groups.Count() == 1)
    {
        foreach (var num in groups.First())
        {
            yield return new List<int>() { num };
        }
    }
    else
    {
        var furtherResult = foo(groups.Where(x => x != groups.First()));
        foreach (var result in furtherResult)
        {
            yield return result;
        }
        foreach (var num in groups.First())
        {
            yield return new List<int>() { num };
            foreach (var result in furtherResult)
            {
                yield return result.Concat(new[] { num });
            }
        }
    }
}
a better version:
public static IEnumerable<IEnumerable<T>> foo<T>(IEnumerable<IEnumerable<T>> groups)
{
    if (groups.Count() == 0)
    {
        return new List<List<T>>();
    }
    else
    {
        var firstGroup = groups.First();
        var furtherResult = foo(groups.Skip(1));
        IEnumerable<IEnumerable<T>> myResult = from x in firstGroup
                                               select new[] { x };
        myResult = myResult.Concat(from x in firstGroup
                                   from result in furtherResult
                                   select result.Concat(new T[] { x }));
        myResult = myResult.Concat(furtherResult);
        return myResult;
    }
}

Comparing each string in a DataTable against a list takes a long time - poor performance

I have a DataTable of 200,000 rows and want to validate each row against the list, returning the matching string from codesList.
It is taking a very long time. I want to improve the performance.
for (int i = 0; i < dataTable.Rows.Count; i++)
{
    bool isCodeValid = CheckIfValidCode(codevar, codesList, out CodesCount);
}
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
    List<Codes> tempcodes = codesList.Where(code => code.StdCode.Equals(codevar)).ToList();
    if (tempcodes.Count == 0)
    {
        RetVal = false;
    }
    else
    {
        RetVal = true;
    }
    return RetVal;
}
codesList is a list which also contains 200,000 records. Please suggest. I used FindAll, which takes the same time, and also a LINQ query, which also takes the same time.
A few optimizations come to mind:
You could start by removing the ToList() altogether.
Replace the Count() with Any(), which returns true if there are items in the result.
It's probably also a lot faster if you replace the List with a HashSet<Codes> (this requires your Codes class to implement GetHashCode and Equals properly). Alternatively, you could populate a HashSet<string> with the contents of Codes.StdCode.
It looks like you're not using the out count at all. Removing it would make this method a lot faster, since computing a count requires checking all codes.
You could also split the list into a Dictionary<char, List<Codes>>, populated by taking the first character of the code. That would reduce the number of codes to check drastically, since you can exclude 95% of the codes by their first character.
Tell string.Equals to use a StringComparison of type Ordinal or OrdinalIgnoreCase to speed up the comparison.
It looks like you can stop processing a lot earlier as well; the use of .Any takes care of that in the second method. A similar construct can be used in the first: instead of using for and looping through each row, you could short-circuit after the first failure is found (unless this code is incomplete and you mark each row as invalid individually).
Something like:
private bool CheckIfValidCode(string codevar, List<Codes> codesList)
{
    HashSet<string> codes = new HashSet<string>(codesList.Select(c => c.StdCode));
    return codes.Contains(codevar);
    // or: return codes.Any(c => string.Equals(codevar, c, StringComparison.Ordinal));
}
If you're adamant about the count:
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
    HashSet<string> codes = new HashSet<string>(codesList.Select(c => c.StdCode));
    count = codes.Count(c => string.Equals(codevar, c, StringComparison.Ordinal));
    return count > 0;
}
You can optimize further by creating the HashSet outside of the call and re-using the instance:
// in the calling code
HashSet<string> codes = new HashSet<string>(codesList.Select(c => c.StdCode));
for (/* loop */)
{
    bool isValid = CheckIfValidCode(codevar, codes, out int count);
}
....
private bool CheckIfValidCode(string codevar, HashSet<string> codes, out int count)
{
    count = codes.Count(c => string.Equals(codevar, c, StringComparison.Ordinal));
    return count > 0;
}

Replacing a foreach with LINQ

I have some very simple code that I'm trying to get running marginally quicker (there are a lot of these small types of call dotted around the code which seems to be slowing things down) using LINQ instead of standard code.
The problem is this: I have a variable outside of the LINQ query to which the result of the query needs to be added.
The original code looks like this
double total = 0;
foreach (Crop c in p.Crops)
{
    if (c.CropType.Type == t.Type)
        total += c.Area;
}
return total;
This method isn't slow until the loop starts getting large, then it slows on the phone. Can this sort of code be moved to a relatively quick and simple piece of LINQ?
Looks like you could use Sum (edit: my syntax was wrong):
total = (from c in p.Crops
         where c.CropType.Type == t.Type
         select c.Area).Sum();
Or in extension method format:
total = p.Crops.Where(c => c.CropType.Type == t.Type).Sum(c => c.Area);
As to people saying LINQ won't perform better: where is your evidence? (The below is based on a post from Hanselman.) I ran the following in LINQPad (you will need to download and reference NBuilder to get it to run):
void Main()
{
    // NBuilder is used to create a chunk of sample data
    // http://nbuilder.org
    var crops = Builder<Crop>.CreateListOfSize(1000000).Build();
    var t = new Crop();
    t.Type = Type.grain;
    double total = 0;
    var sw = new Stopwatch();
    sw.Start();
    foreach (Crop c in crops)
    {
        if (c.Type == t.Type)
            total += c.area;
    }
    sw.Stop();
    total.Dump("For Loop total:");
    sw.ElapsedMilliseconds.Dump("For Loop Elapsed Time:");
    sw.Restart();
    var result = crops.Where(c => c.Type == t.Type).Sum(c => c.area);
    sw.Stop();
    result.Dump("LINQ total:");
    sw.ElapsedMilliseconds.Dump("LINQ Elapsed Time:");
    sw.Restart();
    var result2 = (from c in crops
                   where c.Type == t.Type
                   select c.area).Sum();
    sw.Stop();
    result2.Dump("LINQ (sugar syntax) total:");
    sw.ElapsedMilliseconds.Dump("LINQ (sugar syntax) Elapsed Time:");
}
public enum Type
{
    wheat,
    grain,
    corn,
    maize,
    cotton
}
public class Crop
{
    public string Name { get; set; }
    public Type Type { get; set; }
    public double area;
}
The results come out favorably to LINQ:
For Loop total: 99999900000
For Loop Elapsed Time: 25
LINQ total: 99999900000
LINQ Elapsed Time: 17
LINQ (sugar syntax) total: 99999900000
LINQ (sugar syntax) Elapsed Time: 17
The main way to optimize this would be changing p, which may or may not be possible.
Assuming p is a P, and looks something like this:
internal sealed class P
{
    private readonly List<Crop> mCrops = new List<Crop>();
    public IEnumerable<Crop> Crops { get { return mCrops; } }
    public void Add(Crop pCrop)
    {
        mCrops.Add(pCrop);
    }
}
(If p is a .NET type like a List<Crop>, then you can create a class like this.)
You can optimize your loop by maintaining a dictionary:
internal sealed class P
{
    private readonly List<Crop> mCrops = new List<Crop>();
    private readonly Dictionary<Type, List<Crop>> mCropsByType
        = new Dictionary<Type, List<Crop>>();
    public IEnumerable<Crop> Crops { get { return mCrops; } }
    public void Add(Crop pCrop)
    {
        if (!mCropsByType.ContainsKey(pCrop.CropType.Type))
            mCropsByType.Add(pCrop.CropType.Type, new List<Crop>());
        mCropsByType[pCrop.CropType.Type].Add(pCrop);
        mCrops.Add(pCrop);
    }
    public IEnumerable<Crop> GetCropsByType(Type pType)
    {
        return mCropsByType.ContainsKey(pType)
            ? mCropsByType[pType]
            : Enumerable.Empty<Crop>();
    }
}
Your code then becomes something like:
double total = 0;
foreach (Crop crop in p.GetCropsByType(t.Type))
    total += crop.Area;
return total;
Another possibility that would be even faster is:
internal sealed class P
{
    private readonly List<Crop> mCrops = new List<Crop>();
    private double mTotalArea;
    public IEnumerable<Crop> Crops { get { return mCrops; } }
    public double TotalArea { get { return mTotalArea; } }
    public void Add(Crop pCrop)
    {
        mCrops.Add(pCrop);
        mTotalArea += pCrop.Area;
    }
}
Your code would then simply access the TotalArea property and you wouldn't even need a loop:
return p.TotalArea;
You might also consider extracting the code that manages the Crops data to a separate class, depending on what P is.
This is a pretty straightforward sum, so I doubt you will see any benefit from using LINQ.
You haven't told us much about the setup here, but here's an idea. If p.Crops is large and only a small number of the items in the sequence are of the desired type, you could build another sequence that contains just the items you need.
I assume that you know the type when you insert into p.Crops. If that's the case you could easily insert the relevant items in another collection and use that instead for the sum loop. That will reduce N and get rid of the comparison. It will still be O(N) though.

Select single random item from list of items [duplicate]

This question already has answers here:
Random row from Linq to Sql
(14 answers)
Closed 4 years ago.
I have a query that returns list of results. I want a single random item from that list.
var query = (from...
where...
select q).Random(1); //Something like this?
How to do this? Suppose there is 1 item i want that one to be selected. If there are more then i want one of those to be selected in a random order.
var query = (from...
where...
orderby Guid.NewGuid()
select q).First();
You could utilise the Shuffle example posted in Randomize a List in C# in conjunction with the LINQ Take method to create an extension method on an IList<T>. You'd need to ToList your selection prior to calling Random, unless you moved that inside the extension.
public static List<T> Random<T>(this IList<T> list, int takeNumber)
{
    return list.Shuffle().Take(takeNumber).ToList();
}
public static IList<T> Shuffle<T>(this IList<T> list)
{
    // Fisher-Yates shuffle, in place
    Random rng = new Random();
    int n = list.Count;
    while (n > 1)
    {
        n--;
        int k = rng.Next(n + 1);
        T value = list[k];
        list[k] = list[n];
        list[n] = value;
    }
    return list;
}
This Shuffle method is better than the ones in other answers: it doesn't cost anything extra when chaining, and it's still LINQ.
public static IEnumerable<T> TakeRandom<T>(this IEnumerable<T> source, int count)
{
    return source.Shuffle().Take(count);
}
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
{
    return source.OrderBy(x => Guid.NewGuid());
}
Usage:
var query = (from...
where...
select q).TakeRandom(1);

Why AsQueryable is so slow with Linq?

I faced a rather stupid performance issue in my code. After a small investigation, I found that the AsQueryable method I used to cast my generic list slows down the code up to 8000 times.
So the question is, why is that?
Here is the example
class Program
{
    static void Main(string[] args)
    {
        var c = new ContainerTest();
        c.FillList();
        var s = Environment.TickCount;
        for (int i = 0; i < 10000; ++i)
        {
            c.TestLinq(true);
        }
        var e = Environment.TickCount;
        Console.WriteLine("TestLinq AsQueryable - {0}", e - s);
        s = Environment.TickCount;
        for (int i = 0; i < 10000; ++i)
        {
            c.TestLinq(false);
        }
        e = Environment.TickCount;
        Console.WriteLine("TestLinq as List - {0}", e - s);
        Console.WriteLine("Press enter to finish");
        Console.ReadLine();
    }
}
class ContainerTest
{
    private readonly List<int> _list = new List<int>();
    private IQueryable<int> _q;
    public void FillList()
    {
        _list.Clear();
        for (int i = 0; i < 10; ++i)
        {
            _list.Add(i);
        }
        _q = _list.AsQueryable();
    }
    public Tuple<int, int> TestLinq(bool useAsQ)
    {
        var upperBorder = useAsQ ? _q.FirstOrDefault(i => i > 7) : _list.FirstOrDefault(i => i > 7);
        var lowerBorder = useAsQ ? _q.TakeWhile(i => i < 7).LastOrDefault() : _list.TakeWhile(i => i < 7).LastOrDefault();
        return new Tuple<int, int>(upperBorder, lowerBorder);
    }
}
UPD: As I understand it, I have to avoid the AsQueryable method as much as possible (if it's not already in the container's line of inheritance), because otherwise I'll immediately get a performance issue.
"and avoid the moor in those hours of darkness when the powers of evil are exalted"
Just faced the same issue.
The thing is that IQueryable<T> takes an Expression<Func<T, bool>> as the filtering parameter in Where()/FirstOrDefault() calls, as opposed to the pre-compiled Func<T, bool> delegate taken by the corresponding IEnumerable methods.
That means there will be a compile phase to transform the Expression into a delegate, and this costs quite a lot.
Now, you need that in a loop (just as I did)? You'll get into some trouble...
PS: It seems .NET Core/.NET 5 improves this significantly. Unfortunately, our projects are not there yet...
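The compile cost is easy to demonstrate in isolation; a minimal sketch (timings will vary by machine):

```csharp
using System;
using System.Diagnostics;
using System.Linq.Expressions;

class ExpressionCompileDemo
{
    static void Main()
    {
        Expression<Func<int, bool>> expr = i => i > 7;

        // Compiling the expression tree on every iteration, which is
        // roughly the cost repeated IQueryable calls pay:
        var sw = Stopwatch.StartNew();
        for (int n = 0; n < 10000; n++)
        {
            Func<int, bool> f = expr.Compile();
            f(10);
        }
        sw.Stop();
        Console.WriteLine("Compile every call: {0} ms", sw.ElapsedMilliseconds);

        // Compiling once and reusing the delegate avoids the cost:
        sw.Restart();
        Func<int, bool> g = expr.Compile();
        for (int n = 0; n < 10000; n++)
        {
            g(10);
        }
        sw.Stop();
        Console.WriteLine("Compile once: {0} ms", sw.ElapsedMilliseconds);
    }
}
```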
At least use LINQ with the List too; a manual implementation will always be faster than LINQ.
EDIT
You know that the two tests don't give the same result.
Because AsQueryable returns an IQueryable, which has a completely different set of extension methods for the LINQ standard query operators from the one intended for things like List.
Queryable collections are meant to have a backing store of an RDBMS or something similar, and you are building a different, more complex code expression tree when you call IQueryable.FirstOrDefault() as opposed to List<>.FirstOrDefault().
