Select single random item from list of items [duplicate] - linq

This question already has answers here:
Random row from Linq to Sql
(14 answers)
Closed 4 years ago.
I have a query that returns a list of results. I want a single random item from that list.
var query = (from...
where...
select q).Random(1); //Something like this?
How can I do this? If there is only one item, I want that one selected. If there are more, I want one of them selected at random.

var query = (from...
where...
orderby Guid.NewGuid()
select q).First();

You could use the Shuffle example posted in Randomize a List in C# together with the LINQ Take method to create an extension method on IList<T>. You'd need to call ToList on your selection before calling Random, unless you moved that call inside the extension.
public static List<T> Random<T>(this IList<T> list, int takeNumber)
{
    return list.Shuffle().Take(takeNumber).ToList();
}
public static IList<T> Shuffle<T>(this IList<T> list)
{
    // Fisher-Yates shuffle, in place.
    Random rng = new Random();
    int n = list.Count;
    while (n > 1)
    {
        n--;
        int k = rng.Next(n + 1);
        T value = list[k];
        list[k] = list[n];
        list[n] = value;
    }
    return list;
}

This Shuffle method is better than the other answers: it costs nothing extra when chaining, and it's still pure LINQ.
public static IEnumerable<T> TakeRandom<T>(this IEnumerable<T> source, int count)
{
return source.Shuffle().Take(count);
}
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
{
return source.OrderBy(x => Guid.NewGuid());
}
Usage:
var query = (from...
where...
select q).TakeRandom(1);

Related

Comparing each string in a DataTable with those in a list takes a long time: poor performance

I have a DataTable of 200,000 rows and want to validate each row against a list of codes (codesList).
It is taking a very long time, and I want to improve the performance.
for (int i = 0; i < dataTable.Rows.Count; i++)
{
    bool isCodeValid = CheckIfValidCode(codevar, codesList, out CodesCount);
}
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
    List<Codes> tempcodes = codesList.Where(code => code.StdCode.Equals(codevar)).ToList();
    if (tempcodes.Count == 0)
    {
        RetVal = false;
    }
    else
    {
        RetVal = true;
    }
    return RetVal;
}
codesList is a list which also contains 200,000 records. Please suggest improvements. I tried FindAll, which takes the same time, and also a LINQ query, which takes the same time as well.
A few optimizations come to mind:
You could start by removing the ToList() altogether.
Replace the Count() with Any(), which returns true as soon as there is an item in the result.
It's probably also a lot faster if you replace the List with a HashSet<Codes> (this requires your Codes class to implement GetHashCode and Equals properly). Alternatively, you could populate a HashSet<string> with the contents of Codes.StdCode.
It looks like you're not using the out count at all. Removing it would make this method a lot faster: computing a count requires you to check all codes.
You could also split the List into a Dictionary<char, List<Codes>> which you populate by taking the first character of the code. That would reduce the number of codes to check drastically, since you can exclude 95% of the codes by their first character.
Tell string.Equals to use a StringComparison of type Ordinal or OrdinalIgnoreCase to speed up the comparison.
It looks like you can stop processing a lot earlier as well; Contains takes care of that in the method below. A similar construct can be used in the first loop: instead of looping through each row, you could short-circuit after the first failure is found (unless this code is incomplete and you mark each row as invalid individually).
Something like:
private bool CheckIfValidCode(string codevar, List<Codes> codesList)
{
    HashSet<string> codes = new HashSet<string>(codesList.Select(c => c.StdCode));
    return codes.Contains(codevar);
    // or: return codes.Any(c => string.Equals(codevar, c, StringComparison.Ordinal));
}
If you're adamant about the count (note the count has to run over the list, since a HashSet stores each code only once):
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
    count = codesList.Count(c => string.Equals(codevar, c.StdCode, StringComparison.Ordinal));
    return count > 0;
}
You can optimize further by creating the HashSet outside of the call and re-using the instance:
// in the calling code
HashSet<string> codes = new HashSet<string>(codesList.Select(c => c.StdCode));
for (/* loop over rows */)
{
    bool isValid = CheckIfValidCode(codevar, codes, out int count);
}
...
private bool CheckIfValidCode(string codevar, HashSet<string> codes, out int count)
{
    count = codes.Contains(codevar) ? 1 : 0; // the set stores each code once
    return count > 0;
}

Replacing a foreach with LINQ

I have some very simple code that I'm trying to get running marginally quicker (there are a lot of these small calls dotted around the code, which seems to be slowing things down) using LINQ instead of standard code.
The problem is this: I have a variable outside of the LINQ query to which the result needs to be added.
The original code looks like this
double total = 0;
foreach(Crop c in p.Crops)
{
if (c.CropType.Type == t.Type)
total += c.Area;
}
return total;
This method isn't slow until the loop starts getting large, then it slows on the phone. Can this sort of code be moved to a relatively quick and simple piece of LINQ?
Looks like you could use Sum (edit: my syntax was wrong):
total = (from c in p.Crops
where c.CropType.Type == t.Type
select c.Area).Sum();
Or in extension method format:
total = p.Crops.Where(c => c.CropType.Type == t.Type).Sum(c => c.Area);
As for people saying LINQ won't perform better: where is your evidence? I ran the following in LINQPad (the benchmark is based on a post from Hanselman; you will need to download and reference NBuilder to get it to run):
void Main()
{
//Nbuilder is used to create a chunk of sample data
//http://nbuilder.org
var crops = Builder<Crop>.CreateListOfSize(1000000).Build();
var t = new Crop();
t.Type = Type.grain;
double total = 0;
var sw = new Stopwatch();
sw.Start();
foreach(Crop c in crops)
{
if (c.Type == t.Type)
total += c.area;
}
sw.Stop();
total.Dump("For Loop total:");
sw.ElapsedMilliseconds.Dump("For Loop Elapsed Time:");
sw.Restart();
var result = crops.Where(c => c.Type == t.Type).Sum(c => c.area);
sw.Stop();
result.Dump("LINQ total:");
sw.ElapsedMilliseconds.Dump("LINQ Elapsed Time:");
sw.Restart();
var result2 = (from c in crops
where c.Type == t.Type
select c.area).Sum();
result2.Dump("LINQ (sugar syntax) total:");
sw.ElapsedMilliseconds.Dump("LINQ (sugar syntax) Elapsed Time:");
}
public enum Type
{
wheat,
grain,
corn,
maize,
cotton
}
public class Crop
{
public string Name { get; set; }
public Type Type { get; set; }
public double area;
}
The results come out favorably to LINQ:
For Loop total: 99999900000
For Loop Elapsed Time: 25
LINQ total: 99999900000
LINQ Elapsed Time: 17
LINQ (sugar syntax) total: 99999900000
LINQ (sugar syntax) Elapsed Time: 17
The main way to optimize this would be changing p, which may or may not be possible.
Assuming p is a P, and looks something like this:
internal sealed class P
{
private readonly List<Crop> mCrops = new List<Crop>();
public IEnumerable<Crop> Crops { get { return mCrops; } }
public void Add(Crop pCrop)
{
mCrops.Add(pCrop);
}
}
(If p is a .NET type like a List<Crop>, then you can create a class like this.)
You can optimize your loop by maintaining a dictionary:
internal sealed class P
{
private readonly List<Crop> mCrops = new List<Crop>();
private readonly Dictionary<Type, List<Crop>> mCropsByType
= new Dictionary<Type, List<Crop>>();
public IEnumerable<Crop> Crops { get { return mCrops; } }
public void Add(Crop pCrop)
{
if (!mCropsByType.ContainsKey(pCrop.CropType.Type))
mCropsByType.Add(pCrop.CropType.Type, new List<Crop>());
mCropsByType[pCrop.CropType.Type].Add(pCrop);
mCrops.Add(pCrop);
}
public IEnumerable<Crop> GetCropsByType(Type pType)
{
return mCropsByType.ContainsKey(pType)
? mCropsByType[pType]
: Enumerable.Empty<Crop>();
}
}
Your code then becomes something like:
double total = 0;
foreach(Crop crop in p.GetCropsByType(t.Type))
total += crop.Area;
return total;
Another possibility that would be even faster is:
internal sealed class P
{
private readonly List<Crop> mCrops = new List<Crop>();
private double mTotalArea;
public IEnumerable<Crop> Crops { get { return mCrops; } }
public double TotalArea { get { return mTotalArea; } }
public void Add(Crop pCrop)
{
mCrops.Add(pCrop);
mTotalArea += pCrop.Area;
}
}
Your code would then simply access the TotalArea property and you wouldn't even need a loop:
return p.TotalArea;
You might also consider extracting the code that manages the Crops data to a separate class, depending on what P is.
This is a pretty straight forward sum, so I doubt you will see any benefit from using LINQ.
You haven't told us much about the setup here, but here's an idea. If p.Crops is large and only a small number of the items in the sequence are of the desired type, you could build another sequence that contains just the items you need.
I assume that you know the type when you insert into p.Crops. If that's the case you could easily insert the relevant items in another collection and use that instead for the sum loop. That will reduce N and get rid of the comparison. It will still be O(N) though.

linq select a random row

I have a table called Quotes in linq-to-sql that contains 2 columns: author and quote. How do you select both columns of a random row?
Random rand = new Random();
int toSkip = rand.Next(0, context.Quotes.Count());
context.Quotes.Skip(toSkip).Take(1).First();
If you're doing Linq-to-Objects and don't need this to work on SQL, you can use ElementAt() instead of the more verbose Skip(toSkip).Take(1).First() :
var rndGen = new Random(); // do this only once in your app/class/IoC container
int random = rndGen.Next(0, context.Quotes.Count());
context.Quotes.ElementAt(random);
I did it something like this:
list.ElementAt(rand.Next(list.Count()));
I stuck a bunch of random operations, including select and shuffle, as extension methods. This makes them available just like all the other collection extension methods.
You can see my code in the article Extending LINQ with Random Operations.
Here is one way to achieve what you want to do:
var quotes = from q in dataContext.Quotes select q;
int count = quotes.Count();
int index = new Random().Next(count);
var randomQuote = quotes.Skip(index).FirstOrDefault();
Try it:
list.OrderBy(x => Guid.NewGuid()).Take(1).First();
1. First create a class with a rend property
public class tbl_EmpJobDetailsEntity
{
public int JpId { get; set; }
public int rend
{
get
{
Random rnd = new Random();
return rnd.Next(1, 100);
}
}
}
2. LINQ query
var rendomise = (from v in db.tbl_EmpJobDetails
select new tbl_EmpJobDetailsEntity
{
JpId=v.JpId
}).OrderBy(o=>o.rend);
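Two caveats on this approach. First, new Random() inside the getter can repeat values, because instances created in quick succession may share the same time-based seed on older runtimes; a single shared instance avoids that. Second, a computed property like rend has no SQL translation, so the OrderBy has to run in memory (for example after AsEnumerable()). A sketch of the adjusted class (the shared Random is my adjustment, not the original answer's code):

```csharp
using System;

public class tbl_EmpJobDetailsEntity
{
    // One shared instance: per-access "new Random()" can repeat seeds.
    private static readonly Random Rnd = new Random();

    public int JpId { get; set; }

    public int rend
    {
        get { return Rnd.Next(1, 100); } // returns a value in 1..99
    }
}
```

The query would then become something like db.tbl_EmpJobDetails.AsEnumerable().Select(...).OrderBy(o => o.rend), with the ordering done client-side.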

LINQ Partition List into Lists of 8 members [duplicate]

This question already has answers here:
Split List into Sublists with LINQ
(34 answers)
Closed 10 years ago.
How would one take a List (using LINQ) and break it into a List of Lists partitioning the original list on every 8th entry?
I imagine something like this would involve Skip and/or Take, but I'm still pretty new to LINQ.
Edit: Using C# / .Net 3.5
Edit2: This question is phrased differently than the other "duplicate" question. Although the problems are similar, the answers in this question are superior: the accepted answer is very solid (with the yield statement), as is Jon Skeet's suggestion to use MoreLinq (which is not recommended in the "other" question). Sometimes duplicates are good in that they force a re-examination of a problem.
Use the following extension method to break the input into subsets
public static class IEnumerableExtensions
{
public static IEnumerable<List<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
{
List<T> toReturn = new List<T>(max);
foreach(var item in source)
{
toReturn.Add(item);
if (toReturn.Count == max)
{
yield return toReturn;
toReturn = new List<T>(max);
}
}
if (toReturn.Any())
{
yield return toReturn;
}
}
}
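To sanity-check InSetsOf, here is a standalone copy of the same method with a small usage example (splitting a 10-element range into sets of at most 4):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IEnumerableExtensions
{
    public static IEnumerable<List<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
    {
        // Accumulate items until a set is full, then start a fresh list.
        List<T> toReturn = new List<T>(max);
        foreach (var item in source)
        {
            toReturn.Add(item);
            if (toReturn.Count == max)
            {
                yield return toReturn;
                toReturn = new List<T>(max);
            }
        }
        // Emit the final, possibly short, set.
        if (toReturn.Any())
        {
            yield return toReturn;
        }
    }
}
```

Enumerable.Range(1, 10).InSetsOf(4) yields three lists of 4, 4, and 2 items; the last set is simply shorter rather than padded.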
We have just such a method in MoreLINQ as the Batch method:
// As IEnumerable<IEnumerable<T>>
var items = list.Batch(8);
or
// As IEnumerable<List<T>>
var items = list.Batch(8, seq => seq.ToList());
You're better off using a library like MoreLinq, but if you really had to do this using "plain LINQ", you can use GroupBy:
var sequence = new[] {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
var result = sequence.Select((x, i) => new {Group = i/8, Value = x})
.GroupBy(item => item.Group, g => g.Value)
.Select(g => g.Where(x => true));
// result is: { {1,2,3,4,5,6,7,8}, {9,10,11,12,13,14,15,16} }
Basically, we use the version of Select() that provides an index for the value being consumed, we divide the index by 8 to identify which group each value belongs to. Then we group the sequence by this grouping key. The last Select just reduces the IGrouping<> down to an IEnumerable<IEnumerable<T>> (and isn't strictly necessary since IGrouping is an IEnumerable).
It's easy enough to turn this into a reusable method by factoring out the constant 8 in the example and replacing it with a specified parameter.
It's not necessarily the most elegant solution, and it is no longer a lazy, streaming solution... but it does work.
You could also write your own extension method using iterator blocks (yield return) which could give you better performance and use less memory than GroupBy. This is what the Batch() method of MoreLinq does IIRC.
It's not at all what the original Linq designers had in mind, but check out this misuse of GroupBy:
public static IEnumerable<IEnumerable<T>> BatchBy<T>(this IEnumerable<T> items, int batchSize)
{
var count = 0;
return items.GroupBy(x => (count++ / batchSize)).ToList();
}
[TestMethod]
public void BatchBy_breaks_a_list_into_chunks()
{
var values = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var batches = values.BatchBy(3);
batches.Count().ShouldEqual(4);
batches.First().Count().ShouldEqual(3);
batches.Last().Count().ShouldEqual(1);
}
I think it wins the "golf" prize for this question. The ToList is very important since you want to make sure the grouping has actually been performed before you try doing anything with the output. If you remove the ToList, you will get some weird side effects.
Take won't be very efficient, because it doesn't remove the entries taken.
Why not use a simple loop:
public static IEnumerable<IList<T>> Partition<T>(this /* see extension methods */ IEnumerable<T> src, int num)
{
    IEnumerator<T> enu = src.GetEnumerator();
    while (true)
    {
        List<T> result = new List<T>(num);
        for (int i = 0; i < num; i++)
        {
            if (!enu.MoveNext())
            {
                if (i > 0) yield return result;
                yield break;
            }
            result.Add(enu.Current);
        }
        yield return result;
    }
}
from b in Enumerable.Range(0, 8) select items.Where((x, i) => (i % 8) == b);
(Note: this yields eight round-robin partitions by index modulo 8, not contiguous blocks of eight.)
The simplest solution is given by Mel:
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items,
int partitionSize)
{
int i = 0;
return items.GroupBy(x => i++ / partitionSize).ToArray();
}
Concise but slower. The above method splits an IEnumerable into chunks of desired fixed size with total number of chunks being unimportant. To split an IEnumerable into N number of chunks of equal sizes or close to equal sizes, you could do:
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items,
int numOfParts)
{
int i = 0;
return items.GroupBy(x => i++ % numOfParts);
}
To speed up things, a straightforward approach would do:
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items,
int partitionSize)
{
if (partitionSize <= 0)
throw new ArgumentOutOfRangeException("partitionSize");
int innerListCounter = 0;
int numberOfPackets = 0;
foreach (var item in items)
{
innerListCounter++;
if (innerListCounter == partitionSize)
{
yield return items.Skip(numberOfPackets * partitionSize).Take(partitionSize);
innerListCounter = 0;
numberOfPackets++;
}
}
if (innerListCounter > 0)
yield return items.Skip(numberOfPackets * partitionSize);
}
This is faster than anything currently on the planet :) The equivalent methods for a Split operation are here

Linq to dataset select row based on max value of column

I have a DataSet table that I want to group by column MOID; within each group, I then want to select the row which has the max value of column radi.
Can anybody show me how to do it via LINQ to dataset?
Although the solution posted by Barry should work (with a few fixes), it is sub-optimal: you don't need to sort a collection to find the item with the maximum value of a field. I wrote a WithMax extension method, which returns the item with the maximum value of the specified function:
public static T WithMax<T, TValue>(this IEnumerable<T> source, Func<T, TValue> selector)
{
var max = default(TValue);
var withMax = default(T);
bool first = true;
var comparer = Comparer<TValue>.Default;
foreach (var item in source)
{
var value = selector(item);
int compare = comparer.Compare(value, max);
if (compare > 0 || first)
{
max = value;
withMax = item;
}
first = false;
}
return withMax;
}
It iterates the collection only once, which is much faster than sorting it just to get the first item.
You can then use it as follows
var query =
from row in table.AsEnumerable()
group row by row.Field<int>("MOID") into g
select g.WithMax(r => r.Field<int>("radi"));
This is untested but I think something like this should work:
var qry = from m in [YourDataSource]
          group m by m.MOID into grp
          select grp.OrderByDescending(a => a.RADI).First();
This works with one query!
public static T WithMax<T, TValue>(this IEnumerable<T> source, Func<T, TValue> keySelector)
{
return source.OrderByDescending(keySelector).FirstOrDefault();
}