Performing linq search using substring - is there a better way? - linq

I'm using Entity Framework and have started writing queries using linq.
I have a Store table where each store has a name field. One of the searches I want to do is by initial letter and I'm trying to find the best way of achieving it. By best I mean most efficient.
Most searches only look for one key, 'A', 'B', 'C' etc but one of the searches is different in that the group '0-9' will contain a list of keys, one for 0, one for 1 etc. So my starting point is a list of some kind.
I then need to say if a Store name starts with any key in the list, because a Store does not store the initial letter in the table.
There are 2 things I'm looking for help with. Firstly, how to get the linq working as I've outlined. Secondly, any advice as to whether this is the best/only approach to bringing back the data or whether there may be a better way.

Your question is not very clear, but from what I understand you want to search for stores which begin with a key found in a list of keys. You can achieve that like this:
List<string> keys = new List<string>() { "A", "B", "M" };
var result = stores.Where(store => keys.Any(key => store.Name.StartsWith(key)));
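To see the shape of this filter outside the database, here is a minimal LINQ to Objects sketch. The `Store` class, the sample names and the `StartsWithAny` helper are assumptions made up for illustration; against Entity Framework the same `Where` would be translated to SQL instead of running in memory, and whether a given provider version can translate it is not something this sketch checks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Store entity; only Name matters here.
public class Store
{
    public string Name { get; set; }
}

public static class InitialSearch
{
    // Returns the stores whose name starts with any of the given keys.
    public static List<Store> StartsWithAny(IEnumerable<Store> stores, IList<string> keys)
    {
        return stores
            .Where(store => keys.Any(key => store.Name.StartsWith(key, StringComparison.Ordinal)))
            .ToList();
    }

    public static void Main()
    {
        var stores = new List<Store>
        {
            new Store { Name = "Apple Mart" },
            new Store { Name = "7 Corners" },
            new Store { Name = "99 Flavours" },
            new Store { Name = "Books & Co" }
        };

        // The "0-9" group is just a list with one key per digit.
        var digitKeys = Enumerable.Range(0, 10).Select(d => d.ToString()).ToList();

        foreach (var store in StartsWithAny(stores, digitKeys))
            Console.WriteLine(store.Name);
    }
}
```

A single-letter search uses the same helper with a one-element list such as `new List<string> { "A" }`, so both kinds of search can share one code path.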

If the queries can differ, then for the letters:
stores.Where(s => s.Name.First() == c);
And for numerals:
stores.Where(s => char.IsDigit(s.Name.First()));
If they must be the same, then I suggest a character range:
stores.Where(s => c1 <= s.Name.First() && s.Name.First() <= c2)
You can represent the ranges by Tuples if you want:
Tuple<char, char> range = Tuple.Create('A', 'A');
//Tuple<char, char> range = Tuple.Create('0', '9');
stores.Where(s => range.Item1 <= s.Name.First() && s.Name.First() <= range.Item2)
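EDIT: Using the function in the Entity Framework
Tuple<string, string> range = Tuple.Create("A", "A");
//Tuple<string, string> range = Tuple.Create("0", "9");
// Substring returns a string, so the keys are strings and are compared with CompareTo (<= is not defined for strings)
stores.Where(s => s.Name.Substring(0, 1).CompareTo(range.Item1) >= 0 && s.Name.Substring(0, 1).CompareTo(range.Item2) <= 0)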
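The range idea can be checked in memory as follows. This is only a sketch under assumptions: `InRange` and the sample names are hypothetical, and it compares `char` values with LINQ to Objects, which says nothing about how a particular Entity Framework version translates the equivalent query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeSearch
{
    // Keeps the names whose first character lies in [low, high], inclusive.
    // Empty names are skipped so that indexing name[0] is safe.
    public static List<string> InRange(IEnumerable<string> names, char low, char high)
    {
        return names
            .Where(name => !string.IsNullOrEmpty(name) && low <= name[0] && name[0] <= high)
            .ToList();
    }

    public static void Main()
    {
        var names = new[] { "Apple Mart", "7 Corners", "99 Flavours", "Books & Co", "" };

        // A single letter is a range whose two ends coincide.
        Console.WriteLine(string.Join(", ", InRange(names, 'A', 'A')));

        // The "0-9" group is one range rather than ten separate keys.
        Console.WriteLine(string.Join(", ", InRange(names, '0', '9')));
    }
}
```

The comparison here is by character code, so 'a' and 'A' fall in different ranges; a database comparison may behave differently depending on its collation.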

I asked a more specific question on another thread.
Here is the answer:
Linq to entities - first letter of string between 2 keys

Related

LINQ return records where string[] values match Comma Delimited String Field

I am trying to select some records using LINQ for Entities (EF4 Code First).
I have a table called Monitoring with a field called AnimalType which has values such as
"Lion,Tiger,Goat"
"Snake,Lion,Horse"
"Rattlesnake"
"Mountain Lion"
I want to pass in some values in a string array (animalValues) and have the rows returned from the Monitorings table where one or more values in the field AnimalType match the one or more values from the animalValues. The following code ALMOST works as I wanted but I've discovered a major flaw with the approach I've taken.
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
var result = from m in db.Monitorings
where animalValues.Any(c => m.AnimalType.Contains(c))
select m;
return result;
}
To explain the problem, if I pass in animalValues = { "Lion", "Tiger" } I find that three rows are selected due to the fact that the 4th record "Mountain Lion" contains the word "Lion" which it regards as a match.
This isn't what I wanted to happen. I need "Lion" to only match "Lion" and not "Mountain Lion".
Another example is if I pass in "Snake" I get rows which include "Rattlesnake". I'm hoping somebody has a better bit of LINQ code that will allow for matches that match the exact comma delimited value and not just a part of it as in "Snake" matching "Rattlesnake".
This is a kind of hack that will do the work:
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
var values = animalValues.Select(x => "," + x + ",");
var result = from m in db.Monitorings
where values.Any(c => ("," + m.AnimalType + ",").Contains(c))
select m;
return result;
}
This way, you will have
",Lion,Tiger,Goat,"
",Snake,Lion,Horse,"
",Rattlesnake,"
",Mountain Lion,"
When you check for ",Lion,", the "Mountain Lion" row won't match.
It's dirty, I know.
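The delimiter trick is easy to try on the sample data without a database. The sketch below is an in-memory illustration only; `ContainsWholeValue` is a hypothetical helper name, and the rows are the four example values from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelimitedMatch
{
    // True when the comma-delimited field contains at least one of the
    // wanted values as a whole entry, not as part of a longer entry.
    public static bool ContainsWholeValue(string field, IEnumerable<string> wanted)
    {
        string wrapped = "," + field + ",";
        return wanted.Any(value => wrapped.Contains("," + value + ","));
    }

    public static void Main()
    {
        var rows = new[] { "Lion,Tiger,Goat", "Snake,Lion,Horse", "Rattlesnake", "Mountain Lion" };
        var wanted = new[] { "Lion", "Tiger" };

        // Prints only the first two rows; "Mountain Lion" is not a whole-entry match.
        foreach (var row in rows.Where(r => ContainsWholeValue(r, wanted)))
            Console.WriteLine(row);
    }
}
```

Note that the trick relies on the field having no spaces around the commas; a value stored as "Lion, Tiger" would need trimming first.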
Because the data in your field is comma delimited you really need to break those entries up individually. Since SQL doesn't really support a way to split strings, the option that I've come up with is to execute two queries.
The first query uses the code you started with to at least get you in the ballpark and minimize the amount of data you're retrieving. It converts it to a List<> to actually execute the query and bring the results into memory which will allow access to more extension methods like Split().
The second query then filters that subset of data in memory to pull out the exact matches:
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
// execute a query that is greedy in its matches, but at least
// it's still only a subset of data. The ToList()
// brings the data into memory, so to speak
var subsetData = (from m in db.Monitorings
where animalValues.Any(c => m.AnimalType.Contains(c))
select m).ToList();
// the rows are already loaded, so filter the List<> in memory
// (where Split() is available) and get the exact matches this time
var result = from data in subsetData
where data.AnimalType.Split(',').Intersect(animalValues).Any()
select data;
// result is LINQ to Objects now; AsQueryable() matches the declared return type
return result.AsQueryable();
}
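The exact-match step (split each field, then intersect with the wanted values) can be tried in isolation. This is a minimal in-memory sketch; `ExactMatches` is a hypothetical helper, and plain strings stand in for the `Monitoring` rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExactMatch
{
    // Keeps the fields where at least one comma-separated entry
    // equals one of the wanted values exactly.
    public static List<string> ExactMatches(IEnumerable<string> fields, string[] wanted)
    {
        return fields
            .Where(field => field.Split(',').Intersect(wanted).Any())
            .ToList();
    }

    public static void Main()
    {
        var fields = new[] { "Lion,Tiger,Goat", "Snake,Lion,Horse", "Rattlesnake", "Mountain Lion" };

        // "Snake" matches the second row only; "Rattlesnake" is a different entry.
        foreach (var field in ExactMatches(fields, new[] { "Snake" }))
            Console.WriteLine(field);
    }
}
```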

Finding strings that are not in DB already

I have some bad performance issues in my application. One of the big operations is comparing strings.
I download a list of strings, approximately 1000 - 10000. These are all unique strings.
Then I need to check if these strings already exists in the database.
The linq query that I'm using looks like this:
IEnumerable<string> allNewStrings = DownloadAllStrings();
var selection = from a in allNewStrings
where !(from o in context.Items
select o.TheUniqueString).Contains(a)
select a;
Am I doing something wrong or how could I make this process faster preferably with Linq?
Thanks.
Your query runs the database subquery once for every element in allNewStrings, that is 1000 - 10000 times, so it's extremely inefficient.
Try to query the unique strings separately, and materialize them with ToList(), so that the database query is executed only once:
IEnumerable<string> allNewStrings = DownloadAllStrings();
var uniqueStrings = (from o in context.Items
select o.TheUniqueString).ToList();
var selection = from a in allNewStrings
where !uniqueStrings.Contains(a)
select a;
The last query can also be written with Except, which is more efficient for set operations like the one in your example:
var selection = allNewStrings.Except(uniqueStrings);
An alternative solution would be to use a HashSet:
var set = new HashSet<string>(DownloadAllStrings());
set.ExceptWith(context.Items.Select(s => s.TheUniqueString));
The set will now contain the strings that are not in the DB.
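Here is a small self-contained sketch of the HashSet approach. `FindMissing` is a hypothetical helper, and the `existing` array is an in-memory stand-in for `context.Items.Select(s => s.TheUniqueString)`; with a real context that sequence would be read from the database once while `ExceptWith` enumerates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NewStrings
{
    // Returns the downloaded strings that are not already stored.
    public static HashSet<string> FindMissing(IEnumerable<string> downloaded, IEnumerable<string> existing)
    {
        var missing = new HashSet<string>(downloaded);
        missing.ExceptWith(existing); // enumerates 'existing' exactly once
        return missing;
    }

    public static void Main()
    {
        var downloaded = new[] { "alpha", "beta", "gamma", "delta" };
        var existing = new[] { "beta", "delta", "omega" }; // stand-in for the DB column

        // Sorted only to make the printed order predictable.
        foreach (var s in FindMissing(downloaded, existing).OrderBy(x => x, StringComparer.Ordinal))
            Console.WriteLine(s);
    }
}
```

Each removal is a hash lookup, so the work grows roughly with the number of stored strings plus the number of downloaded ones, rather than with their product.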

EF - Linq Expression and using a List of Ints to get best performance

So I have a list(table) of about 100k items and I want to retrieve all values that match a given list.
I have something like this.
The Sections table key is NOT a primary key, so I'm expecting each value in listOfKeys to return a few rows.
List<int> listOfKeys = new List<int>(){1,3,44};
var allSections = Sections.Where(s => listOfKeys.Contains(s.id));
I don't know if it makes a difference but generally listOfKeys will only have between 1 to 3 items.
I'm using the Entity Framework.
So my question is, is this the best / fastest way to include a list in a linq expression?
I'm assuming that it isn't better to use another .NET ICollection data object. Should I be using a Union or something?
Thanks
Suppose listOfKeys contains only a small number of items (say, fewer than 50) and is a local list (not from the database); then it's OK. The generated query will basically be WHERE id IN (...) or WHERE id = ... OR id = ..., and the database engine handles that fine.
A Join would probably be more efficient:
var allSections =
from s in Sections
join k in listOfKeys on s.id equals k
select s;
Or, if you prefer the extension method syntax:
var allSections = Sections.Join(listOfKeys, s => s.id, k => k, (s, k) => s);
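For comparison, the two forms can be run side by side in memory. This sketch uses LINQ to Objects with a hypothetical `Section` class and made-up rows, so it shows only that the two queries select the same rows, not which one a database executes faster.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Sections entity; id is deliberately not unique.
public class Section
{
    public int id { get; set; }
    public string Title { get; set; }
}

public static class SectionLookup
{
    public static List<Section> ByContains(IEnumerable<Section> sections, List<int> keys)
    {
        return sections.Where(s => keys.Contains(s.id)).ToList();
    }

    public static List<Section> ByJoin(IEnumerable<Section> sections, List<int> keys)
    {
        return sections.Join(keys, s => s.id, k => k, (s, k) => s).ToList();
    }

    public static void Main()
    {
        var sections = new List<Section>
        {
            new Section { id = 1, Title = "Intro" },
            new Section { id = 1, Title = "Intro (draft)" },
            new Section { id = 2, Title = "Body" },
            new Section { id = 44, Title = "Appendix" }
        };
        var listOfKeys = new List<int> { 1, 3, 44 };

        var a = ByContains(sections, listOfKeys);
        var b = ByJoin(sections, listOfKeys);
        Console.WriteLine(a.Count + " rows via Contains, " + b.Count + " rows via Join");
    }
}
```

One difference worth knowing: if listOfKeys contained the same key twice, the Join form would return the matching rows twice, while the Contains form would not.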

How to order integers according to size and track their positions by variable name

I have a program with multiple int variables, where a count is added to a specific variable each time a set fail condition is encountered. I want the user to be able to see, by a button click, how many failures of each category they have encountered. I want to display the values on a DataGridView in order from the highest integer down to the lowest. I also need to display, in the adjacent column, the name of the test step that relates to the value.
My plan was to use Array.Sort to order the integers, but I then lose track of their names, so I can't assign the adjacent string column. I tried using a Hashtable, but if I use the string as the key it sorts alphabetically, not numerically, and if I use the integer as the key I get duplicate entries which don't get added to the hash table. Here is one of the examples I tried, which has the aforementioned problems. Essentially I want to end up with two arrays where the order of the names matches the order of the values. FYI the variables were declared before this section of code; variables ending in x are the string name for the (non-x) integer value.
Hashtable sorter = new Hashtable();
sorter[download] = downloadx;
sorter[power] = powerx;
sorter[phase] = phasex;
sorter[eeprom] = eepromx;
sorter[upulse] = upulsex;
sorter[vpulse] = vpulsex;
sorter[wpulse] = wpulsex;
sorter[volts] = voltsx;
sorter[current] = currentx;
sorter[ad] = adx;
sorter[comms] = commsx;
sorter[ntc] = ntcx;
sorter[prt] = prtx;
string list = "";
string[] names = new string[13];
foreach (DictionaryEntry child in sorter)
{
list += child.Value.ToString() + "z";
}
int[] ordered = new int[] { download, power, phase, eeprom, upulse, vpulse, wpulse, volts, current, ad, comms, ntc, prt };
Array.Sort(ordered);
Array.Reverse(ordered);
for (int i = 0; i < sorter.Count; i++)
{
int pos = list.IndexOf("z");
names[i] = list.Substring(0, pos);
list = list.Substring(pos + 1);
}
First question here, so I hope it's not too long-winded.
Thanks
Use a Dictionary. And you can order it by the value : myDico.OrderBy(x => x.Value).Reverse(), the sort will be numerical descending. You just have to enumerate the result.
I hope I understand your need. Otherwise ignore me.
You want to be using a
Dictionary<string, int>
to store your numbers. I'm not clear on how you're displaying results at the end - do you have a grid or a list control?
You ask about usings. Which ones do you already have?
EDIT for .NET 2.0
There might be a more elegant solution, but you could implement the logic by putting your rows in a DataTable. Then you can make a DataView of that table and sort by whichever column you like, ascending or descending.
See http://msdn.microsoft.com/en-us/library/system.data.dataview(v=VS.80).aspx for example.
EDIT for .NET 3.5 and higher
As far as sorting a Dictionary by its values:
var sortedEntries = myDictionary.OrderBy(pair => pair.Value);
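If you need the results to be a Dictionary, you can call .ToDictionary(pair => pair.Key, pair => pair.Value) on that, but note that a Dictionary does not guarantee enumeration order, so enumerate the sorted sequence itself when the order matters. For reverse order, use .OrderByDescending(pair => pair.Value).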
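Putting that together for the original question, here is a minimal sketch that ends with two parallel arrays, names and values, ordered from the highest count down. The `SortByCountDescending` helper, the step names and the counts are made up for illustration; the alphabetical tie-break is an added assumption so that equal counts come out in a predictable order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FailureReport
{
    // Orders the (name, count) pairs by count, highest first.
    // Equal counts are ordered by name so the result is deterministic.
    public static KeyValuePair<string, int>[] SortByCountDescending(Dictionary<string, int> counts)
    {
        return counts
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .ToArray();
    }

    public static void Main()
    {
        // The name is the key, so duplicate counts are not a problem.
        var counts = new Dictionary<string, int>
        {
            { "Download", 3 },
            { "Power", 7 },
            { "Phase", 3 },
            { "EEPROM", 0 }
        };

        var sorted = SortByCountDescending(counts);

        // Two arrays whose positions line up: names[i] goes with values[i].
        string[] names = sorted.Select(pair => pair.Key).ToArray();
        int[] values = sorted.Select(pair => pair.Value).ToArray();

        for (int i = 0; i < names.Length; i++)
            Console.WriteLine(names[i] + ": " + values[i]);
    }
}
```

The two arrays can then be written into the two grid columns row by row.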

minimum value in dictionary using linq

I have a dictionary of type
Dictionary<DateTime,double> dictionary
How can I retrieve the minimum value and the key corresponding to this value from this dictionary using linq?
var min = dictionary.OrderBy(kvp => kvp.Value).First();
var minKey = min.Key;
var minValue = min.Value;
This is not very efficient though; you might want to consider MoreLinq's MinBy extension method.
If you are performing this query very often, you might want to consider a different data-structure.
Aggregate
var minPair = dictionary.Aggregate((p1, p2) => (p1.Value < p2.Value) ? p1 : p2);
Using the mighty Aggregate method.
I know that MinBy is cleaner in this case, but with Aggregate you have more power and it's built-in. ;)
Dictionary<DateTime, double> dictionary;
//...
double min = dictionary.Min(x => x.Value);
var minMatchingKVPs = dictionary.Where(x => x.Value == min);
You could combine it of course if you really felt like doing it on one line, but I think the above is easier to read (the one-liner also recomputes Min for every element).
var minMatchingKVPs = dictionary.Where(x => x.Value == dictionary.Min(y => y.Value));
You can't easily do this efficiently in normal LINQ - you can get the minimal value easily, but finding the key requires another scan through. If you can afford that, use Jess's answer.
However, you might want to have a look at MinBy in MoreLINQ which would let you write:
var pair = dictionary.MinBy(x => x.Value);
You'd then have the pair with both the key and the value in, after just a single scan.
EDIT: As Nappy says, MinBy is also in System.Interactive in Reactive Extensions.
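To compare the built-in options from the answers above, here is a small self-contained sketch. `ByAggregate` and `BySorting` are hypothetical helper names and the sample data is made up; both helpers throw InvalidOperationException on an empty dictionary, which a real caller would need to guard against.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class MinEntry
{
    // Single pass: keep whichever pair has the smaller value.
    public static KeyValuePair<DateTime, double> ByAggregate(Dictionary<DateTime, double> d)
    {
        return d.Aggregate((best, next) => next.Value < best.Value ? next : best);
    }

    // Sorts everything and takes the first pair: simpler to read, but O(n log n).
    public static KeyValuePair<DateTime, double> BySorting(Dictionary<DateTime, double> d)
    {
        return d.OrderBy(pair => pair.Value).First();
    }

    public static void Main()
    {
        var dictionary = new Dictionary<DateTime, double>
        {
            { new DateTime(2011, 1, 1), 4.5 },
            { new DateTime(2011, 1, 2), 1.25 },
            { new DateTime(2011, 1, 3), 9.0 }
        };

        var min = ByAggregate(dictionary);
        Console.WriteLine(
            min.Key.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + " -> " +
            min.Value.ToString(CultureInfo.InvariantCulture));
    }
}
```

The Aggregate form scans the dictionary once and returns the key and the value together, which is the same property the answer above attributes to MinBy.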
