I have an in-memory List of objects. I want to check whether each one exists in a database and, if not, set a bool property on that object to true.
The object:
class Part
{
    public bool NewPart { get; set; }
    public string PartNumber { get; set; }
    public string Revision { get; set; }
    public string Description { get; set; }
}
List contains the collection of parts. For each part, if it exists in the database then NewPart should be set to FALSE, else TRUE. I'm looking for the most efficient way to do this, as there are likely to be hundreds of parts, so I'm thinking that running a SQL query for each part may not be the most efficient method.
Ideas on the best way to achieve this appreciated.
It depends on which ORM you are using, but with Linq2Sql you can use a query like:
var query = from p in db.parts
            where myList.Contains(p.PartNumber)
            select p.PartNumber;
You can then use the IEnumerable<string> returned to set the NewPart property on each part.
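For example (a sketch, assuming myList holds the part numbers and parts is the original List<Part>):

// Materialize the existing part numbers once, then flag anything not found.
var existing = new HashSet<string>(query);
foreach (var part in parts)
    part.NewPart = !existing.Contains(part.PartNumber);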
As an alternative, if your ultimate goal is to do an upsert-type action, then check out this question and its answers: Insert Update stored proc on SQL Server (this needs a SQL-level implementation, not LINQ).
The following will only hit the database once.
var myList = (from r in parts select r.PartNumber).ToList();
var existingParts = (from r in dc.Parts
                     where myList.Contains(r.PartNumber)
                     select r.PartNumber).ToList();
foreach (var r in parts)
    r.NewPart = !existingParts.Contains(r.PartNumber);
Note that the generated SQL could very well be something like

SELECT PartNumber
FROM Parts WHERE PartNumber IN (@p0, @p1, @p2, @p3, ...)

so this should work if the parts list is a hundred or so, but not if it is over 2100, which is SQL Server's parameter limit.
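If the list could grow past that limit, one rough batching sketch (the 2000 chunk size is an arbitrary choice below the limit):

var existingParts = new HashSet<string>();
for (int i = 0; i < myList.Count; i += 2000)
{
    // Each batch stays under SQL Server's 2100-parameter limit.
    var batch = myList.Skip(i).Take(2000).ToList();
    existingParts.UnionWith(from r in dc.Parts
                            where batch.Contains(r.PartNumber)
                            select r.PartNumber);
}
foreach (var r in parts)
    r.NewPart = !existingParts.Contains(r.PartNumber);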
This is one of those cases where the most efficient approach depends upon the actual data.
The first approach obtains all the part numbers from the database:

HashSet<string> partNums = new HashSet<string>(
    from p in GetTable<DBPart>() select p.PartNumber);
foreach (var p in parts)
    p.NewPart = !partNums.Contains(p.PartNumber);
The second queries the database with just the relevant part numbers:

HashSet<string> partNums = new HashSet<string>(
    from p in GetTable<DBPart>()
    where (from mp in parts select mp.PartNumber).Contains(p.PartNumber)
    select p.PartNumber);
foreach (var p in parts)
    p.NewPart = !partNums.Contains(p.PartNumber);
The former will be more efficient below a certain number of rows in the database, and less efficient above it: the latter takes longer to build a more complicated query, but the former pulls back everything in the table.
Another factor is the percentage of hits expected. If this number is relatively low (i.e. only a small number of the parts in the list will be in the database) then it could be more efficient to do:
Dictionary<string, Part> dict = partsSource.ToDictionary(p => p.PartNumber);
foreach (var p in dict.Values)
    p.NewPart = true;
foreach (string pn in
    from p in GetTable<DBPart>()
    where (from kv in dict select kv.Key).Contains(p.PartNumber)
    select p.PartNumber)
    dict[pn].NewPart = false;
Where partsSource is the means by which the List parts was obtained in the first place. Here, instead of obtaining a list, we obtain a dictionary, which makes for more efficient retrieval of the parts we want to update. However, if we're going to obtain parts as a list anyway, then we can't really gain here, as building the dictionary takes slightly more effort than we save over iterating through the list.
I'm looking to factor out some common queries over several tables. In a very simple example, all tables have a Datadate column, so I have queries like this:
let dtexp1 = query { for x in table1 do maxBy x.Datadate }
let dtexp2 = query { for x in table2 do maxBy x.Datadate }
Based on a previous question I can do the following:
let mkQuery t q =
    query { for rows in t do maxBy ((%q) rows) }

let getMaxDt1 = mkQuery table1 (<@ fun q -> q.Datadate @>)
let getMaxDt2 = mkQuery table2 (<@ fun q -> q.Datadate @>)
I would be interested to know if there are any other solutions not using quotations, because for more complicated queries the quotations and the splicing become difficult to read.
This, for example, won't work, obviously, as we don't know that x has the property Datadate:
let getMaxDt t = query { for x in t do maxBy x.Datadate }
Unless, that is, I can abstract over the types of table1, table2, etc., which are generated by SqlProvider.
The answer very much depends on what kind of queries you need to construct and how static or dynamic they are. Generally speaking:
LINQ is great if your queries are mostly static and you can easily list the templates for all the queries you'll need; the main nice thing is that it statically type-checks the queries.
LINQ is not so great when your query structure is very dynamic, because then you end up composing lots of quotations and the type checking sometimes gets in the way.
If your queries are very dynamic (including selecting the source dynamically), but are not too complex (e.g. no fancy groupings, no fancy joins), then it might be easier to write code that generates the SQL query from an F# domain model.
For your simple example, the query is really just a table name and aggregation:
type Column = string
type Table = string

type QueryAggregate =
    | MaxBy of Column

type Query =
    { Table : Table
      Aggregate : QueryAggregate }
You can then create your two queries using:
let q1 = { Table = "table1"; Aggregate = MaxBy "Datadate" }
let q2 = { Table = "table2"; Aggregate = MaxBy "Datadate" }
Translating those queries to SQL is quite simple:
let translateAgg = function
    | MaxBy col -> sprintf "MAX(%s)" col

let translateQuery q =
    sprintf "SELECT %s FROM %s" (translateAgg q.Aggregate) q.Table
Depending on how rich your queries can be, the translation can get very complicated, but if the structure is fairly simple then this might just be an easier alternative than constructing the query using LINQ. As I said, it's hard to say what will be better without knowing the exact use case!
I want to use DocumentDB to store roughly 200,000 documents of the same type. The documents each get an integer id field, and I would like to retrieve them paged, in reverse order (highest id first).
So I recently found out there is no sorting in DocumentDB (see also DocumentDB - query result order). Perhaps it would be better to go with a different database (such as RavenDB); however, time is pressing and I want to avoid the cost of switching to another database.
The question:
I have been looking at implementing my own sorted index of the documents on the client side (ASP.NET Web API 2). I was thinking of creating a SortedList of key (id) and value (document.SelfLink). Then I could create a getter with parameters for count, offset, and a predicate to filter the documents. Below I've added a quick example.
I just have the feeling this is a bad idea: either slow, costing too many resources, or something that can be done better another way. So I am open to implementation suggestions...
public class SortableDocumentDbRepository
{
    private SortedList _sorted = new SortedList();
    private readonly string _sortedPropertyName;

    private DocumentCollection ReadOrCreateCollection(string databaseLink) {
        DocumentCollection col = base.ReadOrCreateCollection(databaseLink);

        var docs = Client.CreateDocumentQuery(Collection.DocumentsLink)
            .AsEnumerable();

        lock (_sorted.SyncRoot) {
            foreach (Document doc in docs) {
                var propVal = doc.GetPropertyValue<string>(_sortedPropertyName);
                if (propVal != null) {
                    _sorted.Add(propVal, doc.SelfLink);
                }
            }
        }

        return col;
    }

    public List<T> GetItems<T>(int count, int offset, Expression<Func<T, bool>> predicate) {
        List<T> result = new List<T>();
        lock (_sorted.SyncRoot) {
            var values = _sorted.GetValueList();
            for (int i = offset; i < _sorted.Count; i++) {
                var queryable = predicate != null ?
                    Client.CreateDocumentQuery<T>(values[i].ToString()).Where(predicate) :
                    Client.CreateDocumentQuery<T>(values[i].ToString());

                T item = queryable.AsEnumerable().FirstOrDefault();
                if (item == null || item.Equals(default(T))) continue;

                result.Add(item);
                if (result.Count >= count) return result;
            }
        }
        return result;
    }
}
Microsoft has implemented Sorting:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sql-query-reference#bk_orderby_clause
Example: SELECT * FROM c ORDER BY c._ts DESC
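The LINQ provider supports ordering as well; a sketch (client, collectionLink and MyDoc are illustrative names, and Timestamp is assumed to be mapped to the _ts field, e.g. via [JsonProperty("_ts")]):

var newestFirst = client.CreateDocumentQuery<MyDoc>(collectionLink)
    .OrderByDescending(d => d.Timestamp)   // assumed mapping to _ts
    .Take(10);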
As you mentioned, order by unfortunately isn't implemented yet.
Your approach looks reasonable to me.
I see you are using a predicate to narrow the query result set (pulling 200,000 records for any DB will be costly).
Since it looks like you are looking to order by id, you can also look into setting up a range index on id, allowing you to perform range queries (e.g. < and >) on the id to further narrow the query result set. There is also a range index included by default on the _ts (timestamp) system property on documents, which may also be helpful in this context.
See: http://azure.microsoft.com/en-us/documentation/articles/documentdb-indexing-policies/
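For example, assuming the ids are roughly contiguous, a range window per page keeps both the query and the client-side sort small (a sketch; MyDoc, lastSeenId and pageSize are illustrative):

// Pull one id window via the range index, then order the small set in memory.
var page = Client.CreateDocumentQuery<MyDoc>(Collection.DocumentsLink)
    .Where(d => d.Id >= lastSeenId - pageSize && d.Id < lastSeenId)
    .AsEnumerable()
    .OrderByDescending(d => d.Id)
    .ToList();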
I am trying to select some records using LINQ to Entities (EF4 Code First).
I have a table called Monitoring with a field called AnimalType which has values such as
"Lion,Tiger,Goat"
"Snake,Lion,Horse"
"Rattlesnake"
"Mountain Lion"
I want to pass in some values in a string array (animalValues) and have the rows returned from the Monitorings table where one or more values in the AnimalType field match one or more of the values in animalValues. The following code ALMOST works as I wanted, but I've discovered a major flaw with the approach I've taken.
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
    var result = from m in db.Monitorings
                 where animalValues.Any(c => m.AnimalType.Contains(c))
                 select m;
    return result;
}
To explain the problem: if I pass in animalValues = { "Lion", "Tiger" }, I find that three rows are selected, due to the fact that the 4th record, "Mountain Lion", contains the word "Lion", which it regards as a match.
This isn't what I wanted to happen. I need "Lion" to only match "Lion" and not "Mountain Lion".
Another example: if I pass in "Snake", I get rows which include "Rattlesnake". I'm hoping somebody has a better bit of LINQ code that will allow matches against the exact comma-delimited value, and not just part of it, as in "Snake" matching "Rattlesnake".
This is a kind of hack that will do the work:
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
    var values = animalValues.Select(x => "," + x + ",");
    var result = from m in db.Monitorings
                 where values.Any(c => ("," + m.AnimalType + ",").Contains(c))
                 select m;
    return result;
}
This way, you will have
",Lion,Tiger,Goat,"
",Snake,Lion,Horse,"
",Rattlesnake,"
",Mountain Lion,"
so a check for ",Lion," won't match ",Mountain Lion,".
It's dirty, I know.
Because the data in your field is comma-delimited, you really need to break those entries up individually. Since SQL doesn't really support a way to split strings, the option I've come up with is to execute two queries.
The first query uses the code you started with to at least get you in the ballpark and minimize the amount of data you're retrieving. It converts the result to a List<> to actually execute the query and bring the results into memory, which gives access to more extension methods, like Split().
The second query takes that subset of data in memory and joins it with your database table to pull out the exact matches:
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
    // Execute a query that is greedy in its matches, but at least
    // it's still only a subset of the data. The ToList()
    // brings the data into memory, so to speak.
    var subsetData = (from m in db.Monitorings
                      where animalValues.Any(c => m.AnimalType.Contains(c))
                      select m).ToList();

    // Given that subset of data in the List<>, join it against the DB again
    // and get the exact matches this time.
    var result = from data in subsetData
                 join m in db.Monitorings on data.ID equals m.ID
                 where data.AnimalType.Split(',').Intersect(animalValues).Any()
                 select m;

    return result.AsQueryable();
}
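Note that because subsetData is a plain List<>, the join above runs as LINQ to Objects and will pull db.Monitorings into memory again. Since the subset already holds the full entities, a simpler variant (a sketch under the same assumptions) filters the list directly:

var result = subsetData
    .Where(m => m.AnimalType.Split(',').Intersect(animalValues).Any())
    .AsQueryable();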
I'm programming a search for an SQLite database using C# and LINQ.
The idea of the search is that you can provide one or more keywords, any of which must be contained in any of several column entries for that row to be added to the results.
The implementation consists of several LINQ queries which are all put together by Union. More keywords, and more columns that have to be considered, result in a more complicated query. This can lead to SQL code which is too long for the SQLite parser.
Here is some sample code to illustrate:
IQueryable<Reference> query = null;

if (searchAuthor)
    foreach (string w in words)
    {
        string word = w;
        var result = from r in _dbConnection.GetTable<Reference>()
                     where r.ReferenceAuthor.Any(a => a.Person.LastName.Contains(word) || a.Person.FirstName.Contains(word))
                     orderby r.Title
                     select r;
        query = query == null ? result : query.Union(result);
    }

if (searchTitle)
    foreach (string word in words)
    {
        var result = from r in _dbConnection.GetTable<Reference>()
                     where r.Title.Contains(word)
                     orderby r.Title
                     select r;
        query = query == null ? result : query.Union(result);
    }
//...
Is there a way to structure the query in a way that results in more compact SQL?
I tried to force the creation of smaller SQL statements by calling GetEnumerator() on the query after every loop. But apparently Union() doesn't operate on the data, but on the underlying LINQ/SQL statement, so I was generating too-long statements regardless.
The only solution I can think of right now is to really gather the data after every "sub-query" and do the union on the actual data, not in the statement. Any ideas?
For something like that, you might want to use a PredicateBuilder, as shown in the chosen answer to this question.
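A sketch of what that could look like with LINQKit's PredicateBuilder (assuming the LINQKit library is available; depending on the provider you may also need AsExpandable()). Instead of UNIONing whole sub-queries, the keywords are folded into a single WHERE clause as a chain of ORs, which keeps the generated SQL far more compact:

// Start from "false" and OR in one condition per keyword/column.
var predicate = PredicateBuilder.False<Reference>();

if (searchTitle)
    foreach (string w in words)
    {
        string word = w;   // avoid capturing the loop variable
        predicate = predicate.Or(r => r.Title.Contains(word));
    }
// ...same pattern for the author and any other searchable columns...

var query = _dbConnection.GetTable<Reference>()
    .Where(predicate)
    .OrderBy(r => r.Title);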
I was looking at some LINQ to SQL code that is used for a paged table. It needs to return a subset of records, as well as the total number of records in the database. The code looks like this:
var query = (from p in MyTable select new {p.HostCable, p.PatchingSet});
int total = query.ToList().Count;
query = query.Skip(5).Take(10);
I wanted to dig a bit into what happens when this executes. I see that two queries happen: one to get ALL rows from the DB, and one to get the subset. Needless to say, the performance implications of getting all the records are not good. I guess the ToList() forces the query to be executed, and the Count method then runs against the entire list.
In refactoring the statement to be more efficient, this is my improved version:
int total = MyTable.Count();
var query = (from p in MyTable select new {p.HostCable, p.PatchingSet}).Skip(5).Take(10);
This results in a SQL hit for a "Select count.." and then a SQL hit for the actual select of records.
Is this optimal, are there better solutions?
Thanks!
From what I've seen, this is not possible in LINQ to SQL without a bit of hacking.
I've found that this method works. A quick summary: I convert the IQueryable object to a command object and modify the command text to include the total count in the result set. The SQL query that LINQ to SQL generates for paging uses the ROW_NUMBER() OVER() syntax; I just add COUNT(*) OVER() to get the total count.
Add this method to your DataContext class.
public IEnumerable<TWithTotal> ExecutePagedQuery<T, TWithTotal>(IQueryable<T> query, int pageSize, int pageNumber, out int count)
    where TWithTotal : IWithTotal
{
    // Build the paged command that LINQ to SQL would normally execute.
    var cmd = this.GetCommand(query.Skip(pageSize * pageNumber).Take(pageSize));

    // Inject COUNT(*) OVER() next to the ROW_NUMBER() OVER() used for paging,
    // and project it out of the outer SELECT as TotalCount.
    var commandText = cmd.CommandText.Replace(
        "SELECT ROW_NUMBER() OVER",
        "SELECT COUNT(*) OVER() AS TOTALROWS, ROW_NUMBER() OVER");
    commandText = "SELECT TOTALROWS AS TotalCount," + commandText.Remove(0, 6);
    cmd.CommandText = commandText;

    var reader = cmd.ExecuteReader();
    var list = this.Translate<TWithTotal>(reader).ToList();

    // Every row carries the same total, so read it off the first one.
    if (list.Count > 0)
        count = list[0].TotalCount;
    else
        count = 0;

    return list;
}
You'll have to create a new class that contains all the properties of the original object and implements the IWithTotal interface.
UPDATE: You can't mix mapped and unmapped columns in LINQ to SQL.
public interface IWithTotal
{
    int TotalCount { get; set; }
}

public class Project : IWithTotal
{
    public int TotalCount { get; set; }
    public int ProjectID { get; set; }
    public string Name { get; set; }
}
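Usage might then look something like this (a sketch; ProjectEntity stands in for the mapped LINQ to SQL class, and the query projects into the unmapped Project class per the note above):

int total;
var page = db.ExecutePagedQuery<Project, Project>(
    db.GetTable<ProjectEntity>()
        .OrderBy(p => p.ProjectID)
        .Select(p => new Project { ProjectID = p.ProjectID, Name = p.Name }),
    10, 0, out total);   // page size 10, first page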
The DataContext.Translate method has some requirements, so ensure that your query satisfies them if you run into any issues.
Depending on the complexity of the query, you should see an increase in performance. Below are metrics from testing I did with a count query, a paged select query, and a paged select-with-count query. The percentages are the query cost relative to the batch.
Count - 6%
Select with paging - 47%
Select with paging + Count - 48%
int total = query.Count();
Rather than selecting everything and then counting the objects in memory, this will perform the count via a query and return just that number, which should be a lot faster.
In response to your update: I don't think you'll find a much better way of doing it. It ultimately boils down to two queries, one to get the total number of records and another to get a subset of them.