Best Practice Checking for duplicate rows before inserting list of items

Best Practice Checking for duplicate rows before inserting list of items - linq

I have a an array of objects that I want to enter into the database.
My method call looks like this.
public void Add(CardElement[] cardElements){
foreach (var cardElement in cardElements)
{
Data.Entry(cardElement).State = System.Data.EntityState.Added;
}
Data.SaveChanges();
}
The database table resembles this
MS SQL = Table mytable Columns a,b,c,d,e,f
Unique Constraint a,b,c
The data I want to insert resembles this.
var obj [] = new [] {
new MyObject () { a = 1, b =1, c = 1 },
new MyObject () { a = 1, b =1, c = 2 }
new MyObject () { a = 1, b =1, c = 3 }
};
So, I want to check the database for these three rows before I add them to the database.
I could do something like but I assume this should cause some extra trips to the database.
private bool checkExists()...
foreach (var cardElement in cardElements)
{
var exists = (from ce in Data.CardElements
where ce.CardId == cardElement.CardId
where ce.Area == cardElement.Area
where ce.ElementName == cardElement.ElementName
select ce).Any();
if(exists return true)
}
return false
So, how could I handle this more gracefully?
Is it even worth trying to accomplish this using linq?
Should I write some stored procedures for performance?

I agree that you should let the db make the decision.
Please have a look at using UPSERT as stated in this post

Why not just attempt the insert and let the database tell you if any unique constraint violations have occurred (using try/catch)?

The problem is that even if you query data somebody else can insert the record between your query and saving changes. You will still have to handle exception for violating unique constraint despite your additional queries - and yes, every check will do additional trip to database.
If your main concern is performance use stored procedure where you can additionally use table hint to lock table for inserts during initial check for existence.

Related

Is there any better way to check if the same data is present in a table in .Net core 3.1?

I'm pulling data from a third party api. The api runs multiple times in a day. So, if the same data is present in the table it should ignore that record, else if there are any changes it should update that record or insert a new record if anything new shows up in the json received.
I'm using the below code for inserting any new data.
var input = JsonConvert.DeserializeObject<List<DeserializeLookup>>(resultJson).ToList();
var entryset = input.Select(y => new Lookup
{
lookupType = "JOBCODE",
code = y.Code,
description = y.Description,
isNew = true,
lastUpdatedDate = DateTime.UtcNow
}).ToList();
await _context.Lookup.AddRangeAsync(entryset);
await _context.SaveChangesAsync();
But, after the first run, when the api runs again it's again inserting the same data in the table. As a result, duplicate entries are getting into table. To handle the same, I used a foreach loop as below before inserting data to the table.
foreach (var item in input)
{
if (!_context.Lookup.Any(r =>
r.code== item.Code))
{
//above insert code
}
}
But, the same doesn't work as expected. Also, the api takes a lot of time to run when I put a foreach loop. Is there a solution to this in .net core 3.1

List<DeserializeLookup> newList=new();
foreach (var item in input)
{
if (!_context.Lookup.Any(r =>
r.code== item.Code))
{
newList.add(item);
//above insert code
}
}
await _context.Lookup.AddRangeAsync(newList);
await _context.SaveChangesAsync();
It will be better if you try this way

I’m on my phone so forgive me for not being able to format the code in my response. The solution to your problem is something I actually just encountered myself while syncing data from an azure function and third party app and into a sql database.
Depending on your table schema, you would need one column with a unique identifier. Make this column a primary key (first step to preventing duplicates). Here’s a resource for that: https://www.w3schools.com/sql/sql_primarykey.ASP
The next step you want to take care of is your stored procedure. You’ll need to perform what’s commonly referred to as an UPSERT. To do this you’ll need to merge a table with the incoming data...on a specified column (whichever is your primary key).
That would look something like this:
MERGE
Table_1 AS T1
USING
Incoming_Data AS source
ON
T1.column1 = source.column1
/// you can use an AND / OR operator in here for matching on additional values or combinations
WHEN MATCHED THEN
UPDATE SET T1.column2= source.column2
//// etc for more columns
WHEN NOT MATCHED THEN
INSERT (column1, column2, column3) VALUES (source.column1, source.column2, source.column3);

First of all, you should decouple the format in which you get your data from your actual data handling. In your case: get rid of the JSon before you actually interpret the data.
Alas, I haven't got a clue what your data represents, so Let's assume your data is a sequence of Customer Orders. When you get new data, you want to Add all new orders, and you want to update changed orders.
So somewhere you have a method with input your json data, and as output a sequence of Orders:
IEnumerable<Order> InterpretJsonData(string jsonData)
{
...
}
You know Json better than I do, besides this conversion is a bit beside your question.
You wrote:
So, if the same data is present in the table it should ignore that record, else if there are any changes it should update that record or insert a new record
You need an Equality Comparer
To detect whether there are Added or Changed Customer Orders, you need something to detect whether Order A equals Order B. There must be at least one unique field by which you can identify an Order, even if all other values are of the Order are changed.
This unique value is usually called the primary key, or the Id. I assume your Orders have an Id.
So if your new Order data contains an Id that was not available before, then you are certain that the Order was Added.
If your new Order data has an Id that was already in previously processed Orders, then you have to check the other values to detect whether it was changed.
For this you need Equality comparers: one that says that two Orders are equal if they have the same Id, and one that says checks all values for equality.
A standard pattern is to derive your comparer from class EqualityComparer<Order>
class OrderComparer : EqualityComparer<Order>
{
public static IEqualityComparer<Order> ByValue = new OrderComparer();
... // TODO implement
}
Fist I'll show you how to use this to detect additions and changes, then I'll show you how to implement it.
Somewhere you have access to the already processed Orders:
IEnumerable<Order> GetProcessedOrders() {...}
var jsondata = FetchNewJsonOrderData();
// convert the jsonData into a sequence of Orders
IEnumerable<Order> orders = this.InterpretJsonData(jsondata);
To detect which Orders are added or changed, you could make a Dictonary of the already Processed orders and check the orders one-by-one if they are changed:
IEqualityComparer<Order> comparer = OrderComparer.ByValue;
Dictionary<int, Order> processedOrders = this.GetProcessedOrders()
.ToDictionary(order => order.Id);
foreach (Order order in Orders)
{
if(processedOrders.TryGetValue(order.Id, out Order originalOrder)
{
// order already existed. Is it changed?
if(!comparer.Equals(order, originalOrder))
{
// unequal!
this.ProcessChangedOrder(order);
// remember the changed values of this Order
processedOrder[order.Id] = Order;
}
// else: no changes, nothing to do
}
else
{
// Added!
this.ProcessAddedOrder(order);
processedOrder.Add(order.Id, order);
}
}
Immediately after Processing the changed / added order, I remember the new value, because the same Order might be changed again.
If you want this in a LINQ fashion, you have to GroupJoin the Orders with the ProcessedOrders, to get "Orders with their zero or more Previously processed Orders" (there will probably be zero or one Previously processed order).
var ordersWithTPreviouslyProcessedOrder = orders.GroupJoin(this.GetProcessedOrders(),
order => order.Id, // from every Order take the Id
processedOrder => processedOrder.Id, // from every previously processed Order take the Id
// parameter resultSelector: from every Order, with its zero or more previously
// processed Orders make one new:
(order, previouslyProcessedOrders) => new
{
Order = order,
ProcessedOrder = previouslyProcessedOrders.FirstOrDefault(),
})
.ToList();
I use GroupJoin instead of Join, because this way I also get the "Orders that have no previously processed orders" (= new orders). If you would use a simple Join, you would not get them.
I do a ToList, so that in the next statements the group join is not done twice:
var addedOrders = ordersWithTPreviouslyProcessedOrder
.Where(orderCombi => orderCombi.ProcessedOrder == null);
var changedOrders = ordersWithTPreviouslyProcessedOrder
.Where(orderCombi => !comparer.Equals(orderCombi.Order, orderCombi.PreviousOrder);
Implementation of "Compare by Value"
// equal if all values equal
protected override bool Equals(bool x, bool y)
{
if (x == null) return y == null; // true if both null, false if x null but y not null
if (y == null) return false; // because x not null
if (Object.ReferenceEquals(x, y) return true;
if (x.GetType() != y.GetType()) return false;
// compare all properties one by one:
return x.Id == y.Id
&& x.Date == y.Date
&& ...
}
For GetHashCode is one rule: if X equals Y then they must have the same hash code. If not equal, then there is no rule, but it is more efficient for lookups if they have different hash codes. Make a tradeoff between calculation speed and hash code uniqueness.
In this case: If two Orders are equal, then I am certain that they have the same Id. For speed I don't check the other properties.
protected override int GetHashCode(Order x)
{
if (x == null)
return 34339d98; // just a hash code for all null Orders
else
return x.Id.GetHashCode();
}

Multiple rows update without select

An old question for Linq 2 Entities. I'm just asking it again, in case someone has came up with the solution.
I want to perform query that does this:
UPDATE dbo.Products WHERE Category = 1 SET Category = 5
And I want to do it with Entity Framework 4.3.1.
This is just an example, I have a tons of records I just want 1 column to change value, nothing else. Loading to DbContext with Where(...).Select(...), changing all elements, and then saving with SaveChanges() does not work well for me.
Should I stick with ExecuteCommand and send direct query as it is written above (of course make it reusable) or is there another nice way to do it from Linq 2 Entities / Fluent.
Thanks!

What you are describing isnt actually possible with Entity Framework. You have a few options,
You can write it as a string and execute it via EF with .ExecuteSqlCommand (on the context)
You can use something like Entity Framework Extended (however from what ive seen this doesnt have great performance)

You can update an entity without first fetching it from db like below
using (var context = new DBContext())
{
context.YourEntitySet.Attach(yourExistingEntity);
// Update fields
context.SaveChanges();
}

If you have set-based operations, then SQL is better suited than EF.
So, yes - in this case you should stick with ExecuteCommand.

I don't know if this suits you but you can try creating a stored procedure that will perform the update and then add that procedure to your model as a function import. Then you can perform the update in a single database call:
using(var dc = new YourDataContext())
{
dc.UpdateProductsCategory(1, 5);
}
where UpdateProductsCategory would be the name of the imported stored procedure.

Yes, ExecuteCommand() is definitely the way to do it without fetching all the rows' data and letting ChangeTracker sort it out. Just to provide an example:
Will result in all rows being fetched and an update performed for each row changed:
using (YourDBContext yourDB = new YourDBContext()) {
yourDB.Products.Where(p => p.Category = 1).ToList().ForEach(p => p.Category = 5);
yourDB.SaveChanges();
}
Just a single update:
using (YourDBContext yourDB = new YourDBContext()) {
var sql = "UPDATE dbo.Products WHERE Category = #oldcategory SET Category = #newcategory";
var oldcp = new SqlParameter { ParameterName = "oldcategory", DbType = DbType.Int32, Value = 1 };
var newcp = new SqlParameter { ParameterName = "newcategory", DbType = DbType.Int32, Value = 5 };
yourDB.Database.ExecuteSqlCommand(sql, oldcp, newcp);
}

Can I refer to items within a LINQ result set by index?

I'm trying to work with a LINQ result set of 4 tables retrieved with html agility pack. I'd like to process each one slightly differently by setting a variable for each (switch statement below), and then processing the rows within the table. The variable would ideally be the index for each of the tables in the set, 0 to 3, and would be used in the switch statement and to select the rows. I haven't been able to locate the index property, but I see it used in situations such as SelectChildNode.
My question is can I refer to items within a LINQ result set by index? My "ideal scenario" is the last commented out line. Thanks in advance.
var ratingsChgs = from table in htmlDoc.DocumentNode
.SelectNodes("//table[#class='calendar-table']")
.Cast<HtmlNode>()
select table;
String rtgChgType;
for (int ratingsChgTbl = 0; ratingsChgTbl < 4; ratingsChgTbl++)
{
switch (ratingsChgTbl)
{
case 0:
rtgChgType = "Upgrades";
break;
case 1:
rtgChgType = "Downgrades";
break;
case 2:
rtgChgType = "Coverage Initiated";
break;
case 3:
rtgChgType = "Coverage Reit/ Price Tgt Changed";
break;
//This is what I'd like to do.
var tblRowsByChgType = from row in ratingsChgs[ratingsChgTbl]
.SelectNodes("tr")
select row;
//Processing of returned rows.
}
}

ElementAt does what you're asking for. I don't recommend using it in your example, though, because each time you call it, your initial LINQ query will be executed. The easy fix is to have ratingsChgs be a List or Array.
You can also refactor out the switch statement. It is overkill when you only need to iterate through a list of items. Here is a possible solution:
var ratingsChgs = from table in htmlDoc.DocumentNode
.SelectNodes("//table[#class='calendar-table']")
.Cast<HtmlNode>()
select table;
var rtgChgTypeNames = new List
{
"Upgrades",
"Downgrades",
"Coverage Initiated",
"Coverage Reit/ Price Tgt Changed"
};
var changeTypes = ratingsChgs.Zip(rtgChgTypeNames, (changeType, name) => new
{
Name = name,
Rows = changeType.SelectNodes("tr")
});
foreach( var changeType in changeTypes)
{
var name = changeType.Name;
var rows = changeType.Rows;
//Processing of returned rows.
}
Also, why not store your rating change types in the HTML doc? It seems odd to have table information defined in the business logic.

Failed to batch insert in Subsonic3 with error "Must declare the scalar variable..."

I have met a problem about inserting multiple rows in a batch with Subsonic3. My development environment includes:
1. Visual Studio 2010, but use .NET 3.5
2. Active Record Mode in SubSonic 3.0.0.4
3. SQL Server 2005 express
4. Northwind sample database
I am using Active Reecord mode to insert mutiple "Product" into table "Products". If I insert the rows one by one, either call "aProduct.Add()" or call "Insert.Execute()" mutiple times (just like the codes below), it works fine.
private static Product[] CreateProducts(int count)
{
Product[] products = new Product[count];
for (int index = 0; index < products.Length; ++index)
{
products[index] = new Product
{
ProductName = string.Format("cheka-test-{0}", index.ToString()),
Discontinued = (index % 2 == 0),
};
}
return products;
}
private static void SucceedByMultiExecuteInsert()
{
Product[] products = CreateProducts(2);
// -------------------------------- prepare batch
NorthwindDB db = new NorthwindDB();
var inserts = from prod in products
select db.Insert.Into<Product>(x => x.ProductName, x => x.Discontinued).Values(prod.ProductName, prod.Discontinued);
// -------------------------------- batch insert
var selectAll = Product.All();
Console.WriteLine("--- before total rows = {0}", selectAll.Count().ToString());
foreach (Insert insert in inserts)
insert.Execute();
Console.WriteLine("+++ after inserting {0} rows, now total rows = {1}",
products.Length.ToString(), selectAll.Count().ToString());
}
but if I use "BatchQuery" like the codes below,
private static void FailByBatchInsert()
{
Product[] products = CreateProducts(2);
// -------------------------------- prepare batch
NorthwindDB db = new NorthwindDB();
BatchQuery batchquery = new BatchQuery(db.Provider, db.QueryProvider);
var inserts = from prod in products
select db.Insert.Into<Product>(x => x.ProductName, x => x.Discontinued).Values(prod.ProductName, prod.Discontinued);
foreach (Insert insert in inserts)
batchquery.Queue(insert);
// -------------------------------- batch insert
var selectAll = Product.All();
Console.WriteLine("--- before total rows = {0}", selectAll.Count().ToString());
batchquery.Execute();
Console.WriteLine("+++ after inserting {0} rows, now total rows = {1}",
products.Length.ToString(), selectAll.Count().ToString());
}
then it failed with the exception :
"
Unhandled Exception: System.Data.SqlClient.SqlException: Must declare the scalar variable "#ins_ProductName".
Must declare the scalar variable "#ins_ProductName".
"
Please give me some help to solve this problem. Many thanks.

I ran into this problem as well. If you look at the query it's attempting to run, you'll see it doing something like this (this isn't actual code but you'll get the point):
exec_sql N'insert into MyTable (SomeField) Values (#ins_SomeField)',N'#0 varchar(32)','#0=SomeValue'
For some reason it defines the parameters in the query with "#ins_"+FieldName but then passes the parameters as ordinals. I have yet to determine the pattern for why/when it does this but I've lost enough time during this dev cycle futzing with SubSonic to try and diagnose the problem properly.
The work-around I implemented will involve you downloading the 3.0.0.4 source from github and making a change on line 179 of Insert.cs.
Where it reads
ParameterName = _provider.ParameterPrefix + "ins_" + columnName.ToAlphaNumericOnly(),
Changing it to
ParameterName = _provider.ParameterPrefix + Inserts.Count.ToString(),
seemed to do the trick for me. I make no warranties about this solution for you, expressed or implied. It did work for me but your mileage may vary.
I should also note that there's similar logic around the "update" statements as well in Update.cs on lines 181 and 194 but I haven't had these give me problems... yet.
Honestly, I don't think SubSonic is ready for primetime and that's a shame because I really like how Rob set it up. That said, it's in my product for better or worse now so you make the best with what you got.

DataTable Query

I am new to LINQ. I am trying to find the rows that does not exists in the second data table.
report_list and benchmark both type are : DataTable. Both these datatables are being populated using OleDbCommand,OleDbDataAdapter. I am getting an error "Specified cast is not valid." in foreach ... loop. I would appreciate your help.
var result = from a in report_list.AsEnumerable()
where !(from b in benchmark.AsEnumerable()
select b.Field<int>("bench_id")
)
.Contains(a.Field<int>("BenchmarkID"))
select a;
foreach (var c in result)
{
Console.WriteLine(c.Field<string>("Name"));
}

I don't know if I understood your question. Are you trying to get the items that exists in the first table but not in the second?
var first = new string[] { "b", "c" };
var second = new string[] { "a", "c" };
//find the itens that exist in "first" but not in "second"
var q = from f in first
where !second.Contains(f)
select f;
foreach (var s in q) {
Console.WriteLine(s);
}
//Prints:
//b
I suggest you to make the inner query first, once it does not depend on the outer record.

From a in report_list
Group Join b in benchmark On a.bench_id Equals b.bench_id Into g = Group
Where g.Count = 0
Select a
Note that this is VB syntax.

My suspicion is that one of the fields you are comparing is not an integer in the database. I believe that the invalid cast exception is being thrown by one of the Field<int>() calls since that is one of the three different exceptions that this method can throw. See docs here.

Perhaps use the .Except() extension to get the set difference of the two sets?
(from b in benchmark.AsEnumerable()
select new { id = b.Field<int>("bench_id")}).Except(
from a in report_list.AsEnumerable()
select new {id = a.Field<int>("BenchmarkID")})
Not actually sure of the precise syntax, but that should work by taking the ids in benchmark, and then removing all equivalent ids in report_list, leaving only the ids that don't match. (I hope this is the order you were after...)
Note: This is also assuming that the above issue mentioned by tvanfosson isn't also a problem

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Best Practice Checking for duplicate rows before inserting list of items - linq

I agree that you should let the db make the decision. Please have a look at using UPSERT as stated in this post

Why not just attempt the insert and let the database tell you if any unique constraint violations have occurred (using try/catch)?

Related

Is there any better way to check if the same data is present in a table in .Net core 3.1?

Multiple rows update without select

Can I refer to items within a LINQ result set by index?

Failed to batch insert in Subsonic3 with error "Must declare the scalar variable..."

DataTable Query

Categories

Resources