Is there a better way to check whether the same data is already present in a table in .NET Core 3.1? - linq

I'm pulling data from a third-party API that runs several times a day. If a record is already present in the table unchanged, it should be ignored; if it has changed, the existing row should be updated; and if something new shows up in the received JSON, a new row should be inserted.
I'm using the code below to insert new data:
var input = JsonConvert.DeserializeObject<List<DeserializeLookup>>(resultJson).ToList();
var entryset = input.Select(y => new Lookup
{
    lookupType = "JOBCODE",
    code = y.Code,
    description = y.Description,
    isNew = true,
    lastUpdatedDate = DateTime.UtcNow
}).ToList();
await _context.Lookup.AddRangeAsync(entryset);
await _context.SaveChangesAsync();
But when the API runs again after the first run, it inserts the same data into the table again, so duplicate entries end up in the table. To handle this, I added a foreach loop like the one below before inserting data into the table:
foreach (var item in input)
{
    if (!_context.Lookup.Any(r => r.code == item.Code))
    {
        //above insert code
    }
}
But this doesn't work as expected, and the API takes a long time to run with the foreach loop. Is there a solution to this in .NET Core 3.1?

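var newList = new List<Lookup>();
foreach (var item in input)
{
    // still one database query per item
    if (!_context.Lookup.Any(r => r.code == item.Code))
    {
        newList.Add(new Lookup
        {
            lookupType = "JOBCODE",
            code = item.Code,
            description = item.Description,
            isNew = true,
            lastUpdatedDate = DateTime.UtcNow
        });
    }
}
await _context.Lookup.AddRangeAsync(newList);
await _context.SaveChangesAsync();
It is better to try it this way: collect the new records first and save them all in one call.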
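The loop above still sends one Any query per item. A sketch of the usual alternative, assuming code uniquely identifies a row and description is the only field that can change: load the existing rows once, then decide insert / update / skip in memory. The Lookup and DeserializeLookup classes below are hypothetical stand-ins with only the fields used in the question, and LookupSync.Apply is an illustrative helper, not an EF Core API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the question's entity and JSON DTO.
public class Lookup
{
    public string lookupType { get; set; }
    public string code { get; set; }
    public string description { get; set; }
    public bool isNew { get; set; }
    public DateTime lastUpdatedDate { get; set; }
}

public class DeserializeLookup
{
    public string Code { get; set; }
    public string Description { get; set; }
}

public static class LookupSync
{
    // Compares the incoming rows with rows loaded ONCE from the table.
    // Unchanged rows are skipped, changed rows are modified in place,
    // and rows with unknown codes are returned as new entities to insert.
    public static List<Lookup> Apply(List<Lookup> existing, IEnumerable<DeserializeLookup> incoming, DateTime now)
    {
        // Index by code; keeps the first row if duplicates already exist in the table.
        var byCode = new Dictionary<string, Lookup>();
        foreach (var row in existing)
            if (!byCode.ContainsKey(row.code))
                byCode.Add(row.code, row);

        var toInsert = new List<Lookup>();
        foreach (var item in incoming)
        {
            if (byCode.TryGetValue(item.Code, out var row))
            {
                // Assumption: description is the only field that can change.
                if (row.description != item.Description)
                {
                    row.description = item.Description;
                    row.lastUpdatedDate = now;
                }
            }
            else
            {
                var added = new Lookup
                {
                    lookupType = "JOBCODE",
                    code = item.Code,
                    description = item.Description,
                    isNew = true,
                    lastUpdatedDate = now
                };
                toInsert.Add(added);
                byCode.Add(item.Code, added); // also ignores duplicates inside one JSON payload
            }
        }
        return toInsert;
    }
}
```

In the real method, existing would be loaded with a single query such as _context.Lookup.Where(l => l.lookupType == "JOBCODE").ToListAsync(). Because those rows are tracked by the context, the modified descriptions become UPDATEs, and the returned list can be passed to AddRangeAsync, all saved by one SaveChangesAsync call.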

I'm on my phone, so forgive me for not being able to format the code in my response. I actually just ran into this problem myself while syncing data from a third-party app into a SQL database with an Azure Function.
Depending on your table schema, you need one column with a unique identifier. Make this column the primary key (the first step to preventing duplicates). Here's a resource for that: https://www.w3schools.com/sql/sql_primarykey.ASP
The next step is your stored procedure. You'll need to perform what's commonly called an UPSERT: merge the table with the incoming data on a specified column (whichever is your primary key).
That would look something like this:
MERGE
    Table_1 AS T1
USING
    Incoming_Data AS source
ON
    T1.column1 = source.column1
    -- you can use an AND / OR operator here to match on additional values or combinations
WHEN MATCHED THEN
    UPDATE SET T1.column2 = source.column2
    -- etc. for more columns
WHEN NOT MATCHED THEN
    INSERT (column1, column2, column3) VALUES (source.column1, source.column2, source.column3);

First of all, you should decouple the format in which you receive your data from the actual data handling. In your case: get rid of the JSON before you interpret the data.
Alas, I haven't got a clue what your data represents, so let's assume it is a sequence of customer orders. When you get new data, you want to add all new orders and update changed orders.
So somewhere you have a method with your JSON data as input and a sequence of Orders as output:
IEnumerable<Order> InterpretJsonData(string jsonData)
{
...
}
You know JSON better than I do, and besides, this conversion is a bit beside your question.
You wrote:
So, if the same data is present in the table it should ignore that record, else if there are any changes it should update that record or insert a new record
You need an Equality Comparer
To detect whether Customer Orders were added or changed, you need a way to decide whether Order A equals Order B. There must be at least one unique field by which you can identify an Order, even if all other values of the Order have changed.
This unique value is usually called the primary key, or the Id. I assume your Orders have an Id.
So if your new Order data contains an Id that was not available before, then you are certain that the Order was Added.
If your new Order data has an Id that was already in previously processed Orders, then you have to check the other values to detect whether it was changed.
For this you need equality comparers: one that says two Orders are equal if they have the same Id, and one that checks all values for equality.
A standard pattern is to derive your comparer from class EqualityComparer<Order>:
class OrderComparer : EqualityComparer<Order>
{
public static IEqualityComparer<Order> ByValue = new OrderComparer();
... // TODO implement
}
First I'll show you how to use this to detect additions and changes, then I'll show you how to implement it.
Somewhere you have access to the already processed Orders:
IEnumerable<Order> GetProcessedOrders() {...}
var jsondata = FetchNewJsonOrderData();
// convert the jsonData into a sequence of Orders
IEnumerable<Order> orders = this.InterpretJsonData(jsondata);
To detect which Orders were added or changed, you could make a Dictionary of the already processed orders and check the orders one by one:
IEqualityComparer<Order> comparer = OrderComparer.ByValue;
Dictionary<int, Order> processedOrders = this.GetProcessedOrders()
    .ToDictionary(order => order.Id);
foreach (Order order in orders)
{
    if (processedOrders.TryGetValue(order.Id, out Order originalOrder))
    {
        // order already existed. Is it changed?
        if (!comparer.Equals(order, originalOrder))
        {
            // unequal!
            this.ProcessChangedOrder(order);
            // remember the changed values of this Order
            processedOrders[order.Id] = order;
        }
        // else: no changes, nothing to do
    }
    else
    {
        // Added!
        this.ProcessAddedOrder(order);
        processedOrders.Add(order.Id, order);
    }
}
Immediately after processing a changed or added order, I remember its new value, because the same Order might appear (and change) again later in the input.
If you want this in a LINQ fashion, GroupJoin the Orders with the processed Orders to get "Orders with their zero or more previously processed Orders" (there will probably be zero or one previously processed Order).
var ordersWithPreviouslyProcessedOrder = orders.GroupJoin(this.GetProcessedOrders(),
    order => order.Id,                    // from every Order take the Id
    processedOrder => processedOrder.Id,  // from every previously processed Order take the Id
    // parameter resultSelector: from every Order, with its zero or more previously
    // processed Orders, make one new object:
    (order, previouslyProcessedOrders) => new
    {
        Order = order,
        ProcessedOrder = previouslyProcessedOrders.FirstOrDefault(),
    })
    .ToList();
I use GroupJoin instead of Join because this way I also get the "Orders that have no previously processed Order" (= new orders). With a plain Join you would not get them.
I do a ToList so that the group join is not executed twice by the next statements:
var addedOrders = ordersWithPreviouslyProcessedOrder
    .Where(orderCombi => orderCombi.ProcessedOrder == null);
var changedOrders = ordersWithPreviouslyProcessedOrder
    .Where(orderCombi => orderCombi.ProcessedOrder != null
        && !comparer.Equals(orderCombi.Order, orderCombi.ProcessedOrder));
Implementation of "Compare by Value"
// equal if all values equal
public override bool Equals(Order x, Order y)
{
    if (x == null) return y == null; // true if both null, false if x null but y not null
    if (y == null) return false;     // because x not null
    if (Object.ReferenceEquals(x, y)) return true;
    if (x.GetType() != y.GetType()) return false;
    // compare all properties one by one:
    return x.Id == y.Id
        && x.Date == y.Date
        && ...
}
For GetHashCode there is one rule: if X equals Y, they must have the same hash code. If they are not equal there is no rule, but lookups are more efficient if they have different hash codes. Make a trade-off between calculation speed and hash code uniqueness.
In this case: if two Orders are equal, I am certain they have the same Id, so for speed I don't hash the other properties.
public override int GetHashCode(Order x)
{
    if (x == null)
        return 0x34339d98; // just a hash code for all null Orders
    else
        return x.Id.GetHashCode();
}
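Putting the pieces together, here is a self-contained sketch of the by-value comparer and the GroupJoin-based classification. The Order properties (Id, Date, Total) and the OrderDiff helper are assumptions for illustration; the answer only requires an Id plus whatever other values you compare.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Order shape: an Id plus two values that can change.
public class Order
{
    public int Id { get; set; }
    public DateTime Date { get; set; }
    public decimal Total { get; set; }
}

public class OrderComparer : EqualityComparer<Order>
{
    public static readonly IEqualityComparer<Order> ByValue = new OrderComparer();

    // Equal if all values are equal.
    public override bool Equals(Order x, Order y)
    {
        if (ReferenceEquals(x, y)) return true;   // same instance, or both null
        if (x is null || y is null) return false;
        if (x.GetType() != y.GetType()) return false;
        return x.Id == y.Id && x.Date == y.Date && x.Total == y.Total;
    }

    // Equal orders always share an Id, so hashing only the Id satisfies the rule.
    public override int GetHashCode(Order x) => x is null ? 0 : x.Id.GetHashCode();
}

public static class OrderDiff
{
    public static (List<Order> Added, List<Order> Changed) Classify(
        IEnumerable<Order> orders, IEnumerable<Order> processedOrders)
    {
        var comparer = OrderComparer.ByValue;
        var combined = orders
            .GroupJoin(processedOrders,
                order => order.Id,
                processed => processed.Id,
                (order, previous) => new { Order = order, ProcessedOrder = previous.FirstOrDefault() })
            .ToList();

        // No previously processed order with this Id: the order is new.
        var added = combined
            .Where(c => c.ProcessedOrder == null)
            .Select(c => c.Order)
            .ToList();

        // A previous version exists but some value differs: the order changed.
        var changed = combined
            .Where(c => c.ProcessedOrder != null && !comparer.Equals(c.Order, c.ProcessedOrder))
            .Select(c => c.Order)
            .ToList();

        return (added, changed);
    }
}
```

The ProcessedOrder != null check matters: without it, every added order would also be reported as changed, because an order never equals null.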

Related

Using LINQ to SELECT the SUM() of a subquery

I am trying to learn how to use LINQ to perform a query that yields the same result as this:
SELECT (
SELECT SUM(point)
FROM communitymemberpointfeature
WHERE communitymemberpointfeature.communitymemberid = communitymember.id
) AS points, communitymember.*
FROM communitymember
After browsing around the Internet, I constructed the following statement:
var list = (from pointFeature in communityMemberPointFeatureList
join member in communityMemberList on pointFeature.CommunityMemberId equals member.Id
group pointFeature by new { pointFeature.CommunityMemberId }
into grouping
select new
{
grouping,
points = grouping.Sum(row => row.Point)
}).ToList();
But this yielded a result like:
[
{
points:7200,
grouping:[
{Id:1,Point:5000,FeatureId:1,CommunityMemberId:1},
{Id:2,Point:2200,FeatureId:1,CommunityMemberId:1},
],
}
...
]
What I really want is a result set like:
[
{points:7200,CommunityMemberId:1,firstname:'john',lastname:'blah' ....},
...
]
Can someone tell me what I did wrong?
Edit after comment added to the end
I can imagine you have problems translating your SQL into LINQ. When trying to write LINQ statements it is usually a lot easier to start from your requirements, instead of starting from a SQL statement.
It seems to me that you have a table with CommunityMembers. Every CommunityMember has a primary key in property Id.
Furthermore, every CommunityMember has zero or more CommunityMemberPointFeatures, namely those CommunityMemberPointFeatures with a foreign key CommunityMemberId that equals the primary key of the CommunityMember that it belongs to.
For example: CommunityMember [14] has all CommunityMemberPointFeatures that have a value CommunityMemberId equal to 14.
Requirement
If I look at your SQL, it seems to me that you want to query all CommunityMembers, each with the sum of property Point of all CommunityMemberPointFeatures of this CommunityMember.
Whenever you want to query "items with their zero or more subitems", like "Schools with their Students", "Customers with their Orders", "CommunityMembers with their PointFeatures", consider using GroupJoin.
A GroupJoin is in effect a left outer join followed by a GroupBy that groups each left item with all its matching right items.
var result = dbContext.CommunityMembers                  // GroupJoin CommunityMembers
    .GroupJoin(dbContext.CommunityMemberPointFeatures,   // with CommunityMemberPointFeatures
        communityMember => communityMember.Id,           // from every CommunityMember take the Id
        pointFeature => pointFeature.CommunityMemberId,  // from every CommunityMemberPointFeature
                                                         // take the CommunityMemberId
        // Parameter resultSelector: take every CommunityMember with all its matching
        // CommunityMemberPointFeatures to make one new object:
        (communityMember, pointFeaturesOfThisCommunityMember) => new
        {
            // Select the CommunityMember properties that you plan to use:
            Id = communityMember.Id,
            Name = communityMember.Name,
            ...
            // From the point features of this CommunityMember you only want the sum
            // of property Point:
            Points = pointFeaturesOfThisCommunityMember
                .Select(pointFeature => pointFeature.Point)
                .Sum(),
            // However, if you want more fields, you can use:
            PointFeatures = pointFeaturesOfThisCommunityMember.Select(pointFeature => new
            {
                Id = pointFeature.Id,
                Name = pointFeature.Name,
                ...
                // not needed, you know the value:
                // CommunityMemberId = pointFeature.CommunityMemberId,
            })
            .ToList(),
        });
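The same GroupJoin can be tried in memory with LINQ to Objects. The class shapes below are hypothetical, trimmed to the fields the SQL uses, and MemberPoints.Compute is an illustrative helper. Members without any point features still appear with 0 points, because GroupJoin produces an empty group for them and Sum over an empty int sequence is 0. Whether a database provider translates this GroupJoin shape varies; some (EF Core 3.x, for example) cannot translate it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the two tables in the question.
public class CommunityMember
{
    public int Id { get; set; }
    public string FirstName { get; set; }
}

public class CommunityMemberPointFeature
{
    public int Id { get; set; }
    public int CommunityMemberId { get; set; }
    public int Point { get; set; }
}

public static class MemberPoints
{
    // Every member, each with the sum of Point over its own point features.
    public static List<(int Id, string FirstName, int Points)> Compute(
        IEnumerable<CommunityMember> members,
        IEnumerable<CommunityMemberPointFeature> features)
    {
        return members
            .GroupJoin(features,
                member => member.Id,                   // primary key
                feature => feature.CommunityMemberId,  // foreign key
                (member, featuresOfMember) => (
                    Id: member.Id,
                    FirstName: member.FirstName,
                    Points: featuresOfMember.Sum(f => f.Point)))
            .ToList();
    }
}
```

This returns one flat row per member, which is the shape the question asked for, instead of the groups of point features that the original query produced.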
Edit after comment
If you want, you can skip selecting the individual values you plan to use and return the complete objects:
// Parameter resultSelector:
(communityMember, pointFeaturesOfThisCommunityMember) => new
{
    CommunityMember = communityMember,
    PointFeatures = pointFeaturesOfThisCommunityMember.ToList(),
}
However, I would strongly advise against this. If CommunityMember [14] has a thousand PointFeatures, then every PointFeature will have a foreign key with a value 14. So you are transporting this value 14 1001 times. What a waste of processing power, not to mention all the other fields you plan not to use.
Besides, if you do this you violate information hiding: whenever your tables change internally, the result of this function changes. Is that what you want?

dynamic asc desc sort

I am trying to create table headers that sort via a back-end call in NHibernate. Clicking a header sends a string indicating what to sort by (e.g. "Name", "NameDesc") to the database call.
The database can get quite large, so I also have back-end filters and pagination built in to reduce the amount of retrieved data. Therefore the ORDER BY needs to happen before, or together with, the filters and Skip/Take, rather than ordering only the smaller, already paged data. Here is an example of the QueryOver call:
IList<Event> s =
    session.QueryOver<Event>(() => @eventAlias)
        .Fetch(@event => @event.FiscalYear).Eager
        .JoinQueryOver(() => @eventAlias.FiscalYear, () => fyAlias, JoinType.InnerJoin, Restrictions.On(() => fyAlias.Id).IsIn(_years))
        .Where(() => !@eventAlias.IsDeleted)
        .OrderBy(() => fyAlias.RefCode).Asc
        .ThenBy(() => @eventAlias.Name).Asc
        .Skip(numberOfRecordsToSkip)
        .Take(numberOfRecordsInPage)
        .List();
How can I accomplish this?
One way to achieve this (one of many; you could also use a fully typed filter object or some query builder) is the following draft:
Part one and two:
// I. a reference to our query
var query = session.QueryOver<Event>(() => @eventAlias);
// II. join, filter... whatever is needed
query = query
    .Fetch(@event => @event.FiscalYear).Eager;
var joinQuery = query
    .JoinQueryOver(...)
    .Where(() => !@eventAlias.IsDeleted)
    ...
Part three:
// III. Order BY
// Assume we have a list of strings (passed from a UI client)
// here represented by these two values
var sortBy = new List<string> {"Name", "CodeDesc"};
// first, have a reference for the OrderBuilder
IQueryOverOrderBuilder<Event, Event> order = null;
// iterate the list
foreach (var sortProperty in sortBy)
{
// use Desc or Asc?
var useDesc = sortProperty.EndsWith("Desc");
// Clean the property name
var name = useDesc
? sortProperty.Remove(sortProperty.Length - 4, 4)
: sortProperty;
// Build the ORDER
order = order == null
? query.OrderBy(Projections.Property(name))
: query.ThenBy(Projections.Property(name))
;
// use DESC or ASC
query = useDesc ? order.Desc : order.Asc;
}
Finally the results:
// IV. back to query... call the DB and get the result
IList<Event> s = query
.List<Event>();
This draft sorts on top of the root query. You can also extend it to add order statements to joinQuery (e.g. if the string is "FiscalYear.MonthDesc"). The logic would be similar, but built around joinQuery (see part II).
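QueryOver needs an NHibernate session, so the sketch below shows the same parsing logic (optional "Desc" suffix, first key with OrderBy, later keys with ThenBy) with LINQ to Objects instead. This is a swapped-in technique, not QueryOver. The Event shape, the whitelist dictionary and the DynamicSort helper are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Event shape with two sortable columns.
public class Event
{
    public string Name { get; set; }
    public string Code { get; set; }
}

public static class DynamicSort
{
    // Whitelist of sortable columns: unknown names from the client are rejected.
    private static readonly Dictionary<string, Func<Event, object>> Keys =
        new Dictionary<string, Func<Event, object>>
        {
            ["Name"] = e => e.Name,
            ["Code"] = e => e.Code,
        };

    public static List<Event> Apply(IEnumerable<Event> source, IEnumerable<string> sortBy)
    {
        IOrderedEnumerable<Event> ordered = null;
        foreach (var sortProperty in sortBy)
        {
            // "NameDesc" -> descending on "Name"; "Name" -> ascending on "Name"
            var useDesc = sortProperty.EndsWith("Desc", StringComparison.Ordinal);
            var name = useDesc ? sortProperty.Substring(0, sortProperty.Length - 4) : sortProperty;
            if (!Keys.TryGetValue(name, out var key))
                throw new ArgumentException($"Unknown sort column '{name}'.");

            // The first key starts the ordering; later keys only break ties.
            if (ordered == null)
                ordered = useDesc ? source.OrderByDescending(key) : source.OrderBy(key);
            else
                ordered = useDesc ? ordered.ThenByDescending(key) : ordered.ThenBy(key);
        }
        return ordered != null ? ordered.ToList() : source.ToList();
    }
}
```

Mapping names through a whitelist, rather than passing client strings straight into the query, is a design choice: it turns a typo or a tampered request into a clear error before the database is hit.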

LINQ return records where string[] values match Comma Delimited String Field

I am trying to select some records using LINQ to Entities (EF4 Code First).
I have a table called Monitoring with a field called AnimalType which has values such as
"Lion,Tiger,Goat"
"Snake,Lion,Horse"
"Rattlesnake"
"Mountain Lion"
I want to pass in some values in a string array (animalValues) and have the rows returned from the Monitorings table where one or more values in the field AnimalType match the one or more values from the animalValues. The following code ALMOST works as I wanted but I've discovered a major flaw with the approach I've taken.
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
var result = from m in db.Monitorings
where animalValues.Any(c => m.AnimalType.Contains(c))
select m;
return result;
}
To explain the problem, if I pass in animalValues = { "Lion", "Tiger" } I find that three rows are selected due to the fact that the 4th record "Mountain Lion" contains the word "Lion" which it regards as a match.
This isn't what I wanted to happen. I need "Lion" to only match "Lion" and not "Mountain Lion".
Another example is if I pass in "Snake" I get rows which include "Rattlesnake". I'm hoping somebody has a better bit of LINQ code that will allow for matches that match the exact comma delimited value and not just a part of it as in "Snake" matching "Rattlesnake".
This is a bit of a hack, but it does the job:
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
var values = animalValues.Select(x => "," + x + ",");
var result = from m in db.Monitorings
where values.Any(c => ("," + m.AnimalType + ",").Contains(c))
select m;
return result;
}
This way, you will have
",Lion,Tiger,Goat,"
",Snake,Lion,Horse,"
",Rattlesnake,"
",Mountain Lion,"
Checking for ",Lion," then no longer matches ",Mountain Lion,".
It's dirty, I know.
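A quick in-memory check of the comma-wrapping trick against the four sample values from the question. This runs as LINQ to Objects, not against the database, and it assumes the stored values have no spaces after the commas. DelimitedMatch is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelimitedMatch
{
    // Wraps both the stored list and each search term in commas, so a term
    // only matches a whole entry: ",Lion," is not a substring of ",Mountain Lion,".
    // Assumes entries are separated by "," with no surrounding spaces.
    public static List<string> MatchByWrapping(IEnumerable<string> rows, string[] animalValues)
    {
        var values = animalValues.Select(x => "," + x + ",").ToArray();
        return rows
            .Where(row => values.Any(c => ("," + row + ",").Contains(c)))
            .ToList();
    }
}
```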
Because the data in your field is comma delimited you really need to break those entries up individually. Since SQL doesn't really support a way to split strings, the option that I've come up with is to execute two queries.
The first query uses the code you started with to at least get you in the ballpark and minimize the amount of data you're retrieving. It converts it to a List<> to actually execute the query and bring the results into memory which will allow access to more extension methods like Split().
The second query runs against the subset of data in memory and picks out the exact matches:
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
    // execute a query that is greedy in its matches, but at least
    // it's still only a subset of data. The ToList()
    // brings the data into memory, so to speak
    var subsetData = (from m in db.Monitorings
                      where animalValues.Any(c => m.AnimalType.Contains(c))
                      select m).ToList();
    // filter the in-memory subset down to the exact matches;
    // Split() works here because this part runs as LINQ to Objects
    var result = from m in subsetData
                 where m.AnimalType.Split(',').Intersect(animalValues).Any()
                 select m;
    return result.AsQueryable();
}
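A variant of the in-memory step as a standalone helper, using a HashSet for the requested values and trimming each entry so that a value stored as "Goat, Tiger" (with a space) also splits cleanly. The helper name is hypothetical, and the trimming is an assumption about the data rather than something the question requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelimitedMatchSplit
{
    // Splits each stored list into individual entries and keeps the row
    // if any entry exactly equals one of the requested values.
    public static List<string> MatchBySplitting(IEnumerable<string> rows, string[] animalValues)
    {
        var wanted = new HashSet<string>(animalValues, StringComparer.Ordinal);
        return rows
            .Where(row => row.Split(',').Select(s => s.Trim()).Any(wanted.Contains))
            .ToList();
    }
}
```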

Best Practice Checking for duplicate rows before inserting list of items

I have an array of objects that I want to insert into the database.
My method looks like this:
public void Add(CardElement[] cardElements){
foreach (var cardElement in cardElements)
{
Data.Entry(cardElement).State = System.Data.EntityState.Added;
}
Data.SaveChanges();
}
The database table resembles this:
MS SQL table mytable, columns a, b, c, d, e, f
Unique constraint on (a, b, c)
The data I want to insert resembles this.
var obj = new[] {
    new MyObject { a = 1, b = 1, c = 1 },
    new MyObject { a = 1, b = 1, c = 2 },
    new MyObject { a = 1, b = 1, c = 3 }
};
So, I want to check the database for these three rows before I add them to the database.
I could do something like the following, but I assume it would cause extra trips to the database:
private bool CheckExists(CardElement[] cardElements)
{
    foreach (var cardElement in cardElements)
    {
        var exists = (from ce in Data.CardElements
                      where ce.CardId == cardElement.CardId
                      where ce.Area == cardElement.Area
                      where ce.ElementName == cardElement.ElementName
                      select ce).Any();
        if (exists) return true;
    }
    return false;
}
So, how could I handle this more gracefully?
Is it even worth trying to accomplish this using linq?
Should I write some stored procedures for performance?
I agree that you should let the db make the decision.
Please have a look at using UPSERT as stated in this post
Why not just attempt the insert and let the database tell you if any unique constraint violations have occurred (using try/catch)?
The problem is that even if you query the data, somebody else can insert the record between your query and saving changes. You will still have to handle the unique constraint violation exception despite your additional queries, and yes, every check makes an additional trip to the database.
If your main concern is performance, use a stored procedure, where you can additionally use a table hint to lock the table for inserts during the initial existence check.
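If you still want a pre-check, it can at least be one query instead of one per row: select only the key columns of the existing rows (for example for the CardIds in the batch), then filter in memory. Below is a sketch of the in-memory half, with a hypothetical CardElement shape matching the question's unique constraint and a hypothetical CardElementFilter helper. As the answer above points out, this narrows but does not close the race window, so the insert still needs to handle a unique constraint violation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape mirroring the unique constraint (CardId, Area, ElementName).
public class CardElement
{
    public int CardId { get; set; }
    public string Area { get; set; }
    public string ElementName { get; set; }
}

public static class CardElementFilter
{
    // existingKeys is assumed to come from ONE query selecting only the key columns.
    // Returns the batch items whose key is neither in the table nor earlier in the batch.
    public static List<CardElement> OnlyNew(
        IEnumerable<CardElement> batch,
        IEnumerable<(int CardId, string Area, string ElementName)> existingKeys)
    {
        var seen = new HashSet<(int, string, string)>(existingKeys);
        var result = new List<CardElement>();
        foreach (var ce in batch)
        {
            // HashSet.Add returns false when the key is already present.
            if (seen.Add((ce.CardId, ce.Area, ce.ElementName)))
                result.Add(ce);
        }
        return result;
    }
}
```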

Multiple Counts within a single query

I want a list of counts for some of my data (the number of open/closed tasks, etc.). I want to get all counts in one query, but I am not sure what to do with my LINQ statement below...
_user is an object that returns info about the currently logged-on user
_repo is an object that returns an IQueryable of whichever table I want to select
var counters = (from task in _repo.All<InstructionTask>()
where task.AssignedToCompanyID == _user.CompanyID || task.CompanyID == _user.CompanyID
join instructions in _repo.GetAllMyInstructions(_user) on task.InstructionID equals
instructions.InstructionID
group new {task, instructions}
by new
{
task
}
into g
select new
{
TotalEveryone = g.Count(),
TotalMine = g.Count(),
TotalOpen = g.Count(x => x.task.IsOpen),
TotalClosed = g.Count(c => !c.task.IsOpen)
}).SingleOrDefault();
Should I be using SingleOrDefault here? The exception I am getting is "Sequence contains more than one element".
Note: I want overall stats, not for each task, but for all tasks - not sure how to get that?
You need to dump everything into a single group, and use a regular Single. I am not sure if LINQ-to-SQL would be able to translate it correctly, but it's definitely worth a try.
var counters = (from task in _repo.All<InstructionTask>()
                where task.AssignedToCompanyID == _user.CompanyID || task.CompanyID == _user.CompanyID
                join instructions in _repo.GetAllMyInstructions(_user) on task.InstructionID equals instructions.InstructionID
                group task by 1 /* <<=== All tasks go into one group */ into g
                select new {
                    TotalEveryone = g.Count(),
                    TotalMine = g.Count(), // <<=== You probably need a condition here
                    TotalOpen = g.Count(x => x.IsOpen),
                    TotalClosed = g.Count(c => !c.IsOpen)
                }).Single();
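An in-memory sketch of the single-group trick with LINQ to Objects. The InstructionTask fields, the TaskCounters helper, and the meaning of "mine" (tasks assigned to my company) are assumptions. One detail worth knowing: grouping an empty sequence produces no group at all, so .Single() throws when there are no tasks; SingleOrDefault plus a null check avoids that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical task shape with only the fields the counts need.
public class InstructionTask
{
    public int CompanyID { get; set; }
    public int AssignedToCompanyID { get; set; }
    public bool IsOpen { get; set; }
}

public static class TaskCounters
{
    public static (int TotalEveryone, int TotalMine, int TotalOpen, int TotalClosed) Count(
        IEnumerable<InstructionTask> tasks, int myCompanyId)
    {
        var counters = (from task in tasks
                        group task by 1 into g   // one group holding every task
                        select new
                        {
                            TotalEveryone = g.Count(),
                            TotalMine = g.Count(t => t.AssignedToCompanyID == myCompanyId), // assumed meaning
                            TotalOpen = g.Count(t => t.IsOpen),
                            TotalClosed = g.Count(t => !t.IsOpen),
                        }).SingleOrDefault();

        // No tasks means no group, so SingleOrDefault returned null.
        return counters == null
            ? (0, 0, 0, 0)
            : (counters.TotalEveryone, counters.TotalMine, counters.TotalOpen, counters.TotalClosed);
    }
}
```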
From MSDN
Returns the only element of a sequence, or a default value if the
sequence is empty; this method throws an exception if there is more
than one element in the sequence.
You need to use FirstOrDefault. SingleOrDefault is designed for collections that contain exactly one element (or none).
