LINQ - How to prevent locks when bulk delete

In my code, I first delete a large number of records with multiple queries and then do a bulk insert.
Here is the code:
using (var scope = new TransactionScope())
{
    using (var ctx = new ApplicationDbContext(schemaName))
    {
        // Delete
        foreach (var item in queries)
        {
            // Delete queries - more than 30 - optimized already
            ctx.Database.ExecuteSqlCommand(item);
        }

        // Bulk Insert
        BulkInsert(ConnectionString, "Entry", "Entry", bulkTOEntry);
        BulkInsert(ConnectionString, "WrongEntry", "WrongEntry", bulkWrongEntry);
    }
    scope.Complete();
}
The problem is in the delete part. The delete queries take around 10 minutes, and the locks they hold block other users from fetching or manipulating the affected records.
I put the code in a TransactionScope so that if anything fails while deleting, the whole transaction rolls back.
I have tried deleting the records in chunks through stored procedures, but that didn't help: the records stay locked because the chunks still run inside the one TransactionScope.
How can I prevent locks on the records?
Sample delete query:
DELETE FROM [Entry]
WHERE CompanyId = 1
AND EmployeeId IN (3, 4, 6, 7, 14, 17, 20, 21, 22,....100 more)
AND Entry_Date = '2016-12-01'
AND Entry_Method = 'I'

If you need to delete the employees in chunks, you can split the list of employees with this extension method:
public static List<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int length)
{
    var count = source.Count();
    var numberOfPartitions = count / length + (count % length > 0 ? 1 : 0);
    List<IEnumerable<T>> result = new List<IEnumerable<T>>();
    for (int i = 0; i < numberOfPartitions; i++)
    {
        result.Add(source.Skip(length * i).Take(length));
    }
    return result;
}
You can use this method to split the list into small chunks and delete them one chunk at a time, so other users can use the table between chunks.
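For example, here is a minimal sketch of that idea, assuming the Partition helper above plus the ApplicationDbContext and [Entry] table from the question; the employeeIds list and the chunk size of 20 are illustrative assumptions. Each chunk commits in its own short transaction, so its locks are released before the next chunk starts:
// Illustrative list of employee ids; in the real code this would come
// from wherever the original IN (...) list is built.
var employeeIds = new List<int> { 3, 4, 6, 7, 14, 17, 20, 21, 22 };

foreach (var chunk in employeeIds.Partition(20))
{
    using (var ctx = new ApplicationDbContext(schemaName))
    {
        // No enclosing TransactionScope: each delete commits on its own,
        // so locks are held only for the duration of one chunk.
        var ids = string.Join(", ", chunk); // safe to concatenate: ids are ints
        ctx.Database.ExecuteSqlCommand(
            "DELETE FROM [Entry] " +
            "WHERE CompanyId = 1 " +
            "AND EmployeeId IN (" + ids + ") " +
            "AND Entry_Date = '2016-12-01' " +
            "AND Entry_Method = 'I'");
    }
}
The trade-off is atomicity: chunks that have already committed are not rolled back if a later chunk fails, so you would need compensating logic instead of the single all-or-nothing transaction.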

Related

Ruby ActiveRecord retrieving new inserted records in batches?

This pertains to my previous query.
I am able to get batches of records from ActiveRecord using the following query:
Client.offset(15 * iteration).first(15)
I'm running into an issue whereby a user inputs a new record.
Say there are 100 records, and batches are of 15.
The first 6 iterations (90 records) and the last iteration (10 records) work. However, if a new entry comes in (making it a total of 101 records), the program fails in one of two ways:
a) If the iteration counter is set to increment, it queries for records beyond the range of 101, resulting in nothing being returned.
b) If the iteration counter is modified to increment only when an entire batch of 15 items is complete, it repeats the last 14 items plus the 1 new item.
How do I go about getting newly posted records?
Dart code:
_getData() async {
  if (!isPerformingRequest) {
    setState(() {
      isPerformingRequest = true;
    });
    // _var represents the iteration counter - 0, 1, 2 ...
    // fh is a helper method returning records in List<String>
    List<String> newEntries = await fh.feed(_var);
    // Count number of returned records.
    newEntries.forEach((i) => _count++);
    if (newEntries.isEmpty) {
      ..
    }
  } else {
    // if number of records is 10, increment _var to go to the next batch of 10.
    if (_count == 10) {
      _var++;
      _count = 0;
    }
    // but if count is not 10, stay with the same batch
    // (but previously counted/displayed records will be shown again)
  }
  setState(() {
    items.addAll(newEntries);
    isPerformingRequest = false;
  });
}

Slow loop over data table collection

I am updating values in an Excel workbook using values from a MySQL database. There are just eleven rows in the WorkbookMap list, and six DataTables in the RptValueSet DataSet. I've proven that the problem is in this loop, not in the communication with the database. Getting the results is fast, but writing them to the workbook is slow. The code below works fine when the DataTables in the DataSet are small - such as a 3-column, 7-row results set, and I receive the updated Excel workbook almost instantly. However, the loop slows down noticeably when the results set increases; a 3-column, 50-row DataTable results in a 7 - 10 second delay in returning the updated Excel workbook. I'm not sure I really need to put the DataTables into a collection, but that's the only way I could figure out how to iterate over them. Any tips on optimizing this loop would be much appreciated!
// Create a list to contain the destination for the data in the workbook
List<WorkbookMap> wbMap = new List<WorkbookMap>();

// Create a new data set to contain results from database
DataSet RptValuesSet = new DataSet();
// RptValuesSet populated from database here....

// Create a collection so we can loop thru the dataset
DataTableCollection RptValuesColl = RptValuesSet.Tables;
for (int i = 0; i < RptValuesColl.Count; i++)
{
    DataTable tbl = RptValuesColl[i];
    // Find the correct entry in the workbook map
    for (int j = 0; j < wbMap.Count; j++)
    {
        if (wbMap[j].SPCall == tbl.TableName)
        {
            // Write the results to the correct location in the workbook
            MovingColumnRef = wbMap[j].StartColumn;
            for (int c = 1; c < tbl.Columns.Count; c++)
            {
                row = wbMap[j].StartRow; // start at the top row for each new column
                for (int r = 0; r < tbl.Rows.Count; r++)
                {
                    // Write the database value to the workbook given the sheetName and cell address
                    UpdateValue(wbMap[j].SheetName, MovingColumnRef + row, tbl.Rows[r][c].ToString(), 0, wbMap[j].String);
                    row++;
                }
                MovingColumnRef = IncrementColRef(MovingColumnRef);
            }
        }
    }
}
Without delving deeply into your code: you said you believe the slowness comes from writing to the sheet. Try putting the data into an array first and then writing the whole array to the workbook in one operation. For example, you could write to the sheet like this, where anchorRangeName is the name of a range consisting of a single cell in the workbook:
private void WriteResultToRange(Excel.Workbook wb, string anchorRangeName, object[,] resultArray)
{
    Excel.Range resultRange = GetRange(anchorRangeName, wb).get_Resize(resultArray.GetLength(0), resultArray.GetLength(1));
    resultRange.Value2 = resultArray;
}
You'll still need to get the data from the database into an array.
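As a hedged sketch of that step (assuming a plain System.Data.DataTable, as in the question), copying a DataTable into a 2-D array suitable for Value2 might look like this:
private static object[,] ToArray(DataTable tbl)
{
    // Copy every cell so the whole block can be assigned to Range.Value2
    // in a single call instead of one COM round trip per cell.
    var result = new object[tbl.Rows.Count, tbl.Columns.Count];
    for (int r = 0; r < tbl.Rows.Count; r++)
    {
        for (int c = 0; c < tbl.Columns.Count; c++)
        {
            result[r, c] = tbl.Rows[r][c];
        }
    }
    return result;
}
The single bulk assignment to Value2 is what removes the per-cell overhead that makes the original loop slow.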

How can I get the "actual" count of elements in an IEnumerable?

If I write:
for (int i = 0; i < Strutture.Count(); i++)
{
}
and Strutture is an IEnumerable with 200 elements, IIS crashes. That's because every time I do Strutture.Count(), it executes all the LINQ queries linked with that IEnumerable.
So how can I get the "current" number of elements? Do I need a list?
"That's because every time I do Strutture.Count(), it executes all the LINQ queries linked with that IEnumerable."
Without doing that, how is it going to know how many elements there are?
For example:
Enumerable.Range(0,1000).Where(i => i % 2==0).Skip(100).Take(5).Count();
Without executing the LINQ, how could you know how many elements there are?
If you want to know how many elements there are in the source (e.g. Enumerable.Range) then I suggest you use a reference to that source and query it directly. E.g.
var numbers = Enumerable.Range(0,1000);
numbers.Count();
Also keep in mind that some data sources don't really have a concept of 'Count', or if they do, getting it involves going through every single item and counting it.
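As an illustration (a sketch of the general pattern, not the actual BCL source), a count operation can short-circuit when the source already knows its size, but otherwise it must walk the entire sequence:
public static int CountSketch<TSource>(IEnumerable<TSource> source)
{
    // A materialized collection knows its size: O(1), no enumeration.
    if (source is ICollection<TSource> collection)
        return collection.Count;

    // Anything else must be enumerated, which executes every deferred
    // operator in the chain: O(n).
    int count = 0;
    using (IEnumerator<TSource> e = source.GetEnumerator())
    {
        while (e.MoveNext())
            count++;
    }
    return count;
}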
Lastly, if you're calling .Count() repeatedly [and you don't expect the value to actually change], it can be a good idea to cache it:
var count = numbers.Count();
for (int i = 0; i < count; i++) // Do Something
Supplemental:
"At first Count(), LINQ queries are executes. Than, for the next, it just "check" the value :) Not "execute the LINQ query again..." :)" - Markzzz
Then why don't we do that?
var query = Enumerable.Range(0, 1000).Where(i => i % 2 == 0).Skip(100).Take(5);
var result = query.ToArray(); // Gets and stores the result!
var count = result.Length;
:)
"But when I do the first "count", it should store (after the LINQ queries) the new IEnumerable (the state is changed). If I do again .Count(), why LINQ need to execute again ALL queries." - Markzzz
Because you're creating a query that gets compiled down into X,Y,Z. You're running the same query twice however the result may vary.
For example, check this out:
static void Main(string[] args)
{
    var dataSource = Enumerable.Range(0, 100).ToList();
    var query = dataSource.Where(i => i % 2 == 0);

    // Run the query once and return the count:
    Console.WriteLine(query.Count()); // 50

    // Now lets modify the datasource - remembering this could be a table in a db etc.
    dataSource.AddRange(Enumerable.Range(100, 100));

    // Run the query again and return the count:
    Console.WriteLine(query.Count()); // 100

    Console.ReadLine();
}
This is why I recommended storing the results of the query above!
Materialize the number:
int number = Strutture.Count();
for (int i = 0; i < number; i++)
{
}
or materialize the list:
var list = Strutture.ToList();
for (int i = 0; i < list.Count; i++)
{
}
or use a foreach
foreach(var item in Strutture)
{
}

Best Practice Checking for duplicate rows before inserting list of items

I have an array of objects that I want to insert into the database.
My method call looks like this:
public void Add(CardElement[] cardElements)
{
    foreach (var cardElement in cardElements)
    {
        Data.Entry(cardElement).State = System.Data.EntityState.Added;
    }
    Data.SaveChanges();
}
The database table resembles this:
MS SQL: Table mytable, Columns a, b, c, d, e, f
Unique constraint on a, b, c
The data I want to insert resembles this:
var obj = new[] {
    new MyObject() { a = 1, b = 1, c = 1 },
    new MyObject() { a = 1, b = 1, c = 2 },
    new MyObject() { a = 1, b = 1, c = 3 }
};
So, I want to check the database for these three rows before I add them to the database.
I could do something like the following, but I assume it would cause extra trips to the database.
private bool CheckExists(CardElement[] cardElements)
{
    foreach (var cardElement in cardElements)
    {
        var exists = (from ce in Data.CardElements
                      where ce.CardId == cardElement.CardId
                      where ce.Area == cardElement.Area
                      where ce.ElementName == cardElement.ElementName
                      select ce).Any();
        if (exists) return true;
    }
    return false;
}
So, how could I handle this more gracefully?
Is it even worth trying to accomplish this using linq?
Should I write some stored procedures for performance?
I agree that you should let the db make the decision.
Please have a look at using UPSERT, as stated in this post.
Why not just attempt the insert and let the database tell you if any unique constraint violations have occurred (using try/catch)?
The problem is that even if you query the data first, somebody else can insert a record between your query and your saving of changes. You will still have to handle the exception for violating the unique constraint despite your additional queries - and yes, every check makes an additional trip to the database.
If your main concern is performance, use a stored procedure in which you can additionally use a table hint to lock the table against inserts during the initial existence check.
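As a hedged sketch of the try/catch approach (assuming an EF 6-style DbContext, whose DbUpdateException wraps the provider error; SQL Server error numbers 2627 and 2601 are the unique constraint/index violations):
// using System.Data.Entity.Infrastructure; // DbUpdateException (EF6)
// using System.Data.SqlClient;             // SqlException

public void AddIgnoringDuplicates(CardElement[] cardElements)
{
    foreach (var cardElement in cardElements)
    {
        Data.Entry(cardElement).State = System.Data.EntityState.Added;
    }
    try
    {
        Data.SaveChanges();
    }
    catch (DbUpdateException ex)
    {
        var sqlEx = ex.GetBaseException() as SqlException;
        // 2627 = unique constraint violation, 2601 = duplicate key in a unique index
        if (sqlEx == null || (sqlEx.Number != 2627 && sqlEx.Number != 2601))
            throw;
        // A duplicate was rejected by the database; skip, log, or retry
        // row by row as the application requires.
    }
}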

Row number in LINQ

I have a LINQ query like this:
var accounts =
    from account in context.Accounts
    from guranteer in account.Gurantors
    where guranteer.GuarantorRegistryId == guranteerRegistryId
    select new AccountsReport
    {
        recordIndex = ?,
        CreditRegistryId = account.CreditRegistryId,
        AccountNumber = account.AccountNo,
    };
I want to populate recordIndex with the current row number in the collection returned by the query. How can I get the row number?
Row number is not supported in LINQ to Entities. You must first retrieve the records from the database without the row number and then add the row number with LINQ to Objects. Something like:
var accounts =
    (from account in context.Accounts
     from guranteer in account.Gurantors
     where guranteer.GuarantorRegistryId == guranteerRegistryId
     select new
     {
         CreditRegistryId = account.CreditRegistryId,
         AccountNumber = account.AccountNo,
     })
    .AsEnumerable() // Moving to LINQ to Objects
    .Select((r, i) => new AccountsReport
    {
        recordIndex = i,
        CreditRegistryId = r.CreditRegistryId,
        AccountNumber = r.AccountNumber,
    });
LINQ to Objects has this built in for any enumerable:
http://weblogs.asp.net/fmarguerie/archive/2008/11/10/using-the-select-linq-query-operator-with-indexes.aspx
Edit: although IQueryable supports it too (here and here), it has been mentioned that this unfortunately does not work for LINQ to SQL / LINQ to Entities.
new[] { "aap", "noot", "mies" }
    .Select((element, index) => new { element, index });
will result in:
{ { element = aap, index = 0 },
  { element = noot, index = 1 },
  { element = mies, index = 2 } }
There are other LINQ extension methods (like .Where) with an extra index parameter overload.
Try using let like this:
int[] ints = new[] { 1, 2, 3, 4, 5 };
int counter = 0;
var result = from i in ints
             where i % 2 == 0
             let number = ++counter
             select new { I = i, Number = number };

foreach (var r in result)
{
    Console.WriteLine(r.Number + ": " + r.I);
}
I cannot test it with actual LINQ to SQL or Entity Framework right now. Note that the above code will retain the value of the counter between multiple executions of the query.
If this is not supported with your specific provider you can always foreach (thus forcing the execution of the query) and assign the number manually in code.
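A minimal sketch of that fallback, assuming the accounts query and the settable recordIndex property from the question:
// Force execution of the query, then number the rows client-side.
int index = 0;
foreach (var report in accounts)
{
    report.recordIndex = index++;
}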
Because the query in the question filters by a single id, I think the answers given won't help. Of course you can do it all in memory on the client side, but depending on how large the dataset is, and whether a network is involved, this could be an issue.
If you need an equivalent of SQL's ROW_NUMBER() OVER (...), the only way I know is to create a view in your SQL Server database and query against that.
This is tested and works. Amend your code as follows:
int counter = 0;
var accounts =
    from account in context.Accounts
    from guranteer in account.Gurantors
    where guranteer.GuarantorRegistryId == guranteerRegistryId
    select new AccountsReport
    {
        recordIndex = counter++,
        CreditRegistryId = account.CreditRegistryId,
        AccountNumber = account.AccountNo,
    };
Hope this helps, though it's late :)
